If a certain Saturday night at the Middleton Theater had been his first night working the job, Alex might have thought that the $11 he deposited at the bank was typical. He would have expected even less on Sunday night. After many months of working there, however, he knew that a “new” movie would bring a rush of at least twenty people on opening night. He also knew from experience that the heat and humidity on that Saturday night, given the well-known lack of air-conditioning, discouraged locals from coming in. Also, it was their fourth week showing Weekend at Bernie’s.
Knowing that business would eventually pick up again, Alex was using what most of us would call common sense, but technically could be labeled a form of Bayesian inference. As creatures who develop their own subjective models of the world based on their own unique life experiences, humans use Bayesian inference quite naturally. We continually update our models using new evidence we gather every day. Let’s say you own a restaurant and, over time, you’ve come to know that when you get 30 dinner reservations by four in the afternoon, you’ll have about 200 customers for the night, but if you have 70 reservations by six, you’ll have about 250 customers by night’s end. The longer you’ve owned the restaurant, the better you probably are at predicting how many customers you’ll have.
The general Bayesian approach has several steps. First, quantify your model as a probability distribution, known as a prior distribution. Next, gather information about the world you are trying to predict and update your prior. This updated version is technically known as the posterior distribution. Now let the posterior become the prior and repeat these last couple steps—collecting information and updating the prior given your new observations—over and over. The refined probability distribution is your result, and it helps you model and predict the world. With respect to your restaurant and the predicted number of customers, you’ve been Bayesian updating your priors every night, for all those years.
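The loop described above can be sketched in a few lines of Python. This is a toy version of the restaurant example only: the grid of candidate "customers per reservation" multipliers, the nightly counts, and the bell-shaped likelihood are all invented for illustration, not drawn from any real restaurant.

```python
import math

# Candidate "customers per reservation" multipliers (illustrative numbers).
multipliers = [1.0 + 0.1 * i for i in range(91)]     # 1.0 .. 10.0
prior = [1.0 / len(multipliers)] * len(multipliers)  # start uninformed

def update(prior, reservations, customers, sd=20.0):
    """One Bayesian step: reweight each candidate multiplier by how well
    it explains tonight's (reservations, customers) observation."""
    posterior = []
    for m, p in zip(multipliers, prior):
        predicted = m * reservations
        likelihood = math.exp(-0.5 * ((customers - predicted) / sd) ** 2)
        posterior.append(p * likelihood)
    total = sum(posterior)
    return [w / total for w in posterior]  # normalize to a distribution

# Night after night, the posterior becomes the next night's prior.
for reservations, customers in [(30, 200), (70, 250), (40, 210)]:
    prior = update(prior, reservations, customers)

best = max(zip(prior, multipliers))[1]  # most probable multiplier so far
```

Each pass through the loop is one Bayesian update: the likelihood reweights the candidates, and tonight's posterior becomes tomorrow's prior.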
Technically, Bayes’s rule—named for the English statistician and clergyman Thomas Bayes, who formulated it—states that the probability of your explanation, given what you observed, is proportional to two probabilities multiplied together: the prior probability of the explanation itself and the likelihood of the observations given that explanation. It’s funny how awkward Bayes’s rule sounds when stated formally. Intuitively, we all understand it. Here’s another example. Say a tornado hits your house, and you’re sure this happened because your sister-in-law put a curse on your family. But hold that thought and become a Bayesian. The curse explanation was improbable to begin with: your sister-in-law puts curses on people all the time without tornadoes hitting their houses. Besides that, the tornado itself was a rare event. More formally, we multiply the low prior probability that curses work by the low frequency of a curse being immediately followed by a targeted tornado. Whether you’re a human judging these probabilities intuitively or a machine calculating them precisely, Bayes’s rule tells you the tornado was probably not caused by your sister-in-law’s curse.
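As a back-of-the-envelope sketch, the tornado reasoning can be written out in Python. Every probability below is invented purely for illustration; the point is only the shape of the calculation, prior times likelihood for each competing explanation.

```python
# All probabilities below are invented for illustration.
p_curse_works = 1e-6          # prior: curses actually cause tornadoes
p_tornado_given_curse = 0.9   # likelihood if the curse explanation were true
p_tornado_base = 1e-4         # how often tornadoes hit a given house anyway

# Unnormalized posterior weight for each competing explanation:
post_curse = p_curse_works * p_tornado_given_curse
post_chance = (1 - p_curse_works) * p_tornado_base

# Normalize so the two explanations sum to 1.
total = post_curse + post_chance
p_curse = post_curse / total  # posterior belief in the curse explanation
```

Even granting the curse a 90 percent strike rate if it worked, the tiny prior on curses working keeps the posterior far below the mundane explanation.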
What happens when we combine Bayes’s rule with the transmission chains we discussed in chapter 3? The answer is, we get not only views of how people think but also a model for artificial intelligence. The “neural conversational model” that Google researchers Oriol Vinyals and Quoc Le are developing for your next customer service conversation is strikingly simple. Instead of handcrafting the conversation rules, as happens when you book an airline ticket with a computerized agent, their model is flexible: “We simply feed the predicted output,” they write, “as input to predict the next output.” The human says “hello!” and the machine responds, “hello!” Convincing so far. An efficient dialogue about log-in details ensues. Later, the human asks, “What is the purpose of existence?” The machine replies, “To find out what happens when we get to the planet Earth.” Must be joking. “What is the purpose of dying?” asks the human. “To have a life,” replies the machine.
That last response is revealing, if not a little scary. Machine learning looks backward, meaning that its input is the previous output. To feed into its next response, it concatenates what has been said so far. It’s called iterated learning, and our colleague Stephan Lewandowsky of Bristol University maintains that humans, too, think this way. It’s as if you receive a message from yourself, update that message for a moment, and then pass that updated message on to yourself for the next moment. If yesterday the sun rose at 5:45 a.m., and today it rose at 5:46 a.m., you update the model for tomorrow: sunrise at 5:47 a.m.
Lewandowsky’s team explored iterated learning with a transmission chain experiment in which they asked people to estimate a phenomenon that hadn’t finished yet, like how much longer you’ll be on hold during a phone call or what the total gross will be for a movie already in theaters. A participant would read a question such as “If you were assessing an insurance case for a thirty-nine-year-old man, how old would you expect him to be at death?” The participant’s response, of course, would be some number larger than thirty-nine. Next, the participant was asked the question again, except that the current number was drawn at random between zero and the previous response. So if the life expectancy had been estimated at sixty-seven, the question might be updated to “If you were assessing an insurance case for a fifty-one-year-old man …,” and so on. People answered about one question per second on a computer, with different topics interwoven, until twenty questions had been asked for each topic. The chain of twenty responses not only converged on the final answer quickly, after about the first five steps, but all the guesses taken together basically matched the distribution of the real-world answers taken together. Through iterated learning, people could make good estimates for all sorts of things—eventual gross sales for a movie, human life spans, the length of poems, the length of reigns of pharaohs, movie run times, and even baking times of cakes in the oven.
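A minimal simulation of one such chain might look like the sketch below. The bell-shaped life-span distribution and all its numbers are hypothetical, not Lewandowsky’s data; the point is the mechanism, in which each answer becomes the seed for the next question.

```python
import random

random.seed(1)

# Hypothetical "true" distribution of total life spans (roughly bell-shaped).
def sample_lifespan():
    return max(1, int(random.gauss(75, 12)))

population = sorted(sample_lifespan() for _ in range(10_000))

def predict_total(current_age):
    """A Bayesian-style guess: the median life span among people who have
    already reached current_age (the posterior given survival so far)."""
    survivors = [x for x in population if x >= current_age]
    return survivors[len(survivors) // 2] if survivors else current_age

# One transmission chain: each answer seeds the next question.
age, answers = 39, []
for _ in range(20):
    guess = predict_total(age)
    answers.append(guess)
    age = random.randint(0, guess)  # next cue drawn below the last answer
```

Run over many topics, chains like this converge quickly, and the pooled guesses come to resemble the underlying distribution itself.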
Lewandowsky called it the “wisdom of individuals”—like crowdsourcing your own previous estimates. If lots of people make these estimates individually and we pool them, the “wisdom of crowds” can be an even more precise estimate of the real answer. For the future of cultural evolution, this is important for two reasons. First, social influence—for example, observing someone else’s answer—can ruin the wisdom of crowds, and online algorithms are constantly showing us other people’s “answers.” Second, even though humans fall short much of the time, the process suggests Bayesian inference as a model for an artificial intelligence: update its existing knowledge of a distribution using new information and then sample its new estimate from that updated distribution. We’ll see later how this has both promise and problems.
Bayesian inference uses the past to predict the future, but it can also be used to interpret the past. If you follow the National Collegiate Athletic Association’s Final Four in college basketball, imagine that after the tournament, ESPN asks you to grade the strength of all sixty-four teams, given only the tournament results. All you have for data are the final scores along with the winning and losing teams. How do you grade them all based on that? A round-robin tournament would have let you compare each pair of teams, but this tournament is single elimination: one loss and you’re out.
You start to panic but then calm down, realizing you can deliver a good result to ESPN. With your laptop computer, you begin by guessing at a strength score for each team. Then you let the computer “play” the tournament, based on the initial setup of teams in the round of sixty-four. Each time two teams play each other in the simulated tournament, the computer picks a winner probabilistically, like rolling loaded dice weighted by the relative strengths you assigned. You could compare your simulated result to the actual tournament, but even better, you run the tournament a thousand times and compare the most common (likely) result to the actual tournament.
But you’ve only just started. Your initial guesses for the different strengths of the sixty-four teams are almost certainly not correct. Now you need to guess a new set of strength estimates for all the teams, simulate the tournament another thousand times, then guess another set and resimulate another thousand times again and again, each time comparing the set of a thousand simulations against the actual tournament results. Finally, you accept the set of estimates whose simulations were closest on average to the actual results. This is your answer. Given just one tournament result, you have learned about the relative strengths of all teams and their likely chances against each other. If all that seems like a lot of complicated work, your paycheck from ESPN makes you feel a lot better.
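The whole procedure can be shown in miniature. The sketch below uses a hypothetical four-team bracket rather than sixty-four, with a made-up outcome, just to make the guess-simulate-compare loop concrete.

```python
import random

random.seed(0)

teams = ["A", "B", "C", "D"]  # a toy 4-team bracket
actual_champion = "C"         # the observed tournament outcome (invented)

def play(s1, s2):
    """Loaded-dice game: win probability proportional to relative strength."""
    return random.random() < s1 / (s1 + s2)

def run_bracket(strengths):
    # Semifinals A-B and C-D, then a final between the winners.
    w1 = "A" if play(strengths["A"], strengths["B"]) else "B"
    w2 = "C" if play(strengths["C"], strengths["D"]) else "D"
    return w1 if play(strengths[w1], strengths[w2]) else w2

def score(strengths, n=1000):
    """How often do these guessed strengths reproduce the actual result?"""
    return sum(run_bracket(strengths) == actual_champion for _ in range(n)) / n

# Guess many strength sets; keep the one whose simulations fit best.
best_fit, best_strengths = -1.0, None
for _ in range(200):
    guess = {t: random.random() + 0.01 for t in teams}
    fit = score(guess)
    if fit > best_fit:
        best_fit, best_strengths = fit, guess
```

With sixty-four real teams you would also score the round-by-round results and the point margins, not just the champion, but the loop is the same: guess strengths, simulate a thousand tournaments, compare, and keep the best-fitting guess.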
Now if we can use Bayesian inference for a basketball competition, it’s not so hard to zoom out and view ancient cultural competition in the same way. To see how this might work, let’s go back three thousand years, to western Africa, east of which there was more rainfall and greener savanna grasslands, and even rain forest, in what is now the southern portion of the Sahara. In western Africa, pastoralist speakers of an ancestral version of Bantu began one of the world’s epic intergenerational migrations. Over several centuries, the Bantu dispersal ultimately extended southward over a huge swath of sub-Saharan Africa. The dispersal of these cattle herders, who were patrilineal, which means they traced descent and rights to property through the male side, culturally bulldozed most horticultural and/or matrilineal groups in their path. Left behind was a continent largely populated by Bantu speakers who herded cattle and inherited their lineage identity as well as wealth through the fathers’ side. Much of indigenous sub-Saharan African culture today owes at least something to this migration.
Archaeology tells us much of the story of this great Bantu dispersal and the cultures that predated it, including Batwa groups, whose languages are rich with botanical terms suited to forest adaptations; proto-Khoisan-speaking groups, whose descendants include the !Kung San of the Kalahari; and the Hadza- and Sandawe-speaking hunter-gatherers of Tanzania. But let’s say we want to learn more from this prehistoric event—something about how cultures change more generally. Is there more information somehow embedded in this record?
It turns out there is, and it involves adding Bayesian methods to the phylogenetic ones we discussed in chapter 4. With respect to Bantu expansion, researchers started with the phylogenetic tree of Bantu languages, which had already been constructed by linguists. They then considered how two specific cultural practices—inheritance and livestock herding—might have changed along the branches of this linguistic history. They divided linguistic groups into four sets: cattle and matriliny, cattle and patriliny, no cattle and patriliny, and no cattle and matriliny.
What next? For one, we need to know the character states at the tips of the language tree for each African linguistic group when it was first described. Fortunately, we can get this information from the handy Ethnographic Atlas, which was compiled by George P. Murdock in the 1960s and 1970s, and recorded essential facts for over a thousand societies. The Ethnographic Atlas noted that the Tiv of Nigeria and Cameroon, for example, were patrilineal and herded cattle, whereas the Gangela-speaking Luimbe of Angola or Ndonga-speaking Ambo of Namibia were matrilineal but also herded cattle.
So far, so good: we have a language tree, and we know which of our four possible character states is at each tip. Now all we need is a model of cultural change that gets us from the original, ancestral state—the “root” of the tree—to correctly predicting the character states at the tips of the tree. What we’re after are the probabilities that one event leads to another—that is, if we find the right set of probabilities, then when we simulate the model, it should give us the known result.
Bayesian phylogenetic analysis derives general understanding from just one historical event. The Bantu colonized Africa only once, giving us one language tree and the contemporary cultures at its tips. At each juncture on the tree, a linguistic group will be modeled as existing in one of the above four combinations of character states. Along a phylogenetic branch extending from that juncture, the group can change one character state at a time—let’s say from matriliny and cattle to patriliny and cattle. What we want to know is, when one changes, how likely is it that the other will change as a response? Specifically, when Bantu migration introduced cattle herding to a matrilineal group, did this force cattle inheritance over to the patrilineal system?
The output of the analysis is the set of probabilities of change among the four states. It’s a bit like guessing the odds of one basketball team beating another: we guess at the probabilities and then run the model many times for each set of them. In the figure, big arrows represent the most likely/common changes, and little arrows represent changes that rarely happen. Four arrows represent the transitions in one direction, and four more arrows go the other way; together, the eight arrows represent the eight probability values that need to be guessed.
Got that so far? We have eight arrows representing transitions between four different states. Now start with an ancestral society at the root of the tree—let’s say matrilineal and no cattle. At the next juncture, it makes a change: either it gains cattle or it switches to patriliny, still without cattle. We choose the change at random, but with probabilities given by the arrows we assigned before running the model. We roll these loaded dice and go to patrilineal without cattle. The descendants of this new group roll their own dice, this time choosing to gain cattle or go back to matriliny, still without cattle, with the arrows giving the relative odds. No double jumps are allowed, and staying put is also an option. We keep doing this until we have filled out the whole language tree (remember, the tree itself is fixed by the languages; only the character states change along its branches).
Now, looking at the end points of the tree, check how well the simulation matches the true states that we know from the Ethnographic Atlas. Have the computer run the simulation again and again, maybe a thousand times, just for this particular set of probability arrows. The degree of match with the actual record determines the likelihood that this specific choice of probabilities represents reality. Now change those probabilities, just by a little, and do it all over again. Then again, and again, for different combinations of the eight probabilities (arrows), until we have thoroughly explored the possible sets of probabilities.
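The routine can be compressed into a toy sketch. The tree here is a tiny balanced one rather than the real Bantu phylogeny, the observed tip states are invented, and matching tips position by position is far cruder than a real likelihood calculation—but the guess-simulate-compare loop is the same.

```python
import random

random.seed(2)

# Four character states; a single step flips one trait at a time.
STATES = ["mat_no", "mat_cow", "pat_no", "pat_cow"]
NEIGHBORS = {
    "mat_no": ["mat_cow", "pat_no"],
    "mat_cow": ["mat_no", "pat_cow"],
    "pat_no": ["pat_cow", "mat_no"],
    "pat_cow": ["pat_no", "mat_cow"],
}

def step(state, probs):
    """One branch of the tree: check each allowed move in turn and maybe
    take it (a crude discretization; staying put is also possible)."""
    for nxt in NEIGHBORS[state]:
        if random.random() < probs[(state, nxt)]:
            return nxt
    return state

def simulate_tips(probs, root="mat_no", depth=3):
    """Fill out a tiny balanced tree and return its 2**depth tip states."""
    tips = [root]
    for _ in range(depth):
        tips = [step(s, probs) for s in tips for _ in range(2)]
    return tips

# Hypothetical observed tips (in the real study, these come from the
# Ethnographic Atlas hung on the Bantu language tree).
observed = ["pat_cow"] * 6 + ["mat_no"] * 2

def fit(probs, n=300):
    """Average per-tip match between simulated and observed tip states."""
    total = 0
    for _ in range(n):
        sim = simulate_tips(probs)
        total += sum(a == b for a, b in zip(sim, observed))
    return total / (n * len(observed))

# Crude search: guess the eight arrow probabilities, keep the best set.
best_fit, best_probs = -1.0, None
for _ in range(100):
    probs = {(s, t): random.random() for s in STATES for t in NEIGHBORS[s]}
    f = fit(probs)
    if f > best_fit:
        best_fit, best_probs = f, probs
```

The real analyses use far more sophisticated searches (Markov chain Monte Carlo rather than blind guessing), but the logic is the same: the probability set whose simulations best reproduce the known tips is the one we keep.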
In the Bantu study, researchers found the transition probabilities to be quite lopsided. If a matrilineal group acquired cattle, it had a 27 percent chance of becoming patrilineal in the next phylogenetic step. Once a group was patrilineal and herded cattle, it had almost no chance (0.2 percent) of reverting to matriliny. A matrilineal group had a 68 percent chance of losing the cattle if it had them, but only a 16 percent chance of gaining cattle if it didn’t.
The cow was “the enemy of matriliny,” researchers Clare Holden and Ruth Mace rightly concluded. The wider implication is that introducing a new resource can change or disrupt family life. Working in Ethiopia, for example, Mhairi Gibson and Eshetu Gurmu observed that when piped water was introduced to rural villages—ones where women previously had to spend hours carrying water—two changes occurred. First, fertility (the number of children per mother) increased slightly, and second, younger siblings started leaving their families and migrating to the cities. Although development economists would not have expected these responses to supplying water to villages, Gibson was well aware of the Bantu phylogenetic study, which indicated that family organization is part of a cultural system that coevolves with resources. To her, there was no difference between adding livestock to the system and adding potable water.
Another goal is to infer the state of the root of the tree. For instance, what was the kinship system among the ancestors of almost all Europeans? Did married couples live in the husband’s or the wife’s village? To answer this question, Oxford anthropologist Laura Fortunato started with the well-established Indo-European language tree, the root of which is the Proto-Indo-European language. She then hung the cultural ornaments on the tips of the tree by looking up the kinship system—matrilocal, patrilocal, and neolocal—among the contemporary Indo-European-speaking societies listed in the Ethnographic Atlas. After all the simulations were completed, the strongest arrows (probabilities) pointed toward patrilocal for ancient Proto-Indo-European. This fit with independent genetic and archaeological evidence, and it’s a remarkable finding.
Taking advantage of preexisting phylogenetic trees of Pacific languages, Tom Currie and his colleagues used Bayesian phylogenetic analysis to understand political systems in ancient Polynesia. Starting some fifty-five hundred years ago, the Pacific was colonized by Austronesian people from southern China or Taiwan. This expansion occurred so rapidly that it has been referred to as an “express train.” By thirty-two hundred years or so ago, we can recognize in the archaeological record the remains of the so-called Lapita people, who were among the best navigators the world has ever seen. In their famous double-hulled canoes, navigating by the stars and inferring the presence of islands over the horizon by the action of tiny wavelets, Lapita mariners dispersed their culture from Island Melanesia as far as Tonga and Samoa in just a few centuries. In the following millennium, their descendants colonized the rest of Polynesia as far north as Hawaii, as far south as New Zealand, and as far east as Rapa Nui (Easter Island). These colonists brought with them not only yams, pigs, and chickens but also pottery making, fishing, and their Austronesian language.
Some of the first Lapita colonists of the archipelago of Vanuatu, east of New Guinea, buried a male “leader” with the skulls of three others placed on his chest, possibly sometime after his death or from sacrifice at the time of his death. Isotopic studies show that the man had sailed to Vanuatu, whereas the other three were probably raised there. A few others had the same exotic isotopic signatures as the leader, and most of them were buried with their heads pointing south, which was uncommon in that cemetery, containing over thirty people. Clearly, even this small group of early Pacific colonists already had a set of identities, and probably some social hierarchy, that had to do with where they were from. Hierarchy is probably essential to navigating such enormous distances. When several Mexican fishermen drifted from a mile offshore all the way to Island South East Asia in 2006, they respected their captain during the entire eight weeks, drinking rainwater and eating the raw turtle meat that ultimately killed the American who was on board with them.
The point is that these early Polynesian colonists, carrying this rudimentary hierarchy and then left to their own devices on far-flung islands and archipelagoes, ultimately evolved different political systems. On the coral atoll of Tokelau, the prevailing system was maopoopo—meaning “together both in body and soul”—with land owned corporately by larger family groups. The Hawaiian Islands, in contrast, comprised highly stratified chiefdoms, with a queen ruling the islands at the time of contact in the eighteenth century.
To figure out how this variety of political systems could evolve from a founder system—the one that buried a guy with three heads on his chest—Currie and his colleagues used our familiar methods. First, they needed a language tree, so they accessed published Austronesian language trees, took a sample of a thousand of the most likely ones, and settled on a single tree that represented the chronological order in which linguistic-cultural groups colonized the Pacific. Next they defined four simplified categories of political systems: acephalous (no head), followed by simple chiefdom (one leadership level), complex chiefdom (two leadership levels), and state. From ethnographic and historical records on the Austronesian-speaking societies of Island South East Asia and the Pacific, Currie and colleagues could categorize eighty-four societies at the tips of the ethnolinguistic tree.
The transitions they simulated—the arrows in the figure—represent how probable a change was in a political system over time. When they compared their resulting phylogeny to the eighty-four Austronesian-speaking societies in recent times, the fit was best if acephalous societies moved up to the simple chiefdom level two-thirds of the time, but fell back again a third of the time. Once a simple chiefdom became a complex one, however, or a complex chiefdom moved up to being a state, there effectively was no going back. The root of the tree was probably acephalous (about 75 percent) but possibly (25 percent) a simple chiefdom. This more or less fits with the chronological position of the guy with the three heads.
Now that we’ve looked at the ratchet effect of political systems, let’s explore the other topic never to discuss at the dinner table: religion. Despite their common origin in the Lapita dispersal thirty-five hundred years ago, Pacific societies evolved a remarkable variety of religious practices. One of these was human sacrifice. “The methods of sacrifice,” Joseph Watts and his colleagues merrily pointed out in their paper, “included burning, drowning, strangulation, bludgeoning, burial, being crushed under a newly built canoe, being cut to pieces, as well as being rolled off the roof of a house and then decapitated.” Ouch.
For their Bayesian phylogenetic analysis, Watts and his colleagues placed two character states, known from historical sources, at the tree tips: the level of hierarchy of the societies plus whether or not they practiced human sacrifice. They actually had three levels of hierarchy, so they needed to run the model twice—the first time modeling the transition between the low and intermediate levels of hierarchy, and the second time the transition between the intermediate and high levels. Complementing the study of political change, their analysis showed that human sacrifice “locks in” the jump from low to intermediate hierarchy and then helps it jump up another notch, to high hierarchy. Religion came first, followed by stratified societies. This, too, seems consistent with the ancient Vanuatu man with the accompanying three decapitated heads.
The wider impact of these studies is Bayesian phylogenetic analysis itself. Again, think of it like decorating a Christmas tree: take a historical language tree and “hang” some ornaments of culture on the tips of it. Through simulations on the tree, the goal may be to infer the causal relationships among the ornaments or to figure out more about the ancient root of the tree. Whether the tree is tangled or not—whether its continuities are “vertical” through time or “horizontal” through space—makes a big difference in cultural evolution, however, as we will see in the next chapter.