4

PROBABILITY AND RANDOMNESS

A thousand stories which the ignorant tell, and believe, die away at once, when the computist takes them in his gripe.

—Samuel Johnson1

Though Albert Einstein never said most of the things he supposedly said, he did say, in several variations, “I shall never believe that God plays dice with the world.”2 Whether or not he was right about the subatomic world, the world we live in certainly looks like a game of dice, with unpredictability at every scale. The race is not always to the swift, nor the battle to the strong, nor bread to the wise, nor favor to those of skill, but time and chance happen to them all. An essential part of rationality is dealing with randomness in our lives and uncertainty in our knowledge.

What Is Randomness? Where Does It Come From?

In the strip below, Dilbert’s question awakens us to the fact that the word “random” in common parlance refers to two concepts: a lack of patterning in data, and a lack of predictability in a process. When he doubts that the consecutive nines produced by the troll are truly random, he’s referring to their patterning.

DILBERT © 2001 Scott Adams, Inc. Used by permission of ANDREWS MCMEEL SYNDICATION. All rights reserved.

Dilbert’s impression that there’s a pattern in the sequence is not a figment of his imagination, like seeing butterflies in inkblots. Nonrandom patterning can be quantified. Brevity is the soul of pattern: we say that a dataset is nonrandom when its shortest possible description is shorter than the dataset itself.3 The description “6 9s” is two characters long (in an efficient shorthand for the descriptions), whereas the dataset itself, “999999,” is six characters long. Other strings we feel to be nonrandom also submit to compression: “123456” boils down to “1st 6”; “505050” squashes down to “3 50s.” In contrast, data we feel to be random, like “634579,” can’t be abridged into anything more concise; they must be rendered verbatim.
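For readers who like to see the idea run, here is a minimal sketch in Python. It uses zlib's compressed size as a crude stand-in for "shortest possible description"; true description length (Kolmogorov complexity) is uncomputable, and zlib adds a fixed header, so only the relative sizes matter.

```python
import random
import zlib

def description_length(s: str) -> int:
    # zlib-compressed size: a crude proxy for the shortest possible description
    return len(zlib.compress(s.encode()))

patterned = ["9" * 60, "123456" * 10, "50" * 30]
random.seed(0)
haphazard = "".join(random.choices("0123456789", k=60))  # random digits

for s in patterned + [haphazard]:
    print(s[:12] + "...", "raw:", len(s), "compressed:", description_length(s))
```

The patterned strings squash down to a fraction of their length; the haphazard one barely budges.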

The troll’s answer captures the second sense of randomness: an anarchic, unpredictable generating process. The troll is correct that a random process can generate nonrandom patterns, at least for a while—for six digits’ worth of output, in this case. After all, if there’s no rhyme or reason to the generator, what’s to stop it from producing six 9s or any other nonrandom pattern, at least occasionally? As the generator continues and the sequence gets longer, we can expect random patterning to reassert itself, because the freakish run is unlikely to continue.

The troll’s punch line is profound. As we shall see, mistaking a nonrandom pattern for a nonrandom process is one of the thickest chapters in the annals of human folly, and knowing the difference between them is one of the greatest gifts of rationality that education can confer.

All this raises the question of what kinds of physical mechanism can generate random events. Einstein notwithstanding, most physicists believe there is irreducible randomness in the subatomic realm of quantum mechanics, like the decay of an atomic nucleus or the emission of a photon when an electron jumps from one energy state to another. It’s possible for this quantum uncertainty to be amplified to scales that impinge on our lives. When I was a research assistant in an animal behavior lab, the refrigerator-sized minicomputers of the day were too slow to generate random-looking numbers in real time, and my supervisor had invented a gadget with a capsule filled with a radioactive isotope and a teensy-weensy Geiger counter that detected the intermittent particle spray and tripped a switch that fed the pigeon.4 But in most of the intermediate-sized realm in which we spend our days, quantum effects cancel out and may as well not exist.

So how could randomness arise in a world of billiard balls obeying Newton’s equations? As the 1970s poster proclaimed (satirizing billboards about the speed limit), “Gravity. It isn’t just a good idea. It’s the law.”5 In theory, couldn’t the demon imagined by Pierre-Simon Laplace in 1814, who knew the position and momentum of every particle in the universe, plug them into equations for the laws of physics and predict the future perfectly?

In reality, there are two ways in which a law-governed world can generate events that for all intents and purposes are random. One of them is familiar to popular science readers: the butterfly effect, named after the possibility that the flapping of a butterfly’s wings in Brazil could trigger a tornado in Texas. Butterfly effects can arise in deterministic nonlinear dynamical systems, also known as “chaos,” where minuscule differences in initial conditions, too small for any instrument to measure, can feed on themselves and blow up into gargantuan effects.

The other way in which a deterministic system can appear to be random from a human vantage point also has a familiar name: the coin flip. The fate of a tossed coin is not literally random; a skilled magician can flick one just so to get a head or a tail on demand. But when an outcome depends on a large number of tiny causes that are impractical to keep track of, like the angles and forces that launched the penny and the wind currents buffeting it in midair, it might as well be random.

What Does “Probability” Mean?

When the TV meteorologist says there’s a 30 percent chance of rain in the area tomorrow, what does she mean? Most people are foggy about the answer. Some think it means it will rain in 30 percent of the area. Others think it means it will rain 30 percent of the time. A few think it means that 30 percent of meteorologists think it will rain. And some think it means it will rain somewhere in the area on 30 percent of the days in which a prediction like this is made. (The last of these is in fact closest to what the meteorologist had in mind.)6

Weather-watchers are not the only ones who are confused. In 1929, Bertrand Russell noted that “probability is the most important concept in modern science, especially as nobody has the slightest notion what it means.”7 More accurately, different people have different notions of what it means, as we saw in chapter 1 with the Monty Hall and Linda problems.8

There is the classical definition of probability, which goes back to the origins of probability theory as a way of understanding games of chance. You lay out the possible outcomes of a process that have an equal chance of occurring, add up the ones that count as examples of the event, and divide by the number of possibilities. A die can land on any of six sides. An “even number” corresponds to its landing on the sides with two dots, four dots, or six dots. With three ways it can land “even” out of the six possibilities in all, we say that the classical probability it will roll “even” is three out of six, or .5. (In chapter 1 I used the classical definition to explain the correct strategy in the Monty Hall dilemma, and noted that miscounting the possible outcomes was what lured some of the overconfident experts to the incorrect strategy.)
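In code, the classical definition is nothing more than counting, as a two-line sketch shows:

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]               # six equally likely outcomes
even = [f for f in faces if f % 2 == 0]  # the outcomes that count as "even"
print(Fraction(len(even), len(faces)))   # 1/2
```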

But why did we think that landing on each face had an equal chance of happening in the first place? Here we drew on a second interpretation: the die's propensity, its physical disposition to do various things. This includes the symmetry of the six faces, the haphazard way the shooter releases it, and the physics of tumbling.

Closely related is a third, subjectivist interpretation. Before you fling the die, based on everything you know, how would you quantify, on a scale from 0 to 1, your belief that it will land even? This credence estimate is sometimes called the Bayesian interpretation of probability (a bit misleadingly, as we’ll see in the next chapter).

Then there is the evidential interpretation: the degree to which you believe the information presented warrants the conclusion. Think of a court of law, where in judging the probability that the defendant is guilty, you ignore inadmissible and prejudicial background information and consider only the strength of the prosecutor’s case. It was the evidential interpretation that made it rational to judge that Linda, having been presented as a social justice warrior, was likelier to be a feminist bank teller than a bank teller.

Finally there is the frequentist interpretation: if you did toss the die many times, say, a thousand, and counted the outcomes, you’d find that the result was even in around five hundred of the tosses, or half of them.
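The frequentist reading invites simulation rather than counting. A sketch (the exact tally will wobble around one half, which is the point):

```python
import random

random.seed(2023)
tosses = [random.randint(1, 6) for _ in range(1000)]
evens = sum(1 for t in tosses if t % 2 == 0)
print(evens / len(tosses))   # close to 0.5 on any run
```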

Ordinarily the five interpretations are aligned. In the case of a coin toss, the penny is symmetrical; coming up heads comprises exactly one out of the two possible outcomes; your gut feeling is halfway between “heads for sure” and “tails for sure”; the argument for heads is as strong as the argument for tails; and in the long run half the tosses you’ll see are heads. The probability of heads is .5 in every case. But the interpretations don’t mean the same thing, and sometimes they part company. When they do, statements about probabilities can result in confusion, controversy, even tragedy.

Most dramatically, the first four interpretations apply to the vaguely mystical notion of the probability of a single instance. What is the probability that you are over fifty? That the next pope will be Bono? That Britney Spears and Katy Perry are the same person? That there is life on Enceladus, one of the moons of Saturn? You might object that the questions are meaningless: either you are over fifty or you aren’t, and “probability” has nothing to do with it. But in the subjectivist interpretation, I can put a number on my ignorance. This offends some statisticians, who want to reserve the concept of probability for relative frequency in a set of events, which are really real and can be counted. One quipped that single-event probabilities belong not in mathematics but in psychoanalysis.9

Laypeople, too, can have trouble wrapping their minds around the concept of the numerical probability of a single event. They are mad at the meteorologist after getting soaked on a day when she had predicted a 10 percent chance of rain, and they laugh at the poll aggregator who predicted that Hillary Clinton had a 60 percent chance of winning the 2016 presidential election. These soothsayers defend themselves by invoking a frequentist interpretation of their probabilities: on one out of ten days in which she makes such a prediction, it rains; in six of ten elections with those polling numbers, the leading candidate wins. In this strip, Dilbert’s boss illustrates a common fallacy:

DILBERT © 2020 Scott Adams, Inc. Used by permission of ANDREWS MCMEEL SYNDICATION. All rights reserved.

As we saw in chapter 1 with Linda and will see again in the next chapter, reframing a probability from credence in a single event to frequency in a set of events can recalibrate people’s intuitions. A prosecutor in a big city who says “The probability that the DNA on the victim’s clothing would match the DNA on the suspect if he were innocent is one in a hundred thousand” is likelier to win a conviction than one who says “Out of every hundred thousand innocent people in this city, one will show a match.” The first feels like an estimate of subjective doubt that is indistinguishable from zero; the second invites us to imagine that falsely accused fellow, together with the many others living in the metropolis.

People also confuse probability in the frequentist sense with propensity. Gerd Gigerenzer recounts a tour of an aerospace factory in which the guide told the visitors that its Ariane rockets had a 99.6 percent security factor.10 They were standing in front of a poster depicting the ninety-four rockets and their histories, eight of which crashed or blew up. When Gigerenzer asked how a rocket with a 99.6 percent security factor could fail almost 9 percent of the time, the guide explained that the factor was calculated from the reliabilities of the individual parts, and the failures were the result of human error. Of course what we ultimately care about is how often the rocket slips the surly bonds of earth or buys the farm, regardless of the causes, so the only probability that matters is the overall frequency. By the same misunderstanding, people sometimes wonder why a popular candidate who is miles ahead in the polls is given only a 60 percent chance of winning the election, when nothing but a last-minute shocker could derail him. The answer is that the probability estimate takes into account last-minute shockers.

Probability versus Availability

Despite the difference in interpretations, probability is intimately tied to events as a proportion of opportunities, whether directly, in the classical and frequentist definitions, or indirectly, with the other judgments. Surely whenever we say that one event is more probable than another, we believe it will occur more often given the opportunity. To estimate risk, we should tally the number of instances of an event and mentally divide it by the number of occasions on which it could have taken place.

Yet one of the signature findings in the science of human judgment is that this is not how human probability estimation generally works. Instead, people judge the probability of events by the ease with which instances come into mind, a habit that Tversky and Kahneman called the availability heuristic.11 We use the ranking from our brain’s search engine—the images, anecdotes, and mental videos it coughs up—as our best guess of the probabilities. The heuristic exploits a feature of human memory, namely that recall is affected by frequency: the more often we encounter something, the stronger the trace it leaves in our brains. So working backwards and estimating frequency from recallability often works serviceably well. When asked which birds are most common in a city, you would not do badly by tapping your memory and guessing pigeons and sparrows rather than waxwings and flycatchers, instead of going to the trouble of consulting a bird census.

For most of human existence, availability and hearsay were the only ways to estimate frequency. Statistical databases were kept by some governments, but they were considered state secrets and divulged only to administrative elites. With the rise of liberal democracies in the nineteenth century, data came to be considered a public good.12 Even today, when data on just about everything is a few clicks away, not many people avail themselves of it. We instinctively draw on our impressions, which distort our understanding whenever the strengths of those impressions don’t mirror frequencies in the world. That can happen when our experiences are a biased sample of those events, or when the impressions are promoted or demoted in our mental search results by psychological amplifiers such as recency, vividness, or emotional poignancy. The effects on human affairs are sweeping.

Outside our immediate experience, we learn about the world through the media. Media coverage thus drives people’s sense of frequency and risk: they think they are likelier to be killed by a tornado than by asthma, despite asthma being eighty times deadlier, presumably because tornadoes are more photogenic.13 For similar reasons the kinds of people who can’t stay out of the news tend to be overrepresented in our mental censuses. What percentage of teenage girls give birth each year, worldwide? People guess 20 percent, around ten times too many. What proportion of Americans are immigrants? Around 28 percent, say survey respondents; the correct answer is 12 percent. Gay? Americans guess 24 percent; polls indicate 4.5 percent.14 African Americans? About a third, people say, around two and a half times higher than the real figure, 12.7 percent. That’s still more accurate than their estimate for another conspicuous minority, Jews, where respondents are off by a factor of nine (18 versus 2 percent).15

The availability heuristic is a major driver of world events, often in irrational directions. Other than disease, the most lethal risk to life and limb is accidents, which kill about five million people a year (out of 56 million deaths in all), about a quarter of them in traffic accidents.16 But except when they take the life of a photogenic celebrity, car crashes seldom make the news, and people are insouciant about the carnage. Plane crashes, in contrast, get lavish coverage, but they kill only about 250 people a year worldwide, making planes about a thousand times safer per passenger mile than cars.17 Yet we all know people with a fear of flying but no one with a fear of driving, and a gory plane crash can scare airline passengers onto the highways for months afterward, where thousands more die.18 The SMBC cartoon makes a similar point.

Among the most vivid and ghastly deaths imaginable is the one described in the song from The Threepenny Opera: “When that shark bites with his teeth, babe, scarlet billows start to spread.”19 In 2019, after a Cape Cod surfer became the first shark fatality in Massachusetts in more than eighty years, the towns equipped every beach with menacing Jaws-like warning billboards and hemorrhage-control kits, and commissioned studies on towers, drones, planes, balloons, sonar, acoustic buoys, and electromagnetic and odorant repellents. Yet every year on Cape Cod between fifteen and twenty people die in car crashes, and cheap improvements in signage, barriers, and traffic law enforcement could save many more lives at a fraction of the cost.20

Used by permission of Zach Weinersmith

The availability bias may affect the fate of the planet. Several eminent climate scientists, having crunched the numbers, warn that “there is no credible path to climate stabilization that does not include a substantial role for nuclear power.”21 Nuclear power is the safest form of energy humanity has ever used. Mining accidents, hydroelectric dam failures, natural gas explosions, and oil train crashes all kill people, sometimes in large numbers, and smoke from burning coal kills them in enormous numbers, more than half a million per year. Yet nuclear power has stalled for decades in the United States and is being pushed back in Europe, often replaced by dirty and dangerous coal. In large part the opposition is driven by memories of three accidents: Three Mile Island in 1979, which killed no one; Fukushima in 2011, which killed one worker years later (the other deaths were caused by the tsunami and from a panicked evacuation); and the Soviet-bungled Chernobyl in 1986, which killed 31 in the accident and perhaps several thousand from cancer, around the same number killed by coal emissions every day.22

Availability, to be sure, is not the only distorter of risk perception. Paul Slovic, a collaborator of Tversky and Kahneman, showed that people also overestimate the danger from threats that are novel (the devil they don’t know instead of the devil they do), out of their control (as if they can drive more safely than a pilot can fly), human-made (so they avoid genetically modified foods but swallow the many toxins that evolved naturally in plants), and inequitable (when they feel they assume a risk for another’s gain).23 When these bugbears combine with the prospect of a disaster that kills many people at once, the sum of all fears becomes a dread risk. Plane crashes, nuclear meltdowns, and terrorist attacks are prime examples.


Terrorism, like other losses of life with malice aforethought, brews up a different chemistry of fear. Body-counting data scientists are often perplexed at the way that highly publicized but low-casualty killings can lead to epochal societal reactions. The worst terrorist attack in history by far was 9/11, and it claimed 3,000 lives; in most bad years, the United States suffers a few dozen terrorist deaths, a rounding error in the tally of homicides and accidents. (The annual toll is lower, for example, than the number of people killed by lightning, bee stings, or drowning in bathtubs.) Yet 9/11 led to the creation of a new federal department, massive surveillance of citizens and hardening of public facilities, and two wars which killed more than twice as many Americans as the number who died in 2001, together with hundreds of thousands of Iraqis and Afghans.24

To take another low-death/high-fear hazard, rampage killings in American schools claim around 35 victims a year, compared with about 16,000 routine police-blotter homicides.25 Yet American schools have implemented billions of dollars of dubious safety measures, like installing bulletproof whiteboards and arming teachers with pepperball guns, while traumatizing children with terrifying active-shooter drills. In 2020 the brutal murder of George Floyd, an unarmed African American man, by a white police officer led to massive protests and the sudden adoption of a radical academic doctrine, Critical Race Theory, by universities, newspapers, and corporations. These upheavals were driven by the impression that African Americans are at serious risk of being killed by the police. Yet as with terrorism and school shootings, the numbers are surprising. A total of 65 unarmed Americans of all races are killed by the police in an average year, of which 23 are African American, which is around three tenths of one percent of the 7,500 African American homicide victims.26

It would be psychologically obtuse to explain the outsize reaction to publicized killings solely by availability-inflated fear. As with many signs of apparent irrationality, there are other logics at work, in the service of goals other than accurate probabilities.

Our disproportionate reaction to murder most foul may be irrational in the framework of probability theory but rational in the framework of game theory (chapter 8). Homicide is not like other lethal hazards. A hurricane or shark doesn’t care how we will respond to the harm they have in store for us, but a human killer might. So when people react to a killing with public shock and anger, and redouble their commitment to self-defense, justice, or revenge, it sends a signal to the premeditating killers out there, possibly giving them second thoughts.

Game theory may also explain the frenzy set off by a special kind of event that Thomas Schelling described in 1960, which may be called a communal outrage.27 A communal outrage is a flagrant, widely witnessed attack upon a member or symbol of a collective. It is felt to be an intolerable affront and incites the collective to rise up and righteously avenge it. Examples include the explosion of the USS Maine in 1898, leading to the Spanish-American War; the sinking of the RMS Lusitania in 1915, tipping the United States toward entering World War I; the Reichstag fire of 1933, enabling the establishment of the Nazi regime; Pearl Harbor in 1941, sending America into World War II; 9/11, which licensed the invasions of Afghanistan and Iraq; and the harassment of a produce peddler in Tunisia in 2010, whose self-immolation set off the Tunisian Revolution and Arab Spring. The logic of these reactions is common knowledge in the technical sense of something that everyone knows that everyone knows that everyone knows.28 Common knowledge is necessary for coordination, in which several parties act in the expectation that each of the others will too. Common knowledge can be generated by focal points, public happenings which people see other people seeing. A public outrage can be the common knowledge that solves the problem of getting everyone to act in concert when a vexation has built up gradually and the right moment to deal with it never seems to arrive. The unignorable atrocity can trigger simultaneous indignation in a dispersed constituency and forge them into a resolute collective. The amount of harm inflicted by the attack is beside the point.

Not just beside the point but taboo. A communal outrage inspires what the psychologist Roy Baumeister calls a victim narrative: a moralized allegory in which a harmful act is sanctified, the damage consecrated as irreparable and unforgivable.29 The goal of the narrative is not accuracy but solidarity. Picking nits about what actually happened is seen as not just irrelevant but sacrilegious or treasonous.30

At best, a public outrage can mobilize overdue action against a long-simmering trouble, as is happening in the grappling with systemic racism in response to the Floyd killing. Thoughtful leadership can channel an outrage into responsible reform, captured in the politician’s saying “Never let a crisis go to waste.”31 But the history of public outrages suggests they can also empower demagogues and egg impassioned mobs into quagmires and disasters. Overall, I suspect that more good comes from cooler heads assessing harms accurately and responding to them proportionately.32


Outrages cannot become public without media coverage. It was in the aftermath of the Maine explosion that the term “yellow journalism” came into common usage. Even when journalists don’t whip readers into a jingoistic lather, intemperate public reactions are a built-in hazard. I believe journalists have not given enough thought to the way that media coverage can activate our cognitive biases and distort our understanding. Cynics might respond that the journalists couldn’t care less, since the only thing that matters to them is clicks and eyeballs. But in my experience most journalists are idealists, who feel they answer to the higher calling of informing the public.

The press is an availability machine. It serves up anecdotes which feed our impression of what’s common in a way that is guaranteed to mislead. Since news is what happens, not what doesn’t happen, the denominator in the fraction corresponding to the true probability of an event—all the opportunities for the event to occur, including those in which it doesn’t—is invisible, leaving us in the dark about how prevalent something really is.

The distortions, moreover, are not haphazard, but misdirect us toward the morbid. Things that happen suddenly are usually bad—a war, a shooting, a famine, a financial collapse—but good things may consist of nothing happening, like a boring country at peace or a forgettable region that is healthy and well fed. And when progress takes place, it isn’t built in a day; it creeps up a few percentage points a year, transforming the world by stealth. As the economist Max Roser points out, news sites could have run the headline 137,000 People Escaped Extreme Poverty Yesterday every day for the past twenty-five years.33 But they never ran the headline, because there was never a Thursday in October in which it suddenly happened. So one of the greatest developments in human history—a billion and a quarter people escaping from squalor—has gone unnoticed.

The ignorance is measurable. Pollsters repeatedly find that while people tend to be too optimistic about their own lives, they are too pessimistic about their societies. For instance, in most years between 1992 and 2015, an era that criminologists call the Great American Crime Decline, a majority of Americans believed that crime was rising.34 In their “Ignorance Project,” Hans and Ola Rosling and Anna Rosling-Rönnlund have shown that the understanding of global trends in most educated people is exactly backwards: they think that longevity, literacy, and extreme poverty are worsening, whereas all have dramatically improved.35 (The Covid-19 pandemic set these trends back in 2020, almost certainly temporarily.)

Availability-driven ignorance can be corrosive. A looping mental newsreel of catastrophes and failures can breed cynicism about the ability of science, liberal democracy, and institutions of global cooperation to improve the human condition. The result can be a paralyzing fatalism or a reckless radicalism: a call to smash the machine, drain the swamp, or empower a demagogue who promises “I alone can fix it.”36 Calamity-peddling journalism also sets up perverse incentives for terrorists and rampage shooters, who can game the system and win instant notoriety.37 And a special place in Journalist Hell is reserved for the scribes who in 2021, during the rollout of Covid vaccines known to have a 95 percent efficacy rate, wrote stories on the vaccinated people who came down with the disease—by definition not news (since it was always certain there would be some) and guaranteed to scare thousands from this lifesaving treatment.

How can we recognize the genuine dangers in the world while calibrating our understanding to reality? Consumers of news should be aware of its built-in bias and adjust their information diet to include sources that present the bigger statistical picture: less Facebook News Feed, more Our World in Data.38 Journalists should put lurid events in context. A killing or plane crash or shark attack should be accompanied by the annual rate, which takes into account the denominator of the probability, not just the numerator. A setback or spate of misfortunes should be put into the context of the longer-term trend. News sources might include a dashboard of national and global indicators—the homicide rate, CO2 emissions, war deaths, democracies, hate crimes, violence against women, poverty, and so on—so readers can see the trends for themselves and get a sense of which policies move the needle in the right direction. Though editors have told me that readers hate math and will never put up with numbers spoiling their stories and pictures, their own media belie this condescension. People avidly consume data in the weather, business, and sports pages, so why not the news?

Conjunctive, Disjunctive, and Conditional Probabilities

A TV meteorologist announces there is a 50 percent chance of rain on Saturday and a 50 percent chance of rain on Sunday and concludes there is a 100 percent chance of rain over the weekend.39 In an old joke, a man carries a bomb onto a plane for his own safety because, he figures, what are the chances that a plane will have two bombs on it? And then there’s the argument that the pope is almost certainly a space alien. The probability that a randomly selected person on earth is the pope is tiny: one out of 7.8 billion, or .00000000013. Francis is the pope. Therefore, Francis is probably not a human being.40

In reasoning about probability, it’s easy to go off the rails. These clangers come from misapplying the next step in understanding probability: how to calculate the probabilities of a conjunction, a disjunction, a complement, and a conditional. If these terms sound familiar, it’s because they are the probabilistic equivalents of and, or, not, and if-then from the previous chapter. Though the formulas are simple, each lays a trap, and falling into them is what gives rise to the probability gaffes.41

The probability of a conjunction of two independent events, prob(A and B), is the product of the probabilities of each: prob(A) × prob(B). If the Greens have two children, what is the probability that both are girls? It’s the probability that the first one is a girl, .5, times the probability that the second is a girl, also .5, or .25. Translating from single-event to frequentist language, we will find that in all the two-children families we look at, a quarter will be all-girl. More intuitive still, the classical definition of probability advises us to lay out the logical possibilities: Boy-Boy, Boy-Girl, Girl-Boy, Girl-Girl. One of these four is all-girl.
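The multiplication rule and the classical enumeration give the same answer, which a few lines of Python confirm:

```python
from itertools import product

families = list(product(["Boy", "Girl"], repeat=2))   # BB, BG, GB, GG
all_girl = [f for f in families if f == ("Girl", "Girl")]
print(len(all_girl) / len(families))                  # 0.25
print(0.5 * 0.5)                                      # the product rule agrees
```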

The trap in the conjunction formula lies in the proviso independent. Events are independent when they are disconnected: the chance of seeing one has no bearing on the chance of seeing the other. Imagine a society, perhaps not far off, in which people can choose the sex of their children. Imagine for the sake of the example that parents are gender chauvinists, half wanting only boys and the other half only girls. If the first child is a girl, it tips us off that the parents preferred a girl, which means they would opt for a girl again, and vice versa if the first child is a boy. The events are not independent, and multiplication fails. If the preferences were absolute and the technology perfect, every family would have only sons or only daughters, and the probability that a two-child family is all-girl would be .5, not .25.

Failing to think about whether events are independent can lead to big boo-boos. When a streak of rare occurrences pops up in entities that are not quarantined from one another—the occupants of a building, who give each other colds, or the members of a peer group, who copy each other’s fashions, or the survey answers from a single respondent, who retains his biases from question to question, or the measurements of anything on successive days or months or years, which may show inertia—then the set of observations is in effect a single event, not a freakish run of events, and their probabilities may not be multiplied. For example, if the crime rate was below average in each of the twelve months after Neighborhood Watch signs were posted in a city, it would be a mistake to conclude that the run must be due to the signage rather than chance. Crime rates change slowly, with the patterns in one month carrying over to the next, so the outcome is closer to a single coin flip than a run of twelve coin flips.

In the legal arena, misapplying the formula for a conjunction is not just a math error but a miscarriage of justice. A notorious example is the bogus “Meadow’s Law,” named after a British pediatrician who declared that when crib deaths within a family are examined, “one is a tragedy, two is suspicious and three is murder unless there is proof to the contrary.” In the 1999 case of the attorney Sally Clark, who had lost two infant sons, the doctor testified that since the probability of a crib death in an affluent nonsmoking family is 1 in 8,500, the probability of two crib deaths is the square of that number, 1 in 72 million. Clark was sentenced to life imprisonment for murder. Appalled statisticians pointed out the mistake: crib deaths within a family are not independent, because siblings may share a genetic predisposition, the home may have elevated risk factors, and the parents may have reacted to the first tragedy by taking misguided precautions that increased the chance of a second. Clark was released after a second appeal (on different grounds), and in the following years hundreds of cases based on similar errors had to be reviewed.42
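A sketch of why the independence assumption mattered so much. The 1-in-100 conditional below is purely hypothetical, for illustration; the point is that any shared risk factor moves the conjunction by orders of magnitude (using the rule for dependent events, prob(A) × prob(B | A), introduced formally later in this section):

```python
p_first = 1 / 8_500                 # figure cited at the trial

# Meadow's fallacious assumption: the two deaths are independent
naive = p_first ** 2                # 1 in 72,250,000

# Hypothetical shared risk: chance of a second death *given* a first
p_second_given_first = 1 / 100      # assumed for illustration only
dependent = p_first * p_second_given_first

print(f"independent: 1 in {round(1 / naive):,}")      # 1 in 72,250,000
print(f"dependent:   1 in {round(1 / dependent):,}")  # 1 in 850,000
```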

Another howler in calculating conjunctions had a cameo in the bizarre attempt by Donald Trump and his supporters to overturn the results of the 2020 presidential election based on baseless claims of voter fraud. In a motion filed with the US Supreme Court, the Texas attorney general Ken Paxton wrote: “The probability of former Vice President Biden winning the popular vote in the four Defendant States—Georgia, Michigan, Pennsylvania, and Wisconsin—independently given President Trump’s early lead in those States as of 3 a.m. on November 4, 2020, is less than one in a quadrillion, or 1 in 1,000,000,000,000,000. For former Vice President Biden to win these four States collectively, the odds of that event happening decrease to less than one in a quadrillion to the fourth power.” Paxton’s jaw-dropping math assumed that the votes being tallied over the course of the counting were statistically independent, like repeated rolls of a die. But urbanites tend to vote differently from suburbanites, who in turn vote differently from country folk, and in-person voters differ from those who mail in their ballots (particularly in 2020, when Trump discouraged his supporters from voting by mail). Within each sector, the votes are not independent, and the base rates differ from sector to sector. Since the results from each precinct are announced as they become available, and the mail-in ballots are counted later still, then as the different tranches are added up, the running tally favoring each candidate can rise or fall, and the final result cannot be extrapolated from the interim ones. The flapdoodle was raised to the fourth power when Paxton multiplied the bogus probabilities from the four states, whose votes are not independent either: whatever sways voters in the Great Lake State is also likely to sway them in America’s Dairyland.43


Statistical independence is tied to the concept of causation: if one event affects another, they are not statistically independent (though, as we shall see, the converse isn’t true: events that are causally isolated may be statistically dependent). That is why the gambler’s fallacy is a fallacy. One spin of a roulette wheel cannot impinge on the next, so the high roller who expects a run of blacks to set up a red will lose his shirt: the probability is always a bit less than .5 (because of the green slots with 0 and 00). This shows that fallacies of statistical independence can go both ways: falsely assuming independence (as in Meadow’s fallacy) and falsely assuming dependence (as in the gambler’s fallacy).

Whether events are independent is not always obvious. Among the most famous applications of research on cognitive biases to everyday life was Tversky’s analysis (with the social psychologist Tom Gilovich) of the “hot hand” in basketball.44 Every hoops fan knows that from time to time a player can be “on fire,” “in the zone,” or “unconscious,” especially “streak shooters” like Vinnie “The Microwave” Johnson, the 1980s Detroit Pistons guard who earned his sobriquet because he heats up in a hurry. In the teeth of incredulity from every fan, coach, player, and sportswriter, Tversky and Gilovich claimed that the hot hand was an illusion, a gambler’s fallacy in reverse. The data they analyzed suggested that the outcome of every attempt is statistically independent of the preceding run of attempts.

Now, before looking at the data, one cannot blow off the possibility of a hot hand on the grounds of causal plausibility in the way one can blow off the gambler’s fallacy. Unlike a roulette wheel, a player’s body and brain do have a memory, and it is far from superstitious to think that a spurt of energy or confidence might persist over a span of minutes. So it was not a breach of the scientific worldview when other statisticians took a second look at the data and concluded that the boffins were wrong and the jocks were right: there is a hot hand in basketball. The economists Joshua Miller and Adam Sanjurjo showed that when you select streaks of hits or misses from a long run of data, the outcome of the very next attempt is not statistically independent of that streak. The reason is that if the attempt had happened to be successful and continue the streak, it might have been counted as part of that streak in the first place. Any attempt which is singled out because it took place following a streak is biased to be an unsuccessful attempt: one that had no chance of being defined as part of the streak itself. That throws off the calculations of what one should expect by chance, which in turn throws off the conclusion that basketball players are no streakier than roulette wheels.45
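The selection bias is easy to demonstrate by simulation. Here is a sketch with a memoryless "player" (a fair coin): pick out every flip that follows a head and ask how often it is itself a head. Naively the answer should be .5; because of the selection effect Miller and Sanjurjo identified, the within-sequence average comes out noticeably lower.

```python
import random

random.seed(1)

def heads_after_heads(flips):
    """Within one sequence, the fraction of flips that follow a head
    and are themselves heads (None if no head has a successor)."""
    after = [flips[i + 1] for i in range(len(flips) - 1) if flips[i] == 1]
    return sum(after) / len(after) if after else None

estimates = []
for _ in range(100_000):
    seq = [random.randint(0, 1) for _ in range(10)]   # ten fair flips
    est = heads_after_heads(seq)
    if est is not None:
        estimates.append(est)

# Averages to roughly .44-.45 for length-10 sequences, not .5,
# even though the coin has no memory at all.
print(sum(estimates) / len(estimates))
```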

The hot hand fallacy fallacy has three lessons. First, events can be statistically dependent not only when one event causally impinges on the other but when it affects which event is selected for comparison. Second, the gambler’s fallacy may arise from a not-so-irrational feature of perception: when we look for streaks in a long run of events, a streak of a given length really is likelier to be reversed than to continue. Third, probability can be truly, deeply unintuitive: even the mavens can mess up the math.


Let’s turn to the probability of a disjunction of events, prob(A or B). It is the probability of A plus the probability of B minus the probability of both A and B. If the Browns have two children, the probability that at least one is a girl—that is, that the first is a girl or the second is a girl—is .5 + .5 – .25, or .75. You can arrive at the same result by counting the combinations: Boy-Girl + Girl-Girl + Girl-Boy (three possibilities) out of Boy-Girl + Boy-Boy + Girl-Boy + Girl-Girl (four opportunities). Or by tallying frequencies: in a large set of families with two children, you’ll find that three fourths have at least one daughter.

The arithmetic of or shows us what went wrong with the weathercaster who said it was certain to rain over the weekend because there was a 50 percent chance of rain on each day: by simply adding the two probabilities, he inadvertently double-counted the weekends on which it would rain on both days, neglecting to subtract .25 for the conjunction. He applied a rule that works for exclusive-or (xor), namely A or B but not both. The probabilities of mutually exclusive events can be added to get the disjunction, and the sum of all of them is 1, certainty. The probability that a child is either a boy (.5) or a girl (.5) is their sum, 1, since the child must be either one or the other (this being an example to explain the math, I’ve adopted a gender binary and not considered intersex children). If you forget the difference and confuse overlapping with mutually exclusive events, you can get crazy results. Imagine the weathercaster predicting a .5 chance of rain on Saturday, on Sunday, and on Monday, and concluding that the chance of rain over the long weekend was 1.5.
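A sketch of the weathercaster's error and its correction, assuming the two days are independent:

```python
p_sat = p_sun = 0.5

naive = p_sat + p_sun                      # 1.0: double-counts rain on both days
correct = p_sat + p_sun - p_sat * p_sun    # 0.75: subtract the conjunction
print(naive, correct)

# With three .5 days, naive addition overshoots certainty entirely:
print(0.5 + 0.5 + 0.5)                     # 1.5, not a legal probability
```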

The probability of the complement of an event, namely A not happening, is 1 minus the probability of it happening. This comes in handy when we have to estimate the probability of “at least one” event. Remember the Browns with their daughter, or perhaps two? Since having at least one daughter is the same as not having all sons, then instead of calculating the disjunction (first child is a girl or second child is a girl), we could have calculated the complement of a conjunction: 1 minus the chance of having all boys (which is .25), namely .75. In the case of two events it doesn’t make much difference which formula we use. But when we have to calculate the probability of at least one A in a large set, the disjunction rule requires the tedium of adding and subtracting a lot of combinations. It’s easier to calculate it as the probability of “not all not-A,” which is simply 1 minus a big product.

Suppose, for example, that every year there’s a 10 percent chance of a war breaking out. What are the chances of at least one war breaking out over a decade? (Let’s assume that wars are independent, not contagious, which seems to be true.)46 Instead of adding up the chance that a war will break out in Year 1 plus the chance that it will break out in Year 2 minus the chance that one will break out in both Year 1 and Year 2, and so on for all combinations, we can simply calculate the chance that no war will break out over all of the years and subtract it from 1. That is simply the chance that war will not break out in a given year, .9, multiplied by itself for each of the other years (.9 × .9 × . . . × .9, or .9¹⁰, which equals .35), which when subtracted from 1 yields .65.
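The same arithmetic in a couple of lines, using the stated assumptions (a 10 percent annual chance, independent years):

```python
p_war = 0.10
p_no_war_decade = (1 - p_war) ** 10      # no war in any of the ten years
print(round(p_no_war_decade, 2))         # 0.35
print(round(1 - p_no_war_decade, 2))     # 0.65: at least one war
```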


Finally we get to a conditional probability: the probability of A given B, written as prob(A | B). A conditional probability is conceptually simple: it’s just the probability of the then in an if-then. It’s also arithmetically simple: it’s just the probability of A and B divided by the probability of B. Nonetheless, it’s the source of endless confusions, blunders, and paradoxes in reasoning about probability, starting with the hapless fellow in the XKCD cartoon on the following page.47 His error lies in confusing the simple probability or base rate of lightning deaths, prob(struck-by-lightning), with the conditional probability of a lightning death given that one is outside during an electrical storm, prob(struck-by-lightning | outside-in-storm).

xkcd.com

Though the arithmetic of a conditional probability is simple, it’s unintuitive until we make it concrete and visualizable (as always). Look at the Venn diagrams on the page after the next one, where the size of a region on the page corresponds to the number of outcomes. The rectangle, with an area of 1, embraces all the possibilities. A circle encloses all the As, and the top left figure shows that the probability of A corresponds to its area (dark) as a proportion of the whole rectangle (pale)—another way of saying the number of occurrences divided by the number of opportunities. The top right figure shows the probability of A or B, which is the total dark area, namely the area of A plus the area of B without double-counting the wedge in the middle they share, that is, the probability of A and B. That wedge, prob(A and B), is shown in the lower left diagram.

The bottom right diagram explains the deal with conditional probabilities. It indicates that we should ignore the vast space of everything that can possibly happen, bleached into white, and focus our attention only on the incidents in which B happens, the shaded circle. Now we scrutinize how many of those incidents are ones in which A also happens: the size of the A and B wedge as a proportion of the size of the B circle. Of all the interludes in which people walk in an electrical storm (B), what proportion of them result in a lightning strike (A and B)? That’s why we calculate the conditional, prob(A | B), by dividing the conjunction, prob(A and B), by the base rate, prob(B).

Here’s an example. The Grays have two children. The elder is a girl. Knowing this, what is the probability that both are girls? Let’s translate the question into a conditional probability, namely the probability that the first is a girl and the second is a girl given that the first is a girl, or, in fancy notation, prob(1st = Girl and 2nd = Girl | 1st = Girl). The formula tells us to divide the conjunction, which we already calculated to be .25, by the simple probability that the first child is a girl, .5, and we get .5. Or, thinking classically and concretely: Girl-Girl (one possibility) divided by Girl-Girl and Girl-Boy (two opportunities) equals one half.
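Thinking classically in code: condition by throwing away the outcomes in which the given event doesn't happen, then count what's left.

```python
from itertools import product

outcomes = list(product("BG", repeat=2))               # BB, BG, GB, GG
given = [o for o in outcomes if o[0] == "G"]           # condition: elder is a girl
both_girls = [o for o in given if o == ("G", "G")]
print(len(both_girls) / len(given))                    # 0.5
```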

Conditional probabilities add some precision to the concept of statistical independence, which I left hanging in the preceding subsection. The concept may now be defined: A and B are independent if, for all Bs, the probability of A given B is the same as the overall probability of A (and so on for B). Now, remember the illegal multiplication of probabilities for the conjunction of events when they are not independent? What are we supposed to do instead? Easy: the probability of the conjunction of A and B when they are not independent is the probability of A times the probability of B given A, to wit, prob(A) × prob(B | A).

Why am I belaboring the concept of conditional probability with all those synonymous representations—English prose, its logical equivalent, the mathematical formula, Venn diagrams, counting up the possibilities? It’s because conditional probability is such a source of confusion that you can’t have too many explanations.48

If you don’t believe me, consider the Whites, yet another two-child family. At least one of them is a girl. What is the probability that both are girls, namely, the conditional probability of two girls given at least one girl, or prob(1st = Girl and 2nd = Girl | 1st = Girl or 2nd = Girl)? So few people get the answer right that statisticians call it the Boy or Girl paradox. People tend to say .5; the correct answer is .33. In this case concrete thinking can lead to the wrong answer: people visualize an elder girl, realize she could have either a younger sister or a younger brother, and figure that the sister is one possibility out of those two. They forget that there’s another way of having at least one girl: she could be the younger of the two. Enumerating the possibilities properly, we get Girl-Girl (one) divided by [Girl-Girl plus Girl-Boy plus Boy-Girl] (three), which equals one third. Or, using the formula, we divide .25 (Girl and Girl) by .75 (Girl or Girl).
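Enumerating the possibilities in code makes it hard to fool yourself; the only change from the previous sketch is the conditioning event.

```python
from itertools import product

outcomes = list(product("BG", repeat=2))            # BB, BG, GB, GG
given = [o for o in outcomes if "G" in o]           # at least one girl: BG, GB, GG
both_girls = [o for o in given if o == ("G", "G")]
print(len(both_girls) / len(given))                 # 0.333..., not 0.5
```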

The Boy or Girl paradox is not just a trick with wording. It comes from a failure of the imagination to enumerate the possibilities, and appears in many guises, including the Monty Hall dilemma. Here’s a simpler yet exact equivalent.49 Some sidewalk card sharks make a living from roping passersby into playing Three Cards in a Hat. The shark shows them a card that is red on both sides, a card that is white on both sides, and a card that is red on one side and white on the other. He mixes them in a hat, draws one, notes that the face is (say) red, and offers the passerby even money that the other side is also red (they pay him a dollar if it’s red, he pays them a dollar if it’s white). It’s a sucker’s bet: the odds that it’s red are two in three. The rubes mentally count cards instead of sides of cards, forgetting that there are two ways that the all-red card, had it been chosen, could have appeared red side up.
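A simulation confirms the shark's edge; letting a random side land face up counts sides rather than cards, which is exactly the enumeration the rubes skip:

```python
import random

random.seed(0)
cards = [("red", "red"), ("white", "white"), ("red", "white")]

shown_red = other_side_red = 0
for _ in range(100_000):
    card = random.choice(cards)
    side = random.randrange(2)           # either face can land up
    up, down = card[side], card[1 - side]
    if up == "red":
        shown_red += 1
        if down == "red":
            other_side_red += 1

print(other_side_red / shown_red)        # about 0.667, two in three
```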

And remember the guy who brought his own bomb onto the plane? He calculated the overall probability that a plane would have two bombs on it. But by bringing his own bomb aboard, he had already ruled out most of the possibilities in the denominator. The number he should care about is the conditional probability that a plane will have two bombs given that it already has a bomb, namely his own (which has a probability of 1). That conditional is the probability that someone else will have a bomb times 1 (the conjunction of his bomb and the other guy’s) divided by 1 (his bomb), which works out, of course, to the probability that someone else will have a bomb, just where he started. The joke was used to good effect in The World According to Garp. The Garps are looking over a house when a small plane crashes into it. Garp says, “We’ll take the house. The chances of another plane hitting this house are astronomical.”50

Forgetting to condition a base-rate probability by special circumstances in place—the lightning storm, the bomb you bring aboard—is a common probability blunder. During the 1995 trial of O. J. Simpson, the football star accused of murdering his ex-wife, Nicole, a prosecutor called attention to his history of battering her. A member of Simpson’s “Dream Team” of defense attorneys replied that very few batterers go on to kill their wives, perhaps one in 2,500. An English professor, Elaine Scarry, spotted the fallacy. Nicole Simpson was not just any old victim of battering. She was a victim of battering who had her throat cut. The relevant statistic is the conditional probability that someone killed his wife given that he had battered his wife and that his wife was murdered by someone. That probability is eight out of nine.51
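In rough numbers of the kind used in the published reanalyses (treat the exact figures here as illustrative assumptions): out of 100,000 battered women, perhaps 40 in a given year are murdered by their batterer and 5 by someone else. Conditioning on the right event is then a one-liner:

```python
killed_by_batterer = 40   # per 100,000 battered women per year (assumed)
killed_by_other = 5       # murdered by someone else (assumed)

# prob(killer was the batterer | battered AND murdered):
murdered = killed_by_batterer + killed_by_other
print(killed_by_batterer / murdered)   # about 8 in 9
```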


The other common error with conditional probability is confusing the probability of A given B with the probability of B given A, the statistical equivalent of affirming the consequent (going from If P then Q to If Q then P).52 Remember Irwin the hypochondriac, who knew he had liver disease because his symptoms matched the list perfectly, namely no discomfort? Irwin confused the probability of no symptoms given liver disease, which is high, with the probability of liver disease given no symptoms, which is low. That is because the probability of liver disease (its base rate) is low, and the probability of no discomfort is high.

Conditional probabilities cannot be flipped whenever base rates differ. Take a real-life example, the finding that a third of fatal accidents occur in the home, which inspired the headline Private Homes Are Dangerous Spots. The problem is that the home is where we spend most of our time, so even if homes are not particularly dangerous, a lot of accidents happen to us there because a lot of everything happens to us there. The headline writer confused the probability that we were at home given that a fatal accident has occurred—the statistic being reported—with the probability that a fatal accident occurs given that we are at home, which is the propensity that readers are interested in. We can grasp the problem more intuitively by looking at the diagram below, where the base rates are reflected in the relative sizes of the circles (say, with A as days with fatal accidents, B with days at home).

The left diagram shows the probability of A given B (the probability of a fatal accident given that one is at home); it is the area of the dark wedge (A and B) as a proportion of the big pale circle (B, being at home), which is small. The right diagram shows the probability of B given A (of being at home given there was a fatal accident); it is the area of that same dark wedge but this time as a proportion of the small pale circle, fatal accidents, and is much bigger.
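The asymmetry is easy to see with made-up numbers. Suppose (purely for illustration) that you spend two thirds of your days at home, that a fatal accident strikes on one day in 100,000, and that a third of those accidents happen at home:

```python
p_accident = 1 / 100_000              # assumed base rate of fatal-accident days
p_home = 2 / 3                        # assumed share of days spent at home
p_home_and_accident = p_accident / 3  # a third of accidents happen at home

# the headline's statistic: prob(at home | fatal accident)
print(p_home_and_accident / p_accident)   # 0.33
# the risk readers care about: prob(fatal accident | at home)
print(p_home_and_accident / p_home)       # 0.000005
```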

One reason that conditional probabilities are so easy to get backwards is that the English language is ambiguous as to which is intended. “The probability of an accident taking place in the home is .33” could mean “as a proportion of accidents” or “as a proportion of time spent at home.” The difference can get lost in translation and spawn bogus estimates of propensities. A majority of bicycle accidents involve boys, so we get the headline Boys More at Risk on Bicycles, implying that boys are more reckless, whereas in fact they may just be more avid bicycle riders. And in what statisticians call the prosecutor’s fallacy, the DA announces that the likelihood of the victim’s blood type matching that on the defendant’s clothing by chance is just 3 percent, and concludes that the probability that the defendant is guilty is 97 percent. He has confused (and hopes the jurors will confuse) the probability of a match given the defendant’s innocence with the probability of the defendant’s innocence given a match.53 How to do the arithmetic properly is the topic of the next chapter, Bayesian reasoning.

Ambiguities in conditional probability can be incendiary. In 2019 a pair of social scientists created a furor when they published a study in the prestigious Proceedings of the National Academy of Sciences which, citing numbers like the ones I mentioned in an earlier section, claimed that police were likelier to shoot whites than blacks, contrary to the common assumption of racial bias. Critics pointed out that this conclusion pertained to the probability that someone is black given that they were shot, which is indeed lower than the corresponding probability for whites, but only because the country has fewer blacks than whites in the first place, a difference in the base rates. If the police are racially biased, that would be a propensity manifesting itself as a higher probability that someone is shot given that they are black, and the data suggest that the probability is indeed higher. Though the original authors noted that the suitable base rate is not obvious—should it be the proportion of blacks in the population, or in encounters with the police?—they realized they had made such a mess in how they had stated the probabilities that they formally retracted the paper.54

And the pope from outer space? That’s what you get when you confuse the probability of being the pope given that someone is human with the probability that someone is human given that he is the pope.55

Prior and Post Hoc Probabilities

A man tries on a custom suit and says to the tailor, “I need this sleeve taken in.” The tailor says, “No, just bend your elbow like this. See, it pulls up the sleeve.” The customer says, “Well, OK, but when I bend my elbow, the collar goes up the back of my neck.” The tailor says, “So? Raise your head up and back. Perfect.” The man says, “But now the left shoulder is three inches lower than the right one!” The tailor says, “No problem. Bend at the waist and it evens out.” The man leaves the store wearing the suit, his right elbow sticking out, his head craned back, his torso bent to the left, walking with a herky-jerky gait. A pair of pedestrians pass him by. The first says, “Did you see that poor disabled guy? My heart aches for him.” The second says, “Yeah, but his tailor is a genius—the suit fits him perfectly!”

The joke illustrates yet another family of probability blunders: confusing prior with post hoc judgments (also called a priori and a posteriori). The confusion is sometimes called the Texas sharpshooter fallacy, after the marksman who fires a bullet into the side of a barn and then paints a bull’s-eye around the hole. In the case of probability, it makes a big difference whether the denominator of the fraction—the number of opportunities for an event to occur—is counted independently of the numerator, the events of interest. Confirmation bias, discussed in chapter 1, sets up the error: once we expect a pattern, we seek out examples and ignore the counterexamples. If you take note of the predictions by a psychic that are borne out by events, but don’t divide by the total number of predictions, correct and incorrect, you can get any probability you want. As Francis Bacon noted in 1620, such is the way of all superstitions, whether in astrology, dreams, omens, or divine judgments.

Or financial markets. An unscrupulous investment advisor with a 100,000-person mailing list sends a newsletter to half of the list predicting that the market will rise and a version to the other half predicting that it will fall. At the end of every quarter he discards the names of the people to whom he sent the wrong prediction and repeats the process with the remainder. After a year and a half he signs up the 1,562 recipients who are amazed at his track record of predicting the market six quarters in a row.56
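
The scam is nothing but repeated halving, as a few lines of Python make plain:

```python
# Each quarter, half the remaining list gets a correct prediction by
# construction; the scammer keeps only those names and mails again.
recipients = 100_000
for quarter in range(1, 7):
    recipients //= 2
    print(f"after quarter {quarter}: {recipients:,} perfect track records")
# after quarter 6: 1,562 recipients have seen six correct calls in a row
```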

Though this scam is illegal if carried out knowingly, when it’s carried out naïvely it’s the lifeblood of the finance industry. Traders are lightning-quick at snapping up bargains, so very few stock-pickers can outperform a mindless basket of securities. One exception was Bill Miller, anointed by CNN Money in 2006 as “The Greatest Money Manager of Our Time” for beating the S&P 500 stock market index fifteen years in a row. How impressive is that? One might think that if a manager is equally likely to outperform or underperform the index in any year, the odds of a fifteen-year streak happening by chance are just 1 in 32,768 (2¹⁵). But Miller was singled out after his amazing streak had unfolded. As the physicist Leonard Mlodinow pointed out in The Drunkard’s Walk: How Randomness Rules Our Lives, the country has more than six thousand fund managers, and modern mutual funds have been around for about forty years. A fifteen-year winning streak by some manager, sometime in those forty years, is not at all unlikely; the chance is 3 in 4. The CNN Money headline could have read Expected 15-Year Run Finally Occurs: Bill Miller Is the Lucky One. Sure enough, Miller’s luck ran out, and in the following two years the market “handily pulverized him.”57
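
Mlodinow’s reasoning can be checked with a Monte Carlo sketch. The parameters below, a thousand coin-flipping managers watched over a forty-year window, are simplifying assumptions rather than his exact model; the point is only that the probability of some streak lands nowhere near 1 in 32,768.

```python
import random

random.seed(0)

def has_streak(years: int, length: int) -> bool:
    """True if a manager who beats the index with probability 1/2 each
    year produces at least one winning streak of the given length."""
    run = 0
    for _ in range(years):
        run = run + 1 if random.random() < 0.5 else 0
        if run >= length:
            return True
    return False

# Chance that at least one of 1,000 coin-flipping managers posts a
# 15-year streak somewhere in a 40-year window (assumed parameters).
trials = 500
hits = sum(any(has_streak(40, 15) for _ in range(1_000)) for _ in range(trials))
print(hits / trials)  # a substantial fraction -- nowhere near 1 in 32,768
```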

On top of confirmation bias, a major contributor to post hoc probability fallacies is our failure to appreciate how many opportunities there are for coincidences to occur. When we are allowed to identify them post hoc, coincidences are not unlikely at all; they’re pretty much guaranteed to happen. In one of his Scientific American columns, the recreational mathematician Martin Gardner asked, “Would you notice it if the license plate of a car just ahead of you bore digits that, read backward, gave your telephone number? Who except a numerologist or logophile would see the letters U, S, A symmetrically placed in LOUISIANA or at the end of JOHN PHILIP SOUSA, the name of the composer of our greatest patriotic marches? It takes an odd sort of mind to discover that Newton was born the same year that Galileo died, or that Bobby Fischer was born under the sign of Pisces (the Fish).”58 But these numerologists and odd sorts of minds exist, and their post hoc sharpshooting can be spun into highfalutin theories. The psychoanalyst Carl Jung proposed a mystical force called synchronicity to explain the quintessential thing that needs no explanation, the prevalence of coincidence in the world.

When I was a child, what we now call memes were circulated in comic books and popular magazines. One that made the rounds was a list of the incredible similarities between Abraham Lincoln and John F. Kennedy. Honest Abe and JFK were both elected to Congress in ’46 and to the presidency in ’60. Both were shot in the head in the presence of their wives on a Friday. Lincoln had a secretary named Kennedy; Kennedy had a secretary named Lincoln. Both were succeeded by Johnsons who were born in ’08. Their assassins were both born in ’39 and had three names which add up to fifteen letters. John Wilkes Booth ran from a theater and was caught in a warehouse; Lee Harvey Oswald ran from a warehouse and was caught in a theater. What do these eerie parallels tell us? With all due respect to Dr. Jung, absolutely nothing, other than that coincidences happen more often than our statistically untutored minds appreciate. Not to mention the fact that when spooky coincidences are noticed, they tend to get embellished (Lincoln did not have a secretary named Kennedy), while pesky noncoincidences (like their different days, months, and years of birth and death) are ignored.

Scientists are not immune to the Texas sharpshooter fallacy. It’s one of the explanations for the replicability crisis that rocked epidemiology, social psychology, human genetics, and other fields in the 2010s.59 Think of all the foods that are good for you which used to be bad for you, the miracle drug that turns out to work no better than the placebo, the gene for this or that trait which was really noise in the DNA, the cute studies showing that people contribute more to the coffee fund when two eyespots are posted on the wall and that they walk more slowly to the elevator after completing an experiment that presented them with words associated with old age.

It’s not that the investigators faked their data. It’s that they engaged in what are now known as questionable research practices, the garden of forking paths, and p-hacking (referring to the probability threshold, p, that counts as “statistically significant”).60 Imagine a scientist who runs a laborious experiment and obtains data that are the opposite of “Eureka!” Before cutting his losses, he may be tempted to wonder whether the effect really is there, but only with the men, or only with the women, or if he throws out the freak data from the participants who zoned out, or if he excludes the crazy Trump years, or if he switches to a statistical test that looks at the ranking of the data rather than their values down to the last decimal place. Or he can continue to test participants until the precious asterisk appears in the statistical printout, being sure to quit while he’s ahead.

None of these practices is inherently unreasonable if it can be justified before the data are collected. But if they are tried after the fact, some combination is likely to capitalize on chance and cough up a spurious result. The trap is inherent to the nature of probability and has been known for decades; I recall being warned against “data snooping” when I took statistics in 1974. But until recently few scientists intuitively grasped how a smidgen of data snooping could lead to a boatload of error. My professor half-jokingly suggested that scientists be required to write down their hypotheses and methods on a piece of paper before doing an experiment and safekeep it in a lockbox they would open and show to reviewers after the study was done.61 The only problem, he noted, was that a scientist could secretly keep several lockboxes and then open the one he knew “predicted” the data. With the advent of the web, the problem has been solved, and the state of the art in scientific methodology is to “preregister” the details of a study in a public registry that reviewers and editors can check for post hoc hanky-panky.62
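
The inflation is easy to demonstrate. The sketch below simulates just one forking path, optional stopping: a study with no real effect, peeked at after every tenth participant and halted at the first p < .05. The parameters (batches of ten, a cap of a hundred participants) are arbitrary choices for illustration.

```python
import math
import random

random.seed(42)

def p_two_sided(z: float) -> float:
    """Two-sided p-value for a standard normal z score."""
    return math.erfc(abs(z) / math.sqrt(2))

def experiment_with_peeking(max_n: int = 100, batch: int = 10) -> bool:
    """One study with NO real effect (true mean 0, sd 1), tested after
    every batch of participants; returns True on a false positive."""
    data = []
    while len(data) < max_n:
        data += [random.gauss(0, 1) for _ in range(batch)]
        z = (sum(data) / len(data)) * math.sqrt(len(data))  # sd known to be 1
        if p_two_sided(z) < 0.05:
            return True  # stop and declare "significance"
    return False

trials = 2_000
false_positives = sum(experiment_with_peeking() for _ in range(trials))
print(false_positives / trials)  # roughly 0.2, four times the advertised .05
```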


One kind of post hoc probability illusion is so common that it has its own name: the cluster illusion.63 We’re good at spotting tightly packed collections of things or events, because they often are part of a single happening: a barking dog that won’t shut up, a weather system that drenches a city for several days, a burglar on a spree who robs several stores in a block. But not all clusters have a root cause—indeed, most of them don’t. When there are lots of events, it’s inevitable that some will wander into each other’s neighborhoods and rub shoulders, unless some nonrandom process tries to keep them apart.

The cluster illusion makes us think that random processes are nonrandom and vice versa. When Tversky and Kahneman showed people (including statisticians) the results of real strings of coin flips, like TTHHTHTTTT, which inevitably have runs of consecutive heads or tails, they thought the coin was rigged. They would say a coin looked fair only if it was rigged to prevent the runs, like HTHTTHTHHT, which “looks” random even though it isn’t.64 I witnessed a similar illusion when I worked in an auditory perception lab. The participants had to detect faint tones, which were presented at random times so they couldn’t guess when a tone would come. Some said the random-event generator must be broken because the tones came in bursts. They didn’t realize that that’s what randomness sounds like.
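
The intuition is easy to check by simulation. In ten flips of a fair coin, a run of four or more identical outcomes, like the four tails that close TTHHTHTTTT, shows up nearly half the time. A minimal sketch:

```python
import random

random.seed(7)

def longest_run(flips: str) -> int:
    """Length of the longest run of identical characters."""
    best = cur = 1
    for prev, nxt in zip(flips, flips[1:]):
        cur = cur + 1 if prev == nxt else 1
        best = max(best, cur)
    return best

trials = 100_000
hits = sum(
    longest_run("".join(random.choice("HT") for _ in range(10))) >= 4
    for _ in range(trials)
)
print(hits / trials)  # about 0.46: runs are what randomness looks like
```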

Phantom clusters arise in space as well. The stars making up the ram, lion, crab, virgin, archer, and other constellations are not neighbors in space but are randomly sprinkled across the night sky from our terrestrial vantage point, grouped into shapes only by our pattern-seeking brains. Spurious clusters also arise in the calendar. People are surprised to learn that if 23 people are in a room, the chances that two will share a birthday are better than even. With 57 in the room, the odds rise to 99 percent. Though it’s unlikely that anyone in the room will share my birthday, we’re not looking for matches with me, or with anyone else singled out a priori. We’re counting matches post hoc, and among 23 people there are 253 pairs in which a match might occur.
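
The birthday arithmetic is a short exercise in multiplying survival probabilities, as a minimal Python sketch confirms (assuming 365 equally likely birthdays and ignoring leap days):

```python
# Chance of at least one shared birthday among n people: one minus the
# chance that each successive person misses all the birthdays taken so far.
def p_shared_birthday(n: int) -> float:
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (365 - k) / 365
    return 1 - p_all_distinct

print(round(p_shared_birthday(23), 3))  # 0.507 -- better than even
print(round(p_shared_birthday(57), 3))  # 0.990 -- 99 percent
```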

The cluster illusion, like other post hoc fallacies in probability, is the source of many superstitions: that bad things happen in threes, people are born under a bad sign, or an annus horribilis means the world is falling apart. When a series of plagues is visited upon us, it does not mean there is a God who is punishing us for our sins or testing our faith. It means there is not a God who is spacing them apart.


Even for those who grasp the mathematics of chance in all its bedeviling unintuitiveness, a lucky streak can seize the imagination. The underlying odds will determine how long, on average, a streak is expected to last, but the exact moment that luck runs out is an unfathomable mystery. This tension was explored in my favorite essay by the paleontologist, science writer, and baseball fan Stephen Jay Gould.65

Gould discussed one of the greatest achievements in sports, Joe DiMaggio’s hitting streak of fifty-six games in 1941. He explained that the streak was statistically extraordinary even given DiMaggio’s high batting average and the number of opportunities for streaks to have occurred in the history of the sport. The fact that DiMaggio benefited from some lucky breaks along the way does not diminish the achievement but exemplifies it, because no long streak, however pushed along by favorable odds, can ever unfold without them. Gould explains our fascination with runs of luck:

The statistics of streaks and slumps, properly understood, do teach an important lesson about epistemology, and life in general. The history of a species, or any natural phenomenon that requires unbroken continuity in a world of trouble, works like a batting streak. All are games of a gambler playing with a limited stake against a house with infinite resources. The gambler must eventually go bust. His aim can only be to stick around as long as possible, to have some fun while he’s at it, and, if he happens to be a moral agent as well, to worry about staying the course with honor. . . .

DiMaggio’s hitting streak is the finest of legitimate legends because it embodies the essence of the battle that truly defines our lives. DiMaggio activated the greatest and most unattainable dream of all humanity, the hope and chimera of all sages and shamans: he cheated death, at least for a while.