OVERLOOKED, UNDERVALUED ALTERNATIVE EXPLANATIONS

When evaluating a claim or argument, ask yourself if there is another reason—other than the one offered—that could account for the facts or observations that have been reported. There are always alternative explanations; our job is to weigh them against the one(s) offered and determine whether the person drawing the conclusion has drawn the most obvious or likely one.

For example, if you pass a friend in the hall and they don’t return your hello, you might conclude that they’re mad at you. But alternative explanations are that they didn’t see you, were late for a meeting, were preoccupied, were part of a psychology experiment, have taken a vow of silence for an hour, or were temporarily invaded by bodysnatchers. (Or maybe permanently invaded.)

Alternative explanations come up a great deal in pseudoscience and counterknowledge, and they come up often in real science too. Physicists reported that neutrinos sent from CERN had apparently traveled faster than light—a result that would have upended a century of Einsteinian theory. It turned out that a loose fiber-optic cable in the experiment’s timing system had caused a measurement error. This underscores the point that a methodological flaw in an extremely complicated experiment is almost always a more likely explanation than something that would force us to completely rewrite our understanding of the nature of the universe.

Similarly, if a Web page cites experiments showing that a brand-new, previously unheard-of cocktail of vitamins will boost your IQ by twenty points—and the drug companies don’t want you to know!—you should wonder how likely it is that nobody else has heard of this, and if an alternative explanation for the claim is simply that someone is trying to make money.

Mentalists, fortune-tellers, and psychics make a lot of money performing seemingly impossible feats of mind reading. One explanation is that they have tapped into a secret, hidden force that goes against everything we know about cause and effect and the nature of space-time. An alternative explanation is that they are magicians, using magic tricks, and simply lying about how they do what they do. Lending credence to the latter view is the fact that professional magicians—James Randi among them—have so far been able to use clever illusions to duplicate every single feat performed by mentalists. And often the magicians, in an effort to discredit the self-proclaimed psychics, will tell you how they did the tricks. In fairness, I suppose it’s possible that it is the magicians who are trying to deceive us—that they are really psychics who are afraid to reveal their gifts (possibly for fear of exploitation, kidnapping, etc.) and are only pretending to use clever illusions. But again, look at the two possibilities: One causes us to throw out everything we know about nature and science, and the other doesn’t. Any psychologist, law enforcement officer, businessperson, divorced spouse, foreign service worker, spy, or lawyer can tell you that people lie; they do so for a variety of reasons and with sometimes alarming frequency and alacrity. When you’re facing a claim that seems unlikely, the more likely (alternative) explanation is that the person telling it to you is lying in one way or another.

People who try to predict the future without using psychic powers—military leaders, economists, business strategists—are often wildly off in their predictions because they fail to consider alternative explanations. This has led to a business practice called scenario planning—considering all possible outcomes, even those that seem unlikely. This can be very difficult to do, and even experts fail. In 1968, Will and Ariel Durant wrote:

In the United States the lower birth rate of the Anglo-Saxons has lessened their economic and political power; and the higher birth rate of Roman Catholic families suggests that by the year 2000 the Roman Catholic Church will be the dominant force in national as well as in municipal or state governments.

What they failed to consider was that, during those intervening thirty-two years, many Catholics would leave the Church, and many would use birth control in spite of the Church’s prohibitions. Alternative scenarios to their view in 1968 were difficult to imagine.

Social and artistic predictions get upended too: When the Beatles were starting out, record industry experts declared that guitar groups were “on the way out.” Early reviews of Beethoven’s Fifth Symphony included pronouncements that no one would ever want to hear it again. Science also gets upended. Experts said that fast-moving trains would never work because passengers would die of asphyxiation. Experts thought that light moved through an invisible “ether.” Science and life are not static. All we can do is evaluate the weight of evidence and judge for ourselves, using the best tools we have at our disposal. One of the most underused of those tools is creative thinking—imagining alternatives to the way we’ve been thinking all along.

Alternative explanations are often critical to legal arguments in criminal trials. The framing effects we saw in Part One, and the failure to understand that conditional probabilities don’t work backward, have led to many false convictions.

Proper scientific reasoning entails setting up two (or more) hypotheses and presenting the probabilities for each. In a courtroom, attorneys shouldn’t focus on the probability of a match by itself, but on the relative probability of two scenarios: What is the probability that the blood samples came from the same source, versus the probability that they did not? More to the point, we need to compare the probability of a match given that the subject is guilty with the probability of a match given that the subject is innocent. Or we could compare the probability that the subject is innocent given the data, versus the probability that the subject is guilty given the data. We also need to know the accuracy of the measures: In 2015, the FBI acknowledged that its examiners had overstated microscopic hair matches in more than 90 percent of the trial transcripts reviewed. Without these pieces of information, it is impossible to decide a case fairly or accurately. That is, if we talk only in terms of a match, we’re considering only one-sided evidence—the probability of a match given the hypothesis that the suspect was at the scene of the crime. What we don’t know is the probability of a match given the alternative hypotheses, and the two need to be compared.
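To make this concrete, here is a minimal sketch in Python of how the comparison works, using the odds form of Bayes’ rule. Every number in it is hypothetical, chosen for illustration rather than taken from any real case:

```python
# A minimal sketch of comparing two hypotheses with Bayes' rule.
# Every number here is hypothetical, for illustration only.

def posterior_probability(prior_odds, p_match_if_source, p_match_if_not_source):
    """Odds-form Bayes: posterior odds = prior odds x likelihood ratio."""
    likelihood_ratio = p_match_if_source / p_match_if_not_source
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)  # convert odds to probability

p_match_if_not_source = 1e-6   # assumed random-match probability: 1 in 1,000,000
p_match_if_source = 0.99       # even a true source may fail to match (lab error)
prior_odds = 1 / 199_999       # assume 200,000 people could have left the sample

p = posterior_probability(prior_odds, p_match_if_source, p_match_if_not_source)
print(f"P(suspect is the source | match) = {p:.2f}")   # ~0.83, not 'a million to one'
```

Even with a one-in-a-million random-match probability, the posterior probability is far from certainty once you account for how many people could plausibly have left the sample—which is exactly why the match rate and the alternative hypotheses both matter.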

This comes up all the time. In one case in the U.K., the suspect, Dennis Adams, was accused based solely on DNA evidence. The victim failed to pick him out of a lineup and said in court that Adams did not look like her assailant—indeed, that he appeared two decades older. In addition, Adams had an alibi for the night in question, corroborated by testimony from a third party. The only evidence the prosecution presented at trial was the DNA match. Now, Adams had a brother whom the DNA would also have matched, but because there was no additional evidence implicating the brother, investigators never considered him—even though there was no additional evidence implicating Dennis either, beyond that same DNA match. No one in the trial considered the alternative hypothesis that it might have been Dennis’s brother. Dennis was convicted, both in the original trial and on appeal.

Built by the Ancients to Be Seen from Space

You may have heard the speculation that human life didn’t really evolve on Earth—that a race of space aliens came down and seeded the first human life. This by itself is not implausible; it’s just that there is no real evidence supporting it. That doesn’t mean it’s not true, and it doesn’t mean we shouldn’t look for evidence, but the fact that something could be true has limited utility—except perhaps for science fiction.

A 2015 story in the New York Times described a mysterious formation on the ground in Kazakhstan that could be seen only from space.

Satellite pictures of a remote and treeless northern steppe reveal colossal earthworks—geometric figures of squares, crosses, lines and rings the size of several football fields, recognizable only from the air and the oldest estimated at 8,000 years old.

The largest, near a Neolithic settlement, is a giant square of 101 raised mounds, its opposite corners connected by a diagonal cross, covering more terrain than the Great Pyramid of Cheops. Another is a kind of three-limbed swastika, its arms ending in zigzags bent counterclockwise.

It’s easy to get carried away and imagine that these great designs were a way for ancient humans to signal space aliens, perhaps following strict extraterrestrial instructions. Perhaps it was an ancient spaceship landing pad, or a coded message, something like “Send more food.” We humans are built that way—we like to imagine things that are out of the ordinary. We are the storytelling species.

Setting aside the rather obvious fact that any civilization capable of interstellar flight must have had a more efficient communication technology at its disposal than arranging large mounds of dirt on the ground, an alternative explanation exists. Fortunately, the New York Times (although not every other outlet that reported the story) provides it, in a quote from Dimitriy Dey, the discoverer of the mysterious formations:

“I don’t think they were meant to be seen from the air,” Mr. Dey, 44, said in an interview from his hometown, Kostanay, dismissing outlandish speculations involving aliens and Nazis. (Long before Hitler, the swastika was an ancient and near-universal design element.) He theorizes that the figures built along straight lines on elevations were “horizontal observatories to track the movements of the rising sun.”

An ancient sundial explanation seems more likely than space aliens. It doesn’t mean it’s true, but part of information literacy and evaluating claims is uncovering plausible alternatives, such as this.

The Missing Control Group

The so-called Mozart effect was discredited because the experiments purporting to show that listening to Mozart for twenty minutes a day temporarily increased IQ lacked an adequate control group: One group of people was given Mozart to listen to, and the other group was given nothing to do. Doing nothing is not an adequate control for doing something, and it turns out that if you give people something to do—almost anything—the effect disappears. The Mozart effect wasn’t driven by Mozart’s music increasing IQ; it was driven by the boredom of doing nothing temporarily decreasing effective IQ.

If you bring twenty people with headaches into a laboratory and give them your new miracle headache drug and ten of them get better, you haven’t learned anything. Some headaches are going to get better on their own. How many? We don’t know. You’d need to have a control group of people with similar ages and backgrounds, and reporting similar pain. And because just the belief that you might get better can lead to health improvements, you have to give the control group something that enables that belief as much as the medicine under study. Hence the well-known placebo, a pill that is made to look exactly like the miracle headache drug so that no one knows who is receiving what until after the experiment is over.
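A quick simulation shows why. Assume, purely for illustration, that half of all headaches clear up on their own within the study period (an invented rate; real recovery rates vary):

```python
import random

random.seed(1)

# Assumed spontaneous-recovery rate: half of headaches clear up on their
# own during the study window (an invented figure, for illustration).
SPONTANEOUS_RECOVERY = 0.5
PATIENTS = 20
TRIALS = 100_000

hits = 0
for _ in range(TRIALS):
    recovered = sum(random.random() < SPONTANEOUS_RECOVERY for _ in range(PATIENTS))
    hits += recovered >= 10

print(f"P(at least 10 of 20 recover with no drug at all) ~ {hits / TRIALS:.2f}")  # ~0.59
```

Under that assumption, roughly six times out of ten you’d see ten or more of the twenty patients “get better” with no drug at all. Without a control group, the miracle drug gets credit for what time alone would have done.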

Malcolm Gladwell spread an invalid conclusion in his book David and Goliath by suggesting that people with dyslexia might actually have an advantage in life, leading many parents to believe that their dyslexic children should not receive the educational remedies they need. Gladwell fell for the missing control condition. We don’t know how much more successful his chosen dyslexics might have been if they had been able to improve their condition.

The missing control group shows up in everyday conversation, where it’s harder to spot than in scientific claims, simply because we’re not looking for it there. You read—and validate—a new study showing that going to bed every night and waking up every morning at the same time increases productivity and creativity. An artist friend of yours, successful by any measure, counters that she’s always just slept whenever she wanted, frequently pulling all-nighters and sometimes sleeping for twenty hours at a time, and she’s done just fine. But there’s a missing control group. How much more productive and creative might she have been with a regular sleep schedule? We don’t know.

A pair of twins was separated at birth and reared apart—one in Nazi Germany and the other in Trinidad and Venezuela. One was raised as a Roman Catholic who joined the Hitler Youth, the other as a Jew. They were reunited twenty-one years later and discovered a bizarre list of shared behaviors that many fascinated observers could only attribute to genetics: Both twins scratched their heads with their ring finger, and both thought it was funny to sneak up on strangers and sneeze loudly. Both men wore short, neatly trimmed mustaches and rectangular wire-rimmed glasses, rounded at the corners. Both wore blue shirts with epaulets and military-style pockets. Both had the same gait when walking and the same way of sitting in chairs. Both loved butter and spicy food, flushed the toilet before and after using it, and read the endings of books first. Both wrapped tape around pens and pencils to get a better grip.

Stories like this may cause you to wonder about how our behaviors are influenced by our genes. Or if we’re all just automatons, and our actions are predetermined. How else to explain such coincidences?

Well, there are two ways, and they both boil down to a missing control group. A social psychologist might say that the world tends to treat people who look alike in similar ways. The attractive are treated differently from the unattractive, the tall differently from the short. If there’s something about your face that just looks honest and free of self-interest, people will treat you differently from how they would if your face suggested otherwise. The brothers’ behaviors were shaped by the social world in which they lived. We’d need a control group of people who are not related, but who look astonishingly alike and were raised separately, in order to draw any firm conclusions from this “natural experiment” of twins separated at birth.

A statistician or behavioral geneticist would say that of the thousands upon thousands of things that we do, it is likely that any two strangers will share some striking similarities in dress, grooming, penchant for practical jokes, or odd proclivities if you just look long enough and hard enough. Without this control group—bringing strangers together and taking an inventory of their habits—we don’t know whether the fascinating story about the twins is driven by genetics or pure chance. It may be that genetics plays a role here, but probably not as large a role as we might think.
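A back-of-the-envelope calculation captures the statistician’s point. The numbers below—two thousand traits inventoried, a one percent chance that any given trait matches between two strangers—are assumptions I’ve picked only for illustration:

```python
from math import comb

# Assumptions, chosen only to illustrate: you inventory 2,000 habits and
# traits, and any two strangers match on each with probability 0.01.
n_traits, p_match = 2000, 0.01

def p_at_least(k, n, p):
    """P(at least k successes) for a binomial(n, p) variable."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

print(f"Expected matches between two strangers: {n_traits * p_match:.0f}")           # 20
print(f"P(10 or more 'eerie' coincidences): {p_at_least(10, n_traits, p_match):.2f}")  # ~0.99
```

Under these assumptions, two strangers would share about twenty habits on average, and a list of ten “eerie” coincidences is close to guaranteed.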

Cherry-picking

Our brains are built to make stories as they take in the vastness of the world with billions of events happening every second. There are apt to be some coincidences that don’t really mean anything. If a long-lost friend calls just as you’re thinking of her, that doesn’t mean either of you has psychic powers. If you win at roulette three times in a row, that doesn’t mean you’re on a streak and should bet your last dollar on the next spin. If your non-certified mechanic fixes your car this time, it doesn’t mean he’ll be able to do it next time—he may just have gotten lucky.

Say you have a pet hypothesis—for example, that too much vitamin D causes malaise; you may well find evidence to support that view. But if you’re looking only for supporting evidence, you’re not doing proper research, because you’re ignoring the contradictory evidence—there might be a little of it or a lot, but you don’t know because you haven’t looked. Colloquially, scientists call this “cherry-picking” the data that suit your hypothesis. Proper research demands that you keep an open mind about any issue, valiantly try to consider the evidence both for and against, and then form an evidence-based (not a “gee, I wish this were so”–based) conclusion.

A companion to the cherry-picking bias is selective windowing. This occurs when the information you have access to is unrepresentative of the whole. If you’re looking at a city through the window of a train, you’re only seeing a part of that city, and not necessarily a representative part—you have visual access only to the part of the city with train tracks running through it, and whatever biases may attach to that. Trains make noise. Wealthier people usually occupy houses away from the noise, so the people who are left living near the tracks tend to have lower income. If all you know of a city is who lives near the tracks, you are not seeing the entire city.

This is of course related to the discussion in Part One about data gathering (how data are collected), and the importance of obtaining representative samples. We’re trying to understand the nature of the world—or at least a new city that the train’s passing through—and we want to consider alternative explanations for what we’re seeing or being told. A good alternative explanation with broad applicability is that you’re only seeing part of the whole picture, and the part you’re not seeing may be very different.

Maybe your sister is proudly displaying her five-year-old daughter’s painting. It may be magnificent! If you love the painting, frame it! But if you’re trying to figure out whether to invest in the child’s future as the world’s next great painter, you’ll want to ask some questions: Who cropped it? Who selected it? How big was the original? How many drawings did the little Picasso make before this one? What came before and what came after? Through selective windowing, you may be seeing part of a series of brilliant drawings or a lovely little piece of a much larger (and unimpressive) work that was identified and cropped by the teacher.

We see selective windowing in headlines too. A headline might announce that “three times more Americans support this new legislation than oppose it.” Even if you satisfy yourself, based on the steps in Part One of the Field Guide, that the survey was conducted on a representative and sufficiently large sample of Americans, you can’t conclude that the majority of Americans support the legislation. It could well be that 1 percent oppose it, 3 percent support it, and 96 percent remain undecided. Translate this same kind of monkeyshines to an election headline stating that five times as many Republicans support Candidate A as support Candidate B in the presidential primaries. That may be true, but the headline might leave out that Candidate C is polling at 80 percent of the vote.

Try tossing a coin ten times. You “know” that it should come up heads half the time. But it probably won’t. Even if you toss it 1,000 times, you probably won’t get exactly 500 heads. Theoretical probabilities are achieved only with an infinite number of trials; the more coin tosses, the closer you’ll get to fifty-fifty heads/tails. It’s counterintuitive, but there’s a probability very close to 100 percent that somewhere in those 1,000 tosses you’ll get five heads in a row. Why is this so counterintuitive? We didn’t evolve brains with a sufficient understanding of what randomness looks like. It’s not usually heads-tails-heads-tails; there are going to be runs (also called streaks) even in a random sequence. This makes it easy to fool someone. Just make a cell phone video recording of yourself tossing a coin 1,000 times in a row. Before each toss, say, “This is going to be the first of five heads in a row.” Then, if you get a head, before the next toss, say, “This is going to be the second of five heads in a row.” If the next one is a tail, start over. If it’s not, before you make the next toss, say, “This is going to be the third of five heads in a row.” Then just edit your video so that it includes only those five in a row. No one will be any the wiser! If you want to really impress people, go for ten in a row! (There’s roughly a 38 percent chance of that happening in 1,000 tosses. Along the same lines, if you ask a hundred people in a room to toss a coin five times each, there is a 96 percent chance that one of them will get five heads in a row.)
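You don’t have to take those percentages on faith. Here is a short simulation sketch in Python (the seed and trial counts are arbitrary choices) that checks both figures:

```python
import random

random.seed(42)

def has_run(n_tosses, run_length):
    """True if n fair-coin tosses contain a run of heads at least this long."""
    streak = 0
    for _ in range(n_tosses):
        streak = streak + 1 if random.random() < 0.5 else 0
        if streak >= run_length:
            return True
    return False

TRIALS = 20_000
hits = sum(has_run(1000, 10) for _ in range(TRIALS))
print(f"P(10 heads in a row within 1,000 tosses) ~ {hits / TRIALS:.2f}")  # ~0.38

# The room of a hundred coin-flippers needs no simulation: each person has
# a 1-in-32 chance of five straight heads, so
print(f"P(someone in 100 gets 5 heads in 5 tosses) = {1 - (31 / 32) ** 100:.2f}")  # 0.96
```

Run it and the streaks land right where the arithmetic says they should.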

The kinds of experiences that a seventy-five-year-old socialite has with the New York City police department are likely to be very different from those of a sixteen-year-old boy of color; their experiences are selectively windowed by what they see. The sixteen-year-old may report being stopped repeatedly without cause, being racially profiled and treated like a criminal. The seventy-five-year-old may fail to understand how this could be. “All my experiences with those officers have been so nice.”

Paul McCartney and Dick Clark bought up all the celluloid film of their television appearances in the 1960s, ostensibly so that they could control the way their histories are told. If you’re a scholar doing research, or a documentarian looking for archival footage, you’re limited to what they choose to release to you. When looking at data or evidence to support a claim, ask yourself if what you’re being shown is likely to be representative of the whole picture.

Selective Small Samples

Small samples are usually not representative.

Suppose you’re responsible for marketing a new hybrid car. You want to make claims about its fuel efficiency. You send a driver out in the vehicle and find that the car gets eighty miles to the gallon. That looks great—you’re done! But maybe you just got lucky. Your competitor does a larger test, sending out five drivers in five vehicles and gets a figure closer to sixty miles per gallon. Who’s right? You both are! Suppose that your competitor reported the results like this:

Test 1: 58 mpg
Test 2: 38 mpg
Test 3: 69 mpg
Test 4: 54 mpg
Test 5: 80 mpg

Road conditions, ambient temperature, and driving styles create a great deal of variability. If you were lucky (and your competitor unlucky), your one driver might produce an extreme result that you then report with glee. (And of course, if you want to cherry-pick, you just ignore tests one through four.) But if the researcher is pursuing the truth, a larger sample is necessary. An independent lab that tested fifty different excursions might find that the average is something completely different. In general, anomalies are more likely to show up in small samples; larger samples more accurately reflect the state of the world. Statisticians call this the law of large numbers.
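A minimal simulation makes the point. The figures below are invented: assume the car’s true average is 60 mpg, with a trip-to-trip standard deviation of 12 mpg:

```python
import random

random.seed(7)

# Invented figures: the car's true average is 60 mpg, and road conditions,
# temperature, and driving style add a standard deviation of 12 mpg per test.
TRUE_MPG, SPREAD = 60, 12

def reported_average(n_tests):
    return sum(random.gauss(TRUE_MPG, SPREAD) for _ in range(n_tests)) / n_tests

for n in (1, 5, 50):
    results = sorted(reported_average(n) for _ in range(10_000))
    low, high = results[250], results[-251]   # middle 95 percent of outcomes
    print(f"{n:>2} test(s): 95% of reported averages fall between {low:.0f} and {high:.0f} mpg")
```

A single test can easily report anywhere from the mid-30s to the mid-80s; fifty tests rarely stray more than a couple of mpg from the truth.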

If you look at births in a small rural hospital over a month and see that 70 percent of the babies born are boys, compared to 51 percent in a large urban hospital, you might think there is something funny going on in the rural hospital. There might be, but that isn’t enough evidence to be sure. The small sample is at work again. The large hospital might have reported fifty-one out of a hundred births were boys, and the small might have reported seven out of ten. As with the coin toss mentioned above, the statistical average of fifty-fifty is most recognizable in large samples.

How many is enough? This is a job for a professional statistician, but there are rough-and-ready rules you can use when trying to make sense of what you’re reading. For population surveys (e.g., voting preferences, toothpaste preferences, and such), sample-size calculators can readily be found on the Web. For determining the local incidence of something (rates such as how many births are boys, or how many times a day the average person reports being hungry), you need to know something about the base rate (or incidence rate) of the thing you’re looking for. If a researcher wanted to know how many cases of albinism were occurring in a particular community, examined the first 1,000 births, and found none, it would be foolish to draw any conclusions: Albinism occurs in only 1 in 17,000 births, so 1,000 births is too small a sample—“small” relative to the scarcity of the thing you’re looking for. On the other hand, if the study were about the incidence of preterm births, 1,000 should be more than enough, because they occur in about one in nine births.
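A two-line calculation, using the base rates just cited, shows why 1,000 births tells you nothing about albinism but plenty about preterm births:

```python
# Base rates from the text: albinism at 1 in 17,000 births, preterm
# births at roughly 1 in 9.
p_albinism = 1 / 17_000
p_preterm = 1 / 9

# Even if the true albinism rate holds exactly, 1,000 births will show
# zero cases about 94 percent of the time -- no conclusions possible.
print(f"P(0 albinism cases in 1,000 births) = {(1 - p_albinism) ** 1000:.2f}")  # 0.94

# Preterm births are common enough that 1,000 births carry real signal.
print(f"Expected preterm births in 1,000: {1000 * p_preterm:.0f}")  # ~111
```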

Statistical Literacy

Consider a street game in which a hat or basket contains three cards, each with two sides: One card is red on both sides, one is white on both sides, and one is red on one side and white on the other. The con man draws one card from the hat and shows you one side of it: red. He bets you $5 that the other side is also red. He wants you to think there’s a fifty-fifty chance of this—so that you’re willing to bet against him, on the grounds that the other side is just as likely to be white. You might reason something like this:

He’s showing me a red side. So he has pulled either the red-red card or the red-white card. That means that the other side is either red or white with equal probability. I can afford to take this bet because even if I don’t win this time, I will win soon after.

Setting aside the gambler’s fallacy—many people have lost money by doubling down on roulette only to find out that chance is not a self-correcting process—the con man is relying on you (counting on you?) to make this erroneous assignment of probability, usually talking fast to fractionate your attention. It’s helpful to work it out pictorially.

Here are the three cards:

Card 1: Red / Red
Card 2: White / White
Card 3: Red / White

If he is showing you a red side, it could be any one of three red sides. In two of those cases, the other side of the card is red, and in only one case is it white. So there is a two-in-three chance—not one-in-two—that the other side is red. Most of us get this wrong because we fail to account for the fact that on the double-red card, he could be showing you either side. If you had trouble with this, don’t feel bad—similar mistakes were made by the mathematical philosopher Gottfried Wilhelm Leibniz and by many more recent textbook authors. When evaluating claims based on probabilities, try to understand the underlying model. This can be difficult to do, but if you recognize that probabilities are tricky, and recognize the limitations most of us have in evaluating them, you’ll be less likely to be conned.
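If the two-in-three answer still feels wrong, a simulation settles it. This sketch (card representation and trial count are my own choices) simply deals one of the three cards at random, shows a random side, and counts what’s on the back:

```python
import random

random.seed(0)

# The three cards, each represented as its two sides.
CARDS = [("red", "red"), ("white", "white"), ("red", "white")]

red_fronts = red_backs = 0
for _ in range(100_000):
    card = random.choice(CARDS)
    front, back = random.sample(card, 2)   # which side faces you is random too
    if front == "red":
        red_fronts += 1
        red_backs += back == "red"

print(f"P(back is red | front is red) ~ {red_backs / red_fronts:.2f}")  # ~0.67
```

Across a hundred thousand deals, the back matches the red front about 67 percent of the time, just as the card-counting argument says. But what if everyone around you is agreeing with something that is, well, wrong? The exquisite new clothes the emperor is wearing, perhaps?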