CHAPTER 5

History, the Fickle Teacher

The message of the previous three chapters is that commonsense explanations are often characterized by circular reasoning. Teachers cheated on their students’ tests because that’s what their incentives led them to do. The Mona Lisa is the most famous painting in the world because it has all the attributes of the Mona Lisa. People have stopped buying gas-guzzling SUVs because social norms now dictate that people shouldn’t buy gas-guzzling SUVs. And a few special people revived the fortunes of the Hush Puppies shoe brand because a few people started buying Hush Puppies before everyone else did. All of these statements may be true, but all they are really telling us is that what we know happened, happened, and not something else. Because they can only be constructed after we know the outcome itself, we can never be sure how much these explanations really explain, versus simply describe.

What’s curious about this problem, however, is that even once you see the inherent circularity of commonsense explanations, it’s still not obvious what’s wrong with them. After all, in science we don’t necessarily know why things happen either, but we can often figure it out by doing experiments in a lab or by observing systematic regularities in the world. Why can’t we learn from history the same way? That is, think of history as a series of experiments in which certain general “laws” of cause and effect determine the outcomes that we observe. By systematically piecing together the regularities in our observations, can we not infer these laws just as we do in science? For example, imagine that the contest for attention between great works of art is an experiment designed to identify the attributes of great art. Even if it’s true that prior to the twentieth century, it might not have been obvious that the Mona Lisa was going to become the most famous painting in the world, we have now run the experiment, and we have the answer. We may still not be able to say what it is about the Mona Lisa that makes it uniquely great, but we do at least have some data. Even if our commonsense explanations have a tendency to conflate what happened with why it happened, are we not simply doing our best to act like good experimentalists?1

In a sense, the answer is yes. We probably are doing our best, and under the right circumstances learning from observation and experience can work pretty well. But there’s a catch: In order to be able to infer that “A causes B,” we need to be able to run the experiment many times. Let’s say, for example, that A is a new drug to reduce “bad” cholesterol and B is a patient’s chance of developing heart disease in the next ten years. If the manufacturer can show that a patient who receives drug A is significantly less likely to develop heart disease than one who doesn’t, they’re allowed to claim that the drug can help prevent heart disease; otherwise they can’t. But because any one person can only either receive the drug or not receive it, the only way to show that the drug is causing anything is to run the “experiment” many times, where each person’s experience counts as a single run. A drug trial therefore requires many participants, each of whom is randomly assigned either to receive the treatment or not. The effect of the drug is then measured as the difference in outcomes between the “treatment” and the “control” groups, where the smaller the effect, the larger the trial needs to be in order to rule out random chance as the explanation.
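The logic of the drug trial can be made concrete with a small simulation. The sketch below is purely illustrative: the 10 percent and 8 percent ten-year risk figures are invented, not drawn from any real trial. It repeatedly "runs" a randomized trial of a given size, compares the treatment and control arms, and counts how often the modest real difference between them is large enough to rule out chance.

```python
import math
import random

def run_trial(n_per_arm, p_control=0.10, p_drug=0.08):
    """Simulate one randomized trial; return True if the drug's effect is
    large enough to rule out chance (roughly, significant at the 5% level)."""
    control_cases = sum(random.random() < p_control for _ in range(n_per_arm))
    treated_cases = sum(random.random() < p_drug for _ in range(n_per_arm))
    p1, p2 = control_cases / n_per_arm, treated_cases / n_per_arm
    pooled = (control_cases + treated_cases) / (2 * n_per_arm)
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    if se == 0:
        return False
    return (p1 - p2) / se > 1.96

def detection_rate(n_per_arm, reps=2000):
    """How often does a trial of this size detect the (real) two-point difference?"""
    return sum(run_trial(n_per_arm) for _ in range(reps)) / reps

for n in (100, 500, 2000, 5000):
    print(f"{n:>5} patients per arm: effect detected in {detection_rate(n):.0%} of trials")
```

Run at different sizes, the same two-percentage-point effect is detected only occasionally in a small trial and almost always in a large one, which is exactly why small effects demand large trials.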

In certain everyday problem-solving situations, where we encounter more or less similar circumstances over and over again, we can get pretty close to imitating the conditions of the drug trial. Driving home from work every day, for example, we can experiment with different routes or with different departure times. By repeating these variations many times, and assuming that traffic on any given day is more or less like traffic on any other day, we can effectively bypass all the complex cause-and-effect relationships simply by observing which route results in the shortest commute time, on average. Likewise, the kind of experience-based expertise that derives from professional training, whether in medicine, engineering, or the military, works in the same way—by repeatedly exposing trainees to situations that are as similar as possible to those they will be expected to deal with in their eventual careers.2

HISTORY IS ONLY RUN ONCE

Given how well this quasi-experimental approach to learning works in everyday situations and professional training, it’s perhaps not surprising that our commonsense explanations implicitly apply the same reasoning to explain economic, political, and cultural events as well. By now, however, you probably suspect where this is heading. For problems of economics, politics, and culture—problems that involve many people interacting over time—the combination of the frame problem and the micro-macro problem means that every situation is in some important respect different from the situations we have seen before. Thus, we never really get to run the same experiment more than once. At some level, we understand this problem. Nobody really thinks that the war in Iraq is directly comparable to the Vietnam War or even the war in Afghanistan, and one must therefore be cautious in applying the lessons from one to another. Likewise, nobody thinks that by studying the success of the Mona Lisa we can realistically expect to understand much about the success and failure of contemporary artists. Nevertheless, we do still expect to learn some lessons from history, and it is all too easy to persuade ourselves that we have learned more than we really have.

For example, did the so-called surge in Iraq in the fall of 2007 cause the subsequent drop in violence in the summer of 2008? Intuitively the answer seems to be yes—not only did the drop in violence take place reasonably soon after the surge was implemented, but the surge was specifically intended to have that effect. The combination of intentionality and timing strongly suggests causality, as did the often-repeated claims of an administration looking for something good to take credit for. But many other things happened between the fall of 2007 and the summer of 2008 as well. Sunni resistance fighters, seeing an even greater menace from hard-core terrorist organizations like Al Qaeda than from American soldiers, began to cooperate with their erstwhile occupiers. The Shiite militias—most importantly Moktada al-Sadr’s Mahdi Army—also began to experience a backlash from their grassroots, possibly leading them to moderate their behavior. And the Iraqi Army and police forces, finally displaying sufficient competence to take on the militias, began to assert themselves, as did the Iraqi government. Any one of these other factors might have been at least as responsible for the drop in violence as the surge. Or perhaps it was some combination. Or perhaps it was something else entirely. How are we to know?

One way to be sure would be to “rerun” history many times, much as we did in the Music Lab experiment, and see what would have happened both in the presence and also the absence of the surge. If across all of these alternate versions of history, violence drops whenever there is a surge and doesn’t drop whenever there isn’t, then we can say with some confidence that the surge is causing the drop. And if instead we find that most of the time we have a surge, nothing happens to the level of violence, or alternatively we find that violence drops whether we have a surge or not, then whatever it is that is causing the drop, clearly it isn’t the surge. In reality, of course, this experiment got run only once, and so we never got to see all the other versions of it that may or may not have turned out differently. As a result, we can’t ever really be sure what caused the drop in violence. But rather than producing doubt, the absence of “counterfactual” versions of history tends to have the opposite effect—namely that we tend to perceive what actually happened as having been inevitable.
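The thought experiment can be written down directly. In the sketch below the probabilities are entirely made up (the whole point is that the real ones are unknowable): it "reruns" history many times with and without a surge, once under the assumption that the surge makes no difference to the chance that violence drops, and once under the assumption that it raises that chance substantially.

```python
import random

def rerun_history(surge, baseline=0.5, surge_effect=0.0):
    """One hypothetical rerun: does violence drop? 'baseline' stands in for all
    the other factors (the Sunni fighters, the militias, the Iraqi army, ...)."""
    p_drop = baseline + (surge_effect if surge else 0.0)
    return random.random() < p_drop

def compare(surge_effect, reps=20000):
    with_surge = sum(rerun_history(True, surge_effect=surge_effect) for _ in range(reps)) / reps
    no_surge = sum(rerun_history(False, surge_effect=surge_effect) for _ in range(reps)) / reps
    return with_surge, no_surge

for effect in (0.0, 0.3):
    w, n = compare(effect)
    print(f"assumed surge effect {effect:+.1f}: violence drops in "
          f"{w:.0%} of surge histories vs {n:.0%} of no-surge histories")
```

Only by comparing the two ensembles of outcomes could we separate the surge's contribution from everything else that was going on. In reality, of course, only one of these histories ever runs, and the absence of all the others is precisely what makes the single observed outcome come to seem inevitable.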

This tendency, which psychologists call creeping determinism, is related to the better-known phenomenon of hindsight bias, the after-the-fact tendency to think that we “knew it all along.” In a variety of lab experiments, psychologists have asked participants to make predictions about future events and then reinterviewed them after the events in question had taken place. When recalling their previous predictions, subjects consistently report being more certain of their correct predictions, and less certain of their incorrect predictions, than they had reported at the time they made them. Creeping determinism, however, is subtly different from hindsight bias and even more deceptive. Hindsight bias, it turns out, can be counteracted by reminding people of what they said before they knew the answer or by forcing them to keep records of their predictions. But even when we recall perfectly accurately how uncertain we were about the way events would transpire—even when we concede that we were caught completely by surprise—we still have a tendency to treat the realized outcome as inevitable. Ahead of time, for example, it might have seemed that the surge was just as likely to have had no effect as to lead to a drop in violence. But once we know that the drop in violence is what actually happened, it doesn’t matter whether or not we knew all along that it was going to happen (hindsight bias). We still believe that it was going to happen, because it did.3

SAMPLING BIAS

Creeping determinism means that we pay less attention than we should to the things that don’t happen. But we also pay too little attention to most of what does happen. We notice when we just miss the train, but not all the times when it arrives shortly after we do. We notice when we unexpectedly run into an acquaintance at the airport, but not all the times when we do not. We notice when a mutual fund manager beats the S&P 500 ten years in a row or when a basketball player has a “hot hand” or when a baseball player has a long hitting streak, but not all the times when fund managers and sportsmen alike do not display streaks of any kind. And we notice when a new trend appears or a small company becomes phenomenally successful, but not all the times when potential trends or new companies disappear before even registering on the public consciousness.

Just as with our tendency to emphasize the things that happened over those that didn’t, our bias toward “interesting” things is completely understandable. Why would we be interested in uninteresting things? Nevertheless, it exacerbates our tendency to construct explanations that account for only some of the data. If we want to know why some people are rich, for example, or why some companies are successful, it may seem sensible to look for rich people or successful companies and identify which attributes they share. But what this exercise can’t reveal is that if we instead looked at people who aren’t rich or companies that aren’t successful, we might have found that they exhibit many of the same attributes. The only way to identify attributes that differentiate successful from unsuccessful entities is to consider both kinds, and to look for systematic differences. Yet because what we care about is success, it seems pointless—or simply uninteresting—to worry about the absence of success. Thus we infer that certain attributes are related to success when in fact they may be equally related to failure.
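A toy simulation, shown below, makes the point concrete. All the numbers are invented: every hypothetical company gets some salient attribute (say, a visionary founder) with 70 percent probability, assigned entirely independently of a 5 percent chance of success. Looking only at the winners, the attribute looks like a hallmark of success; looking at the losers as well reveals that it is exactly as common among them.

```python
import random

random.seed(0)

# Hypothetical companies: a salient attribute, assigned independently of success.
companies = [
    {"attribute": random.random() < 0.70, "successful": random.random() < 0.05}
    for _ in range(100_000)
]

successes = [c for c in companies if c["successful"]]
failures = [c for c in companies if not c["successful"]]

def attribute_share(group):
    return sum(c["attribute"] for c in group) / len(group)

print(f"attribute among successful companies:   {attribute_share(successes):.0%}")
print(f"attribute among unsuccessful companies: {attribute_share(failures):.0%}")
# Both print roughly 70%: the attribute appears to "explain" success only
# because the failures were never examined.
```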

This problem of “sampling bias” is especially acute when the things we pay attention to—the interesting events—happen only rarely. For example, when Western Airlines Flight 2605 crashed into a truck that had been left on an unused runway at Mexico City on October 31, 1979, investigators quickly identified five contributing factors. First, both the pilot and the navigator were fatigued, each having had only a few hours’ sleep in the past twenty-four hours. Second, there was a communication mix-up between the crew and the air traffic controller, who had instructed the plane to come in on the radar beam that was oriented on the unused runway, and then shift to the active runway for the landing. Third, this mix-up was compounded by a malfunctioning radio, which failed for a critical part of the approach, during which time the confusion might have been clarified. Fourth, the airport was shrouded in heavy fog, obscuring both the truck and the active runway from the pilot’s view. And fifth, the ground controller got confused during the final approach, probably due to the stressful situation, and thought that it was the inactive runway that had been lit.

As the psychologist Robyn Dawes explains in his account of the accident, the investigation concluded that although no one of these factors—fatigue, communication mix-up, radio failure, weather, and stress—had caused the accident on its own, the combination of all five together had proven fatal. It seems like a pretty reasonable conclusion, and it’s consistent with the explanations we’re familiar with for plane crashes in general. But as Dawes also points out, these same five factors arise all the time, including many, many instances where the planes did not crash. So if instead of starting with the crash and working backward to identify its causes, we worked forward, counting all the times when we observed some combination of fatigue, communication mix-up, radio failure, weather, and stress, chances are that most of those events would not result in crashes either.4

The difference between these two ways of looking at the world is illustrated in the figure below. In the left-hand panel, we see the five risk factors identified by the Flight 2605 investigation and all the corresponding outcomes. One of those outcomes is indeed the crash, but there are many other noncrash outcomes as well. These factors, in other words, are “necessary but not sufficient” conditions: Without them, it’s extremely unlikely that we’d have a crash; but just because they’re present doesn’t mean that a crash will happen, or is even all that likely. Once we do see a crash, however, our view of the world shifts to the right-hand panel. Now all the “noncrashes” have disappeared, because we’re no longer trying to explain them—we’re only trying to account for the crash—and all the arrows from the factors to the noncrashes have disappeared as well. The result is that the very same set of factors that in the left-hand panel appeared to do a poor job of predicting the crash now seems to do an excellent job.
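A little invented arithmetic makes the asymmetry between the two panels explicit. Suppose, purely for illustration, that the five factors co-occur on a thousand flights over some period and that exactly one of those flights crashes. Looking forward from the factors, a crash is a one-in-a-thousand event; looking backward from the crash, the factors are present every time.

```python
# Invented counts, purely to illustrate the two panels of the figure.
flights_with_all_five_factors = 1000   # fatigue, mix-up, radio failure, fog, stress together
crashes_among_those_flights = 1

# Forward view (left-hand panel): given the factors, how likely is a crash?
p_crash_given_factors = crashes_among_those_flights / flights_with_all_five_factors
print(f"P(crash | all five factors) = {p_crash_given_factors:.1%}")   # 0.1%

# Backward view (right-hand panel): given the crash, the factors are always there.
print("P(all five factors | crash) = 100%")
```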

By identifying necessary conditions, the investigations that follow plane crashes help to keep them rare—which is obviously a good thing—but the resulting temptation to treat them as sufficient conditions nevertheless plays havoc with our intuition for why crashes happen when they do. And much the same is true of other rare events, like school shootings, terrorist attacks, and stock market crashes. Most school shooters, for example, are teenage boys who have distant or strained relationships with their parents, have been exposed to violent TV and video games, are alienated from their peers, and have fantasized about taking revenge. But these same attributes describe literally thousands of teenage boys, almost all of whom do not go on to hurt anyone, ever.5 Likewise, the so-called systemic failure that almost allowed Umar Farouk Abdulmutallab, a twenty-three-year-old Nigerian, to bring down a Northwest Airlines flight landing in Detroit on Christmas Day 2009 comprised the sorts of errors and oversights that likely happen in the intelligence and homeland security agencies thousands of times every year—almost always with no adverse consequences. And for every day in which the stock market experiences a wild plunge, there are thousands of days in which roughly the same sorts of circumstances produce nothing remarkable at all.

IMAGINED CAUSES

Together, creeping determinism and sampling bias lead commonsense explanations to suffer from what is called the post-hoc fallacy. The fallacy is related to a fundamental requirement of cause and effect—that in order for A to be said to cause B, A must precede B in time. If a billiard ball starts to move before it is struck by another billiard ball, something else must have caused it to move. Conversely, if we feel the wind blow and only then see the branches of a nearby tree begin to sway, we feel safe concluding that it was the wind that caused the movement. All of this is fine. But just because B follows A doesn’t mean that A has caused B. If you hear a bird sing or see a cat walk along a wall, and then see the branches start to wave, you probably don’t conclude that either the bird or the cat is causing the branches to move. It’s an obvious point, and in the physical world we have good enough theories about how things work that we can usually sort plausible from implausible. But when it comes to social phenomena, common sense is extremely good at making all sorts of potential causes seem plausible. The result is that we are tempted to infer a cause-and-effect relationship when all we have witnessed is a sequence of events. This is the post-hoc fallacy.

Malcolm Gladwell’s “law of the few,” discussed in the last chapter, is a poster child for the post-hoc fallacy. Any time something interesting happens, whether it is a surprise best seller, a breakout artist, or a hit product, it will invariably be the case that someone was buying it or doing it before everyone else, and that person is going to seem influential. The Tipping Point, in fact, is replete with stories about interesting people who seem to have played critical roles in important events: Paul Revere and his famous midnight ride from Boston to Lexington that energized the local militias and triggered the American Revolution. Gaëtan Dugas, the sexually voracious Canadian flight attendant who became known as Patient Zero of the American HIV epidemic. Lois Weisberg, the title character of Gladwell’s earlier New Yorker article, who seems to know everyone, and has a gift for connecting people. And the group of East Village hipsters whose ironic embrace of Hush Puppies shoes preceded a dramatic revival in the brand’s fortunes.

These are all great stories, and it’s hard to read them and not agree with Gladwell that when something happens that is as surprising and dramatic as the Minutemen’s unexpectedly fierce defense of Lexington on April 19, 1775, someone special—someone like Paul Revere—must have helped it along. Gladwell’s explanation is especially convincing because he also relates the story of William Dawes, another rider that night who also tried to alert the local militia, but who rode a different route than Revere. Whereas the locals along Revere’s route turned out in force the next day, the townsfolk in places like Waltham, Massachusetts, which Dawes visited, seemed not to have found out about the British movements until it was too late. Because Revere rode one route and Dawes rode the other, it seems clear that the difference in outcomes can be attributed to differences between the two men. Revere was a connector, and Dawes wasn’t.6

What Gladwell doesn’t consider, however, is that many other factors were also different about the two rides: different routes, different towns, and different people who made different choices about whom to alert once they had heard the news themselves. Paul Revere may well have been as remarkable and charismatic as Gladwell claims, while William Dawes may not have been. But in reality there was so much else going on that night that it’s no more possible to attribute the outcomes the next day to the intrinsic attributes of the two men than it is to attribute the success of the Mona Lisa to its particular features, or the drop in violence in the Sunni Triangle of Iraq in 2008 to the surge. Rather, people like Revere, who after the fact seem to have been influential in causing some dramatic outcome, may instead be more like the “accidental influentials” that Peter Dodds and I found in our simulations—individuals whose apparent role actually depended on a confluence of other factors.

To illustrate how easily the post-hoc fallacy can generate accidental influentials, consider the following example from a real epidemic: the SARS epidemic that exploded in Hong Kong in early 2003. One of the most striking findings of the subsequent investigation was that a single patient, a young man who had traveled to Hong Kong by train from mainland China, and had been admitted to the Prince of Wales Hospital, had directly infected fifty others, leading eventually to 156 cases in the hospital alone. Subsequently the Prince of Wales outbreak led to a second major outbreak in Hong Kong, which in turn led to the epidemic’s spread to Canada and other countries. Based on examples like the SARS epidemic, a growing number of epidemiologists have become convinced that the ultimate seriousness of the epidemic depends disproportionately on the activities of superspreaders—individuals like Gaëtan Dugas and the Prince of Wales patient who single-handedly infect many others.7

But how special are these people really? A closer look at the SARS case reveals that the real source of the problem was a misdiagnosis of pneumonia when the patient checked into the hospital. Instead of being isolated—the standard procedure for a patient infected with an unknown respiratory virus—the misdiagnosed SARS victim was placed in an open ward with poor air circulation. Even worse, because the diagnosis was pneumonia, a bronchial ventilator was placed into his lungs, which then proceeded to spew vast numbers of viral particles into the air around him. The conditions in the crowded ward resulted in a number of medical workers as well as other patients becoming infected. The event was important in spreading the disease—at least locally. But what was important about it was not the patient himself so much as the particular details of how he was treated. Prior to that, nothing you could have known about the patient would have led you to suspect that there was anything special about him, because there was nothing special about him.

Even after the Prince of Wales outbreak, it would have been a mistake to focus on superspreading individuals rather than the circumstances that led to the virus being spread. The next major SARS outbreak, for example, took place shortly afterward in a Hong Kong apartment building, the Amoy Gardens. This time the responsible person, who had become infected at the hospital while being treated for renal failure, also had a bad case of diarrhea. Unfortunately, the building’s plumbing system was also poorly maintained, and the infection spread to three hundred other individuals in the building via a leaking drain, even though none of these victims had ever been in the same room as him. Whatever lessons one might have inferred about superspreaders by studying the particular characteristics of the patient in the Prince of Wales Hospital, therefore, would have been next to useless in the Amoy Gardens. In both cases, the so-called superspreaders were simply accidental by-products of other, more complicated circumstances.

We’ll never know what would have happened at Lexington on April 19, 1775, had Paul Revere instead ridden William Dawes’s midnight ride and Dawes ridden Revere’s. But it’s entirely possible that it would have worked out the same way, with the exception that it would have been William Dawes’s name that was passed down in history, not Paul Revere’s. Just as the outbreaks at the Prince of Wales Hospital and the Amoy Gardens happened for a complex combination of reasons, so too the victory at Lexington depended on the decisions and interactions of thousands of people, not to mention other accidents of fate. In other words, although it is tempting to attribute the outcome to a single special person, we should remember that the temptation arises simply because this is how we’d like the world to work, not because that is how it actually works. In this example, as in many others, common sense and history conspire to generate the illusion of cause and effect where none exists. On the one hand, common sense excels in generating plausible causes, whether special people, or special attributes, or special circumstances. And on the other hand, history obligingly discards most of the evidence, leaving only a single thread of events to explain. Commonsense explanations therefore seem to tell us why something happened when in fact all they’re doing is describing what happened.

HISTORY CANNOT BE TOLD WHILE IT’S HAPPENING

The inability to differentiate the “why” from the “what” of historical events presents a serious problem to anyone hoping to learn from the past. But surely we can at least be confident that we know what happened, even if we can’t be sure why. If anything seems like a matter of common sense, it is that history is a literal description of past events. And yet as the Russian-British philosopher Isaiah Berlin argued, the kinds of descriptions that historians give of historical events wouldn’t have made much sense to the people who actually participated in them. Berlin illustrated this problem with a scene from Tolstoy’s War and Peace, in which “Pierre Bezukhov wanders about, ‘lost’ on the battlefield of Borodino, looking for something which he imagines as a kind of set-piece; a battle as depicted by the historians or the painters. But he finds only the ordinary confusion of individual human beings haphazardly attending to this or that human want … a succession of ‘accidents’ whose origins and consequences are, by and large, untraceable and unpredictable; only loosely strung groups of events forming an ever-varying pattern, following no discernable order.”8

Faced with such an objection, a historian might reasonably respond that Bezukhov simply lacked the ability to observe all the various parts of the battlefield puzzle, or else the wherewithal to put all the pieces together in his mind in real time. Perhaps, in other words, the only difference between the historian’s view of the battle and Bezukhov’s is that the historian has had the time and leisure to gather and synthesize information from many different participants, none of whom was in a position to witness the whole picture. Viewed from this perspective, it may indeed be difficult or even impossible to understand what is happening at the time it is happening. But the difficulty derives solely from a practical problem about the speed with which one can realistically assemble the relevant facts. If true, this response implies that it ought to be possible for someone like Bezukhov to have known what was going on at the battle of Borodino in principle, even if not in practice.9

But let’s imagine for a moment that we could solve this practical problem. Imagine that we could summon up a truly panoptical being, able to observe in real time every single person, object, action, thought, and intention in Tolstoy’s battle, or any other event. In fact, the philosopher Arthur Danto proposed precisely such a hypothetical being, which he called the Ideal Chronicler, or IC. Replacing Pierre Bezukhov with Danto’s Ideal Chronicler, one could then ask the question, What would the IC observe? To begin with, the Ideal Chronicler would have a lot of advantages over poor Bezukhov. Not only could it observe every action of every combatant at Borodino, but it could also observe everything else going on in the world as well. Having been around forever, moreover, the Ideal Chronicler would also know everything that had happened right up to that point, and would have the power to synthesize all that information, and even make inferences about where it might be leading. The IC, in other words, would have far more information, and infinitely greater ability to process it, than any mortal historian.

Amazingly, in spite of all that, the Ideal Chronicler would still have essentially the same problem as Bezukhov; it could not give the kind of descriptions of what was happening that historians provide. The reason is that when historians describe the past, they invariably rely on what Danto calls narrative sentences, meaning sentences that purport to be describing something that happened at a particular point in time but do so in a way that invokes knowledge of a later point. For example, consider the following sentence: “One afternoon about a year ago, Bob was out in his garden planting roses.” This is what Danto calls a normal sentence, in that it does nothing more than describe what was happening at the time. But consider now the same sentence, slightly modified: “One afternoon about a year ago, Bob was out in his garden planting his prize-winning roses.” This is a narrative sentence, because it implicitly refers to an event—Bob’s roses winning a prize—that hadn’t happened at the time of the planting.

The difference between the two sentences seems negligible. But what Danto points out is that only the first kind of sentence—the normal one—would have made sense to the participants at the time. That is, Bob might have said at the time “I am planting roses” or even “I am planting roses and they are going to be prizewinners.” But it would be very strange for him to have said “I am planting my prize-winning roses” before they’d actually won any prizes. The reason is that while the first two statements make predictions about the future—that the roots Bob is putting in the ground will one day bloom into a rosebush, or that he intends to submit them to a contest and thinks he will win—the third is something different: It assumes foreknowledge of a very specific event that will only color the events of the present after it has actually happened. It’s the kind of thing that Bob could say only if he were a prophet—a character who sees the future with sufficient clarity that he can speak about the present as though looking back on it.

Danto’s point is that the all-knowing, hypothetical Ideal Chronicler can’t use narrative sentences either. It knows everything that is happening now, as well as everything that has led up to now. It can even make inferences about how all the events it knows about might fit together. But what it can’t do is foresee the future; it cannot refer to what is happening now in light of future events. So when English and French ships began to skirmish in the English Channel in 1337, the Ideal Chronicler might have noted that a war of some kind seemed likely, but it could not have recorded the observation “The Hundred Years War began today.” Not only was the extent of the conflict between the two countries unknown at the time, but the term “Hundred Years War” was only invented long after it ended as shorthand to describe what was in actuality a series of intermittent conflicts from 1337 to 1453. Likewise, when Isaac Newton published his masterpiece, Principia, the Ideal Chronicler might have been able to say that it was a major contribution to celestial mechanics, and might even have predicted that it would revolutionize science. But to claim that Newton was laying the foundation for what became modern science, or was playing a key role in the Enlightenment, would be beyond the IC. These are narrative sentences that could only be uttered after the future events had taken place.10

This may sound like a trivial argument over semantics. Surely even if the Ideal Chronicler can’t use exactly the words that historians use, it can still perceive the essence of what is happening as well as they do. But in fact Danto’s point is precisely that historical descriptions of “what is happening” are impossible without narrative sentences—that narrative sentences are the very essence of historical explanations. This is a critical distinction, because historical accounts do often claim to be describing “only” what happened in detached, dispassionate detail. Yet as Berlin and Danto both argue, literal descriptions of what happened are impossible. Perhaps even more important, they would also not serve the purpose of historical explanation, which is not to reproduce the events of the past so much as to explain why they mattered. And the only way to know what mattered, and why, is to have been able to see what happened as a result—information that, by definition, not even the impossibly talented Ideal Chronicler possesses. History cannot be told while it is happening, therefore, not only because the people involved are too busy or too confused to puzzle it out, but because what is happening can’t be made sense of until its implications have been resolved. And when will that be? As it turns out, even this innocent question can pose problems for commonsense explanations.

IT’S NOT OVER TILL IT’S OVER

In the classic movie Butch Cassidy and the Sundance Kid, Butch, Sundance, and Etta decide to escape their troubles in the United States by fleeing to Bolivia, where, according to Butch, the gold is practically digging itself out of the ground. But when they finally arrive, after a long and glamorous journey aboard a steamer from New York, they are greeted by a dusty yard filled with pigs and chickens and a couple of run-down stone huts. The Sundance Kid is furious and even Etta looks depressed. “You get much more for your money in Bolivia,” claims Butch optimistically. “What could they possibly have that you could possibly want to buy?” replies the Kid in disgust. Of course we know that things will soon be looking up for our pair of charming bank robbers. And sure enough, after some amusing misadventures with the language, they are. But we also know that it is eventually going to end in tears, with Butch and Sundance frozen in that timeless sepia image, bursting out of their hiding place, pistols drawn, into a barrage of gunfire.

So was the decision to go to Bolivia a good decision or a bad one? Intuitively, it seems like the latter because it led inexorably to Butch and the Kid’s ultimate demise. But now we know that that way of thinking suffers from creeping determinism—the assumption that because we know things ended badly, they had to have ended badly. To avoid this error, therefore, we need to imagine “running” history many times, and comparing the different potential outcomes that Butch and the Kid might have experienced had they made different decisions. But at what point in these various histories should we make our comparison? At first, leaving the United States seemed like a great idea—they were escaping what seemed like certain death at the hands of the lawman Joe Lefors and his posse, and the journey was all fun and games. Later in the story, the decision seemed like a terrible idea—of all the many places they might have escaped to, why this godforsaken wasteland? Then it seemed like a good decision again—they were making loads of easy money robbing small-town banks. And then, finally, it seemed like a bad idea again as their exploits caught up to them. Even if you granted them the benefit of foresight, in other words—something we already know is impossible—they may still have reached very different conclusions about their choice, depending on which point in the future they chose to evaluate it. Which one is right?

Within the narrow confines of a movie narrative, it seems obvious that the right time to evaluate everything should be at the end. But in real life, the situation is far more ambiguous. Just as the characters in a story don’t know when the ending is, we can’t know when the movie of our own life will reach its final scene. And even if we did, we could hardly go around evaluating all choices, however trivial, in light of our final state on our deathbed. In fact, even then we couldn’t be sure of the meaning of what we had accomplished. At least when Achilles decided to go to Troy, he knew what the bargain was: his life, in return for everlasting fame. But for the rest of us, the choices we make are far less certain. Today’s embarrassment may become tomorrow’s valuable lesson. Or yesterday’s “mission accomplished” may become today’s painful irony. Perhaps that painting we picked up at the market will turn out to be an old master. Perhaps our leadership of the family firm will be sullied by the unearthing of some ethical scandal, about which we may not have known. Perhaps our children will go on to achieve great things and attribute their success to the many small lessons we taught them. Or perhaps we will have unwittingly pushed them into the wrong career and undermined their chances of real happiness. Choices that seem insignificant at the time we make them may one day turn out to be of immense import. And choices that seem incredibly important to us now may later seem to have been of little consequence. We just won’t know until we know. And even then we still may not know, because it may not be entirely up to us to decide.

In much of life, in other words, the very notion of a well-defined “outcome,” a point at which we can evaluate, once and for all, the consequences of an action, is a convenient fiction. In reality, the events that we label as outcomes are never really endpoints. Instead, they are artificially imposed milestones, just as the ending of a movie is really an artificial end to what in reality would be an ongoing story. And depending on where we choose to impose an “end” to a process, we may infer very different lessons from the outcome. Let’s say, for example, that we observe that a company is hugely successful and we want to emulate that success with our own company. How should we go about doing that? Common sense (along with a number of bestselling business books) suggests that we should study the successful company, identify the key drivers of its success, and then replicate those practices and attributes in our own organization. But what if I told you that a year later this same company has lost 80 percent of its market value, and the same business press that is raving about it now will be howling for blood? Common sense would suggest that perhaps you should look somewhere else for a model of success. But how will you know that? And how will you know what will happen the year after, or the year after that?

Problems like this actually arise in the business world all the time. In the late 1990s, for example, Cisco Systems—a manufacturer of Internet routers and telecommunications switching equipment—was a star of Silicon Valley and the darling of Wall Street. It rose from humble beginnings at the dawn of the Internet era to become, in March 2000, the most valuable company in the world, with a market capitalization of over $500 billion. As you might expect, the business press went wild. Fortune called Cisco “computing’s new superpower” and hailed John Chambers, the CEO, as the best CEO of the information age. Over the following year, however, Cisco’s stock plummeted, bottoming out in April 2001 at $14, down from its high of $80 just over a year earlier. The same business press that had fallen over itself to praise the firm now lambasted its strategy, its execution, and its leadership. Was it all a sham? It seemed so at the time, and many articles were written explaining how a company that had seemed so successful could have been so flawed. But not so fast: by late 2007, the stock had more than doubled to over $33, and the company, still guided by the same CEO, was handsomely profitable.11

So was Cisco the great company that it was supposed to have been in the late 1990s after all? Or was it still the house of cards that it appeared to be in 2001? Or was it both, or neither? Following the stock price since 2007, you couldn’t tell. At first, Cisco dropped again to $14 in early 2009 in the depths of the financial crisis. But by 2010, it had recovered yet again to $24. No one knows where Cisco’s stock price will be a year from now, or ten years from now. But chances are that the business press at the time will have a story that “explains” all the ups and downs it has experienced to that point in a way that leads neatly to whatever the current valuation is. Unfortunately, these explanations will suffer from exactly the same problem as all the explanations that went before them—that at no point in time is the story ever really “over.” Something always happens afterward, and what happens afterward is liable to change our perception of the current outcome, as well as our perception of the outcomes that we have already explained. It’s actually quite remarkable in a way that we are able to completely rewrite our previous explanations without experiencing any discomfort about the one we are currently articulating, each time acting as if now is the right time to evaluate the outcome. Yet as we can see from the example of Cisco, not to mention countless other examples from business, politics, and planning, there is no reason to think that now is any better time to stop and evaluate than any other.

WHOEVER TELLS THE BEST STORY WINS

Historical explanations, in other words, are neither causal explanations nor even really descriptions—at least not in the sense that we imagine them to be. Rather, they are stories. As the historian John Lewis Gaddis points out, they are stories that are constrained by certain historical facts and other observable evidence.12 Nevertheless, like a good story, historical explanations concentrate on what’s interesting, downplaying multiple causes and omitting all the things that might have happened but didn’t. As with a good story, they enhance drama by focusing the action around a few events and actors, thereby imbuing them with special significance or meaning. And like good stories, good historical explanations are also coherent, which means they tend to emphasize simple, linear determinism over complexity, randomness, and ambiguity. Most of all, they have a beginning, a middle, and an end, at which point everything—including the characters identified, the order in which the events are presented, and the manner in which both characters and events are described—all has to make sense.

So powerful is the appeal of a good story that even when we are trying to evaluate an explanation scientifically—that is, on the basis of how well it accounts for the data—we can’t help judging it in terms of its narrative attributes. In a range of experiments, for example, psychologists have found that simpler explanations are judged more likely to be true than complex explanations, not because simpler explanations actually explain more, but rather just because they are simpler. In one study, for example, when faced with a choice of explanations for a fictitious set of medical symptoms, a majority of respondents chose an explanation involving only one disease over an alternative explanation involving two diseases, even when the combination of the two diseases was statistically twice as likely as the single-disease explanation.13 Somewhat paradoxically, explanations are also judged to be more likely to be true when they have informative details added, even when the extra details are irrelevant or actually make the explanation less likely. In one famous experiment, for example, students shown descriptions of two fictitious individuals, “Bill” and “Linda,” consistently preferred more detailed backstories—that Bill was both an accountant and a jazz player rather than simply a jazz player, or that Linda was a feminist bank teller rather than just a bank teller—even though the less detailed descriptions were logically more likely.14 In addition to their content, moreover, explanations that are skillfully delivered are judged more plausible than poorly delivered ones, even when the explanations themselves are identical. And explanations that are intuitively plausible are judged more likely than those that are counterintuitive—even though, as we know from all those Agatha Christie novels, the most plausible explanation can be badly wrong. Finally, people are observed to be more confident about their judgments when they have an explanation at hand, even when they have no idea how likely the explanation is to be correct.15
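The arithmetic behind these findings is worth spelling out. In the sketch below the probabilities are invented, but the structure is the one the experiments exploit: a conjunction ("feminist bank teller") can never be more probable than either of its parts taken alone, and a simpler single-cause explanation can still be preferred even when the more complicated one is, by construction, twice as likely.

```python
# Invented probabilities, purely to make the logic of the experiments explicit.

# The "Linda" problem: the conjunction is never more probable than one conjunct.
p_bank_teller = 0.05
p_feminist_given_teller = 0.60
p_feminist_bank_teller = p_bank_teller * p_feminist_given_teller   # 0.03
assert p_feminist_bank_teller <= p_bank_teller
print(f"bank teller: {p_bank_teller:.2f}, feminist bank teller: {p_feminist_bank_teller:.2f}")

# The medical-symptoms study: the two-disease explanation is twice as likely here,
# yet most respondents chose the simpler single-disease story.
p_single_disease = 0.01
p_two_diseases = 0.02
print(f"single disease: {p_single_disease:.2f}, two diseases together: {p_two_diseases:.2f}")
```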

It’s true, of course, that scientific explanations often start out as stories as well, and so have some of the same attributes.16 The key difference between science and storytelling, however, is that in science we perform experiments that explicitly test our “stories.” And when they don’t work, we modify them until they do. Even in branches of science like astronomy, where true experiments are impossible, we do something analogous—building theories based on past observations and testing them on future ones. Because history is only run once, however, our inability to do experiments effectively excludes precisely the kind of evidence that would be necessary to infer a genuine cause-and-effect relation. In the absence of experiments, therefore, our storytelling abilities are allowed to run unchecked, in the process burying most of the evidence that is left, either because it’s not interesting or doesn’t fit with the story we want to tell. Expecting history to obey the standards of scientific explanation is therefore not just unrealistic, but fundamentally confused—it is, as Berlin concluded, “to ask it to contradict its essence.”17

For much the same reason, professional historians are often at pains to emphasize the difficulty of generalizing from any one particular context to any other. Nevertheless, because accounts of the past, once constructed, bear such a strong resemblance to the sorts of theories that we construct in science, it is tempting to treat them as if they have the same power of generalization—even for the most careful historians.18 When we try to understand why a particular book became a bestseller, in other words, we are implicitly asking a question about how books in general become bestsellers, and therefore how that experience can be repeated by other authors or publishers. When we investigate the causes of the recent housing bubble or of the terrorist attacks of September 11, we are inevitably also seeking insight that we hope we’ll be able to apply in the future—to improve our national security or the stability of our financial markets. And when we conclude from the surge in Iraq that it caused the subsequent drop in violence, we are invariably tempted to apply the same strategy again, as indeed the current administration has done in Afghanistan. No matter what we say we are doing, in other words, whenever we seek to learn about the past, we are invariably seeking to learn from it as well—an association that is implicit in the words of the philosopher George Santayana: “Those who cannot remember the past are condemned to repeat it.”19

This confusion between stories and theories gets to the heart of the problem with using common sense as a way of understanding the world. In one breath, we speak as if all we’re trying to do is to make sense of something that has already happened. But in the next breath we’re applying the “lesson” that we think we have learned to whatever plan or policy we’re intending to implement in the future. We make this switch between storytelling and theory building so easily and instinctively that most of the time we’re not even aware that we’re doing it. But the switch overlooks the fact that the two are fundamentally different exercises with different objectives and standards of evidence. It should not be surprising, then, that explanations that were chosen on the basis of their qualities as stories do a poor job of predicting future patterns or trends. Yet that is nonetheless what we use them for. Understanding the limits of what we can explain about the past ought therefore to shed light on what it is that we can predict about the future. And because prediction is so central to planning, policy, strategy, management, marketing, and all the other problems that we will discuss later, it is to prediction that we now turn.