ACCORDING TO A WELL-KNOWN adage attributed to Dr. Theodore Woodward of the University of Maryland School of Medicine in the 1940s, “When you hear hoofbeats, think horses, not zebras.” The point is that horses are far more common than zebras, at least if we are not in a zoo or on the African savanna. And although it is remotely possible that the sound of hoofbeats was created by a thundering herd of zebras escaping from a nearby animal sanctuary, the more likely explanation is that the sound of hoofbeats was created by horses. Probabilities matter. And because probabilities matter, the most likely explanation of some phenomenon is, tautologically, most likely correct.
Of course, going with the most likely explanation doesn’t get you a Nobel Prize, or even a minute on the evening news.1 And that is why “man bites dog” persists as a hackneyed staple of the world of journalism. Dog bites man is common, expected, and hardly newsworthy. Man bites dog, being unexpected, makes news.
And it is not just daily journalism. A popular book of a few years ago—The Black Swan—celebrated the identification and investigation of unusual and unexpected events.2 And properly so, at least for some purposes. After all, no one would remember the Wright brothers if Wilbur had said, “Orville, why don’t we just take the train?” And when George and Ira Gershwin’s hit song from Porgy and Bess in 1935 observed that “it ain’t necessarily so,” it reminded us that probabilities are just that—probabilities—and that sometimes the less probable is the more significant.
Recognizing and pursuing the unusual and unexpected is indeed valuable for discovery, creativity, and innovation.3 But it is also often valuable to expect the expected—to recognize that probabilities are important, and to rely on them. Absent further information, horses and not zebras is the winning bet.
The relevance of “horses, not zebras” for evidence is that evidence is about inference, and inference is about probability. Deductive inference (All birds have backbones; this parrot is a bird; therefore, this parrot has a backbone) is central to rational thinking. But so too is inductive inference, where the conclusion does not necessarily follow from the premises. For example, “Most Italian citizens speak Italian; this woman is an Italian citizen; therefore, she probably speaks Italian.” Such inductive inference, where the conclusion might not hold in some particular instance, is at the core of the idea of evidence. Someone being an Italian citizen is very strong evidence that they speak Italian, but this inductive inference might turn out to be mistaken in a particular case. Some Italian citizens do not speak Italian. They might come from the German-speaking region in northern Italy, or they might speak only the local dialect of Sicily or Venice, or they might be in a family of relatively recent immigrants whose family language is the language of their country of origin. This possibility of error, however, does not mean that someone’s Italian citizenship is not evidence that they speak Italian. Believing that someone who is an Italian citizen speaks Italian is a very good inductive inference, and what makes it a good inference is not that the inference is a logical necessity, as it would be in the case of deduction, but instead that it is based on evidence.
Consider the process of medical diagnosis, the domain in which “horses, not zebras” was developed and is often still used to help educate medical students about diagnostic techniques. The physician sees a symptom, or collection of symptoms, and infers (or hypothesizes) from her knowledge the likely cause of those symptoms. More precisely, she might not only see symptoms, but also learn various aspects of her patient’s lifestyle and medical history, which, when combined with symptoms, we can call “indications.” The physician might, for example, know that a patient who is displaying a ring-shaped redness on his skin and complaining of chills and headaches happens to enjoy hiking in short pants and camping in the wilderness. Based on these indications, the physician infers that the patient has Lyme disease, and she does so because these indications have, in the past, usually been associated with Lyme disease.4 It is possible, of course, that these indications—these pieces of evidence—were caused by something other than Lyme disease. Some bruises might produce ring-shaped redness, and so might ringworm, and there are many possible causes of headaches and chills, even for hikers and campers who wear short pants. Such a confluence of indications being caused by something other than Lyme disease is hardly impossible, but it would be rare—and thus, given these indications, inferring ringworm as the cause of the circular redness, say, would be analogous to inferring zebras, with the role of horses being played by Lyme disease. Faced with the indications just described, a competent physician will ordinarily, in the absence of contrary evidence, diagnose and treat for Lyme disease. And she does so not because it is certain that Lyme disease is the correct diagnosis, but, instead, on the basis of probabilities.5 Ian Hacking describes inductive inferences, such as this one, as necessarily “risky,” because, unlike logical deduction, they could be wrong in a particular case.6 There is a chance, even if small, that the sounds are coming from zebras, no matter how strong the probabilistic inference in favor of horses, and thus inferring that the sound of hoofbeats is coming from horses involves some risk of error. But that risk is inherent in inductive reasoning, and thus inherent in reaching conclusions on the basis of evidence.
The example of Lyme disease calls to mind the relatively recent movement in health care going by the name “evidence-based medicine.” The label is initially alarming. Is there really some other kind of medicine, as the label suggests? And are there real doctors who practice something other than evidence-based medicine? Evidence-free medicine, perhaps? That would be disturbing. Who would want a doctor who didn’t care about evidence?
But let us look closely at the evidence-based medicine movement. It originated at McMaster University in Canada, took hold with gusto in the United Kingdom, and now has a worldwide presence.7 And a movement it is, attracting a coterie of devoted adherents and occasionally provoking angry objectors.8 At the heart of the movement is the claim, as influentially put, that evidence-based medicine is the “conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients.”9 By itself, it is hard to see what could be controversial about that. But as we dig deeper into the debates that evidence-based medicine has inspired, it becomes clear that the controversy stems from evidence-based medicine’s originally explicit and still implicit claim that the evidence that comes from randomized clinical trials is at the top of an evidentiary hierarchy. For true believers in evidence-based medicine, the more qualitative and sometimes impressionistic evidence that comes from the knowledge, skills, and experiences of actual patient-treating physicians ranks lower on the evidentiary hierarchy and is therefore less valuable. But if you were one of these experienced physicians, having long based your diagnoses and treatments largely on the lessons you had gleaned from years of practice and hundreds or thousands of patients, you would take the evidential hierarchy of the evidence-based medicine movement as threatening. Or insulting. Or both.
This is hardly the place to referee the dispute between the enthusiasts of evidence-based medicine and their detractors. But the dispute highlights the idea that there can be better and worse evidence, with the measure of better or worse being the strength of the inference to some conclusion that comes from the evidence, and the measure of that strength being the probability that the conclusion is correct. As the evidence-based medicine movement reminds us, that probability is typically greatest when the evidence comes from carefully designed and conducted controlled experiments or other methods of equivalent rigor. A good example comes from the research on the effectiveness of the various Covid-19 vaccines. Prominently, the initial tests of the Moderna mRNA-1273 Covid-19 vaccine employed a study using more than thirty thousand subjects, half of whom were given the vaccine, with the otherwise identical other half given a placebo. It turned out that there were ninety infections in the placebo group and five in the treatment group, ninety-five in all, the differential producing the widely publicized conclusion of a 94.5 percent effectiveness rate.10 And this is just the kind of study that the evidence-based medicine movement puts on the top rung of its evidentiary ladder.
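The arithmetic behind such a figure is simple enough to sketch. What follows is a minimal illustration, assuming equal-sized trial arms and computing effectiveness as one minus the relative risk; it ignores the person-time adjustments used in the actual trial analysis, which is why it yields roughly 94.4 rather than the published 94.5 percent. The function name and structure here are illustrative, not drawn from any trial protocol.

```python
# Minimal sketch: vaccine effectiveness as one minus the relative risk.
# Assumes the two trial arms are the same size and ignores person-time
# adjustments, so the result only approximates the published figure.

def vaccine_effectiveness(cases_vaccine: int, cases_placebo: int) -> float:
    """Return effectiveness given case counts in equal-sized arms."""
    relative_risk = cases_vaccine / cases_placebo
    return 1.0 - relative_risk

# Interim results: 5 cases among the vaccinated, 90 among the placebo group.
print(f"{vaccine_effectiveness(5, 90):.1%}")  # prints 94.4%
```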
But now consider a hypothetical clinician who has treated thirty-eight unvaccinated patients who have already tested positive for Covid-19. She gives all of them the standard treatment for conventional influenza—Tamiflu, for example—and almost all of them suffer no further progression of the disease and need no hospitalization, although two of them do get worse and must be hospitalized. When she then sees her thirty-ninth unvaccinated Covid-19 patient, she infers from these past experiences that Tamiflu is effective in alleviating Covid-19 symptoms and prescribes and treats accordingly.
It is possible that the same percentage of patients that this clinician has treated would have gotten better even without the Tamiflu, and nothing in this experience-based inference about Tamiflu’s effectiveness definitively rules out this possibility. It is also possible that a different treatment would have produced an even higher recovery rate, whether measured by speed of recovery or percentage of those treated who recover. And because a well-designed and well-executed controlled trial would have been configured for the very purpose of ruling out these and other non-Tamiflu causes of the patients’ recovery, the probability of the conclusion drawn from the controlled trial being correct can be expected to be higher than the probability of the conclusion drawn from the experience-based inference being correct. Because reasoning from evidence is inductive, and because inductive reasoning is probabilistic, evidential reasoning is necessarily probabilistic, and higher probabilities are the measure of better evidence.
It is important to acknowledge, however, that there is nothing about an experiment or a laboratory that makes experiment-based evidence necessarily stronger than other types of evidence. The probability of a conclusion that comes from a controlled experiment is often but not invariably higher than the probability of conclusions derived from other types of evidence. There are badly designed experiments and sloppy laboratories. There are also experience-based qualitative inferences that come from a very large number of instances—data points—over a long period of time, and that qualitatively attempt to isolate causes and exclude alternatives in a manner that is less precise but theoretically similar to what scientists do with controlled laboratory experiments.11 And so although the disputes about evidence-based medicine teach us that there can be better or worse evidence, and remind us that the probability of some conclusion being correct is the measure of the strength of the evidence, the disputes also remind us that qualitative or experience-based evidence is still evidence. By using the phrase “evidence-based,” the evidence-based medicine movement implicitly wants us to believe that medicine that does not rely heavily on published and peer-reviewed laboratory or other experimental evidence is not using evidence at all.12 But that is a mistake. There are other kinds of evidence, and sometimes those other kinds of evidence will produce inferences with a high probability of being correct. The question is not one of evidence versus no evidence, but of better versus worse evidence. The controlled experiment or randomized clinical trial evaluated in a peer-review process is the gold standard of scientific inference.13 But other forms of information and the inferences flowing from them can often be probably correct, even highly so. And thus these other forms of information can count as evidence, and frequently as very good evidence.
To observe that weaker evidence is still evidence is not to deny that there are people, far too often political and public figures, who make assertions that really are supported by no evidence at all. There is, for example, no evidence whatsoever that a mysterious figure called “Q” has infiltrated the Democratic Party with his brigade of Satanist pedophiles.14 To describe the claims that such a conspiracy exists as being made “without evidence” is entirely correct. And so is the conclusion that the charges of electoral fraud in the 2020 presidential election were made without evidence, as United States federal district judge Matthew W. Brann angrily concluded on November 21, 2020.15 But these are extreme cases. Far more commonly, accusations that some statement has been made, or conclusion reached, without evidence are accusations that the available evidence is not the right kind of evidence or is not enough evidence to satisfy the accuser. Sometimes the available evidence for some conclusion is so flimsy that it ought to be treated as completely nonexistent, even if that is not technically correct. But often the charge of “no evidence” reflects the mistaken belief that anything other than concrete physical or documentary evidence, or perhaps the testimony of eyewitnesses, does not count as evidence at all. The lesson of this section, one to which we will repeatedly return, is that this is simply not true. All sorts of things are evidence, including physical objects—prototypically the murder weapon or the body—written documents, personal observation, past experience, and what others have told us. And although so-called circumstantial evidence is commonly dismissed or denigrated on television or by lawyers for guilty defendants, the legal system properly recognizes that circumstantial evidence can be very good evidence, and so do the rest of us in countless ways and on countless occasions.16 Indeed, even the lack of evidence can be evidence.17 So although we should interrogate the conclusions of officials and others about the evidence supporting their conclusions, we should also interrogate those who desire a different kind of evidence than has been provided for some conclusion, with the aim of trying to determine exactly what kind of evidence would satisfy them.
Complaints about the absence of evidence are as often a mask for complaints about the quantity of evidence as they are about the type of evidence. We will return in Chapter 3 to quasi-quantitative issues of just how much evidence, and of what strength, we need for some conclusion or some action. Now, however, it is important to highlight the significant differences, not only between “no evidence” and “not the kind of evidence that will satisfy me,” but also between “no evidence” and “not enough evidence to satisfy me.” And just as the former complaint is often couched—and clouded—in phrases such as “no hard evidence,” “no concrete evidence,” and “no direct evidence,” the latter is often expressed as “no conclusive evidence,” “no definitive evidence,” or even “no proof.”18 In slightly different ways, each of these contains the (perhaps unintended) negative inference that the complainer wishes to denigrate at least some evidence in support of the conclusion. Sometimes the denigration is justified, and sometimes not, but such phrases should put the listener or reader on alert that there is indeed some evidence, rather than there being no evidence at all.
Now let us step back. What is the point of evidence? When put that way, it becomes clear that we do not value evidence for its own sake. Evidence is not like happiness, pleasure, or dignity, which are plausibly considered ends and not means. Rather, evidence is a means to some end, and the end is some factual conclusion of interest to us. And embedded in the factual conclusions that interest us is the assumption that those conclusions are valuable because and when they are true. So we can say, conventionally, that evidence is valuable insofar as it leads to truth—or, more precisely, to a belief in things that are true.
With respect to any piece of alleged evidence, therefore, we need to ask two questions. The second is what the evidence is trying to establish, but the first is whether the alleged evidence is itself true. If yellowish-appearing skin is evidence of hepatitis, the first question is whether the skin is actually yellowish. If automobile engine “pinging” is evidence of too-low-octane fuel, first we must determine whether there is in fact a pinging sound. Similarly, and where the truth of the evidence is less obvious, if we are to count as evidence a witness’s observation of the defendant lingering in the vicinity of the bank shortly before the bank that the defendant is charged with robbing was robbed, we first need to know whether it really was the defendant that the witness saw. And if in an election it is evidence of voter suppression that the percentage of voter turnout in a largely African American neighborhood is much lower than the percentage in a largely white neighborhood, we must determine initially whether the percentage of voter turnout in the African American neighborhood actually is lower than in the white neighborhood.
Obviously, these “first step” foundations are themselves the product of evidence. That the defendant was lingering near the bank at a certain time is a conclusion from the evidence supplied by a witness’s perception, and the perception is evidence of what the witness claims to have observed. But although the defendant lingering near the bank is a conclusion from evidence, it is also evidence for some other conclusion—that the defendant robbed the bank. Leaving the courtroom aside, we can say, for example, that a decrease in the size of the Arctic ice cap is evidence of global warming, but believing that the Arctic ice cap is shrinking is itself based on evidence. And voter turnout being lower in the African American neighborhood than in the white neighborhood is both evidence for the conclusion of voter suppression and itself a conclusion based on the evidence that tells us about voter turnout. All evidence, or at least almost all evidence, has this double aspect. It is typically based on other evidence, and it is also evidence of something else. When we say that an item of evidence is evidence of something, therefore, we need to bear in mind that the item of evidence is also the something that another piece of evidence is the evidence of.
Evidence is thus typically based on other evidence.19 More importantly, something is evidence insofar as it leads to some conclusion or leads to confirming or disconfirming some hypothesis. But “leads to” is too vague. What precisely is the relationship between some fact and some conclusion that makes the fact evidence, rather than just a free-floating piece of data? What does it mean to say that evidence supports some conclusion? Or that it is evidence against some conclusion? Exploring these questions is our next task.
The Reverend Thomas Bayes (1702–1761) was a Presbyterian minister who, we presume, spent part of his Sundays preaching the Gospel. But no one remembers him for his sermons. Reverend Bayes is remembered instead for his contributions to the theory of probability and statistics, one of which was Bayes’ theorem, whose formal symbolic version need not detain us here. But nonformally, Bayes’ theorem is about the way in which additional evidence incrementally (or serially) contributes to some conclusion. Under a Bayesian approach to inference, people start with some estimate of the probability of some conclusion. This, in Bayesian terminology, is the prior probability—or simply, as often expressed these days by those who employ Bayesian methods, the prior. And then when people are given additional evidence, they consider each new piece of evidence and readjust the probability of their earlier conclusion upward or downward to produce the posterior probability.
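Although the formal version is indeed not needed for what follows, it is compact enough to set out for readers who want it. In its simplest form, Bayes’ theorem says that the probability of a hypothesis $H$ given evidence $E$ is

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)},$$

where $P(H)$ is the prior probability and $P(H \mid E)$ the posterior. The odds version makes the incremental character of updating even plainer: the posterior odds are the prior odds multiplied by the likelihood ratio of the new evidence,

$$\frac{P(H \mid E)}{P(\neg H \mid E)} = \frac{P(H)}{P(\neg H)} \cdot \frac{P(E \mid H)}{P(E \mid \neg H)}.$$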
Consider the prominent and occasionally still-contested example of Thomas Jefferson and Sally Hemings. If we were to have asked, some decades ago, what the probability was that Thomas Jefferson impregnated an enslaved person in his household named Sally Hemings, we would have produced or assumed a probability, not zero, based on the facts that Sally Hemings was an enslaved person in Jefferson’s household, that male slave owners in Virginia and elsewhere often had (almost always coercive) sexual relations with their slaves, and that sexual relations sometimes produce pregnancy. These background facts would have produced a prior probability that Jefferson impregnated Hemings and was the father of one or more of her children. And when evidence of various writings of Hemings’s children describing Jefferson as their father came to light, this evidence raised the probability that Jefferson had impregnated Hemings. A number of census records consistent with Jefferson being the father of Hemings’s children then raised the probability even further. And then DNA testing confirmed that some of Jefferson’s descendants had some of the same DNA as some of Hemings’s descendants, which raised the probability even more. What started out as a low-probability possibility ended up as a high-probability conclusion with the successive addition of subsequent incremental items of evidence. And, indeed, the Thomas Jefferson Foundation, devoted to studying and preserving all things Jefferson, explicitly relied on Bayes’ theorem in explaining how they reached the conclusion that Jefferson was the father of Hemings’s children.20
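To see how such serial updating works mechanically, here is a small sketch in code. Every number in it is hypothetical (the historical record supplies no such values), and the sketch is offered only to show how successive items of evidence can move a low prior to a high posterior.

```python
# Serial Bayesian updating in odds form: posterior odds equal prior odds
# multiplied by the likelihood ratio of each successive piece of evidence.
# All numbers here are hypothetical, for illustration only.

def update(prior_prob, likelihood_ratios):
    """Apply each likelihood ratio in turn; return the posterior probability."""
    odds = prior_prob / (1.0 - prior_prob)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)

prior = 0.10                # hypothetical prior for the paternity hypothesis
likelihood_ratios = [
    4.0,   # descendants' writings naming Jefferson as father (hypothetical)
    2.5,   # census records consistent with the hypothesis (hypothetical)
    8.0,   # DNA link between Jefferson and Hemings descendants (hypothetical)
]
print(f"posterior: {update(prior, likelihood_ratios):.2f}")  # posterior: 0.90
```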
Under a Bayesian approach, the test of whether some fact is evidence of some other fact, or of some conclusion, is incremental. If a fact increases the likelihood of some conclusion above what it would be without that fact, then that fact is evidence for the conclusion. And if a fact decreases the likelihood of some conclusion, then it is evidence against the conclusion. But if the fact neither increases nor decreases what we believed before—the prior probability—then the fact is simply not evidence at all, or at least not evidence for or against the conclusion in question, although it might be evidence for or against some other conclusion.21
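In symbols, on this incremental view, a fact $E$ is evidence for a hypothesis $H$ just in case

$$P(H \mid E) > P(H),$$

evidence against $H$ just in case $P(H \mid E) < P(H)$, and not evidence about $H$ at all when the two probabilities are equal.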
This incremental definition of evidence is widely accepted among those who practice or philosophize about science.22 Unfortunately, this sound conception of evidence is occasionally replaced by a less-sound conception of evidence sometimes found in the philosophical literature—that evidence is only potential evidence unless the conclusion it supports is true.23 But this is a mistake, or, more charitably, an odd and nonstandard understanding of evidence and of the word “evidence.” Take the example (the basis of the 2020 movie The Last Vermeer) of the famous art forger Han van Meegeren.24 In 1937 the Boijmans Museum in Rotterdam purchased, through multiple highly respected art dealers, a painting entitled Christ at Emmaus that was alleged to be a Vermeer. The painting not only was alleged to be a genuine Vermeer but also appeared to be such to the highly knowledgeable dealers and to the equally highly knowledgeable experts at the museum. The museum arranged to have the authentication tests available at the time conducted, and discovered that the paint appeared authentic for the period and for Vermeer, that the canvas was correct for the dates and the artist, that the wood on which the canvas was stretched was similarly correct, that the brush strokes on the painting appeared to be Vermeer’s style and could only have been created by the kind of brush that Vermeer used, and that the widely respected Vermeer expert Abraham Bredius had declared the painting to be an authentic Vermeer. Ten years later it was discovered that the painting was a forgery, which we know because the forger confessed.25 The question then is whether, knowing now that the painting is a forgery, we should consider the paint, the canvas, the wood, the brushstrokes, and the expert opinion as having been evidence, or instead as mere potential evidence whose status as evidence disappeared upon discovery of the forgery.
It may seem as if the question whether these various bits and pieces of information are evidence or only potential evidence is a mere definitional dispute, but it is more than that. The view that these items of information are only potential evidence assumes that we finally evaluate the status of some fact as evidence—or not—only after having discovered the fact of the matter. But that is not how we evaluate evidence, and it is not why we use or care about evidence. We care about evidence because of what it can tell us about some aspect of the world about which we are uncertain. Questions of religious faith aside, when we are certain of some feature of the world, we do not need evidence for it, although we might have had evidence for it. Only when we are uncertain do we need and use evidence, and that is the point at which we must determine whether something is or is not evidence. Determining whether something is evidence occurs during, and as part of, the process of discovering the truth, and not after we know what the truth is. And that is why it is important to understand evidence as that which makes some uncertain conclusion more (or less) certain as a result of having the evidence, a status that does not evaporate after we have determined the truth.
The foregoing “anti-potential” account, which accepts that there can be actual evidence in favor of a hypothesis that turns out to be false, parallels the legal system, which also treats as relevant evidence, and therefore as evidence, any fact that makes some conclusion more or less probable than it was before considering that fact. As put in Rule 401 of the Federal Rules of Evidence, replicated almost verbatim in the evidence law of most states, “Evidence is relevant if … it has any tendency to make a [material] fact more or less probable than it would be without the evidence.” And it is hardly surprising that the law would see things this way. In a trial, rulings about the admissibility of evidence take place individually and incrementally, and obviously prior to a final verdict. The judge must therefore decide whether something counts as evidence under circumstances of uncertainty about the truth of what the evidence is presented as evidence for, and uncertainty about what the other items of evidence will be. Accordingly, the law has little use for the idea of potential evidence. And outside the courtroom, neither do we.
In the broadest sense, therefore, the very idea of evidence is compatible with the teachings of Reverend Bayes. Although there are debates about whether people are good or bad at probabilistic reasoning, we should not impose too stringent a test on the very idea of Bayesianism.26 In this book I sample only selectively from the Bayesian buffet, and thus focus on the valuable core of a Bayesian approach: its incrementalism, the way in which Reverend Bayes counsels us to think of evidence as making some conclusion, or some other fact, more probable than we thought it was prior to learning of this evidence. Perhaps this approach would work best if people could assign numerical probabilities to their prior and posterior probabilities, but whether people can or should is a side issue. As long as we can accept that “beliefs come in degrees”—and thus, as long as we can accept that “more than,” “less than,” “stronger than,” and “weaker than,” for example, are sensible and realistic ways of thinking, including ways of thinking about evidence—questions about whether we can accurately quantify such ideas are peripheral.27 When we ask if something is evidence for some conclusion, or when we criticize someone for not offering evidence, all we are doing is seeing if something “moves the needle.” In the context of the debates about evidence that followed former president Trump’s November 5 post-election speech alleging widespread electoral fraud, for example, we should ask ourselves what we estimated the likelihood of such fraud to be prior to that speech. And those from both parties who criticized the president for offering no evidence are best understood as saying that nothing the president said caused them to adjust what their previous (and probably very small) assessment of the likelihood of fraud had been.
We can imagine, counterfactually, a speech that would indeed have provided such evidence. Suppose Trump had said, “I have been informed by four state attorneys general, three of them Democrats, that they are now investigating allegations of electoral fraud.” Even though such an assertion might have been made with no further information, and even though the assertion is based on the assertions of others—hearsay—the assertion itself might still have counted as evidence, assuming (again, possibly counterfactually) that the president would not have said it had he not been prepared to provide more detail. But with this and no other assumption in place, the very statement would plausibly have counted as evidence, even absent any documents, even absent more detail, and even absent any results of the alleged investigations. We will explore further in Chapter 5 the ways in which simple unverified statements—as with the hypothetical presidential statement just described—can be evidence, but for now the point is only that something being evidence is compatible with it being weak evidence, and compatible as well with there being other evidence inclining in the opposite direction.
The “horses, not zebras” adage emphasizes the fundamentally probabilistic nature of inferences from evidence. Even when people say they saw something with their own eyes—zebras, for example—what they are really saying is that they have had these perceptual experiences in the past, and that they have been reinforced in the belief that those perceptual experiences have a particular origin. When people first see something with a certain size and shape and pattern of stripes, they are told that these perceptions indicate zebras. And every time they again have these perceptions, they are reinforced in the belief that the perceptions are perceptions of zebras. So the next time they have the same perceptions, they identify the source as a zebra, even though it remains (remotely) possible that what they think they are perceiving as a zebra is actually two boys sharing a zebra suit.
This may be too philosophically abstruse, but common inferences display the same pattern. I wake up in the morning, see a wet street outside, and infer that it has been raining. Although it is possible that the wet street has been produced by a street cleaning vehicle, my neighbor’s malfunctioning sprinkler system, or a truck with a leaking load of fuel oil, I infer that it rained because rain is what usually produces wet streets. This inference is based on a generalization, the kind of generalization that is an essential feature of our reasoning processes.28
Generalizations are typically—maybe necessarily—what makes an item of alleged evidence relevant.29 What makes a car being a Volvo relevant to its reliability is the generalization that Volvos are more reliable than cars in general. If the rate of reliability for Volvos were the same as that for all cars, the proper response to the fact that a car is a Volvo, if reliability is the question, is “So what?” Similarly, the Internal Revenue Service uses something called the Discriminant Income Function to determine whether a taxpayer’s return should be audited. And the characteristics that the IRS considers “relevant”—those that make it more likely than otherwise that an audit will change the return—are, in the words of the IRS, based on “past experience with similar returns.”30 Thus, the IRS’s alleged conclusion (the Discriminant Income Function being, not surprisingly, highly secret) that being a drywall contractor is evidence, even if slight by itself, of under-reported income is based on the experience-based generalization that drywall contractors are more likely than all taxpayers in general to under-report income, and more likely even than the category of all self-employed taxpayers.31
Thus, an alleged piece of evidence becomes relevant by being a member of a class of pieces of evidence whose presence makes it more likely than it would be without the evidence that some conclusion is either true or false. “Volvos are reliable” is a generalization. This car being a Volvo is relevant to its reliability precisely and only because of that generalization.
The focus of this chapter has been on what might count as evidence. But there is a difference between what counts as evidence and what we do with the evidence that counts. Under a pure Bayesian approach, what we do when it is time to reach a conclusion is to see where we are at that stage in the process of Bayesian updating. Each incremental item of evidence adjusts the probabilities, and at the moment of decision—whether a decision about facts or a decision about what to do—we make a decision based on the probabilities at that point.
This seems straightforward, but this Bayesian understanding of how to reach a decision based on multiple items of evidence has long been subject to what appears to be a challenge. One version of the challenge is based on an idea, attributed initially to the philosopher Gilbert Harman and developed most influentially by the late philosopher Peter Lipton, that goes by the name of “inference to the best explanation.”32 According to this approach, the evidence for (or against) some conclusion is not evaluated incrementally. Instead, all of the evidence is evaluated holistically, with the aim of seeing which explanation best explains all of the evidence that we have to that point obtained.33
Philosophers have vigorously debated the relative virtues of Bayesianism versus inference to the best explanation.34 And so have those who study how judges and jurors evaluate evidence in courts of law.35 But rather than wade into those debates, I want to suggest that determining which of these allegedly competing accounts of our evidentiary practices is more descriptively accurate or normatively desirable is a function not only of what we want to know, and not only of why we want to know it, but also, most importantly, of when we want to know it.
One question with which we are frequently confronted is whether some fact is or is not evidence at all. We are not, or at least not yet, seeking to explain a phenomenon, but merely trying to identify which facts will help us find some explanation. At this stage we do need a hypothesis, or a question to which we seek an answer, even if only tentatively. Observation—fact-finding—is necessarily theory-laden. To make relevant observations regarding a hypothesis, we cannot just go out into the world and accumulate random facts. We need some reason for accumulating these facts rather than those, a reason that will guide us in deciding which facts we care about and which are irrelevant—or, in legal terminology, which are immaterial. If we are seeking an explanation of why the Arctic ice cap is shrinking, the fact that Alicante is a Spanish city on the Mediterranean Sea—and that is certainly a fact—is of no interest. But once we have some question we wish to answer or a hypothesis we wish to test, we then look for those facts that seem useful in answering the question or testing the hypothesis. And for this task—the task of determining if something is evidence at all—the incremental approach commonly associated with a Bayesian perspective seems most helpful. If we already have a hypothesis and are interested in whether that hypothesis is true or false, then a Bayesian evaluation of whether some fact makes that hypothesis more likely true (or more likely false) than we thought it was prior to considering this fact seems most consistent with both how we do, and how we should, approach the issue. When we are evaluating facts to see, initially, whether they will count as evidence at all, we typically look at those facts one at a time, and thus evaluate those facts individually and incrementally to see if they make the hypothesis more or less likely, or if they help us answer the question at hand.
Once we have all of the evidence in hand, however, it would seem odd to then make an incremental evaluation of all of it. Yes, we could pick apart each of the items in our basket of evidentiary facts and evaluate them incrementally in good Bayesian fashion. But doing so seems both artificial and, as an empirical matter, unfaithful to our actual reasoning methods. Instead, at the point of decision, when all of the evidence is in, we do look at all of the evidence more holistically. In doing so, we often simply recognize that various individual pieces of evidence may, as Susan Haack argues, combine to produce a conclusion that is greater than the mere sum of its parts.36 Looking at pieces of evidence in this mutually reinforcing way is not necessarily a search for an overarching explanation or story, but it is compatible with attempting to determine, on the basis of all of the evidence, which conclusion, or which hypothesis, best explains all of this evidence. This latter version of evidentiary holism is inference to the best explanation, and it may capture most faithfully how people actually do reason about hypotheses once all of the evidence is in, and how people ought to reason most rationally at this stage of the process. Indeed, Lipton frames his defense of inference to the best explanation by emphasizing that we start “with the evidence available to us.”37 In doing so, he makes it clear that his account is a post-collection explanation of what we do with the evidence we have collected, and neither a pre-collection account of how we collected that evidence in the first place nor a pre-decision account of how we sorted the evidence we have collected into the relevant and the nonrelevant.
Inference to the best explanation is not only compatible with Bayesian incrementalism in this way but also compatible with a probabilistic approach. And here the key word is not so much “explanation” as it is “best.” If our goal is to evaluate competing plausible explanations to see which is more likely true, and thus to see which among the accounts before us best fits the evidence we have, then probabilities, even if not numerical, remain strongly in the picture. One explanation may be almost certainly true based on what we know about the world, whereas another may be possible but less likely. Escaping zebras is one explanation for the sound of hoofbeats, and my neighbor’s horses is another. But when we say that the latter is more likely than the former, we are engaging in a probabilistic assessment, a probabilistic assessment that is implicit in most searches for the “best” explanation. The idea of inference to the best explanation is often a sound way of understanding what we do with the evidence we have in hand. But what we do with that evidence remains irreducibly probabilistic, at least if our principal goal is the discovery of truth and the rejection of error.