Chapter 3

The Burden of Proof

IT WOULD BE NICE if our evidence-based conclusions were as airtight as our mathematical ones. Two plus two equals four. The square root of 81 is 9. Period. But in the world of fact, and thus in the world of evidence, things are never so clear. Like it or not, uncertainty in factual judgments is an inevitable aspect of the human condition. Did Lee Harvey Oswald act alone in killing President Kennedy? Did Jack Ruby act alone in killing Lee Harvey Oswald? Do any ivory-billed woodpeckers still exist? Did the abominable snowman (Bigfoot, or Yeti) ever exist? What about the Loch Ness Monster? Did William Shakespeare write the plays now attributed to William Shakespeare? Was Thomas Jefferson the father of Sally Hemings’s children? Did former President Clinton sexually assault Juanita Broaddrick, which she alleges and he denies? Did former President Trump sexually assault E. Jean Carroll, which she alleges and he denies?

When faced with such factual uncertainty, we typically have at least some evidence for one conclusion or the other, and maybe for both, but rarely do we have no evidence at all. And so we should not confuse uncertainty with ignorance.1 Webster’s Dictionary tells us that to be ignorant is to be “destitute of knowledge,” but we are rarely destitute when it comes to reaching conclusions about facts. Typically we have evidence—even if that evidence is weak, and even if there is some evidence both for and against some conclusion. Which is to say that having evidence is entirely compatible with being uncertain. Moreover, to know something is compatible with the possibility that what we know, or at least what we think we know, might be wrong. Philosophers typically equate knowledge with a degree of certainty that excludes the possibility of error, but ordinary people and even ordinary academics recognize that it is no mistake to believe that what we think of as knowledge may exist short of no-doubt-about-it absolute certainty.2 And what is important here is that below some level of complete certainty and above the level of complete ignorance we find most of the important issues of evidence. The question then is how much certainty is enough, and how much uncertainty we can tolerate. And this leads, naturally, to the question of how much evidence is enough, and for what.

The legal system’s approach to factual uncertainty is familiar. In a criminal case, the prosecution will prevail, and the defendant will be convicted, if the jury (or judge, if there is no jury) finds the defendant guilty “beyond a reasonable doubt.” Defendants will often be acquitted, and properly so, even when there is evidence against them, sometimes considerable evidence, as long as the strength of that evidence does not clear the “beyond a reasonable doubt” threshold.

There are debates about whether belief beyond a reasonable doubt, or any degree of belief, can (or should) be translated into numerical probabilities. In the context of actual trials, some number of courts, judges, and commentators argue that providing a numerical probabilistic equivalent to “belief beyond a reasonable doubt” would add a needed degree of clarification to an otherwise intolerably vague idea. Other courts, judges, and commentators, however, take the opposite view, insisting that adding numbers would lend an aura of false precision to an inevitably (and perhaps desirably) imprecise concept.3 For now, and consistent with the probabilist sympathies that pervade this book, let us assume that attaching rough percentages to the various burdens of proof can add useful refinement.4 When those who share this view assess, whether experimentally or otherwise, what “beyond a reasonable doubt” means in numbers, they widely conclude that to believe something beyond a reasonable doubt is to have a degree of belief (philosophers call it a “credence”) that is equivalent to between 90 percent and 99 percent certainty.5 “Beyond a reasonable doubt” does not mean convinced to an absolute certainty, as with my absolute certainty that I now have five fingers on my right hand, and judges commonly instruct jurors about the difference. But most analyses conclude that beyond a reasonable doubt means at least 90 percent certain.6

By contrast, the typical standard of proof in a civil—not criminal—case is the “preponderance of the evidence,” or, as often put in British Commonwealth countries, the “balance of probabilities.” So when Jack sues Jill for negligently causing him to fall and injure his head, he need only prove his case by a preponderance of the evidence—that it is more likely than not, even if only barely more likely than not, that Jill was negligent and that her negligence caused the damage and the injuries.7

The difference in practice between proof beyond a reasonable doubt and proof by a preponderance of the evidence was vividly illustrated two and a half decades ago in the legal proceedings against former football star O. J. Simpson. Simpson was charged with having murdered his former wife, Nicole Brown Simpson, as well as a waiter named Ron Goldman, who had dropped off at Brown Simpson’s house a pair of sunglasses she had forgotten at the restaurant where Goldman worked. The 1995 California criminal trial was front-page worldwide news for months.8 And at the conclusion of the trial, the jury decided that the prosecution had not proved beyond a reasonable doubt that Simpson was the murderer, and he was acquitted. But shortly thereafter, several members of Goldman’s and Brown Simpson’s families sued Simpson in a civil action for the tort of wrongful death. That being a civil case, the plaintiffs needed to prove their case only by a preponderance of the evidence. And in 1997 the jury so found, concluding that Simpson was civilly liable to Goldman’s family for the amount, including punitive damages, of $33.5 million.9 To be sure, the first trial (criminal) and second trial (civil) took place in different courts with different judges and different juries, and neither the evidence nor the trial strategy was precisely the same in both trials. Still, both cases were based on much the same evidence of the same acts.10 That being so, the divergent results of the two trials illuminate the fact that evidence sufficient to prove something by a preponderance of the evidence might be insufficient to prove the very same thing beyond a reasonable doubt. And thus a failure to convict under a beyond a reasonable doubt standard is often far from a vindication, the claims of acquitted celebrity defendants notwithstanding. The Scottish verdict of “not proven” makes this non-vindication clear, and thus ameliorates some of the problem, but that is a verdict not generally available outside of Scotland.11

Jumping from past to present, we see a similar scenario play out in the controversy about the standard of proof to be applied in college and university disciplinary proceedings against students and faculty accused of sexual assault and faculty and administrators accused of sexual harassment. Under Title IX of the Education Amendments of 1972, as repeatedly amended, sex discrimination in colleges and universities, typically but not necessarily discrimination against women, is declared unlawful and thus a violation of federal law.12 Moreover, colleges and universities are considered in violation of Title IX if they do not provide adequate procedures by which students who are victims of sexual assault can obtain redress and initiate procedures leading to punishment of the perpetrators by the college or university. That is all familiar, and in many respects straightforward. But now things get more complicated.

In 2011 the Office for Civil Rights for the US Department of Education sent out a “Dear Colleague” letter informing colleges and universities that they would be considered in violation of Title IX if they employed a standard of proof any higher than preponderance of the evidence in disciplinary proceedings against those accused of sexual violence.13 In other words, it was more or less a federal requirement that educational institutions covered by Title IX initiate disciplinary proceedings against those accused of sexual violence—typically students accused by other students—and that in those proceedings the accused student must be found guilty if it was determined more likely than not (the preponderance of the evidence) that the accused student committed the acts with which he (typically) was charged.

In 2017, however, with the Trump administration having succeeded the Obama administration, the same office rescinded its 2011 letter and replaced it with interim guidance instructing colleges and universities that they were free to use either preponderance of the evidence or a higher “clear and convincing” standard. And many colleges and universities accepted this invitation to raise the burden of proof necessary for a finding of culpability from preponderance of the evidence to a more demanding “clear and convincing” standard.

The standard of proof by clear and convincing evidence does not exist in the criminal law but is found in various other parts of the law. It is, for example, the standard of proof commonly applicable to involuntary civil commitment of mentally ill individuals who are thought to be dangerous to themselves or others.14 It is also the standard prescribed by the US Supreme Court for libel actions brought by public officials or public figures, who can recover against a publication for libel only if they can prove by clear and convincing evidence not only that what was said about them was false, but also that the publisher knew it was false at the time of publication.15 And although attaching a numerical probability to “clear and convincing” might be especially difficult, the “clear and convincing” standard plainly establishes a heavier burden than “preponderance of the evidence” and a lighter one than “proof beyond a reasonable doubt.” We might imagine “clear and convincing” as something in the vicinity of a .75 likelihood.
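Pulling these rough numbers together may help. What follows is a minimal sketch in Python—nothing more than the chapter’s approximations turned into code. The thresholds (.50, .75, .90) are the glosses offered above, not anything the law itself prescribes, and the function name and example credences are invented purely for illustration.

```python
# Rough numerical glosses on the three standards of proof discussed above.
# The thresholds are this chapter's approximations, not legal definitions.
STANDARDS = {
    "preponderance of the evidence": 0.50,   # more likely than not
    "clear and convincing evidence": 0.75,   # the rough midpoint suggested above
    "beyond a reasonable doubt": 0.90,       # lower end of the 90-99 percent range
}

def burden_met(credence: float, standard: str) -> bool:
    """Return True if a fact-finder's credence clears the (rough) threshold."""
    return credence > STANDARDS[standard]

print(burden_met(0.60, "preponderance of the evidence"))  # True
print(burden_met(0.60, "clear and convincing evidence"))  # False
print(burden_met(0.80, "beyond a reasonable doubt"))      # False
```

On this toy picture, evidence that leaves a fact-finder 60 percent convinced clears the preponderance standard but not the clear-and-convincing one—the gap at issue in the Title IX controversy—just as, in the Simpson scenario, evidence strong enough for civil liability fell short of proof beyond a reasonable doubt.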

The difference between the two standards of proof may seem minor. And the whole issue may seem even more minor still. Major policy disputes are rarely about competing conceptions of the burden of proof. But this controversy was an exception. Fearing that the revised policy would encourage colleges and universities to use the higher “clear and convincing” standard, victim advocacy groups objected, arguing that the higher standard would allow some actual perpetrators to escape punishment—perpetrators who could be found responsible by a preponderance of the evidence, but not by clear and convincing evidence, especially given the frequency with which there is no physical evidence and no witnesses other than the accuser and the accused.

In response, others were worried about fairness to the accused and the “due process” rights of those charged with very serious offenses. These individuals and groups argued that under a mere preponderance of the evidence standard, a large number of accused students and faculty would be found culpable even though they had not done what they were accused of doing. A preponderance of the evidence standard is, after all, compatible with a 49 percent chance of mistake.

Both groups were right. Compared to a “preponderance of the evidence” standard, the higher “clear and convincing” standard increases the number of likely guilty individuals who are likely to escape punishment. And compared to a “clear and convincing” standard, the lower “preponderance of the evidence” standard increases the likely number of individuals who are not guilty but who will nevertheless wind up being punished for something that they probably did not do.

In the context of the criminal law, this trade-off has been known for centuries. Famously, William Blackstone observed in the eighteenth century that “it is better that ten guilty persons escape, than that one innocent suffer.”16 Others have used various other ratios to make much the same point.17 And that point is that any imperfect decision procedure will make mistakes. Statisticians, by convention, label these the Type I and Type II errors—the Type I error being the erroneous rejection of the hypothesis being tested, and the Type II error being the erroneous failure to reject it. More commonly, people talk about false positives and false negatives. In the context of punishment, the false positive is punishing the person who should not be punished—typically someone who is innocent. And the false negative is not punishing someone who should be punished—the guilty. As Blackstone recognized, the traditional view under the common law is that personal liberty (or life) is so important that we should treat the false positives as more serious errors than the false negatives, and design our procedures, systems, and institutions accordingly.
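One common decision-theoretic gloss on Blackstone’s ratio—offered here only as an illustration, not as a calculation Blackstone himself ever performed—treats the two errors as having unequal costs and asks how probable guilt must be before the expected cost of convicting is no greater than the expected cost of acquitting. If a false conviction is assumed to be ten times as costly as a false acquittal:

```latex
% p = probability of guilt; C_{fp} = cost of convicting the innocent;
% C_{fn} = cost of acquitting the guilty (the 10:1 ratio is an assumption).
(1-p)\,C_{fp} \;\le\; p\,C_{fn}
\quad\Longrightarrow\quad
p \;\ge\; \frac{C_{fp}}{C_{fp}+C_{fn}} \;=\; \frac{10}{10+1} \;\approx\; 0.91 .
```

On that stylized reading, a ten-to-one ratio lands in the same neighborhood as the 90 percent gloss on proof beyond a reasonable doubt noted earlier in this chapter.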

Even in the criminal law, however, this preference for avoiding the false positive is not absolute. If it were, we could minimize the incidence of false positives—false punishments—by punishing no one. But we do not do that, and thus we accept that the system will make mistakes of both kinds. Accordingly, society recognizes that even a Blackstone-type skewed ratio will accept some false positives—mistaken convictions—as the price to be paid for punishing many who are in fact guilty.18

In the criminal law, numerous procedural mechanisms and protections embody the Blackstonian perspective. Most relevant here is the requirement that the prosecution prove its case beyond a reasonable doubt—meaning that, as the Simpson scenario illustrates, some number of actually guilty people will likely escape the grasp of the criminal law. Other aspects of American criminal procedure similarly embody the preference for false acquittals over false convictions, if false verdicts there must be. The prohibition on double jeopardy in the Fifth Amendment and the requirement of a unanimous jury verdict to convict are among the most prominent examples.19

So let us return to college and university disciplinary proceedings based on allegations of rape or other forms of sexual assault. Advocates for the “clear and convincing” standard, some of whom we suspect would prefer a “beyond a reasonable doubt” standard, insist that a finding of guilt in a campus disciplinary proceeding on a charge of sexual assault, even though it might not produce imprisonment, has such disastrous future consequences for anyone found guilty that it ought to be treated as equivalent to a criminal conviction, and thus be subject to the same burden of proof as that of the criminal law. And advocates for the lower “preponderance of the evidence” approach emphasize the difference between university sanctions, even expulsion, and actual criminal penalties. The sanctions available to the criminal law, after all, include imprisonment, which universities cannot impose; loss of the right to vote, which still exists in some states for those convicted of felonies but is again beyond the power of a college or university to prescribe; and an official criminal record, which is more difficult to conceal than a university disciplinary sanction.

The case for the lower standard, akin to that of a civil lawsuit, is typically supported by one of two arguments, one not very good and the other somewhat stronger. The weaker argument is that a preponderance of the evidence standard minimizes the number of errors—that is, maximizes accuracy. That is true, but the reason the maximizing-accuracy argument did not persuade Blackstone and countless others is that in most contexts we care about more than the raw number of mistakes. We know, and Blackstone knew, that some mistakes are worse than others, and any rational decision theory will take this into account in settling on the appropriate procedures.

The stronger argument acknowledges that not all errors are equivalent, and that the goal cannot simply be minimizing the number of errors. Nonetheless, especially in the context of disciplinary proceedings at educational institutions, this argument insists that the consequences—the costs—of an erroneous acquittal may be even greater than in the criminal law. For Blackstone and others of his time, the consequence of an error of letting a guilty person escape punishment was primarily the absence of deserved retribution. It is possible that Blackstone was not especially worried about the further crimes a wrongfully acquitted person would commit, or about the people who would be injured by those further crimes, even though there is an argument that he should have been, and that those who adopt the Blackstonian perspective should be now.20 But if we return to the present and to the university campus, those who advocate a lower burden of proof argue that students and faculty who are erroneously acquitted in a nonpublic proceeding, especially of sexual assault, will continue to be a danger to others in the same closed and relatively small community. As a result, so the argument goes, the harms of erroneous acquittal include not only the harms of failing to punish but also the actual or potential harms to other members of the very community that the university is committed to protecting. And this, it is said, is as much of a harm as a false conviction, meaning that we should treat the false negatives as being as problematic as the false positives, with the implication that the “balanced” preponderance of the evidence standard should be employed.

This is not the place to resolve this debate, especially because the debate will likely be “resolved” for the time being in favor of the lower burden of proof as a result of the 2020 presidential election and the change in staffing of the Department of Education that the change of administration has produced.21 This political reality does not resolve the underlying normative issue, but that is not the goal here. Rather, the point is to illustrate that the normative issue is not only about evidence. It is about a conflict of substantive values, a conflict being played out on the field of evidence and procedure. Setting a burden of proof is inescapably an exercise in determining what, substantively, is at stake—and this determination is not based on principles of evidence alone.

Understanding the relationship between burdens of proof and substantive values also helps us avoid the common error of assuming that the legal system’s burden-of-proof standards should be used by other fact-determining or adjudicative systems. The Title IX controversy is one example, but there are many others. Consider the question of the burden of proof in the US Senate when it conducts an impeachment trial.22 The prevailing view is that the senators should each determine their own burden of proof, but this hardly answers the question of how those senators, individually, should make that decision. And that requires consideration of the purposes of an impeachment trial, the consequences of the different verdicts, and the consequences should those verdicts be mistaken—purposes and consequences that are different from those in an ordinary criminal trial. In the most recent trial of an impeachment, however, the February 2021 trial of the second impeachment of Donald Trump, the proceedings conjoined the issue of the factual burden of proof with the constitutional question whether a president (or other official) could be impeached and tried after leaving office, with the conjunction of the two issues making it especially difficult to discern what burden of proof the individual senators actually employed.

The need to determine a burden of proof is pervasive even beyond these prominent instances. What burden of proof, for example, should be used by an adjudicative body dealing with accusations such as cheating in an international bridge tournament? In a case well known to bridge players, visual evidence supporting the accusation was inconclusive, and an analysis of the bids and plays made by the accused players, and offered in opposing the accusation, also was inconclusive. What standard should the adjudicative body then use?23 Moving even further away from anything resembling a trial in a courtroom, what burden of proof should teachers apply when determining a question of classroom misconduct that was not observed by the teacher? What of the professor who suspects a student of plagiarism? What of the purchaser of a used car who suspects that the dealer rolled back the odometer? And what of the baseball umpire or basketball referee who is simply unsure of the right call, but nevertheless must decide at that very moment? Or, these days, what of the official who views the video and is empowered to reverse the call of the on-field or on-court umpire or referee?24 In these instances, and countless more, a burden of proof is at work, even if it is not made explicit. And setting the burden of proof, even if not done explicitly or deliberately, will unavoidably be based on an assessment of the purpose of the decision, the deeper values implicated, and the consequences that flow from mistakes. And for this, more than two hundred years later, Blackstone is still our guide.

Slightly Guilty?

In the 1967 movie The Producers, a satirical comedy directed by Mel Brooks about a theatrical scam, the criminal trial of the scammers is punctuated by a jury verdict of “incredibly guilty.” As should be obvious, “incredibly guilty” is not one of the options given to real jurors in real trials. Still, the idea of being “incredibly guilty” suggests that there can be degrees of guilt. And if there can be degrees of guilt, and if someone can be incredibly guilty, or very guilty, or really, really guilty, then what lies at the opposite end of the spectrum? One possibility is that perhaps there are people who are only slightly guilty.

The Producers is not clear about whether “incredibly guilty” is a reference to the measure of the wrongness of the act for which the defendants were convicted, or instead to the strength of the evidence of their guilt. The two are different. There can be overwhelming evidence of minor offenses such as littering and jaywalking and weak evidence of major crimes such as serial murder and rape. But here I focus on the latter—the weight or strength of the evidence—in thinking about the possibility of being slightly guilty.

In the previous section we contrasted the legal system’s idea of proof by a preponderance of the evidence with its higher standards of proof, in particular proof by clear and convincing evidence and proof beyond a reasonable doubt. But that is only half the story. Literally. If these standards describe a range of something like 50.1 percent to 99.9 percent, then, putting aside an exact 50-50 probability, what about the range from zero (or perhaps 1 percent) to something like 49.9 percent? If we are thinking about punishment in the strict sense, then ignoring the lower range makes perfect sense. After all, if there is a 40 percent chance that someone committed some crime, then there is a 60 percent chance that they did not, and we should not imprison someone who more likely than not did not do what we would be sending them to prison for.

But not so fast. What if we are 40 percent sure that the babysitter is a pedophile? Or 20 percent? Or 10 percent? What if the evidence shows that there is a 10 percent chance that the heart surgeon who is to operate on me tomorrow cheated in medical school, perhaps by having someone take their surgery exam for them? And should observant Jews patronize the kosher butcher when they believe there is a 20 percent probability that the butcher uses pork in the allegedly kosher frankfurters?

All these possibilities follow naturally from recognizing the uncertainty of evidence and the variability of burdens of proof. And these possibilities follow as well from the way in which, as Blackstone stressed, allocating the burden of proof, and by how much, flows from an assessment of the comparative harm and frequency of the errors of false positives and false negatives. So suppose we flip the Blackstonian ratio. What if we were to believe that letting guilty people go free is really horrible or extremely dangerous, and thus that it is better that ten innocents be punished than that one guilty person escape? This scenario seems bizarre, but it turns out to represent a plausible approach in various non-courtroom contexts. Consider the physician deciding whether to prescribe antibiotics for the patient who, to reprise an earlier example, has a ring-shaped redness on his arm and reports that he has been raking leaves. The patient displays no other indications—evidence—of Lyme disease. On this evidence, it is possible that the patient has Lyme disease, but it is probable that he does not. Still, failure to treat for Lyme disease at an early stage can have serious consequences. True, overprescribing antibiotics is, over time, good for neither the patient nor society, because overprescribing can foster the emergence of antibiotic-resistant strains of bacteria. Also, some people have allergic reactions to antibiotics. Nevertheless, the negative consequences of prescribing unneeded antibiotics in the normal case are minimal, whereas the consequences of not prescribing needed antibiotics are potentially catastrophic. Under such circumstances, it may be better that ten “innocent” cases (no Lyme disease) be “punished” (treated) than that one guilty case (actual Lyme disease) “escape” treatment. Accordingly, it may be wise to treat for a disease even when the evidence suggests that its likelihood is far below 50 percent.
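The arithmetic behind that flipped ratio can be made explicit. The numbers in the sketch below are invented solely for illustration—neither the assumed 10 percent probability of Lyme disease nor the relative costs are clinical estimates—but they show how treating an improbable condition can still be the choice that minimizes expected harm.

```python
# Illustrative expected-cost comparison for the Lyme disease example.
# All numbers are hypothetical, chosen only to show the structure of the
# decision; they are not medical data.
p_disease = 0.10                 # assumed probability the patient has Lyme disease
cost_untreated_disease = 100.0   # serious consequences of missing early treatment
cost_unneeded_antibiotics = 2.0  # modest downside of an unnecessary prescription

expected_cost_if_treat = (1 - p_disease) * cost_unneeded_antibiotics
expected_cost_if_wait = p_disease * cost_untreated_disease

print(round(expected_cost_if_treat, 2))  # 1.8
print(round(expected_cost_if_wait, 2))   # 10.0 -> treating has the lower expected cost
```

On these assumed numbers, treating is the better bet even though the patient probably does not have the disease—the non-courtroom analogue of preferring that ten “innocent” cases be “punished” rather than that one “guilty” case escape.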

The same holds true when the idea of punishment is less metaphorical than in the Lyme disease example. For instance, under what circumstances should we deprive people of their liberty—a serious consequence, even if it is not described as punishment—because of a risk that those people will endanger the health and safety of others? The obvious example in the era of Covid-19 is a quarantine for people with contagious conditions. Under English law, for instance, it is permissible to restrict the liberty of people with cholera, leprosy, malaria, meningitis, tuberculosis, and typhoid fever.25 The practice of quarantining people with contagious conditions raises the issue of when, if ever, it is justifiable to deprive people of their liberty when their probability of having a disease or condition that can cause great harm is less than .50. Blackstone is relevant but tells only part of the story. Is it better that ten people with a highly contagious serious disease—meningitis, for example—be allowed to mingle in the population of a large city than that one person without the disease be restricted? The answer is not so clear.26

There are similar issues with whether and when to release on parole persons whose risk of recidivism is small but still greater than that for the population at large. If an adult male has been convicted of a sexual offense against a child, incarcerated, and then released, the likelihood of a further sexual offense, depending on the nature of the offense and the relevant time period, is somewhere between 6 and 35 percent.27 And although these and other data are contested, it is clear that the recidivism rate is well below .50 and well above the rate of offending for a randomly selected adult male member of the population. The question, therefore, is what burden of proof should be used in assessing the evidence that someone is likely to reoffend.

Far more controversial and far less defensible are questions such as how much evidence is necessary to impose restrictions on the unconvicted members of a class whose occupation or avocation—priest, scout leader, Little League coach—statistically indicates a likelihood of offending greater than for the population at large, but far less than .50? This is not the place to answer that question, for it will depend on the degree of the danger, the probability of the danger, the stringency of the restriction, and the moral issues surrounding questions of when people must be treated as individuals and when they can be treated as members of a class.28 But although class-based detention is plainly unacceptable on moral and legal grounds, other forms of restriction present the same question. Being watched carefully, for example, is far less restrictive than being incarcerated, but it is not nothing. And because even being watched carefully is not nothing, a potential “watcher” is implicitly adopting a burden of proof in evaluating the strength of the class-based evidence in determining whether to treat a member of that class differently from how any other person would be treated. Of course this example presents a situation in which the moral stakes are very high. But consider the opposite end of the moral scale: Should the purchaser of a notoriously unreliable make of car—a Yugo, for example, or a Trabant—engage in more careful inspection of that car than for an equivalently aged Volvo or Subaru? If the answer is yes, then the history of the make of car has influenced the burden of proof, with a twelve-year-old Yugo being presumed unreliable and a twelve-year-old Subaru being presumed reliable.

The tendentious example of child molestation is merely one aspect of a large and rich literature on preventive justice and preventive detention.29 Depending on the consequences of non-prevention, and thus on the costs of mistaken non-preventions, and also of course depending on the costs of mistaken preventions, there may be circumstances in which the burden of proof to justify an act of prevention may differ from the burdens of proof normally applied in both criminal and civil trials. This is not the place to describe the literature on preventive justice and what that literature says about burdens of proof, and it is certainly not the place to take a position in the debates about preventive justice. But even the lower-temperature issue of whether to buy a used Yugo or a used Subaru suggests that supposing that the range of burdens of proof has a lower bound of preponderance of the evidence—.51—may be too quick. At least in some contexts, slightly guilty may be guilty enough.

Better to Be Safe than Sorry? The Precautionary Principle

We often use evidence to reach a conclusion about a specific fact or specific act. Did Susan rob the First National Bank on September 30? Was Thomas Jefferson the biological father of Sally Hemings’s son Eston Hemings? How many votes for president did Donald Trump and Joe Biden each receive in the state of Michigan in November 2020? But just as often we use evidence to support or challenge a general hypothesis about a category of acts or events, or about some larger phenomenon. What do we (or scientists) mean when we or they say that cigarette smoking causes cancer? And what is the evidence for that conclusion? Are Volvos reliable? How do we know? Does the use of aerosol cans damage the ozone layer? Does damage to the ozone layer cause climate change? Does the increased legal availability of guns increase the incidence of unlawful gun-produced harm?30 Does playing violent interactive video games cause an increase in the aggressive tendencies of teenage males who play those games? Does an increase in aggressive tendencies among teenage males cause teenage males to commit actual acts of violence?31 Does the amount of alcohol consumption that would ordinarily have no detrimental effects create problems when consumed by pregnant women?

In seeking to answer such questions, we typically rely on evidence that leads to probabilistic assessments, including probabilistic assessments of causation. No one claims that every cigarette smoker gets lung cancer, and no one claims that every case of lung cancer is caused by cigarette smoking. And no one who says that Subarus are reliable denies that there are unreliable Subarus, just as there are reliable cars of other makes. Instead, the claim, akin to the understanding of evidence and its relevance discussed in Chapter 2, is that something is the cause of some effect if it raises the probability of the effect, just as smoking raises the probability of lung cancer for the smoker, just as eating spicy Mexican food raises the probability of heartburn, and just as a car being a Subaru raises the probability of its being reliable.32

But what is the evidence for these conclusions, and how strong must it be in order to justify a particular conclusion about causation? More importantly, how strong must the evidence of causation be to justify some particular policy intervention based on that conclusion? At this point, questions of burden of proof again become crucial. How strong must the evidence be of a dangerous side effect of a prescription drug before the Food and Drug Administration prohibits its further distribution? How strong must the evidence be that there are sharks in the vicinity before a public beach is closed to swimmers? What strength of evidence is necessary to require people to wear seatbelts or motorcycle helmets? And what about restrictions on the use of recreational drugs?

In the context of many potential but still uncertain dangers to the environment, or potential but uncertain risks to health, a common approach to such questions is what is often, especially in the industrial democracies of western Europe, called the precautionary principle.33 The basic idea is straightforward: When there is evidence that some substance or practice presents a possible and plausible (even if far from certain) risk to the environment or human health, the practice or substance should not be permitted.

The precautionary principle is controversial.34 And it is controversial because it raises the Blackstonian ratio to what some believe to be unrealistic and dangerous levels. Unrealistic because it strikes some as exaggerating minuscule possibilities beyond reason. And dangerous, to the same critics, because it ignores the benefits that may come from slightly harmful products, substances, and technologies—where “slightly harmful” is a measure of the likelihood, and not the gravity, of a harm. Critics say that the precautionary principle looks at only one side of the cost–benefit equation, and unwisely intensifies and misrepresents low-probability dangers at the expense of higher-probability benefits.

Defenders of the precautionary principle—generally, and also recently in the context of various responses to the Covid-19 pandemic—respond that some of the low-probability possibilities are so catastrophic that the expected harm is still great.35 Even evidence of a small probability of a great danger may still represent a very large expected danger when we do the calculation of probability multiplied by consequences that any expected value calculation requires. And although the debates about the precautionary principle involve scientific estimates whose accuracy is contested and are beyond the scope of this book, the issue nevertheless stands as an example of how, in some contexts, quite low burdens of proof may well justify actions taken on their basis. The precautionary principle is based on the idea that not very much evidence, or not very strong evidence, might be sufficient to justify restrictions when the improbable but possible consequences of what the evidence indicates are sufficiently grave.
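The defenders’ argument is, at bottom, the arithmetic of expected value: the expected harm is the probability of the harm multiplied by its magnitude, so a small probability attached to a catastrophic outcome can outweigh a large probability attached to a modest one. With numbers invented purely for illustration:

```latex
% Expected harm = probability of harm x magnitude of harm (illustrative units only).
\mathbb{E}[\text{harm}] = p \times C ,
\qquad
0.01 \times 1{,}000{,}000 = 10{,}000
\;>\;
0.50 \times 1{,}000 = 500 .
```

Whether any particular environmental or public-health risk actually has these properties is, of course, exactly what the critics and defenders of the precautionary principle dispute.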

The Tyranny of Adjectives

The claims of former president Trump and his allies that there was widespread fraud in the 2020 presidential election were remarkable for many reasons. One of those reasons is that not only many commentators, and not only many political opponents of the president, but also many state and federal judges and election officials from both political parties concluded that the allegations were made with no evidence at all.36 Nebraska senator Ben Sasse’s December 30, 2020, observation in a long Facebook post—“If you make big claims, you had better have the evidence”—pretty much sums it up.

The rejection of Trump’s fraud claims both in courts of law and in the court of public opinion was often justified by the fact that the allegations of fraud were based on a complete lack of evidence. But rarely is there so little evidence for a conclusion, and rarely are the objections to the evidentiary support for assertions by public authorities and officials so unqualified. More often, objectors to some conclusion insist that the conclusion has been reached without “hard evidence,” “concrete evidence,” “conclusive evidence,” “solid evidence,” or “definitive evidence.” And the list goes on. Moreover, when the objections are characterized as a failure of “proof,” the implication is that “proof” is stronger than mere evidence, and that whatever evidence there might be does not rise to the level of proof.37

The claim that there is a lack of “conclusive evidence” or “definitive proof” for some conclusion typically implies—or concedes—that there is at least some evidence supporting the conclusion. You would not object to there being no conclusive evidence if you could object that there was no evidence at all. The objection that the evidence is not conclusive or definitive, and so on, is ordinarily a rhetorical device used to smuggle a high burden of proof into the evaluation of a contested evidentiary claim. And often those contested evidentiary claims are not claims about individual acts that may or may not have occurred, but instead are claims about the state of the evidence (usually scientific) for some general conclusion (often a general conclusion about causation). If we go back several decades, for example, when the debates about smoking causing lung cancer or heart disease were more contested than they are now, the tobacco companies often argued—in the face of some evidence that smoking caused lung cancer and heart disease—that the evidence was not conclusive, not definitive, not solid.38 More recently, the vaping industry has asserted that there is “no conclusive evidence” that vaping leads to smoking.39 Similarly, the beer, wine, and spirits industry claims there is “no conclusive evidence” of a link between moderate drinking and birth defects or fetal alcohol syndrome.40 And the website nintendoenthusiast.com insists that there is “no definitive proof” that video game use causes a decrease in time spent on employment-related tasks.41

Claims that some evidence is not sufficiently conclusive, definitive, persuasive, hard, concrete, or solid are claims that implicitly call for a specific burden of proof, typically one that is contested. But identifying this rhetorical phenomenon does not resolve what the burden of proof should be for the kinds of issues that arouse this kind of adjectival tyranny. And here again there is no avoiding the question about the relationship between the burden of proof and the consequences of finding that the burden of proof has been satisfied. The Blackstonian ratio supports the requirement of proof beyond a reasonable doubt in the criminal law because being imprisoned (or executed) is a pretty awful thing, making it important to get things right within the limits of reason and practicality. But it is also important to get things right in making policy-relevant attributions of causation, even if the desired ratio of false negatives to false positives need not be the same as it is in the criminal justice system. When Supreme Court Justices Scalia and Breyer argued about whether there was evidence that playing interactive video games with violent content (probabilistically) caused actual violent aggression in teenage males, what was at stake was a California law restricting such games—a law that sought to restrict activity protected by the First Amendment.42 And thus the burden of proof to justify the regulation, and therefore the burden of proof for evaluating the causal claims, was heightened in a way that it would not have been had the question been about the weight of the evidence necessary to regulate something like traffic or mining, neither of which is protected by the First Amendment or any other constitutional provision.

The video game example involves constitutional rights, but the same considerations apply whenever there is reason to impose a special burden on one side of an evidentiary dispute about causation or about the magnitude of some danger. But whether it be the degree of causation, the severity of danger, or any other policy-relevant question for which evidence is important, the larger lesson is that the burden of proof depends on what is at stake. Accordingly, the burden of proof is different when the stakes are different, even when the evidentiary question is the same. Parents deciding whether to allow their children to play violent video games need not be bound to the same heightened burden of proof as the state is when it is deciding whether to restrict the same activity, just as the local animal shelter need not be convinced that a suspected animal abuser is guilty beyond a reasonable doubt before refusing to hire that person to take care of the cats and dogs in its care.

It is worthwhile pausing over the previous point. Perhaps because of newspapers and television, which not surprisingly find the criminal law more interesting than civil lawsuits or employment decisions, the standards of the criminal law—particularly the ideas of a presumption of innocence and the necessity of proof beyond a reasonable doubt—are often assumed to apply whenever someone is accused of wrongdoing, even if the accusation takes place outside of the legal system and even if the conclusion that there was wrongdoing produces sanctions typically of lesser consequence than those administered by the legal system. But it should now be clear that the easy transposition of the standards of the criminal justice system to the full range of accusations of misconduct is too easy. This is true even when misconduct or wrongdoing is not the question. Perhaps, as has often been argued, the precautionary principle is misguided, often ignoring the benefits of risky technologies and imposing a conservative (in the nonpolitical sense of that term) bias on innovation. But for our purposes, the basic lesson of the precautionary principle is the lesson about the pervasiveness of questions about the burden of proof, a pervasiveness that makes the burden of proof relevant to any determination of evidence in the uncertain world in which we find ourselves trapped.

Once we understand that different decision-making environments may apply different burdens of proof to the same factual issue, as with the different burdens of proof in O. J. Simpson’s civil and criminal trials, we can recognize the common mistake of excess deference to the legal system. To take an unfortunately common example, suppose that some professional athlete has been accused of domestic violence or sexual misconduct. And suppose also that the accusation is taken seriously enough by law enforcement to warrant formal investigation and, sometimes, prosecution. But when the team or its owner or its coach or manager is asked what the team is going to do about the issue, a frequent response is that the team will decide what to do after the legal system has made its decision. Admittedly, there is some risk that early disciplinary action by the team, if sufficiently publicized, will taint a subsequent criminal trial. But there is also the risk that acquittal or non-prosecution in the criminal process will relieve the team of its responsibilities to decide who it wants to have on its team. People only 70 percent likely to have assaulted their spouses should not be imprisoned, but it is far from clear that people 70 percent likely to have assaulted their spouses should be retained as shortstops or quarterbacks.

A Long Footnote on Statistical Significance

Closely related to the question of burden of proof is the question, recently and perhaps surprisingly the subject of controversy, of the statistical significance of experimental or other empirical research.43 Just as the evidence emerging from various studies is too often described as not being conclusive or definitive, experimental and other scientific conclusions often are discounted as evidence because they lack statistical significance. But statistical significance is just a number—and an artificial threshold. The artificial threshold may serve valuable purposes in holding scientific studies to a high standard of reliability, but it often has the troubling side effect, in the same manner as inconclusive evidence, of discounting as evidence that which has evidentiary value even if it does not meet the high threshold.

In the natural and social sciences, statistical significance is a measure of the probability that some experimental result, or an even more extreme result, would have been produced even if the null hypothesis—the hypothesis that there actually is no effect—were true. In other words, what is the probability that what appears to be a positive result—A causes B, for example—would have appeared even if A did not cause B at all, or if there were no relationship between A and B? This likelihood is conventionally described in terms of a p-value, where the p-value is the probability that results at least as strong as those observed—results seeming to warrant rejection of the “null hypothesis” that there is no connection between the examined variables—would have been produced by chance alone. Suppose we are examining the hypothesis that an observed ring-shaped mark was caused by Lyme disease. The null hypothesis would be that the ring-shaped mark has no connection to Lyme disease. The p-value for a test that appeared to show such a connection would thus be the probability that a result at least that strong would have been produced randomly—by chance—if the null hypothesis were true. In recent years, a p-value of .05 or less has conventionally been taken in most experimental disciplines to be the threshold for statistical significance. A p-value greater than .05—a greater than 5 percent probability that the same results would have arisen by chance—has been understood to mean that the results are not statistically significant.
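For readers who want to see the definition at work, here is a minimal simulation in Python. The example is entirely made up: suppose 14 of 20 patients with the ring-shaped mark turn out to have Lyme disease, and the null hypothesis is that a mark-bearing patient is no more likely than not (50 percent) to have the disease. The p-value is the probability of a result at least that extreme if the null hypothesis were true; the exact binomial figure is about .058, and the simulation merely approximates it.

```python
import random

# Monte Carlo estimate of a one-sided p-value for a made-up example:
# 14 "positives" out of 20, tested against a null hypothesis of a 50 percent rate.
observed_positives = 14
n = 20
null_rate = 0.5
trials = 200_000

at_least_as_extreme = 0
for _ in range(trials):
    # Simulate 20 patients under the null hypothesis.
    positives = sum(1 for _ in range(n) if random.random() < null_rate)
    if positives >= observed_positives:
        at_least_as_extreme += 1

p_value = at_least_as_extreme / trials
print(round(p_value, 3))  # roughly 0.06: suggestive, but short of the .05 threshold
```

A result like this would conventionally be labeled “not significant,” even though the evidence plainly points, if weakly, toward a connection—which is precisely the worry taken up below.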

Because the .05 threshold has the effect of branding large numbers of experimental outcomes as “non-significant,” which most lay people would interpret as “insignificant,” it has recently become controversial. Some researchers, reacting to reports of scientific conclusions that could not be replicated, have urged that the threshold for statistical significance be made even more conservative, perhaps by reducing it even to .005—a 1 in 200 chance of the same results being produced by chance. But much of the recent attention has come from the opposite direction, from claims that the .05 convention sets too demanding a bar for what may count as evidence. And there have been related pleas simply to abandon the practice of describing results in terms of statistical significance at all.44

The impetus for this latter claim parallels the worry about the effects of demanding “conclusive” or “definitive” evidence. Just as inconclusive or even weak evidence may still be evidence, and may still be useful evidence for some purposes, so too might conclusions—rejections of the null hypothesis—that are more than 5 percent likely to have been produced by chance still be valuable, depending on what follows from those conclusions. Suppose that a trial of an experimental drug indicated that it could cure a previously incurable fatal disease, but that the p-value of that trial, given the sample size, was .20. If you suffered from that disease, and if no other treatments were available, and if no other trials were in the offing, would you want to use the drug, even recognizing that there was a 20 percent chance that the rejection of its ineffectiveness was purely a matter of luck? I suspect that most people would say yes, concluding that an 80 percent likelihood that the drug’s apparent effectiveness was genuine is still good enough, at least if there were no other alternatives.

Plainly there is a difference between whether you would take the drug and the question whether the study should be published in a reputable journal, be eligible for grant funding, be approved by federal authorities for prescription and sale, or count for tenure for the lead researcher at a university. And that is the point. By setting a statistical significance threshold at what is too often presented as a context-independent .05, the scientific community has presumed that a threshold for evidence being good enough for publication, grant funding, or tenure is the applicable threshold for all purposes. Some of the objectors to the emphasis on statistical significance wisely say that it would be better simply to accompany any experimental finding with a report about the likelihood of the results being produced by chance, and discard the notion—traditionally packaged in the language of statistical significance—that what is not good enough for certain undeniably legitimate purposes within the scientific community is not good enough for anything.

The theme that runs through the ideas of being slightly guilty, the precautionary principle, the concern about the misuse and overuse of the term “conclusive,” and, now, of statistical significance, is that evidence comes in degrees. Stronger evidence is better than weaker evidence, but weaker evidence is still evidence. And in some contexts, and for some purposes, weaker evidence may be good enough.