Chapter 6: Testing Testimony

In attempting to guard against lies and lying, the legal system has long relied, in part, on the oath that witnesses are required to take before testifying. This familiar oath, traditionally sworn on a Bible, purports to oblige the oath-taker to tell the truth, the whole truth, and nothing but the truth. Obviously, the oath is an attempt to prevent lying, and in Chapter 7 we will look at lies, lying, and liars in more detail. An initial question, however, is whether oath-taking provides the kind of assurance that the legal system intends it to provide. And if it does, which as we will see is hardly obvious, does the same apply across the entire range of testimonial statements? We need to look, therefore, both at whether swearing-enhanced or oath-enhanced statements have greater value as legal evidence than those without such purported credibility enhancements, and also at whether the same conclusions (or doubts) apply to oaths and swearings outside of the courtroom.

If this were a book of theology, there would be much to say about the value of the oath. Some of that would be about someone’s special obligation to God to tell the truth after having sworn unto God to do so. And some would be about what God does to people who lie when they have sworn not to. But this is not a book of theology. Nevertheless, there remains the sociological and nontheological question of the (empirical) extent to which people do or did believe in the religious character and power of the oath. Sociologically and not theologically, we know that many people do believe that they will suffer in the afterlife, if not sooner, should they tell a lie after swearing to tell the truth. Of course, I have no way of knowing whether this widespread belief is correct and no way of knowing whether post-death punishment is what actually happens to oath-taking liars. Nor do most readers of this book. Still, some people believe it now, and, importantly, many more people believed it in the past, when the formal oath was first developed. Capitalizing on this belief in divine retribution, the legal system developed the oath as a way of increasing the likelihood of honesty and perhaps even accuracy in trial testimony.1

Although the percentage of people who believe that lying under oath will sentence them to eternal agony in the fires of hell has decreased, the oath persists. And it persists not only for reasons of tradition. It persists in part because in many contexts a lie under oath creates a risk of serious legal consequences. It is true that prosecutions for perjury are rare, largely because criminal prosecutions for perjury require the prosecutor to prove beyond a reasonable doubt that the witness knew their testimony was false.2 But although perjury prosecutions are difficult and infrequent, they are not unheard of. Moreover, they are often highly publicized, as with former president Donald Trump’s friend Roger Stone, whom the president pardoned after Stone was sentenced to forty months’ imprisonment for lying under oath to Congress.3 And equally well publicized are related crimes such as lying to federal officials, which is what got Martha Stewart sent to prison, and noncriminal sanctions for lying under oath, which is what got former president Bill Clinton impeached (but not convicted) and suspended from the practice of law for five years.4 And if even these sanctions are uncommon, we know from the research on heuristics and biases, and as the purveyors of airline crash insurance know as well, that people tend to exaggerate the frequency of unlikely catastrophic events.5 Consequently, the threat of a perjury prosecution or other official sanction may have a deterrent effect that is greater than the statistical chances of prosecution and conviction would indicate. To the extent that this is so, witnesses in court, reminded that they are under oath, might plausibly tend to tell the truth more than their own moral compass or even self-interest would otherwise suggest.

The phrase “might plausibly” reflects the fact that there is only limited research on the precisely specified question whether oath-taking has a causal effect on the likelihood that witnesses will testify honestly.6 Plainly there are people who will tell the truth even absent an oath. Or at least we hope so. And equally plainly there are people who will lie even when under oath. That’s too bad, but there may not be much that can be done about it. But that leaves a third category, probably the one of greatest importance when we are thinking about the oath, and likely smaller than either of the previous two categories. And this is the category of people who will tell the truth when sworn to do so, but would otherwise lie when it suited or benefited them.7 There is some indication from low-stakes experiments that explicit reminders of obligations to tell the truth (such as the Ten Commandments’ prohibition of lying), or explicit promises to tell the truth even if not under oath in a technical sense, have at least some ability to reduce the incidence of lying.8 But because there is not much other research on the question, and because research on the effects of reminders about the Ten Commandments appears not to distinguish the religious aspects of the Ten Commandments from nonreligious aspects of honesty reminders, we lack a substantial body of empirical evidence on the extent to which, if at all, oaths themselves are truth-promoting, whether in court or out.

Even apart from perjury prosecutions, and even apart from the risk of eternal damnation, the oath, it is said, serves as a reminder of the seriousness—or solemnity, as it is often put—of the proceedings, and thus of the importance of veracity in those proceedings. But although this purported virtue of the oath is often touted, as with the court that announced that “those who have been impressed with the moral, religious or legal significance of formally undertaking to tell the truth are more likely to do so,” there is hardly any research indicating whether formal oath-taking, as opposed to less formal reminders, does or does not serve this purpose.9

Whether the oath successfully increases the likelihood of honesty in the formal legal process is an important question, but the testimony we are considering more broadly is hardly limited to judicial proceedings. Nor are oaths. Variations on the oath are ubiquitous in everyday life. When someone says, “I swear to God, that’s what I saw,” they seem to be suggesting that however casual they might normally be about the truth, this statement is different. And so too with the large range of equivalents that also seem to have similar quasi-religious origins. “I swear on my mother’s grave.” “Swear to God, hope to die.” “Cross my heart.” Or, even more simply, “I swear.”

Similar assertions rely less on religious traditions than on conceptions of honor. “Do you give me your word?” Or, unprompted, “I give you my word on this.” “You can take my word for it.” “As an officer and a gentleman, I give my word.” And so on.10 Similarly, blanket honor codes, such as those in force at the military academies and some of our older colleges and universities, purport to impose an honor-enforced (and sanction-enforced) obligation not to “lie, cheat, or steal, or tolerate those who do.”11

Honor codes aside, assertion-specific oaths are in some sense curious. When testifiers add these kinds of self-endorsements to their statements, are we then expected to be more skeptical of statements made without them? Is making a statement shorn of “I swear to God” or its equivalents the same as crossing your fingers behind your back, and thus not to be relied on? Or have “I swear to God” and “You have my word” become not much more than throat-clearing, adding little, if anything, to the confidence that the hearer would otherwise have in the veracity of the statement? This last hypothesis seems the most likely, and it supports the conclusion that although it would be good to have some way of testing testimony before relying on it, the fact of the testifier giving or not giving an oath, formal or casual, is hardly likely to be a very effective test. Testimony is often good evidence, but it is doubtful that oaths, especially those made outside of formal legal settings, make it very much, if at all, better.

Perry Mason and the Art of Television Cross-Examination

Perry Mason, the fictional defense attorney first created in a series of Erle Stanley Gardner mystery novels and then the eponymous central figure in three different television series starting in the 1950s, was a master of cross-examination. In the typical episode, Mason’s client would have been wrongly accused of some horrific crime, usually murder. At the preliminary hearing, or at the trial, Mason would aggressively cross-examine one of the prosecution’s witnesses, and under this intense cross-examination the witness would confess that it was he, and not the defendant, who had committed the crime. And occasionally Mason’s cross-examination was so effective in pointing to the truth and away from the guilt of the defendant that some member of the spectators’ gallery, one who was not even a witness, would stand up in the middle of the trial and, wracked by guilt, blurt out that he (or, occasionally but rarely, she) had actually committed the crime.

All this was fiction. And not just because the stories were fictional. It was fiction because the image of Perry Mason verbally bludgeoning a witness into a confession painted a dramatically unrealistic picture of the nature and effectiveness of cross-examination in testing the truth of testimonial evidence. In reality, witnesses who lie while testifying continue to lie under cross-examination, and witnesses who are simply mistaken reiterate those mistakes when cross-examined. As anyone who has conducted a cross-examination knows, Gardner and the television script writers had the distinct advantage of being able to write the answers as well as the questions.12 But rarely are real witnesses so cooperative, and actual cross-examination is more commonly a mix of witness stubbornness, compounded uncertainty, and lawyer statements that are as often forms of testimony as they are genuine questions.

The alleged virtues of cross-examination as revealing the truth and exposing falsity received a ringing endorsement a century ago from John Henry Wigmore, at the time the leading scholar of the law of evidence in the United States and perhaps in the entire common-law world. “Cross-examination,” Wigmore announced, is “beyond any doubt the greatest legal engine ever invented for the discovery of truth.”13 It is not clear, however, that Wigmore’s statement would stand up to the very cross-examination he lauded, and so we should look more carefully at cross-examination as a truth-producing procedure.

In examining the role of cross-examination as an aid to evaluating the truth of testimonial statements, we need to focus, as Wigmore did, on genuine cross-examination. And in doing so, we should exclude the performances, sometimes misleadingly described as cross-examination, that we see with some regularity in congressional hearings. Whatever other purposes such spectacles may serve, haranguing a witness with some combination of questioner-supplied facts and adjective-laden accusations is hardly an “engine for the discovery of truth.”14 As anyone who has ever moderated a public event knows all too well, there is a difference, often ignored by questioners, between asking a question and making a stump speech, and the typical public legislative or administrative agency hearing contains a great deal of the latter and not much of the former. Indeed, much the same can be said about far too many press conferences, events in which multi-part questions by self-important reporters are followed by vacuous nonanswers from the official at the podium.15

When cross-examination operates at its best, it does not, to repeat, lead testifiers to recant their testimony, Perry Mason notwithstanding.16 And when it operates at its worst, as the sorry history of abusive cross-examination of rape victims shows, it can impede the search for truth, either by casting unjustified doubts on the testimony of witnesses who are telling the truth, or by discouraging such witnesses from even being willing to testify in the first place.17 Effective and non-abusive cross-examination, though, can elicit information that the testifier might have an interest in not disclosing. It can sometimes, especially by exposing inconsistencies, elicit information by which the receivers of testimony (jurors, prototypically, but many others in many other contexts) can evaluate the honesty and reliability, the credibility, of the testifier.18 It can reveal sources of bias, conflicts of interest, or simply some reason for believing that the testifier prefers one answer or outcome to another, a matter of some interest if one accepts what the research tells us about motivated reasoning, the topic of Chapter 13. It can also help the evaluator of testimony determine the basis for the testifier’s perception. And it can, by the use of follow-up questions, often supply valuable clarification of what may initially have appeared imprecise. If a factual assertion (testimony) is to be useful as evidence, and if and when there is reason to be skeptical about the reliability of such assertions, cross-examination in the broadest sense may supply a useful form of assessment. Such cross-examination need not resemble Perry Mason’s, or even real cross-examinations. It may instead be simply the process of taking an assertion as an opportunity to engage the asserter in further clarification, elaboration, or qualification. And thus, as the Internal Revenue Service was grudgingly compelled to acknowledge in its dispute with George M. Cohan, when cross-examination provides the opportunity to scrutinize testimony more carefully, testimony often can supply a useful form of evidence.

Calibrating Testimony

Many people rely on Tripadvisor, Yelp, and related internet resources to choose restaurants, hotels, contractors, and much else. And although the idea of relying on reviews is hardly new, what these services offer that traditional published reviews do not is not only the aggregation of multiple reviews, but also easy access to the reviewing history of each reviewer. Aficionados knew, of course, how to interpret a movie review by the legendary New Yorker film critic Pauline Kael or a theater review by the equally legendary Brooks Atkinson of the New York Times, just as they know how to evaluate a Times restaurant review by Pete Wells now or Ruth Reichl in the recent past. But Tripadvisor and its ilk make the practice far easier. And they do so by allowing the reader of the online reviews, with little more than a click or a keystroke, to inspect all of the past reviews of each reviewer.

When the reader of a review consults the reviewing history of the reviewer, the reader is given the ability to calibrate a particular review, just as we calibrate (or should) when we add pounds to the reading on a scale that we know reads low and subtract pounds from a reading on a scale that prior experience tells us reads high. So too with the hunters who aim lower than where the gunsight tells them to aim, knowing from past experience that the gunsight leads them to miss high. Before there was very much grading on a curve, and certainly before there were the mandatory curves now current at many colleges and universities, word-of-mouth—itself a form of evidence—told us who were the tough graders and who could provide an easy A. Letters of recommendation are treated similarly when we deliberate over appointments, admissions, hiring, and so on. If we receive multiple letters over time from the same recommenders, recommending different people, we learn that certain recommenders say over-the-top nice things about everyone, and we discount accordingly.19 And other recommenders are just the opposite, barely having much good to say even about those they recommend. And for those recommenders we inflate. All of this is calibration, and it is not just a question of inflating or deflating. When a restaurant reviewer on Tripadvisor is revealed to have a history of complaining about small portions or too much spice, I know how to evaluate the evaluator, and act accordingly. Similarly for the evaluations coming from evaluators who expect every restaurant to cater to their unreasonable demands. If the demands are described, which they usually are, I can determine which ones are unreasonable, even if an evaluator rarely sees it that way, and then proceed to discount or ignore the ratings of this particular reviewer.

The idea of calibration is applicable to the entire realm of evidence, but it is especially relevant to the use of testimony as evidence.20 As we will explore in Chapter 7, suppliers of testimony often have an interest in lying, and it would be useful to have some way of minimizing lying and identifying the liars. In Chapter 8 we will look at honest mistake, which is plainly less morally and legally culpable than lying but no less an impediment to treating testimony as reliable evidence. Yet even in the absence of lies or mistakes, testimony is subject to so many shadings, fudgings, hedgings, twistings, embellishments, inflations, deflations, and various other forms of distortion that calibration can be a valuable way for the hearer to evaluate the epistemic worth of what the speaker is saying or what the writer is writing.

One form of calibration that has an especially ugly history is calibration based on the nonindividualized attributes of the testifier. There are many such nonindividualized or group attributes, but the ones most justifiably notorious are those that are based on race, religion, ethnicity, national origin, and gender. Even apart from the exclusions that were based in one way or another on slave status, non-enslaved African Americans were often officially precluded from serving as witnesses in court, as were Indians and sometimes those of Chinese descent.21 Even when such official exclusions were not in force, informal exclusions of members of the same racial groups were common.22 Moreover, and especially relevant here, even in the absence of formal or informal exclusions we had (and still have) the prevalence of what Miranda Fricker has aptly referred to as “credibility deficits.”23 People tend to be believed or not believed based on various attributes they possess, even when such attributes have no value in predicting accuracy or honesty.24 And race has long been foremost among these spuriously predictive attributes.25

When it comes to gender, the history is different but no better. Unlike the situation in antiquity, unlike in England in the earliest years of the common law, and unlike in some systems of religious law, women have not generally been precluded from serving as witnesses in modern secular legal systems. Women have, however, long been assumed to be less credible than male witnesses, although there is no evidence to support the assumption.26 But relying on psychological surveys that would now be considered as methodologically laughable as they are morally offensive, John Henry Wigmore, in his Principles of Judicial Proof, first published in 1913 and republished with no change on this point in 1931, claimed that women are more likely than men “to confuse what they have really observed with what they have imagined or wished to occur,” and more likely to “fall below [men] in candor and honesty.”27

Of even greater and more lasting consequence is the fact that women reporting rape and other forms of sexual assault have been thought, again with no supporting evidence, to be more prone to exaggerating or fabricating accusations of sexual assault than other crime victims are to exaggerate or fabricate the crimes they report. As recently as 1975, the California standard (and thus official and mandatory) jury instructions included the so-called cautionary instruction, by which jurors in rape cases were to be told by the judge that “a charge such as that made against the defendant is easily made, and, once made, difficult to defend against, even if the person accused is innocent. Therefore, the law requires that you examine the testimony of the female person named in the information with caution.”28 And this instruction, with its roots in the seventeenth-century writing of the English judge Sir Matthew Hale, was based in large part on the widely believed fiction, just noted, that women were more likely to imagine nonexistent rapes than other people were to imagine other nonexistent crimes against them.29

Modern law has attempted to alleviate most of the formal and some of the informal ways in which statistically irrelevant group attributes have been used to discount the courtroom testimony of women and members of racial and other marginalized groups. But that is not to say that such barriers to the accurate assessment of testimony do not persist in legal and non-legal contexts. An entire philosophical program under the name of “epistemic injustice” is focused on addressing many of the problems just described, and many others similar to them.30 It is important, however, to distinguish the epistemic dimensions of epistemic injustice from those dimensions of testimony and credibility that are unjust but possibly not epistemically so. If there are attributes of classes of people that do predict honesty or dishonesty, or reliability or unreliability, using those attributes to attach an epistemic enhancement or to impose an epistemic discount would not be epistemically problematic even though, depending on the nature of the class, it might still be unjust for other reasons.31

Although I know of no racial, ethnic, religious, or gender classes that would fit the foregoing description, other types of classes might be different. The fact that there are laws against age discrimination in employment, for example, and properly so, does not mean that age-based memory impairment is illusory. Even for older adults with no signs of Alzheimer’s disease or other forms of dementia, normal aging is highly predictive of at least some memory loss.32 As a result, being more skeptical of testimonial recollections by people above a certain age (treating those testimonial recollections as weaker evidence, all other things being equal) is not epistemically irrational, even though it might be unjust (and thus avoided or prohibited despite its epistemic rationality) in marginalizing people already marginalized in other ways. And although we cannot choose our ages and thus have only limited ability to control age-related memory weakening, much the same may apply with respect to those attributes that are chosen rather than being beyond the control of the individual. There is no research on whether people who lie for a living (professional poker players, for example, and some telemarketers and telephone solicitors) are less honest than others in other aspects of their lives, but it is not inconceivable that a casual approach to honesty might spill over from one area of a person’s life to other areas. The same applies to enhancements rather than discounts. People who are trained in careful visual observation (some police officers, some security guards, members of the military trained in identifying enemy aircraft) might be assumed to be less likely than the rest of us to make mistakes in perception, and so their testimony about what they saw might be taken as more credible evidence, again all other things being equal, than would the same testimony from someone without that training.

At this point we have shifted to a somewhat broader sense of calibration, and a sense that is different from the testifier-specific calibration that Tripadvisor and Yelp enable with their reviewer histories. The larger point is that when people are evaluating a testimonial statement for its worth as evidence—its probative value, in legalese—they often calibrate that statement based on attributes of some category of which the testifier is a member. Sometimes that calibration will be empirically justified, and sometimes it will not be. Sometimes that calibration will involve a non-epistemic injustice, and sometimes it will not. But even when the calibration is not based on putting the testifier into some larger category, we can still calibrate based on that testifier’s own prior testimony. That is what Tripadvisor and Yelp allow us to do. That is what we do when we inflate the testimony about ability from a tough grader (in the precise sense of grading an examination or a term paper or in the less precise context of offering a recommendation) and discount it from an easy one. That is what American law does when it permits “secondary” testimony of various kinds about the likely honesty of the primary testifier.33 And that is also, if the story of George Washington and the cherry tree were true, which it almost certainly is not, what those who told the story asked the public to do in using a past example of self-sacrificing honesty as evidence of the truth of what Washington was now saying, and of what he might say in the future.

Oath-taking, cross-examination, and various forms of calibration all can help us assess the value of testimony as evidence, although they are primarily focused on oral testimony. But all these devices and strategies are general, in the sense of spanning the entire range of dimensions on which we might evaluate the worth of an item of testimony. More specifically, however, our concerns about the value of testimony are typically of two varieties. One is the possibility that the testifier is lying. And the other is that the testifier is honestly mistaken. These are the specific worries about testimony as evidence that are addressed in Chapters 7 and 8.