BAYESIAN THINKING IN SCIENCE AND IN COURT

Recall from Part One the idea of Bayesian probability, in which you modify or update your belief about something as new data comes in, weighing the new evidence against the prior probability of that thing being true—the probability that you have pneumonia given that you show certain symptoms, or the probability that a person will vote for a particular party given where they live.

In the Bayesian approach, we assign a subjective probability to the hypothesis (the prior probability), and then modify that probability in light of the data collected (the posterior probability, because it’s the one you arrive at after you’ve conducted the experiment). If we had reason to believe the hypothesis was true before we tested it, it doesn’t take much evidence for us to confirm it. If we had reason to believe the hypothesis unlikely before we tested it, we need more evidence.
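To make the mechanics concrete, here is a minimal sketch in Python (the language, the function name, and the numbers are mine, purely for illustration) showing how the same piece of evidence produces very different posteriors depending on the prior:

    # Bayes' rule: the posterior is proportional to the prior times the likelihood.
    def update(prior, p_evidence_if_true, p_evidence_if_false):
        # Probability the hypothesis is true after seeing the evidence.
        numerator = prior * p_evidence_if_true
        return numerator / (numerator + (1 - prior) * p_evidence_if_false)

    # Illustrative numbers only: the same test (likelihoods .9 vs. .1)
    # is convincing for a plausible hypothesis but not for an implausible one.
    print(update(prior=0.50, p_evidence_if_true=0.9, p_evidence_if_false=0.1))  # ~0.90
    print(update(prior=0.01, p_evidence_if_true=0.9, p_evidence_if_false=0.1))  # ~0.08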

Unlikely claims, then, according to a Bayesian perspective, require stronger proof than likely ones. Suppose your friend says she saw something flying right outside the window. You might entertain three hypotheses, given your own recent experiences at that window: It is a robin, it is a sparrow, or it is a pig. You can assign probabilities to these three hypotheses. Now your friend shows you a photo of a pig flying outside the window. Your prior belief that pigs fly is so small that the posterior probability is still very small, even with this evidence. You’re probably now entertaining new hypotheses that the photo was doctored, or that there was some other kind of trickery involved. If this reminds you of the fourfold tables and the likelihood that someone has breast cancer given a positive test, it should—the fourfold tables are simply a method for performing Bayesian calculations.

Scientists should set a higher threshold for evidence that goes against standard theories or models than for evidence that is consistent with what we know. After thousands of successful trials of a new antiretroviral drug in mice and monkeys, we are not surprised to find that it works in humans—we're willing to accept the evidence under standard conventions for proof. We might be convinced by a single study with only a few hundred participants. But if someone tells us that sitting under a pyramid for three days will cure AIDS, by channeling qi into your chakras, that claim requires stronger evidence than a single experiment, because it is farfetched and nothing like it has ever been demonstrated before. We'd want to see the result replicated many times and under many different conditions, and ultimately, a meta-analysis.

The Bayesian approach isn’t the only way that scientists deal with unlikely events. In their search for the Higgs boson, physicists set a threshold (using conventional, not Bayesian, statistical tests) 50,000 times more stringent than usual—not because the Higgs was unlikely (its existence was hypothesized for decades) but because the cost of being wrong is very high (the experiments are very expensive to conduct).

The application of Bayes’s rule can perhaps best be illustrated with an example from forensic science. One of the cornerstone principles of forensic science was developed by the French physician and lawyer Edmond Locard: Every contact leaves a trace. Locard stated that the wrongdoer either leaves signs at the scene of the crime or takes away with her—on her body or clothes—indications of where she has been or what she has done.

Suppose a criminal breaks into the stables to drug a horse the night before a big race. He will leave some traces of his presence at the crime scene—footprints, perhaps skin, hair, clothing fibers, etc. Evidence has been transferred from the criminal to the scene of the crime. And similarly, he will pick up dirt, horsehair, blanket fibers, and such from the stable, and in this way evidence has been transferred from the crime scene to the criminal.

Now suppose someone is arrested the next day. Samples are taken from his clothing, hands, and fingernails, and similarities are found between these samples and other samples taken at the crime scene. The district attorney wants to evaluate the strength of this evidence. The similarities may exist because the suspect is guilty. Or perhaps the suspect is innocent but was in contact with the guilty party—that contact, too, would leave a trace. Or perhaps the suspect was, quite innocently, in another barn interacting with another horse, which would account for the similarities.

Using Bayes’s rule allows us to combine objective probabilities, such as the probability of the suspect’s DNA matching the DNA found at the crime scene, with personal, subjective views, such as the credibility of a witness, or the honesty and track record of the CSI officer who had custody of the DNA sample. Is the suspect someone who has done this before, or someone who knows nothing about horse racing, has no connection to anyone involved in the race, and has a very good alibi? These factors help us to determine a prior, subjective probability that the suspect is guilty.

If we take literally the assumption in the American legal system that one is innocent until proven guilty, then the prior probability of a suspect being guilty is zero, and any evidence, no matter how damning, won’t yield a posterior probability above zero, because you’ll always be multiplying by zero. A more reasonable way to establish the prior probability of a suspect’s guilt is to consider everyone in the relevant population equally likely to have committed the crime. Thus, if the suspect was apprehended in a city of 100,000 people, and investigators have reason to believe that the perpetrator was a resident of the city, the prior probability of the suspect being guilty is 1 in 100,000. Of course, evidence can narrow the population—we may know, for example, that there were no signs of forced entry, and so the perpetrator had to be one of the fifty people who had access to the facility.

Our prior hypothesis (a priori in Latin) is that the suspect is guilty with a probability of .02 (he is one of the fifty people who had access). Now let’s suppose the perpetrator and the horse got into a scuffle, and human blood was found at the scene. Our forensics team tells us that the probability that the suspect’s blood matches the blood found at the scene is .85. We construct a fourfold table as before. We fill in the bottom row under the table first: the suspect has a one in fifty chance of being guilty (the Suspect Guilty: Yes column) and a forty-nine in fifty chance of being innocent. The lab told us that there’s a .85 probability of a blood match, so we multiply that by the one in the guilty column and enter .85 in the upper left: the probability that the suspect is guilty and the blood matches. That means the lower left cell has to be .15 (each column has to add up to its total). The .85 blood match also means that there’s a .15 chance the blood was left by someone else, not our suspect, which would absolve him. Each of the forty-nine people in the right-hand column therefore has a .15 chance of matching, so we multiply 49 × .15 to get 7.35 for the upper right cell. We subtract that from forty-nine to find the value for the bottom right cell.

                        Suspect Guilty
                        YES        NO
Blood Match   YES       0.85       7.35       8.2
              NO        0.15       41.65      41.8
                        1          49         50

Now we can calculate the information we want the judge and jury to evaluate.

P(Guilty | Match) = .85/8.2 = .10
P(Innocent | Match) = 7.35/8.2 = .90

Given the evidence, it is about nine times more likely that our suspect is innocent than that he is guilty. We started out with a .02 probability of his being guilty, so the new information has increased the probability of his guilt by a factor of five, but it is still more likely that he is innocent.
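If you want to check the arithmetic, the same fourfold table can be built in a few lines. Here is a minimal sketch in Python (the variable names are mine; the one-in-fifty prior and the .85 match probability come from the example above):

    # Fourfold-table arithmetic from the example: 1 chance in 50 of guilt,
    # 49 in 50 of innocence, and a .85 probability of a blood match.
    guilty, innocent = 1, 49                              # bottom margin of the table
    p_match = 0.85                                        # from the forensics lab

    match_and_guilty = p_match * guilty                   # 0.85  (upper left cell)
    match_and_innocent = (1 - p_match) * innocent         # 7.35  (upper right cell)
    match_total = match_and_guilty + match_and_innocent   # 8.2   (row total)

    print(match_and_guilty / match_total)      # P(Guilty | Match)   ~0.10
    print(match_and_innocent / match_total)    # P(Innocent | Match) ~0.90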

Suppose, however, some new evidence comes in—horsehair found on the suspect’s coat—and the probability that the horsehair belongs to the drugged horse is .95 (only five chances in one hundred that the hair belongs to a different horse). We can chain our Bayesian probabilities together now, filling out a new table. In the bottom margin, we enter the values we just calculated, .10 and .90. (Statisticians sometimes say that yesterday’s posteriors are today’s priors.) If you’d rather think of these numbers as “one chance in ten” and “nine chances in ten,” go ahead and enter them as whole numbers.

                        Suspect Guilty
                        YES        NO
Hair Match    YES       0.95       0.45       1.4
              NO        0.05       8.55       8.6
                        1          9          10

We know from our forensics team that the probability of a match for the hair sample is .95. Multiplying that by one, we get the entry for the upper left, and subtracting it from one we get the entry for the lower left. If there is a .95 chance that the sample matches the drugged horse, there is a .05 chance that it matches a different animal (which would absolve the suspect), so the upper right-hand cell is the product of .05 and the marginal total of 9, or .45. Now when we perform our calculations, we see that

P(Guilty | Evidence) = .68
P(Evidence | Guilty) = .95
P(Innocent | Evidence) = .32
P(Evidence | Innocent) = .05

The new evidence shows us that, given everything we know, it is about twice as likely that the suspect is guilty as that he is innocent. Many attorneys and judges do not know how to organize the evidence like this, but you can see how helpful it is. The problem of mistakenly thinking that P(Guilty | Evidence) = P(Evidence | Guilty) is so widespread it has been dubbed the prosecutor’s fallacy.
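The chained calculation can be checked the same way. Here is a minimal sketch in Python (again, the variable names are mine) that reuses yesterday’s posterior as today’s prior, with the .95 hair-match probability from the example:

    # The posterior from the blood match (about .10 guilty, .90 innocent)
    # becomes the prior for the hair-match evidence.
    guilty, innocent = 1, 9                                # one chance in ten, nine in ten
    p_hair_match = 0.95                                    # from the forensics lab

    match_and_guilty = p_hair_match * guilty               # 0.95
    match_and_innocent = (1 - p_hair_match) * innocent     # 0.45
    match_total = match_and_guilty + match_and_innocent    # 1.4

    print(match_and_guilty / match_total)      # P(Guilty | Evidence)   ~0.68
    print(match_and_innocent / match_total)    # P(Innocent | Evidence) ~0.32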

If you prefer, the application of Bayes’s rule can be done mathematically, rather than using the fourfold table, and this is shown in the appendix.