So you want to go to law school? Or to win arguments with people who went to law school? Then you’ll need to think clearly about how to assess evidence.
Economists can help, because we assess evidence all the time. How sure are you that a proposed economic policy can accomplish its goals? Sure enough to recommend the policy? How sure are you that a defendant is guilty? Sure enough to convict? These questions have a lot in common.
Start here.
Here I have a couple of urns. The one on the left contains 70 white balls and 30 black. The one on the right contains 30 white and 70 black.
While you weren’t looking, I reached into one of these urns and randomly drew out a dozen balls. (I flipped a fair coin to choose the urn.) As you can see, 4 of those balls were white and 8 were black.
If you had to guess, which urn would you guess I drew from?
a) What’s your estimate of the probability you’re right?
b) Do you think you’re right beyond a reasonable doubt?
SOLUTION:
It’s roughly 98 percent probable that I drew from the right urn. To make that a little more graphic, suppose you had the opportunity, on the first of every month, to place a bet that’s as close to a sure thing as this one is. Then you’d lose your bet only about once every four years or so. By the standards that are ordinarily employed in courtrooms, that’s quite comfortably beyond a reasonable doubt.
The decision theorist Howard Raiffa once posed this problem (with some minor changes) to a group of lawyers at a cocktail party and was surprised when one of them exclaimed, “I bet you drew from the left-hand urn.” Another lawyer was quick to correct him: “No, you got confused. The drawing was eight blacks and four whites, not the other way around.”
“I know,” said the first lawyer, “but in my experience at the bar, life is just plain perverse, so I would bet on the left-hand urn! But I am not really a betting man.”
The other lawyers all agreed that this was not a very rational thing to do—that the evidence was in favor of the right-hand urn.
But by how much? They discussed it for a while and decided that the evidence was pretty meager—the odds might have gone up from 50–50 to about 55–45. They also agreed that because lawyers should cultivate the habit of being skeptical, they’d be inclined to slant their judgments downward and act as if the odds were roughly 50–50.
But as I’ve told you, the correct answer is about 98 percent, so that the balls were drawn from the right-hand urn beyond a reasonable doubt. As Raiffa observed, this story points out that most people vastly underestimate the power of a small sample. The lawyers described above had an extreme reaction, but even statistics students tend to cluster their guesses around 70 percent.
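For readers who want to check the 98 percent figure, here is a minimal Python sketch. It assumes the dozen balls were drawn without replacement and that the coin flip makes the two urns equally likely beforehand. The function name `posterior_right_urn` is my own label, not anything from Raiffa.

```python
from math import comb

def posterior_right_urn(white_drawn=4, black_drawn=8):
    """Probability that the draw came from the right-hand urn (30 white, 70 black)."""
    # Number of ways to get this draw from each urn, assuming no replacement.
    # The shared denominator C(100, 12) and the 50-50 prior cancel out.
    ways_right = comb(30, white_drawn) * comb(70, black_drawn)
    ways_left = comb(70, white_drawn) * comb(30, black_drawn)
    return ways_right / (ways_right + ways_left)

print(round(posterior_right_urn(), 3))  # about 0.98
```

If each ball had instead been put back before the next draw, the odds would be (7/3)^4 to 1, or about 97 percent, so the conclusion does not hinge on that assumption.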
Is 98 percent really beyond a reasonable doubt? That’s ultimately a subjective question, but given a reasonable interpretation of what reasonable means, my own subjective answer is a pretty emphatic yes. There’s not much in life that we can be more than 98 percent sure of.
A world in which the reasonable-doubt cutoff is set at 98 percent is a world where it’s almost impossible to convict anyone for anything, and consequently a world with a lot of crime. On the other hand, a world in which the reasonable-doubt cutoff is only, say, 50 percent is a world with a lot of false convictions. I would not want to take the risk of living in either of those worlds. Somewhere in between is, if not a comfortable medium, then at least the most comfortable of all possible mediums.
So if I were on trial for the crime of drawing from the right urn, I hope this evidence would be strong enough to convict me. If you’re unwilling to convict on this evidence, then you’re ipso facto willing to free forty-nine guilty men before you’ll convict a single innocent. According to the frequently cited eighteenth-century legal scholar William Blackstone, it’s better that ten guilty men escape than that one innocent suffer. To let forty-nine guilty men escape is to go far above and beyond this standard.*
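The arithmetic behind "forty-nine guilty men" can be sketched in a few lines of Python. This is only my reading of the reasoning above: at a certainty threshold p, refusing to convict means freeing p guilty defendants for every 1 − p innocents. Both function names are hypothetical.

```python
from fractions import Fraction

def guilty_freed_per_innocent(threshold):
    """Guilty defendants freed per innocent spared, for cases right at the threshold."""
    return threshold / (1 - threshold)

def threshold_for_ratio(ratio):
    """Certainty threshold implied by a 'ratio guilty to one innocent' standard."""
    return Fraction(ratio) / (Fraction(ratio) + 1)

print(guilty_freed_per_innocent(Fraction(98, 100)))  # 49
print(threshold_for_ratio(10))                       # 10/11
```

On this reading, Blackstone's ten-to-one standard corresponds to a threshold of 10/11, or roughly 91 percent.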
In fact, empirical evidence suggests that for real-world juries, the cutoff for reasonable doubt is somewhere around 74 percent. (That is, juries tend to convict in cases where independent observers, seeing the same evidence the jury saw, estimate the subjective probability of guilt to be anything more than about 74 percent, or perhaps a bit less.)
I think I’d be pretty comfortable with a 74 percent standard if I thought I lived in a world where all police and prosecutors could be counted on not to take advantage of that standard to falsify evidence against people they don’t like. In the world I actually inhabit, I think I’d prefer to set the bar a little higher, though certainly not as high as 98 percent. To my knowledge, no serious scholar has ever defended a bar that high. So once again I think we can safely say that 98 percent counts as well beyond a reasonable doubt.
With those preliminaries out of the way, let’s enter the courtroom.
One Tuesday afternoon a particularly heinous murder is committed in Manhattan, and police seal off all the bridges and tunnels to ensure that the culprit remains on the island. Based on an incontrovertible combination of DNA evidence and eyewitness testimony, officers are told to be on the lookout for a suspect with naturally purple hair. An hour or so later, a police officer notices a purple-haired man named Nathan standing at a bus stop and detains him. Tests soon prove that Nathan’s hair is, indeed, naturally purple, and solely on this basis he is held for trial. At the trial, expert testimony establishes that naturally purple hair is extremely rare—in fact the odds against it are a million to one.
a) Based on these facts, is Nathan guilty beyond a reasonable doubt?
b) Oops. Did I say the odds against naturally purple hair are a million to one? I meant to say a billion to one. Now is Nathan guilty beyond a reasonable doubt?
SOLUTION:
a) On a weekday afternoon, there are about 4 million people in Manhattan. If the odds against naturally purple hair are a million to one, then about 4 of those people will have naturally purple hair. Any of them is as likely as Nathan to be the culprit, which means there’s about a 25 percent chance he’s guilty. You should let him go free (though you might want to keep an eye on him).
b) If the odds against naturally purple hair are a billion to one, then the probability that there’s even one possible culprit besides Nathan is about 4 million divided by a billion, or about 4/10 of 1 percent. That makes it at least 99.6 percent certain that Nathan is the murderer. (It’s actually more than that, because even in the highly unlikely event that Nathan does have a purple-haired doppelganger, he could still be guilty.) That’s damned near certain.
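Both parts of the solution come down to one number: how many purple-haired people you should expect to find in Manhattan. A rough Python sketch follows. The function names are hypothetical, and each function simply mirrors the shortcut used in the corresponding part of the solution.

```python
def expected_carriers(population, odds_against):
    """Expected number of purple-haired people in the population."""
    return population / odds_against

def guilt_estimate_when_carriers_common(population, odds_against):
    """Part (a): every expected carrier is treated as equally likely to be the culprit."""
    return 1 / expected_carriers(population, odds_against)

def guilt_lower_bound_when_carriers_rare(population, odds_against):
    """Part (b): 1 minus the approximate chance that any other carrier exists."""
    return 1 - expected_carriers(population, odds_against)

print(guilt_estimate_when_carriers_common(4_000_000, 1_000_000))       # 0.25
print(guilt_lower_bound_when_carriers_rare(4_000_000, 1_000_000_000))  # 0.996
```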
The moral of this problem is that you cannot assess evidence without weighing it against relevant background information. (In this case, the relevant background information is that there are 4 million people in Manhattan.) This is a moral that runs at large: it applies in the courtroom, it applies when you’re analyzing economic data, and it applies in the doctor’s office.
Your employer requires you to take an HIV test, and you’ve just gotten the results. The bad news is that according to the test, you’re infected. The good news is that regardless of the patient’s HIV status, every test result has a 5 percent chance of being wrong. So there’s a 5 percent chance you’re okay, right?
SOLUTION:
Wrong. The relevant background information is that most people—let’s say 99 percent of your demographic group—are uninfected. So you’re probably uninfected too. Even though the test is wrong only 5 percent of the time, odds are that this is one of those times.
In fact, the probability that you’re uninfected is about 84 percent. Why 84 percent? In a population of 100,000 people, we’ve assumed that just 1 percent (that is, 1,000) are infected. Of the 1,000 who are infected, 95 percent, or 950, get accurate (and grim) test results. Of the 99,000 who are uninfected, 5 percent, or 4,950, get inaccurate results that say they’re infected. That makes 950 + 4,950 = 5,900 people who got bad news, and of those 5,900, only 950, or 16 percent, are actually infected. The other 84 percent are just fine.
It’s all a matter of weighing the evidence. The test result is evidence that you’re infected. But the fact that most people are uninfected is evidence that you are too. Both bits of evidence are relevant, and it would be wrong to ignore either of them.
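The head count above translates directly into code. Here is a small Python sketch using the same numbers: a 1 percent infection rate, a 5 percent error rate, and an imaginary population of 100,000. The function name is hypothetical.

```python
def prob_infected_given_positive(prevalence=0.01, error_rate=0.05, population=100_000):
    """Share of people with a positive test who are actually infected."""
    infected = population * prevalence             # 1,000 people
    uninfected = population - infected             # 99,000 people
    true_positives = infected * (1 - error_rate)   # 950 accurate positives
    false_positives = uninfected * error_rate      # 4,950 inaccurate positives
    return true_positives / (true_positives + false_positives)

p = prob_infected_given_positive()
print(f"infected: {p:.0%}, uninfected: {1 - p:.0%}")
```

Try raising `prevalence` to 0.5 and the test suddenly becomes trustworthy, which is the point: the same result means different things against different backgrounds.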
If you’re skeptical, try a starker example: Suppose you know you have a rare gene that renders you absolutely immune to viruses. Then surely you are entitled to laugh off the results of the HIV test, no matter how dire they appear. The test cannot trump the background information that you’re uninfected. And similarly, no test can completely trump any background information, including the background information that most people are not HIV positive.
For that matter, if you ever have your IQ tested and manage to score 300, you’re probably smart enough to know that it’s extremely unlikely that the test is accurate, because you’ve got enough background knowledge to know that mis-scored IQ tests are a lot more common than IQs of 300.
Now back to the courtroom.
At the tail end of a great battle, a foot soldier is seen lobbing a grenade toward the tent that serves as headquarters for his commanding officer. Sometime later, parts of the officer’s body are found scattered around the area.
In his defense, the soldier is able to prove that the grenade in question was the notorious M16.5, which explodes only half the time. Besides, a lot of other grenades went off around that tent, and the officer might have been dead long before the incident.
But the prosecution is able to prove that if the grenade went off, then there’s a 90 percent chance that it killed the officer.
How likely is the soldier to be guilty of murder? That is, how likely is it that his grenade went off and killed the officer?
SOLUTION:
Well, let’s see. There’s a 50 percent chance that the grenade went off, and if it went off, there’s a 90 percent chance that it killed the officer. So the chance that it went off and killed the officer is 50 percent times 90 percent, or 45 percent, right?
Wrong. Yet again, you can’t assess evidence without accounting for relevant background information. In this case, the relevant background information is that the officer is dead. Although the grenade goes off only half the time, the officer’s being dead makes it particularly likely that this is one of those times. The naive calculation fails to account for that.
So we have to approach this a different way. Start by observing that there are three ways the officer could have died:
a) The soldier’s grenade went off and killed the officer.
b) First another grenade killed the officer, then the soldier’s grenade went off.
c) First another grenade killed the officer, then the soldier’s grenade failed to go off.
We’re told that the soldier’s grenade goes off exactly half the time. That makes (b) and (c) equally likely. We’re also told that if the soldier’s grenade went off, it’s 90 percent likely to have killed the officer. In other words, (a) is 9 times as likely as (b).
So out of 11 similar cases, we should expect scenario (a) to occur 9 times, scenario (b) once, and scenario (c) once. The defendant is guilty with probability 9/11. Call it about 82 percent.
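The three-scenario count can be written out as a short Python sketch. It measures scenarios (a) and (c) relative to scenario (b), exactly as in the argument above. The function name and the idea of plugging in other percentages are my own additions, so treat the general formula as an assumption.

```python
from fractions import Fraction

def guilt_probability(p_explode=Fraction(1, 2), p_kill_if_exploded=Fraction(9, 10)):
    """Chance the soldier's grenade killed the officer, given that the officer is dead."""
    weight_b = Fraction(1)  # another grenade killed him, then this one went off
    # (c) relates to (b) as "failed to go off" relates to "went off".
    weight_c = weight_b * (1 - p_explode) / p_explode
    # Given that it went off, (a) relates to (b) as "killed him" relates to "did not".
    weight_a = weight_b * p_kill_if_exploded / (1 - p_kill_if_exploded)
    return weight_a / (weight_a + weight_b + weight_c)

print(guilt_probability())         # 9/11
print(float(guilt_probability()))  # roughly 0.82
```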
A murder has been committed. There are four suspects: Bob, Carol, and Ted, who smoke, and Alice, a nonsmoker.
You’re quite sure that one (and only one) of these is the culprit but have no reason to suspect one more than another. So you call in your two crack investigators, Agents 86 and 99, explain the situation, and send them out separately to investigate further.
Agent 86 reports back that, based on the evidence he’s discovered, the odds are two to one that the culprit is a smoker. Agent 99 reports back that based on the separate evidence she’s discovered, the culprit is definitely female. Unfortunately, that’s all they can conclude.
You have complete confidence in your investigators. Who’s your main suspect, and how sure are you?
SOLUTION:
We know that the culprit is female, so it can only be Carol or Alice. And we know the odds are two to one for a smoker, so it looks like Carol—the smoker—is the most probable culprit, right?
Wrong. It’s probably Alice. You should be 60 percent sure of that.
Here’s why: When you first sent Agent 86 out into the field, all four suspects looked equally likely. Because three of the four suspects are smokers, 86 started out believing the odds were three to one that the culprit smokes. But along the way, something must have convinced him that the odds were only two to one. So whatever evidence he discovered must have tended to exonerate the smokers. That already leaves Alice as the chief suspect.
Indeed, based on Agent 86’s report alone, there’s a ⅓ chance that Alice is guilty and a ⅔ chance that one of the smokers is guilty. Spreading that ⅔ out equally over the three smoking suspects, the probabilities are
Bob 2/9, Carol 2/9, Ted 2/9, Alice ⅓
So Alice is half again as likely as Carol to be the culprit (because ⅓ is 50 percent more than 2/9).
Now Agent 99 arrives, to eliminate Bob and Ted from consideration. But this doesn’t change the fact that Alice is half again as likely as Carol to be the culprit. That makes the probabilities 40 percent for Carol and 60 percent for Alice.
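The two reports can be combined mechanically. The Python sketch below first spreads Agent 86's odds across the suspects, then keeps only the women and rescales. It assumes, as the solution does, that 99's evidence tells us nothing beyond the culprit's sex. The names `SUSPECTS` and `suspect_probabilities` are hypothetical.

```python
from fractions import Fraction

# (name, smokes, is_female)
SUSPECTS = [("Bob", True, False), ("Carol", True, True),
            ("Ted", True, False), ("Alice", False, True)]

def suspect_probabilities(smoker_odds=Fraction(2, 1)):
    """Probabilities of guilt after hearing from both Agent 86 and Agent 99."""
    p_smoker = smoker_odds / (smoker_odds + 1)  # two to one means 2/3
    n_smokers = sum(1 for _, smokes, _ in SUSPECTS if smokes)
    n_nonsmokers = len(SUSPECTS) - n_smokers
    # Agent 86: share the smoker probability equally among smokers, the rest among nonsmokers.
    after_86 = {name: p_smoker / n_smokers if smokes else (1 - p_smoker) / n_nonsmokers
                for name, smokes, _ in SUSPECTS}
    # Agent 99: drop the men, then rescale so the remaining probabilities sum to 1.
    women = {name: after_86[name] for name, _, is_female in SUSPECTS if is_female}
    total = sum(women.values())
    return {name: p / total for name, p in women.items()}

print(suspect_probabilities())  # Carol 2/5, Alice 3/5
```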
Solving crimes is all about assessing evidence. Often that means thinking about statistics, as in the past few problems. But sometimes the evidence is of a very different kind.
You have a sealed lockbox about a cubic yard in volume, containing $100,000 in $100 bills. Your balance scale tells you that the box (with the money inside) weighs 50 kilograms. You give the box to your friend Al, who flies it to the moon, while you, along with your balance scale, follow in a separate vehicle. Upon arrival, you set up your balance scale on the moon’s surface, retrieve the sealed box, and verify that it still weighs 50 kilograms. You then give the box to your friend Barb, who loads it into her all-terrain vehicle and drives it to your moonbase, with you following along, again in a separate vehicle. When you get to the moonbase, Barb returns your lockbox. You open it and it’s empty.
Who stole your money, Al or Barb?
SOLUTION:
On earth, the left half of your balance scale, supporting a 50-kilogram metal ballast, just balances against the right half, supporting your sealed lockbox.
Actually, that’s not quite right. The left half of your balance scale supports a 50-kilogram metal ballast plus the invisible column of air above it, while the right half supports your sealed lockbox plus a column of air with a lockbox-size hole in it.
On the moon both columns of air are removed. That lightens the left half more than it lightens the right half, so the scale should now tip to the right. If it doesn’t—if the scale still balances—the lockbox must have gotten lighter. Its contents must be gone. Al took your cash.
By way of a reality check: On earth, $100,000 in $100 bills weighs about 2 pounds. So does a cubic yard of air. So the weight of the missing money on the right does indeed cancel the weight of the extra missing air on the left, which explains why the scale still balances on the moon.*
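The reality check is easy to reproduce. The Python sketch below relies on two round figures that are my assumptions rather than anything stated above: a banknote weighs about 1 gram, and air near sea level has a density of about 1.2 kilograms per cubic meter.

```python
GRAMS_PER_BILL = 1.0           # assumption: approximate mass of one banknote
AIR_DENSITY_KG_PER_M3 = 1.2    # assumption: air near sea level at room temperature
METERS_PER_YARD = 0.9144       # exact by definition
POUNDS_PER_KG = 2.20462

def money_weight_lb(total_dollars=100_000, denomination=100):
    """Weight in pounds of the cash, counted out in bills of one denomination."""
    bills = total_dollars / denomination
    return bills * GRAMS_PER_BILL / 1000 * POUNDS_PER_KG

def cubic_yard_of_air_lb():
    """Weight in pounds of the air that fills one cubic yard."""
    volume_m3 = METERS_PER_YARD ** 3
    return volume_m3 * AIR_DENSITY_KG_PER_M3 * POUNDS_PER_KG

print(f"money: {money_weight_lb():.1f} lb, air: {cubic_yard_of_air_lb():.1f} lb")
```

Under these assumptions both come out at roughly 2 pounds, close enough that a scale reading in whole kilograms would not show the difference.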
There’s more to the law than resolving individual cases. There’s also the problem of designing more effective legal institutions. Here’s one to think about:
Professor Peter Leeson of George Mason University reports on a form of medieval justice:
For four hundred years the most sophisticated persons in Europe decided difficult criminal cases by asking the defendant to thrust his arm into a cauldron of boiling water and fish out a ring. If his arm was unharmed, he was exonerated. If not, he was convicted.
Could this be an effective way to sort out the guilty from the innocent?
SOLUTION:
Professor Leeson believes the answer is yes.
As long as defendants believe (superstitiously) that ordeals yield accurate verdicts, guilty defendants always confess to avoid the ordeal. At the same time, innocent defendants always opt for the ordeal—and are always acquitted, provided the priests cheat by (for example) substituting tepid for boiling water, or “sprinkling” a few gallons of cold holy water over the cauldron, or liberally redefining what counts as “unharmed.”
Not only does the system work but it’s continually reinforced. Even the superstitious masses are smart enough to figure out that their equally superstitious neighbors opt for ordeals only when innocent; therefore they expect all ordeals to yield acquittals, and their expectations are always confirmed.
If not everyone is perfectly superstitious, then the story is a little more complicated. Still, as long as there’s a healthy amount of superstition floating around, and as long as the priests are eager to convict the guilty and acquit the innocent, you’d expect most ordeals to yield acquittals, and you’d expect nonbelievers to be denied the ordeal option. (After all, the system works only as long as the participants believe it’s rigged not by the priests but by God.)
And indeed there’s at least some historical evidence for this theory. First, a great many trials by ordeal ended in acquittals, which at least shows that they were not designed to convict everyone. Second, nonbelievers were generally exempt from trials by ordeal, which would make sense if the whole arrangement depended on beliefs. And third, ordeals were generally administered by clerics (which would have been necessary to reinforce the needed superstitions) and in fact died out when clerics stopped administering them.
Professor Leeson’s conclusion is that “no one longs for the return of ordeals, but if the necessary belief structure existed to support them, perhaps they should.”
Finally, because every well-trained lawyer should know something about civil as well as criminal law:
In 1914 the British government passed a law to control rents. It defines the “standard rent” on a house to be the rent in 1914, unless that is less than the ratable value, in which case it is equal to the ratable value. (No, I do not know what a “ratable value” is, but that won’t matter here.) It then declares that a house is covered by the law “if either the standard rent or the ratable value is less than 105 pounds.”
Thanks to the obscurity of the language, there was a great deal of confusion about which houses were and were not covered, resulting in a considerable number of lawsuits.
Can you settle all of those lawsuits in one stroke by restating the criterion for coverage in (much) simpler terms?
SOLUTION:
A house is subject to the law if and only if the ratable value is less than 105 pounds. (Think about it!)
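If thinking about it doesn't convince you, a brute-force check might. The Python sketch below encodes the statute's two-step wording and the one-line restatement, then compares them over every whole-pound combination up to a limit. The function names and the 300-pound ceiling are my own choices.

```python
def covered_by_statute(rent_1914, ratable_value, limit=105):
    """Coverage as the law words it."""
    # Standard rent is the 1914 rent, unless that is below the ratable value.
    standard_rent = max(rent_1914, ratable_value)
    return standard_rent < limit or ratable_value < limit

def covered_simplified(ratable_value, limit=105):
    """Coverage as restated in the solution."""
    return ratable_value < limit

def rules_agree(max_pounds=300):
    """True if the two rules give the same answer for every pair of values tried."""
    return all(covered_by_statute(rent, value) == covered_simplified(value)
               for rent in range(max_pounds + 1)
               for value in range(max_pounds + 1))

print(rules_agree())  # True
```

The reason it works: the standard rent is never below the ratable value, so whenever the standard rent is under 105 pounds, the ratable value is too.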