
When to Trust the Evidence

A doctor and a lawyer were talking at a party, but they weren’t having fun because people kept interrupting to ask for free medical advice. After a while, the annoyed doctor asked the lawyer, “How do you stop people from asking you for legal advice when you’re at a party?”

“I give it to them,” replied the lawyer, “then I send them a bill.”

The doctor was shocked but agreed to give it a try. The next day, still feeling slightly guilty, he prepared the bills for the people who had asked him for advice the previous night. When he went to place them in his mailbox, he found a bill from the lawyer.

I am not a medical doctor, but as a medical researcher people ask me the same kinds of questions they might ask their doctor at a party. What do I think about herbal medicine? Do the benefits of chemotherapy outweigh the side effects? Can medical marijuana cure depression? What about vaccines and autism? The answer to these questions can only be found by looking at the evidence. If there’s good evidence a treatment works, we can probably trust it; otherwise we should be careful. We will probably all need to know whether a particular treatment works at some point, so we should all understand what good evidence is. The problem is that the media love splashing headlines about “magic bullet” treatments before there is evidence proving they work and, with very few exceptions, academics use incomprehensible mumbo jumbo to explain their studies. Worse, academics rarely bother to translate their research for other researchers, let alone the public. So someone with a PhD in chemistry will have difficulty understanding what someone with a PhD in physics writes.

Yet if you want to get the basics, evidence is easy to understand, as long as it’s explained clearly. I translated the language of medical researchers for my philosophy colleagues in my book The Philosophy of Evidence-Based Medicine, and in this chapter I’m translating it for you. What you’ll find is that for most things you don’t need to understand more than the basics, and that is enough to become an “active medical citizen.”

A Fair Start and Randomized Trials

I could tell everyone that I ran 100 meters faster than Usain Bolt. But nobody would believe me unless I proved it by lining up beside him, racing, and winning. If I refused to race him, you would say I was full of sh*t. Yet that kind of bullsh*t is common in medicine. To prove a treatment works, you have to compare it with what happens if someone does not take the treatment—you have to have a “race.”

For example, a researcher, often one who was paid by the industry, might give you vitamin C when you caught a cold. Then, if your cold went away in five days, he might say the cold had gone because of the vitamin C. But most colds go away in five days without any treatment anyway. To check whether taking vitamin C helps, you need to compare people who take it with people who don’t. Only if the colds in the group that got vitamin C went away faster than the other group’s could we say that vitamin C “won.” But the start of the race would have to be fair . . .

If I agreed to prove myself by actually racing Usain Bolt, but then took a massive head start, you would say the race was not fair. While it is not always on purpose, this kind of cheating is common in medical research. For instance, a researcher might give younger, healthier people vitamin C and not give vitamin C to older, less healthy people. But since young and healthy people’s colds go away faster than older, unhealthy people’s colds, that would not prove anything, because the younger, healthier group had a “head start” when it comes to health.

The groups that take or don’t take vitamin C have to be as similar as possible. To create similar groups, scientists flip a coin to decide who gets vitamin C and who does not. (Actually, they don’t really flip a coin, but they use a computer to achieve the same thing.) When we flip a coin to decide who gets what, we have a fair start and what is called a randomized trial.
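To make the coin flip concrete, here is a minimal sketch (in Python, with invented participant names) of how a computer can randomize people into two groups. Real trials use dedicated software and pre-generated sequences, but the principle is the same:

```python
import random

participants = ["Ana", "Ben", "Chloe", "Dev", "Ema", "Finn", "Gus", "Hana"]

# Shuffle the list so that age, health, and everything else we can't
# measure are spread between the groups by chance alone.
random.shuffle(participants)

# Split the shuffled list in half: one group gets vitamin C,
# the other gets no vitamin C (or a placebo).
half = len(participants) // 2
vitamin_c_group = participants[:half]
control_group = participants[half:]

print("Vitamin C group:", vitamin_c_group)
print("Control group:  ", control_group)
```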

Blinding to Stop Cheating Along the Way

At the 2016 cycling world championships, a Belgian cyclist was caught with a hidden motor in her bicycle. (The cyclist claimed it was not her bike, and that the team mechanic had given her the wrong bike by accident.) Whether or not she knew about the motor, having the motor was cheating. Yet this kind of cheating is also common in medical research, although it is not always done on purpose. If the doctor believes vitamin C works (or if they are being paid by the company that makes the drug in a trial), they might interpret a little sniffle as a failed cure in the person who did not take vitamin C, but as a successful cure in the person who did. The same goes for the patients and everyone else involved in the trial. If people believe that the treatment works, they can make biased observations, or pretend to get better when in fact they don’t. The coolest study I know that shows this is called “Pygmalion in the Classroom.”

In the spring of 1964, Robert Rosenthal and Lenore Jacobson went to a public elementary school they called the “Oak School” to carry out an experiment that they named after Pygmalion, the Greek artist who sculpted an ivory statue that came to life because he lavished it with so much attention.

Rosenthal and Jacobson gave all five hundred kids in grades 1 to 5 (kids between five and ten years old) a test they called the impressive-sounding “Harvard Test of Inflected Acquisition.” Teachers were told that the test “predicts the likelihood that a child will show a learning spurt within the near future.” Teachers administered the multiple-choice test, and two independent assessors, who didn’t know the identities of the participants, scored it separately. The teachers were allowed to see the results but were told not to discuss them with the pupils or their parents. After a year, the same Harvard test was administered by the teachers and graded by the same independent assessors. The students whom Rosenthal and Jacobson had originally identified as being in the top 20 percent for learning-spurt potential improved in English, math, and even IQ significantly more than the other students.

The funny thing was that the “Harvard Test of Inflected Acquisition” was actually a standard IQ test. What’s more, Rosenthal and Jacobson didn’t choose the top 20 percent of students; they chose 20 percent at random! The reason these students “spurted” was not because they were “spurters,” but because the teachers believed in them. A teacher, believing that a student was ready to “spurt,” might pay special attention to that student. The additional attention could easily translate into accelerated rates of improvement.

Pygmalion-type effects are probably common in medical research. When a doctor or researcher believes they are administering the best experimental treatment to a patient, they might treat that patient differently than they would one who was getting the placebo instead of the experimental treatment. Meanwhile, if the doctor believed that a different patient was being given a placebo, the doctor might not bother providing the highest quality of care. They might deem it “not worthwhile,” especially given that doctors have limited time to distribute among their many patients. An obvious scenario in which caregiver knowledge could have an effect is when they have a personal or financial interest in showing that the experimental treatment works. The role of these personal or financial interests can be conscious or unconscious.

To prevent bias, researchers are blinded, which means they don’t know which patients get the experimental treatment. To achieve blinding, you obviously need to give some patients placebos that can be disguised as a drug. This is possible with pills, but much more difficult with complicated treatments like ginger tea and exercise.

In practice, blinding is not easy to achieve because researchers can be very good at figuring out which patients received the real treatment. To hide which patients get the real drug and which patients get the placebo, each patient is given a secret number. For instance, one patient might be assigned the number “2958” and another will be given the number “5829.” Then the “decoding” for the numbers is kept in a separate place. This can be a piece of paper saying “2958 = placebo” and “5829 = drug.”
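Here is a toy sketch of that secret-number scheme, with invented codes; it shows how the treatment assignments and the “decoding” table can be kept apart:

```python
import random

# Eight participants: four will get the drug, four the placebo.
assignments = ["placebo", "drug"] * 4
random.shuffle(assignments)

# Give each participant a unique random four-digit code.
codes = random.sample(range(1000, 10000), len(assignments))

# The decoding table is the "piece of paper" kept in a separate place.
decoding_table = {code: arm for code, arm in zip(codes, assignments)}

# The researchers measuring outcomes only ever see the codes:
print("Codes visible to researchers:", sorted(codes))
print("Decoding table (kept elsewhere):", decoding_table)
```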

Sometimes researchers and doctors try to decode the numbers themselves. Kenneth Schulz is a prolific researcher who is the president of the Quantitative Sciences Department within the International Clinical Studies Support Center. He conducted a workshop where investigators revealed anonymously the methods they used to figure out which patient was getting which treatment. Here is what he reported about one of them:

[One] workshop participant had attempted to decipher a numbered container scheme, but had given up after her attempts bore no success. One evening she noticed a light on in the principal investigator’s office and dropped in to say hello. Instead of finding the principal investigator, she found an attending physician who also was involved in the same trial. He unabashedly announced that he was rifling the files for the assignment sequence, because he had not been able to decipher it any other way. What materialized was almost as curious as her response. She admitted being impressed with his diligence and proceeded to help in rifling the files.

The Pygmalion experiment shows that when doctors believe they are giving the best treatment, this can influence how quickly the patient recovers. It is also important to blind patients, because their expectations can improve outcomes, and this can make a treatment appear effective when it isn’t. If a patient knows they are getting the “newest and best” drug, they might expect to get better, and these expectations could cause an improvement (see Chapter 8). Meanwhile, if the patient knows they are “just” getting the placebo, they might not have the same expectations. Placebo control treatments that patients believe could be “real” help make sure blinding is applied and maintained during a trial. It also helps if other people involved in the trial, like the statisticians, are blinded, since they too can introduce bias. But even if the blinding in an individual trial is perfect, it is not enough: we also have to avoid “cherry picking.” Systematic reviews are a type of study used to help prevent cherry picking.

Systematic Reviews to Make a Better Final Judgment

The Swiss Men’s Ice Hockey team beat Canada in the 2006 Olympics. It would be wrong to say, on the basis of one game, that the Swiss team was better than the Canadian team. It’s the only time they have ever beaten Canada in the Olympics, and they have never won the Olympics, while Canada has won the Olympics nine times. To make a judgment about which is the best team, you need to look at the whole picture. Looking at the whole picture tells us that Canada has a much better team than Switzerland. The Swiss may have other advantages, like good chocolate, but in ice hockey they just aren’t as good. The same applies to medical research: a fair assessment of whether a treatment “wins” has to be based on all relevant evidence. As obvious as this seems, it happens more rarely than we might like.

If I wanted to know whether Prozac is more effective than a placebo, it would be wrong for me to cherry-pick my favorite studies that indicated a positive benefit of Prozac and ignore those with a negative result. The first time I learned that we needed more than one study I was surprised—why isn’t one enough? Either something works, or it doesn’t, right? Wrong. That is only what headlines tell you. Anyone who has done real science knows that it is messy. There is a lot of randomness: once in a while a drug works for some people but not others; sometimes the study is flawed; and unfortunately researchers can cheat.

That is why it is so important to look at all the studies together in a megastudy called a systematic review to make sure you are not merely choosing the ones that give you the result you want. Once we have gathered all the trials together, it is sometimes possible to use statistical methods to get an average-effect size in what is called a meta-analysis.
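The details of the statistics are beyond this chapter, but the core idea is just a weighted average. Here is a minimal sketch of one common approach (a fixed-effect, inverse-variance weighted average), with invented numbers standing in for three trials:

```python
# Each trial contributes an effect estimate and a standard error.
# (effect size, standard error) -- invented numbers for illustration.
trials = [
    (0.30, 0.10),
    (0.10, 0.05),
    (0.20, 0.20),
]

# Bigger, more precise trials (smaller standard error) get more weight.
weights = [1 / se**2 for _, se in trials]

pooled = sum(w * effect for (effect, _), w in zip(trials, weights)) / sum(weights)
print(f"Pooled effect estimate: {pooled:.3f}")  # about 0.143 here
```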

Summary: What Is Good Evidence?

If there is a systematic review of blinded randomized trials showing that a treatment works, then it probably does. Otherwise we have reason to remain skeptical. Now you are probably asking whether there is good evidence showing that vitamin C can cure the cold, or whether marijuana cures depression. It actually can be quite easy to find a systematic review of randomized trials if you want to get a good idea about whether something works. For example, I just typed “systematic review vitamin C cold” into an internet search engine and learned that there is a systematic review of blinded randomized trials looking at the effects of vitamin C on colds.

I read the study carefully, and its conclusions about vitamin C’s effects are interesting. On the one hand, vitamin C does not seem to prevent colds. On the other hand, if you take vitamin C on a regular basis your cold won’t last as long as it will if you don’t take vitamin C. As for marijuana, a systematic review seems to suggest that it makes depression worse rather than better.

Some people will tell you there is a lot more to learn about evidence that medical interventions work than I’ve explained here, and they’re right. You need an entirely different kind of evidence—qualitative research—to learn about people’s feelings. You need to watch what happens to people for years to ascertain the long-term effects of a treatment. Even then, doctors will always need to use judgment to adapt evidence to individual patients, and we don’t always need randomized trials or systematic reviews to prove treatments are effective. As I say, real science is messy, and it can also be depressing.

Three Depressing Things About Medical Studies

Publication Bias

About half of trials are never published, especially those with negative results. This means that systematic reviews often contain a biased sample of trials and often have exaggerated results. The unpublished trials are hard and sometimes impossible to find. In his TED Talk, Ben Goldacre talks about a time when he prescribed an antidepressant called reboxetine to one of his patients. Goldacre is an evidence guru, so naturally he checked the literature and found that reboxetine had been proven superior to placebos and as good as other antidepressants. Because his patient had not responded to the other antidepressants, he decided to try reboxetine.

But it turns out that Ben did not have the full picture. There were six unpublished trials comparing reboxetine against a placebo with negative results in which reboxetine was no better than the placebo. There were also unpublished data comparing reboxetine with other antidepressants showing that reboxetine was worse than the other options. How can doctors and patients make good choices about treatments if trials are not published?

Making things worse, governments don’t require that the drug companies release all data about benefits and harms. This is a kind of madness. What would you say if a car company had suppressed information about the brakes in a car not working and people died in accidents as a result? I think the lack of any requirement for drug companies and others who do trials to release all the trial data on their products is crazy, and most people I tell about this find it difficult to believe. The result is that doctors prescribe drugs without knowing how well they work or how serious the side effects are. If you are lucky enough to live in a country where they have socialized medicine, like Canada or the United Kingdom, then you pay for these treatments through your taxes. If you live in a country without socialized medicine, then you pay for them out of your pocket. You have the right to all the data about the benefits and harms of these treatments. Movements like the AllTrials campaign (www.alltrials.net) are fighting—with some success—to change this.

Hidden Bias

It gets worse. Even experts have trouble detecting some bias. The funny thing about these hidden biases is that they almost always support the new drug of the company that paid for the trial. Sometimes this leads to absurd conclusions. A research group in Germany looked at trials that compared three different antipsychotic drugs—olanzapine, risperidone, and quetiapine—against each other. They found that olanzapine beat risperidone, risperidone beat quetiapine, and quetiapine beat olanzapine. But that is ridiculous.

What if I told you that Sarah was taller than Johnny, Johnny was taller than Mark, and Mark was taller than Sarah? It doesn’t take too much brainpower to figure out that it can’t be true. Yet these kinds of Orwellian nonfacts pervade the medical literature. The antipsychotic trials were all randomized and blinded, so they all looked as if they were trustworthy. The factor that appeared to determine the outcome was who paid for the study. When the manufacturers of risperidone paid for the trial, risperidone “won”; when the manufacturers of quetiapine paid, quetiapine “won”; and when the manufacturers of olanzapine paid, olanzapine won. The researchers in Germany concluded that the trials suffered from some hidden biases. These hidden biases are impossible to find by reading journal articles. Those articles can look squeaky clean, but they don’t contain enough information to allow us to verify whether statistical tricks or outright cheating may have been used in the actual trial.

Size Matters (and People Lie about It)

Pound for pound, an ant is stronger than an elephant. A leafcutter ant can even carry something that weighs fifty times as much as its own body. That is the equivalent of a human being lifting a truck with their teeth, or an elephant lifting a brick house. That is pretty incredible, and I have a lot of respect for ants. But their pound-for-pound strength will not help me if I need to carry something heavy. For that, I would much prefer the help of an elephant (or a human). The geek name for pound-for-pound strength is relative strength, and relative measures confuse even smart humans with degrees from Harvard and Oxford. That is why, unless I say otherwise, I use absolute-effect sizes in this book. The good news is that if I tell you about the blue whale and the barnacle, I bet you will never get it mixed up again.

The blue whale is the largest known mammal ever to have lived on earth. Bigger than the biggest dinosaurs that we know of. Adult blue whales are about 98 feet long and weigh up to 170 tons. They are longer than two school buses parked end to end, their tail is as wide as a van, and their heart is as big as a small car. If you get very close to a blue whale, you will see tiny shelled creatures called barnacles attached to its skin. Full-grown barnacles can easily fit into the palm of an adult’s hand. Now if I asked you which creature had a bigger penis, the blue whale or the barnacle, you might think I was joking. The blue whale’s is longer than an adult human is tall, while the barnacle’s is barely longer than your hand. Clearly the blue whale wins. But that is only if we are talking absolute sizes.

If we are talking size relative to their body length, it is a different story. The barnacle’s penis is up to thirty times as long as its body. Barnacles don’t move, so their penises have to be longer than their bodies to impregnate a mate. The blue whale’s penis, on the other hand, is shorter than its body. So according to the relative measure, the barnacle’s penis is larger than the blue whale’s. When you read about medical treatment effects, the reports usually give relative, not absolute, effect sizes, which can be confusing and misleading.

In the EUROPA study, investigators randomized 12,218 patients with heart disease to take either a drug called perindopril or a placebo. Perindopril relaxes blood vessels and lowers the amount of blood moving through the vessels so that the heart does not “demand” as much blood. After taking the drug or the placebo for more than four years, investigators looked to see how many people who took perindopril died or had a serious heart attack, and compared that with what happened to the people who received the placebo. Of the participants in the placebo group 10 percent died or had a serious heart problem, compared with 8 percent in the perindopril group. So the absolute difference between the drug and the placebo was 2 percent.

In some ways 2 percent is a lot. It means that out of one hundred people, you prevent two deaths or two heart attacks. To some people, 2 percent is not very much. It means that you would have to give fifty people the drug to prevent one death or heart attack. Given the choice to take a pill every day for the rest of their lives in return for a 2 percent lower chance of heart attack or death, some will pop the pill. But many would choose to forget the pill and take their chances, or even try a little more exercise. (A systematic review of trials suggests that exercise may be as good as drugs for preventing heart disease.)

Then it gets confusing. The authors of the study did not say the drug had a 2 percent effect, they said it had a 20 percent effect. And because 20 percent is very big, they said everyone at risk should take the drug. But how did they get 20 percent from 2 percent? They used the confusing relative-effect sizes. Saying perindopril reduced deaths and heart attacks by 20 percent is like saying that a barnacle has a bigger penis than a blue whale. The mathematics required to calculate the relative-effect size is not too hard. You take the absolute size (2 percent) and divide it by the risk in the placebo group (10 percent). In this case the relative-effect size is 2 percent divided by 10 percent, which is 20 percent. Some statisticians say there are good reasons for using relative-effect sizes. They say it can help with translating the results of one trial to another. Whether they are right or wrong, most people still find relative-effect sizes confusing—we don’t see why 2 percent can become 20 percent.
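If you like seeing the arithmetic spelled out, here is a small sketch that reproduces the EUROPA numbers from above:

```python
# EUROPA numbers from the text: 10 percent of the placebo group and
# 8 percent of the perindopril group died or had a serious heart problem.
placebo_risk = 0.10
drug_risk = 0.08

absolute_reduction = placebo_risk - drug_risk           # 0.02 -> "2 percent"
relative_reduction = absolute_reduction / placebo_risk  # 0.20 -> "20 percent"
number_needed_to_treat = 1 / absolute_reduction         # 50 people per event prevented

print(f"Absolute effect: {absolute_reduction:.0%}")
print(f"Relative effect: {relative_reduction:.0%}")
print(f"Treat {number_needed_to_treat:.0f} people to prevent one death or heart attack")
```

The same arithmetic explains the statin numbers discussed below.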

With such a small absolute effect, a small hidden bias in the trial could have tipped it the other way, and one cannot help but wonder whether such biases influenced the results.

Because of this we should not be surprised if perindopril didn’t demonstrate superiority to placebo in another trial. In fact, that’s just what happened.

In the PEACE study, investigators randomized 8,290 patients in four countries to receive trandolapril (a pharmaceutical cousin of perindopril) or a placebo. Trandolapril failed to demonstrate superiority to the placebo. The different results in the PEACE and the EUROPA studies could be because the drugs were slightly different. It could also be because small effects in trials with hidden bias can’t be trusted.

A similar debate is going on now about statins. Statins are very effective for people who already have confirmed heart disease; for example, people who have already had a stroke or heart attack. But not everyone who takes statins has confirmed heart disease, and some experts say that everyone over age fifty (even if they have a very low risk of heart disease) should take them to prevent possible future disease. (Some doctors have even recommended that statins be put in our drinking water, but these suggestions have not been taken seriously.) Yet statins are not very effective for these people.

In the most recent randomized trial of people without confirmed heart disease, 3.7 percent of the people who took statins died over six years, while 4.8 percent of the people who took a placebo died in the same period. This means that the (absolute) benefit of statins was 1.1 percent (4.8 percent minus 3.7 percent). However, by the time the results got to press, the statin trial was reported as reducing death by more than 20 percent. How did the results get from 1.1 percent to 24 percent?

The statisticians used confusing relative effects, with the same arithmetic as in the perindopril example. They took the absolute effect of 1.1 percent and divided it by the death rate in the placebo group of 4.8 percent, which comes to roughly 23 percent (the published 24 percent presumably reflects the unrounded data).

Something else to consider is that the decision to take medicine is not just about benefits, but about whether the benefits outweigh the harms. And it takes only a small harm to outweigh a small benefit. Small effects are also more likely to arise from small (hidden) biases.

In spite of this, it is likely that statins do have some—albeit small—benefits for people who do not have confirmed heart disease. Faced with the choice to take a pill every day for the rest of their life in return for a 1.1 percent reduced chance of cardiovascular death, some will be happy to take the pill, while others might not.

Regardless of these problems with the statin evidence, some “hard-core” statin believers say that we should not even be given the choice. Just like vitamins A and D used to be added to milk to prevent blindness and rickets, they say we should now pop statins like M&Ms. Or even add them to the water, so people are forced to take them. Especially in countries where there is a national health service and taxpayers pick up the tab to deal with heart attacks and strokes, why should we let people refuse to take statins if we are the ones who pick up the bill to treat them?

Small effects, they might add, are important if we add them up over the entire population. For instance, putting babies to sleep on their back reduced infant deaths by much less than 1 percent, but because so many babies are born, it probably saved more than five hundred lives per year in the United Kingdom alone, and more than two thousand lives per year in both the United States and Europe. Putting babies “back to sleep” (as the campaigns were called) is a simple intervention that does save some babies’ lives, and even one baby’s life is important. Likewise, the reasoning goes, if everyone (even those with a low risk of heart disease) took statins, we would save thousands of lives.

The argument that we should be forced to take statins does not make much sense, because statins are not like vitamins A or D, or putting babies back to sleep. For one, there are other ways to reduce the risk of heart disease, such as exercise and better diet, so forcing someone to take a statin takes away their freedom of choice. Also, the argument falls apart unless we are sure that (for people with a low or medium risk of heart disease) statin benefits actually outweigh the harms. The reality, as we have seen, is that there are a lot of open questions about bias in the evidence and whether the benefits outweigh the harms for people with a low or medium risk of heart disease. The problem with small effects is worse if we consider that average results don’t always apply to individuals.

I Am Not Average

My colleague and friend Professor Donald Gillies tells an interesting story about his niece in Rome. He was visiting his niece when she was about to turn sixteen. Apparently when teenagers in Rome turn sixteen, most of them get mopeds. Hoping to discourage his niece, Donald informed her about the evidence of the serious dangers of riding mopeds, such as paralysis and death. As an academic, Donald had studied the latest statistics, which he cited, hoping to scare his niece. To Donald’s surprise, she immediately agreed with all the statistics. He thought he had succeeded at dissuading her from getting a moped. But then she added that the statistics didn’t apply to her. She said that the ones who were injured and died in moped accidents drove while drunk, drove too fast, and drove when they were too tired. She, on the other hand, rarely drank, would never drive too fast, and was generally very careful. She may not have known it, but she had stumped Donald with a major controversy in medicine: When do average results from trials apply to individuals?

Just as Donald’s niece was different from the average Italian sixteen-year-old, you might be different from the average person in a clinical trial. Trials usually exclude smokers, people with more than one ailment, very young people, and very old people. Yet once a treatment is proved effective in a trial, it is used to treat everyone, even the people who would not have been eligible for the trial. But how do we know that the drug works in the people who were not eligible for the study? How do we know if the results of the trial apply to us? Sometimes they don’t.

Some antidepressants proved to be effective when tested in adults, and were then used to treat children. However, later studies showed the drugs had doubtful effects in children. In a more dramatic case, an arthritis drug called benoxaprofen (Oraflex in the United States and Opren in Europe) proved effective in trials in 18- to 65-year-olds but was withdrawn from the market immediately after it was reportedly responsible for the deaths of twelve elderly patients who took the drug. There seemed to be something about older people’s bodies that reacted with the drug in a fatal way.

The statin trials face a similar problem. The researchers conducting the trials excluded people with liver disease, muscle pain, patients taking other medications, and patients with any other “serious condition(s) likely to interfere with study participation.” But many people who end up getting statins are taking other medications or have muscle pain or liver problems. Just as Donald Gillies’s niece said the evidence didn’t apply to her, the results of the statin trials may not apply to many people who would have been excluded from the trial. The results certainly don’t apply to all the people who would take them if statins were included in the water supply. For some individuals, statins may be better than they were for the average person in the trial, and for others statins could be harmful.

Things get even more complicated if we think about national and cultural differences. A few years ago, an intervention designed to reduce malnutrition among children in Tamil Nadu was introduced. The intervention involved educating mothers, additional health care, and food supplements. The intervention was a great success, with malnutrition declining by 33 percent among children aged six to twenty-four months.

Inspired by the success in Tamil Nadu, a similar project was implemented in Bangladesh. However, the intervention didn’t work in Bangladesh. The people in Tamil Nadu and Bangladesh were not that different, but their culture was. Some of the reasons the intervention didn’t work in Bangladesh were what came to be called the “mother-in-law” and “man-shopper” factors. Unlike in Tamil Nadu, mothers were not the decision makers in Bangladesh. The men did the food shopping, so educating mothers didn’t have an effect on what was bought, and the mothers-in-law allegedly diverted the food supplements from the children to their sons.

There are different techniques for dealing with the problem that we might be different from the average person in a trial. All of them—most recently, personalized medicine and genetic medicine—make huge promises and consistently underdeliver. The details of the solutions can get pretty geeky, so I will not describe them in detail here (see my papers about mechanisms cited in the Works Cited section at the end of the book if you have an appetite for some academic writing). What I found is that, at present, there often is no good way to predict whether you are like the average person in a trial. This means that whenever you take a new treatment, it’s important to monitor what happens to you. If it turns out that it is working, then great! It might even work better for you than it does for the average person in the trial.

On the other hand, if it doesn’t work, or if you are getting side effects that (for you) do not outweigh the benefits, you need to speak with your doctor about other possibilities. The problems with too much medicine suggest that this “watch-and-see” attitude is not used enough.

Nothing (Even a Randomized Trial) Is Perfect

When we hear about very large studies with thousands of people, we think this is a good thing, and it usually is. Large trials are wonderful because differences between people “wash out,” making the trial more trustworthy. That is why randomized trials are considered to be the “gold standard” of medical evidence. The irony is that we only need large studies when effect sizes are very small. You don’t need to have a thousand races to figure out how fast Usain Bolt is compared with the competition—you need only a few.
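A rough sample-size calculation shows the irony in numbers. Here is a sketch using the standard formula for comparing two proportions (the z-values correspond to the conventional 5 percent significance level and 80 percent power; the risks are illustrative):

```python
import math

def patients_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Approximate number of patients needed per group to detect a drop
    in risk from p1 to p2 (5% two-sided significance, 80% power)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# A small effect (10% risk down to 8%) needs thousands of patients...
print(patients_per_group(0.10, 0.08))  # about 3,200 per group

# ...while a huge effect (90% risk down to 10%) needs only a handful.
print(patients_per_group(0.90, 0.10))  # about 3 per group
```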

Taking advantage of this irony, Gordon Smith and Jill Pell wrote a spoof article in 2003 called “Parachute use to prevent death and major trauma related to gravitational challenge: a systematic review of randomized controlled trials” to poke fun at medical evidence experts. Smith and Pell concluded:

Advocates of evidence-based medicine have criticized the adoption of interventions evaluated by using only observational [not from randomized trial] data. We think that everyone might benefit if the most radical protagonists of evidence-based medicine organized and participated in a double-blind, randomized, placebo-controlled, crossover trial of the parachute.

Smith and Pell are right that many treatments with huge effects don’t need randomized trials. We know that automatic external defibrillation starts a stopped heart, tracheostomies open blocked air-passages, the Heimlich maneuver dislodges airway obstructions, penicillin cures most kinds of pneumonia, and epinephrine cures severe anaphylactic shock. To the best of my knowledge none of these treatments has been tested in randomized trials, yet we know they work.

The problem is that all researchers and drug manufacturers think their new drug or treatment is so good that it does not need to wait for a big randomized trial before being used on patients. However, in reality most of these treatments will not be revolutionary, and about half of new treatments turn out to be worse than the existing treatment. The only time we don’t need randomized trials is when the absolute-effect size is really big, which is rare.

Conclusion

“Good” evidence is like a fair race—it is not that hard to understand if you avoid jargon. The problem is that most new treatment effects are tiny. So whenever you read about a breakthrough drug with a huge effect size, it is probably the relative effect. Like saying an ant is stronger than an elephant. In fact, a rule of thumb is that if you read a headline saying that a drug has a greater than 10 percent effect size, it is almost certainly the relative-effect size. There are many problems with evidence that make it difficult to trust, so we have to remain skeptical. Just as democracy may be the worst system of government except for all the others, systematic reviews of randomized trials are the best way we have to detect treatment effects. The other methods often amount to little more than stories or opinion.

Takeaway 1: Ask Your Doctor About Absolute Benefits and Harms

The next time you read something about a medical breakthrough or a new diet or exercise regimen, either from reading about it on social media or hearing it from your friends, do a little research. Is there a systematic review of randomized trials suggesting that it works? If there is no systematic review, is there one or more randomized trials? Is there evidence that it is harmful? Not everything you do has to be based on a systematic review of randomized trials, but you do need to ask some questions before believing things.

This is important because the chances are that at some point someone—either a medical professional or a media reporter—will try the relative-effect size trick on you. This is usually because they are confused themselves about the difference between absolute and relative risks. If anyone ever suggests that you take a treatment, you should ask:

  1. What will happen if I don’t take the treatment?
  2. What is the absolute benefit of the treatment for someone like me?
  3. What are the likely harms?
  4. Is the evidence behind the treatment biased?

Only when you have answers to these questions can you make the right choice about whether to take a treatment or not.

Here is the kind of dialogue someone with a medium risk of cardiovascular disease might have with a doctor when discussing the possibility of taking statins. You can—and usually should—do the same exercise for any medication you or your children might take, especially painkillers, antidepressants, ADHD treatments, and, as we will see in Chapter 7, for knee, hip, or back surgery.

To warn you in advance, I can’t tell you whether to take statins or some other treatment with a small average effect. The choice to take any treatment depends on the evidence and your values and circumstances, which are a matter of individual choice. This is a simple message, yet it is often forgotten. For you to make the best choice about a treatment, you need to know about absolute-effect sizes and the harms. And to know whether the average effects apply to you, you need to monitor your progress. With that in mind, here is the dialogue:

Doctor: According to your profile, you are considered to be at a medium risk of cardiovascular disease, which means you might have a heart attack or a stroke over the next ten years. Statins are recommended for people like you because they will reduce the chances of a fatal heart attack or stroke by about 14 percent.

You: Hmmm. If you don’t mind, I have a few questions. First, what will happen if I don’t take statins?

Doctor: Sure. Well, you are considered to be at a medium risk. In simple terms, this means that if we take one hundred people who are similar to you, about ten of you will have a serious heart attack or stroke in the next ten years if you don’t take statins.

You: Thank you. Now I’d like to know what will happen if I do take statins. What are my chances of having a heart attack or stroke if I do take statins, in absolute terms?

Doctor: The latest evidence suggests that the absolute benefit of statins for people like you is between 1 percent and 2 percent. So whereas ten out of a hundred people like you are likely to have a stroke or heart attack in the next ten years if you don’t take statins, only eight or nine of you would have a stroke or heart attack if you do take statins.

You: That makes sense. I have two more questions . . .

Doctor: Sure.

You: What are the likely harms of statins?

Doctor: Statins are quite safe for most people, and most people take them without experiencing side effects. That being said, about one in a hundred experience muscle pain or weakness; however, we aren’t sure whether the muscle pain and weakness is caused by the statins or whether people simply feel pain and think it is because of the statins. Statins may also induce diabetes in about one in a hundred people. In very rare cases—about one in a thousand people—statin therapy may cause strokes.

You: Thanks. Now my last question. I’ve read some things on the internet about how evidence is biased. Is the evidence you are citing for statins biased?

Doctor: That is a good question, and it is true that a lot of evidence is biased. In the case of statins there has been controversy, because some of the scientists who have produced the statin data have declared financial conflicts of interest and have not released all the trial data. And in other cases in which all the data has not been released and there are similar conflicts, we often end up learning that the benefits are exaggerated and the harms minimized. At the same time, a great deal of data has been published, so I don’t think a closer independent look at the data will reveal any big surprises. We may find that the benefits are slightly lower than we thought and that the harms are ever so slightly greater. However, statins are still likely to have a benefit for people like you with a medium risk of cardiovascular disease.

At this point the patient can answer in one of three ways, so there are three possible endings to this dialogue . . . There is no absolute right or wrong here, there is only right or wrong for you, and you need answers to these questions to know what is best for you.

Ending 1: Not Taking Statins

You: Thank you; that makes sense. I think I’m going to hold off on taking statins for now.

Doctor: That’s fine. In that case I recommend that you keep up your exercise and stick to a healthy diet. And we can monitor you in our regular checkups and see if your risk factors—and therefore the likely benefits of statins—change.

You: Thank you very much for your understanding. You have provided me with a good motivation to stick to my exercise regimen and reduce the amount of dessert I eat.

Doctor: That’s why I’m here. Have a great day.

Ending 2: Taking Statins

You: Thank you; I understand. Although the benefit is small, I’m happy to take a statin since it doesn’t seem like a big deal to take the pills.

Doctor: That makes sense—many people don’t see much of a downside and don’t mind taking the pills. I still recommend that you keep up your exercise and stick to a healthy diet. We can reevaluate at our next checkup.

You: Thank you very much for your understanding. I’ll try to keep up with my exercise and good diet.

Doctor: That’s why I’m here. Have a great day.

Ending 3: Wait and See

You: Thank you; that all makes sense. I think I need to think about it for a while.

Doctor: That’s fine. I understand your skepticism about taking statins, and, to be clear, the choice is yours. Many people don’t see much of a downside and don’t mind taking the pills with their breakfast cereal, but others don’t like taking pills. I’m going to give you the prescription anyway and you can choose whether or not to follow through with taking them. If you decide not to take them I recommend that you keep up your exercise and stick to a healthy diet. And we can monitor you. At some point if your risk level changes, we can reevaluate what you’d like to do.

You: Thank you very much for your understanding. You have provided me with a good motivation to stick to my exercise regimen and reduce the amount of dessert I eat.

Doctor: That’s why I’m here. Have a great day.

Takeaway 2: Quick and Easy Way to Check If There Is a Systematic Review or Randomized Trial

To really confirm whether something works, you would need to study critical appraisal and maybe even do your own research. This can take years. But you can make a great guess that is likely to be correct by looking up whether there is a randomized trial or systematic review of randomized trials. Here are three ways, starting with the easiest, to look up whether there is a systematic review of randomized trials:

  1. The easiest way is to do an internet search. For example, to find out whether marijuana cures depression, I typed in “systematic review randomized trial marijuana depression.” I could not find anything. Then I typed in “systematic review marijuana depression” and found a systematic review, but it was not of randomized trials.
  2. There are lots of websites that do the evidence search for you and summarize what they find in a user-friendly way. My favorite is NHS Choices, which is produced by the United Kingdom’s National Health Service: http://www.nhs.uk/pages/home.aspx. I typed in “depression” and found that they did not recommend marijuana for depression.
  3. If you are feeling ambitious, you can go to PubMed (https://www.ncbi.nlm.nih.gov/pubmed), which is a library of almost every medical trial or systematic review ever published. I typed in “marijuana depression systematic review random*” and fourteen results popped up. (I put an asterisk after the word random because you can spell randomized with an s or a z, and the asterisk tells PubMed to look for the word random with any ending.) One called “Cannabinoids for Medical Use: A Systematic Review and Meta-analysis” seemed relevant; it showed there was no high-quality evidence of a benefit of marijuana over a placebo for depression. (If you like to tinker, a sketch of how to run this search automatically follows this list.)
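Here is that sketch. PubMed has a free public search interface (the E-utilities), and a few lines of Python can run the query from step 3; the exact result count will change as new studies are published:

```python
import json
import urllib.parse
import urllib.request

# NCBI's public E-utilities endpoint for searching PubMed.
term = "marijuana depression systematic review random*"
url = (
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    "?db=pubmed&retmode=json&term=" + urllib.parse.quote(term)
)

with urllib.request.urlopen(url) as response:
    result = json.load(response)["esearchresult"]

print("Matching records:", result["count"])
print("First few PubMed IDs:", result["idlist"][:5])
```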

Takeaway 3: Determine Your Cardiovascular Risk and What You Can Do to Change It

This is a great website that will tell you what your risk of cardiovascular disease is and what you can do to change it: http://chd.bestsciencemedicine.com/calc2.html.

Have a look.