3

Follow the Evidence

When a journalist jumps out from behind a hedge on the eleventh hole of a Caribbean golf course to ambush a professor of medicine from one distinguished university and a professor of psychiatry from another after both gave brief lectures that morning, and he asks the men whether their company-sponsored trip would influence their judgment, the response will be “Of course not!” If asked what does influence them, these academics will confidently point to the published evidence. They follow the evidence, not the money. Even if a doctor is pocketing the lecture fee rather than putting it into a research fund, the answer will still be the same.

There is a rich literature on how even minor gifts can hugely influence the recipient,1 and an equally rich literature on how doctors, when asked, readily agree that their colleagues are swayed by favors while still denying that they themselves can be influenced.2 That gifts can corrupt seems so obvious to everyone except doctors that in 2009 a Sunshine Act was introduced in the US Senate to mandate disclosure by pharmaceutical companies of money or other gifts given to doctors or medical academics.3

The response from the medical community to moves like the Senate's bill that seem to impugn medical professionalism is bewilderment. While traditionally doctors have put great store in professionalism, at least as important in their response to this issue is their belief that scientific research is immune to gifts, biases, or conflicts of interest. It's taken for granted that data don't lie, and this leaves medical personnel and their academic colleagues sanguine about gifts, from pens to all-expenses-paid conferences in the Caribbean. If anyone in the media can point to a mismatch between what doctors on the golf course prescribe in their practice and the published evidence, that would be another matter—but until Birnam Wood comes to Dunsinane there is no reason for concern.

These days, almost as though Macbeth's witches had invested in the fixedness of Birnam Wood, pharmaceutical companies also take their stand on the published evidence. From senior executives mingling with medical academics to drug reps visiting medical trainees, company personnel tout the virtues of controlled trials and exhort doctors to practice evidence-based medicine at every turn.

In their first manifestations, controlled trials and evidence-based medicine were known for demonstrating that fashionable treatments did not work. If now pharmaceutical companies invoke what they call evidence-based medicine to justify the status quo, that should at least raise some suspicion about what might be going on. In chapters 4 and 5 we will look at how companies cherry-pick the trial data that suits them, leaving inconvenient data unpublished and making a mockery of science. But in this chapter the focus is on how they have managed to turn controlled trials inside out, neutering their potential to show that some currently fashionable drugs don't work and transforming them into a means to sell worthless remedies.

THE TURN TO NUMBERS

Insofar as the Hippocratic dictum of “first do no harm” appears a good idea, we should adopt the view that we know very little. But faced with real diseases from Alzheimer's dementia to malignant cancers to rheumatoid arthritis, when both those affected and their doctors become desperate and vulnerable, remaining calm and skeptical is easier said than done. This is precisely why we need to pin down how much we do know about medical treatments, and for this purpose controlled trials can be a godsend.

There are many situations where it is not difficult to know when a drug or a medical procedure works. In the case of alcohol, for instance, we comfortably base our judgments on the fact that we can see certain effects occur reliably soon after intake of the drug, with some relationship between how much we imbibe and the effects we experience. We know about the analgesic effect of opium and the sedative effects of the barbiturates in the same way as we know about alcohol and about the benefits of having a dislocated shoulder popped back into place or a kidney stone removed, from the immediate relief.

Not all treatments have effects that are this clear-cut, though, nor are all the effects even of alcohol or sedatives self-evident. While a sedative may be usefully sedating and in this sense work, it still may not be all that useful in the long run for nervous conditions. Even bigger problems arise when the beneficial effects of a treatment are less immediate than those of alcohol or opiates, or when there are substantial differences in the effect of a treatment from one person to the next, or when the natural history of a condition is such that some people would have recovered anyway. In these cases, treatments may appear to be linked to cures and doctors or patients, who understandably want to see a treatment working, may jump to the conclusion that the treatment has worked when in fact it hasn't. This is a gap that quacks and charlatans exploit. In a world where nothing much worked, the charlatan engendered many false hopes. In today's world, where getting the wrong prescription may mean you needlessly suffer harmful side effects or miss out on a treatment that really would have helped, the consequences may be more serious.

In these more ambiguous cases—of which there are many in medicine—how do you know whether or not a treatment is likely to have any benefits and what hazards it may entail? For several centuries at least some physicians appreciated that before a proposed new remedy could be accepted, its effects would need to be observed under controlled conditions. There would need to be some way to demonstrate that any improvement in patients was due to the remedy, and not to some other factor, or even to chance. Some group of patients living under similar conditions that could be divided into comparable groups, only some of which were given the remedy, would be needed. But short of imprisonment or quarantine, this is not easily done.

In 1747, James Lind, the Scottish physician who served as ship's doctor on the Salisbury, was given the perfect opportunity. Some of the sailors developed scurvy, a fearsome disease that caused tissue breakdown—wounds failed to heal, teeth fell out, and victims began to bleed into their skin or from their bowels. On some long-distance sea voyages up to half a ship's crew might die from it. Faced with twelve sailors whose “cases were as similar as I could have them” and who were all on the same diet, Lind was able to give two of the men cider, two vitriol, two vinegar, two sea-water, two oranges and lemons, and two a mix of nutmeg, garlic, mustard seed, and balsam of Peru. Those given the oranges and lemons improved rapidly; the others languished.4 Even though there was limited scope for any other explanation of these recoveries, many attributed the improvement to ventilation (although that was the same for all twelve patients), while even Lind himself was slow to credit the citrus fruits for the favorable response. It took the British navy another fifty years to include lime juice in the provisions for a voyage, after which British sailors were widely termed limeys. In this case, when faced with a choice between their expectations and the evidence, observers at first clung to their expectations—a pattern we shall see again and again.

On a much grander scale, but in a less controlled fashion, in 1802 in the midst of the revolutionary ferment in France, Philippe Pinel, the general physician now commonly seen as the father of modern psychiatry, working at the Salpêtrière asylum, became the first to commit medicine to an evidence-based approach. At the time typical treatments for mental illness included bloodletting, brutality, forced immersion in cold baths, being hosed down with water jets or subjected to a variety of purgatives, emetics, diuretics, and other drugs. Although there were elegant rationales for some of these treatments, and in some cases the treatments stemmed back to antiquity and had been advocated by history's most distinguished medical names, Pinel was skeptical. Patients often seemed to get better when the doctor waited to intervene, he had observed. Learning the typical course of a disorder, he reasoned, would make it possible to predict when patients might turn a corner for the better on their own. This appreciation underpinned his dictum that the greater art in medicine lay in knowing when to refrain from treatment.5

Between April 1802 and December 1805, 1,002 patients were admitted to the Salpêtrière, and Pinel was able to follow these individuals during their stay to see who recovered and who didn't, whether patients in particular diagnostic groups fared better than others—and hence whether diagnoses in use at the time were worthwhile or not. This was a first example of what later came to be called a statistical approach to illness. Why do it? Pinel laid out his reasons.

In medicine it is difficult to come to any agreement if a precise meaning is not given to the word experiment, since everyone vaunts their own results, and only more or less cites the facts in favor of their point of view. However, to be genuine and conclusive, and serve as a solid basis for any method of treatment, an experiment must be carried out on a large number of patients following the same rules and a set order. It must also be based on a consistent series of observations recorded very carefully and repeated over a certain number of years in a regular manner. Finally it must equally report both events, which are favorable and those which are not, quoting their respective numbers, and it must attach as much importance to one set of data as to the other. In a nutshell it must be based on the theory of probabilities, which is already so effectively applied to several questions in civil life and on which from now on methods of treating illnesses must also rely if one wishes to establish these on sound grounds. This was the goal I set myself in 1802 in relation to mental alienation when the treatment of deranged patients was entrusted to my care and transferred to the Salpêtrière.6

There had never been anything like this in medicine before. Overall, 47 percent of the patients recovered, Pinel found, but of those who had been admitted for the first time, who had never been treated elsewhere, who had a disorder of acute onset, and who were treated only using Pinel's methods, up to 85 percent responded. When left to recover naturally, many more of the first-timers did so than did those among the patients who had been treated previously by other methods. Not only that, within a short time of admission Pinel could tell who was likely to recover and who was not based on their clinical features. In other words there seemed to be different disorders, and people suffering from some types would recover if left alone while inmates with some other types would not regardless of what treatments they were given. Finally, following the patients after discharge brought a whole new group of periodic disorders into view for the first time, laying the basis for the later discovery of manic-depressive illness and other recurrent mental disorders.

Aware of the pioneering nature of his research, Pinel presented his data, on February 9, 1807, to the mathematical and physical sciences faculty at the National Institute of France rather than to the country's Academy of Medicine. This was hard science and the first time in medicine that results were presented as ratios across a number of patients studied, rather than as accounts of individual cases.

In reporting these findings, Pinel showed that he was well aware that his personal bias could have colored the results. But, as he noted, while an individual patient in London could not properly be compared to one in Paris or Munich, the results of complete groups of patients could be, and the registers of Salpêtrière patients were publicly available. So he confidently challenged others to contest his findings based on their outcomes.

The scientists were impressed. The physicians weren't. It took thirty years before another French physician picked up the baton and further unsettled the medical establishment with numbers. In 1836, Pierre Louis outlined a new numerical method that controlled for variations by using large numbers of patients: “in any epidemic, let us suppose five hundred of the sick, taken indiscriminately, to be subjected to one kind of treatment, and five hundred others, taken in the same manner, to be treated in a different mode; if the mortality is greater among the first than among the second, must we not conclude that the treatment was less appropriate, or less efficacious in the first class than in the second?”7

The treatment Louis assessed was bleeding—which in fact works well in disorders such as heart failure. But when he compared bleeding to doing nothing in a sufficiently large number of patients during the course of an epidemic, he sparked a crisis in therapeutics. Doctors expected bleeding to work better than doing nothing, but “the results of my experiments on the effects of bleeding in inflammatory conditions are so little in accord with common opinion [those who were bled were more likely to die, he found] that it is only with hesitation that I have decided to publish them. The first time I analyzed the relevant facts, I believed I was mistaken, and I repeated my work but the result of this new analysis remains the same.”8

These results led to howls of outrage from physicians who claimed that it was not possible to practice medicine by numbers, that the duty of physicians was always to the patient in front of them rather than to the population at large, and that every doctor had to be guided by what he found at the bedside.

Ironically, it was Louis and Pinel who were calling on physicians to be guided by what was actually happening to their patients, not by what the medical authorities traditionally had to say. As the marketers from GlaxoSmithKline and other companies might have told Louis and Pinel, though, for many physicians to be convinced there has to be a theory, a concept about the illness and its treatment, to guide the doctor. “The practice of medicine according to this [Louis's] view,” went one dismissal, “is entirely empirical, it is shorn of all rational induction, and takes a position among the lower grades of experimental observations and fragmentary facts.”9

Louis's struggles in Paris had their counterpart in Vienna where, in 1847, Ignaz Semmelweis noted that mortality was much higher on an obstetric ward run by physicians and medical students than on one run by student midwives. Suspecting that the physicians were coming to women in labor with particles of corpses from the dissection room still on their hands, he got them to wash more thoroughly with a disinfectant and was able to show that antiseptic practice made a difference. No one paid any heed. Some years later, in the 1860s, Joseph Lister introduced antiseptic practice to the Glasgow Royal Infirmary, and postoperative putrefaction rates subsequently declined. The later discovery that infection with bacteria led to putrefaction provided a concept to explain these observations, but until then Lister, like Semmelweis, had trouble getting his colleagues to take his findings seriously.

One of the weaknesses in these early manifestations of evidence-based medicine, as the examples of Pinel, Louis, Semmelweis, and Lister make clear, was their inability to shed much light on what lay behind the figures—they showed associations but explained nothing about cause. There are commonly tensions between broad associations of this type, the specific evidence that comes from laboratory experiments, the evidence of our own eyes, and what currently dominant theories may dictate. To the relief of most doctors, the tensions between broad associations and more specific evidence were eased to a degree with the emergence in the second half of the nineteenth century of laboratory science, which more clearly linked cause and effect.

THE CAUSES OF DISEASES

In the 1870s, a set of laboratory sciences emerged to form the bedrock of the new scientific and diagnostic work that would transform much of medicine and underlie the practice of doctors like Richard Cabot and the rise of hospitals such as Massachusetts General, as noted in chapter 1. Advances in bacteriology were among the key scientific developments that led to new treatments as well as hope that science would lead to further breakthroughs. In France, Louis Pasteur provided the first evidence that germs, later called bacteria, were the causative factors in a series of infections such as rabies,10 and he supplied both evidence and a rationale for vaccinations and antiseptic procedures.11 In Germany, Robert Koch set up the first laboratory dedicated to the pursuit of the microbial causes of disease, and his most famous protégé, Paul Ehrlich, who more than anyone else developed the dyes that helped distinguish among bacteria, later developed the drugs that killed some of them. It was Ehrlich who coined the term magic bullet, for a drug that would specifically target the cause of an illness and leave the patient otherwise unaffected.12 For generations afterward, until the 1960s, the glamour and importance of their discoveries and those of their successors, written up in books such as the Microbe Hunters, attracted students to medicine.13

In 1877, Koch transmitted the lethal disease anthrax by injecting noninfected animals with the blood of infected animals; he then isolated anthrax spores and demonstrated that these spores, if grown in the eye of an ox for several generations, could also cause the infection. Where Lister met resistance for recommending an antiseptic approach in surgery on the basis of a comparison of the numbers of infections with and without antiseptic conditions, Koch could show the existence of bacilli under a microscope and later growing on a Petri dish, and then demonstrate the efficacy of sterilization in killing the bacillus. Where it had been difficult to overcome resistance to revolutionary ideas about antiseptics using only comparative numbers, for many seeing was believing.

The impact on medicine of this new science of bacteriology and the germ theory of disease can be seen with wonderful clarity in the case of cholera. From the 1830s to the 1860s, before the role of germs in disease was recognized, a series of cholera epidemics struck Europe, killing tens of thousands. Because no one knew what caused this plague or how to protect themselves from a grisly death, there was widespread public panic. In 1854, in a now-celebrated series of investigations, John Snow, a London physician, mapped the appearance of the disease around London. He made a connection between clusters of those with the disease and contamination of the water supply and famously recommended removal of the handle from the pump in Broad Street so residents would get their water from other sources.14

Snow's work rather than that of Pinel, Louis, or Semmelweis is often cited as the first step in a new science of epidemiology, which maps the progress of a disease (or a treatment) through a population to pin down its course and its effects. But, while Snow is celebrated now, he was ignored at the time and the handle was not removed because he could not point to a cause. The association he proposed was but one of many competing theories. The data alone were not persuasive.

The later detection by Koch's group of a cholera bacillus in the drinking water of people who became ill both confirmed and undercut Snow's work. It confirmed Snow's suggestion of a link to the water supply rather than the other theories prevalent at the time. But it also made an approach to a disease like cholera that required tracking what happened to thousands of people over whole areas of a city seem crude and needlessly labor-intensive. Snow died two decades before Koch's work, but Lister, who in his antiseptic investigations had done something similar to Snow, came over to the bacterial theory of infections when it was demonstrated to him that bacteria caused putrefaction. Louis's and Snow's figures provide part of a story that needs to be matched with the evidence of our own eyes and the evidence that comes from the laboratory.

Koch's laboratory (or experimental) approach didn't triumph without a fight, however. Many at the time refused to believe bacteria caused disease. Max von Pettenkofer, a professor of medical hygiene in Munich, for example, argued that cholera was not caused by Koch's recently isolated bacillus but depended on an interplay of factors, many of which lay within the host. To demonstrate the point he brewed a broth of several million “cholera” bacilli and drank them, without suffering significant consequences. Faced with this challenge, Koch was forced to grapple with how we know something has caused something else. In von Pettenkofer's case, Koch argued that stomach acid had likely killed the bacillus; still, there was room for doubt.15

Koch's solution to von Pettenkofer's challenge and to the general problem of how to link cause and effect was to outline a number of rules. First, if you challenge (expose the person) with the cause, the effect should appear. Second, remove the cause and the effect should go. Third, a rechallenge should reproduce the effect. Fourth, the greater the exposure to the cause (the higher the dose), the more likely the effect should be. Fifth, an antidote to the drug (or bacterium) should reverse the effect. Sixth, there should be some temporal relationship between exposure to the drug or bacterium and the appearance of the effect. Finally, there should be some biological mechanism that ideally can be found that links the cause to the effect.

These are just the rules we now apply to deciding if a drug like alcohol or an industrial chemical has a particular effect on us and whether a controlled trial has shown a particular drug produces a benefit in a disease. Doctors attempting to make sense of what is happening to the patient in front of them will also use just the same rules. Whether direct observation by a doctor or a controlled trial run by a pharmaceutical company in hundreds of patients is the better approach depends on the question you're addressing. In the case of a patient with a suspected adverse drug reaction, when it is possible to vary the dose of treatment or stop treatment and rechallenge with a suspect drug, direct observation is just as scientific and may be much more informative than a controlled trial. In practice, however, if a hazard of treatment has not been revealed in the published results of controlled trials, doctors are likely to deny it is happening, despite the evidence of their own eyes. To see how this situation has come to pass we have to turn to the study of fertilizers and the origin of randomized controlled trials.

FERTILIZERS IN MEDICINE?

In the best possible sense, doubt is the business of an epidemiologist. From John Snow onward, statistical epidemiologists worth their salt can provide several explanations that might account for the associations found in a quantitative study. Ronald Fisher (1890-1962), a Cambridge mathematician who did most of his key work in developing modern statistical methods from the 1920s to the 1940s while associated with an agricultural college, was typical of the genre. Photographs commonly show him smoking. To the end of his life he argued that lung cancer was not caused by smoking: all we have for evidence, he pointed out, are numbers linking people who smoke with cancer, which may simply mean that people prone to cancer are also more likely to smoke, and correlations are not proof of causation.

Fisher's work centered on the question of whether the application of various fertilizers might improve grain yields. In grappling with how to determine this he introduced two ideas—randomization and statistical significance—that have come to dominate modern medicine. Indeed, they extend well beyond medicine and are worth looking at because they underlie so much of what is reported about science.

Misleadingly, a fertilizer may appear to increase the yield of grain if spread in an uncontrolled way: a myriad of soil, light, drainage, and climate factors might come into play, for example, and bias the results. Just as when trying to determine whether a drug works or not, the experimenter has to control for these known unknowns. Fisher anticipated Donald Rumsfeld's unknown unknowns by eighty years. But his key insight was how to control for these factors that he didn't know about—the way to do so was to allocate fertilizer randomly to the plants under study.

In early controlled drug trials, investigators allocated one male, say, to the new drug, the next one to placebo (a dummy pill), and so on, while also attempting to ensure there were an equal number of people of one age on the new treatment as on the control, or placebo, treatment. Now, in contrast, patients are divided into treatment and control groups according to a sequence of numbers that have been generated randomly before the trial starts. After the randomized controlled trial is over, investigators can check whether obvious factors like age and sex are distributed equally across treatment groups—which they invariably are. In one go, random allocation takes care of both the known and the unknown unknowns.
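For readers who like to see the mechanics, here is a minimal sketch in Python of the idea just described: allocation comes from a random sequence generated before the "trial" starts, and the obvious factors can be checked for balance afterward. The patient records and numbers are fabricated purely for illustration, not drawn from any real trial.

```python
import random
import statistics

# Illustrative only: fabricated patient records, not real trial data.
random.seed(42)
patients = [{"id": i,
             "age": random.randint(30, 80),
             "sex": random.choice("MF")} for i in range(200)]

# Generate the random allocation sequence before the "trial" starts.
allocation = ["drug" if random.random() < 0.5 else "placebo" for _ in patients]

drug_group = [p for p, arm in zip(patients, allocation) if arm == "drug"]
placebo_group = [p for p, arm in zip(patients, allocation) if arm == "placebo"]

# After allocation, check that known factors are roughly balanced;
# the same random allocation also balances factors nobody thought to record.
for name, group in [("drug", drug_group), ("placebo", placebo_group)]:
    mean_age = statistics.mean(p["age"] for p in group)
    pct_female = 100 * sum(p["sex"] == "F" for p in group) / len(group)
    print(f"{name}: n={len(group)}, mean age={mean_age:.1f}, female={pct_female:.0f}%")
```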

In addition to taking care of the unknown unknowns, randomization greatly reduces the number of subjects, whether plants or people, that need to be recruited to a study to get a clear-cut answer. At a stroke, the advent of randomization in controlled trials in the 1950s turbocharged epidemiology. It did away with the need to carefully balance controls by age, sex, social class, and ethnicity that made a nonrandomized approach slow and cumbersome because of the requirement for huge numbers of patients to produce a clear result.

Random assignment helped Fisher decide whether a fertilizer worked or not. But there is a key point to note. The question facing Fisher was whether a fertilizer would increase yield. Would there consistently be more bushels of grain, say, in the fertilized patches than in the unfertilized ones? This question is straightforward when the outcome of interest—in this instance, bushels of grain—is the only criterion. What, however, if the yield was greater but much of the grain was moldy? Would one still say the fertilizer worked?

Asked what it means to say a medical treatment works, most people would respond that it saves lives. Even if a treatment comes with side effects, staying alive ordinarily trumps these, and conversely death trumps any benefits. But many medicines, perhaps most, do not save lives. Once we look at outcomes other than death, we move into an arena where competing values come into play. We may not want the type of sleep induced by a hypnotic or the sex life produced by Viagra. The extreme, as we shall see in later chapters, is when we end up with claims by companies that a new treatment “works” because it can be shown to have effects on things the company values—even though the same clinical trials throw up more dead bodies on the new drug than on the placebo, as is the case for the cholesterol-lowering statin drugs, the Cox-2 inhibiting analgesics, blood-sugar-lowering drugs, beta-agonists for asthma, along with various antidepressants and antipsychotics. This happens when trials throw up a trivial but clear-cut and marketable benefit, with indicators of more problematic risks that companies ignore. This complex mix of benefits and risks is in fact the case for almost all of the current best-selling drugs in medicine, but all most doctors or patients get to hear about are the benefits.

In cases like these, the word “works” becomes ambiguous. When a life is saved, we know where we stand and all observers can agree on what has been done. When there is an obvious and immediate benefit, such as the effects of Viagra on a penis, or the sedative effect of a sleeping pill or anesthetic, we can all agree these effects are present and many of us feel we can make up our own minds as to whether we want such an effect. But few of the currently best-selling drugs in medicine have benefits as obvious as these.

An early indicator of this new world into which medicine began moving in the blockbuster era comes from a 1982 English study in which Sanjeebit Jachuk and colleagues looked at the perceptions of seventy-five doctors, seventy-five patients, and seventy-five relatives of the effects of propranolol, the beta-blocker for which James Black won a Nobel Prize. All the doctors surveyed reported propranolol was an effective antihypertensive: they saw the column of mercury in the blood pressure apparatus falling from visit to visit when they examined the patients, which was what they were hoping to see. The patients split down the middle in their responses to the drug they were taking: half reported benefits, half reported problems. Aside from thinking the drug was working because their doctor was happy, it's difficult to know why the patients reported benefits: raised blood pressure (hypertension) has no symptoms, so no one will have felt better on that account. But the investigators also consulted relatives, and all bar one of these reported that treatment was causing more problems than benefits—patients were now either complaining of treatment side effects or the process of diagnosis had made them hypochondriacal.16

So who—doctor, patient, or relative—was right? Reducing blood pressure can save lives, by reducing the likelihood of heart attacks and strokes, although statistically it may require hundreds of patients to be treated to save a life compared to people not on the drug. Companies don't collect data on outcomes like quality of life, novel side effects or relatives' impressions of benefits, however. When we are not in a position to make up our own mind about a benefit on the basis of seeing people get up off their death bed and walk or seeing the obvious effects of a hypnotic or Viagra, we become more dependent on the interpretation and judgment of our doctors, who in turn have become ever more dependent on pharmaceutical companies to interpret the effects of the drugs they produce. At the heart of those drug-company interpretations lies their use of Fisher's second innovation, the idea of statistical significance, a technique used to hypnotize doctors into focusing only on the figures that suit companies.

HYPNOTIZING DOCTORS

Fisher was an unlikely ally for a pharmaceutical company. He was a skeptic. His basic approach was to assume a new fertilizer didn't work. This is called the null hypothesis. It was only when the yield from plots fertilized with the new agent beat the plots with no fertilizer in nineteen cases out of twenty that he thought we could rule out the play of chance in the results and should concede that the new agent plays some role. When the yield from the fertilized plot is greater nineteen times out of twenty, by Fisher's determination, the result is said to be “statistically significant.” All this means is that the higher yield of the fertilized fields is unlikely to be due to chance. When applied to a drug and a placebo, a meaningless difference may be significant in this sense—but calling it significant leads most people to assume that there is in fact a substantive difference.
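A small simulation conveys what the one-in-twenty threshold amounts to in practice. In the sketch below, written in Python, the "fertilizer" is deliberately inert, yet roughly one experiment in twenty still crosses the conventional significance line by chance alone. The plot counts and repetitions are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(1)

def permutation_p_value(treated, control, n_perm=500):
    """Estimate how often a label-shuffled difference in mean yield is at
    least as large as the observed one (a two-sided permutation test)."""
    observed = abs(statistics.mean(treated) - statistics.mean(control))
    pooled = treated + control
    n_treated = len(treated)
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_treated]) -
                   statistics.mean(pooled[n_treated:]))
        if diff >= observed:
            hits += 1
    return hits / n_perm

false_positives = 0
n_experiments = 200
for _ in range(n_experiments):
    # The "fertilizer" is inert: both sets of plots are drawn from
    # exactly the same distribution of yields.
    treated = [random.gauss(100, 10) for _ in range(10)]
    control = [random.gauss(100, 10) for _ in range(10)]
    if permutation_p_value(treated, control) < 0.05:
        false_positives += 1

print(f"'Significant' results from an inert fertilizer: "
      f"{false_positives} of {n_experiments} experiments (about 1 in 20 expected)")
```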

Ironically, statistical significance was a side issue for Fisher, for whom the key issue was whether we could design good experiments or not, ones that would yield similar results time after time. As he put it, “No isolated experiment, however significant in itself, can suffice for the experimental demonstration of any phenomenon…. In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment that will rarely fail to give us a statistically significant result.”17 Perhaps because they can design conclusive experiments, branches of science such as physics and chemistry rarely use the concept of statistical significance.

The idea of significance testing was picked up from the 1950s onward primarily in sociology, psychology, economics, and medicine, disciplines where designing conclusive experiments is much harder because of the complexities involved. It may have triumphed in these arenas because it appeared to offer scientific rigor.18 The procedure creates the impression that the scientists have been forced to stand back and let the testing procedure objectively bring out what the data demonstrate. This can send a powerful signal in arenas that are heavily contested—but the signal is a rhetorical maneuver rather than something that in fact does guarantee objectivity. In many instances significance testing in these sciences has become a mechanical exercise that substitutes for thought. Experiments are considered good, it seems, if they throw up “significant” findings, even if the findings are trivial and cannot be reproduced.

When drug trials throw up a “significant” finding on cholesterol levels or bone densities, say, companies rush out a story that their drug “works,” even though in 50 percent of the trials they've run, the drug may not beat the placebo, or there may be more dead bodies in the treatment group than in the placebo group. Companies can do this in part because regulators in the United States such as the FDA, and in Europe, have established an incredibly low bar for a drug to be allowed on the market: only two trials with statistically significant positive results are needed to let a pharmaceutical company put a drug on the market, even though there might be up to ninety-eight negative studies. Given that Fisher expected that five in a hundred studies might be positive by chance, this turns the idea of statistical significance inside out. Of the studies done with the antidepressants, for instance, 50 percent show no benefit for the drug compared with the placebo. Fisher almost certainly would have thought that those investigating such drugs simply did not know what they were doing scientifically.
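To put a rough number on this point, here is a short calculation under simple assumptions: independent trials, a conventional one-in-twenty false-positive rate, and a drug with no real effect at all. The trial count of one hundred is the hypothetical ceiling mentioned above, not a figure from any actual program.

```python
from math import comb

alpha = 0.05        # conventional false-positive rate per trial
n_trials = 100      # suppose a wholly ineffective drug is tested 100 times

def prob_at_least(k, n, p):
    """Probability of k or more 'positive' trials out of n, each positive
    with probability p, assuming the trials are independent."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

expected_positives = n_trials * alpha
print(f"Expected chance positives: {expected_positives:.0f}")
print(f"P(at least 2 'positive' trials): {prob_at_least(2, n_trials, alpha):.2f}")
# Roughly 0.96: an ineffective drug tested often enough will usually
# yield the two 'statistically significant' trials regulators require.
```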

Significance testing also explains how companies are able to get away with claims that treatments work even when more people die who are given the drug in a trial than those given a placebo. Trials will be set up so that findings of lowered cholesterol levels with treatment, for example, will be statistically significant while the increase in dead bodies may not be. Doctors, like many others not well versed in mathematics, clutch at the illusory certainty offered by findings that are statistically significant, even if these are trivial, on the grounds that these results could not have arisen by chance. Fascinated with significance, they also tend mentally to dispose of any evidence of accumulating harms brought on by the same treatments, as we shall see in chapter 7, by denying they exist—on the basis that findings that are not statistically significant could have arisen by chance. They are hypnotized.

There has always been a deep-rooted tension between medical care and medical business. Good medical care once firmly embraced the idea that every remedy was a potential poison that inevitably produced side effects—the trick lay in knowing how and to whom to administer this poison in order to bring about a benefit that warranted these side effects. But insofar as they are looking after their business interests, medical practitioners have always been inclined to embrace new drugs, and most doctors want more of them if they can be convinced, or can convince themselves, that they have some positive effect. This produces a bias against seeing when these very same drugs shorten lives. In the era of evidence-based medicine, the marketing barrage of the pharmaceutical companies and the promise of statistical significance have led doctors into a world in which they regard treatments more as fertilizers or vitamins that can only do good if applied widely. As a profession, medicine is thereby losing any sense of the treatment as poison, and controlled trials, which began as a method to protect patients from the biases of doctors, have become instead a method to enhance business in great part because drug companies have managed to hook doctors to the crack pipe of statistical significance.19

TAMING CHANCE

Because it was fatal in minuscule doses, strychnine was a favorite of poisoners. In 1852, Pierre Touery claimed that activated charcoal could act as an antidote to strychnine, but his medical colleagues were not convinced. To prove his point, Touery, in front of a large audience on the floor of the French Academy of Medicine, ingested some activated charcoal and then drank ten times the lethal dose of strychnine—without any ill effects. Nobody is likely to think now that we need a randomized controlled trial to make a convincing case for using activated charcoal in a situation like this. Neither Worcester nor Cabot nor any of their colleagues in Massachusetts pressed for controlled trials for diphtheria antitoxin after its introduction in the 1890s, when the garroting membranes the illness produced could be seen to dissolve almost immediately after the treatment was administered.

When treatments are unambiguously effective, we don't need a controlled trial to tell us so. But randomized trials have become such a fetish within medicine that they are now a source of parody. A 2003 British Medical Journal article, for example, suggested that we should not be using parachutes because their efficacy hadn't yet been demonstrated in a placebo-controlled trial.20

The perverse humor here extends into almost every encounter between doctors and patients today. If tomorrow our doctor suggested putting us on a treatment that he said had not been shown to work in a controlled trial but that he had seen work with his own eyes, most people would likely refuse. In contrast, we would likely be reassured if he suggested he was only going to treat us with drugs that had been shown by controlled trials to work. We would be even more reassured if he told us that thousands of people had been entered into these trials, when almost by definition the greater the number of people needed in a trial, the more closely the treatment resembles snake oil—which contains omega-3 fatty acids and can be shown in controlled trials to have benefits if sufficiently large numbers of people are recruited.

How has it happened that our understanding of trials has in some sense been turned inside out? The story starts with the discovery of the antibiotics. The first real magic bullet that made the rest of modern therapeutics possible came in 1935 with the discovery of the sulfa drugs.21 Before the antibiotics were discovered a range of bacterial infections leading to conditions such as bacterial endocarditis (an infection of the lining of the heart), puerperal sepsis (an infection of mothers after childbirth), and septicemia (an infection of the blood stream) were commonly fatal. Sulfanilamide and, later, penicillin transformed this picture. Patients who were expected to die got up off their bed and walked out of the hospital. Neither doctors nor regulators required trials to show these drugs worked—the only onus on companies was to establish that their treatment was safe.

Wonderful though drugs like penicillin were, they failed when it came to the infection that terrified people at the time more than any other—tuberculosis—and it was this failure that led directly to the first randomized controlled trial in medicine. Part of the problem was that tuberculosis was a more chronic and insidious infection than bacterial endocarditis or puerperal infections, so it was harder to tell when the treatment was working. Where other infections came on dramatically and either cleared up or killed quickly, tuberculosis crept in on its victims, who might then have good spells and bad spells. Even sputum samples clear of the bacterium did not provide a foolproof answer to the state of the illness.

Every new treatment devised at the time was tested against tuberculosis—even the first antidepressants and antipsychotics—many of which worked in the test tube but not in patients. There had been endless claims for cures which had been shown repeatedly to be hollow in clinical care. So when Merck developed a novel class of antibiotic in 1945, of which streptomycin was the prototype, skepticism was called for. Austin Bradford Hill, who became the statistician at Britain's Medical Research Council (MRC) because tuberculosis had ruled him out of a career in medicine, suggested a controlled trial of the new drug using Fisher's ideas about randomization.

There were concerns about the ethics of what Hill was proposing. If the drug in fact turned out to be effective, were those who got the placebo being denied critical care? It was one thing to allow unfertilized plants in an agricultural field to languish, but treatment trials had never left sick people untreated before. As it turned out, streptomycin was not as dramatically effective against tuberculosis as penicillin had been against bacterial endocarditis, but it could not be said that the drug didn't work. The trial demonstrated that streptomycin made a difference to the patient clinically, reducing the amount of tubercle bacillus growing in sputum and the number of tubercular cavities visible on X-ray, and more patients on streptomycin survived. Running a randomized controlled trial in this case brought the effects of streptomycin into view in a way that would not have happened otherwise.

In the 1950s, even in the case of such clearly effective drugs as penicillin, the antipsychotic chlorpromazine, and the amphetamines, which, once developed, swept around the world, crossed frontiers and language barriers with little or no need for marketing, trials offered something useful. While there was no question penicillin worked on some bacteria, it was clear it did not work on all. And while chlorpromazine tranquilized, there could be real questions about which patients would most benefit from it. Similarly, the amphetamines obviously increased alertness and suppressed appetite, but did they make a difference in a medical condition? When controlled trials were conducted, it turned out that amphetamines were of benefit for narcolepsy, a condition where people abruptly fall asleep sometimes in mid-conversation, and possibly produced benefits in some neurotic states but did surprisingly little for severe depressions.

In assuming treatments don't work, controlled trials challenge therapeutic enthusiasm. Because surgery is such a clear physical stress to any body, many supposed that pre- and post-operative treatment with beta-blockers such as propranolol, which counter the effects of stress hormones on heart rate and blood pressure, could only be a good thing. This was so logical, it had become standard practice. But when the proper study was finally done, it was found that there were more deaths in the beta-blocker group.22 Similarly, it seemed obvious that treating the anemia that develops in the wake of renal failure would be helpful and likely prolong life expectancy, but when the first randomized trial was undertaken more patients died on the high-cost treatment (Aranesp) to relieve anemia than died in the placebo group.23

Demonstrations that treatments don't work typically come in trials sponsored by national health organizations or other not-for-profit research institutions rather than company trials. But there are exceptions. Given strong suggestions that anti-inflammatory drugs might help Alzheimer's disease, Merck decided to pursue a jackpot by investing millions of dollars in a trial to see if Vioxx would reduce the incidence of or slow the rate of progression of dementia. In fact on Vioxx (and later Celebrex), more patients developed Alzheimer's, the disease progressed more rapidly, and more patients died than on the placebo.24

If all medicines are poisons, an outcome like this is not surprising. Simply recognizing that biology is complex highlights the risk of intervening and the need to test our assumptions and practices, no matter how benign the rationale for a particular approach might sound. An insistence on testing is exactly the spirit that gave rise to randomized controlled trials. They began as a means to control therapeutic enthusiasm, whether this enthusiasm came from the good intentions of physicians or from the greed of hucksters. What is there, then, about these trials that make companies so interested?

MIND THE GAP

In between treatments that are so obviously life-saving that trials are not needed and proposed remedies where trials save lives by demonstrating that the treatment doesn't work, there is the huge gap in which we have treatments that ease pain or restore function or promise some other benefit, even if a modest one. In the case of treatments that do not necessarily save lives but which equally cannot be dismissed as doing nothing, we are in much less certain waters than is usually realized. Controlled trials in these instances function primarily to bring to light both positive and negative associations between treatment and changes on a blood test or rating scale. It is in these waters that pharmaceutical companies have become adept at turning the evidence to their advantage.

Imagine an orthopedic department starting a trial on plaster casts for fractures of the left leg. As their placebo treatment, they put a cast on the necks of the control group, while in the active treatment group they randomly put casts on the right arm, right leg, left arm, or left leg of the patients, all of whom have broken left legs. The active treatment group in this case would do statistically significantly better than the placebo group, but to advocate treating left-leg fractures by indiscriminately putting a cast on any of the four limbs, on the basis that a randomized controlled trial had clearly shown this worked, would be nonsensical.25 Medicine in thrall to randomized controlled trials increasingly lets companies get away with just this, partly because an artful use of rating scales or blood tests conceals the fact that we don't know what we are doing. When we do know what is wrong, the absurdity of simply practicing according to the figures becomes clear.

The plaster-cast example is not much more extreme than what in fact did happen in the case of the antidepressants.

When companies or their academics say today that a drug “works,” what is commonly meant is that there is at least a minimal difference that is “statistically significant” between the effects of an active drug and a placebo on a blood test or rating scale. Evidence like this, rather than evidence of lives saved or function restored, is all that the regulators need to let the drug on the market. Once approved for the market, the drug, be it for osteoporosis, cholesterol regulation, depression, or hypertension, is sold as though using it is the equivalent of being given penicillin or insulin. The problem is that increasingly, under the influence of company spin as to what the figures show, clinicians seem to prescribe drugs like the statins or antidepressants as though a failure to prescribe would leave them as open to a charge of clinical negligence as failing to prescribe insulin or penicillin would. The magic for companies lies in the fact that the numbers of patients recruited to the trials can be such that changes in rating scale scores or bone densities are statistically significant, whereas increased rates of death or other serious adverse events on treatment may not be.

When it comes to treating people who are supposedly depressed, antidepressant credentials in the form of comparable changes on depression rating scales have been generated for most of the benzodiazepines, for a number of antihistamines, for almost all the stimulants, as well as for the antipsychotics and anticonvulsants,26 and they could be generated for nicotine or indeed for snake oil, whose omega-3 oils appear to have some psychotropic properties. A key difference between these diverse drugs and the selective serotonin reuptake inhibitors (the SSRIs) such as Paxil, Prozac, and Zoloft was that the SSRIs were newly patented for treating depression, while drugs like nicotine or the antihistamines were unpatentable for this purpose. There was no incentive for companies to bring these latter drugs to the market but no reason to believe these drugs would be any less helpful than Prozac for depression. In the case of Prozac and Paxil, there is evidence of a weak association between treatment and a change on a rating scale, but the question is what lies behind that change. The fact that so many quite different drugs can also be linked to a comparable benefit shows we know next to nothing about what is going on.

This is where the role of a mythic image of what a drug is supposed to do (a concept) can be of great importance to a marketing department. No one claims nicotine or benzodiazepines correct a lowering of serotonin in depression, whereas the SSRIs supposedly do. The idea that there is an imbalance of serotonin in depression is completely mythical. It arose in the marketing department of SmithKline Beecham, the maker of Paxil.27 The key thing about this myth is that it provides an image that functions like the imagery of bacterial colonies in a Petri dish shrinking back from an antibiotic, or images of cholesterol levels declining following treatment with statins, or bones becoming denser with biphosphonate treatment for osteoporosis. These images help create the impression that drugs “work,” when in fact the data from trials show these treatments have relatively minimal effects. These images create a spin that no data can overcome. Myths always have the last word.

How minimal are the treatment effects? In 2006 the FDA asked companies making antidepressants to submit all placebo-controlled trials to the agency. Just as some people recover from infections without treatment, some recover from depression without it: based on 100,000 patients who had been entered into these antidepressant trials, the data showed that four out of ten people improve within a few weeks whether treated with a drug or not.28 This may in part be due to the natural history of depressions in which 40 percent recover within a few months whether treated or not. Advice from a clinician on diet, lifestyle, alcohol intake, and problem solving on work and relationship issues may make a difference. Perception by patients that they are being seen and cared for by a medical expert may also make a difference, and this effect may be enhanced by being given a substance they think will restore some chemical balance to normal—even if that imbalance is mythical and the substance a placebo. On the active drugs, five out of ten apparently responded. But what comparing an active drug to a placebo shows us is that four of these five apparent responders (80 percent) would have improved had they received the placebo. In other words, only one in every ten patients responds specifically to the antidepressant, whereas four in every ten treated with a placebo show a response.
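The arithmetic behind these figures can be laid out in a few lines. The sketch below simply restates the rounded response rates quoted above; it is not a reanalysis of the FDA data, and the "number needed to treat" label is the standard epidemiological term for the final quantity.

```python
# Rounded response rates quoted in the text (not a reanalysis of FDA data).
placebo_response = 0.4   # improve within a few weeks without an active drug
drug_response = 0.5      # improve within a few weeks on an antidepressant

# Response attributable to the drug itself, beyond placebo and natural course.
drug_specific = drug_response - placebo_response           # 0.1 -> 1 in 10

# Among people who improve on the drug, how many would have improved anyway?
would_have_improved_anyway = placebo_response / drug_response   # 0.8 -> 80%

# "Number needed to treat": patients treated per one drug-specific response.
nnt = 1 / drug_specific                                     # 10

print(f"Drug-specific responders: {drug_specific:.0%} of those treated")
print(f"Apparent responders who would have improved on placebo: "
      f"{would_have_improved_anyway:.0%}")
print(f"Number needed to treat for one drug-specific response: {nnt:.0f}")
```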

If clinicians were really following the evidence, they should say that it's wonderful to have some evidence that antidepressants have benefits, but they would hold back on prescribing them indiscriminately and give a number of their patients a chance to recover without treatment. There is good reason to believe that many of those who recover without drug treatment are less likely to relapse in the longer run, which provides even more reason to wait judiciously in at least some cases.29 Given that the benefits obtained in the one out of ten are bought at a cost—overall more die on treatment than on placebo, more become dependent on treatment than on placebo, more on treatment have children born with birth defects than do those on placebo, and many other side effects occur—the antidepressants arguably provide the perfect set of data to support Pinel's dictum that it is important to know when not to use a treatment.

On grounds of self-interest, there are good reasons for doctors to wait in many nonacute cases. Until recently the magic was in the therapist, who might also give pills, which were an extension of his or her impact on us. Now the magic has passed into the capsule and the physician often seems little more than a conduit for medication. Therapists have forgotten how influential they might be in promoting healthier lifestyles for conditions from raised cholesterol to the inevitable but relatively inconsequential thinning of bones that happens after the menopause. With the focus that both doctor and patient now have on taking a pill, seldom does either heed the context in which a person has become distressed or unhealthy. Neither doctor nor patient appears to see how small a contribution this chemical manipulation is likely to make or to see the potential for a chemical manipulation to make things worse. In practice, doctors end up so often doing what suits drug companies: they persuade patients to go on treatments. Why? In no small part because they have become convinced that these treatments have been shown in randomized controlled trials to work.

A consideration of these nondrug aspects of medical care doesn't just apply to drugs like the antidepressants. An antibiotic like penicillin might make a life-saving difference, but it's important to note that this may not be the only route to saving a life. Once the infection begins, an antibiotic may be by far the best way to help, and we would sue a doctor who let a patient die without treatment. But in the case of puerperal infections, long before the advent of penicillin, it had become clear that women were only likely to contract these disorders if they gave birth in hospitals where the infection could be transmitted readily from one woman to the next. Strict antiseptic procedures in hospitals could help, but giving birth outside the hospital made an infection much less likely.

At the moment doctors appear to be under increasing pressure from insurance managers, hospital bureaucrats, and others to hand out drugs in response to medical problems. If patients aren't on a treatment, they aren't in treatment. No one, not even an insurance manager, would want to be linked to unnecessary deaths. In using drugs that have been shown to “work” in a statistically significant fashion, all concerned think they are avoiding this possibility. But while penicillin can clearly be shown to save lives, the same clarity can't be found with antidepressants, statins given to people with no prior cardiovascular events, asthma drugs, or treatments for osteoporosis and many other conditions. In all these cases, a shrewd selection of statistically significant changes on rating scales or blood tests as evidence that the treatment “works” has been used by pharmaceutical companies to mesmerize all the key players.

“Working” in the case of all the best sellers in medicine, it bears repeating, means the drug produces changes on some measurement of interest to a drug company, rather than indicating the drug saves a life or returns someone to employment, or is better than an older drug in the field, or even makes a person simply feel better. When in the course of these trials patients are allowed to rate whether their quality of life has been improved, in results reminiscent of Sanjeebit Jachuk's study of propranolol, antidepressants, for instance, don't show any benefit over placebo. Such quality-of-life data from antidepressant trials are little known, however, because they remain almost universally unpublished.30 The bottom line is that while placebo-controlled trials have created appearances that the drugs work, with a few changes to the choice of rating scales or blood tests in these studies, or by taking into account the withdrawal effects many of these drugs have, it would be possible to show just the opposite for most of medicine's blockbusters.

There is a fundamental psychological issue here on which companies play, an issue illuminated by a series of experiments Daniel Kahneman and Amos Tversky conducted in the 1970s on what happens when we are asked to make judgments under conditions of uncertainty.31 Kahneman, who later won a Nobel Prize for this work, and Tversky gave descriptions of a shy, retiring, and bookish personality to their test subjects and asked them to judge whether the person was a nurse or a librarian, having told them the personality profile had been drawn from a group that contains eight nurses and two librarians. Their subjects confidently said the person described was a librarian, when, given the probabilities, they should have said nurse. In the same way, statistics like those mentioned from the antidepressant trials (in which five out of ten seemed to improve on the drug, but closer inspection revealed that improvements in four out of those five could as well be due to placebo effects) should lead us, given the overwhelming odds, to attribute a positive response in a patient to a placebo effect. But like the subjects who chose librarian, we're more likely to jump to the conclusion that the antidepressant must have been the cause.
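The base-rate point can be made concrete with a small calculation. The likelihoods below are illustrative assumptions, not figures from the experiment: even if we grant, generously, that a librarian is three times as likely as a nurse to fit the bookish description, the base rates still favor "nurse."

```python
# Base rates from the description given to subjects: 8 nurses, 2 librarians.
n_nurses, n_librarians = 8, 2

# Illustrative assumption (not from the study): a librarian is three times
# as likely as a nurse to fit the "shy, retiring, bookish" description.
p_fits_given_librarian = 0.6
p_fits_given_nurse = 0.2

# Bayes' rule: weight how well the description fits by how common each group is.
p_librarian_and_fits = (n_librarians / 10) * p_fits_given_librarian
p_nurse_and_fits = (n_nurses / 10) * p_fits_given_nurse

p_librarian_given_fits = p_librarian_and_fits / (p_librarian_and_fits + p_nurse_and_fits)
print(f"P(librarian | fits description) = {p_librarian_given_fits:.2f}")
# About 0.43: even with a description that fits a librarian far better,
# the base rates mean that "nurse" remains the better bet.
```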

As drug marketers know, we are all more confident with stereotypes than with rational analysis of the probabilities of a situation. When we see patients on a pill recover, probably because of powerful examples like those of penicillin and insulin, we assume the recovery has come about because of the pill. This bias may be reinforced by hearing “experts” claim that antidepressants or statins work or by seeing these claims in what are considered authoritative publications. A mythic image of increasing bone density or normalizing serotonin levels or lowering cholesterol levels helps increase our certainty.

Neither clinicians nor patients are well equipped to make judgments based on data. Our psychology biases us against seeing what the data actually show, and this bias is aggravated by the selective publication of company trials that indicate a “positive” response to the drug and, ironically, by an apparatus put in place to ensure doctors adhere to the “evidence.” These factors have increasingly led to an almost automatic prescription of the latest drugs, whether they are statins, hypoglycemics, bisphosphonates, or psychotropic drugs.

COMPANY TRIALS

The job of medicine is to save lives, restore function, or improve on treatments already available. The aim of a drug company is to get its drugs on the market and to generate profits by doing so. To see if a new treatment saves more lives or performs better than an older treatment, the obvious step is to compare the two. To get on the market, you could demonstrate superiority to an older treatment, but to satisfy the FDA or regulators in Europe or Japan you only have to beat the placebo. And if you recruit ever larger numbers of patients to trials, ever less clinically significant differences from placebo can become statistically significant. Perversely, this will lead to the newer and weaker drug selling even better than the older one.
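What this looks like in practice can be sketched with a few lines of arithmetic. The response rates below, 52 percent on drug versus 50 percent on placebo, are invented for illustration; the point is that the identical two-point difference moves from unremarkable to “statistically significant” purely because more patients are recruited.

```python
# A minimal sketch of how recruiting more patients turns the same tiny
# drug-placebo difference into a "statistically significant" one.
# The 52% vs 50% response rates are invented for illustration.
import math

def two_proportion_test(p_drug, p_placebo, n_per_arm):
    """Normal-approximation z statistic and two-sided p-value for the
    difference between two response rates with equal-sized arms."""
    pooled = (p_drug + p_placebo) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    z = (p_drug - p_placebo) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return z, p_value

for n in (100, 1000, 10000):
    z, p = two_proportion_test(0.52, 0.50, n)
    print(f"{n:>6} patients per arm: z = {z:.2f}, p = {p:.3f}")
# 100 per arm:    p ~ 0.78  (nothing to see)
# 1,000 per arm:  p ~ 0.37  (still nothing)
# 10,000 per arm: p ~ 0.005 ("significant", yet the difference is unchanged)
```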

Almost all the drug trials now conducted are run by manufacturers to get their compounds on the market or to establish a market niche. Once completed, these trials mark the point at which any science stops—a drug has been shown to “work,” companies say, and the job of doctors is now to prescribe it—whereas entry into the market should mark the point where science begins to establish who benefits from which drug. If only a small number of people respond specifically to a statin for cholesterol levels or a bisphosphonate for osteoporosis, we should be identifying who these specific responders are. Until we answer this we remain in the position of congratulating ourselves on the use of plaster casts when in three out of four cases they have been put on the wrong limb—but of course this is the question no company wants answered.

Even without fully understanding why a treatment helps, though, more can be done to improve how it is used. For instance, we have many different kinds of blood pressure medication, but almost no research to discover how they compare or who they suit. The first thiazide antihypertensives in the 1950s were succeeded by James Black's propranolol in the 1960s and 1970s, the ACE inhibitors in the 1980s, sartans in the 1990s, and a string of others, with each new drug marketed as the best. When the first proper head-to-head studies were done fifty years later, they showed that, in fact, the thiazides were the most effective and the safest.32 For fifty years we have used a succession of ever more expensive treatments, while the best and safest and least expensive treatments fell out of favor.

Similarly, the SSRIs in clinical trials got far fewer severely depressed patients well than older antidepressants.33 But to get on the market they did not have to be compared to an older drug; they had only to beat the placebo. As a result of marketing, the more recent drugs have almost completely replaced older antidepressants, even for the most severe depressions, for which there is not a scrap of evidence the newer drugs work.

The story is similar for the analgesics, drugs for osteoporosis, blood-sugar-lowering drugs, the antipsychotics, and almost every other best-selling drug. Best sellers are not best sellers because they are in fact better than the drugs previously available. Yet in all these areas of treatment, doctors who are supposedly following the evidence make the latest drugs into best sellers.

There is more going on here than simply squandering money in the pursuit of the trivial (though profitable) new drug or giving patients suboptimal medicine. Company trials have radically changed how doctors treat patients. Before 1962, when the FDA stepped in and required companies to provide trial evidence to bring a drug to market, doctors had for centuries learned how to use a drug once it became available. Digitalis offers a good example. This drug works by removing excess fluid from the body in cases of heart failure, but as with all drugs, digitalis came with problems. So doctors, when giving it, would typically start at a low dose and work upward depending on how the patient responded.

But in company trials, no company is prepared to take a chance that their drug won't beat the placebo, so they err on the side of a higher or more poisonous dose. If these studies get the drug on the market, the trial results are then taken to mean that doctors should use the dose of the statin, analgesic, or antidepressant used in the trial, even though it is likely to be too high for many people. On the basis of early trials, the thiazides were given for hypertension in doses ten times higher than necessary, for example, while the lowest dose of Prozac for depression was four times higher than many people needed.34

Once a drug is launched, companies could run studies to find the right starting dose or determine which drug suits which kinds of patients, but marketing departments have resisted studies of lower doses of a drug—on the grounds that they want to keep things simple for doctors. Their dream is one-size-fits-all treatment, and they refuse to make lower dose formulations of a medicine available. In this way companies are, in essence, removing the craft as well as the art from medicine and encouraging overmedication.

But there are even bigger problems than this. Company trials trap both doctors and patients into treatment with the wrong medication. The antipsychotic group of drugs shows how badly wrong things can go. The first of these was chlorpromazine, a drug discovered in 1952 and widely cited as rivaling penicillin as a key breakthrough in modern medicine. It and the tranquilizers that succeeded it had a magical effect on manic and delirious states and on some acute psychoses, sometimes restoring patients to normal functioning within days or weeks, occasionally within hours. This was not a matter of small changes on a rating scale; there was no question but that chlorpromazine worked.35

Its French discoverers were sure, however, that it was not a cure for schizophrenia. In some cases it provided a useful degree of tranquilization, but in up to a third of schizophrenic patients it made their condition worse, and in a further third the benefits were minimal. Nevertheless, most company trials to bring the successors of chlorpromazine to the US market were undertaken in schizophrenia. When responders, minimal responders, and those who got worse were combined, the results for these drugs could be shown to be, on average, marginally better than for the placebo—which was all it took to get these drugs on the market. These results, reinforced by rebranding the tranquilizers as antipsychotics, made it seem these drugs “worked” for schizophrenia. For the FDA, and for most doctors, the one patient in three who gets worse on treatment vanishes into a statistical ether—but these patients don't vanish from the hospital and the clinic.
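The arithmetic of that averaging is worth spelling out. The sketch below uses invented rating-scale change scores, not figures from any actual antipsychotic trial, simply to show how a group average can edge past placebo while a third of the patients are getting worse.

```python
# A minimal sketch of how averaging hides the patients who get worse.
# The change scores (points of improvement on a rating scale) are invented
# for illustration: a third respond well, a third barely, a third worsen.
respond_well, respond_minimally, get_worse = 12, 2, -5
drug_average = (respond_well + respond_minimally + get_worse) / 3   # 3.0
placebo_average = 2.0                                               # assumed

print(f"Average change on drug:    {drug_average:+.1f}")
print(f"Average change on placebo: {placebo_average:+.1f}")
# The drug looks "marginally better than placebo" on average, even though
# one patient in three is worse off than if given nothing at all.
```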

The antipsychotics are drugs to use judiciously. They increase rates of heart attacks, strokes, diabetes, and suicides. Studies that have examined longer-term outcomes for patients on these drugs universally show a reduction in life expectancy measured in decades, not just years.36 This is not an argument against their use, but it is definitely an argument for ensuring that they actually produce benefits that warrant the risks undertaken. Unfortunately, even when faced with a patient who is not responding, or who is responding negatively, and for whom the treatment should therefore be stopped, the drug often keeps being given. Nursing staff, hospital administrators, relatives, and other doctors find it inconceivable that a doctor might not give a drug that “works” to a patient who is so clearly ill with the condition that the drug supposedly benefits—to the point that a refusal to prescribe is, in many settings, getting ever closer to being grounds for dismissal.

In similar fashion, the company trials suggesting that statins for lowering cholesterol, bisphosphonates for osteoporosis, and other such medicines “work” exert pressure on clinicians to prescribe and on patients to acquiesce in treatment. This pressure cannot simply be put down to corruption by the pharmaceutical industry. No one, it seems, wants a doctor to wait judiciously to see if a treatment is really warranted. We might like the idea of an Alfred Worcester in books or movies but not in real life.

In addition to these serious consequences for medical care, there is a huge ethical problem with company trials. Randomized controlled trials began in conditions of scarcity after World War II, when those who volunteered to be left untreated or to get a potentially dangerous new treatment did so for the sake of their families, their relatives, or the communities from which they came. They consented, in other words, because they thought they were helping to improve medical care, and in so doing they did in fact lay the basis for our current freedom from infectious diseases, malignant hypertension and other disorders, and our better life expectancies in the face of tumors. The same spirit is now invoked when patients are asked to consent to company-run trials. But they are not told that these studies are designed to secure a business advantage, that they will lead to marketing that often substitutes giving drugs for caring, that far from benefitting their communities the studies may result in treatments that shorten lives, or that data from these studies may be sequestered so that nobody ever finds out about the side effects of treatment. They never get a chance to decide freely whether to consent to this or not.

If the primary ethical as well as scientific purpose of controlled trials was initially to debunk unwarranted therapeutic claims, companies have transformed them into technologies that mandate action. The method originally designed to stop misguided therapeutic bandwagons has in company hands become the main fuel of the latest bandwagons. A method that is of greatest use when it demonstrates drugs either do not work or have minimal effects has become a method to transform snake oil into a must-use life-saving remedy. In the process, evidence-based medicine has become evidence-biased medicine.

EVIDENCE-BIASED MEDICINE

In 1972, two decades after randomized controlled trials came into use, Archie Cochrane, a physician based in Cardiff, Britain, who had worked with Austin Bradford Hill when the first clinical trials were being set up, published an influential book on the role of evidence in medicine. The vast majority of medical services and procedures still had not been tested for their effectiveness, he noted, while many other services and procedures that had been tested and shown to be unsatisfactory still persisted.37 Cochrane was a randomization extremist; in his view, not only doctors but also judges and teachers should be randomizing what actions they took to see what worked, but all three unfortunately had God complexes—they “knew” what the right thing to do was. As late as the 1980s, Cochrane claimed fewer than 10 percent of the treatments in medicine were evidence based.38

Cochrane made it clear that using controlled trials to evaluate treatments was not a matter of dragging rural hospitals up to the standards of Harvard or Oxford. On the contrary, mortality often seemed to him greater where there were more medical interventions, not fewer. After coronary care units (CCUs) came into fashion in the 1960s, for instance, he suggested randomizing patients who were having heart attacks to treatment in a CCU versus treatment at home. Cardiff physicians refused to participate on the grounds that CCUs were so obviously right, so Cochrane ran the trial in neighboring Bristol instead. When he first presented the results, he transposed them so that the home-treatment results, which were actually the better ones, appeared under the CCU column and vice versa. His audience demanded an instant halt to home treatment. The response was quite different when the “error” was corrected and it was made clear that the data favored home treatment. To this day there is a reluctance to believe that home care might be better than care in a CCU.

Iain Chalmers, a perinatologist and health services researcher from Oxford, picked up the baton from Cochrane. He was similarly struck that physicians often seemed slow to implement procedures that had been shown to work and instead stuck with approaches that had not been shown to work, or had been shown not to work. His concern lay not just in encouraging trials but in accessing the information from trials that had already been done.39 Everyone knew there had been an explosion in the medical literature since World War II, but efforts to collect reports of clinical trials began to reveal that there were far fewer published trials than many had thought. Some of the trials done had been published multiple times, while others had not been published at all.

Many of the articles that dictated clinical practice, furthermore, were framed as review essays published under the names of some of the most eminent academics in the field, but on closer inspection these often lengthy articles, with their impressively long reference lists, espoused only one point of view on a topic. These academics were not systematically considering all the available research, in other words. These were not scientific reviews—they were rhetorical exercises. Recognition that a scientific review should be systematic led Chalmers to set up the Cochrane Center in 1992, dedicated to amassing all available clinical trial evidence in every branch of medicine, even when the evidence had not been published.

It was David Sackett at Canada's McMaster University, outlining a program for educating medical students to practice according to the evidence, who branded the new dispensation evidence-based medicine.40 When it came to considering the evidence, Sackett drew up a hierarchy in which systematic reviews and randomized controlled trials offered gold standard evidence, while at the bottom of the hierarchy came individual clinical or anecdotal experience. This was a world turned upside down. Just a few years earlier, clinical judgment had been seen as the height of medical wisdom.

The implication was that we should submit every procedure to controlled trial testing. Even if newer treatments were more expensive as a result, in due course the health services would gain because money would be saved as ineffective treatments were abandoned and better treatments reduced the burden of chronic illnesses. This seemed to be a win-win claim for those paying for health services, for physicians and their patients, as well as for scientific journals. It quickly became almost impossible to get anything other than clinical trials published in leading journals.

When Cochrane advocated for randomized controlled trials, Chalmers campaigned for comprehensive collection of their results, and Sackett drew up his hierarchy of evidence placing trial results at the top, no distinction was drawn between independent and company trials. Controlled trials were controlled trials. It seemed so difficult to get doctors to accept the evidence that their pet treatments didn't work that any indication doctors were practicing in accordance with clinical trial evidence seemed a step in the right direction.

There are two problems with this approach. The first applies to both independent and company trials—namely, that we appear to have lost a sense that, other than when they demonstrate treatments don't work, what controlled trials do primarily is to throw up associations that still need to be explained. Until we establish what underpins the association, simply practicing on the basis of numbers involves sleepwalking rather than science—equivalent to using plaster casts indiscriminately rather than specifically on the fractured limb.

The second is that in the case of company trials, the association that is marketed will have been picked out in a boardroom rather than at the bedside. One of the most dramatic examples of what this can mean comes from the SSRIs, where the effects of these drugs on sexual functioning are so clear that controlled trials would be merely a formality. In contrast, hundreds of patients are needed to show that a new drug has a marginal antidepressant effect. Yet the marketers know that, with a relentless focus on one set of figures and repetition of the mantra of statistical significance, they can hypnotize clinicians into thinking these drugs act primarily on mood with side effects on sexual functioning, when in fact just the opposite would be the more accurate characterization. Because it has become so hard to argue against clinical trials of this nature, there is now almost no one at the séance likely to sing out and break the hypnotic spell.
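The contrast between an effect so obvious that a trial is a formality and one so marginal that hundreds of patients are needed can be put in rough numbers. The sketch below uses the standard textbook approximation for the size of a two-arm trial at 80 percent power and two-sided 5 percent significance; the effect sizes plugged in are assumptions for illustration, not measurements from any SSRI trial.

```python
# A rough sketch of why a marginal effect needs hundreds of patients while
# an obvious one needs a handful. Uses the standard approximation for the
# number of patients per arm in a two-arm trial at 80% power and two-sided
# alpha = 0.05. The effect sizes below are assumptions for illustration.
import math

def patients_per_arm(effect_size, z_alpha=1.96, z_beta=0.84):
    """Approximate patients per arm needed to detect a standardized
    difference (Cohen's d) of the given size."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

print(patients_per_arm(0.3))  # ~175 per arm for a marginal mood effect
print(patients_per_arm(2.0))  # ~4 per arm for an effect nearly every patient shows
```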

A cautionary tale involving reserpine may bring home how far we have traveled in the last half century. In the early 1950s, medical journals were full of reports from senior medical figures claiming the drug worked wonderfully to lower blood pressure; what was more, patients on it reported feeling better than well.41

Reserpine was also a tranquilizer, and this led Michael Shepherd, another of Bradford Hill's protégés, to undertake in 1954 the first randomized controlled trial in psychiatry, in this case comparing reserpine to placebo in a group of anxious depressives.42 While reserpine was no penicillin, some patients were clearly more relaxed and less anxious while on it, so it was something more than snake oil. Shepherd's trial results were published in the Lancet, a leading journal; nevertheless, his article had almost no impact. The message sank without trace, he thought, because medicine at the time was dominated not by clinical trials but by physicians who believed the evidence of their own eyes or got their information from clinical articles describing cases in detail—“anecdotes,” as they would now be called.43

Ironically, the two articles preceding Shepherd's in the same issue of the Lancet reported hypertensive patients becoming suicidal on reserpine.44 Reserpine can induce akathisia, a state of intense inner restlessness and mental turmoil that can lead to suicide. The case reports of this new hazard were so compelling, the occurrence of the problem so rare without exposure to a drug, and the onset of the problem after starting the drug, along with its resolution once treatment was stopped, so clear that clinical trials were not needed to make it obvious what was happening. On the basis of just such detailed descriptions, instead of becoming an antidepressant, reserpine became a drug that was said to cause depression and trigger suicides. But the key point is this—even though superficially contradictory, there is no reason to think that either the case reports or the controlled trial findings were wrong. It is not so extraordinary for a drug to suit many people but not all.

Fast forward thirty-five years to 1990. A series of trials had shown that Prozac, although less effective than older antidepressants, had modest effects in anxious depressives, much as reserpine had. On the basis of this evidence that it “worked,” the drug began its rise to blockbuster status. A series of compelling reports of patients becoming suicidal on treatment began to emerge, however.45 These were widely dismissed as case reports—anecdotes. The company purported to reanalyze its clinical trials and claimed there was no signal for increased suicide risk on Prozac in data from over three thousand patients, when in fact there was a doubling of the risk of suicidal acts on Prozac; because this increase was not statistically significant, it was ignored. Even if Prozac had reduced suicide and suicidal-act rates, it would still be possible for it to benefit many but pose problems to some. But the climate had so shifted that the fuss generated by the Prozac case reports instead added impetus to the swing of the pendulum away from clinical reports and in favor of controlled trials.
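How a doubled risk can still fail the significance test is easy to show when the events concerned are rare. The counts below are invented for illustration and are not the actual Prozac trial data; they simply show that with a handful of suicidal acts among thousands of patients, a twofold difference produces a p-value nowhere near the conventional threshold.

```python
# A minimal sketch of how a doubling of a rare risk can be "not statistically
# significant" and so get ignored. The counts are invented for illustration;
# they are not the actual Prozac trial data.
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value (normal approximation) for a difference in
    event rates between two groups."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return math.erfc(abs(z) / math.sqrt(2))

# 6 suicidal acts in 1,500 patients on drug vs 3 in 1,500 on placebo:
# the risk has doubled, yet p is far above 0.05.
print(two_proportion_p_value(6, 1500, 3, 1500))   # roughly 0.3
```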

But as we saw in the analysis of antidepressants, in addition to the 40 percent of patients who respond to placebo, a further 50 percent (five out of ten) do not respond to treatment at all. By publishing only controlled trials, and not the convincing reports of hazards for treatments like the antidepressants, journals are therefore privileging the experience of the one specific drug responder over the ninefold larger pool of those who, in one way or another, are not benefitting specifically from the drug. Partly because of selective publication practices, partly because of clever trial design, only about one out of every hundred drug trials published in major journals today is likely to do what trials do best—namely, debunk therapeutic claims. The other ninety-nine are pitched as rosily positive endorsements of the benefits of statins or mood stabilizers, treatments for asthma or blood pressure, or whatever illness is being marketed as part of the campaign to sell a blockbuster.
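The ninefold figure is simple arithmetic on the round numbers quoted above, as the sketch below spells out.

```python
# A minimal sketch of the arithmetic behind the "ninefold larger pool",
# using the round figures quoted in the text.
patients = 100
placebo_type_responders = 40   # would have improved on placebo anyway
non_responders = 50            # do not respond to the drug at all

specific_responders = patients - placebo_type_responders - non_responders  # 10
not_benefitting_specifically = patients - specific_responders              # 90

print(specific_responders, not_benefitting_specifically)    # 10 vs 90
print(not_benefitting_specifically / specific_responders)   # 9.0
```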

The publishing of company trials in preference to carefully described clinical cases, allied to the selective publication of only some trials of a drug and to interpretations of the data that are just plain wrong, amounts to a new anecdotalism. The effect on clinical practice has been dramatic. Once, clinicians were slow to use new drugs if they already had effective treatments, and when they did use a new drug and their patients ran into a problem, they stopped the treatment and described what had happened. We now have clinicians trained to pay heed only to controlled trials—clinicians who, on the basis of evidence that is much less generalizable than they think, have rapidly taken up a series of newer but less effective treatments.

The development of randomized controlled trials in the 1950s is now widely acclaimed as at least as significant for medicine as any of the breakthrough drugs of the period. If controlled trials functioned to save patients from unnecessary interventions, it would be fair to say they had contributed to better medical care. They sometimes fill this role, but modern clinicians, in thrall to the selective trials proffered by the pharmaceutical companies and to their embodiment in guidelines, are increasingly oblivious to what is happening to the patients in front of them, increasingly unable to trust the evidence of their own eyes.

We have come to the outcome Alfred Worcester feared, but not through the emphasis on diagnosis and tests that so concerned him. It has been controlled trials, an invention designed to restrict the use of unnecessary treatments and tests and one he would likely have fully approved of, that has been medicine's undoing.

This company subversion of the meaning of controlled trials does not come about through malfeasance. It happens because we, our doctors, the governments or hospital services that employ our physicians, and the companies themselves all want treatments to work. It is this conspiracy of goodwill that leads to the problems outlined here.46 But in addition, uniquely in science, pharmaceutical companies are able to leave studies unpublished or to cherry-pick the bits of the data that suit them, maneuvers that compound the biases just outlined.

Two decades after introducing the randomized controlled trial, having spent years waiting for the pendulum to swing from the personal experience of physicians to some consideration of evidence on a large scale, Austin Bradford Hill suggested that if such trials ever became the only method of assessing treatments, not only would the pendulum have swung too far, it would have come off its hook.47 We are fast approaching that point.