5

Placebos in Clinical Trials

After thousands of studies, hundreds of millions of prescriptions and tens of billions of dollars in sales, two things are certain about pills that treat depression: Antidepressants like Prozac, Paxil and Zoloft work. And so do sugar pills.

—Shankar Vedantam, Washington Post, 2002

There are many potholes on the long road from “bench” laboratory discovery to “bedside,” the clinical setting in which drugs are prescribed to patients. After discovering a protein or pathway that impacts a particular disease, drug developers spend years honing small synthetic or large biological molecules that target, and they hope will perturb, the disease process. A lead molecule is chosen based on its selectivity for the target, and then preclinical scientists characterize the molecule’s metabolism, excretion, and safety in animal models. Only after clearing these hurdles is the molecule ready for testing in humans via the phases of clinical trials. In small phase I trials of twenty to eighty healthy volunteers, the safety and dose are determined. If that goes well, the next step is testing for efficacy in increasingly larger phase II (one to three hundred patients) and then phase III (one to three thousand patients) placebo-controlled clinical trials. If patients give their informed consent to join the study, they are randomized to either the study drug or placebo control arm of the trial.

Placebos as Controls in Clinical Trials

Placebos in clinical trials are used to control for regression to the mean, Hawthorne effects, the natural history of the condition, and of course, placebo effects. Regression to the mean in clinical trials is a statistical phenomenon in which particularly high or low measurements or readings, when repeated at a later date, will tend toward the average. This is important in clinical trials because patients have a tendency to enroll in trials when their symptoms are at their worst and therefore can appear to have improved regardless of the intervention.
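
For readers who like to tinker, regression to the mean is easy to demonstrate with a toy simulation (the severity scale, enrollment threshold, and noise level below are invented for illustration, not drawn from any real trial): patients whose noisy screening score crosses an enrollment cutoff tend to score closer to their true average at follow-up, with no treatment at all.

```python
import random

random.seed(0)

TRUE_SEVERITY = 50.0  # everyone's underlying severity on a hypothetical 0-100 scale
NOISE = 10.0          # day-to-day fluctuation in the measured score

def measure():
    """One noisy measurement of a patient's underlying severity."""
    return TRUE_SEVERITY + random.gauss(0, NOISE)

# Screen 10,000 hypothetical patients; enroll only those scoring at least 60,
# mimicking patients who join a trial when their symptoms are at their worst.
screening = [measure() for _ in range(10_000)]
enrolled_screens = [s for s in screening if s >= 60]

# Re-measure the enrolled patients later, with no intervention whatsoever.
followups = [measure() for _ in enrolled_screens]

mean_at_screening = sum(enrolled_screens) / len(enrolled_screens)
mean_at_followup = sum(followups) / len(followups)

print(round(mean_at_screening, 1))  # well above the cutoff of 60
print(round(mean_at_followup, 1))   # back near the true mean of 50
```

The enrolled group appears to “improve” by roughly fifteen points simply because they were selected at an unusually bad moment; a placebo arm absorbs exactly this kind of artifact.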

The Hawthorne effect is named for a series of commissioned lighting studies conducted at the Western Electric factory near Chicago between 1924 and 1927. The studies were designed to determine whether workers were more efficient in bright or dim lighting. During the study, worker productivity was enhanced with all illumination intensities, but returned to prestudy levels once the study was completed. After several other similar experiments, it was determined that it was the observation and monitoring of the workers that was driving improvements in productivity. Thus the simple act of enrolling in a clinical trial, completing questionnaires about the state of one’s condition, and being observed by a clinician can influence and in some cases ameliorate symptoms.

Natural history refers to natural symptom fluctuations expected in the course of a condition. A good example of this is the common cold. If a researcher conducted a two-week study on the common cold, it is quite likely everyone would respond to treatment at the end of the study not because the treatment worked but rather because colds generally get better in about a week.

Finally, placebo effects, the subject of this book, are physiological responses to the therapeutic context and treatment delivery that can themselves result in clinical improvements sans active intervention. There is a general bias or expectation on the part of clinicians as well as patients to believe that patients who receive the active treatment are more likely to get better. Hence to avoid bias and the influence of expectations that can enhance placebo effects, studies are “double-blinded” so that neither the patients nor the clinicians know whether the patient is receiving the active study drug or an inert placebo.

To receive FDA approval, drugs tested in “pivotal” phase II and III trials must produce a significantly greater clinical benefit compared to a placebo control. Failure to beat the placebo response is one of the most common reasons why drugs “fail” in clinical trials of neurological and psychological conditions.1 There is a high price to pay for these failures. A 2020 study from the London School of Economics estimated the median cost of bringing a new drug to market at $985 million, with an average cost of $1.3 billion.2

When drugs fail, the reverberations across patients’ lives can be widespread, and impact prognosis, quality of life, and emotional well-being. Furthermore, when these novel treatments, many with compelling mechanisms of action, are shelved, companies can fail, and jobs are often lost. The fault, of course, is not with placebos; many drugs just don’t work. A survey of phase III trials in 2016 indicated that most fail because of a lack of efficacy, and a majority of these trials were in oncology, a field in which demonstrating a survival benefit in patients with cancer is a critical and high bar.3 In neurological (e.g., Alzheimer’s disease), psychiatric (e.g., depression), and functional pain (e.g., IBS) illnesses, however, the response to the placebo treatment is commonplace and can tip the balance in failure’s favor.

Placebo Controls in Alzheimer’s Trials

It has been almost twenty years since the FDA approved a new drug for Alzheimer’s disease, a progressively debilitating neurodegenerative condition resulting in brain changes that can severely impair cognition. It is a late-onset condition characterized by memory loss, confusion, mood changes, difficulty understanding and processing information, and general cognitive decline. In 2019, Alzheimer’s was the sixth leading cause of death in the United States. In 2021, 1 in 9 people over the age of sixty-five and a total of 6.2 million people in the United States were diagnosed with Alzheimer’s disease.4 Today, there is no cure. As our population ages, the prevalence of Alzheimer’s is increasing, along with its enormous attendant social and economic burdens.

Although deficits in expectation-mediated placebo responses have been observed in patients with Alzheimer’s in an experimental setting, robust placebo responses in clinical trials have thwarted even the most promising drug development programs.5 To tackle this problem, nonprofit organizations like TransCelerate and the Critical Path Institute are leading efforts to harness the collective knowledge in historical placebo controls to model disease progression, and understand when and how randomized drug and placebo treatments can influence this progression.6 Such efforts call for generosity and trust among drug manufacturers as they pool their resources, successes, and failures to tackle the many obstacles in getting much-needed drugs to patients.

Drugs designed to prevent disease progression and treat Alzheimer’s target the many interrelated pathologies that result in some of the biological correlates of the disease, including protein aggregation, amplified oxidative stress, neuroinflammation, and impaired neurotransmission in the brain. The deposition of amyloid-β (Aβ) plaques and neurofibrillary tangles are characteristic findings in the brains of patients with Alzheimer’s. One approach to preventing the neurodegeneration and neurotoxicity associated with Aβ plaques is to use antibodies, large biomolecules naturally generated by the body, to bind to and clear out this unwanted material.

Biogen in Cambridge, Massachusetts, is a US biotech company that developed one of the first promising monoclonal antibody therapies, aducanumab, that targets Aβ plaques.7 In preclinical mouse studies, Biogen demonstrated that aducanumab entered the brain and reduced plaque formation in a dose-dependent manner. The Biogen researchers then moved on to small pilot studies in humans and had exceedingly promising results. After a year of monthly intravenous infusions, patients with mild Alzheimer’s had reductions in brain Aβ plaques accompanied by a decrease in the rate of clinical decline. Importantly, the drug appeared to be safe and tolerable, albeit difficult to administer by infusion. Buoyed by these positive results, Biogen initiated two essentially simultaneous phase III clinical trials, a strategy to shorten the time to FDA approval.

Unfortunately, when the FDA independent advisory panel met at the end of 2020, it concluded that even the most compelling data did not support efficacy and voted overwhelmingly against approval. At issue were dose-dependent improvements observed with aducanumab that only separated from the placebo at the highest doses. Aducanumab, though seemingly safe, overall did not beat the placebo. Still, with no new drugs since 2003, Alzheimer’s patient advocates and some patients who participated in the trial strongly encouraged the FDA to consider approving aducanumab. To the surprise of many, the drug was approved in 2021, and in 2022 the Centers for Medicare & Medicaid Services announced that Medicare would cover the new drug, pending further evidence of efficacy. Amid the contention around these developments, Biogen reduced the price of the drug by 50 percent in response to slow sales. Meanwhile, the failures of other Alzheimer’s drugs continue to pile up: atuzaginstat, from San Francisco–based Cortexyme, was halted because of toxicity, and others, like troriluzole from Connecticut-based Biohaven Pharmaceuticals, were not approved because they failed to beat the placebo.

Placebo Controls in Clinical Trials of Depression

Alzheimer’s disease is not the only therapeutic area battling placebo responses in clinical trials. Major depression is a chronic, recurring, and often debilitating psychiatric mood illness characterized by persistent feelings of sadness and anhedonia. In 2018, 264 million people worldwide were estimated to be affected by depression. The high prevalence of this condition comes at a growing cost. Since the introduction of fluoxetine (Prozac) to the market in the late 1980s, several other SSRIs have been approved by the FDA including sertraline (Zoloft) and paroxetine (Paxil). When asked, 75 percent of depressed patients said that they would prefer psychotherapy over antidepressant medication.8 Nevertheless, with the demands on psychiatrists to treat the growing number of people suffering from depression, the expediency of prescribing a pill can frequently supersede the patient’s preference for psychotherapy sessions. In 2019, the global antidepressants market was estimated at $14.3 billion. The market surged to $28.6 billion in 2020 as a result of the COVID-19 pandemic and is expected to level out at $19 billion in 2023. Over the last twenty years, as Prozac, Zoloft, and Paxil became household names, a vigorous debate over whether they were any better than placebos played out among academic researchers and the press.

The first salvo in the antidepressant-placebo debate was fired in 1998 by Irving Kirsch, who was then at the University of Connecticut. Kirsch and colleague Guy Sapirstein published a controversial meta-analysis examining the average size of antidepressant effects compared to placebos in nineteen double-blind randomized clinical trials.9 A meta-analysis is a statistical approach that allows researchers to combine results across clinical trials to get a sense of the overall effect of an “exposure,” which in this case was antidepressants. Kirsch and Sapirstein found that the difference between the drug and placebos was vanishingly small. Based on the data, they estimated that placebos accounted for approximately 75 percent of the improvement ascribed to antidepressants. The remaining 25 percent could be attributed to an enhanced placebo response resulting from drug-induced side effects that allowed patients in clinical trials to “break the blind” and correctly guess that they were in the active drug treatment arm. As discussed in chapter 2, expectation is a critical driver of symptom improvement with placebo treatment; thinking that one is in the active treatment arm can lead to the self-fulfilling expectation of symptom improvement.

Four years later, in 2002, another meta-analysis, this time led by Arif Khan at Duke University, examined forty-five phase II and III antidepressant clinical trials in the FDA database.10 The FDA requires sponsors of clinical trials to submit their results regardless of whether the trials were positive (favoring the drug) or negative (favoring the placebo). This protocol is important to avoid publishing bias such that studies that are positive are more likely to be published in academic journals and thus more accessible to researchers than negative studies. While these data offered a more comprehensive look at the differences between antidepressants and placebos, this particular FDA data set only contained averages and did not include an estimate of the variability (i.e., no standard deviations or standard errors) so only simple data tabulations were possible. Nonetheless, Khan and colleagues found that the placebo response was a function of depression severity; in other words, the placebo response was smallest and the antidepressants were most effective among patients with severe depression.

Why were the antidepressant effects so small, and which patients were benefiting? By 2008, no fewer than 120 meta-analyses had been published trying to identify the demographic variables (e.g., age and sex), comorbidities (e.g., bulimia), risk factors (e.g., smoking), and trial designs (e.g., placebo run-ins) that influenced the benefit from antidepressants. For the most part, the findings remained the same: little to no difference between the drug and placebo for mild depression, and a significant but still relatively small benefit with an increased severity of depression.

In 2008, Kirsch, then at the University of Hull, invoked the Freedom of Information Act to access more comprehensive data submitted to the FDA.11 Unlike the previous FDA data set, these data contained means and estimates of the variability. Still, the researchers found just small differences between the antidepressant and placebo. In that same year, Erick Turner at Oregon Health and Science University also published a meta-analysis combining published data with FDA data.12 Turner found striking differences between the sizes of the effects reported in the published literature and the data available from the FDA. Almost all the antidepressant trials were positive in the published literature, but an analysis of the FDA data showed that only half of the registered trials were positive.

As Turner pointed out, this trend of the selective reporting of positive clinical trial results could have adverse consequences for researchers, patients, and health care professionals. Turner’s meta-analysis found a small but statistically significant standardized mean difference (SMD) of 0.31 between the drug and placebo across all the data combined. The SMD is a summary statistic that makes it easy to compare the average effects across different outcome measures that might use different scales and different units. It is simply the average effect in the antidepressant groups minus the average effect in the placebo groups, divided by the variability (standard deviation) among all the participants. In the published data alone, the result of the meta-analysis was higher: 0.41. Remarkably, both Kirsch and Turner found essentially the same SMD between antidepressants and placebos. Kirsch’s SMD was 0.32.
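
The SMD arithmetic is simple enough to sketch in a few lines of Python. The trial numbers below are made up for illustration; they are not taken from Turner’s or Kirsch’s data, though they are chosen to land near the effect sizes the meta-analyses reported.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation across two groups of sizes n1 and n2."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def standardized_mean_difference(mean_drug, sd_drug, n_drug,
                                 mean_placebo, sd_placebo, n_placebo):
    """SMD (Cohen's d): difference in group means divided by the pooled SD."""
    return (mean_drug - mean_placebo) / pooled_sd(sd_drug, n_drug,
                                                  sd_placebo, n_placebo)

# Hypothetical trial: mean HDRS improvement of 10.2 points on drug
# vs. 8.0 on placebo, each arm with SD 7.0 and 150 patients.
smd = standardized_mean_difference(10.2, 7.0, 150, 8.0, 7.0, 150)
print(round(smd, 2))  # 0.31
```

Notice that a 2.2-point difference on the HDRS, against a standard deviation of 7, yields an SMD of about 0.31, which is why the debate turned on whether such a difference is clinically meaningful rather than whether it is real.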

With such similar results across these meta-analyses, one might wonder what was left to debate. It was the interpretation of the significance of these findings that was contentious. Turner, consistent with the data, argued that each drug was superior to a placebo. Kirsch, consistent with the National Institute for Health and Care Excellence (NICE) guidelines, contended that this incremental change was not clinically significant. In the United Kingdom, NICE provides national guidance and advice to improve health and social care. Early in the debate, NICE recommended that a three-point reduction in the Hamilton depression rating scale (HDRS) or SMD of 0.5 was to be considered clinically significant. In depression, clinicians frequently use the HDRS, a seventeen-item survey, to measure disease severity. This clinician-administered depression assessment scale asks patients about their feelings of sadness, guilt, anxiety, and sleep patterns. The higher the score, the greater the severity, and reductions in the score after treatment indicate improvement. Whether depression is mild, moderate, or severe is determined by the HDRS score at the beginning of the study. Thus Turner and Kirsch both had valid points. The question was, How much improvement in the HDRS should be considered clinically significant?

In 2018, the then largest of these dueling meta-analyses was published.13 This time, researchers had the benefit of a recently developed, more powerful network meta-analysis tool; with twenty-one antidepressants in 522 trials comprising 116,477 participants, they finally had “big data.” Led by Andrea Cipriani at Oxford University, this impressive effort was as robust and rigorous as the preceding studies, if not more so, and once again the researchers found the same small but significant benefit of antidepressants over a placebo. Soon after, at the 2018 American Society of Clinical Psychopharmacology Annual Conference, Marc Stone of the FDA reported results from another analysis, arguably the most comprehensive.14 This group from the FDA used all published and unpublished antidepressant trials sent to the FDA between 1979 and 2016. As in the reports before his, Stone’s analysis found the same small difference between the antidepressant drug and placebo response. With mounting criticism about the arbitrariness of its designation of clinical significance, NICE removed its recommended thresholds. Whether that small difference is clinically significant remains up for debate.

Minimizing Placebo Responses in Trials

Despite the consistent significant but small benefit of antidepressants over a placebo, the road to approval is still littered with near misses and failures, and some FDA decisions are still hanging in the balance. The failure to beat a placebo has been attributed to a myriad of factors: lax inclusion and exclusion criteria, the trial duration being too short or too long, a lack of adherence to the study drug, the wrong dose or formulation, the wrong study population, too small or too large trial sizes with too many sites, or the drug just doesn’t work.

As many reasons as there are for failure, there are strategies in place to remove their influence. We can group these into three buckets: patient variables, clinical trial designs, and outcome measures. Ironically, with marginal differences between a drug and placebo, clinical trialists need to boost the statistical power by enrolling more patients into the trials. This need for more subjects has contributed to the globalization of clinical trials, which in turn has led to increased heterogeneity due to differences in access to clinical care (in some countries, enrolling in a trial is a way to get treatment), risk factors, and beliefs about health care that can influence expectations.15 Trialists yielding to financial and temporal pressures might relax inclusion criteria, further increasing heterogeneity in the target population. Another problem is the growing cohort of professional patients who game the system by feigning symptoms to get enrolled in trials for money.

Adherence is also a critical factor in response to treatment. While medication adherence is likely a proxy for healthy behaviors, several studies have found that both drug and placebo adherence are associated with better health outcomes.16 Still, poor adherence can in many cases be blamed for drugs failing to beat the placebo response. To address this problem, smart pill bottles are being deployed in clinical trials to monitor when patients take their study meds.17

Perhaps the biggest variable driving patient heterogeneity is the patient. In the case of depression, there is tremendous heterogeneity in the presentation and origin of symptoms. Some patients exhibit every symptom, and others only exhibit one or two. Some patients display depressive symptoms later in life after major events, and others show symptoms from a young age with no obvious “event.” Because of these differences, it can be difficult to identify the mechanistic treatment needs of each individual patient based on the responses of the group. Add in gender, education, and social, psychological, and financial stressors, and you can see how the appropriate treatment could vary within just one individual at different stages in life, let alone for millions on a nationwide scale over time.

The elements of clinical trial design, including the size, duration, number of treatment arms, and follow-up frequency, are just a few of the variables that can influence outcomes in both the drug treatment and placebo arms of a trial. After more than seventy years of using placebo controls in clinical trials, we are only now starting to use historical placebo control data to map and project disease trajectories as well as predict placebo effects over time. Clinical trial designs that deviate from the gold standard of randomized placebo controls in order to manage placebo responses have had mixed results.

One such design, the placebo run-in, in which all the participants are given a placebo for the initial part of the study and then randomized, seemed like a great idea to weed out placebo responders. But this approach does not appear to lead to a greater effect size of the drug compared to a placebo.18 Some sponsors initiate multiple phase II or III trials simultaneously at different doses, gambling that at least one or two of them will work and thus save time. As in the case of the Alzheimer’s aducanumab studies, this approach also has its drawbacks and can lead to confusion. More sophisticated trial designs are currently being investigated. One in particular, the sequential parallel comparison design (SPCD), has gained the recent attention of investors, media, and patients.

SPCD was designed by Maurizio Fava and David Schoenfeld at Massachusetts General Hospital in Boston with the aim of minimizing the effects of placebo responders. At first glance, the algorithm seems convoluted, but it makes a lot of sense. First, patients are randomized to a drug or placebo. After a prespecified period of time, those who don’t respond to the placebo are rerandomized to a drug or placebo, while the placebo responders are kept on the placebo. All the participants initially randomized to the drug stay on the drug throughout the trial. This innovative method was used most recently in the clinical trials of Alkermes’s depression drug ALKS-5461 (pharmaceutical companies often affectionately name their drugs by their stock symbol plus a number). After two clinical trials, ALKS-5461, a fixed-dose combination of buprenorphine (a partial mu-opioid agonist) and samidorphan (a mu-opioid antagonist), failed to get approval from the FDA. While you might think the problem was that ALKS-5461 is an opioid seeking approval for treating depression in the throes of an opioid epidemic and COVID-19 pandemic, the impasse with the FDA appears to be related to disagreement over the use and analysis of the SPCD design. Ever cautious, the FDA is requiring more data from other trials to legitimize SPCD as an acceptable trial design. Quite naturally, drug sponsors are reluctant to run SPCD trials to produce the data without the assurance that the SPCD trials will be considered valid by the FDA. Given the importance of getting it right and the risks associated with being an early adopter, the future of novel designs will no doubt take some time to unfold.
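
The two-stage rerandomization rule at the heart of SPCD can be sketched in a few lines (a minimal illustration of the logic described above; the arm names and the toy cohort are invented, and a real trial would, of course, randomize with concealed allocation):

```python
import random

random.seed(1)

def stage2_arm(stage1_arm, responded_in_stage1):
    """SPCD stage-2 assignment, following the rule described in the text:

    - Patients on drug in stage 1 stay on drug.
    - Placebo responders are kept on placebo.
    - Placebo non-responders are rerandomized to drug or placebo.
    """
    if stage1_arm == "drug":
        return "drug"
    if responded_in_stage1:
        return "placebo"
    return random.choice(["drug", "placebo"])

# A few hypothetical patients: (stage-1 arm, responded in stage 1?)
cohort = [("drug", True), ("drug", False),
          ("placebo", True), ("placebo", False), ("placebo", False)]

stage2 = [stage2_arm(arm, resp) for arm, resp in cohort]
print(stage2)
```

The payoff is in the analysis: the stage-2 drug-versus-placebo comparison is restricted to demonstrated placebo non-responders, which is precisely the population in which a real drug effect should be easiest to see.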

Placebos in Clinical Trials: Where We Stand

The streptomycin and paludrine trials of the late 1940s are commonly touted as the first placebo-controlled clinical trials of the modern era, but they were preceded by the well-controlled patulin trial of 1944.19 There was great promise that patulin, an antibiotic that, like penicillin, was extracted from mold, was indeed the cure for the common cold. On Halloween 1943, the Sunday Express (London) heralded that “extensive tests—some on naval men—have since been carried out and it is believed they confirm its efficiency. The results of these tests are about to be published by the Medical Research Council.”20 But within months the disappointment and amnesia set in. The study did not discern a difference between the drug and placebo.

Our ability to set aside disappointing results and move on to the next study is a hallmark of the resilience of drug discovery. Over the seventy-five years since these early trials, we have tended to rationalize and marginalize failures, while creating a story and vision of success for the next novel drug with compelling mechanisms of action, promising early efficacy and safety. Sometimes we get it right. In 2023, Merck’s Keytruda, an antibody used in cancer immunotherapy, will become the best-selling drug in the world, projected to reach $22.5 billion by 2025. Yet increasingly in clinical trials of neurological and psychiatric conditions, the number failing to demonstrate efficacy beyond the placebo control is growing.

Still, there is much hope. It is important to remember that the current analyses of trials look at the average effects across large groups of people. Through averaging, they completely miss the subset of people who have strong positive or negative responses to both the drug and placebo. While it is easy to do a post hoc analysis and say that older participants or patients with a certain genotype are likely to be placebo or drug responders, these subset analyses are frowned on. And for good reason, since the more specific subgroups you look at, the more likely you are to find a significant result just by chance. Further, original studies are carefully designed to have sufficient power to discern a significant effect. In subset analyses, the smaller sample sizes increase the likelihood of a false positive finding. Hence the holy grail in precision trials is to be able to predict who will respond or benefit from a therapy or even placebo.
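
The multiple-comparisons hazard behind the distrust of subset analyses is easy to quantify. Assuming the subgroup tests are independent and each uses the conventional p < 0.05 threshold, the chance of at least one spurious “significant” finding climbs quickly with the number of subgroups examined:

```python
def family_wise_error(k, alpha=0.05):
    """Probability of at least one false positive across k independent
    significance tests, each run at level alpha, when no true effect exists."""
    return 1 - (1 - alpha) ** k

# One test keeps the error at 5 percent; twenty subgroups push it past 60.
for k in (1, 5, 10, 20):
    print(k, round(family_wise_error(k), 2))
```

With ten subgroups the chance of a chance “discovery” is already about 40 percent, which is why post hoc claims that, say, older participants were the true responders are treated with suspicion unless prespecified or replicated.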

As I will discuss in chapter 7, machine learning allows us to grapple with the complexity and heterogeneity in order to identify important patient- and clinical trial–level variables that enhance our precision. Patient-level variables include disease severity, age, sex, history of the condition, comorbidities, previous and current medications, neuroimaging data, and genomics. So it is here, amid the hills of heterogeneity, that the next great clinical trial challenge, precision trials and precision prescription, has set up camp.