ONE OF THE most heartbreaking stories about abysmal experimental design involves amyotrophic lateral sclerosis (ALS), better known as Lou Gehrig’s disease. The search for a treatment for this deadly degenerative disease is rife with studies so poorly designed that they offered nothing more than false hope for people essentially handed a death sentence along with their diagnosis. Tom Murphy was one of them.
Once an imposing figure, Murphy had played football and rugby in college. His six-foot-three frame and barrel chest gave him a solid presence. But his handshake wasn’t the crushing grip you might expect. The first time we met, it was a gentle squeeze. When we met again a year later, we didn’t shake hands at all. Murphy had lost his formerly impressive strength due to ALS.
People around the world donated more than $100 million to fight this deadly ailment during the Ice Bucket Challenge of 2014, but for most people its real-life consequences are an abstraction: something about the degeneration of nerves. For Murphy, a fifty-six-year-old father of three, ALS was a very concrete, slow march toward the day when his nerves could no longer direct his diaphragm to draw air into his lungs. (Physicist Stephen Hawking is the rare exception who has managed to survive for many years despite the diagnosis.)
Murphy, remarkably, was not bitter about this turn of events when he told his story. Nor was he resigned to fading away when he first noticed some unusual muscle twitches in the winter of 2010. He went to his doctor, who, after a brief examination, sent him to a neurologist. Murphy actually ended up seeing three different neurologists before he finally got the diagnosis.
“When the guy said, ‘Sorry to tell you, but you have two to four years. Get your stuff together,’ I thought, ‘Really?’ It was a real curveball. I would never have thought that in a million years.” To prepare for what was likely to come, Murphy and his wife, Keri, sold the family home and bought a modern ranch-style house in Gainesville, Virginia, which Murphy could navigate without having to contend with stairs. He would eventually be getting around on wheels, once the muscle tone in his legs had faded. A giant TV graced the open and airy living room, where Murphy watched sports that he could no longer play himself.
But Murphy’s doctors also offered at least a sliver of hope. “The first thing they told me is we have a drug trial; would you like to be in it? And of course I thought it sounded pretty good,” Murphy said. People with ALS find their strength declines within a few years, and trials of potential drugs are only available to reasonably strong patients. So most get only one shot at an experimental treatment. In May 2011 Murphy settled on the test of a drug called dexpramipexole (or simply “dex”), becoming one of about nine hundred patients enrolled in a multi-million-dollar study. But when the drug company analyzed the data collected, the news was disappointing. Dex was not slowing the progression of symptoms in this group of patients. The trial was a bust.
Murphy was philosophical. There’s no question the disease is a tough one to counteract. Almost everything scientists have tried for ALS has failed (other than one drug with very marginal benefit). So all scientists in the field have gone in knowing the likelihood of failure is high, but they didn’t know exactly why until a nonprofit research center called the ALS Therapy Development Institute (ALS TDI) in Cambridge, Massachusetts, began investigating that question. Researchers there went back to the original animal studies used to test these drugs and discovered that they were deeply flawed. They all used far too few mice, and as a result they all came up with spurious results. Some experiments used as few as four mice in a test group. Sean Scott, then head of the institute, decided to rerun those tests, this time with a valid experimental design involving an adequate number of mice that were handled more appropriately. He discovered that none of those drugs showed any signs of promise in mice. Not one. His 2008 study shocked the field but also opened a path forward. ALS TDI would devote its efforts to doing this basic biology right.
Scott died of ALS in 2009 at the age of thirty-nine—the disease runs in his family. His successor, Steve Perrin, has carried on as Scott would have, insisting on rigorous animal studies as the institute’s scientists search for anything to help people like Tom Murphy. And they’re not simply taking the basic—and what should have been obvious—step of starting with enough mice in each experiment. Male and female mice develop the disease at somewhat different rates, so if scientists aren’t careful about balancing the sexes in their experiments, they can get spurious results. Another problem is that the ALS trait in these genetically modified mice can change from one generation to the next. The scientists at ALS TDI look at the genetics of every single animal they use in an experiment to make sure that all are identical. “These variables are incredibly important,” Perrin said. Other scientists had often overlooked those pitfalls.
To get robust results, Perrin’s group uses thirty-two animals—and compares them to an untreated group of thirty-two more mice. Academic labs don’t use large numbers of mice in their experiments in part because they cost a lot of money. Perrin said each one of these tests costs $112,000, and it takes nine months to get a result. If you’re testing three dosages of a medication, each requires its own test. Perrin’s institute has shown clearly that cutting corners here can lead to pointless and wasteful experiments. Even so, “we still get some pushback from the academic community that we can’t afford to do an experiment like that,” he said. It’s so expensive that they choose to do the experiments poorly.
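The arithmetic behind that pushback can be made concrete with a minimal sketch. It assigns sixty-four hypothetical mice to treated and untreated groups of thirty-two while keeping the sexes balanced in each group, then totals the bill for testing three dosages at the quoted $112,000 per test. The function names, the even split of males and females, and the fixed random seed are assumptions made for illustration; this is not ALS TDI's actual procedure.

```python
import random

COST_PER_TEST_USD = 112_000  # per-test figure quoted by Perrin
GROUP_SIZE = 32              # animals per group, as described in the text


def assign_balanced(mice, seed=0):
    """Randomly split mice into treated and control groups, stratified by sex.

    `mice` is a list of (mouse_id, sex) tuples. Hypothetical helper.
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    arms = {"treated": [], "control": []}
    for sex in ("F", "M"):
        stratum = [m for m in mice if m[1] == sex]
        rng.shuffle(stratum)           # random order within each sex
        half = len(stratum) // 2
        arms["treated"].extend(stratum[:half])
        arms["control"].extend(stratum[half:])
    return arms


def total_cost(n_dosages):
    """Each dosage needs its own full test, so cost scales linearly."""
    return n_dosages * COST_PER_TEST_USD


# Assumed cohort: 64 mice, half female and half male.
mice = [(i, "F" if i % 2 == 0 else "M") for i in range(2 * GROUP_SIZE)]
arms = assign_balanced(mice)
print(len(arms["treated"]), len(arms["control"]))  # 32 32
print(total_cost(3))                               # 336000
```

Under these assumptions, a three-dosage comparison comes to $336,000, which helps explain why a lab working from a single stretched grant might be tempted to shrink its groups instead.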
It’s not fair to blame the scientists entirely for this failure. The National Institutes of Health (NIH) paid for much of this research, and funding was stretched so thin that scientists said they didn’t get as much as they needed to do their studies. So they made difficult choices. As a result, funders, including the NIH, spent tens of millions of dollars on human trials using these drugs, without first making sure the scientific underpinnings were sound. ALS patients volunteered to test lithium, creatine, thalidomide, celecoxib, ceftriaxone, sodium phenylbutyrate, and the antibiotic minocycline. A clinical trial involving the last one alone, bankrolled by the NIH, cost $20 million. The results: fail, fail, fail, fail, fail, fail, fail. Science administrators had assumed that the academic scientists had all done the legwork carefully. They had not.
Of course a poorly designed study is simply a waste of time. Even so, it took years for officials at the NIH to realize the magnitude of this problem with ALS. One of the first people to pick up on this was Shai Silberberg at the National Institute of Neurological Disorders and Stroke (NINDS). He grew up in Israel and had trained as a biophysicist, a discipline that brings a high degree of precision to its studies, compared with the messier disciplines that involve live animals and people. That sensibility gave Silberberg a fresh perspective on the goings-on at the institute. The director asked him to serve on a committee that reviewed human tests involving neurological diseases, such as ALS, “and I couldn’t believe what I was seeing,” Silberberg told me. As he watched the review process, he discovered that the scientists spent all their time dwelling on questions about how to design the human trials: how many subjects, what the endpoints should be, making sure the analysis was framed correctly, and so on. All that’s critical, of course. But Silberberg realized that nobody applied that same degree of care when it came to evaluating the animal studies upon which the human experiments were based. “I was in total shock,” he said, when he realized that scientists basically skipped over those discussions. “There was almost no talk about whether the data to justify the [clinical trial] is solid or not.” The assembled experts started out with the assumption that it was and were betting millions of dollars and the goodwill and lives of volunteers on that hope.
“I don’t fault the reviewers, because clinical trials are so complex it’s only natural that their focus was on the design, and if a disease is devastating, you kind of overlook this key part,” Silberberg said. He felt “like the kid saying the emperor had no clothes.” He convinced his boss, NINDS director Story Landis, that something was horribly wrong. Of course this was, to say the least, embarrassing. Human nature and institutional preservation both created incentives to hide or downplay problems. People on Capitol Hill looking for an excuse to slash domestic spending could cite this as an example of government waste. But Landis didn’t back off. She started writing and talking publicly about the problem. Her boss, NIH director Francis Collins, was stunned when he heard about the pointless ALS trials that taxpayers had funded. “Humans were being put at risk based on that kind of data, and that took my breath away,” he told me. It became readily apparent that the ALS story was not simply an isolated incident. Biomedical research had a problem, and Collins was, more than anyone, responsible for the enterprise. “Certainly there were people who didn’t want to hear it. And I think there are still people who don’t hear it,” he said. But “as stewards of the public trust, we don’t want to just sweep that under the rug. We want to face it square on and to be as transparent as possible and say, ‘Okay, Houston, we have a problem here,’ and we are all collectively going to have to face it.”
Congress started to wake up to the issue. Republican senator Richard Shelby of Alabama raised it at a hearing on March 28, 2012. He brought up a December 2011 Wall Street Journal story based on the Bayer study of replication failures from that fall. “This is a great concern, Dr. Collins,” Senator Shelby said at the hearing. “I don’t want to ever discourage scientific inquiry, and I know you don’t, or basic biomedical research. But I think we on this subcommittee, we need to know why so many published results in peer-reviewed publications are unable to be successfully reproduced. When the NIH requests $30 billion or more in taxpayer dollars for biomedical research—which I think is not enough—shouldn’t reproducibility, replication of these studies, be a part of the foundation by which the research is judged? And how can NIH address this problem? Is that a concern to you?”
“It certainly is, Senator,” Collins replied. The NIH director assured the Senate panel that he was on the case. Indeed, momentum for action was mounting fast. Only a few hours after that hearing ended, Nature published the even more devastating analysis by Glenn Begley and Lee Ellis. Collins assigned his chief deputy, Lawrence Tabak, to focus on the issue. The two acknowledged the issue plainly in a comment in Nature in January 2014, laying out a proposal to use NIH’s leverage as the major funder of biomedical research to address the underlying problems. Those proposals gradually became new formal guidelines for grant applicants. As of January 2016, researchers must take some basic steps to avoid the most obvious pitfalls. When applying for a grant, they need a plan to show that the cells they are using are actually what they think they are (this is not a trivial issue, as we shall see). They need to show they’ve considered the sex of the animals they will use in their studies. They need to show that they’ve taken the time to find out whether the underlying science looks solid. And scientists must show in their applications that they will use “rigorous experimental design.” Researchers are supposed to be held accountable for all this during the annual reviews of their grants. It’s not clear how aggressively the various grant managers at NIH will enforce these new rules—officials historically have only canceled grants for egregious behavior, like fraud. So these steps are hardly cure-alls, but they are moves in the right direction.
It could take a long time for these new expectations to ripple through the culture of biomedical research. Grants written today could lead to research that won’t be published for years. And of course there’s likely to be resistance to any change with a whiff of more bureaucracy. Many academic scientists already spend more than half their time writing grant proposals, and because money is so tight, most of those don’t get funded. People focused on treatments and cures aren’t happy to wait around while scientists jump through hoops—even though research is not likely to succeed absent good experimental design.
Setting this system right requires changing the incentives. Steve Perrin at the ALS Therapy Development Institute said his operation is “the perfect paradigm for how to fix some of these problems.” His institute focuses on treating a single disease and has hired a careful mix of people to chase that goal. It is also quite different from a university lab, where the work is done mostly by people in training: graduate students and postdoctoral fellows. Perrin uses staff scientists, not students. ALS TDI has another advantage over university labs, which must turn over a large percentage of their grant funding to their institutions: “We don’t waste half of our investment on overhead,” Perrin said. He doesn’t even try to get grant funding from the NIH. Federal grant money is so tight these days that more than 80 percent of proposals get rejected, so it’s not worth his while to have scientists devoting endless hours to writing grants. Instead, ALS TDI relies heavily on individual donors, especially people with a loved one with ALS or themselves stricken by the condition. (The wealthy board chairman who hired Perrin started funding the organization after he was diagnosed with ALS, and two members of the board had children with the disease.) “The biggest [fund-raising] challenge that we have in ALS is that our patients lose their battle with our disease very fast, which means our development team is constantly looking for new support,” Perrin lamented.
Once they have the money, they know exactly how to put it to use. Given their expertise with the mouse model of ALS, they offer to reproduce the results from other laboratories, to validate—or most often to deflate—findings from academic and pharmaceutical labs. They have an aggressive program to develop their own drugs, based on tests carried out in their labs. The institute resides on the fourth floor of a modern lab and office building right across the street from the Massachusetts Institute of Technology (MIT). Next door are two other world-class institutions: the Broad Institute (which sequences genomes for ALS TDI) and MIT’s Whitehead Institute. Perrin doesn’t hesitate to farm out work that the many capable firms around Cambridge and the rest of the world can do more efficiently. Each Monday, the institute takes delivery of one hundred young genetically engineered mice, which are housed in a windowless expanse behind the light-filled and cheerful labs where nearly forty scientists work on artfully curved lab benches. Some spend their days doing experiments with the mice; others explore chemicals and biological compounds that are potential new drugs. This is the face of rigorous work, but rigor takes time and money—two commodities in perpetually short supply.
Federal rules are not acting alone in the push to improve experimental design. Disease advocacy organizations are playing an increasingly important role, with those for muscular dystrophy being a prime example. In 1986, Eric Hoffman and his academic mentor, Louis Kunkel, discovered the gene that goes awry in one common form of this disease, Duchenne muscular dystrophy. That launched Hoffman on his own respected academic career in the basic biology of this disease. In the 1990s, he was outraged when another medical researcher, Peter Law, enrolled desperate families in an experimental therapy for muscular dystrophy that Hoffman considered “snake oil,” as he told me. (Hoffman’s public protests were so rancorous that Law at one point sued him for defamation.) Law was injecting muscle-like cells into the young patients, despite overwhelming skepticism that this would work. “That was an incredible waste of resources and something that had no scientific basis at all,” Hoffman told me. His moral outrage led him to push for some basic “standard operating procedures” for research on this disease: a standard that researchers worldwide would agree to and follow. To fund this, Hoffman turned to the US Department of Defense, which offers funding outside biomedicine’s normal peer review process. He said he figured the NIH wouldn’t give him a grant “because it is not hypothesis-driven, sexy research. It’s developing rigor. There aren’t many sources of funding for rigor.”
His colleague Kanneboyina Nagaraju (known simply as “Raju”) had also been vexed that sloppiness was common in muscular dystrophy research. He said almost all of it was done in academic labs “where the sample size is determined by the amount of money they had at the time.” They’d call trials “pilot studies” and run them without proper controls, he said. With the money that Hoffman raised, Raju staged an international meeting, cosponsored by the European Commission, that developed consensus standards for the field.
Hoffman and Raju then built a laboratory at the Children’s National Medical Center in Washington, DC, to run rigorous mouse tests that were part of this new standard. Scientists around the world realized that they could farm out these critical experiments to Raju rather than trying to perfect the technique in their own labs. This became such a popular service that Hoffman and Raju turned it into a small company (which moved to Halifax, Nova Scotia, in 2013). They have run tests on more than sixty potential drugs. Raju said fifty-five were found to be totally worthless, and the remaining five showed at least some promise. At one point, Raju said, a company gave him one drug to test and gave the same drug to a second lab in Italy running the same standard operating procedures. The company didn’t mention that comparison until the experiment was completed. The results were not identical but close, he said, showing that the results are likely reproducible.
Frequently, small companies will plunk down a few hundred thousand dollars for these tests before investing even more in a potential drug. Raju said most companies accept the judgment from his studies. One French company that had only a single product in the works closed shop after it got disappointing results, though others have ignored the clear warning signs, Raju told me. There’s more at stake than just investors’ dollars, Hoffman said. “It’s the families. It’s the patients. It’s the physicians. It’s the hospitals.” Drug development is resource intensive “in terms of people’s lives. They are the experiment. They become part of the drug development program.” That resource should not be wasted. (Hoffman has since moved to Binghamton University.)
Debra Miller also has sent drugs to Raju for testing. She started Cure Duchenne after her son developed the disease in 2003. She and her husband decided to become venture philanthropists—investing in companies they believe in rather than simply funding academic research. In addition to hiring lawyers to do due diligence on these projects, her group hires top-notch researchers in the field. If something is going to fail, she told me, she wants it to fail fast. “So many small family foundations get seduced by the latest shiny object,” she said, and they don’t bother to insist on this kind of testing. “It all sounds like it’s going to work.” Miller knows better. Small biotech companies may nurse a dubious idea just long enough to sell the product to big pharma. Or companies may figure that it’s easier to get a drug approved to treat a side effect of a rare disease like muscular dystrophy, when they really hope to use it for more common afflictions. Rare diseases like muscular dystrophy have a faster track through the Food and Drug Administration, so this is a potentially lucrative strategy. But parents don’t want to spend their effort on drugs that may be only peripherally useful to their children.
When the advocacy group Parent Project Muscular Dystrophy was approached to fund research into a new drug, the group’s leaders, John Porter and Sharon Hesterlee, insisted that the idea behind it first be tested using the standard operating procedures that Raju had developed. “Our mantra is we don’t want clinical trials to fail for stupid reasons,” Porter told me. Not following these basic procedures “is certainly one of the stupid reasons for clinical trials failing.” More than two dozen companies are now working on drugs for muscular dystrophy, so there is plenty of promise, but also a need to figure out the best candidates. Porter, a former NIH official who dealt with these diseases, used a second layer of review if something passed initial muster: “If a company comes to us with a project that they want to take to clinical trials, one of the first questions we ask is, has it been reviewed by TACT?”
TACT, which stands for the TREAT-NMD Advisory Committee for Therapeutics, was originally funded by the European Union but now runs with its own resources. It is a no-nonsense venue for reviewing potential drugs for neuromuscular diseases like muscular dystrophy. Twice a year, some of the world’s experts in the field review submissions, ask tough questions, and render judgment—often harsh judgment. In the early days, many proposals came from academic researchers, but increasingly the reviews are done at the request of drug companies, which pay $5,000 or $10,000 to TACT to help defray expenses. The committee has reviewed dozens of proposals. Participants who elect to use this process get a confidential report that they can choose to share with potential funders and investors. A politely worded summary is posted on the public website, so at least the broad contours of a review are accessible even if a company decides not to share the details, which must be released in their entirety or not at all.
Getting biomedical research right means more than avoiding the more obvious pitfalls, like choosing the right number of animals, randomizing the experiments, blinding the observers so they don’t fool themselves, and running proper comparison groups. It’s also critical to think about whether the underlying assumptions are correct. The story of ALS offers a sobering example. The mice used in these studies have a specific mutation, called SOD-1. That trait shortens their lives and gives them some symptoms suggestive of the condition, but they do not in fact develop true ALS. Scientists developed this mouse model after discovering the SOD-1 mutation in some people who have an inherited form of ALS. But only 2 percent of people with ALS carry this mutation, so it’s hardly the whole molecular story behind the disease. And that means it’s not clear exactly what value comes from all the painstaking work with these mice by ALS TDI and many other labs.
The SOD-1 mice are used because scientists have had nothing better to model this disease. They are painfully aware of that. In fact, they’ve developed new strains of mice with mutations seen more commonly in ALS patients to get around the shortcomings of SOD-1 mice. But those new animals have shortcomings of their own: they don’t die prematurely, which means it’s much harder to study them in experiments where the endpoint is early death.
This is another reminder that no tool in biomedical research is perfect, so scientists always have to make do. And they may not take the time to question the assumptions on which their field rests. Scientists most often start a research project by building on what their mentors or peers have done before. An entire field may rely on a particular animal model, even though scientists often have no idea whether it’s a valid surrogate for human disease. It’s quite common to cure a disease in a mouse model, only to discover that it’s irrelevant for treating human disease. And that, for scientists trying to conduct rigorous scientific research, makes mice a big, hairy problem.