MISLEADING ANIMAL STUDIES have led to billions of dollars’ worth of wasted effort and dead ends in the search for drugs. Failures in animal studies have also had deadly consequences. In 1993, researchers at the National Institutes of Health (NIH) wanted to test a potential drug for hepatitis B, a liver infection that affects hundreds of thousands of Americans. The compound, called fialuridine (FIAU), looked promising. It was similar in action to some of the drugs that had been developed to fight HIV, and it had passed animal tests with flying colors. Researchers gave it to mice, which tolerated it well. They then gave it to rats, which also had no trouble with the drug. Tests in monkeys also suggested the drug would be safe. A short-term study in humans was largely encouraging, so researchers enlisted fifteen volunteers to take the drug for several months.
At first, the volunteers managed the relatively mild side effects. But after a couple of months, one fell ill. It was liver failure. Since a toxic drug reaction can often cause liver failure, the researchers immediately told the remaining fourteen patients to stop taking the FIAU. But it was too late. In the following weeks, six more volunteers developed serious liver damage. Five people eventually died, and the Institute of Medicine, which reviewed this debacle, said the two other patients probably would have died as well if they hadn’t received liver transplants.
At root, the problem is that lab animals aren’t just small, furry humans. Researchers have been well aware of the underlying shortcomings of animal research for decades. To cite just one dramatic example, researchers working to develop a cholesterol-lowering drug thought they had hit on just the right compound after many years of laboratory research. But when they gave it to rats, it was an utter failure. One pharmaceutical company gave up on it entirely. But a Japanese researcher persevered despite the apparently show-stopping animal results. Akira Endo asked a colleague who was using chickens for experiments to test it, and it turned out the compound worked in the birds. That success marked the birth of statins, drugs used by millions of people today.
Stories like this teach an obvious lesson, especially for the most common laboratory animal, the mouse. “Nobody knows how well a mouse predicts a human,” said Thomas Hartung at Johns Hopkins University. In fact a test on mice doesn’t even predict how a drug will work in another rodent. For instance, certain drug-toxicity tests run separately on rats and mice only reach the same conclusion about 60 percent of the time. And if mice do a so-so job of predicting what will work in a rat, Hartung said, we should be very humble about what they tell us about human beings.
This is about more than drugs. Hartung said roughly half of the chemicals that show up as potential cancer-causing agents in mouse experiments are probably not human health hazards. Coffee is one example. Researchers have tested thirty-one compounds isolated from coffee, and of those twenty-three flunked the safety test. “You wouldn’t be able to add coffee to food if it were synthesized” and put through a battery of safety tests, Hartung said. “Aspirin also would fail almost all animal tests these days.”
Likewise, exciting results in animals often don’t apply to human disease. A wave of enthusiasm followed the discovery of a weight-controlling hormone, leptin, in mice. Mutations in the leptin gene caused obesity in mice, and giving those mice leptin slimmed them down. So scientists eagerly looked for this same effect in people, but leptin supplements rarely helped. There’s no miracle pill to treat people who are overweight.
Biomedical scientists understand these shortcomings but tend to gloss over them. “You need to publish, and you can’t write, ‘I used a terrible model.’ You have to do grant applications and you want awards, so you only show the good sides,” Hartung said. “You’re penalized if you do it differently, if you’re honest about weaknesses.” Experiments with mice and rats are everywhere in biomedical research. Nobody keeps a tally of the numbers used, but often-quoted estimates put the figure in the United States alone at well over 10 million animals a year, the vast majority being mice. One reason everybody uses mice: everybody else uses mice. They are “model organisms” used in basic biology studies as well as for safety research on experimental drugs. Scientists have developed hundreds of inbred strains of mice by breeding siblings with each other to perpetuate a genetically pure line of animals. Other strains are genetically engineered, allowing scientists to add or subtract specific traits. Entire industries have grown up around the breeding, shipping, housing, feeding, and care of mice.
Malcolm Macleod, a neurologist at the University of Edinburgh, worries about biomedicine’s dubious reliance on mice. He has spent much of his career trying to find ways to reduce the brain damage caused by strokes. In hundreds of animal experiments, mostly using mice, drugs have shown promise for treating stroke. Billions of dollars have gone into this research, and yet not one single drug acting on brain cells has been shown to be helpful when tested in people (the drug tPA is effective by breaking down clots and helping restore blood flow, but it doesn’t act on nerve cells damaged by a stroke). Neurologists started calling this long string of failures a “nuclear winter” for stroke research.
“My reading of the animal data for stroke is that it’s not possible to say if they’re good or bad models,” Macleod told me. It could be that experimentally induced strokes in mice are so radically different from a human stroke that there are no real lessons to learn. Or it could be that the experiments have been carried out so loosely that they have led the entire field astray.
One example in particular stands out. AstraZeneca had high hopes for a drug called NXY-059. Researchers ran twenty-six experiments involving 585 animals (mostly rats), and in this case the compound appeared to protect the animals’ brains from strokes induced by researchers. On the strength of those results, researchers tested the drug in an enormous and ambitious study involving more than 1,700 people. The results, which showed a modest reduction in disability, were sufficiently encouraging that they tested another 3,300 people who had just suffered strokes. It was a dramatic failure.
Macleod dissected that study and found many shortcomings. Failures like this have turned him into something of a crusader. At one point he decided to find out how widespread preventable biases had become in animal research. In 2015 he and his colleagues sampled animal research papers authored by researchers at the top universities in the United Kingdom. Only 17 percent were blinded. Likewise, scientists often failed to assign their animals randomly to drug versus control groups and to state whether they had conflicts of interest. Almost none explained how they settled on the number of animals they used in their studies. “It is sobering that of over 1,000 publications from leading UK institutions, over two-thirds did not report even one of four items considered critical to reducing the risk of bias,” Macleod and colleagues wrote, “and only one publication [out of 1,000] reported all four measures.”
Those problems are not hard to fix, so there’s room for rapid reform. But those improvements won’t help in the many instances when animals are poor stand-ins for human disease. In the case of stroke, Macleod noted that the adolescent, male animals often used in the studies may not be a good substitute for an elderly human being having a stroke. And drug doses for animals may differ dramatically from those for people. He cautioned that the extent of brain damage in an animal may not be a good surrogate for human disability or death. Researchers, he admonished, need to think about these deeper issues, not just about whether they can run the same experiment and get the same results.
Stroke research is far from the only area stymied by dubious animal models. Pain studies, which also rely heavily on mice, have reached similar dead ends. A decade ago, pharmaceutical companies got very excited about a new class of pain medications called NK-1 antagonists. Experiments involved measuring a drug’s effect by tying off nerve bundles in the animals to make them hypersensitive to pain. Using that method, the drugs seemed quite effective at reducing pain in the mice. That triggered a race among drug companies to come up with painkillers to block this pathway. Companies spent many millions of dollars trying to turn this exciting mouse research into a human drug. “People took this to the clinic and it didn’t do anything. Nothing at all,” Barbara Slusher at Johns Hopkins told me. The mouse test was completely misleading when it came to judging real pain in people. One of the more spectacular drug-development failures in recent decades, it led researchers to realize they needed a whole new strategy to look for effective painkillers.
Slusher said it also soured the pharmaceutical industry on putting too much faith in animal studies. “It used to be that if your drug worked in an animal model, okay, we’re cool.” Drug companies would sign a deal with the researchers and move the potential drug forward in the development pipeline. Now pharma cares less about animal studies in pain research and instead wants human tests to validate an idea before making an investment. “So this is a change in the mind-set of how new targets are chosen.” It’s a sensible step, but of course it puts far more onus on academic researchers to do rigorous work.
Yet another great disappointment involves the treatment of deadly inflammation related to trauma, burns, and other serious injuries. Sepsis and related conditions strike more than a million Americans each year and claim more than 200,000 lives. Geneticist Ron Davis at Stanford was curious to understand why there had been so little progress in finding drugs to stop that often fatal cascade of events. Heading up a small army of colleagues, he set out to test whether the commonly used mouse model for inflammation was a good match for the human disease. About 150 antisepsis drugs have been developed using mice, but not one has been helpful in people. The researchers identified about 5,000 genes activated or deactivated by inflammation in humans who had suffered trauma, burns, or blood infections. When they then looked at the analogous genes in one common strain of mice, they found essentially no correlation between how those genes responded in mice with induced inflammation and how they responded in humans. The biology of inflammation seemed to be dramatically different between the two species.
“That was a bit of a shocking result,” he told me. It suggested that decades of inflammation research using mice were misguided and that scientists who continue to use mice for this research could be wasting their time. It was not a message the sepsis field wanted to hear. “It’s amazing how much pushback we got from that result, including from within our own trauma center,” Davis said. It was seen not simply as a provocative observation about inflammation but as a frontal attack on studies involving mice. “There’s a worry that mice will be invalidated,” he said.
It’s not uncommon for scientists to resist disruptive ideas, but “the power of science is, the truth will eventually come through,” Davis said. “It’s just a matter of how long that will take. You can suppress things, but you can never win.” Some researchers have pushed back against Davis’s findings, while other observers suggest there’s even more at play. David Masopust at the University of Minnesota exposed laboratory mice—which are usually kept in sterile surroundings—to wild mice carrying germs typical of the species and discovered that their immune systems were dramatically different. So the differences between laboratory mice and people may be both genetic and environmental.
Despite all these pitfalls, scientists aren’t interested in abandoning research with mice. Buried under Stanford’s central campus is a labyrinth of tightly controlled rooms, stacked floor to ceiling with mouse cages. Visitors must gown up and slip booties over their shoes—not for their own health but to protect the untold thousands of mice that live there, under a vast, bucolic landscape punctuated with palm trees. Joseph Garner, who studies animal and human behavior, walked down a corridor and poked his head in one rodent-filled room after the next. Mice huddled together in clear plastic cages connected to an elaborate ventilation system to prevent germs from circulating. Even so, a musty smell typical of animal labs permeated the rooms, somehow sweet and stale at the same time, like home-brewed beer. Labels and barcodes keep this massive enterprise organized.
Garner said that mice have great potential for biological studies, but at the moment, he believes, researchers are going about it all wrong. For the past several decades, they have pursued a common strategy in animal studies: eliminate as many variables as you can, so you can more clearly see an effect when it’s real. It sounds quite sensible, but Garner believes it has backfired in mouse research. To illustrate this point, he pointed to two cages of genetically identical mice. One cage was at the top of the rack near the ceiling, the other near the floor. Garner said cage position is enough of a difference to affect the outcome of an experiment. Mice are leery of bright lights and open spaces, but here they live in those conditions all the time. “As you move from the bottom of the rack to the top of the rack, the animals are more anxious, more stressed-out, and more immune suppressed,” he said.
Garner was part of an experiment involving six different mouse labs in Europe to see whether behavioral tests with genetically identical mice would vary depending on the location. The mice were all exactly the same age and all female. Even so, these “identical” tests produced widely different results, depending on whether they were conducted in Giessen, Muenster, Zurich, Mannheim, Munich, or Utrecht. The scientists tried to catalog all possible differences: mouse handlers in Zurich didn’t wear gloves, for example, and the lab in Utrecht had the radio on in the background. Bedding, food, and lighting also varied. Scientists have only recently come to realize that the sex of the person who handles the mice can also make a dramatic difference. “Mice are so afraid of males that it actually induces analgesia,” a pain-numbing reaction that screws up all sorts of studies, Garner said. Even a man’s sweaty T-shirt in the same room can trigger this response.
Behavioral tests are used extensively in research with mice (after all, rodents can’t tell handlers how an experimental drug is affecting them), so it was sobering to realize how much those results vary from lab to lab. But here’s the hopeful twist in this experiment: when the researchers relaxed some of their strict requirements and tested a more heterogeneous group of mice, they paradoxically got more consistent results. Garner is trying to convince his colleagues that it’s much better to embrace variation than to tie yourself in knots trying to eliminate it.
“Imagine that I was testing a new drug to help control nausea in pregnancy, and I suggested to the [Food and Drug Administration (FDA)] that I tested it purely in thirty-five-year-old white women all in one small town in Wisconsin with identical husbands, identical homes, identical diets which I formulate, identical thermostats that I’ve set, and identical IQs. And incidentally they all have the same grandfather.” That would instantly be recognized as a terrible experiment, “but that’s exactly how we do mouse work. And fundamentally that’s why I think we have this enormous failure rate.”
Garner goes even further in his thinking, arguing that studies should consider mice not simply as physiological machines but as organisms whose social interactions and responses to their environment can significantly affect their health and strongly influence experimental results. Scientists have lost sight of that. “I fundamentally believe that animals are good models of human disease,” Garner said. “I just don’t think the way we’re doing the research right now is.”
Malcolm Macleod has offered a suggestion that would address some of the issues Garner raises: when a drug looks promising in mice, scale up the mouse experiments before trying it in people. “I simply don’t understand the logic that says I can take a drug to clinical trial on the basis of information from 500 animals, but I’m going to need 5,000 human animals to tell me whether it will work or not. That simply doesn’t compute.” Researchers have occasionally run large mouse experiments at multiple research centers, just as many human clinical trials are conducted at several medical centers. The challenge is funding. Someone else can propose the same study involving a lot fewer animals, and that looks like a bargain. “Actually, the guy promising to do it for a third of the price isn’t going to do it properly, but it’s hard to get that across,” Macleod said.
There’s an intellectual tug-of-war in biomedicine now about how best to increase the rigor of these studies. Should we improve animal studies? Or would we be better off simply trying to find replacements? Neuroscience professor Gregory Petsko at Weill Cornell Medical College is in the latter camp. He has spent his career studying neurological diseases, including ALS and Alzheimer’s. “The animal models are a disaster,” he said. “I worry not just that they might be wrong. ‘Wrong’ animal models you can work with. If you know why it’s wrong, you can use the good parts of the model, and you don’t take any information from the parts that are bad. But what if the neurodegenerative disease models are not wrong but irrelevant? Irrelevant is much worse than wrong. Because irrelevance sends you in the wrong direction. And I think the animal models for nearly all the neurological disorders are in fact irrelevant. And that scares the shit out of me, if you pardon the expression.”
Researchers are so dependent on mouse experiments in neurological studies that if a potential compound doesn’t work in the animals they won’t try the much more expensive and time-consuming experiments on people. “That’s what keeps me up at night—the possibility that I might have something that works but I would never be able to prove that it works,” Petsko said. When you consider that many experiments have weaknesses above and beyond simply using mice, “it’s not beyond the realm of possibility that a viable treatment already has been tested and failed because the trial design was bad.… I wouldn’t rule it out absolutely.” These aren’t abstract questions. “You’ve got to do this right.”
Petsko’s hopes lie in a burgeoning technology called induced pluripotent stem (iPS) cells. Those cells, taken from a patient with a particular disease (say, Alzheimer’s), can be induced in the lab to become nerve cells, genetically identical to the patient’s but now growing in the lab. Petsko would like to use those cells to study disease biology and to reserve animal tests for safety and for insights into how a potential drug interacts with an entire individual. These iPS cells are rapidly gaining popularity. But like everything in biomedical research, this tool has its pluses and minuses. The downside is, once triggered to become a nerve cell, they continue to divide and proliferate—exactly the opposite of what happens to a nerve cell in the brain. So it’s not clear how faithfully they mimic reality.
Thomas Hartung has been experimenting for a while with brain cells that proliferate in the lab but also morph into different brain cell types, forming round clumps of cells, called organoids. The cells in his lab come from patients with autism and Down syndrome. The clumps apparently can’t think—though they do generate electrical signals, just like brain cells, and organize themselves in a manner reminiscent of how they are juxtaposed in the brain. They also use the chemical signals that underlie brain function. “If we create conditions in cell culture which are mimicking the organism, we are more likely to get relevant results,” he told me. “You can do personalized toxicology with these cells. If I took your cells, I could tell you are more sensitive than another person to certain drugs, for example.” These are early days for this technology, but there’s a rapidly growing industry around cultivating disembodied blobs of cells in the lab. The Defense Advanced Research Projects Agency, which funds far-out ideas, has poured money into this line of research at multiple labs. So has the NIH. And Hartung has private money to work on the problem as well.
A company on the Boston waterfront is now mass-producing a related technology for use by pharmaceutical companies and academics alike. Emulate is a spin-off of the Wyss Institute at Harvard. The company has taken over industrial space in an old building that overlooks a working dry dock. Imposing concrete pillars with flared tops punctuate the floor space. The army used this building during World War II to assemble tanks. When Emulate moved in its high-tech manufacturing equipment, the landlord didn’t bother to ask how much anything weighed. The freight elevators were built to hold a battle tank, and the concrete floors are nineteen inches thick.
Instead of growing spherical clumps of cells, Emulate builds ersatz organs using clear plastic chips that fit easily in the palm of a hand. Engineers swathed in white coveralls position themselves at laser cutters and 3-D printers that are designed to churn out prototype chips. Geraldine Hamilton, the president and chief scientific officer, excitedly showed me one chip with a flexible membrane. “In some organs, stretching is one of the most relevant mechanical forces,” she explained, so that has been recreated as closely as possible for cells like those in the lungs, where growth and development depend on that kind of movement. Liver cells don’t care about stretching, but they do react to how fluids flow past. So liver chips instead include an intricately arranged micro-irrigation system to control nutrients as they flow in and waste products as they flow out.
Hamilton told me they’d managed to keep a miniaturized liver alive and healthy for more than a month. It would be of absolutely no use to someone in need of a transplant, but these chips can be used to mimic liver biology and to test drugs. People have been trying experiments like this in the lab for many years, often with disappointing results. “When you put a drug into a dish, it just sits there on top of the cells. That’s not the way we get exposed to drugs.” The flow-through system is much more realistic.
These systems can be remarkably lifelike. The scientists introduce a single layer of cells onto a chip, but over the course of a week, elaborate structures grow. In the gut-on-a-chip, the natural folds found in the intestine spontaneously appear. Different types of cells form layers, much as they do in the human body, and closely resemble normal tissues under the microscope. Hamilton popped open a laptop and showed me a movie of the lung-on-a-chip in action. Tiny hairs, called cilia, waved like a kelp forest or a field of wheat. “Not only are the cilia beating just like you’d expect in your lung… you can see them clearing the particles in a directional manner,” she said with a touch of pride and awe. She showed me a video of the immune system cells in action within a lung-on-a-chip, fighting back an infection. “We can actually see a single white blood cell. It comes in, it sticks, it wiggles itself through the cell membranes… and you can see it coming out on the other side and engulfing the bacteria.”
“I’ve seen this ten thousand times, and it doesn’t ever get old,” said Chris Hinojosa, Emulate’s principal microfluidics engineer. Daniel Levner, the company’s chief technology officer, agreed: “This video was the aha moment for this technology.” As just one hint of how it could be useful, Hamilton’s group treated these lungs-on-a-chip with steroids in an effort to understand why people with chronic obstructive pulmonary disease resist this type of drug. “We were able to mimic that and look for potential therapeutics,” Hamilton told me.
She hopes these techniques will prove so effective that the FDA will someday accept them in place of animal studies. That will not happen anytime soon, but Hamilton told me pharmaceutical companies are already starting to use chips in place of mice for experiments that won’t need the FDA’s stamp of approval.
Emulate offers lessons about the broader rigor and reproducibility issues. It’s not just about devising a system that has the potential to sidestep some of the pitfalls of animal research. Levner said company scientists have to meet a higher standard than their counterparts in academia. “The tongue-in-cheek joke about the postdoc is you have to repeat [an experiment] three times—which is all you need for statistical significance for your Nature paper and your faculty position—and you’re done. Doing it three times is very difficult, but it’s also difficult to make three times into thirty times into three hundred times into three thousand times.” Academics don’t have that motivation, but this company needs to meet that mark to be successful. “You have your reliability [and] reproducibility,” added Hinojosa, “and you know that when you see something, it’s a real thing.”
It’s easy to be lured by the idea that new technology will swoop in and resolve the shortcomings in animal research. That has sparked perpetual hope, with a history that includes development of specialized mouse strains and increasingly sophisticated strategies such as growing human organs (and tumors) inside mice. Indeed, technology is racing ahead throughout biomedical research. DNA sequencing technology has generated avalanches of data (recalling as well that avalanches are dangerous). Microscopy has opened our eyes to a fascinating world within living tissues. These tools provide deep insights into biology, but there is a startling disconnect between technological progress on the one hand and medical progress on the other. Technology is racing ahead at breathtaking speed, but medical progress is not.
Jack Scannell has been thinking about that a lot. His career has taken him from the world of pharmaceuticals to Wall Street and now to the University of Edinburgh. Scannell suspects rapid technological evolution is actually part of the problem. The more scientists come to understand the basics of biology, the more seduced they become by the idea that they can find cures by unearthing the fundamental mechanism of disease. This has deep intellectual appeal and could be true if we were approaching a complete understanding of how biological systems operate. But most of the time we only get a faint glimmer. Scientists may discover a single enzyme system involved in cancer and then hunt down molecules that can block that enzyme to treat disease. That raises hopes.
Every once in a while, something actually pans out—the anticancer drug Gleevec was a spectacular success built on this very idea. But most of the time those strategies fail. “So you got some spectacular successes which people remember, and a whole bunch of failures which people forget,” Scannell told me. Instead, people look at the successes as signs that the molecule-by-molecule approach is working, when he argues it’s almost always failing. “There is an almost automatic assumption that you need to rummage around in molecules to really understand things.” But that actually accounts for a tiny share of medical advances.
Why do most ideas fail? Scientists tend to chalk it up to bad luck. But Scannell argues that evolution has created so many redundant systems that targeting a single pathway in a complex network will rarely work. Diet drugs are a good example. “We have evolved seventeen different biological mechanisms to avoid starving to death [figuratively speaking]. Drugging one of those mechanisms isn’t going to do anything!” The fact that there are many pathways to cancer also explains why chemotherapy drugs often work for a while and then fail. The tumor evolves a work-around.
Back in the decades when drug development was progressing rapidly, doctors weren’t trying to create new drugs based on a deep understanding of biology. They just experimented on people—not mice—to see what worked. “I wouldn’t necessarily seek to defend the historic approach,” Scannell said. “I think today people would be horrified if they knew how drug discovery really worked in the fifties and sixties. But I also think it is a historical fact that it was an efficient way to discover drugs. It may be an ethically unpalatable fact and something you would never wish to revisit, but I think probably bits of it could be revisited with not huge risk.”
Some of the most successful drugs are the result of serendipity, as is the case for metformin, the most widely used drug for type 2 diabetes. Decades ago a researcher in the Philippines studying an obscure compound to treat flu and malaria reported that it also seemed to lower blood sugar. In 1957, a Parisian scientist noticed that published observation and tried the drug in animals. British researchers tried it in diabetics in the 1960s, and it worked surprisingly well. The original compound was discovered as an herbal extract, and to this day scientists don’t understand the biological mechanism. But it doesn’t matter. It works.
Chance still plays a major role in drug development. Doctors monitoring their patients do occasionally make surprising discoveries. Patients taking minoxidil for blood pressure control noticed unusual hair growth—and Rogaine was born. And with more than 1,000 approved active ingredients in our drugs, surprises (sometimes pleasant) are inevitable. “Most new uses of drugs are developed by doctors through field observation,” Scannell told me.
The lesson here is twofold. First, it’s better not to assume that a specific drug works with pinpoint precision. Most drugs “are magic shotguns, not magic bullets,” he said, and sometimes the “off-target” effects can be useful (clearly, they can also be nasty side effects). Second, it’s important to remember that many important discoveries start with human beings in a medical clinic rather than with mice in cages.
The bottom-up approach to drug development isn’t all doom and gloom. There’s a growing list of successes around a new generation of drugs known as biologicals. These are antibodies or other molecules designed to hit very specific targets, and they have on occasion succeeded in translating basic biological insights into worthwhile treatments. Checkpoint inhibitors are one example, raising hopes that targeting pathways in the immune system to fight tumors might finally control cancer. The progress isn’t turning out to be as dramatic as proponents had hoped, and even the drugs that work do so for a minority of patients. But these are early days.
Animals will still be an important element of that research, despite their shortcomings. Model organisms still provide valuable insights into fundamental biology, whether it’s Tetrahymena for its revelations about telomeres, fruit flies for their insights into genetics, or even mice used to study the basic wiring of the brain. If researchers studied only human beings, those insights would be vastly more difficult to achieve, not to mention ethically impossible in many instances.
Even so, translating that basic biology into medical advances is anything but straightforward. And that is even more the case for another ubiquitous tool used in biomedical research: disembodied cells floating in laboratory flasks. A rich history shows that this type of research is rife with problems. But dire warnings have largely been ignored. That is one of the starkest cautionary tales involving rigor and reproducibility in biomedical research.