Chapter Two

IT’S HARD EVEN ON THE GOOD DAYS


WHEN YOU STOP and think about it, experimental science can be a bizarre enterprise. In the classical view of research, scientists first come up with an exciting idea or make a provocative observation. Having done that, a good scientist will next actively look for evidence that his or her idea is wrong. Yet how disappointing it would be to discard your own seemingly wonderful idea. That’s the first point at which human nature collides with the scientific process. In the words of the brilliant physicist Richard Feynman, “The first principle is that you must not fool yourself—and you are the easiest person to fool.”

Scientists who can navigate those treacherous intellectual waters then face more daunting challenges: they must wade through the real-life human environment of academic science—funding, promotion, publication, and fame—which is full of perverse incentives that discourage them from probing deeply enough to find out whether their exciting ideas are actually wrong. Many of the problems in biomedical research today arise when scientists, often unwittingly, stray from standard methods, so it’s worth exploring how healthy science should work. Good methods not only test ideas; they help scientists avoid fooling themselves.

Careful science is a surprisingly young enterprise. Before the seventeenth century, natural philosophers, as scientists were then called, often relied on the word of authorities to sort out truth from fiction. For many hundreds of years, European intellectuals assumed that all knowledge already existed, and their job was simply to interpret the writings of the Greek philosopher Aristotle, who was looked upon as the ultimate authority. Around the time of Galileo (1564–1642), that edifice started to crack. Natural philosophers dared to conduct their own experiments to search for the truth. Some “facts” they examined seemed strange, like the widely held notion that you could heal a wound by putting ointment on the knife that caused it. As David Wootton explains in The Invention of Science, they realized they could test such “strange” facts by trying to replicate them.

A society formed in Florence after Galileo’s death in 1642 took as its motto provando e riprovando, test and test again. Members announced their findings in the Reports of the Society for Experiments. Scientific publishing was born. The English philosopher Francis Bacon had not long before formalized the scientific method: make a hypothesis, devise a test, gather data, analyze and rethink, and ultimately draw broader conclusions. That rubric worked reasonably well when scientists were running easily repeatable experiments in the realm of physics (for example, studies involving vacuum pumps and gases). But biology is a much tougher subject, since there are many variables and a great deal of natural variation. It’s harder to see phenomena and harder to make sure personal biases don’t creep in.

Here’s a thought experiment to illustrate that point. Imagine you’re locked into a windowless house in the woods. You can’t sense day or night, and you can’t feel the temperature outside. But you do have a reliable clock, and you can hear the birds. If you take meticulous notes, you can gradually discern a pattern showing which birds are singing and when. There’s tremendous natural variability, of course. A late spring or short winter will throw off the pattern from one year to the next, but eventually you will be able to discern the seasons and determine the length of a full annual cycle. Of course it would be far better to fling open your door and enjoy actual nature in all its splendor, but that’s not part of the bargain. Your conclusions must be inferences only. Likewise, a biomedical researcher can rarely observe the object of study directly. Life is mostly chemistry, and most of chemistry is invisible. Living cells also change depending on subtle variations in their environment, and those variations are hard to sort out as well. But with time, a picture gradually emerges from deductions based on indirect evidence.
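
To make the inference concrete, here is a toy sketch in Python, with every number invented for illustration: it generates ten years of noisy daily birdsong counts and then recovers the length of the cycle from the data alone.

```python
import numpy as np

# A toy version of the windowless-house problem: daily "songs heard" rise and
# fall once per year, buried in noise from weather and the birds' own moods.
rng = np.random.default_rng(0)
days = np.arange(10 * 365)                    # ten years of daily notes
true_period = 365.25                          # unknown to the note-taker
signal = 20 * (1 + np.sin(2 * np.pi * days / true_period))
counts = rng.poisson(signal + 5)              # noisy, indirect observations

# Infer the cycle length from the dominant frequency in the record.
spectrum = np.abs(np.fft.rfft(counts - counts.mean()))
freqs = np.fft.rfftfreq(len(counts), d=1.0)   # cycles per day
best = freqs[1:][spectrum[1:].argmax()]       # skip the zero-frequency term
print(f"estimated year length: {1 / best:.0f} days")
```

Run on this invented record, the estimate lands near 365 days: a sound deduction, but only as precise as a decade of indirect observation allows.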

“There is in fact no such thing as direct observation, for the most part,” said Stanford University’s Steven Goodman. “Every scientific observation is filtered through an instrument of some sort.” The tool may be an electron microscope, or it may be a clinical trial—in which the instrument is the structured observation of a group of human subjects—or it may be a statistical method. So the first question is, do you trust your tool to give accurate answers? “And if you don’t believe that everything was properly done, you’re not going to believe in the findings. So your ability to trust what you ‘see’ depends on your degree of trust in the instrument you’re using.” Scientists aren’t simply evaluating plain facts, and their tools are rarely razor-sharp scalpels. Our observations of birdsong in the thought experiment will never be precise enough to determine that a year is 365.25 days. But bad ideas (like the hypothesis that a year lasts one hundred days) shouldn’t survive the test of time.

“We might think of an experiment as a conversation with nature, where we ask a question and listen for an answer,” Martin Schwartz at Yale wrote in an essay. This process is unavoidably personal because the scientist asks the question and then interprets the answer. When making the inevitable judgments involved in this process, Schwartz said, scientists would do well to remain passionately disinterested. “Buddhists call it non-attachment,” he wrote. “We all have hopes, desires and ambitions. Non-attachment means acknowledging them, accepting them and then not inserting them into a process that at some level has nothing to do with you.”

Sometimes this process produces observations that become readily accepted. For example, James Watson and Francis Crick discovered that the DNA in our chromosomes is arranged as a double helix, with our genes encoded in the units that make up the rungs of the ladder. That observation is so clearly established as correct that it forms the basis for entire industries and fields of science. Nobody questions the structure of DNA, in part because it has proven so useful. But far more ideas linger in a world of twilight truth. Findings may be seen in one lab or several, but they don’t easily become accepted as clear descriptions of nature, and they don’t lead to useful insights for the treatment of disease. It can take many years for good ideas to rise to the top and bad ideas to drift to the bottom. And that limbo can be stretched out when experimental results from one lab clash with those of another. When done right, this process can yield deep insights into how biology works and lead to new ideas for maintaining health and treating disease. But it’s a constant struggle for even the best scientists in the world to know whether they are fooling themselves.

* * *

The story of telomerase—a vital enzyme involved in aging and cancer—illustrates this point well. Carol Greider was a graduate student at the University of California, Berkeley, in the 1980s, working with her mentor, Elizabeth Blackburn, on a weird little single-celled pond critter called Tetrahymena. The scientists were trying to understand how this microscopic protozoan, which looks like a hairy teardrop, manages to replenish the DNA at the ends of its chromosomes when it divides. When a cell divides, the chromosomes in its nucleus replicate as well. But with each cell division, the DNA at the chromosome ends gets a little shorter. That may seem like an obscure detail, but these chromosome tips are critical to life as we know it. And nobody understood how any organism could replenish its chromosome tips until Christmas Day 1984, when Greider discovered an enzyme in Tetrahymena that she thought could be doing the job.

“Rather than say, ‘Look, let’s find every piece of evidence we can to show that this is a new enzyme,’ instead we did the opposite,” Greider told me. “We said, ‘How can we disprove our own hypothesis?’” She started to look for something else responsible for rebuilding the DNA at the chromosome tips. She was essentially trying to figure out whether she was fooling herself. When she tells this story to students, she reels off a long list of all the different steps she took to disprove her own hypothesis “because I would rather show that I’m wrong than have someone else show that I’m wrong.” After an entire year of looking for flaws in their own work, she and Blackburn finally published the paper. Not only did her conclusion stand the test of time, but her diligence also led them to share the 2009 Nobel Prize in Physiology or Medicine.

The discovery of telomerase is a textbook case of science done right. But research on the enzyme also illustrates how difficult it can be for scientists to sort out competing ideas and move a field forward. Hundreds of scientists are now struggling to understand its role in biology and disease—and much of what they publish is contentious. A couple of years ago Greider was at a meeting dedicated to telomerase when someone declared, “Probably greater than 50 percent of what is published in the field just is not true.” She agreed. Her former mentor Elizabeth Blackburn has published papers suggesting that meditation can make your telomeres grow and potentially extend your life span (Blackburn also founded a company to measure telomere length). Greider politely declined to talk about that unorthodox idea, but she spoke about other areas of telomere research that she considers questionable.

Indeed, she has spent a lot of time and money in her lab putting the findings of other scientists to the test. At one point she turned the tables on a scientist who had published a splashy paper about a unit of the telomerase enzyme called TERT. Steven Artandi and colleagues at Stanford University had engineered mice to produce an abundance of TERT and discovered that they grew a lot of extra hair. This apparently had nothing to do with telomeres, so the Stanford group proposed that TERT played another role in the cell, switching on and off genes unrelated to chromosome tips. Hair growth was just one example; other scientists suggested that TERT could flick on or off other genes.

Greider works with a physician colleague at the Johns Hopkins University School of Medicine, Mary Armanios, who treats people with diseases caused by defective telomerase. They realized that the ailments they were studying might be related to some of these surprising new results. So Greider ran an experiment to put the Stanford results to the test. She started with a strain of mice with extra-long telomeres. They could remain healthy for several generations without telomerase. She then deleted the TERT gene from these mice, not only disabling telomerase but also affecting all other functions related to TERT. Even so, these mice remained healthy for several generations. Based on that observation, she concluded that TERT doesn’t play an essential role in gene regulation, as the Stanford scientists proposed. It is essential only as a part of telomerase.

So here’s a case where two powerful tools of biology produce two different results. Neither tool is perfect—mice that produce far too much TERT aren’t ideal, but neither are mice that are missing a gene. “It’s not an issue of reproducibility; it’s an issue of interpretation and understanding the mechanism,” Steven Artandi told me. Does TERT actually turn genes on and off in normal tissue? “I’m not sure we proved it,” Artandi acknowledged. “I think proving things takes time. So I think the jury is out.” In Greider’s view, the case is closed unless Artandi can come up with new data. This is all part of the normal, healthy process of science.

Like Carol Greider’s lab, Tom Cech’s at the University of Colorado, Boulder, spends a surprising amount of time on studies that end up merely debunking results from other labs—setting the record straight rather than generating new discoveries. For example, his team discovered that a commercially available antibody used in dozens of experiments to flag the TERT protein in fact didn’t work as advertised. The company that sold the antibody to research labs removed it from its catalog as a result, but many reports with false conclusions based on that errant ingredient remain in the scientific literature. These papers are intellectual land mines for scientists who aren’t keeping fully abreast of the field—and given the avalanche of scientific publications, keeping up with any field is daunting.

Cech, a Howard Hughes Medical Institute (HHMI) investigator who shared the 1989 Nobel Prize in Chemistry, is utterly philosophical about this. To him, these stories simply illustrate that science has strong self-correcting mechanisms built in, and eventually the truth will emerge. But it’s not always comfortable for a scientist to raise these issues. The people you criticize “might be reviewing your grants” and deciding whether you deserve funding, Cech told me. “They might make a decision about whether they’ll give you a job offer. They might be reviewing some other papers of yours. So there’s a tendency to be careful about being too negative about other people’s work.” Cech, as a heavyweight in this field, feels free to speak his mind. “I can be brave enough to pick up a microphone [at a conference] and say, ‘We’ve tried to reproduce Bill’s work at Harvard and we think it’s completely wrong.’ About six other people will then raise their hand and say, ‘It’s wrong. We wasted a year on it too.’ People who’ve been quiet suddenly jump up and say, ‘You’re absolutely right.’”

Cech says his early career didn’t involve chasing down so many false leads. His Nobel Prize–winning work involved experiments more in the realm of chemistry than biology: he showed that RNA could help catalyze biological reactions inside cells. “We were protected for a while against having to deal with this kind of stuff,” Cech told me. Biology is much more fickle, however, and as his research has become less based on chemistry, “we now are seeing the dark underbelly of real biology research. But it is what it is.” Discovering an error in someone else’s work is “not a glorious day in the lab. That’s a setback, and whether you’re right or they’re right, it’s not fun, really.” Some members of his lab told me that at least they learned something in the process of sorting out a problem. “They’re being very generous” with that assessment, Cech said. “It’s really sort of a pain.”

* * *

One reason scientists have been slow to recognize the problems of rigor and reproducibility in biomedicine is that failure is an inevitable part of the process, so cautious researchers expect a lot of “discoveries” to be wrong. They see it every day in their own labs. Most experiments simply don’t work. The equivalent of a .300 batting average at the lab bench would be phenomenal. A standard-bearer of this philosophy is Stuart Firestein, a Columbia University biologist and author of Failure: Why Science Is So Successful, which argues that science only advances when researchers try something, fail, and then learn from their failures.

According to his argument, if everything worked exactly as expected, scientists would simply be chasing their own tails, reinforcing existing ideas rather than finding new ones. Failure and ignorance propel science forward, Firestein argues, and many would agree on that point. He extols the virtues of the self-correcting nature of science. Sure, lots of stuff that gets published turns out to be rubbish. “I don’t see this as a problem but rather as a normal part of the process of experimental validation.” He writes that if scientists took too much time to make sure their results were correct, publication would slow to a “virtual trickle.” And even then, the results could still turn out to be flawed. It’s the job of the entire community, not simply the scientist who makes a claim, to figure out what’s right and what’s wrong.

But in his exuberant defense of failure, Firestein treads into territory where his colleagues are less apt to follow. He thinks it’s perfectly fine that of the fifty-three papers Glenn Begley at Amgen studied, only six could be reproduced. “This has been characterized as a ‘dismal’ success rate of only 11%. Is it dismal? Are we sure that the success of 11% of landmark papers isn’t a bonanza? I wonder if Amgen, looking carefully through its own scientists’ data, would find a success rate higher than 11%—or lower? And what did Amgen pay for those 6 brand new discoveries? Nothing. Not a penny.” He continues, “This so-called dismal success rate has spawned a cottage industry of criticism that is highly charged and lewdly suggestive of science gone wrong, and of a scientific establishment that has developed some kind of rot in its core,” Firestein writes (Firestein, Failure © Oxford University Press). Obviously he begs to differ.

There’s no question that the scientific enterprise is usually self-correcting—in the long run. If time and money were no object, the truth would usually emerge from the cacophony of research papers that run the gamut from cringe-worthy to brilliant. Nobody’s arguing that the constant flow of errors has brought science to a standstill. The concern, though, is that people with deadly diseases are watching their own lives slip away. And taxpayers, by way of the National Institutes of Health (NIH), don’t have infinitely deep pockets. Funds poorly spent in one sloppy research lab could instead have been invested in rigorous studies elsewhere. Significantly, Firestein doesn’t acknowledge that many errors in biomedical science—from the mislabeling of melanoma cells as breast cancer cells to the purchase of antibodies that don’t perform as advertised—are easily preventable. I asked Begley about the failure rate at Amgen. “I expect that 90 percent of experiments do fail,” he told me, not just at the company but everywhere. “Our business is about managing failure.” But he said if experiments fail because the scientists were lazy, the work was sloppy, or the analysis was bad, “that’s not the failure of an experiment—it’s a failure of the experimenter.” And that’s what he says he saw in the dozens of studies that he could not reproduce.

* * *

Part of the everyday challenge of research is trying to avoid fooling oneself through bias. Inevitably it creeps into even the best scientific efforts. Bias is often impossible to avoid because it frequently involves pitfalls that scientists simply can’t foresee. So it too is part of the fabric of scientific research. And there is a seemingly endless list of ways that a scientist can unconsciously inject bias. Surveying papers from biomedical science in 2010, David Chavalarias and John Ioannidis cataloged 235 forms of bias. Yes, 235 ways scientists can fool themselves, with sober names such as confounding, selection bias, recall bias, reporting bias, ascertainment bias, sex bias, cognitive bias, measurement bias, verification bias, publication bias, observer bias, and on and on. Biases are usually not deliberate or even a conscious choice.

One classic example is that, for many years, researchers favored using male mice because they found it more challenging to deal with the estrous cycles of females. Only many years later did they appreciate that they were deeply skewing some of their results by studying only males. Reporting bias is another common problem in biomedicine. Scientists are much more likely to report the results of an experiment that “worked” than one that failed, even though discovering the lack of an effect can be just as important as a positive finding. That tendency skews the biomedical literature, creating a publication bias that can grossly distort the purported effect of a drug.
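
To see how badly that distortion can bite, here is a minimal simulation sketch in Python, with all numbers invented: a drug with a modest true benefit is tested in many small trials, but only the trials that come out statistically significant in the right direction make it into print.

```python
import numpy as np
from scipy import stats

# A toy model of publication bias (all numbers invented): a drug with a
# modest true benefit of 0.2 standard deviations, tested in 1,000 small
# trials. Only "positive" trials with p < 0.05 get published.
rng = np.random.default_rng(1)
true_effect, n_per_arm = 0.2, 25              # true benefit, in SD units

published = []
for _ in range(1000):                         # a thousand independent trials
    drug = rng.normal(true_effect, 1.0, n_per_arm)
    placebo = rng.normal(0.0, 1.0, n_per_arm)
    result = stats.ttest_ind(drug, placebo)
    if result.pvalue < 0.05 and result.statistic > 0:
        published.append(drug.mean() - placebo.mean())

print(f"true effect:              {true_effect:.2f} SD")
print(f"average published effect: {np.mean(published):.2f} SD")
```

Under these assumptions, the published trials report an average effect several times larger than the truth, even though no individual scientist did anything dishonest.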

Observer bias is another big problem. Scientists pursuing an exciting idea are more likely to see what they’re looking for in their data, and that alone can completely skew the results. Medical researchers sometimes avoid this by running double-blind trials in which neither the experimental subjects nor the scientists themselves know who’s taking a drug and who’s taking a placebo. Blinding is also good practice in laboratory experiments, but it’s not rigorously applied. Ken Yamada at the NIH says it’s more likely to be used in obvious situations, such as when someone is peering through a microscope and making judgment calls about cell shapes. But many scientists who conduct animal experiments don’t bother to blind those studies (or if they do, they don’t bother to report this important fact in their publications). Yamada said since the issue of reproducibility bubbled up in the past few years, he has become much more aware of the need to blind his experiments. He now insists on it in his lab. “It is more work because you have to involve someone else to do the blinding or you need to have some randomization technique or something like that,” he said. And tapping neutral lab members to do that work can distract them from their own projects. So it can be a nuisance, but “it’s really worth it.”
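
The bookkeeping Yamada describes is simple enough to sketch. Here is a minimal illustration in Python, with hypothetical sample names and a stand-in scoring step: a neutral lab member recodes the samples, the analyst scores them under opaque IDs, and the key is opened only after every score is recorded.

```python
import random

def analyst_score(coded_sample):
    """Stand-in for the analyst's judgment call, e.g., scoring cell shape."""
    return round(random.uniform(0, 10), 1)

samples = ["treated_1", "treated_2", "control_1", "control_2"]  # hypothetical

# Step 1 (neutral lab member): assign shuffled, uninformative codes
# and keep the key sealed until scoring is finished.
codes = [f"sample_{i:02d}" for i in range(len(samples))]
random.shuffle(codes)
key = dict(zip(codes, samples))               # code -> true identity

# Step 2 (analyst): score every coded sample, blind to its group.
scores = {code: analyst_score(code) for code in codes}

# Step 3 (unblinding): only now are scores matched back to groups.
for code, score in scores.items():
    print(f"{key[code]}: {score}")
```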

Sometimes bias arises because the effect scientists are trying to study can’t be measured cleanly or quantified. Gregory Petsko at the Weill Cornell Medical College studies Alzheimer’s disease. Scientists want to measure cognition in Alzheimer’s studies, but Petsko says cognition “is not a thing.… I don’t even know what that is, much less how you measure it. Those are serious reproducibility issues, and they stem not out of malfeasance… but rather out of our inability to understand something that we’re trying to use.”

Still, scientists need to find something to measure, particularly if they’re trying to figure out whether a drug is having an effect. And often they’re looking for a small one. A change of only 10 or 20 percent can have profound biological significance (a small change in body temperature, for instance, is likely to be disastrous). And it can be very difficult to see a small effect, particularly if you’re measuring something highly variable. Consider your daily commute. If you drive, the time will vary simply because of traffic. If you wanted to know whether alternate routes were faster, it could take many trials before you would be confident that one was, on average, better.
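
A quick simulation makes the commute arithmetic concrete. In this sketch (Python, with invented numbers), the alternate route really is three minutes faster on average, 27 minutes versus 30, but traffic adds about eight minutes of day-to-day noise:

```python
import numpy as np
from scipy import stats

# Invented numbers: route A averages 30 minutes, route B averages 27,
# and traffic adds roughly 8 minutes of random variation either way.
rng = np.random.default_rng(2)

def detection_rate(n_trips, sims=2000):
    """How often a t-test spots the real 3-minute difference."""
    hits = 0
    for _ in range(sims):
        a = rng.normal(30.0, 8.0, n_trips)    # commutes on route A
        b = rng.normal(27.0, 8.0, n_trips)    # commutes on route B
        hits += stats.ttest_ind(a, b).pvalue < 0.05
    return hits / sims

for n in (10, 30, 100, 200):
    print(f"{n:>3} trips per route: detected {detection_rate(n):.0%} of the time")
```

Under these assumptions, ten commutes on each route catch the difference only about one time in eight, and even a hundred commutes per route miss it roughly a quarter of the time.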

Not only are the effects variable but the instruments for measuring them can be blunt. Muscular dystrophy is a disease affecting mostly boys that causes gradual loss of muscle control. Scientists studying it rely very heavily on a measure called the six-minute walk test to gauge how the disease is progressing and to test whether a potential drug is effective. Researchers put two pylons down in a hospital corridor and ask their young patients to circle them for six minutes. The researchers measure how far they travel during that time. Not surprisingly, children’s performances can vary a lot. Researchers at Nationwide Children’s Hospital in Columbus, Ohio, asked nine boys to perform the six-minute walk test. After they’d rested, four randomly selected boys were asked to do it again with a reminder to “walk quickly and safely.” The other five were told that they’d get $50 if they could beat their previous time. The boys with the cash incentive improved a lot more than the other boys did. That may not sound surprising, but Eric Hoffman, a muscular dystrophy researcher at Children’s National Medical Center in Washington, DC, was astounded by the amount of improvement a $50 cash incentive generated: the effect was bigger than scientists have seen for any of the drugs being developed for muscular dystrophy. He said the six-minute walk test is a terrible measuring stick (though he used saltier language he asked me not to repeat). “That’s not something you want your whole drug development worldwide to depend on.” But, for lack of anything better, the test is in fact the accepted standard (and boys are not tempted with $50 bills in the formal drug studies).

* * *

Even when bias is minimized, biomedical scientists always have to face the fact that nature is fickle. Ken Yamada at the NIH said he spent more than a year perfecting a technique to mass-produce a useful protein called fibronectin. He had the system humming along beautifully, “and then suddenly over the span of a few weeks my isolation techniques didn’t work. I tried everything I could think of” but couldn’t produce the protein at a useful rate anymore. “We are just dealing with biological systems,” he said, “and sometimes these things happen.” Yamada was never able to figure out why his process was so reproducible for a year and then suddenly just stopped working. He ultimately devised a new method. “It’s kind of humbling,” Yamada told me. The one saving grace was that he hadn’t yet published his methods for mass-producing this widely used protein. “It would have been very embarrassing.”

Every scientist can tell you a story about experiments gone awry for reasons they’d never expected. An undetectable change in water quality can disrupt experiments. Sometimes switching from one batch of nutrients to another can make a major difference. (Nutrients may be variable biological material as well—fetal bovine serum, for example, is commonly used to help cells grow in the laboratory.) One lab moved its precious genetically engineered mice to a new facility and watched with horror as they all died—apparently because the bedding for the animals was switched to a commonly used corn-based material. Olaf Andersen at the Weill Cornell Medical College told me he nearly lost a friendship over differing results published by his lab and that of a close colleague. Finally, after some bitter words, they decided to sit down and try to resolve the discrepancy. Sorting through the possibilities took months, but apparently the difference boiled down to this: Andersen cleaned his glassware with acid, while his colleague used detergent.

Much of the time, scientists never get to the bottom of the mystery. They find a work-around (or sometimes a whole new project) and move on. But Curt Hines couldn’t simply deflect his problem. Hines was toiling away in a lab in Berkeley while his collaborators were doing complementary experiments in Boston. The experiments in this study required freshly gathered breast tissue (from women undergoing breast-reduction surgery). The scientists needed to isolate one of the many cell types found in healthy breast tissue as part of an effort to study breast cancer. “We had tried shipping cells across the country, but the cells didn’t really like that. You end up with a vial of dead cells,” Hines told me. So the teams decided they would run the same experiment in parallel. But when they compared the results achieved by Hines’s lab at Lawrence Berkeley National Laboratory and his collaborator’s lab at the Dana-Farber Cancer Institute, they were crestfallen to discover that they didn’t match. This was particularly troublesome because both labs had decades of experience in working with breast cells, so work done in one lab should have been readily reproducible in the other. “It didn’t matter how we’d do it,” Hines told me. “I’d get my profile, [Boston postdoc Ying Su] would get her profile.” To get to the bottom of this, he tried to duplicate the Boston setup as closely as possible. In addition to the more traditional lab gear, “they were using a Cuisinart,” Hines said. “I went down and got the exact same model of Cuisinart they were using. That didn’t fix it.”

Finally, after a year of struggling to solve this very basic problem, Hines’s lab chief, Mina Bissell, said that the two scientists needed to get together in the same lab and work it out. They decided to meet at Hines’s lab, in a modern glass-sheathed building near the Berkeley waterfront. After sitting side by side as they worked through what they thought was an identical protocol, they discovered why their results diverged. At one step in the process, Hines stirred the cells by putting them in a device that rocked them back and forth gently, while Su used a more vigorous stirring system that involved a spinning bar. Both methods are used routinely in labs, so there was no reason to suspect that this mundane step would produce utterly different results, but it did. “It took a little bit of luck, a little bit of patience, but I’m also stubborn,” Hines said. They published their tale in a scientific journal—a step that remarkably few labs bother to take, even when a mystery is probably plaguing labs elsewhere. Unfortunately, problems like this are so frequent in research that scientists may consider solutions too mundane to mention, and journals may not readily recognize them as scientific results worthy of publication.

Often, the problem scientists are trying to solve is not only extraordinarily challenging but also confusing. In cancer, for example, what kills people is usually not the initial tumor but the disease as it spreads—metastasizes—through the body. Your tax dollars have been hard at work exploring the key steps in this process. One critical step in metastasis occurs when a tumor cell manages to invade healthy tissue. Scientists believe that if they can understand that process, they might be able to develop an anti-metastasis drug, which could revolutionize cancer treatment. Metastasis involves enzymes that eat through the structural framework of tissues, called the extracellular matrix. Hundreds of published scientific papers describe this process but report conflicting results. So in an attempt to sort out the story, the editors of the Journal of Cell Biology asked two NIH scientists, Thomas Bugge and Daniel Madsen, to review the literature and answer one simple but important question: Do these tissue-dissolving enzymes come from the invading cancer cells, from the tissue that’s being invaded, or both?

Bugge thought this would be a simple exercise to bring some clarity to a muddled field. He was wrong. After realizing how many studies had been published on this subject, he and Madsen narrowed the field to just four enzymes in four major cancers: breast, colon, lung, and prostate. That still left them with nearly 250 studies to examine. And the results were all over the map. Some concluded that the enzymes came only from the tumor cells; some concluded that the enzymes came only from the surrounding tissue; some said they came from a combination of the two. One study used seven molecular probes to isolate the source of the enzymes and concluded that they were all coming from the surrounding tissue. Another used six probes and “found the exact opposite,” Bugge told me. “They were trying to be very thorough. Those were very well-done studies. They just came to opposite conclusions.”

Bugge came away from his study realizing that he had not only failed to arrive at a simple answer but discovered a deep problem affecting an important area of cancer research. “It was certainly never, at least for my part, meant to be a paper on reproducibility in science,” Bugge said. “That’s not the reason we started doing it. It was just halfway into the process we realized that is what it would have to be.” I sat down with Bugge four months after his paper was published. He had just returned from a scientific meeting where many of the field’s leading researchers got together to catch up. Bugge had long ago realized that the field was divided into camps: one believed the enzymes came from cancer cells, and the other didn’t. Perhaps these competing mind-sets explain why nobody had previously tried to reconcile the conflicting studies. He figured that his finding would cause some soul-searching or at least reflection, “but there was no discussion.… Maybe people don’t find this to be important,” he said with a tinge of irony.

In the 1990s, pharmaceutical companies spent millions of dollars trying to block matrix-dissolving enzymes, hoping to come up with a drug that would stop cancer from metastasizing. Those efforts all ended in failure. You’d think it would be worth understanding why, and that requires understanding this critical piece of biology. But scientists aren’t rewarded for looking back—their careers depend on looking forward, toward the next big idea.

* * *

Given the complexity of nature and the inevitable limitations of the tools to study it, scientific disagreements can last years or even decades. (While arguably half of all studies in the telomerase field are wrong, for example, scientists undoubtedly disagree about which half to toss out.) “We only work on things that we don’t understand,” said Mark Davis, an HHMI investigator at Stanford. “And so that means by definition we’re perpetually confused, and we’re just trying to fight through the fog, and at least hold on to some things and illuminate some things that we hope are true, with the best tools that we have available.” He said that’s what makes his job so exciting. “It’s different from making cars or something.” The true art in science is figuring out which ideas are good and should rise to the top and which ideas to discard to move science forward. “The trouble is with human beings,” he said. “Human beings have vested interests.”

“If you’re a professional in any area, your status in the field, especially in academia, is based on your perceived expertise in that area. So the last thing you want to hear is usually some punk kid undermining stuff you’ve worked on. So there’s an inherent friction there.” And that’s not only true for an individual study; it can apply to entire fields. A transformative idea can be disruptive. Davis and his students had been working on one research paper for more than four years. Although deliberately vague about the unpublished ideas, he said his team had looked at some basic immunological findings observed in mice and often extrapolated to people. The new research shows that the mouse data don’t apply to humans and are leading immunologists astray. But journals keep rejecting the paper, Davis said. “One comment came back: ‘If this paper is published it will set the field back 10 or 20 years!’ And I thought that was a really remarkable statement. But ultimately I interpreted it as a cry for help. If you’re so insecure about your field that one paper could do so much damage, what have you got? What are you so proud of here that could be swept away so easily?”

“Clearly the old guard will suffer with a paradigm change, and sometimes their whole career will go away,” Davis told me. “That’s a very real thing that can happen, especially in crappy fields where nothing will happen for years and years and suddenly something happens.” Davis said that ten years ago human immunology fit his definition of a crappy field. There were a few talented scientists, but the field was intellectually adrift. “It was just dead. It was a wasteland.”

The same was true for cancer immunotherapy, he said, despite the breathless promises about interferons (immune-system modulators), which ultimately delivered far less than hoped. There was a lot of serious effort, “but you also had all these charlatans. You had all these people who were saying, ‘Well I’m too busy to do controls in my experiment because I’m curing cancer. Don’t bother me with doing good science. I’m curing cancer.’ But they weren’t.” Eventually, some scientists started cutting through the clutter. Researchers had identified parts of the immune system they call checkpoints and developed custom-built drugs, “checkpoint inhibitors,” that help enlist the body’s immune system to unmask and attack certain tumors.

The story of checkpoints emerged from a rigorous set of observations and was carefully confirmed in multiple laboratories. Drug companies were gradually able to take that knowledge to develop a powerful new class of anticancer drugs. Though still not effective for most people who take them, they do make a remarkable difference for a subset of patients. Davis said cancer immunology is no longer on his list of crappy fields.

This is a reminder of just how much is at stake. In retrospect we can see how much more rapidly these ideas would have advanced had it not been for the blind alleys and missteps along the way. To some extent, those meanderings are unavoidable—that’s simply the nature of research on the frontier. But there are also many painful examples of delays, diversions, wasted time, and wasted money that scientists could have avoided had they been more careful along the way. And the failures begin right at the start—as scientists set out to design their experiments.