IT WAS ONE of those things that everybody knew but was too polite to say. Each year about a million biomedical studies are published in the scientific literature. And many of them are simply wrong. Set aside the voice-of-God prose, the fancy statistics, and the peer review process, which is supposed to weed out the weak and errant. Lots of this stuff just doesn’t stand up to scrutiny. Sometimes it’s because a scientist is exploring the precarious edge of knowledge. Sometimes the scientist has unconsciously willed the data to tell a story that’s not in fact true. Occasionally there is outright fraud. But a large share of what gets published is wrong.
C. Glenn Begley decided to say what most other people dared not speak. An Australian-born scientist, he had left academia after twenty-five years in the lab to head up cancer research at the pioneering biotech company Amgen in Southern California. While working in academia, Begley had codiscovered a protein called human G-CSF, which is now used in cancer treatments to reconstitute a person’s immune system after a potentially lethal dose of chemotherapy. G-CSF ultimately proved to be Amgen’s first blockbuster drug, so it’s not surprising that years later, when the company wanted to create an entire cancer research program, it hired Begley for the job.
Pharmaceutical companies rely heavily on published research from academic labs, which are largely funded by taxpayers, to get ideas for new drugs. Companies can then seize upon those ideas, develop them, and make them available as new treatments. Begley’s staff scoured the biomedical literature for hot leads for potential new drugs. Every time something looked promising, they’d start a dossier on the project. Begley insisted that the first step of any research project would be to have company scientists repeat the experiment to see if they could come up with the same results. Most of the time, Amgen labs couldn’t. That was duly noted on the dossier, the case was closed, and the scientists moved on to the next exciting idea reported in the scientific literature.
After ten years at Amgen, Begley was ready to move on. But before he went, he wanted to take stock of the studies that his team had filed away as not reproducible—focusing in particular on the ones that could have led to important drugs, if they had panned out. He chose fifty-three papers he considered potentially groundbreaking. And for this review, the company didn’t simply try to repeat the experiments—Begley asked the scientists who originally published these exciting results to help.
“The vast majority of the time the scientists were willing to work with us. There were only a couple of occasions where truly the scientists hung up on us and refused to continue the conversation,” he said. First, Begley asked the scientists to provide the exact materials that they had used in the original experiment. If Amgen again couldn’t reproduce the result with this material, they kept trying. “On about twenty occasions we actually sent [company] scientists to the host laboratory and watched them perform experiments themselves,” Begley told me. This time, however, the original researchers were kept in the dark about which part of the experiment was supposed to produce positive results and which would serve as a comparison group (the control). Most of the time, the experiments failed under these blinded conditions. “So it wasn’t just that Amgen was unable to reproduce them,” Begley said. “What was more shocking to me was the original investigators themselves were not able to.” Of the fifty-three original exciting studies that Begley put to the test, he could reproduce just six. Six. That’s just over one in ten.
Begley went to the Amgen board of directors and asked what he should do with this information. They told him to publish it. The German drug maker Bayer had undertaken a similar project and reported nearly as dismal results (it was able to replicate only 25 percent of the studies it reexamined). That study, published in a specialty journal in September 2011, hadn’t sparked a lot of public discussion. Begley thought his study would gain more credibility if he recruited an academic scientist as a coauthor. Lee Ellis from the MD Anderson Cancer Center in Houston lent his name and analysis to the effort. He, too, had been outspoken about the need for more rigor in cancer research. When the journal Nature published their commentary in March 2012, people suddenly took notice. Begley and Ellis had put this issue squarely in front of their colleagues.
They were hardly treated as heroes. Robert Weinberg, a prominent cancer researcher at the Massachusetts Institute of Technology, told me, “To my mind that [paper] was a testimonial to the silliness of the people in industry—their naïveté and their lack of competence.” When they spoke at conferences, Begley said, scientists would stand up and tell them “that we were doing the scientific community a disservice that would decrease research funding and so on.” But he said the conversation was always different at the hotel bar, where scientists would quietly acknowledge that this was a corrosive issue for the field. “It was common knowledge; it just was unspoken. The shocking part was that we said it out loud.”
The issue of reproducibility in biomedical science has been simmering for many years. As far back as the 1960s, scientists raised the alarm about well-known pitfalls—for instance, warning that human cells widely used in laboratory studies were often not at all what they purported to be. In 2005, John Ioannidis published a widely cited paper, titled “Why Most Published Research Findings Are False,” that highlighted the considerable problems caused by flimsy study design and analysis. But with the papers from Bayer and then Begley, a problem that had been causing quiet consternation suddenly crossed a threshold. In a remarkably short time, the issue went from back to front burner.
Some people call it a “reproducibility crisis.” At issue is not simply that scientists are wasting their time and our tax dollars; misleading results in laboratory research are actually slowing progress in the search for treatments and cures. This work is at the very heart of the advances in medicine. Basic research—using animals, cells, and the molecules of life such as DNA—reveals the underlying biology of health and disease. Much of this endeavor is called “preclinical research” because the hope is that its discoveries will eventually lead to studies in actual human beings (in the clinic). But if these preclinical discoveries are deeply flawed, scientists can spend years (not to mention untold millions of dollars) lost in dead ends. Those periodic promises that we’re going to cure cancer or vanquish Alzheimer’s rest on the belief that scientific discoveries are moving us in that direction. No doubt some of them are, but many published results are actually red herrings. And the shock from the papers by Begley and Ellis and by Bayer wasn’t just that scientists make mistakes. These studies sent the message that errors like that are incredibly common.
At first blush, that seems implausible, which is perhaps one reason that it took so long for the idea to gain currency. After all, scientists on the whole are very smart people. Collectively they have a long record of success. Biomedical research is responsible for most of the pills in our medicine cabinets, not to mention Nobel Prize–winning insights about the very nature of our being. Many biomedical scientists are motivated to discover new secrets of life—and to make the world a better place for humanity. Some scientists studying disease have relatives or loved ones who have suffered from these maladies, and they want to find cures. Academics aren’t generally in it for the money. There are more lucrative ways to make use of a PhD in these fields of science. Last but not least, scientists take pride in getting it right. Failure is an inevitable aspect of research—after all, scientists are groping around at the edges of knowledge—but avoidable mistakes are embarrassing and, worse, counterproductive.
The ecosystem in which academic scientists work has created conditions that actually set them up for failure. There’s a constant scramble for research dollars. Promotions and tenure depend on their making splashy discoveries. There are big rewards for being first, even if the work ultimately fails the test of time. And there are few penalties for getting it wrong. In fact, given the scale of this problem, it’s evident that many scientists don’t even realize that they are making mistakes. Frequently scientists assume what they read in the literature is true and start research projects based on that assumption. Begley said one of the studies he couldn’t reproduce has been cited more than 2,000 times by other researchers, who have been building on or at least referring to it, without actually validating the underlying result.
There’s little funding and no glory involved in checking someone else’s work. So errors often become evident only years later, when a popular idea that is poorly founded in fact is finally put to the test with a careful experiment and suddenly melts away. A false lead can fool a whole field into spending years of effort and millions of dollars in research funding chasing something that turns out not to be true.
Failures often surface when it’s time to use an idea to develop a drug. That’s why Glenn Begley’s results were so jaw-dropping. That very high failure rate applied to studies that really mattered. Drug companies rely heavily on academic research for new insights into biology—and particularly for leads for new drugs to develop. If academia is pumping out dubious results, that means pharmaceutical companies will struggle to produce new drugs. Of course, Begley’s test involved just fifty-three studies out of the millions in the scientific literature. And he chose those papers because they had surprising, potentially useful results. Perhaps a survey of more mundane studies would show a higher success rate—but, of course, those studies aren’t likely to lead to big advances in medicine.
There has been no systematic attempt to measure the quality of biomedical science as a whole, but Leonard Freedman, who started a nonprofit called the Global Biological Standards Institute, teamed up with two economists to put a dollar figure on the problem in the United States. Extrapolating results from the few small studies that have attempted to quantify it, they estimated that 20 percent of studies have untrustworthy designs; about 25 percent use dubious ingredients, such as contaminated cells or antibodies that aren’t nearly as selective and accurate as scientists assume them to be; 8 percent involve poor lab technique; and 18 percent of the time, scientists mishandle their data analysis. In sum, Freedman figured that about half of all preclinical research isn’t trustworthy. He went on to calculate that untrustworthy papers are produced at the cost of $28 billion a year. This eye-popping estimate has raised more than a few skeptical eyebrows—and Freedman is the first to admit that the figure is soft, representing “a reasonable starting point for further debate.”
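Those four rates add up to 71 percent, yet Freedman’s bottom line is “about half.” The arithmetic works because a single study can suffer from more than one of these problems, so the categories overlap rather than stack. Here is a minimal back-of-the-envelope sketch in Python, assuming, purely for illustration and not as a description of how Freedman’s team actually treated the overlap, that the four failure modes strike independently:

```python
# Back-of-the-envelope check on the "about half" figure.
# Assumption (illustrative only): the four failure modes occur independently.

failure_rates = {
    "untrustworthy design": 0.20,
    "dubious ingredients": 0.25,
    "poor lab technique": 0.08,
    "mishandled data analysis": 0.18,
}

# Probability that a study dodges every failure mode
p_clean = 1.0
for rate in failure_rates.values():
    p_clean *= 1.0 - rate

p_flawed = 1.0 - p_clean
print(f"Studies with at least one serious flaw: {p_flawed:.0%}")  # about 55%
```

Under that crude independence assumption, a bit more than half of all studies carry at least one serious flaw, which lands in the same territory as Freedman’s estimate.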
“To be clear, this does not imply that there was no return on that investment,” Freedman and his colleagues wrote. A lot of what they define as “not reproducible” really means that scientists who pick up a scientific paper won’t find enough information in it to run the experiment themselves. That’s a problem, to be sure, but hardly a disaster. The bigger problem is that the errors and missteps that Freedman highlights are, as Begley found, exceptionally common. And while scientists readily acknowledge that failure is part of the fabric of science, they are less likely to recognize just how often preventable errors taint studies.
“I don’t think anyone gets up in the morning and goes to work with the intention to do bad science or sloppy science,” said Malcolm Macleod at the University of Edinburgh. He has been writing and thinking about this problem for more than a decade. He started off wondering why almost no treatment for stroke has succeeded (with the exception of the drug tPA, which dissolves blood clots but doesn’t act on damaged nerve cells), despite many seemingly promising leads from animal studies. As he dug into this question, he came to a sobering conclusion. Unconscious bias among scientists arises every step of the way: in selecting the correct number of animals for a study, in deciding which results to include and which to simply toss aside, and in analyzing the final results. Each step of that process introduces considerable uncertainty. Macleod said that when you compound those sources of bias and error, only around 15 percent of published studies may be correct. In many cases, the reported effect may be real but considerably weaker than the study concludes.
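Macleod’s arithmetic is a matter of compounding: even if each stage of a study is only modestly error-prone, the odds that a result clears every hurdle shrink multiplicatively. The sketch below uses invented stage-level probabilities, chosen only to illustrate the arithmetic rather than to reproduce Macleod’s own estimates:

```python
# Illustration of compounding bias across the stages of a study.
# The probabilities below are invented for illustration; they are not
# Macleod's figures.

stages = {
    "too few animals (underpowered)": 0.45,   # chance this stage goes wrong
    "no random assignment to groups": 0.35,
    "results selectively included": 0.35,
    "flawed final analysis": 0.30,
}

p_survives_all = 1.0
for p_goes_wrong in stages.values():
    p_survives_all *= 1.0 - p_goes_wrong

print(f"Chance a published result survives every stage: {p_survives_all:.0%}")  # about 16%
```

With these made-up numbers, only about one result in six emerges unscathed, which is the neighborhood of Macleod’s 15 percent figure; the exact answer obviously depends on the rates you plug in.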
Mostly these estimated failure rates are educated guesses. Only a few studies have tried to measure the magnitude of this problem directly. Scientists at the MD Anderson Cancer Center asked their colleagues whether they’d ever had trouble reproducing a study. Two-thirds of the senior investigators answered yes. Asked whether the differences were ever resolved, only about a third said they had been. “This finding is very alarming as scientific knowledge and advancement are based upon peer-reviewed publications, the cornerstone of access to ‘presumed’ knowledge,” the authors wrote when they published the survey findings.
The American Society for Cell Biology (ASCB) surveyed its members in 2014 and found that 71 percent of those who responded had at some point been unable to replicate a published result. Again, 40 percent of the time, the conflict was never resolved. Two-thirds of the time, the scientists suspected that the original finding had been a false positive or had been tainted by “a lack of expertise or rigor.” ASCB adds an important caveat: of the 8,000 members it surveyed, it heard back from 11 percent, so its numbers aren’t convincing. That said, Nature surveyed more than 1,500 scientists in the spring of 2016 and saw very similar results: more than 70 percent of those scientists had tried and failed to reproduce an experiment, and about half of those who responded agreed that there’s a “significant crisis” of reproducibility.
These concerns are not being ignored. From the director’s office in Building 1 at the National Institutes of Health (NIH), Francis Collins and his chief deputy, Lawrence Tabak, declared in a 2014 Nature comment, “We share this concern” over reproducibility. In the long run, science is a self-correcting system, but, they warn, “in the shorter term, the checks and balances that once ensured scientific fidelity have been hobbled.” Janet Woodcock, a senior official at the Food and Drug Administration (FDA), was even more blunt. “I think it’s a totally chaotic enterprise.” She told me drug companies like Amgen usually discover problems early on in the process and bear the brunt of weeding out the poorly done science. But “sometimes we [FDA regulators] have to use experiments that have been done in the academic world,” for example, by university scientists who are working on a drug for a rare disease. “And we just encounter horrendous problems all the time.” When potential drugs make it into the more rigorous pharmaceutical testing regimes, nine out of ten fail. Woodcock said that’s because the underlying science isn’t rigorous. “It’s like nine out of ten airplanes we designed fell out of the sky. Or nine out of ten bridges we built failed to stand up.” She rocked back and laughed at the very absurdity of the idea. And then she got serious. “We need rigorous science we can rely on.”
Arturo Casadevall at the Johns Hopkins Bloomberg School of Public Health shares that sense of alarm. “Humanity is about to go through a couple of really rough centuries. There is no way around this,” he said, looking out on a future with a burgeoning population stressed for food, water, and other basic resources. Over the previous few centuries, we have managed a steadily improving trajectory, despite astounding population growth. “The scientific revolution has allowed humanity to avoid a Malthusian crisis over and over again,” he said. To get through the next couple of centuries, “we need to have a scientific enterprise that is working as best as it can. And I fundamentally think that it isn’t.”
We’re already experiencing a slowdown in progress, especially in biomedicine. By Casadevall’s reckoning, medical researchers made much more progress between 1950 and 1980 than they did in the following three decades. Consider the development of blood-pressure drugs, chemotherapy, organ transplants, and other transformative technologies. Those all appeared in the decades before 1980. His ninety-two-year-old mother is a walking testament to steadily improving health in the developed world. She is taking six drugs, five of which “were being used when I was a resident at Bellevue Hospital in the early 1980s.” The one new medication? For heartburn. “You would think that with all we know today we should be doing a lot better. Why aren’t we there?”
The number of new drugs approved for every research dollar spent has been falling since the 1950s. In 2012, Jack Scannell and his colleagues coined the term “Eroom’s law” to describe the steadily worsening state of drug development. “Eroom,” they explained, is “Moore” spelled backward. Moore’s law charts the exponential progress in the efficiency of computer chips; the pharmaceutical industry, however, is headed in the opposite direction. If you extrapolate the trend, starting in 1950, you’ll find that drug development essentially comes to a halt in 2040. Beyond that point developing any drug becomes infinitely expensive. (That forecast is undoubtedly too pessimistic, but it makes a dramatic point.) The only notable uptick occurred around the mid-1990s, when researchers made some remarkable progress in developing drugs for HIV/AIDS. (The situation improved modestly in the years after Scannell and colleagues’ analysis ended in 2010.) These researchers blame Eroom’s law on a combination of economic, historical, and scientific trends. Scannell told me that a lack of rigor in biomedical research is an important underlying cause.
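The extrapolation behind that 2040 figure is straightforward to sketch. Scannell and colleagues characterized Eroom’s law as the inflation-adjusted cost of developing a new drug roughly doubling every nine years (equivalently, drugs per research dollar halving). The starting value below is illustrative, picked only to show the shape of the curve, not a measured number:

```python
# Sketch of the Eroom's-law extrapolation.
# Assumed (per Scannell and colleagues' rough characterization): drug output
# per R&D dollar halves about every nine years. The 1950 starting value is
# illustrative, not a measured figure.

drugs_per_billion_1950 = 40.0   # hypothetical drugs approved per $1B of R&D
halving_time_years = 9.0

def efficiency(year):
    """Drugs per billion R&D dollars under pure exponential decline."""
    return drugs_per_billion_1950 * 0.5 ** ((year - 1950) / halving_time_years)

for year in (1950, 1980, 2010, 2040):
    eff = efficiency(year)
    cost_per_drug = 1.0 / eff   # billions of dollars per approved drug
    print(f"{year}: ~{eff:.2f} drugs per $1B, or about ${cost_per_drug:.2f}B per drug")
```

A pure exponential never literally reaches zero, but by 2040 the implied cost of a single new drug has grown so large under these assumptions that development has effectively halted, which is the dramatic point the extrapolation is meant to make.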
For Sally Curtin, it’s personal. Crisis struck on February 5, 2010. She came downstairs in her eastern Maryland home to find her fifty-eight-year-old husband, Lester “Randy” Curtin, lying unconscious on the floor. She and an emergency crew fought through a blizzard to get him to the hospital. It took doctors four days to reach a diagnosis, and the news could hardly have been worse. Randy had a brain tumor, glioblastoma multiforme.
Both Sally and Randy worked at the National Center for Health Statistics (part of the Centers for Disease Control and Prevention). He was the guy colleagues went to when they were having trouble working through a statistical problem. When it came to his own odds, the doctors told them not to look at the survival numbers—but “we’re numbers people,” Sally Curtin told me. “The first thing we did was go look at the numbers.” Half of patients with this diagnosis live less than fifteen months, and 95 percent are dead within five years.
“I had never heard the term glioblastoma. It seemed unreal to me that there was a cancer this lethal that they had not made progress on in fifty years,” Sally told me. This cancer strikes about 12,000 Americans per year. (Senator Ted Kennedy was one of the most notable victims. Vice President Joe Biden also lost his son Beau to glioblastoma.) Even so, the Curtins hoped they could beat the odds. They signed Randy up for three separate clinical trials at the National Institutes of Health—experimental treatments that they hoped would keep the spreading tumors in check. None of them worked. In fact, in one brief period during the treatment, the tumors grew by 40 percent.
The worst part was that the disease was attacking the brain of a man with a powerful intellect. “His oldest daughter put it best. She said it’s like telling someone who’s afraid of the water that you are going to have death by drowning.” With treatment options exhausted, Randy returned home to Huntingtown, Maryland, and registered for hospice. As the disease progressed, Sally said her husband had hallucinations. He would smash furniture, and once he pulled down the TV. “He really scared the kids,” nine-year-old Daniel and eleven-year-old Kevin. “It wasn’t like he was abusive or angry at us. He was just out of his mind” as the tumor grew. He hung on for seven months, increasingly agitated and in constant pain. Near the end, he asked Sally to overdose him with morphine, but she could not take his life. Eventually he slipped into a coma. At one point he had a seizure that jolted him out of it and was lucid enough to tell Sally, “I love you.” That was the last thing he said to her. Five days later, he died, shortly after his sixtieth birthday.
Sally told me the story with strength and resolve. I wasn’t surprised to learn that she had insisted on speaking at his memorial service, eight days after he died. She says she was able to keep her composure. Now in her early fifties, she’s trying to figure out what the life of a widow is supposed to look like.
Glioblastoma provides a glimpse into the broader challenges facing biomedical science. Over the years, scientists have published more than 25,000 papers on the disease. The NIH spends about $300 million a year on brain cancer research. Scientists have made some headway in understanding the biology of this disease, but it simply hasn’t translated into effective treatments—in part because the cells and animals studied in the lab are poor stand-ins for human beings. The failure rate may also reflect a lack of rigor in some studies testing experimental treatments in people.
At Arizona State University in Tempe, Anna Barker has been keeping score. Taped to the door of her light-filled conference room was a poster filled with print so fine I couldn’t read the words from a short distance away. Barker told me it listed two hundred clinical trials that had been run on glioblastoma multiforme. Every single one was a failure. And the results of cancer studies overall aren’t that much better. “Probably 65 to 80 percent of our trials in oncology fail,” she told me. “Look at the money wasted. It’s unbelievable.” Barker has a personal passion for doing something about this. “Ultimately I lost my whole family to cancer, which is pretty amazing when you think about it.” She was twelve years old when her grandmother died of pancreatic cancer, and “that was the reason I wanted to work in cancer research. But it never dawned on me as I grew up… that I would lose my sister and mother and father to cancer.”
Before coming to Arizona State, Barker was deputy director of the National Cancer Institute. There she saw one big problem with cancer research: scientists were not approaching many studies with enough rigor. Each scientist had his or her own way of working, and those methods were often neither standardized nor repeatable. That’s the culture of biomedical science today—researchers are individual entrepreneurs, each attacking a small piece of the problem with gusto. Barker says that unfortunately the quality of the work is all over the map—and there’s typically no way to tell which studies you can believe and which you can’t, especially when scientists try to combine results from different laboratories, each of which has used its own methods.
“Everyone says, ‘It’s not my problem,’” Barker told me. “But it has to be someone’s problem. What about accountability? At the National Cancer Institute, we spent a lot of money. Our budget was $5 billion a year. That’s not a trivial amount of investment. If a major percentage of our data is not reproducible, is the American taxpayer being well served?” Barker reads scientific journals with trepidation. “I have no clue whether to trust the data or not,” she said. She has thought long and hard about how biomedical science got to this point and has some strong ideas about how to seize this moment to institute significant reforms. She’s putting some of these ideas to the test by trying to revolutionize the treatment of glioblastoma brain tumors (more about that in Chapter 9).
Barker has good reasons to approach the scientific literature with caution. When an exciting scientific discovery is reported, scientists are quick to jump on the bandwagon, often without considering whether the original finding is in fact true. Here’s a case in point. In 1999 and 2000, several scientists made a startling claim: they announced that bone marrow stem cells could spontaneously transform themselves into cells of the liver, brain, and other organs. “Transdifferentiation,” as this was called, created instantaneous excitement because up until that point scientists had been harvesting from human embryos the stem cells they wanted for research. It seemed transdifferentiation could provide a much less fraught method. In short order, the scientific literature filled with dozens and eventually hundreds of papers backing up this rather remarkable finding. Some scientists even dropped whatever they’d been working on and started devoting their time to transdifferentiation.
The first splash of very cold water came from Amy Wagers, working in Irving Weissman’s lab at Stanford University. First she irradiated mice to kill off their bone marrow cells and then injected them with a single bone marrow stem cell from another mouse—a cell that glowed green. That cell did indeed divide and create a variety of bone marrow cells (which is what these stem cells normally do, so no surprise). But it did not transdifferentiate into kidney, brain, liver, gut, muscle, or lung cells as previous experiments claimed it would. In a second experiment, she surgically connected the circulatory systems of several pairs of mice, with one in each pair containing the green glowing cells, and observed them for six to seven months. Wagers and her colleagues examined millions of cells in the recipient mice and again found no evidence of transdifferentiation. True, a few cells unexpectedly turned green (which other scientists had noticed in previous studies), but that was because cells were fusing with one another, not changing their fundamental identities. So much for the idea of transdifferentiation. In 2002, Wagers concluded with typical scientific understatement that transdifferentiation is “not a typical function” of normal stem cells found in the bone marrow.
Some researchers simply dismissed Wagers’s discovery and kept on publishing their transdifferentiation results. It can be hard to give up on an idea, particularly if you’ve placed a heavy intellectual bet on it, investing time, reputation, and money. And scientists kept on reporting results that they thought were building their case. It was all a mirage. “Most of these studies turned out not to be reproducible,” wrote Sean Morrison, a Howard Hughes Medical Institute investigator at the University of Texas Southwestern Medical Center. The mundane explanation for the initially exciting results actually had nothing to do with cells changing from one type to another. “This episode illustrated how the power of suggestion could cause many scientists to see things in their experiments that weren’t really there and how it takes years for a field to self-correct,” Morrison wrote in a scientific editorial, noting that scientists are sometimes too eager to rush forward “without ever rigorously testing the central ideas. Under these circumstances dogma can arise like a house of cards, all to come crumbling down later when somebody has the energy to do the careful experiments and the courage to publish the results.”
Scientists concerned about this problem in biomedical research have come to call it the reproducibility crisis. But that term doesn’t capture the true scope of what’s happening. The scientific method, when properly used, doesn’t simply apply to the conduct of a single experiment and ask whether it can be reproduced. The scientific method should also help researchers build a deeper understanding of biology and disease. It’s not enough to know whether a particular discovery can be replicated using the exact same set of ingredients. Scientists want to find results that mean something more broadly. The overarching goal of biomedical research is to understand the basic processes that lead to disease so that medical science can intervene to ease human suffering and improve health.
That requires rigor in every individual experiment. But it also requires rigorous thought and insight putting those results into a broader context. And biomedical science is now suffering from a lack of rigor. Of course one way to measure rigor is to look at the first, fundamental step: testing whether individual studies can be reproduced. That’s one reason Glenn Begley’s paper struck such a nerve. And despite the hullabaloo it has caused, there has been a noticeable silence from the forty-six labs that produced the forty-seven papers Begley (and often they themselves) could not reproduce. “None of the papers have since been retracted,” Begley said. “No one has published follow-ups saying that the data was different. So I think probably they just felt that the first time was fine and something went wrong the second time. Many of these investigators had already moved on, which is typical in academia. They had moved on to whatever the next project was. So in most cases there was no real desire to set the record straight.” In industry, exploring an idea for a new drug can quickly balloon into serious money, “so you can’t just assume that you will be able to justify spending $100 million without replicating first.”
When Begley undertook his project, Amgen signed secrecy agreements with the individual scientists, promising not to reveal their identities so as to spare them potential embarrassment. Other scientists have criticized Begley’s work on those grounds—in point of fact, his report about reproducibility is not itself reproducible! Nobody else can identify the same fifty-three studies and try to repeat Begley’s analysis. He agrees with his critics that the secrecy is a major shortcoming of his study. The only people who could disclose that information were the authors of the original studies, he said. “And they chose not to do so.”
“I’ve invested my life in this area and this was just shocking,” Begley says. “Since then, I’ve tried to do as much as I can. I hope that this will really change the way science is performed.” After Begley’s paper about irreproducible results made its splash, he heard from a postdoctoral researcher who pleaded with him to reveal the identities of those who did the flawed experiments. The young scientist worried that he was wasting his time working on a project based on one of them. Begley explained that the deal he’d cut with the researchers prevented him from exposing them, but the question did disturb him. His solution was to write a follow-up comment in Nature titled “Six Red Flags for Suspect Work.” In it, he ran down the list of the six most common preventable failures he encountered. They’re worth repeating here because they are very common failings found in biomedical research, and they explain a good deal of the reproducibility problem. Here are the questions that researchers should ask:
1. Were experiments performed blinded—that is, did scientists know, as they were doing the experiment, which cells or animals were the test group and which were the comparison group?
2. Were basic experiments repeated?
3. Were all the results presented? Sometimes researchers cherry-pick their best-looking results and ignore other attempts that failed, skewing the conclusions (a short simulation after this list shows how misleading that can be).
4. Were there positive and negative controls? This means running parallel experiments as comparisons, one of which should succeed and the other of which should fail if the scientist’s hypothesis is correct.
5. Did scientists make sure they were using valid ingredients?
6. Were statistical tests appropriate? Very often biomedical scientists choose the wrong methods to analyze their data, sometimes invalidating the entire study.
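To make the third red flag concrete, here is a toy simulation, not drawn from Begley’s paper, of what selective reporting does: an experiment with no real effect is run several times, and only the most flattering run is written up.

```python
import random
import statistics

# Toy simulation of cherry-picking (red flag 3). There is NO real effect:
# both groups are drawn from the same distribution, yet the best of several
# attempts can look like a convincing finding.

random.seed(1)

def one_experiment(n=10):
    """Compare two groups drawn from the same distribution; return the gap in means."""
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(0, 1) for _ in range(n)]
    return statistics.mean(treated) - statistics.mean(control)

attempts = [one_experiment() for _ in range(8)]

print("All eight attempts:", [round(a, 2) for a in attempts])
print(f"Average effect across attempts: {statistics.mean(attempts):+.2f}")
print(f"Best-looking attempt (the one that gets written up): {max(attempts):+.2f}")
```

On a typical run the average effect across all eight attempts hovers near zero while the single best attempt looks encouragingly positive, which is exactly the distortion that reporting every result, and repeating basic experiments, is meant to expose.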
This list should be as familiar to scientists as the carpenter’s dictum is to home builders: measure twice, cut once. Alas, the rules are often not applied. Training for a career in biomedical research is a haphazard process, with few formal courses. People learn from their mentors, for better or worse. In some fields, it’s simply not the tradition for scientists to follow these commonsense standards. For example, scientists studying mice in the lab may or may not believe it is important to assign their animals randomly to their study and control groups. And even when scientists follow these rules, they can still fail to generate reproducible results. Biomedical research is challenging even under the best circumstances.