IN 1994, NINA DESAI and her colleagues at the Ohio State University published some good news: they announced the creation of a new tool to produce test-tube babies. The team said it had isolated a line of human cells from a woman’s womb and coaxed them to grow perpetually in the lab. They planned to use these cells as microscopic nursemaids, providing growth factors for human embryos being nurtured in the lab to help infertile couples conceive.
Desai moved to Cleveland and in 2000 became director of in vitro fertilization (IVF) research at the Cleveland Clinic Foundation. There she put her special cell line to work. As she described in a 2008 paper, she would put 15,000 to 30,000 of these cells in a plastic dish. A permeable membrane kept them from direct contact with the embryos but allowed the biological materials they produced to wash into a second chamber, which contained embryos being nurtured in preparation for implantation in a woman’s womb. She reported that the procedure had been used to treat the embryos of 316 women between January 2004 and March 2007. By the date of her publication, the tests had resulted in 111 apparently healthy infants born to 76 women.
But the story was not quite so simple. When Christopher Korch, at the University of Colorado School of Medicine in Denver, came across this paper, he was puzzled about the exact identity of the cell line that Desai said she had established. Normal cells can be coaxed to grow in the lab for a while, but they eventually peter out. This cell line kept going—it had become immortal. “When you hear that, then you really have to worry,” Korch said. “That is the warning sign you’ve got something else in there.” Cells rarely transform spontaneously, but they readily become contaminated with aggressive cell lines that can easily move through a lab. This enormous contamination problem dates back more than half a century and has cast doubt on a large slice of the biomedical research literature.
Scientists first managed to keep human cells proliferating in the lab in 1951. Researchers at the Johns Hopkins Hospital extracted some cervical cancer tissue from a woman whose story is chronicled in Rebecca Skloot’s The Immortal Life of Henrietta Lacks. Those rapidly proliferating cells became a favorite lab tool for scientists interested in studying cervical cancer in particular and human biology more generally. Unlike typical cells extracted from a person, these kept dividing and proliferating indefinitely. This immortal cell line, labeled HeLa, was just the first of many. And because HeLa cells grew so quickly, they became rapacious weeds in the world of biomedical research labs. The slightest lapse in hygiene can transfer a HeLa cell from one dish to another that’s harboring a different line. The fast-growing HeLa cells quickly crowd out the other cells and simply take over.
Through the 1960s and 1970s, Walter Nelson-Rees made no end of enemies in science by testing cell lines purported to be from many different cancers and pointing out correctly—but brusquely—that they were in fact HeLa cells. Nelson-Rees, who curated the National Cancer Institute’s cell repository in Oakland, California, waged a bitter campaign at that time to make scientists realize that they were not, in fact, exploring breast cancer or liver cancer or whatever they thought they were working on. Word of his campaign spread throughout the science world and far beyond into the newspapers and magazines of the day. In 1986, science writer Michael Gold wrote A Conspiracy of Cells, a lively history of Nelson-Rees and his campaign against HeLa. And how did the scientists using cells in their research respond? They mostly ignored the problem.
Even today, despite the easy availability of conclusive identity tests, HeLa crops up frequently in labs intending to investigate something else. KB cells, used extensively to study oral cancer? They’re actually HeLa. Human epithelial type 2 (HEp-2) cells, thought to be a sample of cancer from the larynx? Actually HeLa, as well. Chang liver, Intestine-407 (Int-407), and WISH cells? HeLa, each and every one. That’s ancient history—all were exposed for what they are in the 1960s. Even so, more than 7,000 published studies have used HEp-2 or Int-407 cells, unaware that they were actually HeLa, at an estimated cost of more than $700 million.
And that’s just a sliver of the problem. A 2007 study estimated that between 18 and 36 percent of all cell experiments use misidentified cell lines. That adds up to tens of thousands of studies, costing billions of dollars. About a quarter of those misidentified lines are actually HeLa, but there are plenty of other masqueraders out there. Sometimes even the species isn’t correct. Nelson-Rees found a “mongoose” cell line was actually human and determined that two “hamster” cell lines were from marmosets and humans, respectively. “Have the Marx Brothers taken over the cell culture labs?” Roland Nardone asked in a 2008 paper bemoaning this state of affairs.
Nardone, a biologist at the Catholic University of America, took up the cause of contaminated cells in 2005, after his son wished him a happy seventy-seventh birthday and asked what he intended to do with the rest of his life. After some reflection, he decided to pick up where Nelson-Rees had left off. In 2015, at the age of eighty-seven, Nardone, with a shock of white hair and bushy white eyebrows, rose out of a wheelchair and gripped a lectern at the National Institutes of Health (NIH) to talk about his ten-year quest to straighten out this glaring problem. He had written a paper in 2007 urging zero tolerance for these bogus cell lines. “I thought the bandwagon would be crowded with all the people who would want to jump on board,” he said. That was not to be the case. The NIH actually responded to Nardone’s letter promptly and set out a new policy. “But it wasn’t strong,” he said. “It just encouraged increased vigilance and oversight.” Nature published an editorial in 2009 declaring that it would institute a new policy to set this right but in fact took no formal action for years. By the end of his talk, Nardone had worked himself into a state of righteous indignation. “How dare we not authenticate our cells, regardless of what we’re doing!”
Nardone was not the only scientist growing alarmed about this issue. Gradually a group started to coalesce. Amanda Capes-Davis, an Australian cell biologist, emerged as a leader of this loose-knit band. She had set up a cell bank at her institute in Sydney and quickly became aware that many cells circulating among scientists were misidentified. She cast around to see if anyone had compiled a list of those imposter cells and couldn’t find one. So in 2009 she quit her job to devote her full attention to this matter. “I spent six months at home, working on what I thought was a reasonable list,” she told me. When she got about two-thirds of the way through, she discovered that a Scottish cell bank scientist, R. Ian Freshney, had been working on a list as well. “My first reaction was that six months was wasted. Someone else has already done this,” she said. “My second reaction was, let’s look at the lists.” It turned out that although they shared a lot, each scientist had dug up unique problem cell lines from the scientific literature. A partnership was born.
Their list of contaminated cell lines gradually grew into the hundreds. People had various methods of identifying these wayward cells. Nelson-Rees in the 1970s and 1980s examined them under a microscope to look at the patterns of the bands on their chromosomes. He also ran some enzyme tests that were more informative but still not definitive. By the 2000s, however, scientists could readily use genetic fingerprinting techniques to identify cell lines. So the nation’s research cell bank, the American Type Culture Collection (ATCC), decided it was time to settle on a standard test that biologists around the world would use to verify their cell lines. Officials there asked Capes-Davis, Freshney, and a small band of their associates to sit down and write that standard. It took two years, but in the end they settled on a cell-fingerprinting technology that was reliable, reproducible, and inexpensive—typically less than $200 per test.
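The fingerprinting technology they settled on compares short tandem repeat (STR) profiles: the alleles a cell line carries at a handful of repetitive DNA loci are matched against a reference profile. Here is a minimal sketch of that idea in Python; the scoring is a simplified similarity rather than the published standard’s exact algorithm, and the profiles and the commonly cited 80 percent match threshold are shown only for illustration.

```python
# A minimal sketch of an STR-based cell line authentication check.
# The loci, alleles, and 80% threshold are illustrative assumptions,
# not the text of the published standard.

def str_match_percent(query, reference):
    """Percent of alleles shared between a query profile and a reference profile.

    Each profile maps an STR locus name to the set of alleles seen at that locus.
    """
    shared = total = 0
    for locus in query:
        if locus not in reference:
            continue  # only compare loci typed in both profiles
        shared += len(query[locus] & reference[locus])
        total += len(query[locus] | reference[locus])
    return 100.0 * shared / total if total else 0.0


# Hypothetical profiles: a lab's "unknown" line versus a HeLa reference.
unknown = {"D5S818": {11, 12}, "TH01": {7}, "TPOX": {8, 12}, "vWA": {16, 18}}
hela    = {"D5S818": {11, 12}, "TH01": {7}, "TPOX": {8, 12}, "vWA": {16, 18}}

percent = str_match_percent(unknown, hela)
if percent >= 80:   # threshold assumed for illustration
    print(f"{percent:.0f}% match: likely the same line (possible contamination)")
else:
    print(f"{percent:.0f}% match: profiles differ")
```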
For Capes-Davis, this whole issue became a labor of love that she performed as a volunteer from the studio of her Sydney home. Her passion was partly intellectual—there are intriguing mysteries to solve here—and partly ethical. “When I tried to establish cell lines [as a researcher] I went to collect samples from potential donors,” she said. “We have a responsibility to look after those samples.” With the standard in hand, Capes-Davis was then anointed to chair an organization that sprang up around this issue, the International Cell Line Authentication Committee. It maintains and updates the list of corrupted cell lines, which by 2016 had grown to 438, with no end in sight. And she continues to expose the history of these wayward cell lines—particularly the ones that Walter Nelson-Rees was already flagging decades ago, such as KB and Chang liver. “Those are my horror stories because they are still so widely used,” she said. “These are from the 1960s—why are we still using them so much?”
One of the most flagrant examples that Amanda Capes-Davis, Christopher Korch, and their colleagues investigated involved a cell line widely used to study breast cancer. This story starts in Houston, on January 23, 1976. A thirty-one-year-old woman diagnosed with early-onset breast cancer was seen at the MD Anderson Hospital and Tumor Institute. Fluid had been accumulating around her lungs. A hospital worker drew some into a syringe and delivered it to the laboratory of Relda Cailleau. Cailleau and her colleagues were in the midst of a six-year project to capture breast cancer cells in order to cultivate them in the laboratory. The cells from this young woman did indeed take hold in a petri dish, becoming part of a collection of nineteen different breast cancer cells extracted between 1973 and 1978 at MD Anderson. The cells from this particular woman were dubbed MDA-MB-435 (and sometimes labeled MDA-435). And it turns out they were especially useful, as they had the rare ability to spread in mice the way cancer metastasizes in people. In short order, labs around the country clamored for samples of MDA-MB-435 to study metastatic breast cancer. It proved so popular that in the late 1980s, the National Cancer Institute selected it as one of sixty key lines that would get extraordinary attention. This collection, dubbed the NCI-60, would be used to test hundreds of thousands of potential new cancer drugs. Over the years, hundreds upon hundreds of journal publications reported breast cancer experiments involving MDA-MB-435, as scientists hoped they were homing in on better treatments or even a cure. It turned out that MDA-MB-435 was an imposter.
The cell was unmasked quite by accident. Back in the late 1990s, scientists at Stanford University were developing a test that would allow them to look at a biological sample and see which genes are switched on or off in any given cell. Doug Ross was a postdoctoral researcher in a star-studded laboratory that helped develop these powerful new genetic tools. His boss, Pat Brown, put him in charge of a marquee project: a study of all sixty of the lines in the NCI-60. He and his colleagues set up an experiment to investigate about 8,000 genes in these cancer cells and to look for patterns. Which genes were turned on? Which were turned off? How did they differ from one type of cancer to the next?
In March 2000, Ross and his colleagues reported exciting results. Using their powerful new technique, they could tell one type of cancer from another simply by looking at patterns to see which genes were active and which were silent. The various lung cancer cells included in the NCI-60 had one genetic pattern in common. Prostate cancer cells all shared another. Melanoma cancers had their own unique gene-expression fingerprint. And so did breast cancer cells—well, almost all of the breast cancer cells. MDA-MB-435 didn’t come out looking like a breast cancer. Its gene pattern matched the melanoma cells and “really had nothing to do with the breast cancer cell lines,” Ross told me. “So we repeated the experiment to make sure we didn’t screw it up”—and got the same melanoma pattern. Ross borrowed a different sample of MDA-MB-435 from colleagues at Stanford. Same thing. It was looking a lot like a melanoma. “We just mentioned in the paper the possibility its tissue of origin was misidentified,” he said.
Further investigation has since revealed that the cells are nearly identical to another cell line in the NCI-60, a melanoma cell line called M-14. The NCI put up a note of caution to alert breast cancer researchers that the cell line appeared to be misidentified. Some scientists who had spent many years studying this “breast cancer” dug in their heels. “People were very invested in the tremendous effort they’d put into the cell line,” Ross said. Some developed a convoluted rationale to explain how MDA-MB-435 could still conceivably be breast cancer cells—an argument that holds little sway in the field. “You just shrug your shoulders and say, ‘That seems very unlikely to me,’ but that’s what people want to believe,” Ross told me. Many scientists still don’t realize that this is a melanoma cell line, and they continue to publish “breast cancer” studies based on this skin cancer cell. There are now more than 1,000 papers in scientific journals featuring MDA-MB-435—most of them published since Ross’s 2000 report. It’s impossible to know how much this sloppy use of the wrong cells has set back research into breast cancer.
Christopher Korch was fascinated by this story and spent many weeks, along with Capes-Davis, figuring out exactly what happened. Korch, now retired from academia, spends his energy, like Capes-Davis and Roland Nardone, trying to untangle decades of bungled science surrounding cell cultures. Among other investigative work, he has been trying to figure out whether there were any original, unadulterated breast cancer cells correctly labeled as MDA-MB-435. In the course of that detective work, Korch found a 1979 student dissertation that alludes to a collaboration between Relda Cailleau in Houston and Donald Morton in Los Angeles. Morton had isolated M-14—the melanoma cell line—a year before the thirty-one-year-old patient donated fluid containing her breast cancer cells. Korch suspects that the Houston cell lines got contaminated when Cailleau visited Morton’s lab.
Korch told me that he spends hours every day poring over these old histories—not just for MDA-MB-435 but for many other lines. It’s partly an intriguing detective story, but Korch also wanted to measure the magnitude of the problem. He started with the list of known contaminated cell lines. For example, Intestine-407, which is actually HeLa, has been used in at least 1,300 published experiments. HEp-2, also HeLa, is used in 5,700 papers. All told, he figured perhaps 12,000 papers are based on bogus cell lines. But that’s not the end of it. He estimates that, on average, each of those papers was cited in other papers thirty times. “When you start doing the multiplication, we’re talking about billions of dollars that have been spent using a cell line inappropriately.”
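Korch’s multiplication is easy to reproduce as a back-of-the-envelope sketch. The per-paper cost below is an assumption for illustration (the earlier HEp-2 and Int-407 estimate of more than $700 million spread across some 7,000 papers works out to roughly $100,000 apiece); the paper and citation counts are his rough figures.

```python
# Back-of-envelope version of Korch's reckoning. The per-paper cost is an
# assumption for illustration, roughly implied by the $700 million / 7,000-paper
# estimate quoted earlier in the chapter.

papers_on_bogus_lines = 12_000   # Korch's rough count of affected papers
citations_per_paper = 30         # his average number of citing papers
cost_per_paper = 100_000         # assumed average cost per paper, in dollars

direct_cost = papers_on_bogus_lines * cost_per_paper
downstream_papers = papers_on_bogus_lines * citations_per_paper

print(f"Direct spending on affected papers: ~${direct_cost:,}")          # ~$1,200,000,000
print(f"Papers citing (and possibly building on) them: ~{downstream_papers:,}")  # ~360,000
```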
Now, that’s not completely wasted effort. Michael Gottesman at the National Cancer Institute had acquired KB cells in the 1980s from the national cell bank (the American Type Culture Collection) and used them for some of his studies to figure out why tumors develop resistance to anticancer drugs. When KB was unmasked as HeLa, Gottesman wasn’t exactly thrilled, but he says in his case it actually didn’t matter. “We were not particularly interested in the origin of the tumor,” he told me. “We just wanted a cancer cell line. It had the properties we wanted.” They grew fast and were relatively sensitive to anticancer drugs. And Gottesman was able to extract a gene from them to move his studies forward. “Even though there have been problems [with misidentified cell lines] they don’t always torpedo the research,” he said.
Korch agreed that research based on contaminated cells isn’t a total loss. “But how do you sort the wheat from the chaff to find what is usable?” It’s no simple matter even to identify tainted studies. For example, a search for “KB” in medical databases will inundate you with papers that use “KB” as the abbreviation for “kilobase,” a word used all the time in genomics studies. “I don’t see that the literature is going to be cleared up, ever,” Korch said. “It’s a gargantuan task.” After talking for more than two hours about these issues, I asked him if dealing with these problems is a passion or an obsession for him. He paused and stroked his white beard. “Where’s the line?” he replied with a smile. “I suppose I’m more of an obsessive person.”
That obsession ultimately led him to the papers describing Nina Desai’s work at the Cleveland Clinic. Korch couldn’t at first figure out what the cell was, because Desai never called it by name in her published papers. But then he came across an abstract from a scientific meeting in which Desai and colleagues from Emory University referred to a human endometrial cell line by name: EM-42.
Korch eventually tracked down a sample of EM-42 cells and confirmed his worst fears. They weren’t healthy human cells after all—they were HeLa. He couldn’t tell for sure if EM-42 was exactly the same cell line as the one that Desai used in her fertility clinic. Perhaps Desai used another cell line or contamination with HeLa occurred later. Scientists usually resolve these questions by sharing ingredients and talking to one another. But Korch said Desai had not replied to his e-mails and phone calls to resolve the story.
If EM-42 was used and the fertility work had been conducted flawlessly, the membrane would keep the HeLa cells away from the developing embryos. But labs do make mistakes. And the membrane wouldn’t stop snippets of DNA, or indeed viruses, from moving from the cells to the human embryo. Did the scientists at the fertility clinic know they may have been using cancer cells? And what about the parents? Korch asks. Of all the examples of potentially misidentified cell lines, he said, “that is the scariest I’ve seen.”
The use of cancer cells in IVF is also far outside the comfort zone of Andrew La Barbera, speaking as chief scientific officer of the American Society for Reproductive Medicine. He had worked in an IVF lab earlier in his career and would never have considered growing human embryos on a cancer cell line. “We would have considered that to be fraught with hazard.” Regardless of their precise identity, the simple fact that these cells were proliferating endlessly suggested something was wrong with them. “I don’t know how you would prove that any cell line would be harmless.”
Fertility clinics have waded into uncertain waters before. Some doctors tried to help human embryos grow by mixing in cells from monkeys or cows. In 2002, the Food and Drug Administration officially frowned on those procedures, given the potential hazard of a virus infecting the embryo. Desai’s 2008 paper reported that the babies were all healthy. As of this writing, the children would be about ten years old. La Barbera said their mothers should be informed of a potential laboratory mix-up, just in case. But if the Cleveland Clinic and Desai believe that as well, they aren’t saying. Desai would not discuss this story with me, and the Cleveland Clinic would not comment or talk about its processes for protecting the patients in these experiments.
Contaminated cell lines skew many different lines of research—including studies of the brain cancer glioblastoma. More than seven hundred studies report using a cell line called U-373, originally isolated as a glioma cell. For example, a study in Belgium used U-373 cells to argue that an experimental drug called ISO-1 could be worth testing as a treatment for brain cancer. Unfortunately, many of those studies may have been wasted effort. Cell banks like the American Type Culture Collection—which are supposed to be authoritative sources of reliable cell lines—made U-373 widely available for research. But in 1999, scientists discovered that people ordering U-373 were actually getting another cell line, U-251. At first blush this didn’t seem like that big a deal. U-251 also happened to be a glioma cell, so at least scientists using U-373 were still studying the right disease. But in 2014, scientists in Norway took a closer look and were very unhappy about what they saw. It turns out the U-373 cells were not merely mislabeled; they were a strain of U-251 that had been circulating for many years and had accumulated so many mutations and other genetic alterations that they barely resembled glioblastoma at all. There was no telling whether these cells had any relevance at all to human cancer.
Cell banks eventually tracked down an early sample of correctly labeled U-373 cells, and after 2010 they once again made those cells available. But many researchers don’t bother to buy fresh cell supplies from a cell bank when they’re starting a new round of experiments; they may pull out an old supply from a freezer or borrow them from a colleague down the hall. The result? Scientists continue to publish studies all the time involving “U-373” cells, “in which it is not obvious if the authors have used the cross-contaminated U-251 or the correct one,” Anja Torsvik at the University of Bergen in Norway and her colleagues noted in a 2014 article.
More than 1,700 papers have been published using what is arguably the most commonly used glioblastoma cell line, U87. And it turns out that it is troubled as well. Biologists in Uppsala, Sweden, isolated it from a forty-four-year-old woman with brain cancer nearly fifty years ago and managed to grow it as a perpetual cell line in their lab. In 2016, scientists from Sweden decided to compare the original cell line from their freezer with the U87 cells that have been sold by ATCC and used widely around the world. It wasn’t a match. Somewhere along the line, an imposter took over. In fact, the widely used U87 has a Y chromosome, so it appears to have come from a man. Fortunately the imposter is also a brain cancer. But the episode shows that even cell lines that have been validated by cell banks can still be misidentified.
It’s not clear how much value comes from research that relies on cell lines in the first place. Much as scientists appreciate the convenience of studying a disease in a petri dish, the results are often hard to apply to human illness. The very act of propagating cells in the laboratory changes them profoundly. In the first place, the process typically selects for cells that thrive while clinging to a plastic dish in a single layer. The cells are exposed to normal atmospheric oxygen levels, which are about four times higher than cells encounter in a tumor. “A lot of the regulatory factors that affect the growth of tumors is oxygen regulated so it’s a huge difference,” said Michael Gottesman at the NIH. These cells grow far more rapidly than they would in a tumor. In fact, cell lines derived from all sorts of cancers end up looking much more like one another than they do the original tumors from which they came. So the differences between the cell in a tumor and its progeny in a plastic dish can be quite dramatic. “Some people say that HeLa is a new species,” Gottesman told me. “It has lots of human components, but the cell line is so evolved. The chromosomes are all rearranged. It survives in tissue culture. It grows well. So it has made all these changes to adapt” to the environment where it now makes its home.
Gottesman said there is still useful information to be gleaned from cancer cells growing in a plastic dish, but it turns out these experiments typically don’t have much direct relevance to treating human tumors. The NIH established the NCI-60 in the late 1980s with high hopes that drugs showing promise when tested against these cells would also work for tumors in people. “I think it’s been a great disappointment,” Gottesman said. “It basically didn’t pan out.” He and some colleagues looked back on the many years of experiments, searching for a drug that resulted from that massive effort. They found just one: Velcade, a treatment for multiple myeloma (a cancer of immune system cells). A few months after we spoke, the cancer institute terminated the NCI-60 program. It’s launching a new technology it hopes will be better.
It’s easy to avoid the ubiquitous problem of misidentified cell lines. Scientists should simply ship a sample of their cells off to a commercial testing lab before they start their experiments to make sure the cells are what they expect. They should also authenticate their cells the same way after the experiment is done. Scientific funding agencies and journal editors are gradually pressuring scientists to do just that, but as Nardone discovered a decade ago, authorities are reluctant to insist. For one thing, scientists are independent operators and don’t like being told what to do. For another thing, the tests are not free, and even a couple hundred dollars can seem like a lot to a lab struggling to make ends meet. That penny-wise-but-pound-foolish attitude is unfortunately part of the culture of academic science, and as long as the consequences for a scientist’s career are minor, there’s not a great deal of incentive to change.
Many for-profit researchers see things quite differently. They simply can’t afford to be wrong about their cell lines. With that in mind, shortly after scientist Richard Neve arrived at Genentech, a biotech giant on the muddy shore of San Francisco Bay, he found himself immersed in a project to make sure the company was using only clean cell lines. He showed me around the sun-drenched campus in early 2015 (before he decamped to a different company) and led me to three gleaming tanks filled with supercold liquid nitrogen. These constitute the Genentech cell bank. Inside are nearly 100,000 plastic vials, each containing a tenth of a teaspoon of frozen cells. Neve said about 1,800 separate cell lines are stored here. Each morning, technicians fill requests from the company’s scientists by fishing out individual vials, scanning their barcodes to make sure that they have the right one, and then sending them along to the research scientists (Neve included) at the company. Anybody at Genentech who wants to use cells has to start here—borrowing from the neighbor down the hall, common in academia, is verboten. This ensures that the cell line is correct. The company’s cell bankers also test regularly for a bacterial contaminant, mycoplasma, which is a major headache. It shows up regularly in academic labs and completely throws off the results of experiments.
“We’re trying to avoid any sharing of cell lines because you just don’t know what you’re getting,” explained Neve, a Brit with a supershort crew cut and the exacting manner of a neat freak. Genentech routinely sends cells out to be authenticated with the standard commercial test. Neve and his team also developed an in-house testing system using a different technology, called SNP analysis, which costs just $6 per sample in ingredients (of course they had to invest in a laboratory instrument as well). Neve takes this issue very seriously. In fact, he is part of the loose-knit Capes-Davis group. He has published data in scientific journals to help scientists identify bad cell lines. The company’s cell-handling operation is highly automated—with robots to move materials around and scanners to keep track of everything. Unlike academics, who are rewarded for publishing intriguing results, companies only benefit if the research ends up delivering a profitable product. Errors cost time, and time is money. And their systems do catch potentially serious problems. “If we [Genentech] are making mistakes,” Neve told me, “God knows what it’s like out there” in the world of academia.
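The heart of that discipline is a simple check at the moment a vial leaves the bank: the barcode must resolve to the cell line the scientist actually requested, and the vial’s quality-control record must be current. Below is a minimal sketch of that kind of gatekeeping; the vial IDs, registry, and function are hypothetical stand-ins, not Genentech’s actual system.

```python
# A minimal sketch of a barcode-verified cell bank checkout. Everything here
# (IDs, cell lines, fields) is invented for illustration.

CELL_BANK_REGISTRY = {
    "VIAL-000117": {"line": "HeLa",  "passage": 12, "mycoplasma_tested": True},
    "VIAL-004203": {"line": "MCF-7", "passage": 8,  "mycoplasma_tested": True},
    "VIAL-009911": {"line": "U-251", "passage": 20, "mycoplasma_tested": False},
}

def fill_request(requested_line: str, scanned_barcode: str) -> dict:
    """Release a vial only if its barcode matches the requested line and QC is current."""
    record = CELL_BANK_REGISTRY.get(scanned_barcode)
    if record is None:
        raise ValueError(f"Unknown barcode {scanned_barcode}: vial not in the bank")
    if record["line"] != requested_line:
        raise ValueError(
            f"Mismatch: barcode {scanned_barcode} is {record['line']}, "
            f"not the requested {requested_line}"
        )
    if not record["mycoplasma_tested"]:
        raise ValueError(f"Vial {scanned_barcode} has no current mycoplasma test")
    return record

# A scientist asks for MCF-7; the technician scans the vial before sending it out.
vial = fill_request("MCF-7", "VIAL-004203")
print(f"Released {vial['line']} at passage {vial['passage']}")
```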
Tests to authenticate cell lines aren’t a panacea. For example, they can’t identify whether a given cell is from a sample of liver, brain, or gut. That’s an increasing concern because scientists are isolating new cell types all the time and using them in place of the more traditional immortalized cancer cells. In addition, there are no routine tests for identifying cells that come from laboratory animals, and those too are being used with increasing frequency. So even if scientists adopt the current authentication technologies, they will need to be developing new tests to keep up with the trends in biomedical research.
And as if cell line problems aren’t bad enough, there’s an even bigger problem involving another ubiquitous laboratory tool: antibodies. Monoclonal antibodies are custom-built molecules designed to identify and glom onto specific materials inside cells. When they work, they’re great: just as your natural antibodies can target a single molecule on the surface of a specific germ, laboratory-produced antibodies are supposed to work like guided missiles, to home in on a specific substance. They often carry a fluorescent marker so biologists can easily flag the material they’re looking for. Antibodies are quite reliable in some circumstances—for example, in early pregnancy tests, where they tag a hormone that’s produced during pregnancy. But far too often, the 500,000 antibodies marketed for research by a multi-billion-dollar industry don’t work as advertised. Many are produced by injecting the targeted substance into a rabbit and collecting the antibodies that result—a technique highly prone to providing misleading results. Biomedical researchers have been slow to grasp just how big a problem this is.
Stan Artman’s story shows what’s at stake. In the fall of 2011, the fifty-two-year-old Atlanta resident looked down at his leg and saw a black spot. “I had been golfing that day, and I was in the woods chasing balls. I thought maybe I’d picked up a tick.” He tried a couple of tick-removal tricks but the black spot remained. His wife, as it happens, is a dermatology nurse, and she sent him off to a doctor for a closer look. As Artman tells the story, the dermatologist at Emory University wasn’t sure what the spot was. Artman decided to get it removed just in case. Pathologists still couldn’t figure out whether it was worrisome, so they sent a sample off to the University of California, Los Angeles, for further analysis. The word came back: it was probably melanoma, a potentially lethal form of skin cancer.
Melanoma poses a real conundrum for patients, particularly less advanced cases like Artman’s. Surgery can be curative, but patients can also improve their odds a bit if they sign up for a year of unpleasant and sometimes debilitating interferon treatment. For Artman’s stage of disease, it only benefits about one in every thirty patients who take the plunge, and there’s no way of knowing in advance who that one will be. Instead, Artman’s doctors recommended that he try an experimental antimelanoma vaccine and have more extensive surgery to remove lymph nodes where the cancer could conceivably be hiding. Watchful waiting was an option, but he said doctors told him, “If you’re the type of person who a year from now developed a mass in your lung, you might feel you’ve missed your golden moment to get everything out.”
He opted for surgery. A few days later, he noticed with alarm that the surgery site had turned bright red. It was “a heartbreaking moment, of ‘now what,’” he said. “What the hell is this?” It turned out to be an infection that sent him back to the hospital for nine days of intensive antibiotic treatment. So even the seemingly conservative step of surgery carried risks, while the rewards were very uncertain.
“There’s this big gray area when it comes to melanoma,” Artman said. A blood test to guide some of these difficult decisions could have given him some idea of whether he would remain cancer free. “It would take some of the question out of it,” he said. “It wouldn’t take the worry out if I came back positive, but I’d feel better knowing something was definitely one way or the other.”
David Rimm at Yale University was apparently well on the way to developing a test like that, and he might have succeeded had he not had a very unfortunate encounter with bad antibodies. A renowned pathologist, Rimm realized he might be able to develop a test using commercially available antibodies to identify melanoma patients who would benefit from additional, aggressive treatment. So he collected tissue samples from about two hundred patients, some with metastatic melanoma and some with less aggressive disease. He then tested out about eighty different antibodies purchased from various companies to see if an antibody combination would identify patients more likely to benefit from interferon, more surgery, or other potentially risky treatments. The antibodies were all directed at known features of melanoma cancer cells. No single antibody provided a strong signal, but Rimm found that if five specific antibodies all found their targets and lit up, that pattern appeared to be a strong predictor of patients who would most benefit from aggressive treatment. “So we had ourselves a test. We were psyched.”
Rimm wanted to make sure he got the same results with a second group of patients, so he applied for two grants from the NIH to continue the work. Reviewers were ecstatic about the proposal, giving it a top score. Rimm got two grants for $1 million each to pursue the work. He ordered fresh antibodies from the suppliers to start his confirmatory experiments. That’s when the project started to unravel. When he ran the same tests on a different sample, three of the five antibodies still lit up as expected. But the other two did not. And the three weren’t reliable enough on their own as a test for cancer patients. What went wrong? “To this day we don’t know what it was,” Rimm said. His best guess was that the new batch of antibodies he ordered wasn’t exactly the same as the initial batch. The reason could be as simple as that the first batch of antibodies had come from one particular rabbit and the second from another. Whatever the cause, it was a fatal error for his would-be test.
After many years of increasingly excited effort and millions of dollars in research funding, the whole thing fell apart. It still seemed like a great idea, but “we would’ve had to start from scratch to reinvent our test,” Rimm said. He was demoralized and defeated. “I didn’t see how we could possibly fund this work again.” He could just imagine what the review committee would say if he submitted another funding request: You tried. You failed. It’s over. So he stopped working on melanoma altogether.
Rimm said he had not realized how unreliable antibodies could be. He had assumed that they were as trustworthy as anything else he bought from a lab supplier and that he could simply believe what he read on the label. But, especially when produced in live animals, antibodies are anything but dependable. While in theory they bind to a unique site, they may also glom onto several different proteins, not just the one that scientists expect. And it’s not always a simple matter to identify these “off-target” effects. As a result of this painful experience, Rimm has become an evangelist for cleaning up the mess with antibodies. And it’s a big mess. Glenn Begley said faulty antibodies were apparently responsible for a lot of the results he was unable to reproduce at Amgen.
Rimm’s experience is just one example of the trouble with antibody tests. Another mix-up cast doubt on the very existence of a hormone that helps burn fat when people exercise. “Irisin,” discovered in 2012 by Bruce Spiegelman and his colleagues at Harvard, appeared to turn standard body fat into “brown fat,” which actively burns calories and may play a role in weight loss. Of course, anything to do with a potential fat-burning pill is immediately interesting. Scientific supply companies sprang into action and started selling antibodies that they said were specific for identifying irisin. Soon dozens of scientific researchers were using those antibody tests to see whether exercise, diet, or even Turkish baths would affect irisin levels in people.
Harold Erickson, a biochemist at Duke University, became skeptical about irisin. Though it was not his field of study, he was drawn to what he saw as inconsistencies in the story. He teamed up with a scientist in Germany who had tried—and failed—to find irisin in horses that had just finished heavy bouts of exercise, which is exactly when biologists would expect to find the hormone circulating in the blood. Erickson ordered some of the antibody kits that target irisin and concluded that they might not do so at all. He published a paper that went so far as to call irisin a “myth.” (Irisin is named for a Greek goddess, so wordplay was irresistible.) “That was pretty aggressive” as a choice of words, Erickson admitted to me. He also encouraged his university news office to put out a fairly aggressive press release, touting his assertion that irisin may in fact not exist.
That infuriated Bruce Spiegelman at Harvard. After the public challenge, he and his colleagues went back to run a new set of experiments that were much more sensitive than the off-the-shelf antibody tests. They published a follow-up paper announcing that they did indeed see irisin circulating in human blood. The levels were just very low. That’s not unusual for hormones, which can be active in small concentrations. But those low levels can skew antibody tests.
Erickson argued that irisin’s apparent blood levels are so low that commercial antibody kits shouldn’t be able to detect it. (Indeed, at least one company withdrew its product from the market during this flap.) I asked Spiegelman whether he thought those kits were valid. “I have no idea,” he said. “That’s somebody else’s problem. We’ve never used them.”
Though he still harbored doubts, Erickson backed away from his claim that irisin doesn’t exist. But the entire episode has left the field in a bit of turmoil. Only a few researchers in the United States have received federal grants to fund work on it, even though, if it pans out, it could have big consequences for obesity, diabetes, and other major diseases. Researchers overseas are still publishing on it and still using the disputed antibody test kits. Drug companies could also be working quietly on irisin, with an eye toward a weight-loss pill. “They wouldn’t necessarily tell me,” Spiegelman says. Even though they would ultimately have to license Spiegelman’s patent for irisin, he says they wouldn’t bother to ask until they were ready to try the drug in people.
Spiegelman was quite confident of his results, but it will take some time for the dust to settle, as scientists argue the facts of this provocative finding. He said that’s simply a consequence of working on the cutting edge of science. “If you don’t want to deal with these things, work on something that’s been done 100 times.”
Often scientists either don’t realize that they’ve run into problems with their antibodies or simply heave a sigh and move on to a different project. But David Rimm decided to call attention to bad antibodies by publishing a paper flagging this as a major problem. The paper has been downloaded more than 35,000 times, “so there’s hope that people actually want to do something about this,” Rimm told me. He’s been working on a solution. He led me upstairs to one of the labs he oversees. Here, scientists were at work constructing microscope slides, called index arrays, dotted with tissue samples smaller than the period at the end of this sentence. One slide had nearly one hundred dots. The idea is that each slide can be a miniature laboratory to validate antibodies. Put a drop of antibody on each slide, and some of the dots should light up; others should not. Dots with a small amount of the target material should light up a little; dots with a lot of it should light up brightly. Rimm doesn’t trust his own eyes to tell the difference (that could lead to observer bias), but there are instruments that can measure all this precisely. “When you use people [to eyeball a sample], you’re just destined to fail,” Rimm said. “I like people. I am one. I like pathologists. I am one. But it’s the wrong tool for the job.”
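The logic those instruments apply is straightforward: measured signal should track the known amount of target in each dot, and dots containing none of it should stay near background. Here is a minimal sketch of that scoring, with invented intensity readings and an arbitrary background cutoff; it illustrates the idea rather than Rimm’s actual analysis.

```python
# A minimal sketch of the index-array idea: score an antibody by checking that
# measured signal rises with the known amount of target in each tissue dot.
# The dots, intensity values, and pass criteria are invented for illustration.

# Each dot on the slide: (known target level, instrument-measured intensity)
dots = [
    ("none", 3.1), ("none", 2.8), ("none", 4.0),      # should stay dark
    ("low", 11.5), ("low", 9.7), ("low", 13.2),       # should light up a little
    ("high", 41.0), ("high", 38.6), ("high", 45.3),   # should light up brightly
]

def mean_intensity(level):
    values = [signal for target, signal in dots if target == level]
    return sum(values) / len(values)

none_avg, low_avg, high_avg = (mean_intensity(l) for l in ("none", "low", "high"))

# Pass only if signal is ordered the way the known target amounts predict,
# and the no-target dots sit close to background.
ordered = none_avg < low_avg < high_avg
low_background = none_avg < 0.2 * high_avg   # background cutoff assumed for illustration

print(f"none={none_avg:.1f}  low={low_avg:.1f}  high={high_avg:.1f}")
print("Antibody validated" if ordered and low_background else "Antibody flagged")
```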
He helped develop and commercialize some of those precision instruments that are now sold to research labs. Rimm is also trying to get antibody companies to construct and sell index array test slides as well. That would go a long way toward solving the antibody problem, but at a high cost: his lab spent more than two months and $10,000 to produce a slide that would check the performance of a single antibody. To make a slide like that for every one of the 500,000 antibodies out there would run into the billions of dollars.
In 2014, Rimm raised all the problems with antibodies at a scientific meeting at the St. Johns Laboratory in London. After he spoke, John Mountzouris, from a big antibody company called Abgent, said that his company had recently run a very basic test of all 18,000 antibodies in its catalog. “As a result several thousand antibodies were discontinued immediately,” Mountzouris told the gathering. It was a moment of reckoning for his firm. “The revenue did decline from antibody sales. We hope that’s short term,” he said. But he assured higher-ups in the company that their scientist-customers would eventually realize that they’re better off buying validated antibodies than going to any of the dozens of other companies that sell products with similar labels but less rigorous verification (often at a lower price).
Not only did the company whittle down its existing catalog; it dramatically slowed the rate of introduction for new antibodies, from 4,000 or 5,000 to 1,000 per year, Mountzouris told the meeting attendees. “That will allow us to produce antibodies that we will be proud to put out there for people to buy.”
The British company Abcam is by far the largest in the business (though because there are so many players, it accounts for only 20 percent or so of the market). It, too, has been amping up its systems to weed out untrustworthy antibodies and to sell based on quality rather than price. Abcam officials told me the company has taken basic steps to validate almost all of the antibodies it sells. In 2015 Abcam also started screening antibodies by removing the purported target of the antibody from cells, using a gene-editing technology called CRISPR. If those antibodies still light up when exposed to cells that don’t contain their intended target, they are obviously not working as expected. The company plans to validate five hundred antibodies a year this way. The first several hundred tests showed that 60 to 70 percent of the antibodies passed this stringent test, Alejandra Solache, Abcam’s head of reagents, product development, and manufacturing, told me. That means, of course, that the remaining 30 to 40 percent of these products—many of them popular antibodies used widely in research—weren’t up to snuff. “When something doesn’t pass we will remove it from the catalog and will notify the customers that basically this particular antibody is not specific for this protein,” Solache said. The company sells well over 100,000 products, mostly antibodies, so this expensive and time-consuming validation effort will only apply to a tiny percentage of its products for years to come.
Antibody companies like Abcam that are willing to invest in improving the quality of their products can help put a dent in the reproducibility problems brought on by poor antibodies, but Bill Campbell, Abcam’s general manager for the Americas, said commercial enterprises can’t solve the whole problem. When he worked in a lab, he said, he always ran control experiments to assure himself that he wasn’t falling victim to imperfect materials. “Scientists also need to make sure they do the right controls and not try and advance things too quickly by taking shortcuts,” he said.
It would help if scientists working with antibodies could refer to standard laboratory procedures to help them avoid some of the common pitfalls. David Rimm is pushing for that now. Unfortunately, standards for antibodies aren’t nearly as straightforward as standards for authenticating cell lines. Experiments involving antibodies are more varied, so one size does not fit all. And as the story of cell lines makes clear, simply having a standard isn’t enough. Most scientists must be coerced by funding agencies or their employers to run these tests.
Assuming scientists do step up to take on the problems of authenticating cell lines and antibodies, that could address perhaps a quarter of the problems underlying rigor and reproducibility issues. That’s a big chunk, to be sure. Improving experimental design would further reduce these unforced errors in science. But carefully designed and executed experiments are still worthless unless their results are analyzed with care. That’s the next crucial link in the chain of scientific rigor.