________________________________________
That night, after returning to my little room at Cold Spring Harbor Laboratory, I lay down on the bed and stared at the ceiling. So far, I had had a nice—one might even say somewhat distinguished—career. I had a permanent research position with solid funding, was doing interesting projects, and got invited several times a year to give talks around the world. Now I had really stuck my neck out, publicly promising to sequence the Neanderthal genome. If we succeeded, it would clearly be my biggest achievement to date; but if we failed, it would be a very public embarrassment, almost surely a career-ending one. And I knew that succeeding would not be as easy as I had made it sound in my talk. We needed three things to succeed: many 454 sequencing machines, lots more money, and good Neanderthal bones. We had none of them, but fortunately no one else seemed to realize this. I knew it only too well, however, and I lay in bed a long time, with all the things we needed to make the project possible running through my head.
The first priority was access to lots of sequencing machines from 454 Life Sciences. An obvious move would be to visit Jonathan Rothberg in Branford, Connecticut, which was not far from Cold Spring Harbor. Next morning at breakfast, I collected the key people involved in our Neanderthal work, all of whom were at the meeting: Ed Green, Adrian Briggs, and Johannes Krause. After breakfast, we jumped into my rental car and took off for Branford. I have a deplorable tendency to pack too many commitments into too little time and as a consequence am chronically late for appointments, flights, and other scheduled activities. This outing was no exception. As we drove toward Port Jefferson on northern Long Island, we realized we were probably going to miss the ferry across the Long Island Sound to Bridgeport. As it happened, we were the last car to squeeze aboard (in fact, the rear of the car jutted out over the water as we steamed across). I hoped this close call was a good omen.
This was the first of what would be several visits to 454 Life Sciences. Jonathan Rothberg was just as intense and full of maverick ideas in person as he had been on the phone. For balance, there was Michael Egholm, the practical-minded Dane concerned with reality checks and getting things done. As the project progressed, I came to appreciate both men immensely; between Jonathan’s vision and drive and Michael’s down-to-earth practicality, they made a terrific pair. Our discussion that day was dominated by what it would take to sequence the Neanderthal genome. It was clear that we would apply the “shot-gun” technique that Craig Venter had introduced and used in his bid to sequence the human genome at his company Celera. This approach involved sequencing random fragments and then putting them together computationally by looking for overlaps between fragments. One major complication involves repetitive DNA sequences in the genome; such sequences make up about half of the genomes of humans and apes. Most of these repetitive sequences are a few hundred or even thousands of nucleotides long, and many occur not only a few but many thousands of times across the genome. Therefore, shot-gun approaches typically use not only short DNA fragments but also longer fragments so that one can “bridge” such repeat sequences with fragments that “anchor” them in single-copy sequences on each side of the repeat. This makes it possible to know where each repeat element sits in the genome. But our ancient DNA was already broken down into short pieces. Therefore, we planned to use the human reference genome (the first human genome, sequenced by the public genome project) as a template for reconstructing the Neanderthal sequences. But while this should work for DNA sequences that occurred a single time in the genome, we could not hope to determine the sequences of all the repetitive parts. To me, it seemed a small sacrifice: the single-copy sequences tend to be the most interesting parts of a genome as they contain the most genes with well-known functions.
We also needed to decide how much of the genome to sequence. Before visiting 454 Life Sciences, I had decided to sequence about 3 billion nucleotides from our Neanderthal bones. This goal was dictated mostly by what I thought was possible, and also because it was approximately the size of the human genome. The fragmented nature of the ancient DNA meant that we would get sequences of many bits of the genome just once; other bits twice, from two independent fragments; others three times; and so on. It also meant that there were many parts of the genome we would not see at all, simply because no DNA fragment we sequenced would happen to include them. Statistically, we could expect to get two-thirds of the entire genome at least once and so fail to see about one-third. In genome-speak, this is known as 1-fold coverage, since statistically each nucleotide has the chance to be seen once. I felt that 1-fold coverage was a feasible goal, and one that would provide a good overview of the Neanderthal genome. Importantly, the resulting genome would be a stepping stone of sorts. Future sequences, derived from other Neanderthals, could be put together with ours to arrive at higher “coverage” until eventually all of the genome, at least the parts that were not repetitive, had been seen.
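The "two-thirds seen, one-third missed" figure can be checked with a short calculation. The sketch below is only an illustration, not the calculation we actually ran: it assumes that fragments land uniformly at random across the genome, so that the number of times a given nucleotide is seen follows a Poisson distribution whose mean equals the coverage. The function names are invented for this example.

```python
import math


def fraction_seen_at_least_once(coverage: float) -> float:
    """Expected fraction of positions hit by one or more fragments.

    Assumes a Poisson model: the chance a position is never seen is exp(-coverage).
    """
    return 1.0 - math.exp(-coverage)


def fraction_seen_exactly(k: int, coverage: float) -> float:
    """Expected fraction of positions seen exactly k times (Poisson probability)."""
    return math.exp(-coverage) * coverage ** k / math.factorial(k)


if __name__ == "__main__":
    for coverage in (1, 2, 5, 20):
        seen = fraction_seen_at_least_once(coverage)
        print(f"{coverage:>2}-fold coverage: {seen:.1%} of positions seen at least once")
    # At 1-fold coverage, how the genome splits into unseen, once, twice, three times
    for k in range(4):
        print(f"seen exactly {k} time(s): {fraction_seen_exactly(k, 1.0):.1%}")
```

Under this model, 1-fold coverage leaves a little over a third of the positions unseen, which is where the rough two-thirds figure comes from. Real ancient DNA does not fall perfectly at random, so the numbers are a guide rather than a prediction.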
The goal I had set for us was thus somewhat arbitrary. Compared with sequencing efforts expended on present-day genomes, it was also rather humble, as those other projects aimed for 20-fold coverage or more. However, the task was still monumental. Our very best extracts contained just 4 percent Neanderthal DNA. I counted on finding more such bones and hoped that some would contain even a bit more Neanderthal DNA. Assuming the average stayed at 4 percent, we would have to generate some 75 billion nucleotides in all to get our 3 billion Neanderthal nucleotides. And since our fragments were short, 40 to 60 nucleotides on average, this added up to between 3,000 and 4,000 runs on the new sequencing machines. It was the equivalent of devoting the entire facility at 454 Life Sciences to the Neanderthal project for many months—and at normal customer prices, it would be impossible for us even to contemplate.
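The arithmetic behind these figures is simple enough to write out. The sketch below uses only the numbers given above; the per-run yield it prints is merely what those numbers imply, not a specification of the 454 machines, and the function names are invented for this example.

```python
TARGET_NEANDERTHAL_NT = 3e9   # Neanderthal nucleotides we wanted
ENDOGENOUS_FRACTION = 0.04    # share of Neanderthal DNA in our best extracts


def total_nucleotides_needed(target_nt: float, endogenous_fraction: float) -> float:
    """Total nucleotides to sequence so that the wanted share adds up to target_nt."""
    if not 0 < endogenous_fraction <= 1:
        raise ValueError("endogenous_fraction must be in (0, 1]")
    return target_nt / endogenous_fraction


def implied_yield_per_run(total_nt: float, runs: int) -> float:
    """Nucleotides each run must deliver for the total to fit in the given number of runs."""
    return total_nt / runs


if __name__ == "__main__":
    total = total_nucleotides_needed(TARGET_NEANDERTHAL_NT, ENDOGENOUS_FRACTION)
    print(f"total to sequence: about {total / 1e9:.0f} billion nucleotides")
    for runs in (3000, 4000):
        per_run = implied_yield_per_run(total, runs)
        print(f"{runs} runs -> about {per_run / 1e6:.1f} million nucleotides per run")
```

The same division shows how sensitive the plan was to bone quality: every halving of the Neanderthal share doubles the total that has to be sequenced.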
Ed, Adrian, Johannes, and I discussed all this with Jonathan and Michael. The project clearly had appeal not only to Jonathan but to 454 Life Sciences as a company, because of its potential both to provide truly unique insights into human evolution and also, more pragmatically, to give 454’s technology even higher visibility. I gladly agreed that the company people would be real scientific partners as well as co-authors on future publications with us, but that didn’t mean we could do sequencing for free. Finally, we arrived at a price: $5 million. I could not decide whether that was good or bad news. It was more money than I had hoped to pay, but not a completely outlandish sum. We said we would go back home and think about it.
After the negotiations were done, Jonathan offered the four of us take-out sandwiches and sodas and then asked if we wanted to see his house before we headed back to the meeting at Cold Spring Harbor. We agreed. After our late lunch, we followed him home. I had grown up in humble circumstances, and my mother, a refugee from the Soviet invasion of Estonia at the end of World War II, had transmitted a highly pragmatic outlook to me. As a result, I am not easily impressed by luxury. But the visit to Jonathan’s place turned out to be very memorable, even though we never got to see his house. Instead, we visited the grounds where he lived on a peninsula in Long Island Sound. On the beach, he had built an exact replica of Stonehenge—exact, that is, except that it was made of Norwegian granite and therefore heavier than the original, and it was slightly modified to account for how the sun would fall between the stones on the birthdays of his family members. As we walked among the huge monoliths, Jonathan turned to me and said, “Now you probably think I’m crazy.” I of course said no, but not only out of politeness. I really didn’t think Jonathan was crazy. He was deeply fascinated by ancient history, and more important, he had big ideas and was able to turn his dreams into reality. His Connecticut Stonehenge was, I thought, another good omen for our undertaking.
The next day, back at Cold Spring Harbor, I could not concentrate at all. Five million dollars was a lot of money, about ten times as much as a big research grant in Germany. The Max Planck Society provides generous funding to its institute directors so that they can concentrate on research and not on grant writing, but $5 million was still a much higher amount than the entire yearly budget for my department. I worried that we would need to turn this project over to some genome center, just because we didn’t have the money. Then I remembered Herbert Jäckle, the developmental biologist who had helped persuade me to move to Germany in 1989 when he was professor of genetics in Munich. He, too, had moved to a Max Planck institute—the Institute for Biophysical Chemistry in Göttingen—and had again played an important yet unofficial role in getting me to move, in 1997, from Munich to Leipzig to join in establishing the Institute for Evolutionary Anthropology. In fact, ever since I had come to Germany, when I had faced crucial turning points in my scientific life, Herbert had always been there with support and advice. Now he was vice president of the biomedical section of the Max Planck Society. Fortunately, the society is a research organization where scientists, such as Herbert, rather than administrators or politicians are in charge. That very afternoon I decided to call him from Cold Spring Harbor.
I don’t call Herbert often, so I think he realized that this was a matter of some import and urgency. When I got him on the line, I described our calculations on the feasibility of sequencing the Neanderthal genome and the cost, and I asked if he had any advice on how one might raise that much money in Europe. He said he would think about it and get back to me in a few days. I flew back to Leipzig the next day, torn between hope and despair. Perhaps we could find a rich benefactor, but how do you find one of those?
Two days after my return, true to his word, Herbert called me. The Max Planck Society, he said, had recently set up a Presidential Innovation Fund to support extraordinary research projects. He had discussed our project with the society’s president, and the society was ready—in principle—to support us with the requested funds, distributed over three years. They had even reserved the money, pending a written proposal, which would need to be reviewed by experts in the field. I was totally taken aback; I don’t recall even thanking him properly before hanging up. This money made all the difference in the world! I ran from my office into the lab and babbled the news to the first people I met. Then I immediately sat down and started drafting a proposal describing the results and calculations that had convinced us we could sequence the Neanderthal genome within three years, given sufficient resources.
At the proposal’s conclusion, I had to present a financial plan. When I started working it up, something extremely embarrassing dawned on me. I had called Herbert from the United States and said we needed “5 million” for the project, thinking in terms of US dollars. Herbert, in Europe, must have thought I meant 5 million euros. He may even have said that the Max Planck Society had reserved “5 million euros” for our project, but I had been too excited to register it. At the exchange rate at the time, that amounted to US$6 million. What to do? Perhaps I could quietly increase the budget to accommodate the additional 20 percent in funds—but that would be disingenuous and might even be noticed once we signed a contract with 454 Life Sciences. I called Herbert and, with considerable embarrassment, explained the situation. He laughed. Then he asked whether we might not have extra costs in Leipzig, above what we were to pay 454 Life Sciences for sequencing. Of course we would. We would have to extract DNA from many fossils to find good ones, and test them all by making sequencing runs from them ourselves. So we needed to buy our own sequencing machine from 454 for testing all the extracts, and we needed reagents to run it. With the difference the exchange rate made, we could really make the project fly. I was elated and wrote a plan that included the work that would be done at our institute in Leipzig.
Meanwhile, Eddy Rubin’s group in Berkeley had made a bacterial library of the entire Neanderthal extract we had sent them. Jim Noonan, Eddy’s postdoc, had sequenced every drop. What they had retrieved amounted to a bit over sixty-five thousand base pairs. In Branford, they had used about 7 percent of the extract we had sent them and produced about a million base pairs. So, just as Adrian had predicted, the direct-sequencing approach was about two hundred times more efficient in generating DNA sequences from an extract. Eddy insisted nonetheless that his method could be more efficient, and that we should continue to send extracts to him. This was a fundamental disagreement. I realized with some disquiet that I could no longer in good conscience send extracts to Berkeley, when we could generate so much more data from each extract in Branford. But I put the decision off, thinking that it would become obvious to Eddy that the bacterial cloning was inefficient once we wrote up a manuscript describing the results of the two different approaches.
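The "two hundred times" comparison follows from scaling each result up to a whole extract. The sketch below is a rough illustration using the rounded figures above; it assumes that the extracts sent to Berkeley and to Branford were comparable in size and DNA content, which the figures themselves do not establish. The function name is invented for this example.

```python
def yield_per_whole_extract(base_pairs: float, fraction_of_extract_used: float) -> float:
    """Scale an observed yield up to what a full extract would have produced."""
    if not 0 < fraction_of_extract_used <= 1:
        raise ValueError("fraction_of_extract_used must be in (0, 1]")
    return base_pairs / fraction_of_extract_used


if __name__ == "__main__":
    # Bacterial cloning in Berkeley: the whole extract gave a bit over 65,000 base pairs
    cloning = yield_per_whole_extract(65_000, 1.0)
    # Direct sequencing in Branford: about 7 percent of an extract gave about a million
    direct = yield_per_whole_extract(1_000_000, 0.07)
    print(f"cloning: about {cloning:,.0f} base pairs per extract")
    print(f"direct:  about {direct:,.0f} base pairs per extract")
    print(f"direct sequencing yields roughly {direct / cloning:.0f} times more")
```

With inputs this rounded, the result is best read as an order of magnitude: somewhere around two hundred.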
However, by this point it was impossible to conceive of a way to write just one paper, given the use of two completely different methods, the tremendous difference in the amounts of data generated, and the disagreement with Eddy about the viability of the bacterial-library approach. So we decided to write two papers. One was to be written by Eddy with us as co-authors, the other by us and Michael Egholm, Jonathan Rothberg, and the others at 454. Eddy’s paper stated: “The low coverage in library NE1 is more likely due to the quality of this particular library rather than being a general feature of ancient DNA,” suggesting that if one assembled more libraries, better results would be achieved. Given that the earlier cave-bear libraries had been just as inefficient, I disagreed with the assessment, but we stayed civil. Eddy submitted his paper to Science in June and it was accepted in August. Because we had much more data to analyze for the 454 paper, we couldn’t submit ours to Nature until July. Eddy graciously arranged with Science to delay publication of the cloning paper until the paper with 454 Life Sciences had been reviewed and accepted in Nature so that the two papers could appear in the same week.
While this was going on, we began to prepare for what we hoped would be the production of large amounts of Neanderthal sequences. The first thing I did was to arrange production of 454 sequencing libraries in our clean room in Leipzig so that the precious, contamination-prone DNA extracts would not have to leave our laboratory. I also used a chunk of the new money to order a 454 sequencing machine so that we could test the libraries. Then Michael Egholm and I worked out a plan. We would make DNA extracts from bones, produce 454 sequencing libraries in our clean room, and use our new 454 sequencing machine to test the libraries. When we identified promising libraries, we would send them to Branford for production sequencing. The sequencing would be done in stages, and we would pay in installments once a certain number of Neanderthal nucleotides had been sequenced. The latter was my suggestion, and I was amazed that 454 agreed to it, given that our earlier work together had shown that the best library so far had contained only 4 percent Neanderthal DNA and 96 percent assorted unwanted DNA of bacterial, fungal, and unknown origin. We did not yet know what percentage of Neanderthal DNA would be in the libraries we would produce. If it turned out to be 1 percent instead of 4 percent, then 454 would have to sequence four times as much to get its money, since the contract stipulated the number of Neanderthal nucleotides sequenced, not the total number of nucleotides (which would include all the bacterial ones). Neither the scientists at 454 nor their attorneys who looked at the contract before it was signed appeared to take any notice of this. In a sense, it didn’t matter, since there was a clause that allowed either party to get out of the collaboration at any time. We were obviously not going to be able to force 454 to sequence forever against their will. But it still seemed a much better contract than one that stipulated that the company would sequence a certain number of raw nucleotides for us, irrespective of whether these were microbial or Neanderthal in origin.
I felt very good about the collaboration with 454. We complemented each other’s strengths excellently, and the people at the company were fun and easy to talk to. However, one difference between us was that 454 was under great pressure to establish itself in an emerging market for high-throughput sequencing technologies that was clearly going to become very competitive. Already, two other big companies had announced their intention to start selling high-throughput sequencing machines. 454 therefore wanted positive publicity about their involvement in the Neanderthal project, and they wanted this publicity not in two or three years, when the Neanderthal genome would presumably be sequenced and published, but as soon as possible. Just as Michael Egholm took our concerns and priorities into account, I wanted to take their priorities seriously. So when the contract was signed with 454, we allowed them to arrange a press conference in our institute in Leipzig on July 20, 2006, shortly after we had submitted our joint paper to Nature. Michael and another senior executive from 454 flew in for the event. We also invited Ralf Schmitz, the curator of the Neanderthal type specimen who had given us samples from the Bonn museum in 1997. He brought along a copy of the Neanderthal bone from which we had determined the first Neanderthal mtDNA sequences. We wrote a press release that pointed out that we were putting together the methods for ancient DNA analysis that our group had developed over many years of painstaking work with 454 Life Sciences’ novel high-throughput sequencing technology to analyze the Neanderthal genome. We also mentioned that, by coincidence, we announced this almost exactly on the day 150 years after the first Neanderthal fossil was discovered in Neander Valley.
The press conference was an electrifying event. The room was full of journalists, and media from across the globe followed it via the Internet. We declared that we would determine about 3 billion Neanderthal nucleotides within two years. It was wonderful to contemplate that what I had started secretly in the lab in Uppsala more than twenty years earlier, afraid that my PhD supervisor would find out what I was doing, had developed into this. It was a heady time.
It was also a time of great scientific and emotional ups and downs. About a month after the press conference came a definitive down. The two papers led by Eddy Rubin’s and our group were not yet out, but we had already shared our 454 Neanderthal data with Jonathan Pritchard, a young and brilliant population geneticist at the University of Chicago who had helped Eddy analyze his smaller data set of cloned Neanderthal DNA fragments. We received an e-mail from two postdocs in Pritchard’s group, Graham Coop and Sridhar Kudaravalli. They were worried about patterns they saw in the 454 data: in particular, there were higher numbers of differences from the human reference genome in the shorter DNA fragments than in the longer DNA fragments. Ed Green in our group quickly confirmed that they were right. This was worrying. It could mean that some of the longer fragments were not from the Neanderthal genome but represented modern human contamination. I e-mailed Eddy, telling him that we saw some worrying patterns in the 454 test data. We agreed to send our data to Eddy’s group in exchange for their data. After the exchange of data, Jim Noonan in Eddy’s group quickly e-mailed back and said that he saw what we and the Chicago postdocs had already seen in the 454 data.
It seemed that we might have to rewrite or withdraw our Nature paper, which was already in press, and I e-mailed Eddy, saying that we would try to figure out what was going on as fast as we could in order not to hold up his paper. Back when I was a postdoc in Allan Wilson’s lab, we had once withdrawn a paper that Nature had already accepted because we had found that we had made a mistake in the analysis that changed the main conclusions we presented. I worried that we would have to do this again.
There was now frantic activity in our group. It was not unreasonable to assume that the patterns Jonathan’s group saw were due to some level of contamination, but it was not straightforward to come up with an estimate of how much contamination there might be. It would have been an error to simply assume contamination was the problem, however. We were acutely aware that we did not understand many aspects of how the short, damaged ancient DNA sequences behaved in comparison with the human reference genome. Perhaps factors other than contamination were at play? Unfortunately, we needed to act fast, as our paper was already in press and Eddy was eager to publish his.
Ed had noticed that the shorter Neanderthal fragments in our 454 data contained more G and C nucleotides than the long ones. G and C nucleotides tend to mutate more often than A and T nucleotides, so this could lead to more differences between present-day humans and Neanderthals in the short (and GC-rich) sequences than in the long (and AT-rich) sequences. To test this, Ed matched up short and long Neanderthal fragments to the corresponding sequences in the human reference genome and compared those sequences in the reference human genome to those from other present-day humans. Although those comparisons did not include any Neanderthal sequences at all, they nonetheless showed that the human sequences corresponding to the shorter Neanderthal sequences had more differences from other human sequences than the longer ones. This observation suggested that the GC-rich sequences simply mutated faster, so maybe it would account for the higher number of differences seen in the shorter sequences. Before we could be certain, however, other factors also needed to be considered, especially the way in which we mapped Neanderthal sequences to the human reference genome sequence. Ed noticed that longer fragments of Neanderthal DNA had a better chance of being matched in the correct position in the human genome than shorter fragments, simply because they contained more sequence information. Therefore, a higher percentage of the short fragments might actually be bacterial DNA fragments that just happened to be similar to some part of the human reference genome. This, then, also might contribute to the observation that the shorter fragments contained more differences from the human reference genome. Such a phenomenon might have been overlooked in other ancient data sets—for example, the mammoth data, where fragments were on average longer. But I felt very uneasy. It seemed that every day we uncovered new things about how short and long DNA fragments differed in terms of how they behaved in our analyses. Obviously, we did not understand everything that was going on. What’s more, we still hadn’t excluded the possibility that our samples were contaminated by modern human DNA.
We had, of course, considered the possibility of contamination from the outset. In the extracts we sent to Eddy and to 454, we had assayed the level of contamination based on mtDNA and found it to be low. We knew that contamination could have entered the extracts once they had left our laboratory; we had even put a caveat about this in our Nature manuscript. I felt strongly that the only solid assay for contamination we had was the one based on assessing the observed mtDNA fragments, since the mtDNA was the only part of the genome where we knew about differences between Neanderthals and modern humans. Everything else was influenced by imponderables, such as differences in GC content, differences in mismapped bacterial DNA fragments, and other unknown factors. So I argued that we should look again at the mitochondrial DNA in the sequences that had been determined by 454.
In 2004, we had sequenced a part of the mtDNA from the very same Neanderthal bone, Vi-80, from which we had prepared test extracts for 454 and Eddy’s group. I suggested that we should look among the sequences we had gotten from 454. Surely some of those must overlap nucleotide positions that differed between this particular Neanderthal individual and present-day humans. This would tell us which fragments were unambiguously of Neanderthal origin and which were of modern human origin and would enable us to estimate directly the level of contamination in the actual final 454 data set. Frustratingly, Ed found that we did not have enough data in hand to do this. The sequences done by 454 contained only forty-one mtDNA fragments and none of them came from the part of the mtDNA genome that we had determined earlier from this or other Neanderthals. We checked the Berkeley data, but they were so scant that not even a single mtDNA fragment had been observed.
Happily, there was a solution: we had so much library left that we could simply sequence more DNA fragments. This should then yield fragments that could tell us whether we had contamination in the library or not. I contacted 454 and convinced the people there to quickly do more sequencing. They did six more runs in record speed, and as soon as the data were transferred to our server, Ed found six fragments that overlapped positions in the variable part of the mtDNA we had sequenced in 2004.
All six fragments matched the Neanderthal mtDNA and differed from present-day human mtDNA! These were direct data suggesting that we had very little contamination in our sequences. Interestingly, these molecules, although clearly ancient, were not particularly short; four of them were 80 or more nucleotides long. This suggested that truly ancient DNA fragments were present also among the longer DNA fragments. Thus it was likely that the differences seen between short and long molecules were due to factors other than contamination. Ed was so elated that he ended the e-mail to the group describing these results with “I could kiss every one of you.”
We decided to go ahead with the Nature paper. Susan Ptak, a population geneticist in our group, sent a long technical e-mail to Eddy and Jim Noonan explaining why we felt that comparisons between long and short sequences were influenced by too many factors, both known and unknown, to represent strong evidence of contamination, and why we trusted the direct mtDNA evidence more. She wrote: “Although there is indirect evidence which suggests some level of contamination, we now have a direct measure of the contamination rate in the final data set, which still suggests it is low.” We received no reply to this e-mail. Given the rather tense relationship that had developed between our groups, we did not find this too surprising.
This was a tremendously stressful incident. Ironically, as it turned out, both Eddy and we were right. The future would show that the data generated at 454 did contain contamination, but also that the indirect ways of detecting contamination via comparisons of long and short fragments were largely inadequate.
The two papers were published in Nature and Science on the 16th and 17th of November.{48} There was the predictable excitement in the press, which I had by now gotten used to. In fact, I was much more preoccupied than excited. We had promised the world that we would sequence 3 billion base pairs of the Neanderthal genome within two years. Our paper ended with an estimate of what this would require—namely, about twenty grams of bone and six thousand runs on the 454 sequencing platform. We said that this was a daunting task, but added that technical improvements that would make the retrieval of DNA sequences on the order of ten times more efficient could “easily be envisioned.” The improvements we had in mind involved losing less material when making libraries for sequencing and taking advantage of secret future improvements to the 454 machines that Michael had revealed to us.
Things were looking up, but a major challenge still remained: finding good Neanderthal bones. The truth was that we did not have anywhere near twenty grams of Neanderthal bone of the quality of Vi-80, the bone we used in the test runs for the two papers. In fact, the piece we had left from Vi-80 weighed less than half a gram. I optimistically told myself that since one of the first Vindija bones we tried contained almost 4 percent Neanderthal DNA, surely we would find others that were equally good. Perhaps we would even find some that were better. I had to turn my full attention to this problem as soon as possible. First, however, I had to undertake a more unpleasant task: ending the collaboration with Eddy Rubin.
Terminating a scientific collaboration is often difficult, and it is even more so when a collaborator has become a personal friend. I had stayed with Eddy’s family in Berkeley; we had biked up the hills to his lab together; we had gone together to the theater in New York during Cold Spring Harbor meetings. I had always enjoyed his company. So I long pondered my e-mail to Eddy and wrote several drafts of it. I explained how I differed with him on bacterial cloning’s usefulness, and said that I felt that our communication, particularly on this point, had not been productive. I also noted that it now seemed that his group was trying to do the same things our group was trying to do, rather than working in a complementary way. For example, in our phone conferences, they had suggested that we send them our DNA extracts and the PTB reagent we had synthesized so that they could treat our extracts with our PTB. Neither I nor my group had been thrilled by this notion. I hoped I had expressed my reasons for not working together in a way that wasn’t hurtful or insulting, but it was still with some trepidation that I sent the e-mail. Eddy answered that he saw my points but that he continued to believe in the future potential for improvements and utility of bacterial libraries. I was relieved that he had taken my letter graciously, but we were now, clearly, competitors rather than collaborators.
The competition became apparent almost as soon as I turned my attention to the procurement of Neanderthal bones. Eddy was trying to obtain them too, I discovered, and from many of the same people we had worked with for years. In fact, I found out that already back in July, Wired magazine had published an article about Eddy’s Neanderthal efforts. The Wired piece ended with a quote from Eddy: “I need to get more bone. I’ll go to Russia with a pillowcase and an envelope full of euros and meet with guys who have big shoulder pads. Whatever it takes.”