16
At a ceremony on 26 June 2000 a group of distinguished scientists and public figures gathered in the East Room of the White House to hear President Clinton announce the completion of the human genome DNA sequence. It was a day of celebration and optimism after a long and sometimes unseemly race between the public and private sectors to complete the sequence. The publicly funded initiative was led by Francis Collins, director of the Human Genome Project, while the private effort was represented by Craig Venter, president of the privately funded biotech company Celera Genomics. The celebration was well deserved. Scientists from many countries had contributed to revealing, in their correct order, the 3 billion DNA units that make up the human genome.* The sense of optimism was tangible but, unfortunately, less well founded.
President Clinton, in his address, predicted, ‘It will revolutionise the diagnosis, prevention and treatment of most if not all human diseases.’ An armada of projects committed to turning this dream into a reality was launched around the world. The possibility of failure was never contemplated when President Clinton predicted that, in the near future, cancer would be known not as an incurable disease but ‘only as a constellation of stars’ in the night sky. Tragically, this has not turned out to be the case and cancer is just as much a scourge today as it ever was.
Although most diseases have a genetic basis, unfortunately very few of them are caused, like cystic fibrosis, SCID or Tay-Sachs, by mutation of a single gene. The diseases that inflict the most damage on the most people have generally turned out to involve a multitude of genes, each of small effect. Without clear gene targets to attack, the scientific armada was soon drifting aimlessly. The blame for the underestimation of the wretched complexity of the genome was laid squarely at the feet of our own stubborn refusal to behave as laboratory animals. As Alfred Sturtevant, one of the early pioneers of genetic mapping, remarked, progress in human genetics was severely limited because ‘unfortunately, breeding experiments with humans are generally frowned on’. Despite this worrying statement, Sturtevant was actually a vociferous opponent of the eugenics movement sweeping Europe and America in the early twentieth century. He chose the fruit fly Drosophila as his model animal. Others have used mice or zebra fish, but although these species have admirable qualities as laboratory animals and can be genetically manipulated to harbour mutations, their patently obvious dissimilarity to ourselves had frustrated researchers who had hoped to use these species to identify human disease genes. At that point, scientists realised that they might be able to speed up the process of human disease gene identification by using a much more familiar animal. Dogs suddenly became the newest destination for the armada.
All armadas need charts to guide them, and a few years after the White House announcement the sequence of the dog genome became the focus for a powerful team of genetic cartographers. Their headquarters was the newly opened Broad Institute on the banks of the Charles River in Cambridge, Massachusetts, where it was closely allied with the nearby MIT and Harvard. Hugely well endowed, the Broad is awash with the very latest equipment and top-notch scientists to match. At the helm of the institute is one of the doyens of human genome research, Eric Lander, with several scientific scoops already under his belt. The Broad was keen for a substantial success to merit its extravagant funding, and the dog genome was to be it.
Lander and his team had the job done quickly and the dog genome was published in full in Nature on 5 December 2005.1 It is crammed with detail, much of it beyond the scope of this book, and introduced several new ways to analyse the vast amount of data generated by the relentless grinding of the DNA sequencing machines. The dog was the fourth complete genome to be published, after the human, mouse and rat, the last two being obvious targets as popular laboratory animals. In overall size, the dog genome is smaller than that of the human by about 500 million bases (2.8 versus 3.3 gigabases, abbreviated as Gb) and, perhaps surprisingly, smaller than that of the mouse. But genome size is not a good guide to complexity and, as I enjoyed telling my students, the size of the human genome lies between those of the lupin and the newt.
There are also fewer dog genes, the parts of the genome that encode proteins. By searching out the tell-tale DNA sequences marking the beginning and end of genes, the Broad team found 19,300 protein-coding genes in the dog, compared to 22,000 in humans.
Another type of analysis revealed the parts of the genomes of both dog and human that had been evolving quickly. Earlier findings had shown that, compared to the mouse, the most rapid genome evolution in humans was to be found in genes concerned with brain development. This fitted perfectly with our vain self-image of intellectual superiority within the animal kingdom. The dog genome, however, showed an equally rapid rate of evolution in dog brain genes. Interestingly, the other genes that appear to have accelerated evolution in humans were associated with sperm production and mitochondria.
The latter are not the genes carried on mitochondrial DNA itself, but mitochondrial genes that have been ‘kidnapped’ by our nuclear DNA over the course of evolution. I found the explanations put forward for this observation most intriguing. They suggest the powerful influence that sexual selection has had on our own evolution. Competition between sperm for the prize of success in fertilisation is well known across many species. It is especially intense in primates and may well have encouraged the rapid evolution of genes conferring the production of greater quantities of sperm or speedier motility to beat the competition to the egg. Less clear-cut is the rapid evolution of the kidnapped mitochondrial genes themselves, though a case for sexual selection can be made here as well, bearing in mind that sperm are powered by mitochondria as they paddle furiously towards the finishing line, marked by the unfertilised egg.
Genomes are not sequenced in their entirety all at once, but in chunks containing roughly 50,000 base pairs. The sequenced segments must then be assembled correctly to recreate the DNA sequence as it is in the genome. Chromosomes are simply long, very long, linear strings of DNA. On each chromosome of a pair the order and position of genes along it is the same, but the precise DNA sequence is slightly different in ways we will look at soon. It is no small task to place the sequence of chromosome segments in the correct order. This is achieved by sequencing multiple DNA segments that overlap one another. Powerful, very powerful, computers then match up the overlaps and produce the complete sequence. Such is the complexity of this process that it comes as no surprise that there are so many software engineers among the forty-four authors on the 2005 paper that announced the complete sequence. Forty-four authors, but only one dog, contributed to the project – a female Boxer called Tasha.
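To make the idea of matching up overlaps a little more concrete, here is a minimal sketch in Python. It is an illustration only and not the software used for the dog genome: the function names `overlap` and `assemble`, the toy fragments and the minimum overlap of three bases are all assumptions chosen for the example. It repeatedly joins the two fragments that share the longest overlap until a single sequence remains.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(fragments: list[str], min_len: int = 3) -> str:
    """Greedily merge the pair with the largest overlap until one sequence is left."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            raise ValueError("fragments do not overlap enough to be joined")
        # Append the part of the second fragment that extends beyond the overlap.
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]


# Hypothetical toy fragments, each sharing five bases with its neighbour.
toy_fragments = ["CTTAACCGT", "GATTACAGG", "ACAGGCTTAA"]
toy_sequence = assemble(toy_fragments)
```

A greedy approach like this works for a toy example with clean, unique overlaps. Real genomes are full of repeated stretches and sequencing errors, which is one reason the real task demands so much computing power and so many software engineers.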
The Broad paper trumped an earlier partial dog genome sequence produced by Craig Venter and his team at the privately funded Institute for Genomic Research (TIGR) in Maryland. As already mentioned, Craig Venter led one of the teams involved in the original sequencing of the human genome. In the long-standing tradition of scientists using themselves as guinea pigs, it was Venter’s own DNA that was the first to be sequenced. And it was Venter’s own poodle, Shadow, that became the very first dog to have its genome laid bare by science.
In addition to sequencing the dog genome, and thoroughly checking for accuracy, the Lander team at the Broad Institute also searched Tasha’s genome for bases that differed between the chromosomes in a pair. These could then be used, in ways we shall look at shortly, to trace genetic disease traits, at first in dogs and later perhaps in humans too. The first search examined the detailed sequence of Tasha’s two chromosomes for differences involving just a single base pair. This kind of difference, caused originally by a mutation during DNA copying, is perpetuated in subsequent generations and can spread throughout many of the descendant dogs.
Lander’s team identified all the positions in the dog genome where the two chromosomes differed at a single base. For example, there might be a T base on one of the pair and a C base at the equivalent position on the other. These variations in the sequence are known as single nucleotide polymorphisms (SNPs, pronounced ‘Snips’). Just by comparing the sequence of Tasha’s two chromosomes, the team from the Broad Institute found an astonishing 770,000 bases at which the two chromosomes differed. Once the genome sequence of Venter’s poodle Shadow was included to enhance the search for SNPs, the count rose to nearly 1.5 million. By a bit of rough and ready partial sequencing of nine different dog breeds, the number of SNPs went up to 2.5 million. That is an astonishing total but still only averages out at one SNP per 900 DNA bases, or just over 0.1 per cent, across the dog genome.
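The comparison itself is simple to picture. The following Python sketch, with hypothetical names (`find_snps`, `snp_percentage`) and made-up eight-base sequences, assumes the two chromosome sequences have already been lined up base for base. It lists every position at which they differ and converts a density of one SNP per 900 bases into a percentage.

```python
def find_snps(chrom_a: str, chrom_b: str) -> list[tuple[int, str, str]]:
    """Return (position, base on A, base on B) wherever two aligned sequences differ.

    Positions are counted from 1. The sequences are assumed to be the same
    length and already aligned, which is a simplification.
    """
    if len(chrom_a) != len(chrom_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(chrom_a, chrom_b), start=1)
        if a != b
    ]


def snp_percentage(bases_per_snp: float) -> float:
    """Convert 'one SNP every N bases' into a percentage of all bases."""
    return 100.0 / bases_per_snp


# Made-up sequences with a T on one chromosome and a C on the other at position 4.
snps = find_snps("ACGTTGCA", "ACGCTGCA")

# One SNP per 900 bases works out at roughly 0.11 per cent.
density = snp_percentage(900)
```

Here `snps` holds a single entry, the T/C difference at position 4, and `density` comes to about 0.11, which matches the figure of just over 0.1 per cent.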
There is a lot that can be done with 2.5 million SNPs, especially when you know exactly where they are within the genome. Firstly, they can be used to manufacture ‘chips’ that can rapidly test any dog’s DNA for all 2.5 million of them. In practice this is more than enough, so the chips that were manufactured were restricted to a more manageable, and economic, 100,000 SNPs.
To explain the value of these markers I need to introduce one vital aspect of chromosome behaviour. We’ve covered the fact that in mammals all individuals have matched pairs of chromosomes, one coming from the mother, the other from the father. Each carries the same genes in the same places, and for most of their life in the cell nucleus, the chromosomes keep themselves to themselves. The genes issue instructions to the cell that are followed to the letter to synthesise the whole range of proteins that we and all other animals need to grow and sustain life.
Blood genes tell red blood cells how to make haemoglobin, bone genes tell bone cells how to make collagen, hair genes tell hair cells to make keratin and so forth. All the while the sets of genes on the two chromosomes act independently of each other, quietly getting on with the job of life. Meanwhile, a few cells, inside the testes and the ovaries, are being made ready for the next generation. These are called germ cells, to distinguish them from the workaday somatic cells, and their sole function is to pass on their DNA to the next generation. But not all of it. As we have seen, somatic cells contain pairs of chromosomes, thirty-nine pairs in the dog and twenty-three pairs in humans. Germ cells, however, that’s to say sperm and eggs, only have one of each pair. When a sperm fertilises an egg the two sets of chromosomes are combined and the normal number is restored. But something else happens too. Within the germ cells that go on to become sperm and eggs the pairs of chromosomes begin to dance with each other, moving closer and closer until they are touching. At these fleeting contacts the chromosomes, those long strands of DNA, actually break and re-form with their dance partner. The embrace is short-lived and is over in a matter of seconds. The entwined chromosomes break the clinch and move apart. But during that brief embrace something truly amazing occurs, as we will see a little later.
* Technically this was the first draft of the human genome sequence, substantially complete but excluding some stretches of unimportant ‘junk DNA’.