11

All My Ancestors

Chromosome 11.

We have traveled this far by listening to the clear music of just two solo instruments, free from the background rumble of the genome. The sharp and precise notes of mitochondria have traced the echo of our maternal ancestors back tens of thousands of years, following the journeys of women. The ferocious, warring blasts of the Y chromosome picked out the erratic history of men. The rest of the genome has been silent in narrating the story of our ancestors, leaving us free to concentrate on the separate melodies of men and women. Now is the time to turn up the volume on the rest of our genome, sit back, and listen to the sound of the whole orchestra.

The fraction of our genome carried by the two principal soloists is tiny. Mitochondrial DNA carries only thirty-seven genes on its compact circle of precisely 16,569 bases. Although very much larger, at 58 million bases, the Y chromosome has fewer active genes than mitochondria, only twenty-seven, owing to its decayed and enfeebled state. The rest of the genome, on which we depend for virtually all our genetic instructions, is far larger again, with just over 3 billion bases spread over twenty-three, for the most part, healthy and robust chromosomes, containing about twenty-five thousand genes. Already you must feel how easily this could become a cacophony of different sounds, completely drowning the sweet music of our soloists. And you would be right, because interpreting the ancestral signals coming from the main bulk of our genome is far less straightforward.

For a start, because we inherit the DNA in our genomes from both parents and it is shuffled at each generation, it is almost impossible to tell which ancestor is responsible for passing on which segment of DNA. We all have two copies of each gene, but without testing our parents directly, we cannot tell which copy we received from which parent. And that is just our parents. When it comes to more distant ancestors, who cannot be tested, then it becomes virtually impossible.

Then there is the issue of the generation paradox. Just like us, both our parents have two copies of every gene. But they each pass on only one copy to us and so we, you and I, only ever get half of one parent’s DNA and half of the other. What happens to the rest of it? Some DNA may be passed on to our brothers or sisters, but the rest goes nowhere. The generation paradox arises because, for every generation back in time, the number of our ancestors doubles, but we still only inherit the same amount of DNA. To clarify this let us choose a particular gene, beta-globin, that controls one of the subunits of hemoglobin in our red blood cells. Thinking about our four grandparents, we will have inherited one copy of the globin gene from one of our paternal grandparents and the other copy from one of our maternal grandparents. But that leaves two grandparents whose globin genes we have definitely not inherited. Going back another generation, to our eight great-grandparents, we have inherited our globin genes from only two of them, leaving six great-grandparents whose globin genes did not get through to us. Likewise, going even further back and still with only two globin ancestors at each generation, fourteen out of our sixteen great-great-grandparents, and thirty out of thirty-two great-great-great-grandparents, will not have given us our globin genes. We will never know, without a lot of extra work, which of these thirty-two ancestors once carried the globin gene that we have inherited.
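The arithmetic of the globin example can be set out in a few lines of Python. This is only an illustrative sketch: the function name `globin_non_contributors` is invented here, and it assumes that no ancestor appears twice in the family tree.

```python
def globin_non_contributors(generations_back: int) -> tuple[int, int]:
    """Return (number of ancestors, number who did not pass on a globin copy).

    Assumes every ancestor in a generation is a different person, and that
    exactly two ancestors per generation carried the copies we inherited.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    ancestors = 2 ** generations_back
    return ancestors, ancestors - 2


# Grandparents are 2 generations back, great-grandparents 3, and so on.
for generations_back in range(2, 6):
    total, missing = globin_non_contributors(generations_back)
    print(f"{generations_back} generations back: {total} ancestors, "
          f"{missing} passed on no globin gene")
```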

The globin gene is only one of thousands, so even if we received our globin genes from only two of our thirty-two great-great-great-grandparents, we will have inherited the copies of plenty of other genes from all of them. However, because the number of ancestors keeps on doubling at every generation that we go back, there will come a time when there are ancestors from whom we don’t inherit any DNA at all. But when will that be? With 25,000 genes and two copies of each, that makes 50,000 separate DNA segments. So when the number of ancestors exceeds 50,000 there must be some from whom we get no DNA. It is a simple calculation, just doubling at each generation 2, 4, 8, 16, 32, and so on. After fourteen generations this mathematical series gets to 16,384, and exceeds the 50,000 mark by generation 16, when we have 65,536 ancestors. With a generation time of twenty-five years, that is only 400 years ago, which takes us back to the beginning of the seventeenth century, about the time of the first English settlements in America.
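For readers who like to check the doubling for themselves, here is a minimal sketch of the calculation. The constant and function names are invented for illustration, and it uses the same simplifying assumptions as the text: 25,000 genes, two copies of each, and a twenty-five-year generation.

```python
GENES = 25_000
COPIES_PER_GENE = 2
GENERATION_YEARS = 25


def first_generation_exceeding(segments: int) -> int:
    """Return the first generation back at which ancestors outnumber segments."""
    generation = 0
    ancestors = 1  # generation 0 is just ourselves
    while ancestors <= segments:
        generation += 1
        ancestors *= 2  # the number of ancestors doubles each generation back
    return generation


segments = GENES * COPIES_PER_GENE
generation = first_generation_exceeding(segments)
print(f"DNA segments: {segments}")
print(f"Generation {generation}: {2 ** generation} ancestors")
print(f"Roughly {generation * GENERATION_YEARS} years ago")
```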

However, this calculation assumes that we inherit our DNA in a neat and equitable way from our ancestors. In fact, which particular segments we inherit from which ancestors is completely random and therefore governed by the rules of chance. We get more DNA from some and less from others. This spread means that there are some much more recent ancestors, probably within only six generations, from whom we haven’t inherited any DNA at all. With the same 25-year generation time, that is only 150 years ago.

While this is not so long back, the numbers of ancestors are doubling at each generation and growing at an alarming rate. Which is where the paradox shows itself, because at some point the number of ancestors will exceed the entire population of the world. What is the solution? It is this: Although the number of ancestors doubles at each generation, some of them will be the same people. Not our parents, obviously, but two of our grandparents could, theoretically, be the same person. It’s unlikely but possible. The chances increase as we go back until, at some point, it becomes inevitable. Where that point is depends on how our ancestors lived. If they were mainly endogamous—that is, marrying among themselves like Ashkenazi Jews, for example—these “double” ancestors may have lived quite recently. For more exogamous ancestries, they will have lived further back in the past. However, whether from an endo- or exogamous ancestry, there will inevitably come a point when one person is the ancestor of everyone alive. This sounds absurd, but theoreticians have calculated that in an exogamous population of one million people, this person lived only twenty generations, or five hundred years, ago. Even when other factors, including a more realistic figure for the world population and the effects of migration and geographical isolation, were brought into the calculation in more sophisticated models, this “universal ancestor” still lived just seventy-six generations, or, assuming the same twenty-five-year generation time, only 1,900 years, ago.1 I do find it astonishing that, compared with the quarter-million-year history of our species, one individual, from whom everyone alive today can trace a line of descent, lived so recently.

He or she was only our most recent “universal ancestor,” and as you go further back in time the same theorists predict that the population divides into individuals from whom everyone can trace at least one line of ancestry, and the rest, who were the ancestors of nobody alive today. That point is reached at five thousand years ago, about the time the Pyramids were built. So the slaves who built them would either have been the ancestors of everybody alive today, including you and me, or of nobody. Beyond that point everyone is descended from exactly the same set of ancestors, though along different lines. Even further back, the proportion of people with no living descendants increases until only one couple remains who were the ancestors of everyone living today through every line of descent, except two. As you might expect, these estimates are surrounded by caveats. Even the most sophisticated models, incorporating factors like the opening of sea routes, do modify the timing, but not by much. The principle is still valid, and the conclusion, strange though it may seem, is inevitable.

The two exceptions to this theoretical scenario are the direct matrilineal and patrilineal ancestries that are by now so familiar. They coalesce into universal ancestors a lot further back, around 65,000 years for the “Y-chromosome Adam” and 170,000 for “Mitochondrial Eve.” Why the difference? It all has to do with the behavior of men who have more than their fair share of children, something I explore in depth in Adam’s Curse.

Even without the complications of the fairly recent universal ancestors, our genomic DNA ancestry is a snarled tangle compared with the linear simplicity traced by mitochondrial and Y-chromosome DNA. We know exactly which ancestral path they have taken from the deep past to the present. They are the two clear voices above the tumult of the genome, but despite their clarity they cannot tell us the complete story of our ancestry, and a lot of it remains hidden. I used to think this was a blessing in disguise because it is almost impossible to grasp the concept and complexity of our complete DNA ancestry in a way that means anything. All narrative is lost as the numbers of our ancestors grow into the thousands and beyond. For a long time I could see no way of narrating our genetic past that went beyond mitochondria and Y chromosomes. I was happy listening to the crystal-clear voices as the Callas and Domingo of genetics sang their duets. I wasn’t interested in the cacophony of the orchestra and chorus.

Then, by chance, I did catch something in the air: the faintest possibility of a melody rising above the swirl of countless different instruments. It came in late 2008, when a former colleague, John Loughlin, was explaining to me how a new DNA technology was making his work on the genetics of osteoarthritis so much easier. He had been looking for genes involved in the cascade of biochemical events that lead to severe arthritis, requiring a major joint replacement. This was a very reasonable quest because he had already shown that this particular form of osteoarthritis had a high heritable component, ergo there must be genes involved. But as so many prospectors of the genome have discovered, it is hard work, and the rewards rarely match up to the promise first imagined: more like panning laboriously for specks of gold than striking a mother lode. I knew that he had spent many years teasing apart the genomes of hundreds of patients who had undergone joint-replacement surgery. He and his team were looking for differences between these patients and the hundreds of other people of similar age and background who did not suffer from osteoarthritis. Like other scientists on a similar quest, he used the enormous range of genetic markers that had been discovered en route to decoding the entire human gene sequence. The theoretical basis was that if his joint-replacement patients inherited particular versions of any of these markers, compared with the control group, then this might indicate the chromosomal location of an “osteoarthritis gene.” It didn’t mean that the genetic marker itself was that gene, but that it might lie nearby. The work involved was enormous, with each marker being either analyzed alone or with a few more. With thousands of them to get through, the vast majority of which would be duds, it was a massive effort, and John spent most of his time either raising the money to pay for the work or cheering on his research team.

The technical breakthrough that made the difference was arrived at independently by scientists in Britain and America: They developed ways of fixing DNA to glass. Since I had known one of the English pioneers, Ed Southern, who was working in Oxford, I had seen the early versions using sheets of window glass about ten inches square, which he covered with a matrix of small drops of DNA solution, each containing a different synthetic segment of DNA that had been made to match exactly its equivalent in the human genome. These glass sheets were early prototypes, and by the time John Loughlin started to use the new technology for his osteoarthritis research, the whole system had been miniaturized so that half a million markers now fitted onto a silicon “DNA chip” about one inch square. The matrix of synthetic DNA markers was now far too small to see the individual spots, so the reactions were observed under a microscope.

For John and the other gene prospectors, this advance meant that instead of examining the thousands of markers in his patients and controls, either individually or in small groups, he could analyze half a million at once with a single DNA chip. No wonder he was pleased. The chips were and still are expensive, and the machinery to read them is beyond the budget of most university laboratories, so it made sense for this work to be contracted out to commercial labs that could benefit from the economies of scale. So now all John needed to do was get hold of the DNA, send it off to one of these labs, get the results back, and interpret them—and spend even more time raising the money to keep going.

While I could not fail to be impressed with the technical achievement of the DNA chips and the sheer slog it was saving John and his team, I did not immediately see how this would help unravel the tangled ancestry of our genome. It was only when John referred me to 23andMe, a Californian company that was offering chip-based DNA tests to the public, that I began to understand their potential. On its Web site were examples of what the company called “chromosome paintings.” The moment I saw them I caught the first melody from our genomic ancestors. What had been until then a formless noise, audible only to the oscilloscope of computation, suddenly resolved into woodwinds, strings, and brass. Within a week I was on my way to San Francisco.

Company headquarters are in the broad, winding avenues of Mountain View, right at the heart of Silicon Valley twenty miles south of San Francisco. As I found my way, I passed neat yet unremarkable two-story buildings set back from the road and half hidden by trees. The buildings may have been unremarkable, but the signs outside were certainly not, for here and in nearby Cupertino were the research headquarters of some of the best known global companies in electronics and computing: Google, Apple Computers, Cisco Systems, Siemens, and more. I had managed to arrange a visit at such short notice because the company’s director of research was former Stanford geneticist Joanna Mountain, whom I had met on the academic conference circuit and whose work on mitochondrial DNA I knew well. The place was buzzing, because 23andMe had recently won the 2008 Time magazine Invention of the Year award.

The central theme of the business was to use the DNA-chip technology to provide customers with information about their risks of developing a range of genetic diseases. Some diseases, like sickle-cell anemia and Tay-Sachs, have a simple one-to-one correspondence between identifiable mutations in known genes and developing the disease. However, for most diseases with an inherited component, like the osteoarthritis that John Loughlin was researching, the links to specific genes are a great deal more tenuous. One of the great research efforts of the past decade has been to identify these genes, hoping that what was true for Tay-Sachs and sickle-cell would also be true for diabetes, hypertension, Parkinson’s, and the rest. The initial optimism that drove the furiously competitive search for these genes, fueled by the prospects of patenting them and making a fortune, was soon tempered by reality. They proved to be at first elusive and then quite impossible to tie down. As the British geneticist (and wit) Steve Jones once remarked, trying to find them resembles T. S. Eliot’s description of the hunt for Macavity the Cat, the “Napoleon of Crime,” in his Old Possum’s Book of Practical Cats.

He’s the bafflement of Scotland Yard, the Flying Squad’s despair:

For when they reach the scene of crime—Macavity’s not there.2

That is not to say that genes are not involved at all in these common diseases, just that the prediction that they would be small in number and big in effect turned out to be wrong. The reality is that the genes involved are many in number and individually weak in their effect. Although the search for the “Napoleons of Crime” may have been abandoned, there are plenty of minor accomplices that have been found “loitering with intent” and taken in for questioning, and the outcome of these enquiries has been to use the DNA results to adjust an individual’s risk of developing a disease.

People’s perceptions of risk are notoriously wayward, particularly when there are numbers attached, and bear only scant relation to anxiety levels. For example, I am much more worried about being crushed by a herd of stampeding cattle than I am of being killed in a car accident, even though the statistics show that I have it completely the wrong way around. In the United States 105 people were killed by cattle between 2003 and 2007 while 192,256 died on the roads. Not strictly comparable, I know, but you see what I mean. I could modify my personal risks downward by never going into a field full of cows, or avoiding traveling in a car. One of the presumptions of personal genetic risk analysis is that we will modify our lifestyles accordingly: If we have a higher than normal genetic risk of obesity then we will go on a diet, or if we are told we have an elevated risk of developing diabetes then we will avoid sugar. I have always thought this a very tenuous piece of reasoning. After all, millions of people smoke though they know very well that it might kill them. But I could not have put it better than a journalist, from The New Yorker, I think, who once interviewed me about the results of some tests he had done on himself through another company. After he had finished his questions, I asked him what he was going to do about this new knowledge about himself. “Eat more broccoli,” came his sardonic reply.

As you can imagine, there has been a great deal of debate about the value of these results, and even whether the tests should be offered to the public at all. A lot of this has been among professional medical geneticists who are fearful that people will discover they are at high risk of developing a grave genetic disease. There are good reasons for taking this seriously, and during my time in medical genetics, I have been impressed with the arrangements for counseling people who are contemplating a DNA test because of a family history of a genetic disorder. None illustrates the dilemma better than Huntington’s disease. This insidious and invariably fatal affliction seems deliberately designed to maximize cruelty to its unfortunate sufferers and their relatives. The symptoms of neurological and personality collapse do not show until around the age of thirty, after which there is a steady decline toward dementia and death. The pattern of its inheritance means that children who see one of their parents succumb have a 50 percent chance of inheriting the mutant gene and developing the disease themselves. Unlike Tay-Sachs and other recessive disorders, one mutant copy of the gene is enough to give the symptoms.

Finding the Huntington’s gene, in 1993, was one of the triumphs of the early years of genetic exploration and immediately offered the prospect of a genetic diagnosis before the onset of symptoms. Not that anything could be done about stopping the development of the disease, but there were circumstances when the DNA test was requested, most commonly when someone who was at risk but too young to show the symptoms was contemplating starting a family. Often this was someone who had already witnessed the suffering of a parent but did not know whether they carried the same death sentence in their DNA. There are so many factors that need to be considered, even before having a DNA test, that professional advice is essential. How will you respond to a positive result? Or even a negative one, which you would have thought would bring unrestrained relief but is often met with a deep feeling of guilt. How about identical twins? Since their DNA is exactly the same, the result of a test would apply equally to both, but what to do when one wants the test but the other does not? It is no surprise that suicide has been the response of some to finding out that they have the mutant gene, and under these circumstances it is easy to see that it would be catastrophic to offer the Huntington’s DNA test directly to members of the public without the backup of professional genetic counseling. I think considerations of this kind have made the medical genetics profession extremely wary about direct-to-consumer genetic testing for less acute disease susceptibilities, which is why on the whole, it is not in favor.

I have certainly had vigorous arguments with my colleagues about this, and I think they are wrong. First of all I think they underestimate the sophistication and common sense of customers. Second, their response is both arrogant and hypocritical in the sense that the same medical genetic community that has trumpeted the benefits of genetic research now wants to restrict public access. By all means root out the charlatans, but instead of sniffily looking the other way, help companies that have the resources and the motivation to do a difficult job well. And, by the way, I am not being paid to say this.

Although its primary objective was the health-care side of modern genetic analysis, 23andMe was also well aware that the same genetic information could be used for personal ancestry research. Organizations like Oxford Ancestors and Family Tree DNA had proved that there was a market, while the appetite for personalized genetic health-risk evaluation by members of the public had never really been tested when the company launched in 2007. No one knew how much people would be prepared to pay and how many would want it. But for ancestry the figures were there, and since it required no additional genetic analysis, only interpretation and presentation, it was sensible to offer ancestry testing as a sideline. And it was this sideline that brought me to Mountain View.

I was met at the door by Joanna Mountain and one of the cofounders, Linda Avey, whose background is in marketing. I had prepared a short presentation, mainly about the narrative qualities of DNA, as I recall, after which Joanna did the same, explaining how the company was adapting its DNA-chip system for ancestry applications. The others in the audience were mainly young, mostly scientists or software engineers. After a short tour and some individual meetings I left to rejoin the hell that is Highway 101 going north to San Francisco. Except this time I hardly noticed. I was very pleased with how things had gone. Although this was only an initial contact after all, I came away with a very positive impression of the company and the people and, more important, an offer of help with my research for DNA USA.

What had intrigued me from the start and had seemed to offer a way into the complexity of our genomic ancestry was the way in which the ancestral origins of human chromosomes were portrayed. Each of our twenty-two pairs of autosomes (that is to say, all of our chromosomes except the X and Y) was laid out in horizontal rank and in numerical order from the largest, number 1, at the top to the smallest, number 22, at the bottom. Each chromosome was sliced lengthwise along the middle so that the top and bottom slices represented the two copies of each chromosome that we possess. In examples of people with a mixed ancestry, different colors picked out the segments of their DNA that had come from one of three continental origins. Dark blue for European, green for African, and orange for Asian, which in the United States is a proxy for Native American.

This was not the first system to estimate the continental components in an individual’s genome. An earlier method had been developed that gave a quantitative estimate of African, European, and Asian DNA, but it did not break this down into chromosome segments. Rick Kittles, the cofounder of African Ancestry, had used a system, called AIMs for “ancestry informative markers,” with some interesting results that we will look at later, but something about the numerical brutality of AIMs made me wary of its use in individuals. Chromosome painting, on the other hand, seemed to overcome my misgivings and come much closer to the real situation for individuals with ancestors from different continents, and the visual representation made it much harder to misinterpret.

So how do you go about painting someone’s chromosomes? The underlying science depends, as always in genetics, on the variations between one individual’s DNA and the next. Without these there would be no genetics. One of the triumphs of the Human Genome Project, aside from reading the entire human DNA sequence, has been to discover millions of tiny differences between human genomes, known by their acronym SNPs, which we have already encountered in the Y chromosome. The initials stand for “single nucleotide polymorphism,” which means a difference of only a single base in the DNA sequence at a particular location on a particular chromosome. For example, where the sequence might be GGATTA on one chromosome and GGATCA on another, this is a SNP. Millions of SNPs have been discovered throughout the human genome, and each, once found, can be identified by the DNA sequences on either side. So the SNP we introduced as GGATTA/GGATCA is flanked by unchanging DNA sequences that are known from the human genome sequence. You need only a sequence of around twenty bases on each side to uniquely identify any SNP. These short DNA segments are easy to synthesize, and easy to immobilize on a chip. Once on the chip they are able to detect which of the two sequences is present at the SNP in any DNA they are asked to test (or “interrogate,” in the lingo). After some clever chemistry the spot on the chip where the synthetic SNP sequences are attached glows a fluorescent red for one version and green for the other, with these tiny signals being picked up by a powerful automated microscope. As there are half a million SNPs on a typical chip, and the microscope can scan all of them within a few minutes, very soon you know the sequence at all half million SNPs in the DNA being interrogated.

However, these chips are analyzing DNA from individuals who have two copies of each chromosome. This means that there are not two but three possible results for each SNP. If, in our example, GGATTA glows red and GGATCA glows green, when both chromosomes have the GGATTA version the spot will glow red. On the other hand, when both chromosomes have the alternative sequence GGATCA at the SNP, the spot will be green. But there will be times when both versions are present and one of the chromosomes has the sequence GGATTA while the other has GGATCA at the SNP. Under these circumstances the spot on the chip glows both red and green. Fluorescence filters on the microscope can deal with this and record both versions of the SNP. Even though the chip has analyzed half a million bases, this is only a fraction of the total of three thousand million. However, these bases have been chosen as the ones that are known to vary between chromosomes. We also know the precise chromosomal position of all half million of them.
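The three possible readings at a single SNP can be sketched in Python. This is an illustration of the logic only, not the chip makers’ software: the function name `spot_signal` is invented, and the red version is taken to be a T and the green version a C at the variable position, as in the running example.

```python
RED_ALLELE = "T"    # assumed: the base at the SNP in the version that glows red
GREEN_ALLELE = "C"  # assumed: the base at the SNP in the version that glows green


def spot_signal(copy_one: str, copy_two: str) -> str:
    """Return the color seen at one spot, given the base on each chromosome copy."""
    alleles = {copy_one, copy_two}
    if not alleles <= {RED_ALLELE, GREEN_ALLELE}:
        raise ValueError("each copy must carry one of the two known versions")
    if alleles == {RED_ALLELE}:
        return "red"        # both chromosomes carry the red version
    if alleles == {GREEN_ALLELE}:
        return "green"      # both chromosomes carry the green version
    return "red+green"      # one chromosome carries each version


for pair in [("T", "T"), ("C", "C"), ("T", "C")]:
    print(pair, "->", spot_signal(*pair))
```

Note that the chip cannot tell which chromosome of the pair carries which version: a T/C reading and a C/T reading give the same mixed signal.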

At each generation our chromosomes shuffle their DNA sequences. Most of the time the chromosomes we received from our mothers and the ones we inherited from our fathers don’t have much to do with each other. They lead physically separate lives in the cell nucleus and carry on barking their instructions to our cells quite independently from one another. In most of our body cells they live apart throughout our lives, but in our germ-line cells there is a final embrace. Just before our chromosomes become packaged into eggs or sperm, the pairs line up with each other and swap DNA. Then they move apart and go their separate ways into the germ cells. There is a very sound evolutionary reason for this tender parting, as it creates an enormous amount of genetic diversity in the next generation that protects the offspring from parasites and pathogens, again something I explored in Adam’s Curse.

This was once thought to be a completely random process, in which DNA exchanges could happen anywhere along the length of the chromosomes, but it turns out that this is not so. It now seems that there are “hot spots” along each chromosome where these exchanges are much more frequent. Rather than being completely random, as in a properly shuffled deck of cards, it is as though there are runs of cards that stay together.

The blocks of DNA between hot spots that are not disrupted by shuffling can be tens of thousands of bases long and contain several SNPs. This means that they tend to retain the combination of variants at each of the SNP sites within them. So a block with five SNPs might have the combination, as seen by the chip, of red/green/green/red/red, each one indicating the presence of a particular variant at the SNP site. This introduces a new level of discrimination, as there are now 2⁵, or 32, possible combinations for this segment. This is the same principle that provides for the enormous range of genetic signatures generated by only a few markers on the Y chromosome when they are used in combination. Although the situation on the autosomes is far less helpful than on the Y chromosome, not least because exchanges are not exclusively confined to hot spots, the presence of relatively undisturbed segments of DNA is nonetheless valuable for the next stage in the chromosome painting process.
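The count of thirty-two can be confirmed by listing every possible red/green pattern for a five-SNP block. The sketch below uses Python’s standard `itertools.product`; the function name `block_combinations` is invented for the purpose, and it treats each SNP as simply red or green on a single chromosome.

```python
from itertools import product


def block_combinations(snp_count: int) -> list[tuple[str, ...]]:
    """List every red/green pattern for a block containing snp_count SNPs."""
    return list(product(("red", "green"), repeat=snp_count))


combinations = block_combinations(5)
print(len(combinations), "possible patterns for a five-SNP block")

# The pattern used as the example in the text is one of them.
example = ("red", "green", "green", "red", "red")
print(example in combinations)
```

Each extra SNP in a block doubles the number of patterns, which is why a handful of markers read together discriminate so much better than any one of them alone.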

After the Human Genome Project finished in 2003, there were a lot of geneticists looking for something to do, and a lot of idle machinery. Some of them plowed on with sequencing other genomes, first mouse, then chicken, and so on. They are still going and, predictably, the species being sequenced are becoming more exotic. In 2011 the complete DNA sequences of the nine-banded armadillo and the canary were on their way to completion, in company with multitudes of potentially useful bacteria and fungi.3

Other geneticists switched their researches to studying the DNA variation among individual human genomes and soon began to realize that the human genome was falling into blocks. Thanks to the discovery of DNA-exchange hot spots and the cooler regions in between, a huge international scientific effort to describe these blocks as fully as possible began to take shape in 2002. How many there were, where the boundaries were, and so on. The impetus and the large sums made available were driven by the optimism of finding the elusive common disease genes, the “Napoleons of Crime.” By knowing where these blocks were, it was going to be easier to locate these genes by the simple strategy of association between the blocks and the presence or absence of the disease in question in large numbers of patients and controls. Where the association with a particular block was high, then Macavity must be hiding nearby. Surely?

To discover how these blocks were behaving the HapMap Project (after “haploblocks,” as these chunks are known) looked in detail at the genomes of individuals from three different parts of the world.4 The chosen ones, 270 in all, came from Africa, Asia, and Europe, and each individual’s DNA was typed for about 3 million SNPs. The ninety-strong African contingent was from Ibadan in Nigeria, members of the Yoruba tribe; the ninety Asian volunteers were from Tokyo and Beijing; while the ninety Europeans were actually Americans with their roots in northern and western Europe. The work was divided up among labs in the United States and Canada, England, China, and Japan with each lab concentrating on different chromosomes, as they had in the initial sequencing of the human genome. Many of the HapMap scientists were veterans of the Human Genome Project and knew their way around their favorite chromosomes. From these results half a million SNPs that were favorably placed within each block were selected, and these were put on a DNA chip. Like the Human Genome Project, one attractive feature of HapMap was the release of data into the public domain, and it was through this release that software engineers were able to get their chromosome brushes out and start painting.

The aim of chromosome painting is to assign each block in an individual’s genome to one of the three continental origins represented by the HapMap volunteers. This is of course a gross simplification, but it seems to work. Let us take President Obama as an example, not that I have his details. (I was told that the president has declined to be tested while he is still in office.) I may not have his chromosome painting, but I have a pretty good idea what it would look like. As everyone knows, President Obama has an African father and a European American mother. He has inherited one chromosome of each of the twenty-two autosome pairs from his father, with DNA blocks that will likely match up with the Nigerian volunteers more than they do with the Asians or the Europeans who helped build up the three continental reference collections. Equally, his other chromosome in each pair has come from his mother, whose DNA blocks will probably all match the European more closely than either the Asian or African chromosomes. The painting software makes these comparisons for blocks of DNA of about ten thousand bases all along the twenty-two pairs of chromosomes. With a total of 3 billion DNA bases to cover, this makes about three hundred thousand blocks to color in.

Mike MacPherson, the scientist who helped develop the program, explained to me on my visit that there are six possible combinations for each of these blocks along the chromosome pairs: African/African, Asian/Asian, European/European, and then the combinations of African/European, African/Asian, and Asian/European. Mike’s algorithm chooses which of these combinations fits best with the DNA being analyzed and fills in the painting accordingly. “African” blocks are colored a light green, “Asian” blocks are orange, and “European” blocks are dark blue. As each block is analyzed and painted separately, there is a set of conventions that govern the coloring of the top and bottom slices. When both copies of a block have their best match with only one of the reference samples, as in African/African, for example, then both top and bottom slices are painted green. The rules come into play when the two blocks match different reference samples. So for African/European blocks the top slice is painted green for African, and the bottom slice is dark blue for European. An African/Asian block has the Asian orange on the top slice and African green underneath. The third mixed block, Asian/European, has blue on top and orange below. There are examples of all of these in the illustrations for chapter 19.
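The coloring conventions just described are compact enough to write down as a lookup. What follows is only a sketch of the convention as stated in the text, not Mike MacPherson's actual program; the names `COLORS`, `TOP_OF_MIXED`, and `paint_block` are hypothetical labels chosen for illustration. Note that the three mixed-block rules do not form a simple ranking (African sits above European, Asian above African, European above Asian), so the sketch lists each mixed pair explicitly.

```python
from itertools import combinations_with_replacement

# Colors as given in the text; the names here are illustrative only.
COLORS = {"African": "light green", "Asian": "orange", "European": "dark blue"}

# Which origin takes the top slice in each mixed block, as stated in the text.
# This is a lookup rather than a ranking because the three rules are cyclic.
TOP_OF_MIXED = {
    frozenset({"African", "European"}): "African",
    frozenset({"African", "Asian"}): "Asian",
    frozenset({"Asian", "European"}): "European",
}


def paint_block(origin_a, origin_b):
    """Return (top_color, bottom_color) for one block of a chromosome pair."""
    if origin_a == origin_b:
        # Both copies match the same reference sample: one color, top and bottom.
        return COLORS[origin_a], COLORS[origin_a]
    top = TOP_OF_MIXED[frozenset({origin_a, origin_b})]
    bottom = origin_b if top == origin_a else origin_a
    return COLORS[top], COLORS[bottom]


if __name__ == "__main__":
    # Enumerate the six possible combinations for a block.
    for a, b in combinations_with_replacement(sorted(COLORS), 2):
        print(f"{a}/{b}: {paint_block(a, b)}")
```

The order in which the two origins are passed in does not matter, which mirrors the fact that the painting cannot say which copy of a block came from which parent.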

Returning to my theoretical reconstruction of the president’s chromosomes, given that his father, Barack senior, was from Kenya and his mother, Ann Dunham, was a European American from Kansas, I would expect the chromosome pairs in his body cells to be African green on top, following the convention mentioned above, and European blue beneath, pretty well as in the monochrome version in Figure 4 (A). Chromosomes don’t actually look like this in real life. This is just a diagram of one of them, but it does give me the opportunity of pointing out one of the features of chromosomes. They are divided into two arms, separated in the diagram by the gray disc. The discs are there to represent attachment points for muscle-like proteins that help to pull the two chromosomes apart during cell division. Although the attachment points are made of DNA, their sequences are very repetitive and hard to analyze; consequently they have not been included in the HapMap coverage or the SNP chips and are shaded gray in color versions. There are one or two other gray regions that have been left off the chips for the same reason, but they are only small and we can forget about them.

Figure 4. Following one of President Obama’s chromosomes through four generations. The light gray blocks are DNA of African origin, while the darker blocks have a European origin.

The president’s children have inherited one of each pair of chromosomes from him and the other from their mother, Michelle, the first lady. The chromosome coming from the president (B) is an amalgam of the two chromosomes in his body cells (A), shuffled by DNA exchange. There is usually only one exchange on each chromosome arm at each generation, so the chromosome going to his first daughter, Malia Ann, in Figure 4 (C) might look like this, although the random nature of DNA exchange makes the precise pattern unpredictable.

We know from conventional genealogical research carried out by Megan Smolenyak and reported in the New York Times on October 7, 2009, that Michelle Obama has some European ancestors. However, for the sake of simplicity, we will ignore these and assume that all her ancestry is African. So the example chromosome in Malia’s body cells would look like C, with the mixed African/European chromosome (B) from the president and an African chromosome from the first lady.

Looking into the future, to the time Malia Obama has her own children, the chromosome she passes on will be another amalgam of the two she inherited from her parents, randomly shuffled by DNA exchange, like D in Figure 4 perhaps. If, to keep it simple, she marries a man with an African genome, her child—let’s say it is a boy this time—will have arrangement E in his body cells. Most of the DNA in this pair of chromosomes has an African origin, all except for the European DNA in the dark blocks that have come, originally, from his great-grandmother, Ann Dunham. Sure, we can give a percentage of African and European DNA in this pair of chromosomes (roughly 88 percent African and 12 percent European by the look of it), and all the other chromosomes once we have “interrogated” them on the DNA chip. But in my view that doesn’t take us much further than the ethnic ancestry tests derived from the AIMs. What really distinguishes chromosome painting from its forerunner is that, since we know precisely where genes are located on each chromosome, we can tell the continental origin of each one in any individual.

In the president’s theoretical grandson (let’s call him Harry) most of the genes along this chromosome will have an African origin, but for genes located within the two-tone blocks, he will be working on a fifty-fifty combination of African and European genes. If, for example, the gene for the ABO blood group were in one of these blocks, then his blood group would be decided by a mixture of African and European DNA. If the block contained a muscle protein gene, his muscles would be powered equally by African and European genes. Since both the size and the boundaries of these blocks are so random, it is extremely unlikely that any two of the president’s grandchildren, unless they happen to be identical twins, will inherit the same blocks of European DNA, and hence the same European genes, on this chromosome. When all the chromosomes are brought into the comparison, then what was vanishingly unlikely becomes virtually impossible, and, though each of the president’s grandchildren may have close to the expected average of one-eighth European DNA, the number and the identity of the genes with a European ancestry will be quite different in all of them.

By the time of the next generation, assuming once more for simplicity that Harry marries an African, his child, the by-now-former president’s great-grandchild, will have only one small segment of European DNA on the chromosome we have been following. It came originally from the president’s mother and has survived through four generations, diminishing by roughly half at every one. It may survive for many more generations to come, or it may be eliminated by the forces of random chance at any one of them.
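The halving, and the element of chance in it, can be imitated with a toy simulation. This is a minimal sketch under stated assumptions, not a model of real chromosomes: the chromosome is a row of `N_BLOCKS` labeled blocks, the centromere sits at an arbitrary index `CENTROMERE`, exactly one exchange happens on each arm at each generation (the text says only that this is usual), and every partner joining the lineage is taken to be entirely African. All names and numbers in the code are hypothetical.

```python
import random

N_BLOCKS = 100    # hypothetical number of painted blocks on the chromosome
CENTROMERE = 40   # hypothetical index separating the two arms


def make_gamete(copy_a, copy_b, rng):
    """Shuffle a chromosome pair into the single chromosome passed to a child.

    Simplifying assumption: exactly one exchange on each arm.
    """
    strands = (copy_a, copy_b)
    current = rng.randrange(2)                          # copy we start reading from
    cut_first = rng.randrange(1, CENTROMERE)            # exchange point, first arm
    cut_second = rng.randrange(CENTROMERE + 1, N_BLOCKS)  # exchange point, second arm
    gamete = []
    for i in range(N_BLOCKS):
        if i in (cut_first, cut_second):
            current = 1 - current                       # switch to the other copy
        gamete.append(strands[current][i])
    return gamete


def european_fraction(chromosome):
    """Fraction of blocks labeled European ("E") on one chromosome."""
    return chromosome.count("E") / len(chromosome)


def follow_lineage(generations, rng):
    """European fraction of the chromosome handed down at each generation."""
    african = ["A"] * N_BLOCKS
    european = ["E"] * N_BLOCKS
    pair = (african, european)          # the president's pair, as in Figure 4 (A)
    fractions = []
    for _ in range(generations):
        passed_on = make_gamete(pair[0], pair[1], rng)
        fractions.append(european_fraction(passed_on))
        pair = (passed_on, african)     # partner assumed entirely African
    return fractions


if __name__ == "__main__":
    rng = random.Random()
    trials = [follow_lineage(3, rng) for _ in range(2000)]
    for g in range(3):
        mean = sum(t[g] for t in trials) / len(trials)
        print(f"generation {g + 1}: mean European fraction {mean:.3f}")
```

Averaged over many runs, the European share of the chromosome handed down should come out close to one-half, one-quarter, and one-eighth in successive generations, while any single run can land well above or below those figures, and in some runs the European segment disappears altogether. The share across the pair in a child's body cells is half the handed-down figure, because the partner's chromosome is assumed to be entirely African.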

We have followed only the ancestry of one chromosome through three generations, and even then we have assumed that the chromosomes that joined the genealogy from outside are of entirely African ancestry. As you can imagine, where these incoming chromosomes are themselves built up of blocks of DNA with different continental ancestries, the picture soon becomes very complicated. But, however intricate it is, we would still be able to recognize the ancestral origin of the blocks of DNA and identify the genes that were contained within each one of them. I liked the way the chromosome portraits got so close to the actual situation and illustrated it so well. Our genomes are all mixtures built up of bits and pieces from a huge number of ancestors, and when these ancestors came from different continents, the variety is both obvious and intriguing. This was what I wanted to explore in America, and as soon as I returned to England from San Francisco, I began to plan in detail how to go about it.