Introduction

The human genome - the complete set of human genes - comes packaged in twenty-three separate pairs of chromosomes. Of these, twenty-two pairs are numbered in approximate order of size, from the largest (number 1) to the smallest (number 22), while the remaining pair consists of the sex chromosomes: two large X chromosomes in women, one X and one small Y in men. In size, the X comes between chromosomes 7 and 8, whereas the Y is the smallest.

The number 23 is of no significance. Many species, including our closest relatives among the apes, have more chromosomes, and many have fewer. Nor do genes of similar function and type necessarily cluster on the same chromosome. So a few years ago, leaning over a lap-top computer talking to David Haig, an evolutionary biologist, I was slightly startled to hear him say that chromosome 19 was his favourite chromosome. It has all sorts of mischievous genes on it, he explained. I had never thought of chromosomes as having personalities before. They are, after all, merely arbitrary collections of genes. But Haig’s chance remark planted an idea in my head and I could not get it out. Why not try to tell the unfolding story of the human genome, now being discovered in detail for the first time, chromosome by chromosome, by picking a gene from each chromosome to fit the story as it is told? Primo Levi did something similar with the periodic table of the elements in his autobiographical short stories. He related each chapter of his life to an element, one that he had had some contact with during the period he was describing.

I began to think about the human genome as a sort of autobiography in its own right - a record, written in ‘genetish’, of all the vicissitudes and inventions that had characterised the history of our species and its ancestors since the very dawn of life. There are genes that have not changed much since the very first single-celled creatures populated the primeval ooze. There are genes that were developed when our ancestors were worm-like. There are genes that must have first appeared when our ancestors were fish. There are genes that exist in their present form only because of recent epidemics of disease. And there are genes that can be used to write the history of human migrations in the last few thousand years. From four billion years ago to just a few hundred years ago, the genome has been a sort of autobiography for our species, recording the important events as they occurred.

I wrote down a list of the twenty-three chromosomes and next to each I began to list themes of human nature. Gradually and painstakingly I began to find genes that were emblematic of my story. There were frequent frustrations when I could not find a suitable gene, or when I found the ideal gene and it was on the wrong chromosome. There was the puzzle of what to do with the X and Y chromosomes, which I have placed after chromosome 7, as befits the X chromosome’s size. You now know why the last chapter of a book that boasts in its subtitle that it has twenty-three chapters is called Chapter 22.

It is, at first glance, a most misleading thing that I have done. I may seem to be implying that chromosome 1 came first, which it did not. I may seem to imply that chromosome 11 is exclusively concerned with human personality, which it is not. There are probably 30,000-80,000 genes in the human genome and I could not tell you about all of them, partly because fewer than 8,000 have been found (though the number is growing by several hundred a month) and partly because the great majority of them are tedious biochemical middle managers.

But what I can give you is a coherent glimpse of the whole: a whistle-stop tour of some of the more interesting sites in the genome and what they tell us about ourselves. For we, this lucky generation, will be the first to read the book that is the genome. Being able to read the genome will tell us more about our origins, our evolution, our nature and our minds than all the efforts of science to date. It will revolutionise anthropology, psychology, medicine, palaeontology and virtually every other science. This is not to claim that everything is in the genes, or that genes matter more than other factors. Clearly, they do not. But they matter, that is for sure.

This is not a book about the Human Genome Project - about mapping and sequencing techniques - but a book about what that project has found. On June 26, 2000, scientists announced they had completed a rough draft of the complete human genome. In just a few short years we will have moved from knowing almost nothing about our genes to knowing everything. I genuinely believe that we are living through the greatest intellectual moment in history. Bar none. Some may protest that the human being is more than his genes. I do not deny it. There is much, much more to each of us than a genetic code. But until now human genes were an almost complete mystery. We will be the first generation to penetrate that mystery. We stand on the brink of great new answers but, even more, of great new questions. This is what I have tried to convey in this book.

PRIMER

The second part of this preface is intended as a brief primer, a sort of narrative glossary, on the subject of genes and how they work. I hope that readers will glance through it at the outset and return to it at intervals if they come across technical terms that are not explained. Modern genetics is a formidable thicket of jargon. I have tried hard to use the bare minimum of technical terms in this book, but some are unavoidable.

The human body contains approximately 100 trillion (million million) CELLS, most of which are less than a tenth of a millimetre across. Inside each cell there is a black blob called a NUCLEUS. Inside the nucleus are two complete sets of the human GENOME (except in egg cells and sperm cells, which have one copy each, and red blood cells, which have none). One set of the genome came from the mother and one from the father. In principle, each set includes the same 30,000-80,000 GENES on the same twenty-three CHROMOSOMES. In practice, there are often small and subtle differences between the paternal and maternal versions of each gene, differences that account for blue eyes or brown, for example. When we breed, we pass on one complete set, but only after swapping bits of the paternal and maternal chromosomes in a procedure known as RECOMBINATION.

Imagine that the genome is a book.

There are twenty-three chapters, called CHROMOSOMES.

Each chapter contains several thousand stories, called GENES.

Each story is made up of paragraphs, called EXONS, which are interrupted by advertisements called INTRONS. Each paragraph is made up of words, called CODONS. Each word is written in letters called BASES.

There are one billion words in the book, which makes it longer than 5,000 volumes the size of this one, or as long as 800 Bibles. If I read the genome out to you at the rate of one word per second for eight hours a day, it would take me a century. If I wrote out the human genome, one letter per millimetre, my text would be as long as the River Danube. This is a gigantic document, an immense book, a recipe of extravagant length, and it all fits inside the microscopic nucleus of a tiny cell that fits easily upon the head of a pin.
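The arithmetic behind two of these comparisons can be checked with a rough sketch. It assumes three letters per word (as the primer explains further on), a reading year of 365.25 days, and a Danube of roughly 2,850 km; the function names are illustrative only.

```python
# Figures from the text; the letter count is derived from them.
WORDS = 1_000_000_000            # "one billion words"
LETTERS_PER_WORD = 3             # assumption: every word has three letters
READING_SECONDS_PER_DAY = 8 * 60 * 60   # eight hours a day, one word per second


def years_to_read(words=WORDS, seconds_per_day=READING_SECONDS_PER_DAY):
    """Years needed to read every word aloud at one word per second."""
    days = words / seconds_per_day
    return days / 365.25


def length_in_km(words=WORDS, letters_per_word=LETTERS_PER_WORD):
    """Length of the text at one letter per millimetre, in kilometres."""
    millimetres = words * letters_per_word
    return millimetres / 1_000_000   # 1 km = 1,000,000 mm


print(round(years_to_read()))   # 95, close to a century
print(length_in_km())           # 3000.0, about the length of the Danube
```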

The idea of the genome as a book is not, strictly speaking, even a metaphor. It is literally true. A book is a piece of digital information, written in linear, one-dimensional and one-directional form and defined by a code that transliterates a small alphabet of signs into a large lexicon of meanings through the order of their groupings. So is a genome. The only complication is that all English books read from left to right, whereas some parts of the genome read from left to right, and some from right to left, though never both at the same time.

(Incidentally, you will not find the tired word ‘blueprint’ in this book, after this paragraph, for three reasons. First, only architects and engineers use blueprints and even they are giving them up in the computer age, whereas we all use books. Second, blueprints are very bad analogies for genes. Blueprints are two-dimensional maps, not one-dimensional digital codes. Third, blueprints are too literal for genetics, because each part of a blueprint makes an equivalent part of the machine or building; each sentence of a recipe book does not make a different mouthful of cake.)

Whereas English books are written in words of variable length using twenty-six letters, genomes are written entirely in three-letter words, using only four letters: A, C, G and T (which stand for adenine, cytosine, guanine and thymine). And instead of being written on flat pages, they are written on long chains of sugar and phosphate called DNA molecules to which the bases are attached as side rungs. Each chromosome is one pair of (very) long DNA molecules.

The genome is a very clever book, because in the right conditions it can both photocopy itself and read itself. The photocopying is known as REPLICATION, and the reading as TRANSLATION. Replication works because of an ingenious property of the four bases: A likes to pair with T, and G with C. So a single strand of DNA can copy itself by assembling a complementary strand with Ts opposite all the As, As opposite all the Ts, Cs opposite all the Gs and Gs opposite all the Cs. In fact, the usual state of DNA is the famous DOUBLE HELIX of the original strand and its complementary pair intertwined.

To make a copy of the complementary strand therefore brings back the original text. So the sequence ACGT becomes TGCA in the copy, which transcribes back to ACGT in the copy of the copy. This enables DNA to replicate indefinitely, yet still contain the same information.
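The pairing rule can be written as a small lookup table. This is only a sketch of the idea: the names `PAIRS` and `complement` are invented for illustration, and it ignores the fact that the two strands of a real double helix run in opposite directions.

```python
# Base-pairing rule from the text: A pairs with T, and G with C.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the complementary strand, letter by letter."""
    return "".join(PAIRS[base] for base in strand)


original = "ACGT"
copy = complement(original)
copy_of_copy = complement(copy)

print(copy)           # TGCA
print(copy_of_copy)   # ACGT, the original text again
```

Because each letter has exactly one partner, copying twice always returns the starting sequence, whatever its length.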

Translation is a little more complicated. First the text of a gene is TRANSCRIBED into a copy by the same base-pairing process, but this time the copy is made not of DNA but of RNA, a very slightly different chemical. RNA, too, can carry a linear code and it uses the same letters as DNA except that it uses U, for uracil, in place of T. This RNA copy, called the MESSENGER RNA, is then edited by the excision of all introns and the splicing together of all exons (see above).
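A toy version of these two steps might look like the sketch below. The gene is made up, the exon positions are supplied by hand (a real cell finds them itself), and the base-pairing detail is skipped by simply rewriting the gene's text with U in place of T.

```python
def transcribe(gene):
    """Rewrite a gene's text in the RNA alphabet: U replaces T."""
    return gene.replace("T", "U")


def splice(rna_copy, exon_spans):
    """Cut out the introns by keeping only the exon spans, joined in order."""
    return "".join(rna_copy[start:end] for start, end in exon_spans)


# Hypothetical gene: two exons interrupted by one intron.
EXON_1, INTRON, EXON_2 = "ATGGCC", "GTAAGTTTTCAG", "AAATAA"
gene = EXON_1 + INTRON + EXON_2
exon_spans = [(0, len(EXON_1)), (len(EXON_1) + len(INTRON), len(gene))]

messenger = splice(transcribe(gene), exon_spans)
print(messenger)   # AUGGCCAAAUAA
```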

The messenger is then befriended by a microscopic machine called a RIBOSOME, itself made partly of RNA. The ribosome moves along the messenger, translating each three-letter codon in turn into one letter of a different alphabet, an alphabet of twenty different AMINO ACIDS, each brought by a different version of a molecule called TRANSFER RNA. Each amino acid is attached to the last to form a chain in the same order as the codons. When the whole message has been translated, the chain of amino acids folds itself up into a distinctive shape that depends on its sequence. It is now known as a PROTEIN.
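The ribosome's job can be imitated with a lookup table. The sketch below is an illustration, not a model of the real machinery: it builds the standard genetic code from a compact string of one-letter amino-acid abbreviations (a common convention, not something defined in this primer), reads the message three letters at a time, and halts at a STOP codon, marked here with `*`. The message used is a made-up example.

```python
from itertools import product

# Standard genetic code in the RNA alphabet, listed in U, C, A, G order.
# Each character is a one-letter amino-acid abbreviation; "*" marks STOP.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(messenger):
    """Read a messenger RNA codon by codon until a STOP codon or the end."""
    chain = []
    for i in range(0, len(messenger) - 2, 3):
        amino_acid = CODON_TABLE[messenger[i:i + 3]]
        if amino_acid == "*":
            break
        chain.append(amino_acid)
    return "".join(chain)


print(translate("AUGGCCAAAUAA"))   # MAK: methionine, alanine, lysine
```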

Almost everything in the body, from hair to hormones, is either made of proteins or made by them. Every protein is a translated gene. In particular, the body’s chemical reactions are catalysed by proteins known as ENZYMES. Even the processing, photocopying, error-correction and assembly of DNA and RNA molecules themselves - the replication and translation - are done with the help of proteins. Proteins are also responsible for switching genes on and off, by physically attaching themselves to PROMOTER and ENHANCER sequences near the start of a gene’s text. Different genes are switched on in different parts of the body.

When genes are replicated, mistakes are sometimes made. A letter (base) is occasionally missed out or the wrong letter inserted. Whole sentences or paragraphs are sometimes duplicated, omitted or reversed. This is known as MUTATION. Many mutations are neither harmful nor beneficial, for instance if they change one codon to another that has the same amino acid ‘meaning’: there are sixty-four different codons and only twenty amino acids, so many DNA ‘words’ share the same meaning. Human beings accumulate about one hundred mutations per generation, which may not seem much given that there are more than a million codons in the human genome, but in the wrong place even a single one can be fatal.
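The counting argument can be checked directly. The sketch below rebuilds the standard genetic code in the DNA alphabet from a compact string of one-letter amino-acid abbreviations (`*` marks a codon that specifies no amino acid); the helper name `is_silent` is invented for illustration.

```python
from itertools import product

# Standard genetic code in the DNA alphabet, listed in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_silent(codon_before, codon_after):
    """True if a mutation leaves the amino-acid 'meaning' unchanged."""
    return CODON_TABLE[codon_before] == CODON_TABLE[codon_after]


total_codons = len(CODON_TABLE)
distinct_amino_acids = len(set(CODON_TABLE.values()) - {"*"})

print(total_codons, distinct_amino_acids)   # 64 20
print(is_silent("GGA", "GGC"))   # True: both spell glycine
print(is_silent("GAA", "GTA"))   # False: glutamate becomes valine
```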

All rules have exceptions (including this one). Not all human genes are found on the twenty-three principal chromosomes; a few live inside little blobs called mitochondria and have probably done so ever since mitochondria were free-living bacteria. Not all genes are made of DNA: some viruses use RNA instead. Not all genes are recipes for proteins. Some genes are transcribed into RNA but not translated into protein; the RNA goes directly to work instead, either as part of a ribosome or as a transfer RNA. Not all reactions are catalysed by proteins; a few are catalysed by RNA instead. Not every protein comes from a single gene; some are put together from several recipes. Not all of the sixty-four three-letter codons specify an amino acid: three signify STOP commands instead. And finally, not all DNA spells out genes. Most of it is a jumble of repetitive or random sequences that is rarely or never transcribed: the so-called junk DNA.

That is all you need to know. The tour of the human genome can begin.