GENOME

All forms that perish other forms supply,
(By turns we catch the vital breath and die)
Like bubbles on the sea of matter borne,
They rise, they break, and to that sea return.

Alexander Pope, An Essay on Man

In the beginning was the word. The word proselytised the sea with its message, copying itself unceasingly and forever. The word discovered how to rearrange chemicals so as to capture little eddies in the stream of entropy and make them live. The word transformed the land surface of the planet from a dusty hell to a verdant paradise. The word eventually blossomed and became sufficiently ingenious to build a porridgy contraption called a human brain that could discover and be aware of the word itself.

My porridgy contraption boggles every time I think this thought. In four thousand million years of earth history, I am lucky enough to be alive today. In five million species, I was fortunate enough to be born a conscious human being. Among six thousand million people on the planet, I was privileged enough to be born in the country where the word was discovered. In all of the earth’s history, biology and geography, I was born just five years after the moment when, and just two hundred miles from the place where, two members of my own species discovered the structure of DNA and hence uncovered the greatest, simplest and most surprising secret in the universe. Mock my zeal if you wish; consider me a ridiculous materialist for investing such enthusiasm in an acronym. But follow me on a journey back to the very origin of life, and I hope I can convince you of the immense fascination of the word.

‘As the earth and ocean were probably peopled with vegetable productions long before the existence of animals; and many families of these animals long before other families of them, shall we conjecture that one and the same kind of living filaments is and has been the cause of all organic life?’ asked the polymathic poet and physician Erasmus Darwin in 1794.¹ It was a startling guess for the time, not only in its bold conjecture that all organic life shared the same origin, sixty-five years before his grandson Charles’ book on the topic, but for its weird use of the word ‘filaments’. The secret of life is indeed a thread.

Yet how can a filament make something live? Life is a slippery thing to define, but it consists of two very different skills: the ability to replicate, and the ability to create order. Living things produce approximate copies of themselves: rabbits produce rabbits, dandelions make dandelions. But rabbits do more than that. They eat grass, transform it into rabbit flesh and somehow build bodies of order and complexity from the random chaos of the world. They do not defy the second law of thermodynamics, which says that in a closed system everything tends from order towards disorder, because rabbits are not closed systems. Rabbits build packets of order and complexity called bodies but at the cost of expending large amounts of energy. In Erwin Schrödinger’s phrase, living creatures ‘drink orderliness’ from the environment.

The key to both of these features of life is information. The ability to replicate is made possible by the existence of a recipe, the information that is needed to create a new body. A rabbit’s egg carries the instructions for assembling a new rabbit. But the ability to create order through metabolism also depends on information: the instructions for building and maintaining the equipment that creates the order. An adult rabbit, with its ability to both reproduce and metabolise, is prefigured and presupposed in its living filaments in the same way that a cake is prefigured and presupposed in its recipe. This is an idea that goes right back to Aristotle, who said that the ‘concept’ of a chicken is implicit in an egg, or that an acorn was literally ‘informed’ by the plan of an oak tree. When Aristotle’s dim perception of information theory, buried under generations of chemistry and physics, re-emerged amid the discoveries of modern genetics, Max Delbrück joked that the Greek sage should be given a posthumous Nobel prize for the discovery of DNA.²

The filament of DNA is information, a message written in a code of chemicals, one chemical for each letter. It is almost too good to be true, but the code turns out to be written in a way that we can understand. Just like written English, the genetic code is a linear language, written in a straight line. Just like written English, it is digital, in that every letter bears the same importance. Moreover, the language of DNA is considerably simpler than English, since it has an alphabet of only four letters, conventionally known as A, C, G and T.
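To make the word ‘digital’ concrete: an alphabet of four letters means each letter can be stored in exactly two bits. What follows is only a minimal sketch in Python, not anything found in the cell; the names `encode` and `decode` and the particular assignment of bits to letters are arbitrary assumptions chosen for illustration.

```python
# Illustrative only: which letter gets which two-bit pattern is an arbitrary choice.
LETTERS = "ACGT"  # A=00, C=01, G=10, T=11


def encode(seq):
    """Pack a string of A, C, G and T into one integer, two bits per letter."""
    value = 1  # leading sentinel bit, so that leading As (00) are not lost
    for letter in seq:
        value = (value << 2) | LETTERS.index(letter)
    return value


def decode(value):
    """Recover the original string from the packed integer."""
    letters = []
    while value > 1:  # stop when only the sentinel bit is left
        letters.append(LETTERS[value & 0b11])
        value >>= 2
    return "".join(reversed(letters))


message = "GATTACA"
packed = encode(message)
print(bin(packed))      # the same message, written in ones and zeros
print(decode(packed))   # and read back again, letter for letter
```

The point of the sketch is simply that nothing is lost in the round trip: a message in a four-letter alphabet can be copied exactly, which is what makes a digital text so well suited to inheritance.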

Now that we know that genes are coded recipes, it is hard to recall how few people even guessed such a possibility. For the first half of the twentieth century, one question reverberated unanswered through biology: what is a gene? It seemed almost impossibly mysterious. Go back not to 1953, the year of the discovery of DNA’s symmetrical structure, but ten years further, to 1943. Those who will do most to crack the mystery, a whole decade later, are working on other things in 1943. Francis Crick is working on the design of naval mines near Portsmouth. At the same time James Watson is just enrolling as an undergraduate at the precocious age of fifteen at the University of Chicago; he is determined to devote his life to ornithology. Maurice Wilkins is helping to design the atom bomb in the United States. Rosalind Franklin is studying the structure of coal for the British government.

In Auschwitz in 1943, Josef Mengele is torturing twins to death in a grotesque parody of scientific inquiry. Mengele is trying to understand heredity, but his eugenics proves not to be the path to enlightenment. Mengele’s results will be useless to future scientists.

In Dublin in 1943, a refugee from Mengele and his ilk, the great physicist Erwin Schrödinger, is embarking on a series of lectures at Trinity College entitled ‘What is life?’ He is trying to define a problem. He knows that chromosomes contain the secret of life, but he cannot understand how: ‘It is these chromosomes … that contain in some kind of code-script the entire pattern of the individual’s future development and of its functioning in the mature state.’ The gene, he says, is too small to be anything other than a large molecule, an insight that will inspire a generation of scientists, including Crick, Watson, Wilkins and Franklin, to tackle what suddenly seems like a tractable problem. Having thus come tantalisingly close to the answer, though, Schrödinger veers off track. He thinks that the secret of this molecule’s ability to carry heredity lies in his beloved quantum theory, and is pursuing that obsession down what will prove to be a blind alley. The secret of life has nothing to do with quantum states. The answer will not come from physics.³

In New York in 1943, a sixty-six-year-old Canadian scientist, Oswald Avery, is putting the finishing touches to an experiment that will decisively identify DNA as the chemical manifestation of heredity. He has proved in a series of ingenious experiments that a pneumonia bacterium can be transformed from a harmless to a virulent strain merely by absorbing a simple chemical solution. By 1943, Avery has concluded that the transforming substance, once purified, is DNA. But he will couch his conclusions in such cautious language for publication that few will take notice until much later. In a letter to his brother Roy written in May 1943, Avery is only slightly less cautious:⁴

If we are right, and of course that’s not yet proven, then it means that nucleic acids [DNA] are not merely structurally important but functionally active substances in determining the biochemical activities and specific characteristics of cells — and that by means of a known chemical substance it is possible to induce predictable and hereditary changes in cells. That is something that has long been the dream of geneticists.

Avery is almost there, but he is still thinking along chemical lines. ‘All life is chemistry’, said Jan Baptista van Helmont in 1648, guessing. At least some life is chemistry, said Friedrich Wöhler in 1828 after synthesising urea from ammonium chloride and silver cyanate, thus breaking the hitherto sacrosanct divide between the chemical and biological worlds: urea was something that only living things had produced before. That life is chemistry is true but boring, like saying that football is physics. Life, to a rough approximation, consists of the chemistry of three atoms, hydrogen, carbon and oxygen, which between them make up ninety-eight per cent of all atoms in living beings. But it is the emergent properties of life, such as heritability, not the constituent parts, that are interesting. Avery cannot conceive what it is about DNA that enables it to hold the secret of heritable properties. The answer will not come from chemistry.

In Bletchley, in Britain, in 1943, in total secrecy, a brilliant mathematician, Alan Turing, is seeing his most incisive insight turned into physical reality. Turing has argued that numbers can compute numbers. To crack the Lorenz encoding machines of the German forces, a computer called Colossus has been built based on Turing’s principles: it is a universal machine with a modifiable stored program. Nobody realises it at the time, least of all Turing, but he is probably closer to the mystery of life than anybody else. Heredity is a modifiable stored program; metabolism is a universal machine. The recipe that links them is a code, an abstract message that can be embodied in a chemical, physical or even immaterial form. Its secret is that it can cause itself to be replicated. Anything that can use the resources of the world to get copies of itself made is alive; the most likely form for such a thing to take is a digital message - a number, a script or a word.⁵

In New Jersey in 1943, a quiet, reclusive scholar named Claude Shannon is ruminating about an idea he had first had at Princeton a few years earlier. Shannon’s idea is that information and entropy are opposite faces of the same coin and that both have an intimate link with energy. The less entropy a system has, the more information it contains. The reason a steam engine can harness the energy from burning coal and turn it into rotary motion is because the engine has high information content — information injected into it by its designer. So does a human body. Aristotle’s information theory meets Newton’s physics in Shannon’s brain. Like Turing, Shannon has no thoughts about biology. But his insight is of more relevance to the question of what is life than a mountain of chemistry and physics. Life, too, is digital information written in DNA.⁶

In the beginning was the word. The word was not DNA. That came afterwards, when life was already established, and when it had divided the labour between two separate activities: chemical work and information storage, metabolism and replication. But DNA contains a record of the word, faithfully transmitted through all subsequent aeons to the astonishing present.

Imagine the nucleus of a human egg beneath the microscope. Arrange the twenty-three chromosomes, if you can, in order of size, the biggest on the left and the smallest on the right. Now zoom in on the largest chromosome, the one called, for purely arbitrary reasons, chromosome 1. Every chromosome has a long arm and a short arm separated by a pinch point known as a centromere. On the long arm of chromosome 1, close to the centromere, you will find, if you read it carefully, that there is a sequence of 120 letters (As, Cs, Gs and Ts) that repeats over and over again. Between each repeat there lies a stretch of more random text, but the 120-letter paragraph keeps coming back like a familiar theme tune, in all more than 100 times. This short paragraph is perhaps as close as we can get to an echo of the original word.

This ‘paragraph’ is a small gene, probably the single most active gene in the human body. Its 120 letters are constantly being copied into a short filament of RNA. The copy is known as 5S RNA. It sets up residence with a lump of proteins and other RNAs, carefully intertwined, in a ribosome, a machine whose job is to translate DNA recipes into proteins. And it is proteins that enable DNA to replicate. To paraphrase Samuel Butler, a protein is just a gene’s way of making another gene; and a gene is just a protein’s way of making another protein. Cooks need recipes, but recipes also need cooks. Life consists of the interplay of two kinds of chemicals: proteins and DNA.

Protein represents chemistry, living, breathing, metabolism and behaviour - what biologists call the phenotype. DNA represents information, replication, breeding, sex - what biologists call the genotype. Neither can exist without the other. It is the classic case of chicken and egg: which came first, DNA or protein? It cannot have been DNA, because DNA is a helpless, passive piece of mathematics, which catalyses no chemical reactions. It cannot have been protein, because protein is pure chemistry with no known way of copying itself accurately. It seems impossible either that DNA invented protein or vice versa. This might have remained a baffling and strange conundrum had not the word left a trace of itself faintly drawn on the filament of life. Just as we now know that eggs came long before chickens (the reptilian ancestors of all birds laid eggs), so there is growing evidence that RNA came before proteins.

RNA is a chemical substance that links the two worlds of DNA and protein. It is used mainly in the translation of the message from the alphabet of DNA to the alphabet of proteins. But in the way it behaves, it leaves little doubt that it is the ancestor of both. RNA was Greece to DNA’s Rome: Homer to her Virgil.

RNA was the word. RNA left behind five little clues to its priority over both protein and DNA. Even today, the ingredients of DNA are made by modifying the ingredients of RNA, not by a more direct route. Also DNA’s letter Ts are made from RNA’s letter Us. Many modern enzymes, though made of protein, rely on small molecules of RNA to make them work. Moreover, RNA, unlike DNA and protein, can copy itself without assistance: give it the right ingredients and it will stitch them together into a message. Wherever you look in the cell, the most primitive and basic functions require the presence of RNA. It is an RNA-dependent enzyme that takes the message, made of RNA, from the gene. It is an RNA-containing machine, the ribosome, that translates that message, and it is a little RNA molecule that fetches and carries the amino acids for the translation of the gene’s message. But above all, RNA - unlike DNA - can act as a catalyst, breaking up and joining other molecules including RNAs themselves. It can cut them up, join the ends together, make some of its own building blocks, and elongate a chain of RNA. It can even operate on itself, cutting out a chunk of text and splicing the free ends together again.⁷

The discovery of these remarkable properties of RNA in the early 1980s, made by Thomas Cech and Sidney Altman, transformed our understanding of the origin of life. It now seems probable that the very first gene, the ‘ur-gene’, was a combined replicator-catalyst, a word that consumed the chemicals around it to duplicate itself. It may well have been made of RNA. By repeatedly selecting random RNA molecules in the test tube based on their ability to catalyse reactions, it is possible to ‘evolve’ catalytic RNAs from scratch - almost to rerun the origin of life. And one of the most surprising results is that these synthetic RNAs often end up with a stretch of RNA text that reads remarkably like part of the text of a ribosomal RNA gene such as the 5S gene on chromosome 1.

Back before the first dinosaurs, before the first fishes, before the first worms, before the first plants, before the first fungi, before the first bacteria, there was an RNA world - probably somewhere around four billion years ago, soon after the beginning of planet earth’s very existence and when the universe itself was only ten billion years old. We do not know what these ‘ribo-organisms’ looked like. We can only guess at what they did for a living, chemically speaking. We do not know what came before them. We can be pretty sure that they once existed, because of the clues to RNA’s role that survive in living organisms today.⁸

These ribo-organisms had a big problem. RNA is an unstable substance, which falls apart within hours. Had these organisms ventured anywhere hot, or tried to grow too large, they would have faced what geneticists call an error catastrophe: a rapid decay of the message in their genes. One of them invented by trial and error a new and tougher version of RNA called DNA and a system for making RNA copies from it, including a machine we’ll call the proto-ribosome. It had to work fast and it had to be accurate. So it stitched together genetic copies three letters at a time, the better to be fast and accurate. Each threesome came flagged with a tag to make it easier for the proto-ribosome to find, a tag that was made of amino acid. Much later, those tags themselves became joined together to make proteins and the three-letter word became a form of code for the proteins: the genetic code itself. (Hence to this day, the genetic code consists of three-letter words, each spelling out a particular one of twenty amino acids as part of a recipe for a protein.) And so was born a more sophisticated creature that stored its genetic recipe on DNA, made its working machines of protein and used RNA to bridge the gap between them.
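As an aside on the arithmetic of three-letter words, and not as part of the story above: with four letters, words of length two allow only sixteen spellings, too few to give each of twenty amino acids its own word, whereas words of length three allow sixty-four. The short Python sketch below merely counts them; the function name `count_words` is an illustrative assumption.

```python
from itertools import product

LETTERS = "ACGT"


def count_words(length):
    """Count every distinct word of the given length in a four-letter alphabet."""
    return sum(1 for _ in product(LETTERS, repeat=length))


for length in (1, 2, 3):
    # Each extra letter multiplies the number of possible words by four.
    print(length, count_words(length))
```

Sixty-four words for twenty amino acids leaves room to spare, which is why several different three-letter words can spell the same amino acid.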

Her name was Luca, the Last Universal Common Ancestor. What did she look like, and where did she live? The conventional answer is that she looked like a bacterium and she lived in a warm pond, possibly by a hot spring, or in a marine lagoon. In the last few years it has been fashionable to give her a more sinister address, since it became clear that the rocks beneath the land and sea are impregnated with billions of chemical-fuelled bacteria. Luca is now usually placed deep underground, in a fissure in hot igneous rocks, where she fed on sulphur, iron, hydrogen and carbon. To this day, the surface life on earth is but a veneer. Perhaps ten times as much organic carbon as exists in the whole biosphere is in thermophilic bacteria deep beneath the surface, where they are possibly responsible for generating what we call natural gas.⁹

There is, however, a conceptual difficulty about trying to identify the earliest forms of life. These days it is impossible for most creatures to acquire genes except from their parents, but that may not always have been so. Even today, bacteria can acquire genes from other bacteria merely by ingesting them. There might once have been widespread trade, even burglary, of genes. In the deep past chromosomes were probably numerous and short, containing just one gene each, which could be lost or gained quite easily. If this was so, Carl Woese points out, the organism was not yet an enduring entity. It was a temporary team of genes. The genes that ended up in all of us may therefore have come from lots of different ‘species’ of creature and it is futile to try to sort them into different lineages. We are descended not from one ancestral Luca, but from the whole community of genetic organisms. Life, says Woese, has a physical history, but not a genealogical one.¹⁰

You can look on such a conclusion as a fuzzy piece of comforting, holistic, communitarian philosophy - we are all descended from society, not from an individual species - or you can see it as the ultimate proof of the theory of the selfish gene: in those days, even more than today, the war was carried on between genes, using organisms as temporary chariots and forming only transient alliances; today it is more of a team game. Take your pick.

Even if there were lots of Lucas, we can still speculate about where they lived and what they did for a living. This is where the second problem with the thermophilic bacteria arises. Thanks to some brilliant detective work by three New Zealanders published in 1998, we can suddenly glimpse the possibility that the tree of life, as it appears in virtually every textbook, may be upside down. Those books assert that the first creatures were like bacteria, simple cells with single copies of circular chromosomes, and that all other living things came about when teams of bacteria ganged together to make complex cells. It may much more plausibly be the exact reverse. The very first modern organisms were not like bacteria; they did not live in hot springs or deep-sea volcanic vents. They were much more like protozoa: with genomes fragmented into several linear chromosomes rather than one circular one, and ‘polyploid’ - that is, with several spare copies of every gene to help with the correction of spelling errors. Moreover, they would have liked cool climates. As Patrick Forterre has long argued, it now looks as if bacteria came later, highly specialised and simplified descendants of the Lucas, long after the invention of the DNA-protein world. Their trick was to drop much of the equipment of the RNA world specifically to enable them to live in hot places. It is we that have retained the primitive molecular features of the Lucas in our cells; bacteria are much more ‘highly evolved’ than we are.

This strange tale is supported by the existence of molecular ‘fossils’ - little bits of RNA that hang about inside the nucleus of your cells doing unnecessary things such as splicing themselves out of genes: guide RNA, vault RNA, small nuclear RNA, small nucleolar RNA, self-splicing introns. Bacteria have none of these, and it is more parsimonious to believe that they dropped them rather than we invented them. (Science, perhaps surprisingly, is supposed to treat simple explanations as more probable than complex ones unless given reason to think otherwise; the principle is known in logic as Occam’s razor.) Bacteria dropped the old RNAs when they invaded hot places like hot springs or subterranean rocks where temperatures can reach 170 °C - to minimise mistakes caused by heat, it paid to simplify the machinery. Having dropped the RNAs, bacteria found their new streamlined cellular machinery made them good at competing in niches where speed of reproduction was an advantage - such as parasitic and scavenging niches. We retained those old RNAs, relics of machines long superseded, but never entirely thrown away. Unlike the massively competitive world of bacteria, we - that is all animals, plants and fungi - never came under such fierce competition to be quick and simple. We put a premium instead on being complicated, in having as many genes as possible, rather than a streamlined machine for using them.¹¹

The three-letter words of the genetic code are the same in every creature. CGA means arginine and GCG means alanine — in bats, in beetles, in beech trees, in bacteria. They even mean the same in the misleadingly named archaebacteria living at boiling temperatures in sulphurous springs thousands of feet beneath the surface of the Atlantic ocean or in those microscopic capsules of deviousness called viruses. Wherever you go in the world, whatever animal, plant, bug or blob you look at, if it is alive, it will use the same dictionary and know the same code. All life is one. The genetic code, bar a few tiny local aberrations, mostly for unexplained reasons in the ciliate protozoa, is the same in every creature. We all use exactly the same language.
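The idea of a shared dictionary can be sketched as a simple lookup. The Python below is a minimal illustration under stated assumptions: the table is a deliberately tiny excerpt of the standard code (CGA and GCG are the two words named above; ATG, TGG and the stop word TAA are added from the standard table), and the names `CODON_TABLE` and `translate` are hypothetical. A real table has all sixty-four words.

```python
# A tiny excerpt of the standard genetic code, spelled in DNA letters.
# None marks a 'stop' word, which ends the recipe.
CODON_TABLE = {
    "CGA": "arginine",
    "GCG": "alanine",
    "ATG": "methionine",
    "TGG": "tryptophan",
    "TAA": None,
}


def translate(dna):
    """Read a DNA string three letters at a time and name each amino acid."""
    amino_acids = []
    usable = len(dna) - len(dna) % 3  # ignore any incomplete final word
    for start in range(0, usable, 3):
        word = dna[start:start + 3]
        if word not in CODON_TABLE:
            raise KeyError(f"{word} is not in this excerpt of the table")
        amino_acid = CODON_TABLE[word]
        if amino_acid is None:  # stop word reached
            break
        amino_acids.append(amino_acid)
    return amino_acids


print(translate("ATGCGAGCGTAA"))
```

Because the dictionary is the same in bats, beetles, beech trees and bacteria, the same lookup would serve for a gene taken from any of them.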

This means — and religious people might find this a useful argument — that there was only one creation, one single event when life was born. Of course, that life might have been born on a different planet and seeded here by spacecraft, or there might even have been thousands of kinds of life at first, but only Luca survived in the ruthless free-for-all of the primeval soup. But until the genetic code was cracked in the 1960s, we did not know what we now know: that all life is one; seaweed is your distant cousin and anthrax one of your advanced relatives. The unity of life is an empirical fact. Erasmus Darwin was outrageously close to the mark: ‘One and the same kind of living filaments has been the cause of all organic life.’

In this way simple truths can be read from the book that is the genome: the unity of all life, the primacy of RNA, the chemistry of the very earliest life on the planet, the fact that large, single-celled creatures were probably the ancestors of bacteria, not vice versa. We have no fossil record of the way life was four billion years ago. We have only this great book of life, the genome. The genes in the cells of your little finger are the direct descendants of the first replicator molecules; through an unbroken chain of tens of billions of copyings, they come to us today still bearing a digital message that has traces of those earliest struggles of life. If the human genome can tell us things about what happened in the primeval soup, how much more can it tell us about what else happened during the succeeding four million millennia. It is a record of our history written in the code for a working machine.