The first inklings that the interactions between genes were at least as important as genes themselves in framing the form and disposition of living things came as long ago as 1908, when an English physician named Archibald Garrod (1857-1936) noticed something curious about a collection of rare diseases in humans. The diseases were curious because they did not seem to be the result of infection or contagion. They did seem to occur more often in some families than in others, and this tendency was most marked when families were inbred. At a time when Mendelian genetics was still quite new, Garrod applied the lessons of Mendel’s peas to humans, describing the inheritance of recessive traits – in this case, a tendency to contract certain rare diseases, which Garrod called ‘inborn errors of metabolism’.1
One of these diseases is now known as alkaptonuria. Its symptoms include mottling of the skin and a progressive arthritis, and it afflicts approximately one person in every 200,000. Another disease in the same ‘family’ is phenylketonuria. This disease, which afflicts infants and young children, has symptoms ranging from eczema to tremors and intellectual disability. Fortunately it is easily treatable, by strict adherence in childhood to a diet low in the amino acid phenylalanine. One in 10,000 babies is born with phenylketonuria, making it one of the commoner inherited diseases. This frequency – and the need for sufferers to keep to a strict diet – explains otherwise mysterious slogans on jars and packets in supermarkets proclaiming that the contents ‘contain a source of phenylalanine’. Garrod’s insight was that seemingly disparate diseases of this kind were the result of defects in inheritance, and that they were somehow connected. We now know that they are caused by inherited deficiencies in certain substances called enzymes.
Enzymes are examples of catalysts – substances that facilitate chemical reactions which might otherwise proceed too slowly to matter, or not at all. This dependence on enzymes applies to virtually every active change within the body, from the growth of your hair and nails to the very first cell division of the embryo and all subsequent cell divisions; from the breakdown of food in your intestines to the exhalation of carbon dioxide in your breath. Enzymes are what build your body and what break it down again. Without enzymes, the strands of DNA cannot be copied, nor can their cargo of genetic information be read or acted upon. But of what manner of substance are these everyday miracles that make life possible, these commonplace fixers of the marvellous? Enzymes are proteins, made of chains of amino acids, and proteins are also among the chief materials of the cells and bodies whose lives enzymes control. And herein lies a conundrum. If enzymes are part of the body and are responsible for its creation and function, then what makes the enzymes? This is where genetic information enters the story.
We now know that alkaptonuria is the result of a mutation in a single gene, a misprint in the DNA. Normally, this piece of DNA – this gene – contains the instructions needed to make an enzyme called homogentisate 1,2-dioxygenase. The single task of this enzyme is to oversee the conversion of a substance called homogentisic acid into another substance, maleylacetoacetate. A mutation in the gene produces an enzyme that doesn’t work. Without a working enzyme, homogentisic acid builds up in the tissues in the same way that rubbish piles up in the streets when the dustmen go on strike. Too much homogentisic acid, like too much rubbish, is a nuisance and can even be a hazard to health, and it is its accumulation that causes the clinical effects of alkaptonuria.
Phenylketonuria results from a different mutation, this time in a gene that contains the instructions for an enzyme called phenylalanine hydroxylase. This enzyme oversees the conversion of phenylalanine into another amino acid, tyrosine. A mutation in the gene for phenylalanine hydroxylase leads to a defective enzyme, unable to carry out this process. Phenylalanine builds up in the body until it becomes poisonous, causing the symptoms of phenylketonuria. This is why people with phenylketonuria should keep their intake of phenylalanine low, avoiding foods rich in it and any other substance which the body might digest to produce phenylalanine.
Enzymes tend to have very specific functions. That is, each enzyme can catalyse only one very particular chemical reaction – there is no single enzyme that can break a molecule of phenylalanine down to its final products in one step. The breakdown process is a cooperative venture, a kind of production line, in which each of many small steps is governed by just one enzyme. Each stage in a production line is overseen by a specialist worker or custom-built machine that can do that job and no other, and yet all such workers or machines must be present for the line to function. The instructions to make each enzyme are found in just one, unique gene – one gene for each enzyme.
Phenylalanine hydroxylase and homogentisate 1,2-dioxygenase are separate enzymes, but they both play their part in the process of breaking down the amino acid phenylalanine – and it is this connection which, in retrospect, allows alkaptonuria and phenylketonuria to be grouped together in the same family of diseases, even though phenylketonuria itself was described only in the 1930s, long after Garrod’s work. Phenylalanine hydroxylase, which catalyses the conversion of phenylalanine into tyrosine, is the first enzyme in the line. Homogentisate 1,2-dioxygenase – the deficiency of which leads to alkaptonuria – governs a step several places further down the chain, separated from phenylalanine by a series of intermediate reactions. In fact, this pathway is not a straight line but contains many branching points, and products from other pathways may join it. The phenylalanine breakdown process represents a network of cooperative genes, working in the context of the genome as a whole. And yet, at any point in the network a genetic mutation can disable one of these enzymes, causing disease. Because mutations are carried in the genes, these diseases are inherited.
A failure of any one of these enzymes, at any point in this process, can result in the pathological accumulation of one by-product or another, and it is this that causes disease. Each enzyme is therefore associated with its own inherited disease – but because all the enzymes are concerned with the same pathway, these diseases may share certain features. For example, people suffering from some of this family of related diseases may have fairer complexions than might be expected from their family histories, a result of a deficiency of skin pigmentation. Conditions such as albinism – in which the skin contains little or no pigment – are caused by defects in the metabolism of the amino acid tyrosine. Because tyrosine is an important product of the breakdown of phenylalanine, people with phenylketonuria or other closely connected diseases may also seem unusually blond.
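The production-line logic of the last few paragraphs can be made concrete in a few lines of Python. This is a minimal sketch, not a model of real biochemistry: it assumes a single straight chain of four steps and ignores the branches and side-routes the text mentions. The two middle enzymes, tyrosine aminotransferase and 4-hydroxyphenylpyruvate dioxygenase, fill in steps the text passes over; the names PATHWAY, accumulated_substance and substances_made are invented for illustration.

```python
# Hypothetical, simplified model: the first four steps of phenylalanine
# breakdown treated as one straight production line (the real pathway branches).
PATHWAY = [
    # (enzyme, substance it acts on, substance it makes)
    ("phenylalanine hydroxylase", "phenylalanine", "tyrosine"),
    ("tyrosine aminotransferase", "tyrosine", "4-hydroxyphenylpyruvate"),
    ("4-hydroxyphenylpyruvate dioxygenase", "4-hydroxyphenylpyruvate",
     "homogentisic acid"),
    ("homogentisate 1,2-dioxygenase", "homogentisic acid",
     "maleylacetoacetate"),
]


def accumulated_substance(missing_enzyme=None):
    """Return the substance that piles up when one enzyme is missing.

    If no enzyme is missing, material flows to the end of the line and the
    final product is returned instead.
    """
    for enzyme, substrate, _product in PATHWAY:
        if enzyme == missing_enzyme:
            return substrate  # the line stalls here, so its input builds up
    return PATHWAY[-1][2]


def substances_made(missing_enzyme=None):
    """List, in order, every substance the line produces before any block."""
    made = [PATHWAY[0][1]]  # the starting material is always present
    for enzyme, _substrate, product in PATHWAY:
        if enzyme == missing_enzyme:
            break  # nothing downstream of a broken step gets made
        made.append(product)
    return made


if __name__ == "__main__":
    print(accumulated_substance("phenylalanine hydroxylase"))      # as in phenylketonuria
    print(accumulated_substance("homogentisate 1,2-dioxygenase"))  # as in alkaptonuria
    print(substances_made("phenylalanine hydroxylase"))
```

Knocking out phenylalanine hydroxylase in the sketch leaves phenylalanine piling up and no tyrosine made downstream, a crude echo of the pigmentation point above. In a real patient some tyrosine still arrives in food, which a four-step toy of this kind cannot show.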
Garrod lived in a time before genes were understood in terms of chromosomes and DNA. Although he suspected that each inborn error reflected the lack of a particular enzyme, he could not have framed the problem in terms of genes that carry the instructions for making enzymes. That connection was made explicit in experiments by two American scientists, George W. Beadle (1903-89) and Edward L. Tatum (1909-75). After gaining his doctorate in 1931, Beadle went to do research on fruit flies with Morgan. His work in Morgan’s lab and afterwards led him to suspect that the variation of traits in fruit flies might not be the direct result of simple mutations in genetic material, but the summation of entire chains of events at the biochemical level which were controlled by networks of genes acting together. The link between gene and trait was not a simple one-to-one correspondence, but the result of a long and sometimes circuitous path of biochemical events, rather like the events which we now know result in Garrod’s family of diseases.
Beadle’s career took him to Stanford in California, where he started to work with Tatum on an entirely different organism: a species of fungus, the bread mould Neurospora crassa. From breeding experiments involving a range of mutant mould strains, each unable to make one essential nutrient or another, the scientists were able to map metabolic pathways – similar to the one which, in humans, leads by incremental steps from phenylalanine to homogentisic acid and beyond, with each step controlled by a single enzyme. It was Beadle and Tatum who came up with the theory that might be written on a T-shirt: ‘One Gene, One Enzyme’. This model means exactly what it says – that for each enzyme in a metabolic pathway, there had to be just one gene. The genome is a list of instructions to make a pharmacopoeia of enzymes, and from these enzymes bodies can be built.2 Importantly, Beadle and Tatum showed that traits were not, as a rule, switched on and off by single enzymes, but were – as Beadle had begun to suspect in his days with Morgan – the end states of a whole pathway of enzymes, the structure of each one of which was determined by a gene. What we see in organisms is the result of networks of genes working together. This message has, unfortunately, become overshadowed and obscured by the catchy ‘One Gene, One Enzyme’ slogan, which is all too easily elided into ‘One Gene, One Trait’.
The birth of the ‘network’ view of genetic activity as we now understand it came in 1961, with the publication of an article by two French researchers, François Jacob (1920–2013) and Jacques Monod (1910–76). The article was entitled ‘Genetic regulatory mechanisms in the synthesis of proteins’ and contained a new and significant word – ‘operon’.3 Jacob and Monod worked mostly at the Pasteur Institute in Paris, and began their collaboration in 1958. Among their many accomplishments was the proposal for the existence of messenger RNA, and also the discovery of so-called regulatory genes, whose function was not to produce an enzyme but to control the activities of those that did. Such entities are essential elements of control in a network of interacting genes.
Where Crick and his colleagues worked out the nature of the genetic code by studying the viruses that infested the gut bacterium E. coli, Jacob and Monod were interested in the bacteria themselves. Their particular concern lay in understanding how the bacteria digested the sugars on which they fed. One of these sugars was lactose, the sugar commonly found in milk. In chemical terms, a molecule of lactose is two simpler sugars, glucose and galactose, stuck together. Lactose, on its own, is of no use unless it can be split into these components; glucose, in particular, is a common energy currency for all forms of metabolism, whether bacteria or people. E. coli bacteria split lactose with the help of an enzyme called beta-galactosidase.
Jacob and Monod noted that bacteria did not produce this enzyme all the time, but only when necessary – that is, if there were any lactose around to digest. This observation, in itself, was not news. Scientists had known for some years of enzymes which appeared only when they had a job to do. Where Jacob and Monod broke new ground was in how they approached the phenomenon, in terms of the regulation of genes. Perhaps, they thought, there were two kinds of gene. The first would be the ‘structural’ genes, coding for enzymes and other proteins we can readily see and measure. Then there would be a more shadowy class of genes – the regulatory genes, which would oversee the activities of other genes, ensuring that they were ‘switched on’ only when needed. The proteins produced by regulatory genes would be far less abundant than those made by structural genes, and consequently much more difficult to isolate.
Beta-galactosidase, like all enzymes, is created from information held in a gene. This gene is transcribed into messenger RNA and translated into the string of amino acids that make up the finished protein enzyme. But if the enzyme is made only when there is lactose to digest, Jacob and Monod speculated, there must be a mechanism for detecting the presence or absence of lactose and switching the beta-galactosidase gene on or off, as necessary. In what is now seen as one of the classic experiments of modern genetics, Jacob and Monod showed that certain structural genes, such as the one containing the code for making beta-galactosidase, were ordinarily switched off, or ‘silent’, unless prompted into action by the appropriate environmental stimulus, in this case the presence of lactose. ‘Silence’ in this context means that the beta-galactosidase gene was not being transcribed into messenger RNA, so no enzyme would be made. Jacob and Monod reasoned that genes would not be silent of their own accord, but would have to be silenced by some external agency. However, this selfsame agency would also loosen its hold in the presence of the appropriate stimulus, such as a lactose molecule. There had to be something that acted as both sensor and censor.
It quickly became apparent that this molecular censor had to be quite distinct from beta-galactosidase. Jacob and Monod showed that strains of E. coli existed which, by virtue of mutations in the beta-galactosidase gene, were unable to digest lactose by themselves, and had to be provided with glucose by the researchers, or else the unfortunate microbes starved to death. But Jacob and Monod found another, more interesting kind of mutant strain in which the bacteria produced perfectly normal beta-galactosidase – but did this whether lactose was present or not. This indiscriminate synthesis of the enzyme could not have been connected with any damage to the beta-galactosidase gene, because the enzyme itself was quite normal, so something else must have been happening. Jacob and Monod supposed that some kind of mutation was disabling a gene at one remove – a gene that would normally produce a substance whose sole job it was to prevent the transcription of the beta-galactosidase gene. Inactivation of this blocking gene would lead to the constant transcription of the beta-galactosidase gene, whether or not lactose was present in the environment. The product of this blocking gene (which was later identified as a protein) was called a ‘repressor’. The gene encoding the repressor was not a structural gene, but a regulatory one: a gene whose function was to control the activities of other genes.
Jacob and Monod identified yet a third kind of mutant connected with the digestion of lactose. There exist mutant bacteria that have normal genes for both the repressor and for beta-galactosidase, but still synthesize the enzyme irrespective of the presence of lactose. It turns out that these bacteria carry mutations in a short stretch of DNA next to the beta-galactosidase gene, a stretch that does not itself code for any protein. This section of DNA became known as the operator. For the repressor protein to silence the beta-galactosidase gene, it first has to attach itself to the operator. Mutating the operator is a bit like bombing a runway so that planes can no longer land there. The repressor protein might be normal, but if the operator sequence is so disfigured by mutation that the repressor cannot bind to it, then the production of beta-galactosidase will carry on as if the repressor weren’t there at all.
By combining all these details, Jacob and Monod painted the first picture of what we now see as the network view of genetics, a simple case of genetic regulation in which bacteria would be able to produce the enzymes they needed in order to digest a simple sugar, and to do so only when they needed it. When there was no detectable lactose in the environment (which is the case most of the time) the repressor gene ensured that there was always enough repressor protein around to sit on the operator, so that the beta-galactosidase gene was switched off and no beta-galactosidase made. But when lactose was present, molecules derived from it (strictly, a close chemical relative called allolactose) mobbed the repressor, preventing it from attaching itself efficiently to the operator. The beta-galactosidase gene was then free to produce enough enzyme to digest the lactose. When the job was done, and lactose fell below the concentration required to interfere with the repressor, the repressor would take up residence once more on the operator, and the beta-galactosidase gene would be switched off again.
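The reasoning Jacob and Monod used to sort their three kinds of mutant can be written out as a small truth table. The Python below is a minimal sketch under stated assumptions: each part of the switch is simply working or broken, expression is all-or-nothing, and any other controls acting on these genes are ignored. The names Strain, makes_working_enzyme and STRAINS are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strain:
    """Which parts of the lactose switch are intact in a bacterial strain."""
    enzyme_gene_ok: bool = True     # structural gene for beta-galactosidase
    repressor_gene_ok: bool = True  # regulatory gene that makes the repressor
    operator_ok: bool = True        # DNA site the repressor must sit on


def makes_working_enzyme(strain: Strain, lactose_present: bool) -> bool:
    """All-or-nothing answer: does this strain make usable beta-galactosidase?"""
    # The repressor blocks transcription only if it exists, the operator can
    # still be bound, and no lactose is around to pull the repressor away.
    repressor_bound = (
        strain.repressor_gene_ok and strain.operator_ok and not lactose_present
    )
    transcribed = not repressor_bound
    # Reading the gene is no help if the gene itself is broken.
    return transcribed and strain.enzyme_gene_ok


STRAINS = {
    "normal": Strain(),
    "enzyme mutant": Strain(enzyme_gene_ok=False),
    "repressor mutant": Strain(repressor_gene_ok=False),
    "operator mutant": Strain(operator_ok=False),
}

if __name__ == "__main__":
    for name, strain in STRAINS.items():
        off = makes_working_enzyme(strain, lactose_present=False)
        on = makes_working_enzyme(strain, lactose_present=True)
        print(f"{name:<17} no lactose: {off!s:<5}  lactose: {on}")
```

Run as a script, only the normal strain switches the enzyme on in response to lactose; the enzyme mutant never makes working enzyme, and the repressor and operator mutants make it regardless. The last two look identical in this table; in the laboratory they were told apart by further genetic tests that the sketch does not attempt to model.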
The repressor gene, the operator and the beta-galactosidase gene were thus united by their function: to digest lactose, when the opportunity arose. But Jacob and Monod found something else – that all three entities sat close together on the same strand of DNA. They saw the significance of a cluster of genes associated by location as well as function, and called such a cluster an operon. The lactose-digesting operon became known as the lac operon.
Many other operons have since been found in bacteria of all kinds, each one containing a set of structural genes necessary to perform a certain function, and regulatory genes to ensure that the structural genes are switched on only when necessary. The lac operon is very simple – three structural genes (chief among them the one for beta-galactosidase), one regulatory gene and an operator – but the significance of the concept of the operon goes well beyond the realm of bacteria. What is a genome if not a kind of operon, a physically associated cluster of genes, all of which share a single task, the creation and maintenance of an organism? If the genome is an operon whose function is to create and maintain an organism, it must contain genes directly necessary for that function: structural genes to produce enzymes as diverse as, say, phenylalanine hydroxylase and beta-galactosidase, as well as the proteins such as collagen and keratin from which bodies are made. But the genome must also contain regulatory genes that control the activities of these structural genes, to ensure that they are switched on at the appropriate times and in the correct sequence, so that the cell divisions at the start of an individual’s development give rise to an embryo, functioning and complete in all its parts.
The genome of any organism is conceptually equivalent to the lac operon of E. coli, if far more elaborate. Jacob and Monod’s epochal work made the point that the path between DNA and the organism lay not in the information that DNA contained, but in how that information was controlled. The significance of the work was not lost on the scientific community, and Jacob, Monod and their colleague André Lwoff (1902-94) were awarded a Nobel prize in 1965.
Jacob and Monod did not actually succeed in isolating the lac repressor. It is in the nature of regulators that only a few molecules are required at any one time for them to be effective, so isolating a regulatory protein is rather like searching for a particular strand of microscopic hay in a haystack itself far too small to see with the naked eye. The task, however, was accomplished in 1966 by a scientist named Walter Gilbert, who went on to pioneer methods of DNA sequencing and became an early champion of the effort to sequence the human genome. Gilbert won a Nobel prize in 1980 for his work on DNA sequencing. Meanwhile, in 1967, a young academic called Mark Ptashne isolated another repressor, an achievement which earned him a full professorship at Harvard at the relatively tender age of thirty-one. Ptashne’s work underscores the point that a genome is really an operon writ large, for the repressor he isolated was responsible for the control of an entire genome – that of a virus named bacteriophage lambda.4
Like all viruses, lambda consists of an inert string of genetic code, packaged for safety in a protein coat. When a virus meets a bacterium, the protein coat sticks to the bacterial cell wall, but the viral genome makes its way inside the cell itself. There, the genes are read by the bacterium’s own transcription and translation machinery, churning out more copies of the viral genome and the proteins that comprise the protein coat. Soon the entire effort of the bacterium is converted to the manufacture of viruses. Eventually the bacterium, bloated with viruses, explodes – scattering a hundred or more new viruses throughout the surrounding medium. Some of these will meet other bacteria, spreading the infection.
And so it happens, but only some of the time. For lambda has a secret life. Sometimes a lone virus will infect a cell and, rather than furthering the spread of infection, will splice its genome into the much larger genome of the bacterium. Once inserted, the viral genome will behave like any other piece of bacterial DNA. When the DNA is copied before cell division, the viral DNA will be copied too. The genome of the virus can remain silent within the genome of its host for many generations. But, just now and then, a sleeping virus – a distant descendant of the original infectious particle, in a host equally remote from the first victim – will cut loose. Lambda is not alone in this habit. Some viruses that cause disease in humans, such as the herpes simplex virus, can lie low in a comparable way for decades, silent within their host’s cells.
But how can a virus stay silent at all, given the natural tendency of viruses to hijack cells immediately on infection? Some viruses, such as lambda, clearly have a choice – destroy a host immediately, or hide out and defer destruction to another day. And as in the case of the ‘decision’ of the lac operon to synthesize an enzyme or remain quiet, the mechanism of choice is governed by a repressor.
In the 1950s it was found that some strains of lambda were always destructive, and never established a dormant state. The cause was reasoned to be a mutation in a regulatory gene, the so-called lambda repressor gene. This gene would contain the code for a protein that sits on defined parts of DNA – operators – so that the transcription enzymes are denied access to the genes. The lambda repressor would function in a very similar way to the lac repressor. However, rather than working within just one operon in a genome, the lambda repressor would block transcription in the entire lambda genome, a collection of some fifty genes. In this sense, the whole lambda genome could be thought of as a single operon. Mutant viruses, unable to produce their own repressor, would always be switched on, rather like a lac operon unable to make a lac repressor. But when viruses hide in a genome, all their genes would be switched off except one – the one that produces the repressor itself. This strategy would have an additional advantage for the virus that gets in first. Because the dormant virus keeps a standing supply of repressor protein in the cell, the cell is protected from further infection by other lambda viruses, whose genes are silenced as soon as they arrive.
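The logic of lambda’s choice, as set out in the paragraph above, can be sketched in the same way. This is a toy under loud assumptions: the virus’s roughly fifty genes are treated as a single block that is either read or silenced, and the ‘decision’ a healthy virus makes on entering a fresh cell is not modelled at all but passed in as a hypothetical parameter, opts_for_dormancy. The function name infection_outcome is likewise invented.

```python
def infection_outcome(makes_repressor: bool,
                      cell_has_repressor: bool,
                      opts_for_dormancy: bool = True) -> str:
    """Return 'silenced' or 'destructive' for one lambda genome entering a cell.

    makes_repressor: False for the mutant strains that cannot make repressor.
    cell_has_repressor: True if a dormant virus already in the cell keeps
        repressor protein in circulation.
    opts_for_dormancy: hypothetical stand-in for the virus's "choice" in a
        fresh cell, which this sketch does not try to explain.
    """
    if cell_has_repressor:
        # Resident repressor sits on the newcomer's operators at once.
        return "silenced"
    if makes_repressor and opts_for_dormancy:
        # The virus silences itself, leaving only its repressor gene active.
        return "silenced"
    # With nothing on the operators, every lambda gene is read.
    return "destructive"


if __name__ == "__main__":
    # Mutant that cannot make repressor, entering a fresh cell.
    print(infection_outcome(makes_repressor=False, cell_has_repressor=False))
    # A second virus arriving at a cell that already harbours a dormant one.
    print(infection_outcome(makes_repressor=True, cell_has_repressor=True))
```

One consequence falls out without extra code: even a mutant unable to make its own repressor is silenced in a cell that already carries a dormant virus, because the resident repressor acts on any lambda operators it meets, its own or a newcomer’s.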
All this raises the question of how a dormant virus might pick its moment to wake. The answer lies in the mechanism of mutation itself. Ultraviolet light is a potent mutagen, in that DNA is especially sensitive to it. This explains the link between sunbathing and skin cancer: tanning is the body’s natural response to UV as it tries to shield itself from the Sun’s rays. Our skin cells have an array of enzymes that repair DNA damaged by UV light. Mutations in one or other of these enzymes lead to a variety of syndromes in which patients are more than usually susceptible to skin cancer. Many bacteria are rather poor at repairing DNA damaged by UV light, and E. coli – which naturally spends its life in the human intestine, a place where, proverbially, the Sun never shines – is one of them.5 UV light is as lethal to a virus lying dormant in a bacterial genome as it is to its host, unless the virus can escape before the host dies. To do so, it must find a way to activate its genome and make more virus particles, but this cannot happen unless it can inactivate its own repressor, liberating its own genes from enforced silence. But what could repress the repressor? It turns out that the physical environment plays its part in the process. When UV damages the bacterium’s DNA, the host cell mounts an emergency response in which a protein called RecA is activated; activated RecA prompts the lambda repressor protein to cut itself in two, so that it can no longer bind to the operators. Released from bondage, the genes in the lambda genome awake from their long dormancy, the viral genes are transcribed and the long-delayed infection can continue.
All this remained rather speculative, in the absence of physical evidence for the repressor. It fell to Ptashne and his colleagues to hunt down this hitherto elusive creature, which they did. They crowned their remarkable achievement by catching the repressor in the course of binding to the operator, isolating it, crystallizing it and subjecting it to X-ray diffraction studies of the kind pioneered by the Braggs and Franklin. In this way, Ptashne’s team obtained a picture, in atom-by-atom detail, of how a repressor protein actually interacts with an operator – genetic regulation at work.
The lac operon and bacteriophage lambda are simple examples of genetic switches. With the lac operon, the repressor has the relatively mundane task of controlling a small cluster of genes, chief among them the gene for an enzyme, beta-galactosidase, ensuring that they are expressed only when required. The scope of the lambda repressor is much greater, for it controls all the genes in a genome. The genome is very simple, to be sure, but the point is made: the control exercised by the shepherd-like lambda repressor on the flock of its genome shapes the destiny of the entire organism, and that is surely what a genome is all about.
The suspicions of Garrod, reinforced by the careful work of Beadle and Tatum on the mould Neurospora, showed that there was more to genetics than birds on a wire. To posit a direct, one-to-one correspondence between genes and traits was a caricature of the real thing. What we see as disease states, or the range of normal development, reflects the activities of many genes – perhaps all the genes in a genome – working together in a harmonious way. In this holistic model, in which genes interact as a network, the precise role of any one gene can be hard to tell. As Jacob and Monod so elegantly showed, mutations in any one of the structural gene, the operator or the repressor gene could explain why E. coli bacteria produced too much beta-galactosidase, or none at all: but only when all three elements are considered together could their function be fully understood.
Jacob and Monod’s work on the lac operon, dramatically extended by Ptashne on the lambda repressor, was a harbinger of greater things to come, and of a view – emerging only now, with the relatively easy, large-scale sequencing of genomes – that the function of individual genes is less important than how the various functions of genes interact in a network in which, behind the visible storefronts of structural genes, regulatory genes are hard at work. Although regulatory genes are even harder to grasp, as physical realities, than are genes responsible for enzymes or proteins such as collagen, the facts of genetic regulation are all around us, waiting to be investigated. The world is shaped by genetic regulation. Because of such regulation, the aphids studied by Bonnet, in which one wingless generation succeeds another, can suddenly produce winged offspring that fly away when their mothers sense the presence of ladybirds. Thanks to genetic regulation, a hydra when cut in two can regenerate the missing parts of its body, the phenomenon of regeneration that so captivated Trembley and Haller, and was the subject of the work of Morgan’s youth and old age. Thanks to genetic regulation, plants can appear with leaves instead of petals, suggesting to Goethe that laws of form might exist beneath the variety of life.
Thanks to genetic regulation, a single, spherical cell can divide, divide again and – within eight weeks – become a recognizable miniature of a human being. Creating a human embryo, however, implies a degree of regulation far more sophisticated than the once-only events that turn a single gene, or a small cluster of genes, on or off, as in the lac operon. The lac operon is a relatively simple thing, the first case of the phenomenon of genetic regulation to be described – and, as such, an exemplar, a proof of principle and an icon. Yet there is no reason why regulation need be limited to a single event, or why a single regulator should act only on structural genes. Regulators can interact with other regulators, producing a cascade of regulation – offering untold opportunities for subtlety and flexibility.
Whatever its complexity, the secret of genetic regulation is the ability to respond to changing conditions, be it the presence or absence of lactose, UV radiation or ladybirds. Arguably the most changeable environment in the whole of nature is the developing embryo. The regulatory genes that direct the division of a fertilized egg into two cells, and then of two cells into four, find themselves in a completely different environment with each division. Groups of cells require the activation of whole batteries of structural genes which would be quite superfluous in a single cell: genes, for example, coding for proteins that keep two otherwise separate cells stuck together so that they can become a tissue; or proteins that make up cell-surface receptors, and actively controlled pores that allow the passage of some substances – but not others – through cell membranes, so that cells can communicate with one another.
It is thanks to regulatory genes that these developing tissues can interact to create yet further tissues and organs. In the very early embryo, interactions between the ectoderm and the endoderm create an entirely new layer of cells, the mesoderm. Further interactions shape the mesoderm into the notochord, the tissues that will become the somites, and the lateral plate mesoderm that will become the body wall. The notochord, once formed, coerces the neural tube into closure; interaction between the somitic and lateral plate mesoderm creates the kidneys. Meanwhile, regulatory substances secreted by the primordial germ cells, returning from their long journey to the yolk sac, sculpt parts of the body wall into the sex organs; regulatory substances found in males – but not females – ensure that these sex organs become testes, and the primordial germ cells become sperm.
Couched in these terms, the development of an individual – the creation of form from a formless egg – is a dynamic trait, a consequence of a cascade of genetic, regulatory interactions, no different in principle from the sequences of enzyme-controlled metabolic events that Beadle suspected lay behind the visible expression of each trait in flies and, later, Neurospora. Searching for the regulatory network that creates the human embryo, or indeed the embryo of any complex organism, poses a greater challenge than the investigation of mould metabolism. For one thing, mutations in regulatory genes which are important in the network that creates an embryo are unlikely to produce creatures with small but picturesque mutations – such as the single white-eyed fly among reds, or a dwarf pea plant where all others are tall – which represent rather small quirks in the normal range of variation, and which can be used in breeding experiments. Such changes are small ripples on the surface of a deep and murky pool of regulation – minor alterations made once the structure of the organism is all but complete. Instead, mutations in regulatory genes will produce things altogether more puzzling – those same marvels that fascinated Paracelsus, Paré and Bacon, captivated Geoffroy, and drove Bateson to compile catalogues that read like medieval bestiaries, recording insects with legs growing out of their heads. In short, monsters.