The most famous breakthrough in modern science was the discovery of the structure of DNA – the genetic material of all organisms in nature – by Francis Crick and James Watson. The term ‘gene’ was coined in 1909 by Wilhelm Johannsen, and by the mid-1930s it had been established that genes were physical entities. By the early 1950s it was known that the chemical material of the gene was DNA (deoxyribonucleic acid). Watson and Crick, in 1953, proposed a model for the DNA molecule in the form of a double helix, with two distinct chains wound round one another about a common axis. They also suggested that to replicate DNA the cell unwinds the two chains and uses each as a template to guide the formation of a new companion chain – thus producing two double helices, each with one new and one old chain. In this extract from his autobiography, What Mad Pursuit (1989), Crick explains the process of earlier discovery that made possible his and Watson’s advance. RNA, mentioned in the extract, is ribonucleic acid.
At the time I started in biology – the late 1940s – there was already some rather indirect evidence suggesting that a single gene was perhaps no bigger than a very large molecule – that is, a macromolecule. Curiously enough, a simple, suggestive argument based on common knowledge also points in this direction.
Genetics tells us that, roughly speaking, we get half of all our genes from our mother, in the egg, and the other half from our father, in the sperm. Now, the head of a human sperm, which contains these genes, is quite small. A single sperm is far too tiny to be seen clearly by the naked eye, though it can be observed fairly easily using a high-powered microscope. Yet in this small space must be housed an almost complete set of instructions for building an entire human being (the egg providing a duplicate set). Working through the figures, the conclusion is inescapable that a gene must be, by everyday standards, very, very small, about the size of a very large chemical molecule. This alone does not tell us what a gene does, but it does hint that it might be sensible to look first at the chemistry of macromolecules.
It was also known at that time that each chemical reaction in the cell was catalysed by a special type of large molecule. Such molecules were called enzymes. Enzymes are the machine tools of the living cell. They were first discovered in 1897 by Edouard Buchner, who received a Nobel Prize ten years later for his discovery. In the course of his experiments, he crushed yeast cells in a hydraulic press and obtained a rich mixture of yeast juices. He wondered whether such fragments of a living cell could carry out any of its chemical reactions, since at that time most people thought that the cell must be intact for such reactions to occur. Because he wanted to preserve the juice, he adopted a stratagem used in the kitchen: he added a lot of sugar. To his astonishment, the juice fermented the sugar solution! Thus were enzymes discovered. (The word enzyme means ‘in yeast’.) It was soon found that enzymes could be obtained from many other types of cell, including our own, and that each cell contained very many distinct kinds of enzymes. Even a simple bacterial cell may contain more than a thousand different types of enzymes. There may be hundreds or thousands of molecules of any one type.
In favourable circumstances an enzyme could be purified away from all the others and its action studied by itself in solution. Such studies showed that each enzyme was very specific, and catalysed only one particular chemical reaction or, at most, a few related ones. Without that particular enzyme the chemical reaction, under the mild conditions of temperature and acidity usually found in living cells, would proceed only very, very slowly. Add the enzyme and the reaction goes at a good pace. If you make a well-dispersed solution of starch in water, very little will happen. Spit into it and the enzyme amylase in your saliva will start to digest the starch and release sugars.
The next major discovery was that each of the enzymes studied was a macromolecule and that they all belonged to the same family of macromolecules called proteins. The key discovery was made in 1926 by a one-armed American chemist called James Sumner. It is not all that easy to do chemistry when you have only one arm (he had lost the other in a shooting accident when he was a boy) but Sumner, who was a very determined man, decided he would nevertheless demonstrate that enzymes were proteins. Though he showed that one particular enzyme, urease, was a protein and obtained crystals of it, his results were not immediately accepted. In fact, a group of German workers hotly contested the idea, which somewhat embittered Sumner, but it turned out that he was correct. In 1946 he was awarded part of the Nobel Prize in Chemistry for his discovery. Though very recently a few significant exceptions to this rule have turned up, it is still true that almost all enzymes are proteins.
Proteins are thus a family of subtle and versatile molecules. As soon as I learned about them I realized that one of the key problems was to explain how they were synthesized.
There was a third important generalization, though in the 1940s this was sufficiently new that not everybody was inclined to accept it. This idea was due to George Beadle and Ed Tatum. (They too were to receive a Nobel Prize, in 1958, for their discovery.) Working with the little bread-mould Neurospora, they had found that each mutant of it they studied appeared to lack just a single enzyme. They coined the famous slogan ‘One gene – one enzyme’.
Thus the general plan of living things seemed almost obvious. Each gene determines a particular protein. Some of these proteins are used to form structures or to carry signals, while many of them are the catalysts that decide what chemical reactions should and should not take place in each cell. Almost every cell in our bodies has a complete set of genes within it, and this chemical programme directs how each cell metabolizes, grows, and interacts with its neighbours. Armed with all this (to me) new knowledge, it did not take much to recognize the key questions. What are genes made of? How are they copied exactly? And how do they control, or at least influence, the synthesis of proteins?
It had been known for some time that most of a cell’s genes are located on its chromosomes and that chromosomes were probably made of nucleoprotein – that is, of protein and DNA, with perhaps some RNA as well. In the early 1940s it was thought, quite erroneously, that DNA molecules were small and, even more erroneously, simple. Phoebus Levene, the leading expert on nucleic acid in the 1930s, had proposed that they had a regular repeating structure [the so-called tetranucleotide hypothesis]. This hardly suggested that they could easily carry genetic information. Surely, it was thought, if genes had to have such remarkable properties, they must be made of proteins, since proteins as a class were known to be capable of such remarkable functions. Perhaps the DNA there had some associated function, such as acting as a scaffold for the more sophisticated proteins.
It was also known that each protein was a polymer. That is, it consisted of a long chain, known as a polypeptide chain, constructed by stringing together, end to end, small organic molecules, called monomers since they are the elements of a polymer. In a homopolymer, such as nylon, the small monomers are usually all the same. Proteins are not as simple as that. Each protein is a heteropolymer, its chains being strung together from a selection of somewhat different small molecules, in this instance amino acids. The net result is that, chemically speaking, each polypeptide chain has a completely regular backbone, with little side-chains attached at regular intervals. It was believed that there were about twenty different possible side-chains (the exact number was not known at that time). The amino acids (the monomers) are just like the letters in a font of type. The base of each kind of letter from the font is always the same, so that it can fit into the grooves that hold the assembled type, but the top of each letter is different, so that a particular letter will be printed from it. Each protein has a characteristic number of amino acids, usually several hundred of them, so any particular protein could be thought of crudely as a paragraph written in a special language having about twenty (chemical) letters. It was not then known for certain, as it is now, that for each protein the letters have to be in a particular order (as indeed they have to be in a particular paragraph). This was first shown a little later by the biochemist Fred Sanger, but it was easy enough to guess that this was likely to be true.
Of course each paragraph in our language is really one long line of letters. For convenience this is split up into a series of lines, written one under the other, but this is only a secondary matter, since the meaning is exactly the same whether the lines are long or short, few or many, provided we take care about splitting the words at the end of each line. Proteins were known to be very different. Although the polypeptide backbone is chemically regular, it contains flexible links, so that in principle many different three-dimensional shapes are possible. Nevertheless, each protein appeared to have its own shape, and in many cases this shape was known to be fairly compact (the word used was ‘globular’) rather than very extended (or ‘fibrous’). A number of proteins had been crystallized, and these crystals gave detailed X-ray diffraction patterns, suggesting that the three-dimensional structure of each molecule of a particular kind of protein was exactly (or almost exactly) the same. Moreover, many proteins, if heated briefly to the boiling point of water, or even to some temperature below this, became denatured, as if they had unfolded so that their three-dimensional structure had been partly destroyed. When this happened the denatured protein usually lost its catalytic or other function, strongly suggesting that the function of such a protein depended on its exact three-dimensional structure.
And now we can approach the baffling problem that appeared to face us. If genes are made of protein, it seemed likely that each gene had to have a special three-dimensional, somewhat compact structure. Now, a vital property of a gene was that it could be copied exactly for generation after generation, with only occasional mistakes. What we were trying to guess was the general nature of this copying mechanism. Surely the way to copy something was to make a complementary structure – a mould – and then to make a further complementary structure of the mould, to produce in this way an exact copy of the original. This, after all, is how, broadly speaking, sculpture is copied. But then the dilemma arose: it is easy to copy the outside of a three-dimensional structure in this way, but how on earth could one copy the inside? The whole process seemed so utterly mysterious that one hardly knew how to begin thinking about it.
Of course, now that we know the answer, it all seems so completely obvious that no one nowadays remembers just how puzzling the problem seemed then. If by chance you do not know the answer, I ask you to pause a moment and reflect on what the answer might be. There is no need, at this stage, to bother about the details of the chemistry. It is the principle of the idea that matters. The problem was not made easier by the fact that many of the properties of proteins and genes just outlined were not known for certain. All of them were plausible and most of them seemed very probable but, as in most problems near the frontiers of research, there were always nagging doubts that one or more of these assumptions might be dangerously misleading. In research the front line is almost always in a fog.
So what was the answer? Curiously enough, I had arrived at the correct solution before Jim Watson and I discovered the double-helical structure of DNA. The basic idea (which was not entirely new) was this: all a gene had to do was to get the sequence of the amino acids correct in that protein. Once the correct polypeptide chain had been synthesized, with all its side-chains in the right order, then, following the laws of chemistry, the protein would fold itself up correctly into a unique three-dimensional structure. (What the exact three-dimensional structure of each protein was remained to be determined.) By this bold assumption the problem was changed from a three-dimensional one to a one-dimensional one, and the original dilemma largely disappeared.
Of course, this had not solved the problem. It had merely transformed it from an intractable one to a manageable one. For the problem still remained: how to make an exact copy of a one-dimensional sequence. To approach that we must return to what was known about DNA.
By the late 1940s our knowledge of DNA had improved in several important respects. It had been discovered that DNA molecules were not, after all, very short. Exactly how long they were was not clear. We know now that they appeared to be short because, being long molecules (in the sense that a piece of string is long), they could easily be broken in the process of getting them out of the cell and manipulating them in the test tube. Just stirring a DNA solution is enough to break the longer molecules. Their chemistry was now known more correctly, and moreover the tetranucleotide hypothesis was dead, killed by some very beautiful work by a chemist at Columbia, the Austrian refugee Erwin Chargaff. DNA was known to be a polymer, but with a very different backbone and with only four letters in its alphabet, rather than twenty. Chargaff showed that DNA from different sources had rather different amounts of those four bases (as they were called). Perhaps DNA was not such a dumb molecule after all. It might conceivably be long enough and varied enough to carry some genetic information.
Even before I left the Admiralty there had been some quite unexpected evidence pointing to DNA as near the centre of the mystery. In 1944 Avery, MacLeod, and McCarty, who worked at the Rockefeller Institute in New York, had published a paper claiming that the ‘transforming factor’ of pneumococcus consisted of pure DNA. The transforming factor was a chemical extracted from a strain of bacteria having a smooth coat. When added to a related strain lacking such a coat it ‘transformed’ it, so that some of the recipient bacteria acquired the smooth coat. More important, all the descendants of such cells had the same smooth coat. In the paper the authors were rather cautious in interpreting their results, but in the now-famous letter to his brother Avery expressed himself more freely. ‘Sounds like a virus – may be a gene,’ he wrote.
This conclusion was not immediately accepted. An influential biochemist, Alfred Mirsky, also at the Rockefeller, was convinced that it was an impurity of the DNA that was causing the transformation. Subsequently more careful work by Rollin Hotchkiss at the Rockefeller showed that this was highly unlikely. It was argued that Avery, MacLeod, and McCarty’s evidence was flimsy, in that only one character had been transformed. Hotchkiss showed that another character could also be transformed. The fact that these transformations were often unreliable, tricky to perform, and only altered a minority of cells did not help matters. Another objection was that the process had been shown to occur just in these particular bacteria. Moreover, at that time no bacterium of any sort had been shown to have genes, though this was discovered not long afterward by Joshua Lederberg and Ed Tatum. In short, it was feared that transformations might be a freak case and misleading as far as higher organisms were concerned. This was not a wholly unreasonable point of view. A single isolated bit of evidence, however striking, is always open to doubt. It is the accumulation of several different lines of evidence that is compelling.
It is sometimes claimed that the work of Avery and his colleagues was ignored and neglected. Naturally there was a mixed spectrum of reactions to their results, but one can hardly say no one knew about it. For example, that august and somewhat conservative body, the Royal Society of London, awarded the Copley Medal to Avery in 1945, specifically citing his work on the transforming factor. I would dearly love to know who wrote the citation for them.
Nevertheless, even if all the objections and reservations are brushed aside, the fact that the transforming factor was pure DNA does not in itself prove that DNA alone is the genetic material in pneumococcus. One could quite logically claim that a gene there was made of DNA and protein, each carrying part of the genetic information, and it was just an accident of the system that in transformation the altered DNA part was carrying the information to change the polysaccharide coat. Perhaps in another experiment a protein component might be found that would also produce a heritable change in the coat or in other cell properties.
Whatever the interpretation, because of this experiment and because of the increased knowledge of the chemistry of DNA, it was now possible that genes might be made of DNA alone.
Source: Francis Crick, What Mad Pursuit: A Personal View of Scientific Discovery, London, Weidenfeld & Nicolson, 1989.