“WE HAVE DISCOVERED the secret of life!” With that apocryphal boast, Francis Crick (1916–2004) ushered James Watson into the Eagle Pub in Cambridge and the rest of us into the age of DNA. Months later, in 1953, the scientific announcement of the discovery had a very different tone. In the pages of the august journal Nature, Watson and Crick opened their article with a dry British understatement that others have emulated in the years since. Their discovery, they noted, “has novel features which are of considerable biological interest.”
Both announcements heralded something later generations have come to take for granted. The duo modeled the structure of DNA, showing that it exists as double strands that, when separated, can make proteins or copies of themselves. With this trick, the molecule can do two remarkable things—hold the information to make proteins that build bodies and pass that information along to the next generation.
Watson and Crick, following the work of Rosalind Franklin and Maurice Wilkins, found that individual DNA strands are composed of sequences of other molecules, set like beads on a string. Each of these molecules, known as bases, can be one of four types, typically designated A, T, G, and C. One DNA strand can have a series of billions of bases, forming strings like AATGCCCTC or any combination of the four letters.
It is a humbling thought: much of who we are resides in the order of molecules in a chemical strand. If you think of DNA as a molecule that contains information, it is as if we have millions of supercomputers in every cell. Human DNA is composed of a chain of roughly 3.2 billion bases. That strand is broken up into chromosomes, wrapped, and coiled to sit inside the nucleus of each cell. Our DNA is packed so tightly that if unwound, connected, and stretched out, each strand would be about six feet long. Each of our trillions of cells contains a tightly wound six-foot-long molecule coiled to one-tenth the size of the smallest grain of sand. If you uncoiled the DNA from each of the four trillion cells in your own body and put them end to end, your personal DNA strand would run almost to Pluto.
When sperm and egg unite during conception, the fertilized egg ends up with DNA from both parents. Hence genetic information flows from generation to generation. Our own DNA comprises contributions from our biological parents, our parents’ DNA from their biological parents, and so on, ever deeper into the past. DNA forms an unbroken connection among living things through time. One of Darwin’s great insights can be deployed to translate this simple notion of a family lineage to an even broader history. The molecular implication of his idea is that if we share a common ancestor with other species, then there should be a continuous flow of their DNA to our own. Just as our DNA passes from generation to generation, from parents to offspring, so, too, should it pass from ancestral species to descendant species over the four-billion-year history of life. If true, DNA is a library that resides within each cell of every creature on the planet. Locked in the order of those As, Ts, Gs, and Cs would be a record of billions of years of changes in the living world. The trick has been to learn how to read it.
With influential relatives that included famous anatomists, philosophers, artists, and a surgeon, Émile Zuckerkandl (1922–2013) was born in Vienna into a world of ideas, science, and art. As the Nazis came to power in Germany, his family sought refuge in Paris and Algiers. Family friends connected Zuckerkandl with Albert Einstein, who, using his influence, obtained an entrée for young Émile to study in the United States. The move took Zuckerkandl to the University of Illinois and laboratories there studying the biology of proteins. With an interest in oceans, he gravitated to marine stations in the United States and France during summers. There he became fascinated by crabs and the molecules at work when they grow and molt from tiny embryos to full-grown adults.
Zuckerkandl was entering biochemistry at a propitious time. In the late 1950s, scientists at the National Institutes of Health, as well as Francis Crick himself, were starting to decipher what the strings of As, Ts, Gs, and Cs meant. Each DNA sequence carries the instructions to make yet another sequence of molecules. Depending on the circumstances, a DNA sequence can be used as a template to make a protein or it can make copies of itself. To build a protein, the string of As, Ts, Gs, and Cs gets translated into a sequence of another type of molecule: amino acids. Different strings of amino acids, in turn, make different proteins. There are twenty different kinds of amino acids, and any one of them can reside at any point in the sequence. This code can produce an enormous number of different proteins. Some simple math: if there are twenty different amino acids that can assemble in any combination, and a protein chain is about one hundred amino acids long, the number of different proteins that can be made is a 1 with 130 zeros behind it. The real number is much higher because the length of the protein in our estimate, one hundred, is relatively small. The biggest protein in the human body, known as titin, consists of a string of 34,350 amino acids.
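The "simple math" above can be checked directly. The following is a minimal sketch in Python; the function name is made up for illustration, and the only inputs are the two figures given in the text (twenty kinds of amino acid, a chain one hundred long).

```python
def count_possible_chains(kinds: int, length: int) -> int:
    """Number of distinct chains when any of `kinds` units can sit at each position."""
    return kinds ** length


AMINO_ACID_KINDS = 20  # figure from the text
CHAIN_LENGTH = 100     # figure from the text

total = count_possible_chains(AMINO_ACID_KINDS, CHAIN_LENGTH)

# A "1 with 130 zeros behind it" is a 131-digit number; so is this one.
print(len(str(total)))  # 131
print(str(total)[:4])   # 1267, i.e. roughly 1.27 followed by 130 more digits
```

Python integers have no size limit, so the full value is computed exactly rather than estimated.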
The mental trick is to remember that DNA is made up of a string of bases, symbolized as letters, that codes for strings of amino acids that in turn make up proteins. Because different proteins are made up of different amino acid sequences, the DNA sequence encodes the diverse proteins that help make life anew in each generation.
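For readers who like to see the mantra run, here is a small sketch of the letters-to-amino-acids step. It relies on one detail the passage does not spell out: in the standard genetic code, bases are read three at a time. The table below is deliberately partial, listing only the triplets this example needs, and the DNA string is a made-up illustration rather than a real gene.

```python
# Partial lookup table from the standard genetic code (only the triplets used here).
CODON_TABLE = {
    "ATG": "Met",
    "GTG": "Val",
    "CAC": "His",
    "CTG": "Leu",
    "TAA": "STOP",
}


def translate(dna: str) -> list[str]:
    """Read the DNA string three letters at a time and return the amino acid chain."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "STOP":  # a stop triplet ends the chain
            break
        protein.append(residue)
    return protein


example_dna = "ATGGTGCACCTGTAA"  # hypothetical string for illustration
print(translate(example_dna))    # ['Met', 'Val', 'His', 'Leu']
```

A complete table would have sixty-four entries, one for every possible triplet of the four letters.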
By the late 1950s, researchers were able to map the sequences of amino acids of different proteins to begin to understand how they work in the body. These discoveries heralded an age in which scientists could study protein structure to understand disease. For example, in sickle cell anemia, diseased red blood cells live for only ten to twenty days, whereas healthy ones can live for almost ten times that. Moreover, sickle cells, as the name implies, have a distinctive shape. This difference causes them to be destroyed in the spleen much more easily than normal red blood cells, which have a disk-like shape. As a consequence, sickle cell anemia, in its most extreme cases, can be fatal by the age of three in almost 70 percent of sufferers. And what is the difference between a healthy red blood protein and a sickle cell one? Only a single amino acid in the string: the amino acid glutamate is replaced by one called valine at the sixth position in the sequence. A tiny difference in the amino acid sequence can have massive ramifications on the protein, the cells in which the protein is found, and the lives of the individuals who have those cells.
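The single-letter change described above is easy to picture as a comparison of two lists. In this sketch the eight amino acids are the commonly cited opening of the hemoglobin chain affected in sickle cell anemia; treat them as illustrative, since the passage itself gives only the swap at position six. The function name is hypothetical.

```python
def find_substitutions(seq_a: list[str], seq_b: list[str]) -> list[tuple[int, str, str]]:
    """Return (position, residue_in_a, residue_in_b) wherever two equal-length chains differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("chains must be the same length to compare position by position")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# Illustrative opening stretch of the chain; positions are counted from 1.
healthy = ["Val", "His", "Leu", "Thr", "Pro", "Glu", "Glu", "Lys"]
sickle = ["Val", "His", "Leu", "Thr", "Pro", "Val", "Glu", "Lys"]

print(find_substitutions(healthy, sickle))  # [(6, 'Glu', 'Val')]
```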
Inspired by the power of this new biology, Zuckerkandl turned his attention to species in his marine laboratory. He speculated that when crabs molt from small embryos to full-grown adults, certain proteins are at work. He set out to look at the structures of proteins and how they control crab respiration, growth, and the molting of their shells.
Then his life changed by a form of scientific kismet. Linus Pauling (1901–94), then a Nobel laureate in chemistry, was visiting France and stopped by the marine lab to see some friends. Zuckerkandl, with his love of proteins and crabs, sought out Pauling, more like a fan approaching a rock star than like a scientist looking for a new research project. That interaction would transform Zuckerkandl and, ultimately, science itself.
By the mid-1950s Pauling had uncovered the structure of crystals and the fundamental properties of atoms and molecular bonds, and he had even formulated a molecular theory of general anesthesia. He ended up losing the race with Watson and Crick to uncover the structure of DNA. Later, he would spend considerable effort promoting his theory that vitamin C warded off the common cold and other infections.
Pauling grew up in Oregon and attended Oregon State Agricultural College. His fearless approach to science has made him a hero of mine. I am on the selection committee for a foundation in New York that funds artists and scientists at key moments in their careers. The foundation has been awarding fellowships since the 1920s and has retained every application it ever received. Its offices on Park Avenue are a treasure trove of letters, files, and applications of Nobel laureates, novelists, dancers, and academics of all stripes. A colleague there knew of my interest, and when I came to work one morning, I saw an old crinkled file waiting on my desk. It was Pauling’s application from the 1920s. At the time applications required college transcripts and doctors’ notes, items we would never request today. I took particular interest in his transcript from Oregon State. His record was distinguished by its highs and lows. As expected, he had As across the board in geometry, chemistry, and math. His work in “camp cookery” merited an undistinguished C. Gym was an ongoing string of Fs for years. In his second year, Pauling established one of the top grades in his class in a required course on “explosives.” He ultimately won two Nobel Prizes: after receiving the award in chemistry in 1954 for understanding proteins, he won the peace prize in 1962, for his work against nuclear testing. Pauling’s As in chemistry and explosives in college augured well for his future life.
After a short conversation, Pauling saw something special in Zuckerkandl and invited him to move to Caltech. But Pauling’s offer came with strings attached. Pauling did not have a lab of his own at the time because he was away most days working on his antinuclear activities. Pauling set Zuckerkandl up with a colleague whose lab was equipped to do biochemistry experiments. When Zuckerkandl broached his idea of working on crab proteins, Pauling waved that aspiration aside. For over a decade, Pauling had been interested in how nuclear radiation could affect cells. One target of this work was the protein hemoglobin, which ferries oxygen in the blood from the lungs to the cells of the body. Pauling suggested, to put the term mildly, that young Zuckerkandl give up the aspiration to understand crabs and instead spend his time thinking about hemoglobin. While the shift derailed Zuckerkandl’s plans, the advice was prescient.
Zuckerkandl explored the hemoglobin proteins of different species using some of the era’s technologies, which were quite limited. He couldn’t determine the amino acid sequences of the proteins of different species, so he extracted the proteins and used relatively simple methods to assess their overall size and electric charge. With the safe assumption that proteins having generally similar amino acid sequences should have similar weights and electrical charges, he used these easily obtainable measurements as proxies for their overall similarity.
Zuckerkandl found that human and ape hemoglobins were more similar to each other in size and charge than they were to the hemoglobins of frogs and fish. This simple measurement held, for him, the glimmer of something important. He speculated that this similarity between human and ape proteins could be the result of evolution: human and ape blood proteins were similar because the species are closely related. When he showed his initial result to the head of the laboratory, Zuckerkandl got the cold shoulder. The professor was an ardent creationist and would have none of this evolution talk in his laboratory. Zuckerkandl was welcome to work there, but the boss would have nothing to do with any publication that suggested that people and monkeys were related to each other. The door seemed to close for Zuckerkandl just as he saw a glint of success.
Then luck struck. Pauling got an invitation to contribute a paper to a Festschrift for another Nobel laureate, his close friend Albert Szent-Györgyi. Festschriften are books or special issues of journals produced to honor the retirement of a valued colleague. They typically contain papers celebrating a career in science contributed by friends and longtime colleagues. The key point is that virtually nothing important ever appears in these volumes, because the papers are usually remembrances sprinkled with slivers of new data. These volumes are not often peer reviewed; hence they can hold long pages of adulation for the honoree or data that authors couldn’t publish anywhere else. Knowing these facts, and wanting to honor his friend, himself a very bold scientist, Pauling had an idea. He approached Zuckerkandl with the idea of writing “something outrageous.”
This offbeat aspiration fueled one of the classic scientific papers of the twentieth century.
The timing was ripe for doing something bold in biochemistry. By the time Zuckerkandl entered Pauling’s orbit in the late 1950s, the amino acid sequences of different proteins were becoming available, and Pauling’s lab had access to the data. Today’s DNA sequencing was still a long way off, but sequencing the amino acid string of different proteins was possible, if tricky and slow. Pauling was acquiring sequences of the proteins of gorillas, chimps, and people, among others. Armed with this new information, Zuckerkandl and Pauling were ready to attack the fundamental question: What do the proteins of diverse animals tell about their relationships? Zuckerkandl’s initial results, using crude analyses of size and charge, implied that proteins might tell quite a bit about history.
A century before anybody knew about DNA and the sequences of proteins, Darwin’s ideas had made specific predictions about them. If creatures shared a genealogical tree, then the amino acid sequences of proteins of humans, other primates, mammals, and frogs should reflect their evolutionary history. Zuckerkandl’s initial experiments hinted that this could be the case.
Hemoglobin turned out to be an ideal subject for this research. All animals use oxygen in their metabolism, and hemoglobin is the blood protein that carries oxygen from the respiratory organs, either lungs or gills, to the body’s other organs. Zuckerkandl and Pauling compared the amino acid sequence of the hemoglobin molecule in different species and were able to estimate how similar the proteins were.
Each new species Zuckerkandl and Pauling added to their analysis brought Darwin’s prediction into ever clearer focus. The sequences of humans and chimps were more similar to each other than to cows. And all these mammalian hemoglobins were more similar to each other than to those of frogs. Zuckerkandl and Pauling confirmed that they could decipher the relationships among species, and the history of life more generally, from proteins.
The pair took their idea one step further in a bold thought experiment. What if, they speculated, proteins evolved at constant rates over long periods of time? If that were true, then the more proteins of two species differed from each other, the longer the time those species have been evolving independently from a common ancestor. By this logic, the reason proteins of humans and monkeys are more similar to each other than they are to those of frogs is that humans and monkeys share a more recent common ancestor with each other than either does with frogs. This makes sense given what we know from paleontology—the primate common ancestor of humans and monkeys would be more recent than the amphibian one they share with frogs.
If, as Pauling and Zuckerkandl speculated, proteins evolved at a constant rate, you could use differences in the sequence of proteins to calculate how long ago these species shared that common ancestor. (See this page–this page for a discussion of the method.) Proteins in the bodies of different species could serve as a kind of clock for understanding evolution: no rocks or fossils would be needed to tell time in the history of life. This idea, so utterly outrageous when it was first proposed, is now known as the “molecular clock” and is used in many instances to calculate the antiquity of diverse species.
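The logic of the clock can be sketched in a few lines. This is the simplest proportional version, offered as an illustration and not as the calculation Zuckerkandl and Pauling actually published: count the positions at which two aligned sequences differ, then divide by twice the rate, since changes pile up along both branches after the split. The sequences, the rate, and the function names below are all hypothetical.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def divergence_time(seq_a: str, seq_b: str, changes_per_site_per_year: float) -> float:
    """Years since the common ancestor, assuming a constant rate of change.

    The factor of 2 reflects that both lineages have been changing
    independently since they split.
    """
    fraction_different = count_differences(seq_a, seq_b) / len(seq_a)
    return fraction_different / (2 * changes_per_site_per_year)


# Hypothetical one-letter amino acid strings differing at 1 of 10 positions.
species_a = "VHLTPEEKSA"
species_b = "VHLTPEEKTA"
ASSUMED_RATE = 1e-9  # made-up rate: changes per position per year

years = divergence_time(species_a, species_b, ASSUMED_RATE)
print(f"{years:.3g} years")  # on the order of 50 million years for these made-up inputs
```

The sketch ignores the possibility that the same position changes more than once, which matters more and more as the species being compared grow more distant.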
Zuckerkandl and Pauling were devising an entirely new way to infer the history of life. For more than a century, the history of life was deciphered by comparing ancient fossils. But now, by knowing the structure of the proteins of different animals, Pauling and Zuckerkandl could assess evolutionary relationships. This insight heralded a bonanza: bodies contain tens of thousands of proteins. The proteins of different species could be as informative as fossils. But these fossils aren’t in rocks—they lie inside every organ, tissue, and cell of every body of each living animal on the planet. If you knew how to look, you could uncover the history of life in any well-stocked zoo or aquarium. The history of all creatures was now knowable, even those for which the fossil record had yet to be unearthed.
DNA passes from generation to generation containing the information to make proteins and thereby bodies. Individuals and their bodies may come and go, but the molecules form an unbroken connection through the ages. The more we dig into that connection, the more we learn about the relationships between all living things.
With the publication of the Festschrift in the early 1960s, Zuckerkandl and Pauling ultimately gave birth to a new field of research using molecules to trace history. But you couldn’t have guessed at the future impact of their paper judging from the reaction of the scientific community at the time. “Taxonomists hated it. Biochemists thought it useless,” Zuckerkandl recalled on its fiftieth anniversary. Taxonomists, paleontologists—anybody focused on anatomy despised this idea. No longer would these fields have a monopoly on reconstructing evolutionary history. Zuckerkandl and Pauling showed that virtually every molecule in the body of living creatures can tell of past events. If paleontologists thought the paper threatened their survival, biochemists could not have cared less about it. Evolutionary studies were, to them, a kind of genteel backwater. In their view, serious scientists worked on protein structure, disease, and function, not on the relationships between people and frogs.
Chemical reactions and scientific ideas share a fundamental similarity: both typically need catalysts to happen. One person took Zuckerkandl and Pauling’s ideas to spawn a community of scientists who approached the history of life with new eyes.
In the early 1960s Allan Wilson (1934–91), a mathematics prodigy from New Zealand, switched to biology and joined the biochemistry faculty at the University of California at Berkeley. This was a time of unrest on campuses generally, at Berkeley in particular, and Wilson became one of the most politically active professors there. He relished disruption in everything he did, so much so that his students described political protests as a kind of group lab meeting.
A simple premise drove Wilson’s career until his untimely death at the age of fifty-six. He believed that if you cannot simplify a complex phenomenon into its constituent parts, then you don’t understand it. The mathematician in him led him to seek simple rules behind biological patterns and then develop rigorous means to test them. Wilson had a passion for developing bold and outrageously simple hypotheses to explain complex patterns in the history of life. Then he’d try to falsify his idea with as much research as possible. If the idea withstood his own data barrage, it was ready to reveal to the outside world. This approach made Wilson’s lab a raucous epicenter for some of the best and the brightest at Berkeley in the 1970s and ’80s. His laboratory became an intellectual hothouse with a freewheeling and intense attitude, attracting talented young students from around the world, many of whom later emerged as luminaries in their own right.
I arrived in Berkeley as a newly minted paleontology Ph.D. in 1987, when Wilson and his team were at the height of their discoveries. My world was centered on rocks and fossils, not on proteins and DNA. Wilson’s presentations were already attracting large crowds from across the university, and the battle lines between anatomists and molecular biologists were drawn and deeply entrenched. At one seminar, I was seated with a number of paleontologists who were growing increasingly uncomfortable with each passing slide of Wilson’s talk. The crescendo hit when Wilson presented a simple equation, with three variables, that he claimed revealed how fast evolution happens in different species. Seeing this slide, a colleague elbowed me and asked sarcastically, “So most of paleontology fits into that equation?”
For Wilson, the field of evolutionary biology was ripe for his kind of disruption. Zuckerkandl and Pauling’s idea of proteins as historical signposts fit his research style perfectly—it was simple and could be put to the test with new data. Animals have many proteins, proteins were becoming known with great regularity, and if there was a strong historical signal in the data, Wilson would not only find it but squeeze every possible inference out of it.
Wilson set his sights high. His question was: How closely are humans related to other primates? If any question was likely to stir up the dust, this was it. And since fossil evidence was relatively sparse for this part of the evolutionary tree, the molecular approach would be particularly meaningful.
Wilson had an almost magical ability to attract students into his orbit, nurture their talents, and help them make transformative discoveries of their own. After attending college in the Midwest, Mary-Claire King went west to study statistics. Arriving in California in the mid-1960s, she lost her drive for math and was hunting for a new intellectual focus. A course on genetics by one of the senior scientists at Berkeley kindled her passion for the field. Sticking her toe into the genetics world, she worked for a year in a lab only to discover that she simply didn’t have the touch for lab work. With a scientific career not looking very promising, she took a year off to work with Ralph Nader on consumer activism. Nader invited her to work with him in D.C., a move that would have precipitated a departure from graduate school. She considered the offer as she went to protests at Berkeley. The protests held sway over her time and opened her world to new people and personalities. One of those personalities was Allan Wilson.
After one protest, Wilson convinced King to return to graduate school, if only to earn the Ph.D. as a sheepskin helpful to her work in policy. Almost immediately, she was swept into Wilson’s data-centered activism in science. But the Wilson lab also presented new challenges for her to overcome: no longer in the realm of equations and numbers, she would now have to learn to work with blood, proteins, and cells.
What made matters even more fraught was that Wilson wanted her to do some sophisticated lab work. Since Zuckerkandl and Pauling had produced their initial work on proteins, a number of laboratories were devoting themselves to understanding which living apes are our closest relatives and how long ago our species diverged from them. Wilson and his group believed answers would come from getting as much new data as possible. In classic Wilsonian fashion, King decided to look not just at hemoglobin but at every protein she could get her hands on. A concurrence of signals in many different proteins should constitute a robust evolutionary signal. King and Wilson received chimpanzee blood from various zoos and human blood from hospitals. If King didn’t have a knack for laboratory work, she was going to have to find one: chimpanzee blood clots extremely fast, so she would have to work quickly or develop new methods. In the end, she did both.
King decided to use a rapid method to test the differences between proteins. The idea is a simple version of the one that Zuckerkandl had used a decade before. If two proteins differed in their sequence of amino acids, then their weights would differ also. Moreover, being composed of different amino acids means that they would carry different electrical charges. From a technical standpoint, if you put those proteins in a gel suspension to hold them and then ran a current through the gel, the proteins would migrate across to one edge, attracted by the charge. Similar proteins would migrate at the same speed, but proteins that were different would not. You can envision the gel as a kind of racetrack, where the charge would set the race in motion. Similar proteins would go a similar distance in a similar time. The more different they were, the farther apart their runs on the gel would be.
King launched her work still unsure of her skills. And now, to make matters worse, Wilson went off to Africa, leaving her largely on her own during his yearlong sabbatical. She would try to telephone him every week to review her data, but she was largely unmentored for days at a time.
From the start, things did not go well. King managed to extract the chimp and human proteins and put them on the gels. She ran the gels, but the chimp and human proteins moved almost the exact same distance for almost every protein. She wasn’t seeing any meaningful differences between humans and chimps. Had she extracted the proteins correctly? Was she running the gels poorly? Her hopes for a breakthrough seemed doomed.
During their regular conferences, King would share her data with Wilson, who would, in typical fashion, hammer her results with questions on technique as if he were still in Berkeley. No matter how hard he hit her work with every conceivable criticism, the result stood. The protein sequences of humans and chimpanzees were nearly identical. And it wasn’t just one protein that was telling the story, it was more than forty of them. In fact, King wasn’t flailing around aimlessly; she was revealing something fundamental about genes, proteins, and human evolution.
King then compared humans and chimps to other mammals. And here the importance of her discovery came into clear focus. Humans and chimpanzees are more similar genetically than two different species of mouse are to each other. Nearly identical species of fruit fly differ from one another genetically more than humans and chimps do. Humans and chimpanzees are, at the level of proteins and genes, almost identical.
King’s gels revealed a deep paradox. The anatomical differences between humans and chimps, including the essence of our human uniqueness (bigger brains, bipedalism, proportions of the face, skull, and limbs), weren’t deriving from differences in the proteins or in the genes that code for them. If the proteins, and the DNA that codes for them, are largely the same, then what was driving the differences? King and Wilson had a hunch but not the technology to test it.
Recent science has confirmed what King and Wilson first saw. Comparing whole genomes, chimpanzees and humans are anywhere from 95 to 98 percent similar.
The next advances didn’t come from the hands of a student and her adviser working alone. They would require big science—the kind of science where the results are announced by presidents and prime ministers.
When President Bill Clinton and Prime Minister Tony Blair held a press conference with the heads of rival teams sequencing the human genome—the publicly supported one led by Francis Collins and the private one directed by Craig Venter—they had only a very rough draft of the genome to announce. Despite the hoopla, at the time of the announcement in 2000, large chunks of the genome were missing, and little was known about which parts were important for human health and development.
The initial outcomes of the Human Genome Project had less to do with genomes than with technology. The race to sequence the human genome set off a technological frenzy that continues to the present day. Gordon Moore famously predicted in 1965 that microprocessing speed would double every two years. We feel the results of that increase with every purchase of digital devices: computers and phones have gotten ever more powerful and cheaper with each passing year. Genome technology has smashed even those rates of progress. The Human Genome Project took more than a decade, cost over $3.8 billion, and involved rooms full of machines. Today, there is an app for sequencing, and handheld gene sequencers are already on the market.
Once the human genome was mapped, those of other species emerged annually. Genomes are now announced so rapidly that the pace is limited only by the frequency at which scientific journals get published. We’ve had the mouse genome project, the lily genome project, the frog genome project—projects on everything from viruses to primates. At first it was a big deal to have a genome project published; the results would appear in A-list journals to great fanfare in the press. Nowadays, unless there is some important biological process or health issue at stake, new genomes get published with barely a mention.
While the luster of genome papers has faded, they continue to be a bonanza that would have delighted and enthralled Émile Zuckerkandl, Linus Pauling, and Allan Wilson. Armed with the genomes of flies, mice, and people, we can now look to them to ask central questions about life: How are species related, and what makes each one different?
Each of us is made up of trillions of cells (muscle, nerve, skeleton, and hundreds of others) working together, all packed and connected in just the right way. The roundworm Caenorhabditis elegans gets by with only 959 cells. If that is not surprising enough, consider this: despite the vast differences in number of cells and complexity of organs and body parts, humans and worms both have the same number of genes, roughly twenty thousand. And worms are just the beginning. Flies, too, have about the same number as we do. In fact, animals are true pikers compared to plants such as rice, soy, corn, and cassava, all of which have almost twice as many genes. Whatever is driving the evolution of complex new organs, tissues, and behaviors in the animal world isn’t coming from having more genes.
Even weirder is the organization of the genome itself. Remember our mantra: genes are strings of bases that are translated into a sequence of amino acids, and those amino acid sequences make up proteins. In essence, genes contain the molecular template for proteins. When a gene sequence is published, authors are required to make the data publicly available and deposit the information in a national computer database. After decades of work on genes, these repositories are burgeoning with sequences from thousands of genes from thousands of species. You can now sit at your desktop, type in a sequence, and see which gene from what species matches it. When you compare a whole genome to the genes in these databases, you can get a picture of what genes are inside by looking at the matches. In genome after genome published over the past two decades, one observation is completely inescapable: genes are rare in genomes. If genes are the part of the genome that codes for protein, then most of the genome doesn’t seem to be involved in making them. Gene sequences that code for proteins compose less than 2 percent of the human genome. That leaves some 98 percent with no genes at all in it.
Genes are but islands in a sea of DNA. With rare exceptions, this pattern holds for species from worms to mice. If most of the genome does not contain genes that code for proteins, then what does it do?
After serving in the French resistance during World War II, two French biologists, François Jacob (1920–2013) and Jacques Monod (1910–76), started work on bacteria to understand how they digest sugar. If any question seemed esoteric and unrelated to the human condition, this was it.
Jacob and Monod showed that the common bacterium Escherichia coli can digest two sugars in its environment, glucose and lactose. The bacterial genome is relatively simple. Long stretches hold genes that contain the information to make the proteins that digest each sugar. When glucose is abundant, and lactose is rare, the genome makes the protein that digests glucose. When the reverse is true, the genome makes the one that digests lactose. While this state of affairs may seem simple and obvious, it was the basis for a revolution in biology.
The scientists discovered two components in the bacterial genome. In the first, the genes contain the information about the structure of each protein that digests the two different sugars. These are the As, Ts, Gs, and Cs that get translated into the sequences of amino acid strings that comprise a protein. Flanking the genes are other shorter strings of As, Ts, Gs, and Cs that don’t code for protein at all. When another molecule attaches to this stretch, it turns the gene on or off. This is the second component. Think of these shorter strings as molecular switches that control when a gene will be active and make a protein. In bacteria, genes and the switches that control their activity lie next to each other in the genome. Depending on which sugar is present, a molecular reaction controls which gene is active and, in turn, which protein is made.
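The logic of such a switch can be written down as a toy model. The Python sketch below is only an illustration under simplifying assumptions: the function name `active_gene` and its two yes-or-no inputs are hypothetical, and the real chemistry inside the bacterium is far richer than two conditions.

```python
from typing import Optional


def active_gene(glucose_abundant: bool, lactose_abundant: bool) -> Optional[str]:
    """Toy switch: report which sugar-digesting gene is turned on.

    Only the two situations described in the text are modeled. Any other
    combination returns None, because the text does not say what happens.
    """
    if glucose_abundant and not lactose_abundant:
        return "glucose-digesting gene"
    if lactose_abundant and not glucose_abundant:
        return "lactose-digesting gene"
    return None


# Hypothetical usage: the environment flips the switch, and the gene follows.
print(active_gene(glucose_abundant=True, lactose_abundant=False))
print(active_gene(glucose_abundant=False, lactose_abundant=True))
```

The point of the sketch is the separation of roles: the strings returned stand in for genes, while the `if` conditions stand in for the switches that decide when each gene is active.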
Jacob and Monod discovered that the bacterial genome is a biological manufacturing process that makes proteins in the right place and time. There are two components: genes that code for proteins and switches that tell the genes when and where to be active. For this work, the pair won the Nobel Prize for Physiology or Medicine in 1965.
In the decades since Jacob and Monod’s Nobel, the twofold organization of the protein manufacturing process has been revealed to be a general feature of all genomes. Animals, plants, and fungi all have genes that code for proteins and molecular switches that turn the genes on and off.
Their discovery provides clues to understanding what makes cells, tissues, and organs distinct. A human body is essentially a highly organized package of four trillion cells of two hundred different kinds, organized as tissues, from bone and brain to liver and skeleton. Cartilage tissue is composed of cells that make collagen, proteoglycans, and other constituents that combine with water and minerals in the body to give cartilage its pliant yet supportive properties. The constellation of proteins that make a nerve cell are different from those in cartilage, muscle, or bone.
Here’s the rub: every single body cell contains the same sequence of DNA, derived from the fertilized egg that started it all. The DNA inside a nerve cell is virtually identical to that in cartilage, muscle, or bone. If each cell has the same genes inside, then the differences among different cells come from which genes are active making proteins. The kinds of switches that Jacob and Monod discovered become essential to understanding how the genome builds different cells, tissues, and bodies.
If the genome is thought of as a recipe, then genes code for the ingredients, and the switches contain the instructions about when and where to add each ingredient. If 2 percent of the genome is made up of genes that make proteins, then part of that other 98 percent contains the information that tells genes when and where to be active.
When a genetic switch is flipped, usually by proteins attaching to it, a gene becomes active and makes a protein.
But how does the genome build a body? How does it produce changes to species in the history of life? Nobody knew it at the time of the Human Genome Project, but the small number of genes and their rarity in the genome were only the tip of the iceberg of surprises to come.
Sailors once believed that six-toed cats could bring good luck on ships. These so-called mitten cats were thought to make better mousers because their broad paws could balance them while at sea. Stanley Dexter, a sea captain, had a litter of these cats and gave one to his pal Ernest Hemingway, who was living in Key West at the time. This kitten, Snow White, gave rise to a lineage of six-toed cats that thrives to this day at the Hemingway estate. Besides being a highlight for tourists, these cats have played a role in a new conception of the workings of the genome.
Hemingway cats, or mitten cats, have broad paws with six or more digits.
People, too, occasionally have extra fingers and toes. About one in every thousand people is born with an extra digit in the hand or foot. In an extreme case, in 2010, a boy in India was born with thirty-four digits. Extra fingers can appear on the thumb side or the pinky side, or in split and forked fingers. Additional digits on the thumb side, known as preaxial polydactyly, are particularly important biologically.
In the 1960s scientists working on chicken eggs were probing how wings and legs are made in the embryo during development. Limbs emerge from the embryo’s body as tiny buds, looking like small tubes. Over a few days—the number varies by species—the bud grows, bones begin to form, and the growing end becomes shaped like a broad paddle. Digits, wrists, and ankle bones form inside this expanded surface.
Scientists discovered that by removing or moving the cells inside the paddle area, they could tweak the number of digits that form. If they excised a small strip of tissue from the terminal end, development of the limb stopped. If they cut out this strip during early development, the embryo formed a limb with few digits or none at all. If they extracted the strip at slightly later stages, the embryo might lack only a single digit. The stage of development at which the experiment is done matters: early removal has more dramatic effects on the embryo than later removal.
John Saunders and Mary Gasseling from the University of Wisconsin, for reasons lost to time, extracted a tiny slice of tissue from the base of the growing paddle of a limb bud. This patch is nondescript; nothing about it looks unusual. It sits on the side of the paddle where the pinky will ultimately form. The researchers took this sliver of tissue, less than a millimeter long, and grafted it onto the opposite side of the limb bud, at the base of the paddle where the first digit would form. After sealing up the embryo in the egg, they let it complete development.
The embryo that emerged was a complete surprise. It looked like any normal chick, with a beak, feathers, and wings. But its wings, unlike normal wings with a pattern of three elongated fingers, had as many as six fingers. Something inside that little patch of cells contained instructions to make fingers.
Other labs soon got into the act. In the 1970s a group from England put tiny strips of tinfoil between the patch of tissue and the rest of the limb bud. The wings that emerged had fewer digits than normal. The foil served as a barrier between the patch and other cells. The implication is that some compound emanates from that patch of cells, diffuses across the developing limb, and stimulates digits to form. When a foil barrier stops that diffusion, fewer digits develop, and when the barrier is placed at a different point in the limb, more digits form. But what is the compound that is released?
In the early 1990s three laboratories, working independently, used new techniques to isolate the protein and the gene that makes it. The gene makes a protein during limb development that diffuses across the paddle of the limb bud. As it does so, the researchers found, it tells groups of cells which digits to form. High levels of the protein make a pinky, or fifth digit. Low levels make a first digit, or thumb. Intermediate levels form the digits in between. One of these groups of researchers named the gene Sonic hedgehog, a nod both to a gene known as hedgehog at work in other species and to a video game popular at the time.
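The idea that protein level determines digit identity can be sketched as a simple threshold rule. The Python example below is a minimal illustration, not the researchers' model: the function name `digit_for_level`, the 0-to-1 scale for the protein level, and the evenly spaced cutoffs are all assumptions made for the sake of the example.

```python
def digit_for_level(level: float) -> int:
    """Map a Sonic hedgehog protein level to a digit number.

    level: hypothetical concentration scaled from 0.0 (lowest) to 1.0 (highest).
    Returns 1 for a thumb (low level) up to 5 for a pinky (high level).
    The five equal-width bands are arbitrary, chosen only for illustration.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0.0 and 1.0")
    # Each band of width 0.2 corresponds to one digit; cap the top at 5.
    return min(int(level * 5) + 1, 5)


# Hypothetical usage: sample the gradient at three points across the paddle.
for level in (0.05, 0.5, 0.95):
    print(level, "-> digit", digit_for_level(level))
```

In this picture, anything that changes how far the protein spreads, such as a foil barrier or a misplaced patch of cells, changes which bands the cells fall into and therefore which digits form.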
But what tells the gene to make fewer or more digits? Are there switches at work for the Sonic hedgehog gene that influence the evolution of digits? Answering this question would be a key for understanding how genes build bodies and how they evolve.
As with most important moments in life and science, this story begins with an accident.
In the late 1990s a team of geneticists in London were inserting snippets of DNA into the genomes of mice to study brain formation. These fragments are part of a little molecular machine researchers make to attach to DNA and to serve as a marker for its activity. Every now and then something goes wrong with this kind of experiment. The fragment can land anywhere in the genome. If it lands in a biologically important part of the genome, a mutant can form. That’s what happened with this team’s experiment: some of the injected mice developed normal brains but had deformed fingers and toes. In fact, one of the mice had extra digits and very broad paws not unlike Hemingway’s mitten cats. The team was able to generate an entire family line of these mutants and, by scientific convention, give them a name. They called them Sasquatch, after the big-footed creature of the paranormal world.
Since their mutants were now useless for the study of brains, the team wondered if any limb biologists might be interested in them. They set up a poster at a scientific meeting announcing their results. Posters at conferences are sometimes thought to contain the B-list of scientific results, as the best ones get presented as talks. But posters also have a social element; people mill around and science gets discussed. It’s been my experience that more collaborations begin over posters than after talks.
The poster showed a type of polydactyly that was known to arise from a mutation in Sonic hedgehog: the extra fingers were on the thumb side. These mutations happen because Sonic hedgehog is turned on in the wrong side of the limb. So the obvious next step was to look at the activity of Sonic in the mutants, experiments that the team did to present in their poster. After they accidentally made the mutant, they looked at the tiny developing limbs under the microscope. The activity of Sonic in the mutants was abnormally expanded, just as you would have expected in this kind of polydactyly. These observations led to the hypothesis that the mutant Sasquatch had been produced by the snippet inserting into, or very near, the Sonic hedgehog gene.
The team didn’t attract a limb biologist to their poster, but Robert Hill, a distinguished geneticist at Edinburgh, randomly walked by and saw the photos of the Sasquatch mutant. From that, a new research program began.
Hill’s lab had gained renown for understanding the workings of the genome in eye development. Through that work, his team, including the young scientist Laura Lettice, had developed a toolkit to probe the genome to find fragments of DNA. Since they knew the DNA sequence of the snippet, they had to chug through the whole genome looking for where it ended up residing. Lettice was just starting her career and still quite green, but she had the patience and the skill set necessary to pull it off.
The team used a simple trick to identify the general location of the mutation on the strand of DNA. They attached a dye to a small molecule that was complementary to the piece of DNA that made the mutant. The idea was that this sequence would home in on the mutation, attach to it, and voilà, the dye would light up at that location. Since the mutation was affecting the activity of Sonic hedgehog, it was likely to be found in one of two places: in the gene itself or in the control region immediately adjacent to it, like the control regions Jacob and Monod had discovered in bacteria.
The reaction did not affect the gene of Sonic hedgehog. That area was not lit up by the dye. Whatever was affecting Sonic hedgehog in the limb, and causing polydactyly, wasn’t a mutation of the gene or, correspondingly, a change in its protein. The team concluded, as Jacob and Monod had, that one of the adjacent control regions was affected. But when they looked, they saw that this area was completely normal. So if neither the gene nor the adjacent switch was affected, what was the cause of the mutation?
As anybody who has ever tried to recover a model rocket on a windy day knows, you can waste a lot of time looking for something nearby when you should be looking really far afield. Hill, Lettice, and the team started trudging through the entire genome until they saw the signal. The snippet inserted was almost a million bases away from the Sonic hedgehog gene. That’s an enormous amount of genetic real estate between the site of the mutation and the site of the Sonic gene. Thinking they must be wrong, they repeated the process and reanalyzed the results. But try as they might, the result stood. A small region one million bases from the gene somehow controlled the activity of Sonic hedgehog. It was like finding the switch for a light in a living room in Philadelphia on a wall in a garage in suburban Boston.
Some genetic switches are located far from the gene they control. DNA is always looping, folding, and contorting to open and close, bringing switches back to the neighborhood of their gene to turn them on and make a protein.
Maybe changes to this remote site were the source of the extra digits? The team tracked down every six-fingered person or cat they could find—polydactylous patients in Holland, a child in Japan, even Hemingway’s cats—and examined their DNA. And in every single one, they found a slight mutation in that region one million bases away from the Sonic hedgehog gene. Somehow, a little mutation at the far end of the genome causes a change in the activity of Sonic, turning it on broadly across the limb, leading to additional fingers and toes.
While sequencing the pattern of As, Ts, Cs, and Gs in this special region, they found this stretch of DNA to be very distinctive. It is about fifteen hundred bases long, and its sequence is comparable among different creatures. People have the region in the exact same place as mice do, about one million bases away from the gene. So do frogs, lizards, and birds. It is present in everything with appendages, even in fish. Salmon have it, as do sharks. Every creature that has the Sonic hedgehog gene active in the development of its appendages, whether limbs or fins, has this control region almost one million bases away. Nature was telling scientists something important with this odd genomic arrangement.
At first glance, it is a wonder that polydactylous cats and people even survive to birth. Sonic hedgehog does not merely control limbs during embryonic development; it is a master gene controlling the development of the heart, spinal cord, brain, and genitals as well. Sonic is like a general tool that development pulls from its toolkit to make diverse organs and tissues. Accordingly, a mutation in the Sonic hedgehog gene should affect every structure where it is active; mutants would have deformed spinal cords, hearts, limbs, faces, and genitalia, among other organs. But what kind of animal would likely arise from a mutation in the Sonic hedgehog gene? Since so many aberrant tissues would likely be produced by a mutation in Sonic hedgehog, the answer would certainly be a dead one.
But the way Sonic hedgehog is controlled during development ensures this outcome doesn’t happen. Why? Mutations in the limb-control region only affect limbs. That’s why polydactylous people with this kind of Sonic hedgehog mutation have normal hearts, spinal cords, and other structures: the switch that controls the activity of the gene is specific only to a particular tissue, leaving the rest unaffected.
Imagine a house with many rooms, each with its own thermostat. A change to the furnace will affect the temperature in every single room, but changing a single thermostat will affect only the room it controls. The same relationship is true for genes and their control regions. Just as a change in the furnace will affect the entire house, an alteration in a gene, and the protein that is produced, can affect the entire body. A global change would be catastrophic, producing dead ends in evolution. But since the genetic control regions are specific to tissues, like a thermostat in a room, a change in one organ won’t affect any others. Mutants can be viable, and evolution can work.
Two kinds of genomic changes can play a role in evolutionary transformations. In the first, changes in genes can cause new proteins to form. A mutation in the sequence of As, Ts, Gs, and Cs in DNA could bring about a change in the amino acid chain that makes protein. If the DNA mutation causes a different amino acid to form along that string, then a new protein can be produced. This clearly happens in many of the major proteins of the body, such as the hemoglobin genes that Zuckerkandl and Pauling studied. The key point is that a change in a protein can affect the body everywhere that protein is found.
The second type of genomic change can occur in the switches that control the activity of genes. After seeing Bob Hill’s work, a lab in Berkeley wanted to find out whether the Sonic hedgehog switch was involved in limb evolution. They started with snakes, since they lack limbs altogether. When the region of the genome that holds the switch was removed from a snake and placed inside a mouse, the mouse’s limbs failed to form digits. Over time it appears that snakes acquired mutations in the switch that controls their ability to form limbs. The Sonic hedgehog protein in snakes is completely normal, as are their hearts, spinal cords, and brains. The change to the switch active in limbs meant that only the activity of Sonic in limbs changed.
This genetic trick holds the clues to general mechanisms for revolution in evolution. If the past decade and a half of research is any indicator, changes in the switches that control gene activity are behind major shifts in evolution of the bodies of vertebrates and invertebrates for organs as different as skulls, limbs, fins, fly wings, and worm bodies, among many others. In case after case, evolutionary transformations are less about changes in the genes themselves than in when and where they are active in development.
David Kingsley, a geneticist at Stanford, spent nearly two decades studying the tiny threespine stickleback, a fish that lives in oceans and streams around the world. Sticklebacks come in a variety of shapes: some have four fins, others two, and still others show different body shapes and color patterns. This diversity makes the stickleback a powerful system in which to explore how genetic changes can make fish different from one another. Using genomic technology, Kingsley has been able to show the exact regions of DNA that underlie most of these changes. Virtually every one is a switch that controls gene activity. The fish with only two fins has a gene with dramatically altered activity that inhibits the activity of a gene necessary for the development of the hind fin. He showed that the change was not to the gene but to the switch that controls the activity of the gene. Guess what happened when he took the switch from a fish that has four fins and put it into the ones that normally have only two? He resuscitated hind fins by making a four-finned mutant from two-finned parents.
We now have the technology to scan the entire genome to see where genes and their control regions reside. Control regions lie everywhere in the genome; some are close to the gene, while others, such as those for Sonic hedgehog, are far away. Some genes may have many control regions influencing their activity, others only one. However many there are, and wherever they may lie in the genome, there is an elegance, indeed a mystery, to how this molecular machine works.
New microscopes that allow us to see DNA molecules themselves also let us see what happens as genes turn on and off.
For a gene to become active, a molecular game of Twister needs to happen. Inactive regions of the genome are tightly coiled upon themselves, bundled around other small molecules to fit inside the nucleus. These regions are closed off and so are relatively inert. Before a region of the genome can become active, it needs to uncoil and open itself up to make a protein.
These are only the first steps in a finely choreographed dance that turns genes on and off. For a gene to activate, its switch needs to contact other molecules and attach to an area adjacent to the gene itself. These attachments trigger the gene to make a protein. In the case of Sonic hedgehog, the switch needs to fold a very long distance to initiate the activity of the gene. So here are the full steps of the dance that goes on when genes turn on: the genome opens, revealing the gene and its control region, parts attach, and a protein is made. This happens in every cell, with every protein.
A six-foot-long string of DNA is coiled until it is smaller than the size of the head of a pin. Conjure the image of it opening and closing in microseconds, writhing and turning to activate thousands of genes every second. From the moment of conception and throughout our adult lives, our genes are continually being switched on and off. We begin as a single cell. Over time, cells multiply, while batteries of genes are activated to control their behavior to form the tissues and organs of our bodies. As I write this book, and as you read it, genes are switching on in all four trillion of our cells. DNA contains many supercomputers’ worth of computing power. With these instructions, a relatively small parts list of twenty thousand genes can build and maintain the complex bodies of worms, flies, and people using control regions spread across the genome. Changes to this incredibly complex and dynamic machine underlie the evolution of every creature on Earth. Always coiling, uncoiling, and folding, our DNA is like an acrobatic maestro, a conductor of development and evolution.
This new science speaks to Mary-Claire King’s struggles to find differences between human and chimp proteins four decades ago. She and Allan Wilson foresaw the importance of genetic switches in the title of their 1975 paper, “Evolution at Two Levels in Humans and Chimpanzees.” One level was at the genes, the other at the mechanisms that control when and where genes are active. Major differences between humans and chimpanzees lie not in the structure of their genes and proteins but in the switches that control how they do their jobs during development. Seen in this way, the gulf between creatures that look as different as humans and chimpanzees, or worms and fish, becomes smaller at the genetic level. If a protein controls the timing or pattern of a developmental process, then changes to when and where that protein is active can have big effects on the bodies of adults.
Changes to the switches that control gene activity can affect embryos and evolution in a myriad of ways. If, for example, proteins that control brain development are turned on for a longer duration or in different places, the result can be larger and more complex brains. Tweaking the activity of genes can bring about new types of cells, tissues, and, as we’ll see, bodies.