VIII.7

Directed Evolution

Erik Quandt and Andrew D. Ellington

OUTLINE

  1. Directed evolution of nucleic acids

  2. Directed evolution of proteins

  3. Directed evolution of cells

  4. The future of directed evolution

Directed evolution is a process in which scientists perform experiments that use selection to push molecular or cellular systems toward some goal or outcome of interest. The objectives of this work include the production of substances of value and improved understanding of the evolutionary process. Elucidating the precise mechanisms by which improvements occur is often of particular interest. In general, directed evolution requires a genetic system in which information is encoded, heritable, and mutable; a means for selecting among variants based on differences in their functional capacities; and the ability to amplify those molecules or organisms that have been selected.

GLOSSARY

Aptamer. A short nucleic acid molecule that binds to a specific target molecule.

Bacteriophage. A virus that infects bacteria.

Esterase. An enzyme that splits esters into an acid and an alcohol in a chemical reaction with water, also known as hydrolysis.

Fluorescence-Activated Cell Sorting (FACS). A method for sorting a heterogeneous mixture of cells into two or more containers, one cell at a time, based on the specific light scattering and fluorescent characteristics of each cell.

Lipase. An enzyme that catalyzes the hydrolysis of ester chemical bonds of lipid substrates.

Messenger RNA (mRNA). The product of transcription of a DNA template, which in turn encodes the sequence for the production of a protein.

Peptide. A short polymer of amino acids linked by peptide bonds.

Polymerase Chain Reaction (PCR). Enzymatic reaction in which a small number of DNA molecules can be amplified into many copies.

Protease. A protein capable of hydrolyzing (breaking) a peptide bond.

Quasispecies. A large group of related genotypes that exist in a population that experiences a high mutation rate, in which a large fraction of offspring are expected to contain one or more mutations relative to the parent.

Ribozyme. An RNA molecule with a defined tertiary structure that enables it to catalyze a chemical reaction.

Transcription. The process of creating a complementary RNA copy (mRNA) of a sequence of DNA.

Transcription Factor. A protein that binds to specific DNA sequences, thereby affecting the transcription of genetic information from DNA to mRNA.

Transfer RNA (tRNA). An RNA molecule linked to an amino acid that is involved in the translation of an mRNA transcript into protein.

Translation. The process of decoding an mRNA molecule into a polypeptide chain.

Tumor Necrosis Factor-α (TNF-α). A protein involved in the regulation of certain immune cells that induces inflammation and cell death and thereby inhibits tumorigenesis and viral replication.

Directed evolution involves guiding the natural selection of molecular or cellular systems toward some goal of interest. That goal may be either to enhance basic understanding of natural systems or to produce something of value. In either case, this approach relies on changing the frequency of genotypes over time, with concomitant changes in function and phenotype, just as natural evolution does. However, from the point of view of an “evolutionary engineer” the mechanism by which changes in frequency are obtained is often of particular interest. In general, directed evolution requires a genetic system (a system in which information is encoded, heritable, and mutable), a means for sieving the variants that are present in that system (by differences in either function or fitness), and the ability to amplify those molecules or organisms that pass through the sieve.

The directed evolution of molecules and cells has been carried out for decades, although the methods used for directed evolution have gained in technical sophistication and in the breadth of systems that can be tamed. Indeed, if one includes animal and plant husbandry, then directed evolution has been coincident with the evolution of human society. This chapter first discusses molecular evolution, focusing on the selection of nucleic acids and proteins that have novel functions. The simple rule set for directed evolution described can be satisfied in a surprisingly large variety of ways, including in molecular systems that at first glance appear to have the properties of cells but that are not actually cellular. The chapter then considers how the evolution of cells can be directed and accelerated.

1. DIRECTED EVOLUTION OF NUCLEIC ACIDS

The forefather of the directed evolution of molecules was Sol Spiegelman of the University of Illinois at Urbana-Champaign. Spiegelman and his group studied a small bacteriophage, called Qbeta, that has an RNA genome. In nature, this virus infects host cells of the bacterium Escherichia coli to replicate. However, Spiegelman’s team found that the protein involved in replicating this bacteriophage, Qbeta replicase, was capable on its own of replicating RNA molecules in a test tube. The only requirements for replication were an initial RNA template, the replicase, nucleoside triphosphates, and appropriate buffer conditions. However, once freed from the confines of a cell, the Qbeta replicase tended to make multiple mutations and deletions in the RNA template, ultimately leading to smaller and smaller RNA molecules. The evolutionary fates of these so-called minimonster variants could be altered depending on the experimental conditions (Saffhill et al. 1970). Spiegelman’s demonstration was highly influential and became an icon for the field. However, Qbeta replicase proved too difficult to control, since any successful variants that arose were transient and quickly mutated into a complex and ever-shifting quasispecies. Thus other methods and systems were required to advance the study of directed evolution.

The modern era of directed molecular evolution had to await the development of several technologies, chief among them the chemical synthesis of DNA and the advent of the polymerase chain reaction (PCR). Chemical DNA synthesis allowed a defined, yet random, pool of nucleic acids to be generated, providing the perfect substrate for directed evolution experiments. If constant sequence regions were included at the termini of this pool, then its members could be exponentially amplified by the PCR. All that remained to ensure the selection of functional nucleic acids was to impose some sort of selection that would differentiate the members of the pool from one another, a feat that was performed by Jack Szostak and coworkers (Ellington and Szostak 1990). Each of the different sequences in the pool can fold into a different shape; each of the different shapes therefore potentially has a different function or phenotype. One of the first selections from a random sequence pool involved identifying single-stranded DNA molecules that could bind to a particular molecular dye. The nucleic acid pool was poured down a column containing immobilized dye molecules; some variants stuck to these molecules, while most of the population flowed through. Once the bound variants were eluted (by unfolding the nucleic acids) they were amplified by the PCR, and single strands were prepared from the double-stranded product. Iterative cycles of selection and amplification resulted in the gradual accumulation of those molecules that had high affinity for a given dye (plate 4). One analogy that is often used to describe these experiments is that they are akin to looking for a needle in a haystack; every time you grab a handful of hay that contains a single needle, you then convert that needle into a handful of needles. 
Given that the hay outnumbered the needles in the experiment described by a factor of almost 10¹⁰:1, iterative selection and amplification were essential to the purification of the needle.
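The arithmetic behind this needle-in-a-haystack purification can be sketched with a short calculation. The retention rates below are hypothetical, chosen only to illustrate how iterated rounds of selection and amplification enrich rare binders:

```python
# Sketch of iterative selection-and-amplification enrichment.
# All parameters are assumed for illustration, not taken from the
# original experiments.
pool_size = 10**14      # sequences in the starting pool
binders = 10**4         # rare functional "needles" (~1 in 10^10)
p_binder = 0.50         # fraction of binders retained on the column per round
p_background = 1e-4     # fraction of non-binders retained per round

frac = binders / pool_size
for rnd in range(1, 11):
    retained_binders = frac * p_binder
    retained_background = (1 - frac) * p_background
    # PCR amplification restores pool numbers but leaves the ratio intact.
    frac = retained_binders / (retained_binders + retained_background)
    print(f"round {rnd}: binder fraction = {frac:.3g}")
```

With these assumed rates, each round multiplies the binder-to-background ratio by p_binder/p_background = 5,000, so only a few rounds are needed before the binders dominate the pool.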

The selection of nucleic acids that can bind to ligands became known as in vitro selection, and the binding sequences that result from this approach are called aptamers (from the Latin aptus, “to fit”). In both of the experiments described so far, the visions of the experimenters were driven not so much by specific hypotheses as by the availability of the technology and a desire for unfettered exploration. Spiegelman set off into the unknown to determine whether viral replication in the test tube (outside the host cell) was even possible, whereas Szostak took the great leap that there would be at least a few, previously unknown, shapes in the haystack that could bind to a dye. However, these technological innovations also ended up providing some support for the hypothesis that life may have got its start from self-replicating nucleic acids. Along with the discovery by Tom Cech and Sidney Altman that RNA could act as a catalyst (North 1989), the finding that binding variants of nucleic acids could be selected from random strings of information implied that nucleic acids might avoid the “chicken-and-egg problem” with respect to the origin of living systems. That is, proteins are the functional machines, while nucleic acids bear information. Proteins are needed to replicate, while nucleic acids must be replicated. It seemed unlikely that both these complex biopolymers arose simultaneously, and this problem was a huge conundrum for thinking about origins. However, once it was clear that nucleic acids were in fact “chicken-eggs” (being both functional machines and information bearing), this conundrum was deftly resolved.

Around the same time that the antidye aptamers were being generated, Larry Gold and Craig Tuerk at the University of Colorado at Boulder showed that protein-binding nucleic acids could be selected from random sequence pools (Tuerk and Gold 1990). The same technology that might contribute to explaining life’s origins therefore could also be used to create new drugs. Probably the most famous aptamer produced by directed evolution is known as Macugen. It was selected to bind to and thereby inhibit the function of the human protein vascular endothelial growth factor (Ng et al. 2006). This experimentally evolved RNA molecule is now used clinically to combat wet macular degeneration, a common cause of blindness in the elderly.

Catalytic nucleic acids can also be selected from random sequence pools. One fascinating and important ribozyme is known as the Class I Bartel ligase, after David Bartel, who discovered it (Bartel and Szostak 1993). This ribozyme has been of seminal importance in understanding the origin and evolution of life. The Bartel ligase seems, on first impression, to be a miracle. The probability of selecting it from a random sequence pool can be calculated, and it turns out that it should have been found once every 10,000 times that Bartel carried out his directed evolution experiment. Although it is possible that Bartel was extraordinarily lucky, the alternative and probably better explanation is that while any particular ligase was unlikely to have evolved, the ligase function itself was much more likely to have arisen. Thus, if the experiment were to be carried out again, a molecule of similar functionality and complexity—but with an entirely different sequence—would emerge and be selected. This finding is extremely important because it means that there are many possible routes from origins to modern, complex systems. As the late paleontologist Stephen Jay Gould suggested, if we were to run the tape of life again, we’d probably get a very different answer.
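A toy calculation makes the point concrete. The chapter gives only the one-in-10,000 figure for any particular ligase; the number of distinct functional solutions below is an invented assumption used to show why *some* ligase was nonetheless likely to be found:

```python
# Hypothetical numbers for illustration only.
p_specific = 1e-4      # chance that one particular ligase sequence is sampled
n_solutions = 50_000   # assumed number of distinct sequences with ligase activity

# If the distinct solutions are sampled independently, the chance that
# a single experiment recovers at least one of them is:
p_any_ligase = 1 - (1 - p_specific) ** n_solutions
print(f"P(this particular ligase) = {p_specific}")
print(f"P(some ligase)            = {p_any_ligase:.4f}")
```

Under these assumptions, an individual solution appears once in 10,000 experiments, yet the function as a whole is recovered in almost every experiment.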

2. DIRECTED EVOLUTION OF PROTEINS

It has also proven possible to direct the evolution of proteins. However, in this instance genotype and phenotype are not embedded in the selfsame molecule—that is, they are not chicken-eggs in the way that some nucleic acids are. Therefore, there must be some other way to connect genotype and phenotype to perform directed evolution on proteins. One of the first ways this was done was by appending a short random library to the gene that encodes a coat protein of a bacteriophage, such that the library of peptides would then be expressed on the surface of the bacteriophage. The displayed peptide variants could then be selected on the basis of their ability to bind a ligand, as with nucleic acids, and amplified not by the PCR but by passage through cells that could be infected by the selected bacteriophage (Smith and Petrenko 1997) (plate 5). Such phage display methods have become very popular and are the basis for selecting not just peptides but antibodies and enzymes, as well. For example, Humira, an antibody drug effective in the treatment of rheumatoid arthritis, was selected via phage display. By displaying an antibody library against a protein involved in inflammation response (TNF-α), researchers were able to select and amplify high-affinity antibodies that could bind to the protein.

The same methods were later expanded to cell surfaces, so that individual cells now are the vehicle connecting protein variants on the surface to the genes encoding those proteins. As with the phage, the proteins on the cell surface could be selected for either binding or catalysis. For example, George Georgiou and coworkers showed that libraries of proteases expressed on the surface of E. coli could be selected for altered substrate specificity (Varadarajan et al. 2005). Cleavage of a new desired “green” substrate by a given protease variant led to the accumulation of that color on the surface of the bacteria, while there was a parallel opportunity to cleave and accumulate a parental undesired “red” substrate. Both positive and negative selections could thus be applied to tune substrate specificity. The cells expressing protease variants with altered, desired specificities (colored green, but not red) were screened from the protease libraries using a technique known as fluorescence-activated cell sorting (FACS), in which individual cells are sorted based on their fluorescence. The selected variants were further amplified by bacterial growth, and multiple cycles of screening and amplification led to the winnowing of the initial population to those few proteases with the desired new specificities. The ability to tune protease specificities may someday have practical applications, such as destroying undesirable proteins, including viral proteins.

The selection of proteins inside cells is similar to cell-surface display, except that instead of selecting directly for the ability of individual proteins to bind or catalyze reactions, researchers must instead select for the impact of binding or catalysis on cellular phenotypes. For example, some antibiotic resistance genes encode enzymes that modify antibiotics in some way. New resistance functions can be selected by challenging cells with a different antibiotic. For cells to survive, the resistance element must accumulate mutations that change its function. The challenge is often to make sure that selection is focused on a particular enzyme of interest and that the cell does not follow some other evolutionary pathway that leads to survival (i.e., a different cellular enzyme than the one of interest might mutate in a way that leads to antibiotic resistance). To focus selection on a particular enzyme, user-mutagenized libraries of enzymes can be generated, just as they were for the ribozyme and protease selections described earlier. There are various ways to both mutagenize and select a given library. In the 1990s, Pim Stemmer came up with a brilliant technique to speed the evolution of proteins by allowing for in vitro recombination (Stemmer 1994). In this method, known as DNA shuffling, different enzyme variants are selected from a library (or may otherwise already be present). By cutting the genes for the enzymes into pieces, and then using PCR to recombine and eventually reassemble them into the full-length gene, many mutations can be brought together in the same gene. This approach is much faster than natural recombination, which generally involves only two gene copies at a time. In either case, the interesting assumption is made that combinations (or at least some combinations) of favorable mutations will themselves be favorable. This assumption has in large measure turned out to be true, perhaps because genes have evolved to evolve. 
That is, the types of protein sequences and structures amenable to recombination are those that have been successful at responding to changing conditions during the long course of evolution.
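The logic of DNA shuffling can be caricatured in a few lines of code: fragments from several parent variants are recombined into a chimeric offspring, bringing together mutations that arose separately. The sequences and fragment size below are invented for illustration and are far shorter than real genes:

```python
import random

# Toy sketch of DNA shuffling: fragment several parent gene variants,
# then reassemble a chimera by drawing each segment from a randomly
# chosen parent, mimicking template switching during reassembly PCR.
random.seed(1)

parents = [
    "AAGCTTAAGG",   # parent 1, carrying its own point mutations
    "ATGCTAAAGG",   # parent 2
    "AAGCATAACG",   # parent 3
]
fragment = 2  # fragment length used for reassembly

def shuffle_once(parents, fragment):
    length = len(parents[0])
    # Each segment of the chimera is copied from a random parent.
    return "".join(
        random.choice(parents)[i:i + fragment]
        for i in range(0, length, fragment)
    )

chimera = shuffle_once(parents, fragment)
print(chimera)
```

A single chimera can thus combine mutations from all three parents at once, whereas a natural cross recombines only two genomes at a time.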

The methods so far described all require living cells either to make phage or to express proteins. Other researchers have devised clever ways to carry out protein evolution even without cells, by using in vitro transcription and translation systems that contain all the components necessary to make proteins, including ribosomes and tRNAs. For example, Andreas Pluckthun and his group managed to stall ribosomes in the process of translating mRNAs and thereby could connect the mRNA information being read with the protein function being translated (Hanes and Pluckthun 1997). Similarly, Jack Szostak’s group figured out how to use the antibiotic puromycin to covalently couple a protein being translated on the ribosome to the mRNA making that protein (Roberts and Szostak 1997). As with natural selection and the various schemes described earlier, the coupling of genetically encoded information with function is essential for sieving through large sequence libraries to find rare functions. One of the advantages of these ribosome-based methods is that they can be used to look through much larger sequence populations than cell-based methods.

Researchers have also begun to create cell-like bubbles to assist with directed evolution (Griffiths and Tawfik 2006). Water-in-oil emulsions can be created by simply mixing these two components and shaking them (much like making salad dressing). If in vitro transcription and translation components are added to the aqueous component, then each small aqueous bubble in the sea of oil will be capable of making proteins. If only one DNA template or mRNA molecule is captured per bubble, then only one type of protein will be made in that bubble. The problem then becomes how to capture the bubble making the protein variant of interest. One solution is to have the protein feed back on the nucleic acid that produced it, and various methods have been developed that either mark the nucleic acid (for example, by methylation), amplify the nucleic acid (via a translated polymerase), or capture the nucleic acid (via a binding protein). Once the mixture is demulsified, all the nucleic acids are remixed together, but only those that have encoded a functional protein are marked, amplified, or bound, and they can be carried into subsequent rounds of selection and amplification. Another solution to the problem of connecting the appropriate bubble with the phenotype of interest has been to develop methods to capture the bubble itself. Adding a lipid coat to the aqueous bubbles allows them to be stabilized and sorted by FACS. Thus, no direct feedback loop to nucleic acids is required, which simplifies the procedures and greatly expands what kinds of enzymes can be selected. For example, lipases, which act on ester substrates, can turn over fluorescent substrates, so that more-active lipase variants will accumulate more fluorescence in their bubbles, which can in turn be sieved from a larger background population (Griffiths and Tawfik 2003).
As with all the other techniques described, amplification in vitro provides a selective advantage to the functional lipases by increasing their representation in the next generation.
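Capturing "only one template per bubble" is, at bottom, a dilution-statistics problem: template counts per droplet follow a Poisson distribution. The mean occupancy used below is an assumed value, not one taken from the studies cited:

```python
import math

# Droplet occupancy in a water-in-oil emulsion is Poisson-distributed.
# Loading at a low mean templates-per-droplet keeps most occupied
# droplets to a single template, preserving the genotype-phenotype link.
lam = 0.3   # assumed average templates per droplet after dilution

p_empty = math.exp(-lam)               # P(0 templates)
p_single = lam * math.exp(-lam)        # P(exactly 1 template)
p_multi = 1 - p_empty - p_single       # P(2 or more templates)

print(f"empty: {p_empty:.3f}, single: {p_single:.3f}, multiple: {p_multi:.3f}")

# Among occupied droplets, the fraction with exactly one template:
frac_single = p_single / (1 - p_empty)
print(f"single among occupied: {frac_single:.3f}")
```

The trade-off is between purity and throughput: diluting further raises the single-template fraction but wastes more empty droplets.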

3. DIRECTED EVOLUTION OF CELLS

The directed evolution of whole cells can yield both the simplest and most complex products. For example, the adaptation of cells to ferment beer (by producing ethanol) and help with baking (by producing carbon dioxide) is almost as old as human society, and researchers from Louis Pasteur onward have bred cells for industrial purposes. In 1928 Alexander Fleming discovered that the antibiotic penicillin was made by the fungus Penicillium notatum. While this discovery would in time change the world of medicine, further engineering was necessary. Strong demand for the drug during the Second World War necessitated increased production beyond what the organism naturally produced. To direct the evolution of the organism toward greater antibiotic production, cultures were subjected to X-ray radiation, which was known to mutagenize DNA, thereby accelerating the accumulation of genetic diversity. This mutagenized population was then screened for those cells that produced the most penicillin (Backus, Stauffer, and Johnson 1946), and these selected cultures produced enough penicillin to meet the demand. This use of random mutagenesis followed by screening for a desired phenotype is commonly referred to as strain improvement and is now a common practice in many industries in which the production of a particular product is dependent on microbial synthesis.

Experiments that could support these sorts of applications were initiated by Barry Hall, of the University of Rochester. Hall (2003) wondered what would happen if an enzyme of E. coli was deleted. This enzyme, β-galactosidase, was responsible for the ability of these cells to grow on the sugar lactose. When the enzyme was deleted, the cells could not grow on lactose, at least not initially. Over time, however, the cells began to grow slowly and, eventually, evolved to grow more rapidly, because another gene in the organism had accumulated mutations that allowed it to break down lactose. By focusing selective pressure on one function, Hall turned the entire organism into a vehicle for finding and improving a suitable enzyme.

While research into the directed evolution of cells led to a better understanding of the source, rate, and type of mutations that could lead to new or modified proteins, it was still difficult to target mutations and selection within a genome, especially for complex phenotypes like antibiotic production that required the adaptation of multiple genes and enzymes in parallel. Going well beyond Pasteur and his contemporaries required the development of tools that can recombine and modify individual sites in a genome, and advances in molecular biology that allow entire genomes to be sequenced. These modern techniques have had a dramatic effect on the time and effort required to generate improved bacterial strains.

Just as protein shuffling was developed to facilitate the accumulation of favorable mutations, other methods have been developed for shuffling entire genomes. Following a single round of classical strain improvement (random mutagenesis and screening or selection), cells are stripped of their cell walls (turned into protoplasts) and then induced to fuse with one another in what is essentially a multiparent mating event. Because a cell can generally accommodate only one genome, the genetic material from fused cells must be resolved into a single unit by the cell’s recombination and repair machinery. The recombination process generates mosaic genomes, thereby amplifying the population’s genetic diversity. This process can be repeated multiple times, allowing the improved strain to accumulate many additive and even synergistic mutations with respect to the desired phenotype. In a striking example of the power of this method, a group improved the production of the antibiotic tylosin by Streptomyces fradiae by about ninefold after only two rounds of genome shuffling that took roughly one year (Zhang et al. 2002). The resulting strains were found to produce as much tylosin as strains that had been independently subjected to 20 rounds of classical strain improvement over 20 years.

Other efforts have focused on reprogramming the cell as a whole by mutating the master regulators that the cell uses to control how and when proteins are made. Changes to these transcription factors simultaneously affect the levels of expression for many genes, thereby altering the levels of many proteins and broadly affecting how the cell operates and behaves. The altered regulatory program is presumably not optimal for the organism in its natural context but may be much more productive in an industrial setting. By screening mutant libraries of transcription factors, researchers have isolated strains with increased tolerance to industrial processes and by-products as well as enhanced the production of small molecules.

While genome shuffling and transcription-factor engineering can accelerate evolution, these processes still rely on nondirected changes in sequence or expression as their inputs. To produce a revolution in genome engineering on the scale of that already occurring in protein engineering, it was necessary to direct mutations to particular genes, an achievement that George Church and coworkers published as multiplex automated genome engineering, or MAGE (Wang et al. 2009) (plate 6). This technique involves synthesizing single-stranded DNA oligonucleotides corresponding to particular genomic locations. The oligonucleotides can enter the cells, and mutations engineered into the oligonucleotides may then be incorporated into the genome through recombination. By using an automated system to grow cells and deliver the mutant oligonucleotides, the researchers were able to achieve a high rate of mutation at several genomic sites in only a few cycles. This ability to target mutations to specific genomic locations enables researchers to precisely manipulate cells in ways that could be used to optimize the production of proteins or other molecules for industry, or even to rewrite the genetic code at large.
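Why repeated MAGE cycles drive an engineered mutation through a population can be seen with simple compounding arithmetic. The per-cycle replacement frequency used here is an assumption for illustration, not a measured value from Wang et al.:

```python
# Illustrative MAGE-style compounding: each automated cycle converts a
# target site in a fraction r of the cells (r is an assumed value).
r = 0.2
for cycles in (1, 5, 10, 15):
    converted = 1 - (1 - r) ** cycles
    print(f"after {cycles:2d} cycles: {converted:.1%} of cells carry the edit")
```

Because the unconverted fraction shrinks geometrically, even a modest per-cycle efficiency drives the edit to near fixation within the "few cycles" the automated system can run, and multiple sites can be targeted in parallel with a pool of oligonucleotides.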

These amazing advances in genomic engineering require concomitantly amazing new analytical tools. With the development of so-called next-generation sequencing platforms that generate massive amounts of sequence data, the directed evolution of a microbial population can now be observed at the level of individual sequence changes within entire genomes. In 1988 Richard Lenski established his “long-term evolution experiment” by inoculating a strain of E. coli into minimal media and subsequently transferring a small amount of the previous culture into fresh media each day. Over time, the cells have become adapted to these environmental conditions, and their growth rate has accelerated. The genomes of the evolved organisms from various points through 40,000 generations were sequenced to determine the many underlying mutations responsible for this adaptation (Barrick et al. 2009). As the genomic engineering techniques described earlier are increasingly melded with large-scale acquisition of sequence data, the ability to understand and shape genomes will become commonplace.

4. THE FUTURE OF DIRECTED EVOLUTION

For the pioneers of directed evolution, including Spiegelman and Hall, the experimenter was in charge of directing the selection, whether it was in a population of molecules or organisms. These researchers had to take whatever random mutations the experimental system provided and then study which of these mutations led to changes and functions of interest. This process remains characteristic of the field of directed evolution, since many researchers are still addressing basic questions such as, What outcomes can be produced? What can we make? However, the field is now also beginning to develop models that are predictive, in part based on the paired abilities to direct where mutations occur and to analyze the repertoire of phenotypes associated with vast numbers of mutations. This process will accelerate as protein-structure analysis and prediction provide an understanding of why some mutations work and some do not, and as the field of systems biology begins to more precisely define cellular states. In turn, the shift from phenomenology to quantification and prediction promises to be one of the most exciting aspects of evolutionary biology in the future, and it should ultimately yield a mature discipline of evolutionary engineering.

See also chapter III.3, chapter III.6, chapter VIII.3, chapter VIII.5, and chapter VIII.8.

FURTHER READING

Backus, M. P., J. F. Stauffer, and M. J. Johnson. 1946. Penicillin yields from new mold strains. Journal of the American Chemical Society 68: 152.

Barrick, J. E., D. S. Yu, S. H. Yoon, H. Jeong, T. K. Oh, D. Schneider, R. E. Lenski, and J. F. Kim. 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461 (7268): 1243–1247. A single E. coli strain was evolved over thousands of generations. The analysis of this evolutionary path continues to illuminate how bacterial genomes can adapt to new conditions.

Bartel, D. P., and J. W. Szostak. 1993. Isolation of new ribozymes from a large pool of random sequences. Science 261 (5127): 1411–1418. The first, surprising isolation of a large and complex ribozyme from a completely random sequence pool. The Class I Bartel ligase is to this day one of the fastest and most complex ribozymes ever discovered.

Ellington, A. D., and J. W. Szostak. 1990. In vitro selection of RNA molecules that bind specific ligands. Nature 346 (6287): 818–822. One of the first demonstrations that complex RNA molecules could be isolated from a completely random pool. The selected phenotype, specific dye binding, was unexpected.

Griffiths, A. D., and D. S. Tawfik. 2003. Directed evolution of an extremely fast phosphotriesterase by in vitro compartmentalization. EMBO Journal 22 (1): 24–35.

Hall, B. G. 2003. The EBG system of E. coli: Origin and evolution of a novel beta-galactosidase for the metabolism of lactose. Genetica 118 (2–3): 143–156. While strain improvement by directed evolution was known, Hall revealed the precise molecular underpinnings of the evolution of lactose utilization. Otherwise cryptic proteins in the E. coli genome can evolve novel functions.

Hanes, J., and A. Pluckthun. 1997. In vitro selection and evolution of functional proteins by using ribosome display. Proceedings of the National Academy of Sciences USA 94 (10): 4937–4942.

Roberts, R. W., and J. W. Szostak. 1997. RNA-peptide fusions for the in vitro selection of peptides and proteins. Proceedings of the National Academy of Sciences USA 94 (23): 12297–12302.

Saffhill, R., H. Schneider-Bernloehr, L. E. Orgel, and S. Spiegelman. 1970. In vitro selection of bacteriophage Q-beta ribonucleic acid variants resistant to ethidium bromide. Journal of Molecular Biology 51 (3): 531–539. Even before the advent of modern molecular biology techniques such as DNA synthesis and sequencing, Saffhill proved it possible to evolve phage RNAs in vitro for novel phenotypes.

Smith, G. P., and V. A. Petrenko. 1997. Phage Display. Chemical Reviews 97 (2): 391–410.

Stemmer, W. P. 1994. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370 (6488): 389–391. Pim Stemmer and coworkers developed a radical new technique for in vitro recombination and consequently demonstrated remarkable enhancements in the speed of directed evolution.

Tuerk, C., and L. Gold. 1990. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249 (4968): 505–510.

Wang, H. H., F. J. Isaacs, P. A. Carr, Z. Z. Sun, G. Xu, C. R. Forest, and G. M. Church. 2009. Programming cells by multiplex genome engineering and accelerated evolution. Nature 460 (7257): 894–898. Until the development of the technique known as MAGE, it was impossible to carry out multiple, site-directed alterations to a bacterial genome in parallel. This technique and others like it should continue to accelerate classic methods for the directed evolution of bacteria.

Zhang, Y. X., K. Perry, V. A. Vinci, K. Powell, W. P. Stemmer, and S. B. del Cardayré. 2002. Genome shuffling leads to rapid phenotypic improvement in bacteria. Nature 415 (6872): 644–646.