V.7

Evolution of Gene Expression

Patricia J. Wittkopp

OUTLINE

  1. The importance of regulatory evolution: A historical perspective

  2. Finding expression differences within and between species

  3. Genomic sources of regulatory evolution

  4. Enhancer evolution

  5. Evolution of transcription factors and transcription factor binding

  6. Evolutionary forces responsible for expression divergence

Genetic changes affecting either the function or regulation of a gene product can contribute to phenotypic evolution. Studies of evolutionary mechanisms have historically focused on changes in protein-coding sequences, but during the last decade, multiple lines of evidence have shown that changes in gene expression are at least equally important. The last few years have brought great progress in understanding the genetic basis of expression differences within and between species. From a growing collection of single-gene case studies and comparative analyses of gene expression on a genomic scale, common themes and patterns in regulatory evolution have begun to emerge.

GLOSSARY

Chromatin. The higher-order complex of DNA, histones, and other proteins that packages nuclear DNA within a eukaryotic cell.

Chromatin Immunoprecipitation. A technique in which transcription factors are cross-linked to DNA, the DNA is sheared, and fragments bound by a specific transcription factor of interest are isolated using an antibody against that factor. The identity of the isolated DNA fragments can be assessed by PCR, microarrays, or sequencing.

Cis-Regulatory Element. A DNA sequence (such as an enhancer or promoter) that is located near the coding region of a gene and has allele-specific effects on that gene's expression.

Co-option. Using existing functional parts of a genome for new purposes.

Ectopic Expression. Expression in cells that do not usually express the gene of interest.

Orthologous Genes. Homologous genes that diverged following a speciation event.

Pleiotropy. The phenomenon in which a single mutation or gene affects more than one phenotype.

Quantitative Trait Locus (QTL). A region of the genome shown to influence a (quantitative) phenotype of interest.

RNA Interference (RNAi). A technique in which short RNAs are used to block production of the protein encoded by a gene of interest.

Transcription Factor. A protein that binds to DNA in a sequence-specific manner and affects transcription.

1. THE IMPORTANCE OF REGULATORY EVOLUTION: A HISTORICAL PERSPECTIVE

For most of the twentieth century, conventional wisdom among biologists was that, as François Jacob described it, “cows had cow molecules and goats had goat molecules and snakes had snake molecules, and it was because they were made of cow molecules that a cow was a cow.” Near the end of the twentieth century, however, it became clear that this was not the case. Species-specific genes exist (see chapter V.6), but they are the exception rather than the rule; much of the biological diversity seen in nature is produced by genes whose functions are highly conserved among species. Discovering this conservation was a boon to the medical genetics community, because it justified the use of model organisms such as fruit flies and mice to investigate human disease, but also presented a paradox: How can divergent traits be constructed using conserved genes?

The answer to this question is, in part, by modifying the regulation of gene expression. Expression of a gene is necessary before it can impact the phenotype of an organism; that is, the DNA sequence encoding a gene product must be transcribed into RNA and then (usually, but not always) translated into a protein before the gene can function in a cell. Each cell expresses only a subset of the genes in its genome, and the specific genes expressed determine a cell’s fate (see chapter V.11). In 1969, before the molecular details of gene regulation were known, Roy J. Britten and Eric H. Davidson proposed a theory for the regulation of gene expression in eukaryotic cells. They viewed gene regulation as integral to evolution and suggested that differences among species could be attributable to changes in the regulation of gene expression. Six years later, Mary-Claire King and A. C. Wilson published a seminal paper showing that the amino acid sequences of homologous human and chimpanzee proteins appeared to be more than 99 percent identical. Based on this result, they argued that the degree of protein divergence was insufficient to account for the extensive morphological, physiological, and behavioral differences between these two species.

Despite these (and similar) predictions made more than 35 years ago, the idea that changes in gene expression might be a common source of phenotypic divergence did not gain mainstream acceptance among evolutionary biologists until after the turn of the twenty-first century. Seeds of this acceptance were sown when developmental biologists, using newly developed tools for visualizing gene expression, began comparing expression among species. This approach catalyzed the expansion of evolutionary developmental biology, a field of research known today as evo-devo. Within a few years, researchers acquired many examples of cases in which divergent RNA and/or protein expression of genes known to be important for development correlated with morphological divergence between species. Such correlations suggest that the genetic changes responsible for altered gene expression might be the same changes responsible for altered phenotypes. In parallel, quantitative geneticists mapping the mutations responsible for phenotypic differences among individuals of the same species or (less commonly) different species were finding that changes in protein sequence could not always account for the phenotypic effect of a quantitative trait locus (QTL) (see chapter V.12).

2. FINDING EXPRESSION DIFFERENCES WITHIN AND BETWEEN SPECIES

Early comparative studies of gene expression focused on one or a small number of genes within or between species. These low-throughput types of studies were (and still are) critical for establishing links between divergent gene expression and divergence of a particular phenotype; however, they are not suitable for obtaining the genomic measures of expression required to identify global trends in the evolution of gene expression. Instead, microarrays, which consist of short DNA sequences complementary to transcribed sequences from a particular species arrayed onto a filter or a microchip, have been used to quantitatively compare the abundance of RNA from hundreds to thousands of expressed genes in the genome simultaneously. Today, microarrays are largely being replaced by a method known as RNA-seq that uses massively parallel sequencing to obtain quantitative measures of gene expression (i.e., RNA abundance). Techniques for measuring protein abundance (which is not always highly correlated with RNA abundance) on a genomic scale are also available (e.g., two-dimensional gel electrophoresis, mass spectrometry), but thus far they have not been used to compare protein expression genome-wide in an evolutionary context.
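As a rough illustration of how RNA-seq read counts become quantitative expression measures, the Python sketch below converts raw counts into transcripts per million (TPM), one commonly used normalization that corrects for transcript length and sequencing depth. It is a sketch under stated assumptions, not the method of any particular study: the gene names, counts, and lengths are hypothetical, and comparisons between species typically require additional care (for example, with orthologous transcript lengths).

```python
def tpm(counts, lengths_bp):
    """Convert raw RNA-seq read counts to transcripts per million (TPM).

    counts: gene -> number of reads assigned to that gene
    lengths_bp: gene -> transcript length in base pairs
    """
    # Reads per kilobase: longer transcripts yield more reads per molecule.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Rescale so that values in this sample sum to one million.
    per_million = sum(rpk.values()) / 1e6
    return {gene: value / per_million for gene, value in rpk.items()}


# Hypothetical counts and lengths for a single sample.
expression = tpm({"geneA": 500, "geneB": 1000}, {"geneA": 1000, "geneB": 4000})
```

Here geneA ends up with about twice the TPM of geneB even though geneB received more reads, because geneB's transcript is four times longer.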

By contrast, the transcriptome (i.e., the collection of all RNAs expressed in a biological sample) has been analyzed in a wide variety of taxa, including humans, mice, fish, flies, yeast, and plants. Comparing transcriptomes has shown that differences in RNA abundance are common both within and between species and that the number of genes showing expression differences between a pair of species is often proportional to their divergence time. For example, in one of the first published transcriptome comparisons between species, microarrays containing sequences complementary to approximately 12,000 human genes were used to measure mRNA abundance in the white blood cells, liver, and brain of humans, chimpanzees, orangutans, and macaques. Comparing expression in samples from three humans, three chimpanzees, and one orangutan showed extensive variation within both humans and chimpanzees. The extent of expression divergence between humans and chimpanzees was smaller than the divergence observed when either of these species was compared to the orangutan, suggesting that expression divergence correlates with phylogenetic distance. In the samples derived from brains, one human was found to differ more from another human than from a chimpanzee, but this type of relationship is rare: polymorphic gene expression within a species is typically less extensive than divergent gene expression between species.
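One simple way to summarize expression divergence between species is a distance between their mean expression profiles. The sketch below assumes log2-scale expression values and uses Euclidean distance as an illustrative metric (not necessarily the one used in the published comparisons); it averages replicate samples per species and then compares species. All values are invented toy numbers.

```python
import math


def mean_profile(replicates):
    """Average log2 expression across replicate samples of one species."""
    genes = set.intersection(*(set(sample) for sample in replicates))
    return {g: sum(sample[g] for sample in replicates) / len(replicates) for g in genes}


def expression_distance(profile_a, profile_b):
    """Euclidean distance between two expression profiles over their shared genes."""
    shared = profile_a.keys() & profile_b.keys()
    return math.sqrt(sum((profile_a[g] - profile_b[g]) ** 2 for g in shared))


# Toy log2 expression values (invented for illustration).
human = mean_profile([{"g1": 5.0, "g2": 3.0, "g3": 8.0}, {"g1": 5.2, "g2": 2.8, "g3": 8.2}])
chimp = mean_profile([{"g1": 5.4, "g2": 3.1, "g3": 7.9}])
orangutan = mean_profile([{"g1": 6.5, "g2": 2.0, "g3": 7.0}])

d_human_chimp = expression_distance(human, chimp)
d_human_orangutan = expression_distance(human, orangutan)
```

With these toy values the human-chimpanzee distance is smaller than the human-orangutan distance, the pattern expected if expression divergence tracks phylogenetic distance.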

In a slightly different experiment, macaques were used as an out-group, and gene expression in humans was found to have evolved faster in the brain than in the liver or blood. Although it is tempting to speculate that this apparently accelerated evolution of gene expression in the human brain may have contributed to the evolution of human-specific cognitive abilities, a reanalysis of these data that more completely modeled the sources of variance in the experiment found more genes with differential expression in the liver than in the brain between humans and chimpanzees. This example illustrates the potentially tremendous impact of statistical analysis methods on the conclusions drawn from this type of work. Particularly problematic in this case (and in other cases where a microarray with sequences from one species is used to compare expression between species) is accounting for the effects of sequence divergence between the microarray probes and the heterologous species. The newer RNA-seq method of quantifying and comparing RNA abundance among species circumvents this problem, but presents its own set of challenges for proper data analysis and interpretation.

3. GENOMIC SOURCES OF REGULATORY EVOLUTION

Heritable differences in the distribution of RNA or protein within or between species often result from changes in the sequence of genomic DNA. To understand the types of sequences in the genome that can be mutated to alter gene expression, one must consider the molecular mechanisms controlling transcriptional and posttranscriptional regulation of gene expression. Within prokaryotes and within eukaryotes, these mechanisms are highly conserved, but they differ significantly between the two groups. The remainder of this chapter focuses solely on transcriptional regulation in eukaryotes because it has been studied most extensively in an evolutionary context. Also, the term gene expression is used synonymously with transcription from this point forward.

When, where, and how much mRNA is produced from a particular gene is determined by its cis-regulatory DNA sequences as well as the trans-regulatory transcription factor proteins and noncoding RNAs present in a cell. These cis-regulatory DNA sequences include the basal promoter that binds to RNA polymerase and its associated cofactors as well as one or more enhancers that encode instructions for spatiotemporal expression and the amount of mRNA to produce (figure 1). Basal promoter sequences are located near the transcriptional start site and are more highly conserved than enhancer sequences, because they bind to transcription factors such as the TATA-binding protein required for transcription of most genes. Enhancer sequences typically comprise a few hundred base pairs, can be located upstream (5′), downstream (3′), or in an intron of the associated gene (figure 1), and are bound by transcription factors that activate expression from the basal promoter in a subset of cells or under a subset of environmental conditions. In multicellular eukaryotes, expression of a gene tends to be controlled by multiple enhancers, each acting independently and controlling expression in a particular place, time, or environment. Because of their more limited effects on an organism (i.e., lower pleiotropy), enhancers are commonly thought to be more likely to harbor mutations that survive in natural populations and give rise to polymorphism and divergence than mutations in basal promoters or coding sequences of transcription factors.


Figure 1. Basic eukaryotic gene structure. Cis-regulatory sequences include enhancers and the basal promoter. Most transcription factors (TFs) bind to sequences in enhancers, and transcription factors that are part of the RNA polymerase II complex bind to sequences in the basal promoter (neither is shown).

Chromatin can also have cis-regulatory effects on gene expression. Like the rest of the genome, cis-regulatory sequences are wrapped around histones and packaged into nucleosomes that form chromatin structure. The state of chromatin influences interactions between cis-regulatory sequences and trans-regulatory factors, so it is also an important component of transcriptional regulation. Methods suitable for comparing chromatin structure within and between species have recently become available, and researchers are investigating how chromatin structure evolves, as well as how this evolution impacts gene expression. Many transcription factors are known to modify chromatin, for example, by acetylating or deacetylating histones, so changes in cis-regulatory sequences affecting binding of such transcription factors could be responsible (at least in part) for differences in chromatin structure when they are observed (see chapter V.8).

Determining whether an expression difference between two genotypes is caused by genetic changes in cis- or trans-regulation can be done using transgenic analysis, allele-specific expression, or genetic mapping. In the first two cases, the activities of homologous cis-regulatory sequences controlling a divergent expression pattern of interest are compared in the same cellular environment (i.e., when regulated by the same set of trans-acting factors). If a difference in the activity of the two cis-regulatory sequences is observed, this indicates that there has been functional cis-regulatory divergence. This test can be performed by using transgenes to move cis-regulatory sequences from species A into the trans-acting genetic background of species B (and vice versa) (figure 2A) or by simply crossing the two genotypes and testing for differences in allele-specific expression when the two cis-regulatory alleles are in the same heterozygous trans-acting genetic background (figure 2B). Putatively cis- and trans-acting changes can also be inferred from genetic mapping, in which regions of the genome contributing to the expression difference of interest are identified. If such a region is located close to the affected gene, it is assumed to act in cis; if such a region is located far from the affected gene, it is assumed to act in trans.
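The distance-based rule for labeling mapped regions as cis or trans can be written as a short function. This is a minimal sketch: the 1 Mb window and the MappedRegion and Gene record types are hypothetical choices for illustration, and real studies choose windows suited to the organism's genome and the resolution of the mapping.

```python
from dataclasses import dataclass


@dataclass
class MappedRegion:
    """Hypothetical record: peak position of a region affecting expression."""
    chrom: str
    position: int  # base-pair coordinate


@dataclass
class Gene:
    """Hypothetical record: location of the gene whose expression is affected."""
    chrom: str
    start: int
    end: int


def classify_region(region, gene, window_bp=1_000_000):
    """Label a mapped region as putatively 'cis' (nearby) or 'trans' (distant).

    window_bp is an assumed cutoff; appropriate values depend on the organism.
    """
    if region.chrom != gene.chrom:
        return "trans"  # a different chromosome is always distant
    if gene.start <= region.position <= gene.end:
        distance = 0  # the region lies within the gene itself
    else:
        distance = min(abs(region.position - gene.start), abs(region.position - gene.end))
    return "cis" if distance <= window_bp else "trans"
```

As in the text, the label is only an inference from position: a nearby region could still act in trans, for example if it contains a transcription factor gene that happens to lie close to its target.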


Figure 2. Determining whether divergent expression is due to a change in cis- and/or trans-regulation. (A) Transgenic analysis can distinguish between cis- and trans-regulatory divergence by comparing the activity of orthologous cis-regulatory elements (CREs) in the presence of the same set of transcription factors. This can be done by creating a pair of artificial genes, each with a CRE controlling expression of a protein that is easy to detect. These so-called reporter genes are then introduced into the genomes of the two species from which the CREs were derived. Different patterns of expression are expected if cis- or trans-regulatory changes occur between species; a hypothetical example of this is shown in which each box represents a region of tissue, and gray represents either native expression in species A and B or the expression of the CRE tested in each species using a reporter gene. (B) Measures of allele-specific RNA abundance can also be used to distinguish between cis- and trans-regulatory changes in diploid organisms. Schematic representations of cells from two different (homozygous) genotypes (two different species or two different genotypes from the same species) are shown. A schematic cell from an F1 hybrid produced by crossing genotype 1 and genotype 2 is also shown. In each cell, two copies of a gene are shown with the transcribed region indicated by a gray rectangle and the promoter location indicated by an arrow. The solid black line represents DNA, including the CRE, as indicated. Circles and triangles represent two different transcription factor proteins, each of which is present in multiple copies per cell. Hypothetical numbers of RNA molecules produced by each allele in each cell (# RNAs) are also shown. The F1 hybrid contains a CRE allele and transcription factors from each of its parental genotypes.
If the expression difference observed between genotypes 1 and 2 is due solely to cis-regulatory changes (i.e., the trans-acting transcription factors are equivalent between genotypes), each allele produces the same number of RNA molecules in the F1 hybrid as it did when homozygous in genotype 1 or 2. If, on the other hand, the cis-regulatory sequences are functionally equivalent between alleles and the difference in RNA abundance observed between genotypes 1 and 2 results from differences in trans-acting factors between genotypes, the two CRE alleles in the F1 hybrid will produce an equal number of RNA molecules, with the precise number (15, in this example) determined by the specific type of trans-regulatory divergence. Combinations of cis- and trans-regulatory changes are also possible, with the cis-regulatory difference always reflected in relative expression between the two alleles in the F1 hybrid.
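The logic of figure 2B is often expressed on a log2 scale: the parental difference is the total regulatory divergence, the allelic difference within the F1 hybrid estimates the cis-regulatory component, and the remainder is attributed to trans. The sketch below assumes noise-free, per-allele RNA counts; apart from the 15 molecules mentioned in the caption, the numbers are hypothetical, and real analyses use replicated measurements and statistical tests.

```python
import math


def decompose(parent1, parent2, f1_allele1, f1_allele2):
    """Split a parental expression difference into cis and trans parts (log2 scale).

    parent1, parent2: RNA molecules per allele in each homozygous genotype
    f1_allele1, f1_allele2: RNA molecules from each allele in the F1 hybrid
    """
    total = math.log2(parent1 / parent2)      # total regulatory divergence
    cis = math.log2(f1_allele1 / f1_allele2)  # alleles compared in a shared trans environment
    trans = total - cis                       # remainder attributed to trans-acting factors
    return total, cis, trans


# Hypothetical per-allele counts: genotype 1 makes 20 RNAs per allele, genotype 2 makes 10.
purely_cis = decompose(20, 10, 20, 10)    # each allele keeps its parental output in the F1
purely_trans = decompose(20, 10, 15, 15)  # both alleles converge on the same output in the F1
```

In the first case the whole difference is cis (cis = 1, trans = 0); in the second it is entirely trans (cis = 0, trans = 1). Combinations of cis- and trans-regulatory changes yield nonzero values for both.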

As a group, studies using transgenes to investigate regulatory evolution provide evidence for both cis- and trans-regulatory changes underlying expression divergence, with cis-regulatory divergence detected most often. Allele-specific expression has been used to examine sources of polymorphic and divergent expression genome-wide in flies, yeast, and plants, and these data suggest that trans-acting variation is the predominant source of expression differences among individuals of the same species, whereas cis-regulatory changes play a larger role in expression divergence between individuals of different species. Genetic mapping of expression differences has thus far been limited to variation within a species, but it also shows an abundance of variants with apparent trans-acting effects on gene expression segregating within a species. Both genetic mapping and allele-specific tests show that although cis-regulatory polymorphisms within a species are rarer than trans-regulatory polymorphisms, they tend to have larger effects on expression and are less likely to be recessive than trans-regulatory variants. The additivity of cis-regulatory mutations, combined with the expected lower levels of pleiotropy relative to trans-acting mutations, may also make them more likely to contribute to phenotypic evolution.

4. ENHANCER EVOLUTION

As described above, enhancer sequences are an important source of evolutionary change. This class of cis-regulatory sequences has been studied in the most detail during the last decade, and these studies have revealed a complex relationship between the DNA sequence and function of an enhancer. The evolution of enhancers that are functionally conserved, enhancers that are functionally divergent, and enhancers that have acquired novel activities is discussed below.

Because enhancer sequences are critical for the proper development and physiology of an organism, most mutations that alter their activity are expected to be deleterious and removed from a population by purifying selection. Consequently, enhancer sequences should be more highly conserved than surrounding nonfunctional DNA. In fact, they are, and this conservation is a helpful tool for finding enhancers within a genome (see chapter V.3). The degree of sequence conservation in an enhancer is typically lower than that of protein-coding sequences, however, because of the structure-function relationship of enhancers: the same enhancer activity can be produced by multiple arrangements of transcription factor binding sites, and most transcription factor binding sites are degenerate, meaning that the same transcription factor can bind to multiple sequences.
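Degenerate binding sites are often summarized with IUPAC consensus codes, in which a letter such as Y stands for either C or T. The sketch below scans a sequence for matches to a hypothetical consensus, TAAYNR, which is not taken from any particular transcription factor, and finds two different sequences that both satisfy it.

```python
# IUPAC nucleotide codes: each letter stands for a set of acceptable bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(site, consensus):
    """True if every base in site is allowed by the corresponding consensus code."""
    return len(site) == len(consensus) and all(
        base in IUPAC[code] for base, code in zip(site, consensus)
    )


def find_sites(sequence, consensus):
    """Return 0-based start positions of all windows matching the consensus."""
    k = len(consensus)
    return [i for i in range(len(sequence) - k + 1) if matches(sequence[i:i + k], consensus)]


# Hypothetical degenerate consensus and sequence.
hits = find_sites("GGTAATCGCCTAACAGTT", "TAAYNR")
```

The two matches, TAATCG (position 2) and TAACAG (position 10), differ at two of six positions yet satisfy the same consensus, which is one reason enhancer sequences can diverge without losing binding.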

This complex relationship between enhancer sequence and function has been nicely illustrated by comparative studies of two Drosophila enhancers whose activity appears to be conserved between species. In the first case, the DNA sequence and transcription factor binding sites of an early embryonic enhancer (controlling “stripe 2” expression of the even-skipped gene) have been extensively changed between species, yet the function of the elements remains the same. Orthologous cis-regulatory elements from D. melanogaster and D. pseudoobscura had similar activities in transgenic D. melanogaster, whereas chimeric enhancers containing the 5′ half from one species and the 3′ half from the other showed abnormal activity. Similarly, extensive rearrangement of transcription factor binding sites was found in an enhancer driving conserved expression in the developing eye of Drosophila. The D. melanogaster allele of this enhancer has been extensively analyzed, allowing predictions to be made about the consequences of some observed changes. These types of analyses provide insight not only into evolutionary processes but also into enhancer architecture in general.

If enhancer sequences can change extensively and still retain their original function, how much does an enhancer need to change to acquire new activities? A number of studies have been published during the last few years, most notably from the laboratories of Sean B. Carroll, David M. Kingsley, and David L. Stern, that are suitable for addressing this question (see chapter V.12). In some cases, as little as a single nucleotide change is sufficient to account for the divergent activity of an enhancer, whereas in others, multiple mutations (on the order of 10 or fewer) are responsible for expression differences. In addition to single nucleotide changes, larger lesions also contribute to divergent activity. For example, in the threespine stickleback, recurrent deletions that disrupt the activity of an enhancer contribute to the repeated loss of pelvic structures in freshwater populations. In Drosophila, deletions in an enhancer of the desatF gene have been shown to contribute to expression divergence by (surprisingly) creating novel binding sites for an unknown transcription factor that activates expression. In the few cases where multiple changes have been implicated in expression divergence and their effects tested individually, the substitutions have been found to interact in a nonadditive (i.e., epistatic) fashion.

The majority of work on enhancer evolution has focused on cases in which enhancer activity is either conserved or divergent. But what about new enhancers? How do they evolve? Simulations suggest that new point mutations could frequently generate novel transcription factor binding sites and that they could fix over microevolutionary timescales, even in the absence of selection. This suggests that new enhancers driving novel expression patterns might frequently arise de novo. Despite this finding, all the cases of (putatively) novel enhancers characterized to date appear to have evolved using other mechanisms (i.e., duplication and divergence, transposition, or co-option), with co-option (i.e., repurposing) of existing regulatory elements the most common mechanism—in both fruit flies and primates, cis-regulatory sequences controlling novel expression patterns have been shown to include sites required for one or more preexisting enhancers.
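The intuition behind those simulations, that a single point mutation can create a binding site, can be illustrated by enumerating every single-base substitution in a sequence and checking which ones create a new match. This is a sketch only: it uses a hypothetical four-base motif, exact matching, and no population-genetic model of fixation, so it says nothing about how often such sites would spread in a population.

```python
BASES = "ACGT"


def count_sites(seq, motif):
    """Number of exact occurrences of motif in seq (overlaps allowed)."""
    k = len(motif)
    return sum(seq[i:i + k] == motif for i in range(len(seq) - k + 1))


def site_creating_mutations(seq, motif):
    """List (position, reference base, new base) substitutions that add a motif match."""
    baseline = count_sites(seq, motif)
    created = []
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            if count_sites(mutant, motif) > baseline:
                created.append((i, ref, alt))
    return created


# Hypothetical motif and sequence; no site is present before mutation.
new_sites = site_creating_mutations("CTAACG", "TAAT")
```

Here a single C-to-T change at position 4 turns TAAC into TAAT and creates a site. Applied to longer sequences and degenerate motifs, the same enumeration can be used to ask how many positions lie one mutation away from a new site.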

5. EVOLUTION OF TRANSCRIPTION FACTORS AND TRANSCRIPTION FACTOR BINDING

To function, cis-regulatory sequences must be bound by transcription factors (TFs), which are proteins that bind to specific DNA sequences and influence (i.e., either activate or repress) transcription. Molecularly, TFs typically contain a DNA binding domain, one or more protein-protein interaction domains, a transcriptional activation or repression domain, and sometimes a chromatin modification domain. As a group, genes encoding TFs are among the most highly conserved in eukaryotic genomes, especially in their DNA binding domains. This high degree of similarity among species is seen not only in terms of protein sequence but also in functional tests. Perhaps the most seminal of these tests involved the Drosophila eyeless gene and its mouse ortholog, Pax-6: ectopically expressing either of them in developing Drosophila wings or legs was sufficient to transform cells into ectopic eyes. Importantly, both the Drosophila and mouse versions of this gene induced similar morphological transformations, with cell types and organizational structures resembling the normal Drosophila eye. This study, and others like it that followed, demonstrated that development is often controlled by highly conserved master regulatory proteins.

Conserved master regulatory proteins such as Pax-6 can create divergent structures by regulating different sets of target genes in different species. Changes in the identity of target genes are mediated by the evolution of TF binding, resulting primarily from changes in cis-regulatory sequences. Recently, techniques for monitoring the binding of a particular transcription factor genome-wide have been developed, and comparative studies show that the gain and loss of TF binding sites is very common between species. Between closely related species, changes in the quantitative binding of a TF to a particular site, rather than the gain or loss of individual binding sites, appear more prevalent. In the next few years, these types of experiments, which rely on chromatin immunoprecipitation, will likely be combined with genomic measures of gene expression and cis-regulatory sequence divergence to provide a more complete understanding of how changes in DNA sequence impact TF binding, and how this in turn affects gene expression.
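Comparisons of chromatin immunoprecipitation data between species often reduce to classifying each orthologous site by its binding signal in each species. The sketch below does this with two hypothetical thresholds, a minimum signal for calling a site bound and a fold change for calling a quantitative difference; published studies generally rely on peak-calling software and statistical models rather than fixed cutoffs like these.

```python
import math


def classify_binding(signal_a, signal_b, bound_min=5.0, fold=2.0):
    """Classify TF binding at one orthologous site from ChIP signal in species A and B.

    bound_min: hypothetical minimum signal for calling a site bound
    fold: hypothetical fold change for calling a quantitative difference
    """
    bound_a, bound_b = signal_a >= bound_min, signal_b >= bound_min
    if not (bound_a or bound_b):
        return "unbound"
    if bound_a and not bound_b:
        return "bound in A only"
    if bound_b and not bound_a:
        return "bound in B only"
    # Bound in both species: ask whether binding strength differs.
    if abs(math.log2(signal_b / signal_a)) >= math.log2(fold):
        return "quantitative change"
    return "conserved"
```

Without an out-group, "bound in B only" cannot distinguish a gain in species B from a loss in species A, which is why these labels are not polarized.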

As described above, many of the TFs functionally tested in vivo show conserved functions between species, but this is not always the case, even for highly pleiotropic regulators of development. For example, the HoxA-11 protein has acquired a novel function required for pregnancy in placental mammals, and the functions of the Hox genes fushi tarazu and Ultrabithorax have diverged between Drosophila melanogaster and other insects. In each case, the proteins seem to have retained some ancestral functions while gaining and losing others.

6. EVOLUTIONARY FORCES RESPONSIBLE FOR EXPRESSION DIVERGENCE

With differences in mRNA expression cataloged for a variety of species, researchers are now faced with the daunting task of figuring out what these expression differences mean for organismal phenotypes, especially fitness. Classic genetic mutants and reverse genetic techniques such as RNA interference (RNAi) can be used to assess the function of individual genes, but these techniques are rarely able to predict the consequences of the quantitative changes in expression commonly found in nature. To complicate matters further, mRNA levels do not always correlate with protein abundance, and similar changes in expression of different genes will almost certainly have different effects; for example, a 10 percent change in expression of one gene might have a larger effect on the phenotype than a 1000 percent change in expression of another gene. Connecting changes in gene expression to specific phenotypes is currently best done by studying one gene and one phenotype at a time; however, high-throughput phenotyping strategies currently being developed should soon make it possible to address this question more systematically.

Knowing the impact of a change in gene expression on fitness can help determine the likelihood that the change resulted from natural selection. Assessing the relative roles of neutral and nonneutral processes is a major challenge for evolutionary biology in general. To date, three main strategies have been used to investigate the role of natural selection in the evolution of gene expression: the comparative method, tests of neutrality, and empirical patterns (see chapter V.14). In the comparative method, evidence of natural selection is inferred when a change in expression is found to correlate with an environmental or other biological factor in a manner that exceeds the correlation expected simply because of shared ancestry. To use tests of neutrality, patterns of regulatory evolution expected from neutral processes must be specified or inferred from the data available. Studies of mutational variance for gene expression provide a starting point for developing these neutral models, but much more remains to be learned about the neutral expectations for regulatory evolution. Finally, empirical patterns, especially comparisons between polymorphism and divergence for expression of a particular gene, capture elements of regulatory variation that cannot easily be incorporated into neutral models. With this approach, one or more representative “baseline” genes assumed to be evolving neutrally are used as references to test for selection, but it is generally not clear which genes should be considered to be evolving neutrally. Presently, there is no consensus about the relative roles of selection and drift in shaping regulatory evolution, although most species examined show a strong signal of stabilizing selection on gene expression within species, indicating that expression levels do matter for fitness.
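The empirical-pattern approach can be sketched as comparing a gene's ratio of between-species divergence to within-species polymorphism against the same ratio in baseline genes. Both the statistic below (squared difference in species means over mean within-species variance) and the baseline values are simplifying assumptions for illustration, not a published test; as noted above, choosing which genes to treat as neutral is itself unresolved.

```python
from statistics import mean, variance


def div_poly_ratio(species1, species2):
    """Between-species divergence relative to within-species polymorphism for one gene.

    species1, species2: expression measurements from several individuals of each species.
    Assumes at least two individuals per species and nonzero within-species variance.
    """
    divergence = (mean(species1) - mean(species2)) ** 2
    polymorphism = (variance(species1) + variance(species2)) / 2
    return divergence / polymorphism


def empirical_p(focal_ratio, baseline_ratios):
    """Fraction of baseline genes (assumed neutral) with a ratio at least as large."""
    return sum(r >= focal_ratio for r in baseline_ratios) / len(baseline_ratios)


# Hypothetical focal gene and hypothetical baseline ratios.
focal = div_poly_ratio([1.0, 1.2, 0.8], [3.0, 3.2, 2.8])
p = empirical_p(focal, [0.5, 1.2, 3.0, 150.0])
```

Here only one of four baseline genes has a ratio as large as the focal gene's (about 100), so the focal gene looks unusually divergent relative to this hypothetical baseline; a different choice of baseline genes could change that conclusion.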

Regardless of the evolutionary forces underlying the evolution of gene expression, understanding how this important molecular phenotype evolves is a critical component of understanding how the evolutionary process works. The pressing question is no longer whether changes in gene expression contribute to phenotypic evolution but rather when and how they do. The development of many new tools for studying gene expression, combined with the recent rapid accumulation of expression and transcription factor binding data, suggests that researchers may be able to answer these questions soon.

FURTHER READING

Carroll, S. B. 2005. Evolution at two levels: On genes and form. PLoS Biology 3: e245. In honor of the thirtieth anniversary of the seminal work of King and Wilson, who provided some of the earliest experimental evidence of the importance of changes in gene expression for evolution, one of the current leaders of the evo-devo field takes a look at the experimental evidence supporting this assertion today.

Carroll, S. B., J. K. Grenier, and S. D. Weatherbee. 2004. From DNA to Diversity: Molecular Genetics and the Evolution of Animal Design. 2nd ed. New York: Wiley-Blackwell. This book describes the central role of gene regulation in development and evolution with accessible, beautifully illustrated discussions of specific case studies.

Davidson, E. H. 2006. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution. San Diego, CA: Academic. This book provides a comprehensive summary of the data and logic behind the assertion that gene regulatory networks are essential for development and play a critical role in evolution.

Halder, G., P. Callaerts, and W. J. Gehring. 1995. Induction of ectopic eyes by targeted expression of the eyeless gene in Drosophila. Science 267: 1788–1792. This landmark study demonstrated the remarkable conservation of regulatory proteins by showing that homologous genes from species as diverse as fruit flies, mice, and humans all had similar effects on development when introduced into the fruit fly.

Stern, D. L. and V. Orgogozo. 2008. The loci of evolution: How predictable is genetic evolution? Evolution 62: 2155–2177. After clearly describing the rationale and alternative models for regulatory evolution, this review provides the most thorough summary to date of studies identifying the types of genetic changes responsible for a divergent phenotype, contrasting the relative frequency of changes attributable to coding and to regulatory mutations.

Wittkopp, P. J., and G. Kalay. 2012. Cis-regulatory elements: Molecular mechanisms and evolutionary processes underlying divergence. Nature Reviews Genetics 13: 59–69. This review uses data from case studies revealing the genetic and molecular changes responsible for divergent cis-regulatory sequences to examine how these types of sequences evolve and to gain insights into more general questions about the mechanisms of evolution.

Wray, G. A., M. W. Hahn, E. Abouheif, J. P. Balhoff, M. Pizer, M. V. Rockman, and L. A. Romano. 2003. The evolution of transcriptional regulation in eukaryotes. Molecular Biology and Evolution 20: 1377–1419. Despite being nearly a decade old, this review remains one of the most complete discussions of the mechanics of gene regulation and its relationship to evolution.