Genes, Genomes, Phenotypes
Hopi E. Hoekstra and Catherine L. Peichel
Darwin’s 1859 theory of evolution by natural selection has three main tenets: (1) phenotypes vary among individuals, (2) phenotypic differences lead to differential fitness, and (3) these fitness-related phenotypes are heritable. Over decades, Darwin amassed huge amounts of data on natural variation and its effects on organismal fitness. By contrast, he knew almost nothing about heredity. While he knew phenotypes were inherited (i.e., that offspring resemble their parents), he had no knowledge of the mechanism by which this occurred. He acknowledged this missing link in his argument for evolution, and when pushed, devised a theory for the mechanism of inheritance (i.e., pangenesis), which was one of his few major errors. Of course, it was only later, in 1900, that Gregor Mendel’s experiments, which elucidated the laws of inheritance, were rediscovered. Subsequent discoveries elucidated the material basis for heredity: Thomas Hunt Morgan’s experiments in Drosophila (1915) demonstrated that genes are carried on chromosomes, and the discovery of the three-dimensional structure of DNA by Watson and Crick (1953) showed how DNA can serve as the molecule of inheritance. These seminal findings were central to the rise of molecular biology, and more recently of genomics, as powerful tools in all areas of biological inquiry, including evolution. Most certainly, Darwin could not have envisioned the “era of genomics,” but he surely would be most pleased that genomic data have further supported his theory.
Our ability to comprehensively describe genetic variation by sequencing complete genomes, the basic blueprint of an organism, represents an extraordinary technological advance that is remaking the field of biology in general, and evolutionary biology in particular. The rate at which whole genome sequences are being generated is astonishing. This enterprise started more than 30 years ago, when the modest 5386-base-pair genome of a bacteriophage was decoded. This achievement was quickly followed by the sequencing of several other, larger viral genomes. But sequencing the larger genomes of free-living organisms required technological and computational advances, and it took almost another two decades to complete the entire genome sequence of Haemophilus influenzae (1.8 million base pairs). Less than six years later came the first complete human genome sequence (2.91 billion base pairs) at a cost of nearly $100 million. However, as the acquisition rate of complete genome sequences has increased, costs have decreased by four orders of magnitude, so that sequencing a complete human genome today costs roughly $10,000 and is expected to cost even less in the near future.
Arguably, all species now have the potential to be “genome enabled,” and comparisons of these genome sequences, both within and between species, are shedding unprecedented light on evolutionary processes. As the chapters in this section demonstrate, biologists now have the opportunity to observe evolution at a fundamental level; that is, to know which genotypes change over time. And biologists can obtain sequences not only from extant organisms but also from extinct species and historical specimens of existing species, allowing changes in genotype to be observed over time (see chapter V.15). Changes in genotype can occur on a number of levels, affecting the whole genome (see chapters V.2 and V.3), whole chromosomes (see chapter V.4), gene number and content (see chapters V.5 and V.6), gene expression (see chapters V.7 and V.8), and interactions among genes (see chapter V.9). These changes in genotype are translated into changes in phenotype through the process of development, and the new field of evolutionary developmental biology seeks to understand these connections (see chapters V.10 and V.11). Biologists are also concerned with determining which nucleotide differences in genome sequence actually contribute to differences in phenotype (see chapters V.12 and V.13). However, connecting genotype to phenotype is not enough for a complete understanding of evolution; it is also crucial to learn how evolutionary processes such as genetic drift, mutation, and particularly natural selection influence changes in genotype at a specific gene or across the entire genome over time (see chapters V.1 and V.14).
Our ability to sequence the complete genome of (almost) any individual has brought many surprises. Before the human genome sequence was revealed, guesses at the number of protein-coding genes in the genome ranged widely, from as few as 25,000 to as many as 120,000 genes. While a few estimates were close, no one had imagined that humans carry only about 23,000 genes, approximately the same number as most other eukaryotes. Most of the genome, in fact, comprises noncoding DNA (e.g., transposable elements, untranslated regions, and introns), and it is variation in this “other stuff” that is largely responsible for differences in genome size among species. Understanding genome dynamics—the processes responsible for the evolution of genome complexity—remains an exciting area of study (see chapter V.2). And with the sequencing of the human genome came the realization that comparisons of genomes across diverse species would greatly facilitate the identification of genes and regulatory elements. Comparative genomics might also provide an approach to finding regions of the genome important for phenotypic evolution, such as those evolving extremely rapidly, or those that are ultraconserved, which may be indicative of their functional significance (see chapter V.3). Genomics is also providing new insights into an unusual region of the genome: the sex chromosomes, which are inherited differently in the two sexes. How such sex chromosomes, including their gene content and gene expression levels, evolve is an exciting question, especially given the diversity of sex-determining mechanisms identified across species (see chapter V.4).
Despite the fact that change in gene number is not the major driver of genome size evolution, comparative genomics has revealed that gene content can vary by two orders of magnitude across species. Several mechanisms generate variation in gene number, including whole genome or whole chromosome duplication, as well as duplications of individual genes (see chapter V.5). Such gene duplicates are retained at early stages in their evolution if the original functions of the parent gene are divided between the parent gene and the duplicate; the gene duplicates can then evolve new functions at later stages of evolution. Although gene duplication is the primary source of new genes, additional mechanisms for generating new genes do exist, and new genes can even arise de novo (see chapter V.6). New genes arise at a surprisingly high rate, and recent evidence demonstrates that even very young genes have evolved essential functions within species.
However, gene duplications or new genes are not absolutely required for a new function or phenotype to evolve; often, changes in where or when a particular gene is expressed (i.e., “turned on”) are involved (see chapter V.7). Heritable genetic changes, either in proteins that regulate gene expression or in the DNA sequences they bind, can lead to the evolution of gene expression differences among species. Although evolutionary biologists are traditionally accustomed to thinking about the contribution of genetic changes to evolution, recent research has demonstrated that changes in gene expression can also occur through epigenetic changes (i.e., changes that can be stably transmitted across generations in the absence of any change to the DNA sequence; see chapter V.8). The contribution of epigenetic change to evolutionary change has not yet been fully demonstrated, but investigating how genetic and/or epigenetic changes in gene expression contribute to phenotypic evolution is a promising area of future research.
When considering the contribution of a genetic or epigenetic change to phenotypic evolution, it is imperative to remember that genes do not act in a vacuum but interact with other genes in complex genetic networks (see chapter V.9). Recent technological advances have allowed biologists to investigate entire networks of genes, rather than analyzing only a single gene at a time. A surprising finding of such analyses is that biological networks are not randomly organized, suggesting that constraints imposed by the structure of genetic networks might influence evolutionary trajectories. Consideration of possible genetic and developmental constraints is also an important component of the study of evolutionary developmental biology, or evo-devo (see chapter V.10). In particular, organisms in which similar phenotypes have evolved repeatedly in response to similar environments provide powerful experimental systems to investigate the relative importance of natural selection and developmental constraints during phenotypic evolution. Coupled with studies of the molecular pathways and networks that underlie particular phenotypes, such systems are yielding new insights into the ways in which the genome-level changes discussed above (e.g., genome size, gene number, gene expression, gene interactions) are translated through the process of development into phenotypic changes during evolution (see chapter V.11).
Although genome sequencing allows biologists to catalog nearly all the genetic changes that occur between species, a remaining challenge is to determine which of these genetic changes are responsible for the phenotypic changes observed between species. Work over the past decade in a few systems has begun to provide insights into the types of genetic changes that underlie phenotypic evolution, particularly for morphological traits (see chapter V.12). The challenge now is to widen this search to include additional phenotypes, such as behaviors and life history traits, and additional organisms. This will allow evolutionary biologists to determine whether the genetic changes underlying phenotypic changes are indeed predictable, or whether the lessons learned so far are idiosyncrasies of the organism or phenotype studied. New technological and analytical tools will make this challenge easier to meet, particularly in organisms not amenable to genetic studies in the laboratory, and for traits without a simple genetic basis, that is, complex traits (see chapter V.13).
The ultimate challenge is to ask whether phenotypic changes observed between species are adaptive, that is, whether selection has played a role in their evolution. Once the genetic changes responsible for a particular phenotypic change are identified, it is possible to use the tools of molecular evolution to determine whether the genetic changes underlying phenotypic evolution are evolving neutrally or under natural selection (see chapter V.1). This top-down approach starts with the phenotype, then identifies the underlying gene, and finally tests for molecular signatures of selection in the pattern of nucleotide variation in that gene. A complementary approach is to first identify the locations in the genome that appear to be under selection (see chapter V.14) and then work up to phenotype. This bottom-up approach is rapidly being adopted in a variety of systems because of the relative ease and low cost of sequencing whole genomes; however, these population-genomic studies still must connect the genomic regions under selection with actual phenotypes. While this remains challenging, such studies have already provided important new insights into the effects of natural selection at the level of the genome.
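To make the idea of a "molecular signature of selection" concrete, the sketch below computes one widely used summary statistic, Tajima's D, from a small set of aligned sequences. The choice of statistic is an assumption made for illustration (this overview does not name a specific test), and the function names and toy sequences are hypothetical. The statistic compares two estimates of genetic diversity that should agree under neutrality: the mean number of pairwise differences and the number of segregating sites scaled by sample size.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Count alignment columns at which the sampled sequences differ."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences.

    Returns None when the statistic is undefined: fewer than four
    sequences (the variance term is zero) or no segregating sites.
    """
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    pi = mean_pairwise_differences(seqs)
    # Difference between the two diversity estimates, scaled by its
    # expected standard deviation under neutrality.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignments, not real data.
rare_variants = ["AAAA", "AAAT", "AATA", "ATAA"]   # every variant is a singleton
common_variants = ["AA", "AA", "TT", "TT"]         # variants at intermediate frequency

print(tajimas_d(rare_variants))    # negative: excess of rare variants
print(tajimas_d(common_variants))  # positive: excess of intermediate-frequency variants
```

A negative value is consistent with a recent selective sweep or purifying selection, and a positive value with balancing selection, but changes in population size can produce the same patterns, so a value like this is a starting point for inference rather than a conclusion.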
Evolutionary biology is in large part about reconstructing the past. While fossils provide a direct glimpse into the past, the fossil record is largely incomplete. A complementary approach is to compare extant organisms (or their genomes) and then infer ancestral states; however, genomic approaches offer yet another glimpse into the past via the sequencing of ancient DNA (see chapter V.15). Specifically, DNA can be extracted from long-deceased individuals (e.g., ancient humans) or extinct species (e.g., woolly mammoths or Neanderthals), and then sequenced at a candidate gene to gain information on a particular trait, at a large number of noncoding regions to infer demographic history, or increasingly across the entire genome, to more fully elucidate both past phenotype and past demography. Such studies will clearly continue to shed light on phenotypic traits, genetic origins, and biological relationships of now-extinct individuals to present-day populations and species. An increasing number of ancient DNA sequences provide another approach to deducing past events that will stand alongside discoveries from the fossil record.
The “era of genomics” is truly an exciting time for evolutionary biology, because there is now an unprecedented opportunity to directly answer fundamental and long-standing questions about the genetic basis of Darwin’s theory of natural selection. Evolutionary biologists can now identify change across the genome over time, determine the phenotypic effects of these genetic changes, and directly assess the role of natural selection in the evolution of genes, genomes, and phenotypes.