V.9

Evolution of Molecular Networks

Mark L. Siegal

OUTLINE

  1. Network representations of biological data

  2. Global organization of biological networks

  3. Evolution of global network organization

  4. Local organization and dynamics of biological networks

  5. Evolution of local network organization

  6. The future of evolutionary systems biology

The importance of interactions between genes has been evident to biologists since the rediscovery of Gregor Mendel’s work at the turn of the twentieth century. Indeed, William Bateson coined the term epistasis in 1907 to refer to the masking effect of a variant at one locus on a variant at another locus. Notably, this was two years before Wilhelm Johannsen coined the term gene. The study of genetic interactions remains central to many fields, from developmental biology to human genetics to evolution. No longer limited to a single pair of genes at a time, scientists are using new technologies to conduct systematic and comprehensive assays of interactions between biomolecules of various kinds. The large-scale data sets produced by such experiments require new methods of analysis. The tools and perspectives of graph theory, in which collections of objects are represented as networks of pairwise interactions, have become particularly important in organizing and analyzing biological data. Studies of model organisms have revealed that biological networks have very different patterns of connectivity than would be expected if their parts were connected at random. This departure from randomness is true both at the global level (considering all interactions of a specified type) and at the local level (considering small subsets of interactions). Understanding the sources of this nonrandomness is a major challenge for evolutionary biologists. Meeting this challenge will not only yield insights into genome organization but also more broadly impact fundamental, long-standing debates in evolutionary biology. These include debates over the existence of developmental constraints, and over the relative importance of adaptive versus nonadaptive processes in evolution.

GLOSSARY

Degree. The number of edges connecting to a node.

Edge. An interaction between two nodes in a network.

Feedforward Loop. A network motif in which two transcription factors jointly regulate a target gene and one of the transcription factors also regulates the other.

Global Network Organization. The pattern of connections of an entire network, as summarized in statistics such as the frequency distribution of node degree.

Homologous Genes. Genes that are related by descent from a common ancestral DNA sequence.

Local Network Organization. The pattern of connections of a subset of nodes in a network.

Modularity. The extent to which biological functions are divided into modules, which are defined as sets of interacting components whose functions are relatively independent of the functions of other such sets.

Network Motif. A subset of nodes connected to each other in a particular pattern.

Node. A component of a network that enters into pairwise interactions with other such components.

Posttranscriptional Gene Regulation. Cellular processes acting on an RNA molecule between the time of its transcription and its translation into protein that alter the ultimate abundance through time and space of the encoded protein.

Posttranslational Gene Regulation. Cellular processes acting on a protein molecule that alter the ultimate abundance and activity of the protein through time and space.

Protein Essentiality. The necessity of a protein for viability, usually determined by testing whether organisms can survive deletion of the gene that encodes the protein.

Subcircuit. A subset of interconnected nodes that performs a specific biological function.

Transcription Factor. A sequence-specific DNA-binding protein that activates or represses expression of a gene by, respectively, increasing or decreasing the probability that RNA polymerase will transcribe the gene.

1. NETWORK REPRESENTATIONS OF BIOLOGICAL DATA

All aspects of cell function require controlled interactions between biomolecules. Certain types of interaction have garnered more interest than others because they undergird the organization and logic of cellular processes. For example, protein-protein interactions are especially important because proteins must physically interact with each other to form macromolecular complexes. Such complexes perform essential activities within the cell, such as replicating DNA, transcribing RNA, and transporting cargo from one cellular compartment to another. Protein interactions also mediate communication between cells via signal-transduction pathways. Interactions between proteins and DNA are also critically important. Proteins must interact with DNA to achieve proper packaging of chromosomes and proper cell division, as well as to control when and where transcription happens. Because biological interactions underlie all aspects of cellular structure and information flow, understanding the ways in which these interactions change through time is central to understanding the evolution of form and function.

Many types of interaction can now be systematically investigated by highly parallel experimental platforms. Any collection of interactions can be represented as a network. The first comprehensive set of biomolecular interactions to be represented as a network was that of the enzyme-catalyzed biochemical reactions that compose intermediary metabolism. The familiar wall chart of these biochemical pathways, created by Gerhard Michal in 1968, was not the product of high-throughput experiments but instead a summary of decades of biochemistry research; still, it is entirely modern in form and serves as an excellent illustration of the elements of network representation. A network is defined by its nodes and its edges (figure 1A). The nodes are the components that enter into pairwise interactions. In the network of intermediary metabolism, the nodes are metabolites—the substrates and products of enzymatic reactions. Edges represent the interactions. In the network of intermediary metabolism, each edge is an enzymatic reaction that converts a substrate into a product.

Figure 1. Network terminology and organization. (A) A network comprises nodes (circles) connected by edges. An edge can be directed (arrows, top), when the entity represented by one node acts on or is converted into the entity represented by another node, as in regulatory or metabolic networks. Alternatively, an edge can be undirected (lines, bottom) when the interaction between the two connected entities is symmetrical, as in protein-interaction networks. (B) Global network organization can be described in part by the distribution of node degree (number of interactions per node). In a random network (top), node degree follows a Poisson distribution, whereas in a scale-free network (bottom), node degree follows a power-law distribution with negative exponent. An example of each type of network is shown (left). The characteristic distributions are shown as log-log plots of the number of nodes with each degree (right). (C) A feedforward loop is an example of a network motif. Two transcription factors (X and Y) jointly regulate a target gene (encoding protein Z) and one of the transcription factors (X) also regulates the gene encoding the other (Y) (top). When X and Y are activators, the dynamics of Z expression (bottom) will depend on whether both activators are necessary for target-gene expression (i.e., AND logic applies) or one is sufficient (i.e., OR logic applies). The difference is illustrated by a case in which X is expressed in three brief pulses followed by a sustained pulse (solid bars). If AND logic applies, then Z is expected to be activated only by the sustained pulse of X and to shut down quickly when X is no longer expressed (solid curve). By contrast, if OR logic applies or if the regulation of Z by X is removed to leave a simple linear pathway, then the loop would not filter out short pulses of X and would shut down more slowly (dashed curve).

There are several key points about network representation that are raised by the biochemical pathways wall chart. First, a given network may have either directed or undirected edges (figure 1A). A directed edge, drawn as an arrow from one node to a second node, implies that the entity represented by the first node acts on or is converted into the entity represented by the second node. An undirected edge, drawn as a line without arrowheads, implies symmetry in the interaction, as is the case in a physical association between two proteins (if A touches B, then B touches A). In the biochemical pathways network, edges are directed—substrates are converted into products; note, however, that in the wall chart, many edges run in both directions, because many enzymatic reactions are reversible.

Second, there is not necessarily one “correct” way to represent a set of biological interactions as a network. The choice to represent metabolites as nodes and reactions as edges is just that—a choice. One could instead represent enzymes as nodes and draw edges between enzymes that share substrates or products. Any network is an abstraction that makes salient some features of a system at the expense of others. For example, a large number of enzymatic reactions include ATP as a substrate or product, yet in the wall chart ATP is not depicted as a single node in the same way that other metabolites, such as glucose-6-phosphate, are. Instead, ATP is repeated throughout the network, wherever a reaction consumes or produces it. This change has no effect on the underlying mathematical representation of the network (e.g., one could easily look up the number of reactions that produce ATP if one wanted to), but it does give a very different visual impression than if ATP occupied a single node (with a very large number of edges connecting to it).

This leads to the third key point, which is that networks are simultaneously mathematical and visual objects. Statistical analyses, such as those described in subsequent sections, are performed on the mathematical object, which usually takes the form of a connectivity matrix. While it might often be clear when a visual representation departs from a strict definition of what constitutes a node or an edge (such as when ATP does not occupy a single node), it can be less obvious what decisions went into defining the connectivity matrix. For example, many types of biological data come from experiments that are prone to error. When gazing at a network, or especially when considering statistical analyses of a network, it is imperative to understand the experimental properties and arbitrary decisions that went into determining whether a node or edge was included.
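
As a rough illustration of the connectivity matrix mentioned above, the following Python sketch builds one for a small toy network and reads node degree off it. The node names and edges are invented for illustration and do not correspond to any real data set; the function name `connectivity_matrix` is likewise hypothetical.

```python
# Hypothetical toy network; node names and edges are illustrative only.
nodes = ["A", "B", "C", "D"]
index = {name: i for i, name in enumerate(nodes)}
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def connectivity_matrix(edge_list, directed):
    """Return an n x n matrix of 0s and 1s; entry [i][j] = 1 marks an edge from node i to node j."""
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for source, target in edge_list:
        matrix[index[source]][index[target]] = 1
        if not directed:
            # An undirected edge is symmetric: if A touches B, then B touches A.
            matrix[index[target]][index[source]] = 1
    return matrix


directed = connectivity_matrix(edges, directed=True)
undirected = connectivity_matrix(edges, directed=False)

# Row sums count edges leaving a node; column sums count edges entering it.
edges_out = {name: sum(directed[index[name]]) for name in nodes}
edges_in = {name: sum(row[index[name]] for row in directed) for name in nodes}
# In the undirected matrix, a row sum is simply the node's degree.
degree = {name: sum(undirected[index[name]]) for name in nodes}
```

The same list of edges yields different matrices, and therefore different degree statistics, depending on whether the edges are treated as directed. This is one small example of how a decision made when defining the matrix shapes any downstream analysis.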

2. GLOBAL ORGANIZATION OF BIOLOGICAL NETWORKS

Although the first network representation of biomolecular interactions dates back to 1968, the first graph-theoretical analyses of biomolecular networks were reported 32 years later. Perhaps not surprisingly, these too concentrated on metabolism, the system for which we have the most complete knowledge of connections among molecules. In 2000, Albert-László Barabási and colleagues presented network analyses of intermediary metabolism in 43 organisms, including at least one plant, animal, fungal, eubacterial, and archaeal species (similar analyses were presented at about the same time by Andreas Wagner and David Fell, who concentrated on the metabolic network of one of these species, the bacterium Escherichia coli). From the very start, therefore, the statistical analysis of biomolecular networks had a strong evolutionary focus. Intermediary metabolism had not been studied in most of the 43 organisms, so what made the analysis possible was the availability of their complete, or nearly complete, genome sequences. The enzymes encoded by each genome were inferred by standard methods of identifying homologous genes. Thus, edges could be quite confidently drawn between substrates and products despite the lack of any direct biochemical evidence. In other words, the shared evolutionary history among organisms allowed researchers to fill in the gaps when constructing networks in additional species.

The key finding of Barabási and colleagues concerned the frequency distribution of the number of edges per node, otherwise known as the node degree (figure 1B). Consider a network in which the probability of drawing an edge is the same for any two nodes. This is a so-called random network, and its statistical properties have been studied since the seminal work of mathematicians Paul Erdős and Alfréd Rényi in the 1960s (such a network is also called an Erdős-Rényi network). In a random network, node degree follows a Poisson distribution; however, in the metabolic networks studied by Barabási and colleagues, both the number of edges entering a node (the in degree) and the number of edges emanating from a node (the out degree) follow a power-law distribution: the probability of having k edges is proportional to k raised to a negative power. Whereas a Poisson distribution peaks around its mean value, a power-law distribution with a negative exponent always peaks at 1. Such networks therefore have very many nodes with few edges and a few nodes with very many edges. Because there is no peak in the middle of the distribution, and therefore no “typical” node representing the network as a whole, power-law networks have been termed scale-free. The power-law distribution gives scale-free networks their characteristic hub-and-spoke appearance, similar to that of an airline route map. Related to this, scale-free networks exhibit the “small world” property, in that the network diameter, or average shortest path between nodes, is very short.
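
The contrast between the two degree distributions can be made concrete with a short numerical sketch. The Python code below evaluates a Poisson distribution and a power-law distribution over node degrees from 1 to 50. The mean degree (4.5), the power-law exponent (2.2), and the maximum degree (50) are illustrative assumptions chosen for this example, not values taken from any particular data set.

```python
import math


def poisson_pmf(k, mean):
    """Probability that a node has degree k when degree is Poisson distributed."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def power_law_pmf(k, gamma, k_max):
    """Probability proportional to k ** -gamma, normalized over degrees 1..k_max."""
    norm = sum(j ** -gamma for j in range(1, k_max + 1))
    return k ** -gamma / norm


# Illustrative assumptions, not measured values.
mean_degree = 4.5
gamma = 2.2
k_max = 50

# Degree 0 is left out so that both distributions cover the same range of degrees.
degrees = range(1, k_max + 1)
poisson = {k: poisson_pmf(k, mean_degree) for k in degrees}
power_law = {k: power_law_pmf(k, gamma, k_max) for k in degrees}

# The most probable degree under each distribution.
poisson_mode = max(poisson, key=poisson.get)
power_law_mode = max(power_law, key=power_law.get)

# How much more likely a highly connected node (degree 20) is under the power law.
hub_ratio = power_law[20] / poisson[20]
```

Under these assumptions the Poisson distribution peaks near its mean, whereas the power law peaks at degree 1 and still assigns far more probability to a node of degree 20 than the Poisson distribution does. That heavy tail is what allows a few hubs to coexist with many sparsely connected nodes.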

Soon after the analyses of metabolic networks were first reported, analyses of other biological networks followed. Protein-protein interactions have been identified systematically in several organisms, using either yeast two-hybrid assays or affinity purification followed by mass spectrometry. Data sets large enough to examine global network properties exist for model bacterial, fungal, plant, and animal species, as well as for humans. Although there is ongoing debate as to how well these protein-protein interaction data fit power laws, there is no doubt that the degree distribution in each case is a rapidly decreasing function of degree: some proteins have very many interaction partners whereas most proteins have few. The same pattern holds for genetic interaction networks. Genetic interactions are assayed as Bateson would: look for phenotypes resulting from perturbing two genes at once that cannot be explained by adding the effects of perturbing each gene separately. To date, this has been done most comprehensively in the yeast Saccharomyces cerevisiae, in which more than 5 million mutant combinations lacking two genes were generated and measured for the ability of cells to grow.

Degree distributions have also been examined for some transcriptional regulatory networks. Systematic identification of the edges in these networks—direct interactions between DNA-binding transcriptional activators or repressors and their target genes—is a difficult and ongoing task. In some model organisms, decades of molecular-genetic research have revealed regulator-target relationships of genes that participate in core cellular functions such as cell-cycle progression, developmental patterning, and stress response; however, especially in multicellular organisms, many transcription factors remain uncharacterized. Currently, the most popular high-throughput method for detecting a transcription factor’s direct targets is chromatin immunoprecipitation followed by microarray analysis or deep sequencing of the immunoprecipitated DNA (see chapter V.7). The global network of transcriptional regulation has been best studied in the bacterium E. coli and the yeast S. cerevisiae, although studies of less complete networks have been conducted in other microbes and in some animal species. In the transcriptional networks that have been analyzed, the distribution of out degree (number of targets of a transcription factor) appears to follow a power law. The distribution of in degree (number of direct regulators of a target gene) is less clear. In some analyses, the in-degree distribution appears to fit an exponential distribution better than a power law, whereas in others the reverse appears true.

It must be kept in mind that the systematic experiments that contribute data to interaction networks have generally been conducted in only one genetic background per species and in one experimental condition. It is virtually certain that the list of interactions would change if different genetic backgrounds or conditions were used. For example, a transcription factor might be posttranslationally modified under certain conditions in a way that affects its binding to DNA. The inferred number and identity of its target genes would therefore depend on the environment used for the chromatin immunoprecipitation experiment. It is unclear whether this limitation in current network data has any effect on the apparent global properties of these networks, such as their hub-and-spoke organization. One cautionary example comes from analyses of the in-degree distribution of the yeast transcriptional regulatory network. Early analyses found that the in degree fits an exponential distribution better than a power law, whereas more recent analyses found that it does follow a power law. The difference could be caused by the addition of data on more transcription factors, and therefore better discovery of target genes with high in degree. Whatever the reasons for discrepancies between analyses, it should be understood that efforts to explain the evolution of global properties of biological networks must, for some time to come, be continually reevaluated against the best current empirical estimates of those properties.

3. EVOLUTION OF GLOBAL NETWORK ORGANIZATION

Making an analogy to nonbiological networks that are scale-free, such as the Internet, Barabási and colleagues stated two related hypotheses about biological networks. First, they proposed that hubs in biological networks should be more important than nonhubs. Second, they proposed that scale-free organization of biological networks makes them robust to random failure. In the Internet, hubs are indeed more important, as measured by their contribution to the efficiency of sending bits of data from one node to another. Disconnecting any desktop computer from the Internet typically has no impact on the ability of other users to send each other information, but disconnecting a major telecommunications center could fragment the network or at the very least cause rerouting delays. It is for this reason that the Internet is robust to random failure: hubs are vastly outnumbered by nonhubs, so if a random node is disconnected it is unlikely to be a hub and therefore unlikely to cause a major disruption.
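
The logic of robustness to random failure can be sketched with a toy hub-and-spoke network. In the Python code below, three hubs are linked in a chain and each hub has ten spokes; every node is removed in turn and the size of the largest remaining connected piece is recorded. The network layout, its size, and the function names are assumptions made for illustration only and are far simpler than the Internet or any biological network.

```python
from collections import deque


def build_hub_and_spoke(n_hubs, spokes_per_hub):
    """Toy undirected network: hubs linked in a chain, each hub with its own spokes."""
    hubs = [f"hub{i}" for i in range(n_hubs)]
    adjacency = {hub: set() for hub in hubs}
    for a, b in zip(hubs, hubs[1:]):
        adjacency[a].add(b)
        adjacency[b].add(a)
    for hub in hubs:
        for s in range(spokes_per_hub):
            spoke = f"{hub}_spoke{s}"
            adjacency[spoke] = {hub}
            adjacency[hub].add(spoke)
    return adjacency


def largest_component_without(adjacency, removed):
    """Size of the largest connected component after deleting one node."""
    visited = {removed}
    largest = 0
    for start in adjacency:
        if start in visited:
            continue
        # Breadth-first search from each not-yet-visited node.
        visited.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        largest = max(largest, size)
    return largest


network = build_hub_and_spoke(n_hubs=3, spokes_per_hub=10)
sizes = {node: largest_component_without(network, node) for node in network}

# Nodes whose removal leaves every other node still connected.
harmless = [node for node, size in sizes.items() if size == len(network) - 1]
fraction_harmless = len(harmless) / len(network)
```

In this toy case, removing any of the 30 spokes leaves the other 32 nodes connected, whereas removing the middle hub shatters the network into pieces of at most 11 nodes. Because 30 of the 33 nodes are spokes, a randomly chosen failure is very likely to be harmless.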

Note that these two hypotheses about biological networks are, in essence, evolutionary ones. In the first hypothesis, “important” can be translated into biological terms as “making a large contribution to fitness.” In the second hypothesis, “robust to random failure” can be translated as “robust to mutation.” A priori, it is not obvious that protein-interaction hubs would be important to the extent that Internet hubs are. For example, a “housekeeping” enzyme that is essential for growth might have very few, if any, protein-interaction partners, whereas some large protein complexes are not essential for growth. Nonetheless, the first hypothesis was tested by Barabási and colleagues in 2001, using data from yeast on protein-protein interactions and on protein essentiality (as assayed by gene deletions). A significant positive correlation exists between protein-interaction degree and essentiality: hubs are indeed more likely to be required for viability than nonhubs are (at least under the growth condition assayed). Although this result is striking, there is as yet no consensus as to why the correlation holds. Whereas the edges in the Internet have very consistent meaning (transmission of bits of data), the edges in a protein-interaction network capture a highly heterogeneous set of biological functions, only some of which clearly qualify as information transmission (for example, a kinase phosphorylating its target protein). Understanding the correlation between degree and essentiality will require understanding how the many different types of protein interaction relate to the roles played by proteins in cellular function.

The hypothesis of protein-interaction hub importance has also been tested using data on the rates of evolution of protein-coding sequence. Again, importance has an evolutionary meaning, but this time the assumption is that proteins making a larger contribution to fitness will be constrained to evolve more slowly in amino acid sequence; however, rather than supporting a strong connection between node degree and evolutionary rate, analyses have, surprisingly, challenged the assumption on which they are based. The evolutionary rate/fitness assumption dates back to the landmark 1969 paper titled “Non-Darwinian Evolution” by Jack Lester King and Thomas Jukes and was a cornerstone of Motoo Kimura’s neutral theory (see chapter V.1). Nonetheless, recent analyses using comprehensive data from bacteria and yeast on gene dispensability have found absent or very weak correlations with protein evolution rate, especially when the expression level of the gene is controlled for. Controlling for other factors is the principal challenge of this type of analysis, because the relevant factors—such as protein abundance and node degree—tend to be correlated with each other. This problem leads to difficulty in inferring a causative role for any one factor. Consequently, as Eugene Koonin and Yuri Wolf have emphasized, the literature is replete with weak and contradictory claims, a situation symptomatic of, in their words, “a nascent field in turmoil.”

The evolution of global network organization has also been examined from the point of view of genome content. In 2003, Erik van Nimwegen showed that the number of genes encoding proteins in a particular functional category scales as a power law with the total number of genes in a genome. For example, across bacterial species representing broad phylogenetic and ecological ranges, the number of genes encoding transcription factors scales with the number of genes in the genome raised to a power of approximately two: for each doubling of genes in the genome, the number encoding transcription factors increases approximately fourfold. The finding of scaling exponents other than one implies that as genomes grow by gene duplication or shrink by gene loss, the probability that a gene will be added or deleted depends on the function of its encoded protein. The evolutionary causes for such departures from equal probability are unknown, but these power laws, like the power laws of node degree, must be explained by any satisfactory model of genome evolution.
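
The arithmetic behind the scaling exponent can be checked with a few lines of Python. The reference genome below (2,000 genes, 50 of which encode transcription factors) is a hypothetical example chosen for round numbers, not a real organism, and the function name `expected_count` is an assumption of this sketch.

```python
def expected_count(n_genes, exponent, ref_genes, ref_count):
    """Power-law scaling: count = ref_count * (n_genes / ref_genes) ** exponent."""
    return ref_count * (n_genes / ref_genes) ** exponent


# Hypothetical reference genome, chosen for round numbers.
ref_genes, ref_tfs = 2000, 50

# Double the genome size under an exponent of two and, for contrast, an exponent of one.
quadratic = expected_count(4000, 2.0, ref_genes, ref_tfs)
linear = expected_count(4000, 1.0, ref_genes, ref_tfs)

# Fraction of the genome devoted to transcription factors before and after doubling.
fraction_before = ref_tfs / ref_genes
fraction_after = quadratic / 4000
```

With an exponent of two, doubling the genome raises the transcription-factor count fourfold, from 50 to 200, so the fraction of the genome devoted to transcription factors itself doubles. With an exponent of one, that fraction would stay constant.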

It has been pointed out, most forcefully by Michael Lynch and Andreas Wagner, that an adequate null model of genome evolution, as it relates to global network organization, is sorely lacking. Whereas the null models of neutral or nearly neutral evolution have oriented the field of molecular evolution for decades (see chapter V.1), the field of network evolution has been proceeding without such grounding in rigorous population genetics. As Lynch and Wagner have independently noted, most early attempts to explain, for example, the power laws of node degree were strongly adaptationist. Consider the hypothesis that a scale-free network is robust to random failure. This predicted property could be seen merely as a by-product of a possibly nonadaptive process that gave the network its degree distribution. Instead, robustness was presented as a selectively advantageous property and therefore as a cause of the connectivity distribution. Evaluating the validity of such claims will require developing the null models against which to test them.

4. LOCAL ORGANIZATION AND DYNAMICS OF BIOLOGICAL NETWORKS

Local network organization refers to patterns of connection between subsets of nodes. For example, one might ask how the nodes that are connected directly to a particular focal node are connected to each other. In a protein-protein interaction network, such analysis could reveal the organization of proteins into functional modules or complexes. In a regulatory network, local analysis could reveal something less intuitive and therefore potentially more valuable: how network structure relates to network dynamics.

The divide between network structure and network dynamics is especially difficult to bridge because of two major gaps in characterization of regulatory networks, both of which are likely to persist for quite some time. The first major gap is the lack of kinetic rate constants for transcriptional reactions. The rates at which transcription factors bind and release their target DNA sites in vivo are in general difficult to measure, and therefore unknown except in the very rare cases in which advanced methods of single-molecule detection have been used. Rates of mRNA production and degradation have been estimated genome-wide in yeast and some other organisms, but these experiments typically involve nonphysiological conditions or mutational perturbation. Only very recently have methods been developed to observe the dynamics of transcription initiation, elongation, and termination at a single gene in vivo. At present, it is therefore generally not possible to describe the vast majority of regulatory systems with a complete set of coupled differential equations in which all parameter values have been specified from experimental data.

One potential way around the problem of missing rate constants would be to leave the kinetic parameters as variables and to infer their values based on measurements of easily determined quantities, such as mRNA abundance, in experiments where the regulatory system is perturbed away from its steady state. Indeed, this modeling approach to regulatory network inference makes up an extremely active subfield of computational biology. Such inference is very challenging, however, both because the number of experiments is often not much larger than the number of parameter values to be estimated, and because regulatory networks are typically robust to changes in their parameter values. This robustness is an interesting property in its own right, with important implications for understanding genetic and phenotypic variation in natural populations (see chapter V.11). What robustness means in practice is that many combinations of parameter values are consistent with the observed data.

The second major gap is the lack of understanding of the ways in which the effects of multiple transcription factors combine to determine target-gene expression. It is commonly accepted that most transcriptional regulation is combinatorial; that is, the effects do not simply add together but instead combine to create what amounts to a logic function. For example, if two activators regulate a given target gene, it might be the case that either one is sufficient to cause transcription (OR logic) or, alternatively, that both are needed (AND logic) (figure 1C). Despite the appreciation that the logic functions can have a large impact on transcriptional dynamics, and therefore that they should be a part of any mathematical model of gene regulation, they are mostly unknown. Ultimately, the logic functions should be reducible to kinetics as well, although the knowledge of cooperative and competitive protein-protein interaction kinetics as they relate to transcription in vivo is even poorer than protein-DNA kinetics. There is no available method for parallel determination of what these logic functions are, and gene-by-gene analyses are time consuming. Moreover, the cases in which the logic is understood might make up a biased subset, because discovering certain forms of regulation is easier than discovering others. For example, if several transcription factors are redundant in their effects on a particular promoter (i.e., OR logic applies), then it is unlikely that any one of them would be discovered by standard mutational analysis.

The severe challenges posed for a network-based understanding of regulatory dynamics, and the evolution of these networks, by the incomplete state of information about transcriptional kinetics and logic are compounded by posttranscriptional and posttranslational gene regulation, which are even less completely characterized; however, some inferences can be made without complete knowledge. Indeed, the robustness of regulatory networks implies that exact values of kinetic parameters do not matter to a large degree. Consider a simple regulatory network consisting of a transcription factor that regulates its own gene’s expression. If the transcription factor is a repressor, then over a wide range of parameter values, the system will show predictable behavior. Any increase in expression will lead to more repression, whereas any decrease in expression will lead to less repression. If a sufficient delay exists in the system, then a stable oscillation might be reached, rather than a fixed-point steady state, but in either case the tendency is toward stability. By contrast, if the transcription factor is an activator, then the system will be unstable. Above a certain threshold, an increase in expression will be amplified, producing a switch-like transition of the gene from off to on.
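
The qualitative behavior of the two kinds of autoregulation can be illustrated with a minimal simulation. The Python sketch below integrates one simple rate equation for each case by small time steps. The functional forms (a saturating production term minus linear decay) and every parameter value are assumptions chosen for convenience, not measured kinetics, and the sketch omits the time delay that would be needed to produce oscillations.

```python
def simulate(rate, x0, dt=0.01, steps=5000):
    """Advance expression level x by simple Euler steps and return its final value."""
    x = x0
    for _ in range(steps):
        x += dt * rate(x)
    return x


def negative_autoregulation(x, beta=1.0, k=0.5, alpha=1.0):
    """Repressor of its own gene: production falls as x rises; decay is linear."""
    return beta / (1.0 + (x / k) ** 2) - alpha * x


def positive_autoregulation(x, beta=1.0, k=0.4, alpha=1.0):
    """Activator of its own gene: production rises steeply as x rises; decay is linear."""
    return beta * x ** 2 / (k ** 2 + x ** 2) - alpha * x


# Negative feedback: low and high starting levels settle at the same value.
repressor_from_low = simulate(negative_autoregulation, x0=0.0)
repressor_from_high = simulate(negative_autoregulation, x0=2.0)

# Positive feedback: the outcome depends on which side of a threshold the system starts.
activator_below_threshold = simulate(positive_autoregulation, x0=0.1)
activator_above_threshold = simulate(positive_autoregulation, x0=0.3)
```

With these assumed parameters, the self-repressing gene returns to an expression level of 0.5 whether it starts at 0 or at 2. The self-activating gene has a threshold at 0.2: starting just below it, expression decays toward zero, and starting just above it, expression climbs to 0.8, which is the switch-like transition described above.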

These examples illustrate that in simple cases at least, dynamic properties can be inferred merely from knowing the connections in a regulatory network and the identities of nodes as activators or repressors. This line of reasoning can extend to slightly more complicated cases as well. For example, a regulatory system with two transcription factors that repress each other’s expression is expected to behave like a switch. The system can be forced into one of two stable states in which high expression of one factor precludes expression of the other. Small regulatory subnetworks such as these are termed network motifs. In 2002, Uri Alon and colleagues introduced the notion of a network motif and investigated whether particular motifs are overrepresented in the transcriptional regulatory networks of E. coli and S. cerevisiae. They found that particular motifs are indeed overrepresented relative to random expectation. One example is the feedforward loop, in which two transcription factors jointly regulate a target gene and one of the transcription factors also regulates the other (figure 1C). The feedforward loop motif is especially relevant because of the link it provides between structure and dynamics. If the two transcription factors are activators, and they are both necessary for target-gene expression (i.e., AND logic applies), then the feedforward loop is expected to act as a noise filter. That is, the target gene will be activated only by a sustained pulse of the upstream activator, because only then will both activators be present simultaneously at sufficiently high levels. This kind of feedforward loop is also expected to shut down more quickly than a simple linear pathway when the upstream regulator is no longer present, because the other regulator is not sufficient to activate the target on its own.
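
The noise-filtering behavior of the feedforward loop can be sketched in a few lines of Python. In the toy model below, X is an on/off input, Y accumulates while X is on and decays while X is off, and the target gene Z is counted as active according to either AND or OR logic. The time step, the rate of Y accumulation, the activity threshold, and the pulse lengths are all illustrative assumptions. Z is modeled only as "promoter active or not," not as a protein concentration.

```python
def simulate_ffl(x_signal, logic, dt=0.1, y_rate=1.0, threshold=0.5):
    """Return, for each time step, whether the Z promoter is active."""
    y = 0.0
    z_active = []
    for x in x_signal:
        # Y rises toward 1 while X is on and decays toward 0 while X is off.
        y += dt * y_rate * (x - y)
        y_on = y > threshold
        if logic == "AND":
            z_active.append(bool(x) and y_on)
        else:  # OR logic
            z_active.append(bool(x) or y_on)
    return z_active


# Three brief pulses of X (3 steps each), then one sustained pulse (50 steps).
brief = [1] * 3 + [0] * 20
x_signal = [0] * 5 + brief * 3 + [1] * 50 + [0] * 30
brief_window = slice(5, 74)        # steps covering the three brief pulses
sustained_window = slice(74, 124)  # steps covering the sustained pulse
first_step_after_x_off = 124

z_and = simulate_ffl(x_signal, "AND")
z_or = simulate_ffl(x_signal, "OR")

and_ignores_brief_pulses = not any(z_and[brief_window])
and_responds_to_sustained = any(z_and[sustained_window])
or_responds_to_brief_pulses = any(z_or[brief_window])
```

Under AND logic, the brief pulses never let Y accumulate past the threshold, so Z responds only to the sustained pulse and switches off as soon as X disappears. Under OR logic, Z responds to every pulse and stays on for a while after X is gone, because the remaining Y is sufficient on its own.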

5. EVOLUTION OF LOCAL NETWORK ORGANIZATION

As with global network organization, comparative data can be used to understand the evolution of local network organization. Indeed, Sean Carroll, Eric Davidson, and others have argued that the proper way to understand the evolution of developmental processes is at the local level of regulatory networks (see chapter V.11). For example, Carroll writes of “toolkit” genes, such as those that encode components of signal-transduction cascades, that are deployed for various purposes throughout development and across species. In Davidson’s terminology, these would be called “plug-ins”—small subcircuits of genes that perform specific molecular functions but perform potentially many developmental functions. Implicit in the concept of a toolkit or a plug-in is that the regulatory network is modular, comprising groups of genes that function as units. Although modularity also motivates the concept of a network motif (i.e., it makes sense to study motifs to the extent that their behavior is predictable despite their embedding in a larger network), Davidson takes pains to draw a distinction between a subcircuit and a motif, in that the former focuses on biological function whereas the latter focuses on kinetic behavior.

In addition to plug-ins, Davidson defines another type of subcircuit, the “kernel.” A kernel also comprises genes that function together, but unlike the genes of a plug-in, they function together to execute only a single developmental function. Moreover, a kernel is defined as being evolutionarily conserved and comprising densely interconnected regulatory factors, loss of any one of which leads to developmental failure. The canonical kernel is a set of genes encoding transcription factors that collectively give a developing tissue or organ, such as the heart, its identity.

Davidson hypothesizes that different levels of phylogenetic divergence correspond to divergences of different network elements: deep, phylum-level divergences correspond to the ancient emergence of kernels; intermediate divergences correspond to redeployments of plug-ins; and recent divergences correspond to divergences of terminal-differentiation genes regulated by the kernels and plug-ins. Behind this hypothesis is the argument that particular network structures constitute a form of developmental constraint (see chapter V.10). For example, because a kernel’s genes are densely interconnected and essential for normal development, they might be under strong selection not to change their function. Others, however, have pointed out cases of so-called developmental systems drift, in which the essential output of a regulatory subcircuit remains unchanged despite changes in the subcircuit’s membership and interconnections. As Lynch and others have pointed out, an essential regulatory interaction can be lost if it first passes through an intermediate stage in which it is redundant with another factor.

Why deep conservation marks some subcircuits, whereas developmental systems drift marks others, is unknown. Likewise, it is unknown why some network motifs, such as the feedforward loop, appear to be overrepresented in regulatory networks. As with global network organization, competing explanations for local organization favor adaptive or nonadaptive processes. Ultimately, these potential explanations of nonrandom features of networks must be measured against rigorous models of genome evolution.
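A claim that a motif is overrepresented is always a comparison against some expectation. The sketch below shows one simple, purely structural version of that comparison: it counts feedforward loops in a directed network and in randomized networks that preserve every node’s number of incoming and outgoing connections. The edge list `EXAMPLE_EDGES` and all function names are hypothetical, and the input is assumed to contain no self-regulation and no duplicated edges. Edge-swap randomization is a topological null only. It is not a population-genetic model of genome evolution of the kind called for above.

```python
import random

# Hypothetical regulatory network: each pair is (regulator, target)
EXAMPLE_EDGES = [
    ("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("D", "E"),
    ("C", "E"), ("B", "E"), ("F", "A"), ("F", "D"),
]


def count_ffl(edges):
    """Count feedforward loops: X->Y, Y->Z, and X->Z with X, Y, Z distinct."""
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)
    count = 0
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue
            for z in targets.get(y, ()):
                if z != x and z != y and z in x_targets:
                    count += 1
    return count


def randomize(edges, n_swaps, rng):
    """Attempt n_swaps edge swaps, preserving each node's in- and out-degree.

    A swap turns (a->b, c->d) into (a->d, c->b). Swaps that would create a
    self-loop or a duplicate edge are skipped.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


def null_ffl_counts(edges, n_networks=200, seed=0):
    """Feedforward-loop counts in degree-preserving randomizations of `edges`."""
    rng = random.Random(seed)
    n_swaps = 10 * len(edges)  # arbitrary amount of mixing, an assumption
    return [count_ffl(randomize(edges, n_swaps, rng)) for _ in range(n_networks)]


if __name__ == "__main__":
    observed = count_ffl(EXAMPLE_EDGES)
    null = null_ffl_counts(EXAMPLE_EDGES)
    print("Observed feedforward loops:", observed)
    print("Mean in randomized networks:", sum(null) / len(null))
```

An observed count well above the randomized counts shows only that the motif is more common than this particular null predicts. It does not by itself distinguish adaptive from nonadaptive explanations.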

6. THE FUTURE OF EVOLUTIONARY SYSTEMS BIOLOGY

The branch of evolutionary biology dedicated to the understanding of molecular networks has come to be known as evolutionary systems biology. It is difficult to predict the future of this new and contentious field. The points of debate can be subtle but are nonetheless critical. For example, Davidson’s distinction between a subcircuit and a motif might seem minor, but it amounts to an argument about the proper research program for evolutionary systems biology. Indeed, Davidson goes so far as to argue that studying kinetics is a distraction, a “siren-like” call to the “mechanistically inclined” to neglect causal regulatory logic in favor of the mere details of its execution. Likewise, Lynch’s self-described “contrarian” effort to build a rigorous population-genetic null model of genome evolution is a strong statement on where research effort should be allocated. These are but two of the debates that led Koonin and Wolf to declare evolutionary systems biology to be in turmoil. Although the direction of research is uncertain, it is clear that these debates intersect with some of the most critical debates in evolutionary biology, including those concerning the role of nonadaptive processes in evolution and the existence of developmental constraints. The coming years will tell whether the approaches and perspectives of evolutionary systems biology can illuminate these long-running debates better than more established approaches have.

FURTHER READING

Alon, U. 2007. Network motifs: Theory and experimental approaches. Nature Reviews Genetics 8: 450–461. A review of the evidence for network motifs in the regulatory networks of diverse species, and of the motifs’ functional significance.

Barabási, A.-L., and Z. N. Oltvai. 2004. Network biology: Understanding the cell’s functional organization. Nature Reviews Genetics 5: 101–113. A review of network terminology as applied to biological networks, and of the putative evolutionary origins and functional importance of scale-free network organization.

Carroll, S. B., J. K. Grenier, and S. D. Weatherbee. 2001. From DNA to Diversity: Molecular Genetics and the Evolution of Animal Design. Malden, MA: Blackwell Science. An introduction to the evolution of development with an emphasis on changes in gene regulation and the concept of a genetic “toolkit.”

Davidson, E. 2006. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution. San Diego, CA: Academic/Elsevier. An introduction to the evolution of development with an emphasis on changes in gene regulation at different levels of network organization.

Koonin, E. V., and Y. I. Wolf. 2008. Evolutionary systems biology. In M. Pagel and A. Pomiankowski, eds., Evolutionary Genomics and Proteomics. Sunderland, MA: Sinauer. A review of the impact of systems biology on evolutionary genetics, and of the major unanswered questions in the evolution of networks.

Lynch, M. 2007. The evolution of genetic networks by nonadaptive processes. Nature Reviews Genetics 8: 803–813. An argument for the development of appropriate null models of regulatory-network evolution, and an analysis of one such model.

Wagner, A. 2008. Gene networks and natural selection. In M. Pagel and A. Pomiankowski, eds., Evolutionary Genomics and Proteomics. Sunderland, MA: Sinauer. A review of the role of natural selection in shaping the architectures of regulatory networks.