II.7

Using Phylogenies to Study Phenotypic Evolution: Comparative Methods and Tests of Adaptation

Richard Ree

OUTLINE

  1. Phylogeny and the comparative method

  2. Ancestral state reconstruction

  3. Model-based inferences of trait evolution

  4. Analysis of multiple traits: Correlated evolution and phylogenetic tests of adaptation

  5. Trait evolution and lineage diversification

  6. Accuracy and confidence in ancestral inferences

  7. Future directions for comparative methods

Phylogeny, in describing the genealogy of species, provides a historical framework for understanding the evolution of phenotypic diversity. Modern comparative biology uses the analysis of trait variation across species to infer the history of organic evolution and to elucidate evolutionary principles and processes. Comparative methods account for the fact that species are not entirely independent, but instead share evolutionary history by virtue of common ancestry. These methods commonly employ statistical models of trait evolution that can be used to estimate ancestral states, rates of change, directional trends, and correlations between traits. They can also be used to study the links between phenotypic evolution and patterns of lineage diversity, including rates of speciation and extinction.

GLOSSARY

Ancestral State. The phenotype or trait value of an ancestral species, usually inferred from the states of extant species given a phylogenetic tree.

Character (Trait). Any distinct, observable feature of an individual (e.g., aspects of morphology or behavior, gene sequences) or an emergent property of groups of individuals (e.g., ecological niche, geographic range, sexual dimorphism, mating system).

Likelihood. An optimality criterion based on probability distributions defined by a statistical model in which the preferred parameters are those that maximize the probability of the observed data. Given a stochastic model of trait evolution on the branches of a phylogenetic tree, likelihood can be used to assess ancestral character states.

Model of Trait Evolution. A statistical description of changes in states or values of a trait occurring stochastically through time, based on instantaneous rates that parameterize the underlying probability distributions.

Optimality Criterion. A set of principles or rules that measure the fit of data (e.g., observed character states) to a given hypothesis (e.g., a phylogenetic tree); in comparative biology, the most commonly used optimality criteria are parsimony and likelihood.

Parsimony. Also known as Occam’s razor, an optimality criterion based on the principle that the simplest explanation is most likely correct. For example, in ancestral state reconstruction, parsimony means choosing ancestral states that minimize the amount of change required to explain all the states observed at the terminal nodes of a phylogenetic tree.

1. PHYLOGENY AND THE COMPARATIVE METHOD

Charles Darwin’s theory of “descent with modification” (that species arise and diverge naturally from common ancestors) kick-started a revolution not only in the way biologists classify the living world (systematics), but also in the way they interpret its phenotypic diversity (comparative biology). Patterns of descent are described by phylogenetic trees, the availability of which continues to increase dramatically as a result of advances in genetic sequence acquisition, inference algorithms, and computational resources (see chapter II.2). Similarly, the use of phylogeny as a comparative framework for studying the history of “modification” (the sequence, tempo, and mode of evolutionary change in morphological characters, behaviors, and other traits of species) has seen much progress. Comparative methods facilitate analyses that are explicitly historical (e.g., what was the diet of the original Darwin’s finch?), or not amenable to experiments (e.g., do larger-bodied predator species require larger home ranges?). The following sections review the theory and methods commonly employed in making inferences about evolutionary history and processes from comparative phylogenetic data.

Some Terminology

In comparative biology, a trait, or character, can be thought of generally as any distinct, observable feature of a species (see chapter II.1). To understand how comparative methods work, one needs to know some terms for the component parts of phylogenetic trees. The tips of branches that typically represent extant species are called terminal (or leaf) nodes; points where branches split and diverge are called internal nodes. Individual branch segments connecting nodes to each other are called internodes, or simply branches, and represent single lineages (species). Branches can have associated length values, which measure the evolutionary distance between nodes, for example, in units of time or genetic divergence. It is generally assumed that phylogenies are rooted, meaning they have an explicit temporal orientation, with each branch connecting an ancestral node (earlier in time) to a derived one (descendant, later in time). An internal node or branch and all its descendants is called a clade; the terminal nodes in a clade represent a monophyletic group. A bifurcating split in a branch yields sister clades, by definition the same age. The deepest internal node in a tree—the most recent common ancestor of all its leaf nodes—is called the root node.

2. ANCESTRAL STATE RECONSTRUCTION

Which came first, the chicken or the egg? In this age-old dilemma lies a question about the evolutionary sequence of ancestral states. Oviparity (egg laying) is a trait shared by all birds, as well as crocodilians, lizards, snakes, and turtles. Since these taxa form a monophyletic group, it is likely their common ancestor was also oviparous (see chapter II.1)—an inference corroborated by the fossil record. Thus, comparative data clearly show that the egg came before the chicken.

Questions involving ancestral states are ubiquitous in comparative studies. In principle, the most direct and accurate means of inferring the state of an ancestor is to examine fossils of the ancestor as it existed in the past; however, this is often impractical, as the organisms and/or traits of interest may not preserve well. Moreover, even if such fossils are available, it is often difficult to be confident that they represent the true ancestral species, rather than a divergent and extinct side branch. Comparative methods for ancestral state reconstruction generally focus on the hypothetical common ancestors represented by the internal nodes of a phylogeny. The general problem to be solved is, What ancestral states at those nodes make the most sense, in light of the observed data—species’ traits arrayed across the tips of a phylogeny?

Parsimony

One approach to the answer appeals to the idea of simplicity: optimal ancestral states are those requiring the least change along the branches of the tree. In the case of a discrete character (e.g., red versus blue petals in a clade of flowering plants), this means the fewest transitions between states (colors). This is the principle of parsimony, also known as Occam’s razor. As illustrated by the chicken-and-egg example, if all species in a monophyletic group share the same state, it is parsimonious to infer that their common ancestors—the internal nodes all the way down the tree—were also the same. Otherwise, when species vary in their character states, algorithms are needed to find the ancestral values that fulfill the criterion of minimal change.

The parsimony criterion raises the question, How is change quantified? The answer requires assumptions about the relative “cost” of state transitions. For discrete states such as red versus blue petals, an unbiased view would assume equal costs of change in both directions, from red to blue and vice versa; however, equality may not always be preferred. For example, if it were known that red pigments in plants require an extensive and complex biosynthesis pathway involving many genes, in any of which a simple knockout mutation would disrupt the production of necessary precursors and result in blue petals, the cost of red-to-blue transitions might be downweighted. Under such weighting, inferring many changes from red to blue could be more parsimonious than inferring a few changes from blue to red. In general, assumptions about transition costs between n discrete states can be expressed as an n × n array of values, known as a step matrix. However, while the preference for an asymmetrical step matrix may be empirically grounded, in practice it may be difficult to objectively justify any specific choice of relative weights.
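To illustrate how a step matrix can change the outcome, here is a minimal Python sketch of a dynamic-programming parsimony calculation (often called the Sankoff algorithm, a name not used in this chapter). The tree, tip states, and cost values are hypothetical; tips are strings and internal nodes are tuples of children.

```python
INF = float("inf")

def min_costs(node, tip_states, step_matrix):
    """Return, for each possible state at `node`, the minimum total cost
    of the subtree below it under the given step matrix."""
    n = len(step_matrix)
    if isinstance(node, str):  # tip: only the observed state is allowed
        return [0 if s == tip_states[node] else INF for s in range(n)]
    child_costs = [min_costs(child, tip_states, step_matrix) for child in node]
    return [
        sum(min(step_matrix[s][d] + costs[d] for d in range(n))
            for costs in child_costs)
        for s in range(n)
    ]

# Hypothetical data: state 0 = red petals, state 1 = blue petals.
tree = (("A", "B"), ("C", "D"))
tips = {"A": 0, "B": 1, "C": 0, "D": 1}

equal = [[0, 1], [1, 0]]        # both directions cost the same
weighted = [[0, 0.5], [1, 0]]   # red-to-blue downweighted (assumed value)

root_equal = min_costs(tree, tips, equal)        # [2, 2]: root state ambiguous
root_weighted = min_costs(tree, tips, weighted)  # [1.0, 2.0]: red root is cheaper
```

With equal costs the two root states tie, whereas the asymmetrical matrix favors a red root with two cheap red-to-blue changes.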

For continuous traits, such as body size, or petal color recorded as wavelength of reflected light, the parsimony criterion generally posits that the cost of change along a branch is proportional to the squared difference of the ancestral and descendant values—so-called squared-change parsimony. The optimal solution is that set of ancestral states that minimizes the sum of squared differences over all branches of the phylogeny. If these differences are weighted inversely by branch length, following the reasoning that more change is expected on longer branches, the inferred ancestral states are exactly equivalent to estimates under the assumption of Brownian motion evolution (see below).
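The effect of branch-length weighting can be seen in the simplest possible case: a single ancestor connected directly to its descendants. The sketch below assumes that simplified situation only (the function name and values are hypothetical); minimizing the sum of squared differences, each divided by its branch length, yields a mean weighted by inverse branch length.

```python
def star_ancestor_estimate(tip_values, branch_lengths):
    """Minimize sum((x_i - a)**2 / v_i) over a. The minimizer is the mean
    of the tip values weighted by inverse branch length."""
    weights = [1.0 / v for v in branch_lengths]
    return sum(w * x for w, x in zip(weights, tip_values)) / sum(weights)

# Hypothetical body sizes: the descendant on the shorter branch counts more.
estimate = star_ancestor_estimate([10.0, 20.0], [1.0, 3.0])  # about 12.5
```

On a full tree the same weighting is applied at every internal node simultaneously, which requires solving for all ancestral values together.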

Likelihood

With statistical comparative methods, the optimality criterion is based on likelihood rather than parsimony. The question shifts from “What is the least amount of change required to explain the observed states?” to “What is the probability of the observed states having evolved, given a model specifying ancestor-descendant probabilities of change?” The key difference lies in modeling evolution as a stochastic process governed by probability distributions, with the probability of change being a function of time. In phylogenetic terms, time refers to the length of the branch between ancestral and descendant nodes. This contrasts with parsimony, in which branch lengths are generally ignored, and change thus tends to be underestimated on longer branches. Note that the use of stochastic models of evolution does not imply that changes are themselves random—that is, nonadaptive. Stochastic models can also describe the unpredictable effects of natural selection. For both discrete and continuous characters, specifying a model allows ancestral states to be estimated by maximum likelihood methods.

In Markov models of discrete characters, a common assumption is exponentially distributed waiting times between transition events. In such models, the expected waiting time is dictated by the instantaneous rate of change of the character. Analogous to step matrices, transition rates between n states are commonly given in an n × n rate matrix, usually denoted Q. For a single branch of length t, probabilities of all pairs of ancestor and descendant states are computed as the matrix exponential P(t) = exp(Qt). This approach integrates over all possible paths of change along a branch to calculate the probability that the descendant has each of the n possible states given that the ancestor had, say, state 0. The likelihood of character data having evolved on a given phylogeny is obtained by recursively calculating these probabilities in a tips-to-root traversal of the tree’s branches.
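The tips-to-root calculation can be sketched for the simplest case, a binary character, where exp(Qt) has a closed form and no matrix library is needed. This is an illustrative sketch, not a reference implementation: the tree encoding, the rate values, and the equal root-state weights are all assumptions made for the example.

```python
import math

def transition_probs(q01, q10, t):
    """P(t) = exp(Qt) for a two-state model, written in closed form."""
    total = q01 + q10
    decay = math.exp(-total * t)
    p01 = q01 / total * (1.0 - decay)
    p10 = q10 / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]

def partial_likelihoods(node, q01, q10):
    """Probability of the tip data below `node` for each state at `node`.
    A tip is an int (0 or 1); an internal node is a list of
    (child, branch_length) pairs."""
    if isinstance(node, int):
        return [1.0 if s == node else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, length in node:
        p = transition_probs(q01, q10, length)
        below = partial_likelihoods(child, q01, q10)
        for s in (0, 1):
            result[s] *= sum(p[s][d] * below[d] for d in (0, 1))
    return result

def tree_likelihood(root, q01, q10, root_weights=(0.5, 0.5)):
    """Sum over root states; equal root weights are an assumption."""
    partial = partial_likelihoods(root, q01, q10)
    return sum(w * l for w, l in zip(root_weights, partial))

# Hypothetical three-species tree: two sister tips in state 0, one tip in state 1.
tree = [([(0, 1.0), (0, 1.0)], 0.5), (1, 1.5)]
likelihood = tree_likelihood(tree, q01=0.3, q10=0.1)
```

Maximum likelihood estimation would then search over `q01` and `q10` for the values that maximize `tree_likelihood`.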

For continuous characters, the most widely used model of evolution is Brownian motion, in which a trait’s value changes stochastically, in small positive or negative increments, at a constant rate. This process can be thought of as a random walk (often compared to the staggering of a drunken sailor). It is named for the random fluctuations in the position of pollen grains under the microscope, as first seen by Robert Brown in 1827. Brownian motion generally predicts that the trait values of a descendant will fit a normal distribution, centered on the value of the trait in the ancestor, with variance proportional to the intrinsic rate of change and the time separating the ancestor and descendant. Alternatives to Brownian motion include directional random walks, in which the mean of expected outcomes is shifted, and constrained random walks, such as the Ornstein-Uhlenbeck model, in which a “rubber band” parameter pulls values toward an optimum. The latter can be applied to questions such as whether phenotypes in different clades have evolved toward distinct adaptive peaks.
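A short simulation can make these predictions concrete. The sketch below approximates the process with many small Gaussian steps along a single branch; the parameter names `alpha` (strength of the pull) and `optimum`, and all numerical values, are assumptions chosen for illustration. With `alpha` set to zero the walk is plain Brownian motion.

```python
import random
import statistics

def simulate_trait(ancestor, sigma2, t, alpha=0.0, optimum=0.0,
                   steps=100, rng=random):
    """Simulate a descendant's trait value after time t."""
    dt = t / steps
    value = ancestor
    for _ in range(steps):
        pull = alpha * (optimum - value) * dt         # zero under Brownian motion
        noise = rng.gauss(0.0, (sigma2 * dt) ** 0.5)  # small random increment
        value += pull + noise
    return value

rng = random.Random(1)

# Brownian motion: descendants scatter around the ancestral value (5.0),
# with variance close to rate * time = 0.2 * 10 = 2.0.
bm_ends = [simulate_trait(5.0, 0.2, 10.0, steps=50, rng=rng) for _ in range(4000)]
bm_mean = statistics.mean(bm_ends)
bm_var = statistics.variance(bm_ends)

# Constrained walk: values are pulled toward the optimum (0.0) and the
# variance stays bounded instead of growing with time.
ou_ends = [simulate_trait(5.0, 0.2, 10.0, alpha=1.0, optimum=0.0,
                          steps=200, rng=rng) for _ in range(2000)]
ou_mean = statistics.mean(ou_ends)
ou_var = statistics.variance(ou_ends)
```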

3. MODEL-BASED INFERENCES OF TRAIT EVOLUTION

Analysis of Single Traits

Stochastic models of trait evolution define the probability of observed states at the tips of a phylogenetic tree. Their parameters (e.g., instantaneous rates of change) can be estimated by maximum likelihood. As an alternative, one can apply Bayesian statistical methods, which assume some prior knowledge (a prior probability density) and then use likelihoods to estimate a probability that any ancestral trait value or combination of trait values is true (a posterior probability density).

In some cases, the model parameters are of greater interest than ancestral states, the latter being regarded as nuisance variables. In that case it is normal to integrate over all possible values of the ancestral states, measuring their individual contributions to the total probability. Parameter estimates may be sought because they can shed light on past evolutionary dynamics; moreover, one can test evolutionary hypotheses framed in terms of competing models. For example, is there a directional trend in the evolution of flower color, such as a higher rate of change from red to blue than vice versa? Likelihoods obtained using a model that constrains the “forward” and “reverse” rates to be equal can be compared to those from a model in which the rates are free to vary. Various statistical methods, including likelihood-ratio tests, Bayes factors, and information criteria, can be brought to bear on whether the observed data support one model or the other. In this case, support for the two-rate model would lend credence to the hypothesis of a directional trend.
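As a sketch of how the one-rate and two-rate models might be compared, the following computes a likelihood-ratio statistic and the Akaike information criterion. The log-likelihood values are hypothetical, and the p-value relies on the usual large-sample chi-square approximation with one degree of freedom, which is an assumption that may not hold for small trees.

```python
import math

def likelihood_ratio_test(lnl_constrained, lnl_free):
    """Compare nested models that differ by one free parameter."""
    stat = 2.0 * (lnl_free - lnl_constrained)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

def aic(lnl, n_params):
    """Akaike information criterion; lower values indicate a preferred model."""
    return 2.0 * n_params - 2.0 * lnl

# Hypothetical maximized log-likelihoods for the flower-color example.
lnl_one_rate = -42.0   # forward and reverse rates constrained to be equal
lnl_two_rate = -40.0   # rates free to differ

stat, p_value = likelihood_ratio_test(lnl_one_rate, lnl_two_rate)  # 4.0, about 0.046
aic_one, aic_two = aic(lnl_one_rate, 1), aic(lnl_two_rate, 2)      # 86.0, 84.0
```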

By using phylogenies in which branch lengths are in units of absolute time, as can be obtained from fossil-calibrated molecular clock analysis, absolute rates of trait evolution can be estimated from comparative data. The felsen is a recently proposed measure of evolution that corresponds to an increase in one unit of variance per million years, calculated under the assumption of Brownian motion, for natural log-transformed trait values.

Rates of change are not the only parameters of interest in evolutionary models. In some cases, parameters that transform the tree itself may be invoked to test hypotheses about the tempo and mode of change. For example, the question of whether change in a trait has been gradual along branches, or punctuated (concentrated at cladogenesis events), can be framed as whether the likelihood is increased by scaling all branch lengths by a common power, κ. If κ is significantly less than 1, corresponding to branches having rates of trait evolution that are more equal than expected given their branch lengths, the punctuated hypothesis is supported. Other transformations have been designed to detect the signature of Ornstein-Uhlenbeck evolution as well as accelerating or decelerating Brownian motion evolution.
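The branch-length transformation itself is simple to express. In this minimal sketch (the function name and branch lengths are hypothetical), each length is raised to the power κ; at κ = 1 the tree is unchanged, and at κ = 0 every branch has the same length, so expected change depends only on the number of branching events.

```python
def kappa_transform(branch_lengths, kappa):
    """Raise every branch length to the power kappa."""
    return [length ** kappa for length in branch_lengths]

lengths = [0.5, 2.0, 4.0]                    # hypothetical branch lengths
gradual = kappa_transform(lengths, 1.0)      # unchanged
punctuated = kappa_transform(lengths, 0.0)   # all branches equal to 1.0
```

In practice κ would be estimated by maximizing the likelihood of the trait data on the transformed tree.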

Phylogenetic Signal

Given the null expectation of similarity by descent, a basic question in comparative analysis is, To what degree does a trait actually covary with phylogeny? In other words, how much “phylogenetic signal” does a trait have? The question is often raised in the context of niche conservatism, that is, the tendency for closely related species to share ecological traits, and in studies of why some traits are more labile in evolution than others. Various tests of phylogenetic signal have been proposed. A common theme is the calculation of a test statistic that measures the fit of the data to the tree (e.g., as defined by a stochastic model of evolution), with the significance of the test statistic being judged against a null distribution generated from random permutations of the data across the tips of the tree. Another common strategy is to measure the fit of the data while transforming the tree’s branch lengths to be increasingly starlike (i.e., such that all its terminal branches appear to radiate from a single ancestral node). These tests can accommodate both discrete and continuous characters, and a range of evolutionary models. They can be useful for ascertaining whether an individual trait does or does not exhibit signal; however, measurement of phylogenetic signal as a quantity directly comparable across traits and trees is a more challenging problem, formally defined for continuous characters only in the context of Brownian motion evolution. Simon Blomberg and colleagues devised a statistic, K, that is greater than 1 if close relatives are more similar than expected from Brownian motion, and less than 1 (0<K<1) if they are less similar. This statistic is commonly used to compare the strength of phylogenetic signal across different combinations of traits and trees.
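The permutation strategy can be sketched as follows. The test statistic used here, the summed trait difference between sister tips, is a deliberately simple stand-in chosen for illustration and is not one of the published statistics; the species names and trait values are hypothetical.

```python
import random

def sister_pair_statistic(values, sister_pairs):
    """Summed absolute trait difference between sister tips (smaller = more signal)."""
    return sum(abs(values[a] - values[b]) for a, b in sister_pairs)

def permutation_p_value(values, sister_pairs, n_perm=2000, seed=0):
    """Fraction of random tip shufflings that fit the tree as well as the data do."""
    rng = random.Random(seed)
    names = list(values)
    observed_values = [values[name] for name in names]
    observed = sister_pair_statistic(values, sister_pairs)
    hits = 0
    for _ in range(n_perm):
        shuffled = observed_values[:]
        rng.shuffle(shuffled)                  # permute data across the tips
        permuted = dict(zip(names, shuffled))
        if sister_pair_statistic(permuted, sister_pairs) <= observed + 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: four sister pairs with similar values within each pair.
values = {"T1": 1.0, "T2": 1.1, "T3": 5.0, "T4": 5.2,
          "T5": 9.0, "T6": 9.1, "T7": 13.0, "T8": 13.3}
pairs = [("T1", "T2"), ("T3", "T4"), ("T5", "T6"), ("T7", "T8")]
p_value = permutation_p_value(values, pairs)
```

A small p-value indicates that randomly reassigned data rarely match the tree as closely as the observed data do.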

4. ANALYSIS OF MULTIPLE TRAITS: CORRELATED EVOLUTION AND PHYLOGENETIC TESTS OF ADAPTATION

Independent Contrasts and the Phylogenetic Regression

A common goal in comparative biology is to study correlated change in different traits, or correlations between phenotype and environment, as correlations can reveal evidence of adaptive, functional, or genetic constraints. As should be clear from the previous sections, the central problem facing such analyses is that species’ traits are not statistically independent data points, owing to similarities inherited from shared ancestry. A great deal of research has focused on ways to account for this nonindependence. A seminal advance in this area was Joseph Felsenstein’s method of phylogenetically independent contrasts, allowing measurement of the correlation between two continuous traits while taking account of nonindependence due to shared ancestry. The method assumes that both traits evolved according to Brownian motion, and that the branch lengths of the phylogeny correspond to expected variances in trait values. It calculates contrasts (differences in trait values) for each trait at all pairs of sister nodes on the phylogeny, using a recursive algorithm that proceeds from the tips of the tree toward the root, assigning weighted averages of trait values to the common ancestor of each sister pair. These contrasts are independent of phylogeny and can thus be studied using standard bivariate techniques for correlation and regression.
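The recursion can be sketched in a few lines of Python. This is a minimal illustration for a fully bifurcating tree with hypothetical species and trait values; each contrast is standardized by the square root of its expected variance, and the branch below each reconstructed ancestor is lengthened to reflect the uncertainty of the weighted average, as in Felsenstein's algorithm.

```python
import math

def independent_contrasts(tree, trait):
    """Standardized contrasts, one per internal node.
    A tip is a species name; an internal node is a pair
    ((left_child, left_length), (right_child, right_length))."""
    found = []

    def visit(node):
        # Returns (trait value at node, extra length to add to its branch).
        if isinstance(node, str):
            return float(trait[node]), 0.0
        (left, v_left), (right, v_right) = node
        x_left, extra_left = visit(left)
        x_right, extra_right = visit(right)
        v_left += extra_left
        v_right += extra_right
        found.append((x_left - x_right) / math.sqrt(v_left + v_right))
        value = (x_left / v_left + x_right / v_right) / (1 / v_left + 1 / v_right)
        return value, v_left * v_right / (v_left + v_right)

    visit(tree)
    return found

# Hypothetical tree ((A:1, B:1):1, C:2) and two traits.
tree = (((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0))
trait_x = {"A": 1.0, "B": 3.0, "C": 8.0}
trait_y = {"A": 3.0, "B": 7.0, "C": 17.0}

cx = independent_contrasts(tree, trait_x)
cy = independent_contrasts(tree, trait_y)
# Regression through the origin on the contrasts.
slope = sum(x * y for x, y in zip(cx, cy)) / sum(x * x for x in cx)
```

A tree with n tips yields n − 1 contrasts per trait, and because the direction of each subtraction is arbitrary, regressions on contrasts are conventionally forced through the origin.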

Independent contrasts sparked a cascade of theory that further explored, in mathematical terms, the covariance of species arising from phylogeny and the detection of trait correlations. These investigations have drawn heavily from statistical techniques based on matrix algebra, in which the phylogeny is transformed into a variance-covariance matrix specifying the shared and independent histories of species. A significant outcome of this work was a more general framework for the phylogenetic regression of traits based on generalized least squares (GLS). With GLS, the phylogenetic variance-covariance matrix can be constructed using arbitrary models of trait evolution, relaxing the need to assume Brownian motion. The framework has spawned a wide variety of related methods for parameter estimation, hypothesis testing, and ancestral state reconstruction from multivariate data sets of both continuous and discrete traits.
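To show what the variance-covariance matrix encodes, the sketch below builds it for the Brownian motion case, in which the covariance between two species equals the branch length they share between the root and their most recent common ancestor. The tree encoding and species names are hypothetical, and the GLS fit itself (which needs a linear algebra library) is not shown.

```python
def phylo_vcv(tree):
    """Return (tip_names, matrix) for a rooted tree.
    A tip is a species name; an internal node is a tuple of
    (child, branch_length) pairs."""
    names = []
    shared = {}

    def visit(node, depth):
        if isinstance(node, str):
            names.append(node)
            shared[(node, node)] = depth      # variance: root-to-tip distance
            return [node]
        tips_by_child = [visit(child, depth + length) for child, length in node]
        for i, left in enumerate(tips_by_child):
            for right in tips_by_child[i + 1:]:
                for a in left:
                    for b in right:
                        # Lineages diverge here, so they share `depth` of history.
                        shared[(a, b)] = shared[(b, a)] = depth
        return [tip for tips in tips_by_child for tip in tips]

    visit(tree, 0.0)
    return names, [[shared[(a, b)] for b in names] for a in names]

# Hypothetical tree ((A:1, B:1):1, C:2).
tree = (((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0))
names, vcv = phylo_vcv(tree)
# names == ["A", "B", "C"]
# vcv == [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
```

Other models of trait evolution would change how the matrix entries are computed from the tree, while the regression machinery stays the same.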

Discrete Markov Models of Correlated Evolution

Tests for correlated evolution are often motivated by hypotheses of adaptation. For example, are transitions to C4 photosynthesis favored in plant lineages that occupy arid environments? For discrete characters, a popular method for studying correlated evolution uses Markov models, and extends the general framework described previously for univariate hypothesis testing. A correlated Markov model for two binary-valued traits has four discrete “states,” corresponding to the four combinations of trait values that a lineage can have (00, 01, 10, 11). The four states define a 4 × 4 rate matrix, Q, of which the 12 off-diagonal entries represent instantaneous rates of change. However, elements are set to zero where they correspond to transitions involving two changes (i.e., between 00 and 11 and between 01 and 10, in either direction), reflecting the assumption that only one trait can change in an instant of time. Thus, the rate matrix allows up to eight free parameters describing rates of change between each pair of character states: q00→01, q01→00, q10→11, q11→10, q00→10, q10→00, q01→11, and q11→01. These can be used to test specific hypotheses. For example, if the independent trait is mesic versus arid habitat preference, and the dependent trait is C3 versus C4 photosynthesis, the hypothesis that evolutionary “gains” of C4 are concentrated in arid-inhabiting lineages could be formulated as a model in which q10→11 > q11→10, and possibly also q01→00 > q00→01. This model can be tested by comparing its likelihood versus a model in which these rates are equal.
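The structure of this rate matrix can be sketched as follows. The rate values are hypothetical, and the state labels follow the convention that the first digit is habitat (0 = mesic, 1 = arid) and the second is photosynthetic pathway (0 = C3, 1 = C4).

```python
STATES = ["00", "01", "10", "11"]

def build_rate_matrix(rates):
    """Build the 4 x 4 matrix Q from the eight single-change rates.
    `rates` maps (from_state, to_state) to an instantaneous rate."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, src in enumerate(STATES):
        for j, dst in enumerate(STATES):
            if i == j:
                continue
            n_changes = sum(a != b for a, b in zip(src, dst))
            if n_changes == 1:        # double changes stay at zero
                q[i][j] = rates[(src, dst)]
        q[i][i] = -sum(q[i])          # each row of Q sums to zero
    return q

# Hypothetical rates: all equal except a higher rate of C4 gain in arid habitats.
rates = {(s, d): 0.1 for s in STATES for d in STATES
         if sum(a != b for a, b in zip(s, d)) == 1}
rates[("10", "11")] = 0.5

Q = build_rate_matrix(rates)
```

Constraining pairs of entries in `rates` to be equal gives the simpler model against which the dependent model is compared.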

5. TRAIT EVOLUTION AND LINEAGE DIVERSIFICATION

A common theme in the preceding discussion is that comparative methods accept a given phylogeny as fixed, and account for its topology and branch lengths in making historical inferences, but otherwise treat the evolution of traits as independent of the processes that shaped the tree itself—namely, speciation (cladogenesis) and extinction. With this perspective, one can imagine the branches of the tree as static structures, their lengths and topology unaffected by the traits of the species they represent; however, abundant evidence has been found that traits do affect speciation and extinction. Moreover, a large body of theoretical and empirical work has focused on estimating the birth and death rates of lineages (including where these rates have shifted) from phylogenetic trees. What are the connections between these lines of inquiry?

A major branch of comparative biology studies the links between species’ traits and lineage diversification. For example, a pervasive idea is that certain traits represent evolutionary “key innovations” that played a role in the success of unusually large clades. In parametric terms, this amounts to asking whether rates of diversification are state dependent. To address such questions, comparative analyses often initially use a combination of methods that separately infer the history of trait evolution (e.g., where on the phylogeny did the putative innovation arise?) and the history of lineage diversification (where did rates of diversification change?), and subsequently associate the results. For example, bilaterally symmetrical flowers are thought to represent an adaptation for specialized animal pollination, which in turn may enhance the potential for reproductive isolation and the origin of new plant species. This hypothesis would be supported if bilateral symmetry is associated with higher rates of diversification. Independent evolutionary transitions in flower symmetry on the phylogeny of plants thus represent naturally replicated experiments that can be brought to bear on the question. In fact, it has been shown that bilateral clades are larger than their radially symmetrical sister groups more often than can be attributed to chance. Comparative analyses thus support the idea that bilateral flowers are key innovations in plants.
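One simple way to ask whether such a pattern exceeds chance is a sign test across sister-clade pairs. The sketch below assumes that, under the null hypothesis, either sister is equally likely to be the larger clade; the counts are hypothetical and are not taken from any study.

```python
from math import comb

def sign_test_p_value(n_larger, n_pairs):
    """One-tailed probability of at least `n_larger` pairs (out of `n_pairs`)
    favoring the focal trait if each outcome has probability 1/2."""
    favorable = sum(comb(n_pairs, k) for k in range(n_larger, n_pairs + 1))
    return favorable / 2 ** n_pairs

# Hypothetical result: the focal clade is larger in 8 of 10 sister pairs.
p_value = sign_test_p_value(8, 10)   # 56 / 1024 = 0.0546875
```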

The sister-clade approach is appealing in its simplicity. By definition, sister clades are the same age, so their relative sizes directly reflect differences in net diversification (speciation minus extinction); however, the sister-group method relies on confident inferences of ancestral states, and a sufficient number of transitions that yield replicated sister-clade contrasts. Meeting these criteria can be difficult in practice. In particular, ancestral-state reconstructions can be positively misled if assumptions about the model of evolution are violated—including the assumption that diversification is not state dependent! The heart of the problem is that asymmetry in the direction of character evolution and inequality in state-dependent rates of speciation and extinction can each yield similar phylogenetic distributions of states (Maddison et al. 2007). If diversification rates are highly state dependent and unequal, character reconstructions that apply a standard Markov model might erroneously infer asymmetrical transition rates; conversely, if rates of trait evolution are asymmetrical, tests for state-dependent diversification might be falsely positive. In both cases, estimates of ancestral states will likely be inaccurate.

To solve this problem, joint models of trait evolution and diversification have recently been developed that incorporate parameters for the rate and direction of trait change as well as state-dependent rates of speciation and extinction. With such models, the likelihood function generally cannot be solved analytically, so parameter estimation requires numerical integration over all trees with extinct branches that are otherwise consistent with the observed tree. Analyses accounting for interactions between trait evolution and lineage proliferation are now becoming quite common. For example, a recent study of the nightshade family of plants used a joint model to show that self-incompatibility (which is frequently lost, but rarely, if ever, regained in this clade) is associated with higher net diversification relative to self-compatible lineages, demonstrating species selection (see chapter VI.14) for obligate outcrossing.

6. ACCURACY AND CONFIDENCE IN ANCESTRAL INFERENCES

How reliable are the basic tools of comparative biology—the phylogenetic relationships of species and theoretical models of trait evolution—for inferring patterns and processes in evolutionary history? The answer depends on a number of factors. Of primary concern is whether the tree is accurate, and whether the models are valid descriptors of the evolution of traits of interest. For example, ancestral state estimates are more likely to be accurate and unambiguous if the rate of evolution is low relative to the rate of lineage proliferation. Conversely, if a trait evolves quickly and exhibits rampant homoplasy (convergent and/or parallel evolution), ancestral states will tend to be more uncertain. Both parsimony and likelihood methods can be led astray, yielding positive support for erroneous conclusions, if their underlying assumptions are not met. Unfortunately, in the absence of independent lines of evidence (e.g., fossils), it is not always easy to determine whether and how these assumptions are violated. For example, even if one is able to demonstrate that a character’s phylogenetic distribution is consistent with Brownian motion evolution, it is more difficult to confidently establish that it actually evolved according to that process.

7. FUTURE DIRECTIONS FOR COMPARATIVE METHODS

Opportunities for evolutionary insight from comparative analysis will continue to grow with the accumulation of phylogenies and expanding knowledge of species’ traits. Research on improving the utility and power of comparative methods is important and ongoing. A continuing trend is that increasingly sophisticated comparative methods enhance the potential for statistical inferences of ancestral states and evolutionary processes. In particular, joint models of trait evolution and lineage diversification represent a significant step toward a unified framework for exploring the reciprocal interactions between these two processes; however, many challenges remain. For example, methods are generally lacking for multivariate analyses, and most are ill equipped to deal with inconstant rates of evolution or non-Markovian processes, such as the influence of density dependence or species interactions on trait evolution. Integration of trait data from fossils deserves greater attention, as do models in which trait change can be associated directly, as either cause or consequence, with speciation (cladogenesis). The latter are important because comparative methods generally assume that the state of an ancestor is inherited identically by both daughter species at divergence. Traits that violate this assumption include geographic ranges, which can be subdivided at speciation, and traits that underlie ecological speciation or otherwise directly promote reproductive isolation (such as host associations, habitat preferences, and mate selection).

FURTHER READING

Felsenstein, J. 1985. Phylogenies and the comparative method. American Naturalist 125(1): 1–15. A seminal paper describing the method of independent contrasts.

Freckleton, R. P. 2009. The seven deadly sins of comparative analysis. Journal of Evolutionary Biology 22(7): 1367–1375. A practical guide to recognizing and avoiding some common pitfalls in comparative phylogenetic inference.

Maddison, W. P., P. E. Midford, and S. P. Otto. 2007. Estimating a binary character’s effect on speciation and extinction. Systematic Biology 56(5): 701–710. This paper introduces an important new method for jointly modeling trait evolution and lineage diversification.

Martins, E. P., and T. F. Hansen. 1997. Phylogenies and the comparative method: A general approach to incorporating phylogenetic information into the analysis of interspecific data. American Naturalist 149(4): 646–667. This paper was important in establishing generalized least squares analysis in comparative biology.

Pagel, M. 1999. Inferring the historical patterns of biological evolution. Nature 401: 877–884. A nice review of statistical comparative methods, with several interesting examples of their application.

Price, T. 1997. Correlated evolution and independent contrasts. Philosophical Transactions of the Royal Society B 352: 519–529. An interesting counterpoint to the theory underlying independent contrasts, inspired by the process of adaptive radiation.