Phylogenetics, Especially Comparative Methods

The history of life unfolds within a phylogenetic context. Comparative phylogenetic methods are statistical approaches for analyzing historical patterns along phylogenetic trees. This task view describes R packages that implement a variety of comparative phylogenetic methods. This is an active research area and much of the information is subject to change. Note that many important packages are not on CRAN: either they were formerly on CRAN and were later archived (for example, because they were not updated to keep pace with changes in R) or they are developed elsewhere and have not yet been submitted to CRAN. Such packages may be found on GitHub, R-Forge, or authors' websites.

Getting trees into R: Trees in R are usually stored in the S3 phylo class (implemented in ape), though the S4 phylo4 class (implemented in phylobase) is also available. ape can read trees from external files in Newick format (sometimes popularly known as PHYLIP format) or NEXUS format. It can also read trees entered by hand as a Newick string (e.g., "(human,(chimp,bonobo));"). phylobase and its lighter-weight sibling rncl can use the Nexus Class Library to read NEXUS, Newick, and other tree formats. treebase can search for and load trees from the online tree repository TreeBASE, and rdryad can pull data from the online data repository Dryad. RNeXML can read, write, and process metadata for the NeXML format. PHYLOCH can load trees from BEAST, MrBayes, and other phylogenetics programs (PHYLOCH is only available from the author's website). phyext2 can read and write various tree formats, including simmap formats. rotl can pull in a synthetic tree and individual study trees from the Open Tree of Life project.
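A minimal sketch of reading a hand-typed Newick string with ape, assuming the package is installed. The object name tree and the commented-out file names are illustrative only.

```r
library(ape)

# Parse a Newick string typed by hand into an S3 "phylo" object.
tree <- read.tree(text = "(human,(chimp,bonobo));")

class(tree)       # the S3 class used by most comparative packages
Ntip(tree)        # number of tips
tree$tip.label    # tip labels, in the order they were read

# For files, pass a path instead of text (hypothetical file names):
# tree <- read.tree("primates.tre")    # Newick file
# tree <- read.nexus("primates.nex")   # NEXUS file
```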

Utility functions: These packages include functions for manipulating trees or associated data. ape has functions for randomly resolving polytomies, creating branch lengths, getting information about tree size or other properties, pulling in data from GenBank, and many more. phylobase has functions for traversing a tree (e.g., getting all descendants from a particular node specified by just two of its descendants). geiger can prune trees and data to an overlapping set of taxa. treeplyr can use dplyr-style functions (filter, mutate, reorder, etc.) on objects consisting of trees plus associated data. evobiR can do fuzzy matching of names (to allow some differences). BoSSA can load sequences from GenBank and BLAST sequences. rphast implements an R interface to PHAST, which can be used for many types of analysis in comparative and evolutionary genomics, such as estimating models of evolution from sequence data, scoring alignments for conservation or acceleration, and predicting elements based on conservation or custom phylogenetic hidden Markov models. SigTree finds branches that are responsive to some treatment, while allowing correction for multiple comparisons. dendextend can manipulate dendrograms, including subdividing trees, adding leaves, and more.
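The idea of pruning a tree and a data set to their overlapping taxa can be sketched with ape and base R alone. This is not geiger's own implementation, only an illustration of the bookkeeping involved; the trait values and taxon names are made up.

```r
library(ape)

tree <- read.tree(text = "((a,b),(c,(d,e)));")

# Hypothetical trait values: "f" is not in the tree and "e" has no data.
trait <- c(a = 1.2, b = 0.7, c = 2.1, d = 3.3, f = 0.4)

# Taxa present in both the tree and the data.
shared <- intersect(tree$tip.label, names(trait))

# Drop tips without data, then reorder the data to match the tip labels.
pruned_tree  <- drop.tip(tree, setdiff(tree$tip.label, shared))
pruned_trait <- trait[pruned_tree$tip.label]

pruned_tree$tip.label
pruned_trait
```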

Ancestral state reconstruction: Continuous characters can be reconstructed using maximum likelihood, generalised least squares or independent contrasts in ape. Root ancestral character states under Brownian motion or Ornstein-Uhlenbeck models can be reconstructed in ouch, though it does not reconstruct ancestral states at the internal nodes. Discrete characters can be reconstructed in ape using a variety of Markovian models that parameterize the transition rates among states. markophylo can fit a broad set of discrete character types with models that can incorporate constrained substitution rates, rate partitioning across sites, branch-specific rates, sampling bias, and non-stationary root probabilities. phytools can do stochastic character mapping of traits on trees.
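As a rough illustration of continuous-character reconstruction in ape, the sketch below simulates a tree and a Brownian motion trait (both placeholders for real data) and reconstructs node values with the independent-contrasts method of ace(). The seed, tree size, and sigma are arbitrary assumptions.

```r
library(ape)

set.seed(42)
tree <- rcoal(10)                                # placeholder tree
x <- rTraitCont(tree, model = "BM", sigma = 1)   # placeholder trait, named by tip

# Reconstruct ancestral values at the internal nodes using contrasts.
fit <- ace(x, tree, type = "continuous", method = "pic")

fit$ace     # one estimate per internal node
fit$CI95    # approximate 95% confidence intervals
```

Other values of the method argument (for example "ML" or "GLS") select the other estimation approaches mentioned above.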

Diversification Analysis: Lineage-through-time plots can be done in ape; nLTT can estimate the normalized lineage-through-time statistic, which can be used as a summary statistic in ABC approaches. A simple birth-death model for when you have extant species only (sensu Nee et al. 1994) can be fitted in ape, as can survival models and goodness-of-fit tests (as applied to testing of models of diversification). TESS can calculate the likelihood of a tree under a model with time-dependent diversification, including mass extinctions. Net rates of diversification (sensu Magallon and Sanderson 2001) can be calculated in geiger. diversitree implements the BiSSE method (Maddison et al. 2007) and later improvements (FitzJohn et al. 2009). TreePar estimates speciation and extinction rates with models where rates can change as a function of time (e.g., at mass extinction events) or as a function of the number of species. caper can do the macrocaic test to evaluate the effect of a trait on diversity. apTreeshape also has tests for differential diversification (see description). iteRates can identify and visualize areas on a tree undergoing differential diversification. BAMMtools is an interface to the BAMM program to allow visualization of rate shifts, comparison of diversification models, and other functions. DDD implements maximum likelihood methods based on the diversity-dependent birth-death process to test whether speciation or extinction is diversity-dependent, to identify key innovations, and to simulate a density-dependent process; it can also fit models with occasional escape from density dependence. expoTree can calculate the likelihood of a tree under a density-dependent model. PBD can calculate the likelihood of a tree under a protracted speciation model. phyloTop has functions for investigating tree shape, with special functions and datasets relating to trees of infectious diseases.
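A small sketch of the lineage-through-time tools in ape, using a simulated coalescent tree as a stand-in for a dated phylogeny. It also computes the gamma statistic of Pybus and Harvey (2000), with the two-sided p-value obtained from a standard normal approximation. The seed and tree size are arbitrary.

```r
library(ape)

set.seed(1)
tree <- rcoal(30)    # placeholder ultrametric tree

# Coordinates of the lineage-through-time curve (time, number of lineages).
coords <- ltt.plot.coords(tree)
head(coords)

# ltt.plot(tree) draws the same curve.

# Gamma statistic: departures from zero suggest non-constant rates.
g <- gammaStat(tree)
p_two_sided <- 2 * (1 - pnorm(abs(g)))
g
p_two_sided
```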

Divergence Times: Non-parametric rate smoothing (NPRS) and penalized likelihood are implemented in ape.
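A hedged sketch of penalized-likelihood dating with ape's chronos() function, which is the interface available in recent ape releases (an assumption about the installed version). The random tree, the smoothing parameter lambda = 1, and the default calibration, which fixes the root age at 1, are placeholders rather than recommendations.

```r
library(ape)

set.seed(1)
tree <- rtree(15)    # placeholder tree with arbitrary branch lengths

# Estimate a chronogram by penalized likelihood with correlated rates.
chrono <- chronos(tree, lambda = 1, model = "correlated", quiet = TRUE)

is.ultrametric(chrono)        # check that the tips line up in time
head(branching.times(chrono)) # node ages relative to the root calibration
```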

Phylogenetic Inference: UPGMA, neighbour joining, BIONJ and fast ME methods of phylogenetic reconstruction are all implemented in the package ape. phangorn can estimate trees using distance, parsimony, and likelihood. ips wraps several tree inference and other programs, including MrBayes, BEAST, and RAxML, allowing their easy use from within R. phyclust can cluster sequences. phytools can build trees using MRP supertree estimation and least squares. Rphylip wraps PHYLIP, a broad variety of programs for tree inference under parsimony, likelihood, and distance, bootstrapping, character evolution, and more. phylotools can build supermatrices for analyses in other software. pastis can use taxonomic information to make constraints for Bayesian tree searches. RADami can import RADseq data for use with pyRAD. expands can reconstruct phylogenies of tumors and cluster them into populations. outbreaker can infer transmission trees for diseases, as well as other parameters of disease spread; OutbreakTools can infer parameters of disease spread. For more information on importing sequence data, see the Genetics task view; pegas may also be of use.
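The distance-based methods in ape can be sketched on a small hand-written distance matrix (a toy example, not real data). The UPGMA tree is built here by converting an average-linkage hclust object, which is one common route and an assumption about how a reader might do it.

```r
library(ape)

# Toy pairwise distances among five hypothetical taxa.
d <- matrix(c(0,  5,  9,  9, 8,
              5,  0, 10, 10, 9,
              9, 10,  0,  8, 7,
              9, 10,  8,  0, 3,
              8,  9,  7,  3, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(letters[1:5], letters[1:5]))

tr_nj    <- nj(as.dist(d))       # neighbour joining (unrooted)
tr_bionj <- bionj(as.dist(d))    # BIONJ variant
tr_upgma <- as.phylo(hclust(as.dist(d), method = "average"))  # UPGMA (rooted)

is.rooted(tr_nj)
is.rooted(tr_upgma)
```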

Time series/Paleontology: paleoTS analyzes paleontological time series data using a likelihood-based framework for fitting and comparing models of phyletic evolution (based on the random walk or stasis model). strap can do stratigraphic analysis of phylogenetic trees.

Tree Simulations: Trees can be simulated using constant-rate birth-death with various constraints in TreeSim and a birth-death process in geiger. Random trees can be generated in ape by random splitting of edges (for non-parametric trees) or random clustering of tips (for coalescent trees). paleotree can simulate fossil deposition, sampling, and the tree arising from this as well as trees conditioned on observed fossil taxa. TESS can simulate trees with time-dependent speciation and/or extinction rates, including mass extinctions.
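The two random-tree generators in ape mentioned above can be compared with a short sketch. The seed and tree size are arbitrary.

```r
library(ape)

set.seed(1)
t_split <- rtree(20)   # random splitting of edges
t_coal  <- rcoal(20)   # random clustering of tips (coalescent)

# Coalescent trees are ultrametric; randomly split trees generally are not.
is.ultrametric(t_coal)
is.ultrametric(t_split)
```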

Trait evolution: Independent contrasts for continuous characters can be calculated using ape, picante, or caper (which also implements the brunch and crunch algorithms). Analyses of discrete trait evolution, including models of unequal rates or rates changing at a given instant of time, as well as Pagel's transformations, can be performed in geiger. DiscML implements a flexible array of models for discrete traits, including approaches to deal with unobservable data, a gamma rate distribution, and custom transition matrices. corHMM can look for hidden rates in discrete traits as well as fit correlational models for two or three binary traits (similar to Pagel's old Discrete program) and complex models for multistate traits (similar to Pagel's old Multistate program). Brownian motion models can be fit in geiger, ape, and paleotree. Multiple-rate Brownian motion can be fit in motmot and RBrownie (both currently not on CRAN, but older versions can be obtained from the archive). Deviations from Brownian motion can be investigated in geiger and OUwie. mvMORPH can fit Brownian motion, early burst, ACDC, OU, and shift models to univariate or multivariate data. Ornstein-Uhlenbeck (OU) models can be fitted in geiger, ape, ouch (with multiple means), and OUwie (with multiple means, rates, and attraction values). surface wraps ouch to infer shifts in the OU optimum; bayou also allows data-driven selection between different OU models. geiger fits only single-optimum models. Other continuous models, including Pagel's transforms and models with trends, can be fit with geiger. ANOVAs and MANOVAs in a phylogenetic context can also be implemented in geiger. Traditional GLS methods (sensu Grafen or Martins) can be implemented in ape, PHYLOGR, or caper.
Phylogenetic autoregression (sensu Cheverud et al. 1985) and phylogenetic autocorrelation (Moran's I) can be implemented in ape or, if you wish the significance test of Moran's I to be calculated via a randomization procedure, in adephylo. Correlation between traits using a GLMM can also be investigated using MCMCglmm. phylolm can fit phylogenetic linear regression and phylogenetic logistic regression models using a fast algorithm, making it suitable for large trees. phytools can also investigate rates of trait evolution and do stochastic character mapping. metafor can perform meta-analyses accounting for phylogenetic structure. pmc evaluates the model adequacy of several trait models (from geiger and ouch) using Monte Carlo approaches. geomorph can do geometric morphometric analysis in a phylogenetic context. MPSEM can predict features of one species based on information from related species using phylogenetic eigenvector maps. Rphylip wraps PHYLIP, which can do independent contrasts, the threshold model, and more. convevol can test for convergent evolution on a phylogeny.
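Independent contrasts in ape can be sketched as follows. The tree and both traits are simulated placeholders, and the slope of 0.5 linking them is an arbitrary assumption. Contrasts are regressed through the origin, following the usual practice for this method (Garland et al. 1992).

```r
library(ape)

set.seed(1)
tree <- rcoal(20)                                   # placeholder tree
x <- rTraitCont(tree, model = "BM", sigma = 1)      # predictor trait
y <- 0.5 * x + rTraitCont(tree, model = "BM", sigma = 0.5)  # correlated response

# One standardized contrast per internal node.
pic_x <- pic(x, tree)
pic_y <- pic(y, tree)

# Regress the contrasts through the origin.
fit <- lm(pic_y ~ pic_x - 1)
coef(fit)
```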

Trait Simulations: Continuous traits can be simulated using Brownian motion in ouch, geiger, ape, picante, OUwie, and caper, the Hansen model (a form of the OU) in ouch and OUwie, and a speciational model in geiger. Discrete traits can be simulated using a continuous-time Markov model in geiger. phangorn can simulate DNA or amino acids. Both discrete and continuous traits can be simulated under models where rates change through time in geiger. phytools can simulate discrete characters using stochastic character mapping. phylolm can simulate continuous or binary traits along a tree.
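A minimal sketch of trait simulation using ape only. All parameter values (sigma, alpha, theta, rate) are arbitrary assumptions chosen for illustration.

```r
library(ape)

set.seed(7)
tree <- rcoal(25)    # placeholder tree

# Continuous trait under Brownian motion.
bm <- rTraitCont(tree, model = "BM", sigma = 0.5)

# Continuous trait under an Ornstein-Uhlenbeck process with optimum theta.
ou <- rTraitCont(tree, model = "OU", sigma = 0.5, alpha = 1, theta = 0)

# Binary discrete trait under an equal-rates continuous-time Markov model.
disc <- rTraitDisc(tree, model = "ER", k = 2, rate = 0.5)

head(bm)
table(disc)
```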

Tree Manipulation: Branch length scaling using ACDC; Pagel's (1999) lambda, delta and kappa parameters; and the Ornstein-Uhlenbeck alpha parameter (for ultrametric trees only) are available in geiger. phytools also allows branch length scaling, as well as several tree transformations (adding tips, finding subtrees). Rooting, resolving polytomies, dropping of tips, and setting of branch lengths (including Grafen's method) can all be done using ape. Extinct taxa can be pruned using geiger. phylobase offers numerous functions for querying and using trees (S4). Tree rearrangements (NNI and SPR) can be performed with phangorn. paleotree has functions for manipulating trees based on sampling issues that arise with fossil taxa as well as more universal transformations. dendextend can manipulate dendrograms, including subdividing trees, adding leaves, and more.
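The ape operations listed above (rooting, resolving polytomies, dropping tips, Grafen branch lengths) can be chained in a short sketch. The example tree and the choice of outgroup are made up for illustration.

```r
library(ape)

tree <- read.tree(text = "((a,b),(c,d,e));")   # contains one polytomy

# Re-root on a chosen outgroup, keeping a bifurcating root.
rerooted <- root(tree, outgroup = "a", resolve.root = TRUE)

# Randomly resolve the (c,d,e) polytomy into bifurcations.
set.seed(3)
binary <- multi2di(tree)
is.binary(binary)

# Drop a tip, then assign branch lengths by Grafen's method.
pruned <- drop.tip(binary, "e")
grafen <- compute.brlen(pruned, method = "Grafen")
grafen$edge.length
```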

Community/Microbial Ecology: picante, vegan, SYNCSA, phylotools, PCPS, caper, DAMOCLES, and cati integrate several tools for using phylogenetics with community ecology. HMPTrees and GUniFrac provide tools for comparing microbial communities. betapart allows computing pair-wise dissimilarities (distance matrices) and multiple-site dissimilarities, separating the turnover and nestedness-resultant components of taxonomic (incidence and abundance based), functional and phylogenetic beta diversity.

Phyloclimatic Modeling: phyloclim integrates several new tools in this area.

Phylogeography / Biogeography: phyloland implements a model of space colonization mapped on a phylogeny; it aims at estimating limited dispersal and competitive exclusion in a statistical phylogeographic framework. jaatha can infer demographic parameters for two species with multiple individuals per species. BioGeoBEARS implements a variety of models for discrete biogeography.

Species/Population Delimitation: adhoc can estimate an ad hoc distance threshold for a reference library of DNA barcodes.

Tree Plotting and Visualization: User trees can be plotted using ape, adephylo, phylobase, phytools, ouch, and dendextend; several of these have options for branch or taxon coloring based on some criterion (ancestral state, tree structure, etc.). paleoPhylo and paleotree are specialized for drawing paleobiological phylogenies. Trees can also be examined (zoomed) and viewed as correlograms using ape. Ancestral state reconstructions can be visualized along branches using ape and paleotree. phytools can project a tree into a morphospace. BAMMtools can visualize rate shifts calculated by BAMM on a tree. The popular R visualization package ggplot2 can be extended by ggtree to visualize phylogenies. Trees can also be interactively explored (as dendrograms) using idendr0. phylocanvas is a widget for "htmlwidgets" that enables embedding of phylogenetic trees using the phylocanvas JavaScript library.

Tree Comparison: Tree-tree distances can be evaluated, and used in additional analyses, in distory and Rphylip. ape can compute tree-tree distances and also create a plot showing two trees with links between associated tips. kdetrees implements a non-parametric method for identifying potential outlying observations in a collection of phylogenetic trees, which could represent inference problems or processes such as horizontal gene transfer. dendextend can evaluate multiple measures comparing dendrograms.
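A brief sketch of tree-tree distances in ape using dist.topo(), whose default is a topological (partition-based) distance. The two four-taxon trees are toy examples.

```r
library(ape)

t1 <- read.tree(text = "((a,b),(c,d));")
t2 <- read.tree(text = "((a,c),(b,d));")

d_same <- dist.topo(t1, t1)   # identical topologies
d_diff <- dist.topo(t1, t2)   # conflicting groupings

d_same
d_diff

# cophyloplot(t1, t2, assoc = cbind(t1$tip.label, t1$tip.label))
# draws the two trees facing each other with links between matching tips.
```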

Taxonomy: taxize can interact with a suite of web APIs for taxonomic tasks, such as verifying species names, getting taxonomic hierarchies, and verifying name spelling. evobiR contains functions for making a tree at higher taxonomic levels, downloading a taxonomy tree from NCBI or ITIS, and various other miscellaneous functions (simulations of character evolution, calculating D-statistics, etc.). Reol can also create taxonomy trees from taxonomies used by EOL. pastis can use taxonomic information to make constraints for Bayesian tree searches.

Gene tree - species tree: HyPhy can count the duplication and loss cost to reconcile a gene tree to a species tree. It can also sample histories of gene trees from within family trees. rmetasim can simulate loci and individuals across landscapes using the metasim simulation engine.

Miscellaneous: treebase offers ways to download trees from TreeBASE, an online repository of phylogenies and phylogenetic data.

Notes: At least ten packages start as phy* in this domain, including two pairs of similarly named packages (phytools and phylotools, phylobase and phybase). This can easily lead to confusion, and future package authors are encouraged to consider such overlaps when naming packages. For clarification, phytools provides a wide array of functions, especially for comparative methods, and is maintained by Liam Revell; phylotools has functions for building supermatrices and is maintained by Jinlong Zhang. phylobase implements S4 classes for phylogenetic trees and associated data and is maintained by Francois Michonneau; phybase has tree utility functions and many functions for gene tree - species tree questions and is authored by Liang Liu, but no longer appears on CRAN.


  • Butler MA, King AA 2004. Phylogenetic comparative analysis: A modeling approach for adaptive evolution. American Naturalist 164: 683-695.
  • Cheverud JM, Dow MM, Leutenegger W 1985. The quantitative assessment of phylogenetic constraints in comparative analyses: Sexual dimorphism in body weight among primates. Evolution 39: 1335-1351.
  • FitzJohn RG, Maddison WP, Otto SP 2009. Estimating trait-dependent speciation and extinction rates from incompletely resolved phylogenies. Systematic Biology 58: 595-611.
  • Garland T, Harvey PH, Ives AR 1992. Procedures for the analysis of comparative data using phylogenetically independent contrasts. Systematic Biology 41: 18-32.
  • Hansen TF 1997. Stabilizing selection and the comparative analysis of adaptation. Evolution 51: 1341-1351.
  • Maddison WP, Midford PE, Otto SP 2007. Estimating a binary character's effect on speciation and extinction. Systematic Biology 56: 701-710.
  • Magallon S, Sanderson MJ 2001. Absolute diversification rates in angiosperm clades. Evolution 55: 1762-1780.
  • Moore BR, Chan KMA, Donoghue MJ 2004. Detecting diversification rate variation in supertrees. In: Bininda-Emonds ORP (ed) Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life. Kluwer Academic, pp. 487-533.
  • Nee S, May RM, Harvey PH 1994. The reconstructed evolutionary process. Philosophical Transactions of the Royal Society of London Series B Biological Sciences 344: 305-311.
  • Pagel M 1999. Inferring the historical patterns of biological evolution. Nature 401: 877-884.
  • Pybus OG, Harvey PH 2000. Testing macro-evolutionary models using incomplete molecular phylogenies. Proceedings of the Royal Society of London Series B Biological Sciences 267: 2267-2272.

22 days ago

Brian O'Meara