This is a guide for the R program MCMCtreeR.
MCMCtreeR contains functions to set up analyses in the MCMCtree program. The functions here help users choose the best parameters to reflect age information for prior age distributions, visualise time priors, and produce output files ready to be used in MCMCtree. A seperate vignette is available to explain the plotting options for timescaled trees in MCMCtreeR.
MCMCtree is a Bayesian program in the software PAML that estimates divergence times on fixed topologies using molecular data, developed by Ziheng Yang. The program requires various inputs from the user: a phylogeny, molecular sequence alignment, and selected model parameters. This guide does not include details about which time and other priors are most appropriate for the data, etc., so please see the MCMCtree manual for more information.
if (!any(rownames(installed.packages()) == "MCMCtreeR")) install.packages("MCMCtreeR")
library(MCMCtreeR, quietly = TRUE, warn.conflicts = FALSE)
##
## Attaching package: 'sn'
## The following object is masked from 'package:stats':
##
## sd
The examples here use a phylogeny of apes and associated age information. These data are a phylogeny of apes apeTree
, the minimum ages for internal nodes minimumTimes
, maximum ages for internal nodes maximumTimes
, and the tip labels descending from each node monophyleticGroups
. These example data can be substituted for other data.
data(apeData)
attach(apeData)
names(apeData)
## [1] "minimumTimes" "maximumTimes" "monophyleticGroups"
## [4] "apeTree"
minimumTimes
## nodeOne nodeTwo nodeThree nodeFour
## 1.5 0.6 0.8 1.3
maximumTimes
## nodeOne nodeTwo nodeThree nodeFour
## 3.0 1.2 1.2 2.0
monophyleticGroups
## [[1]]
## [1] "human" "chimpanzee" "bonobo" "gorilla" "sumatran"
## [6] "orangutan" "gibbon"
##
## [[2]]
## [1] "human" "chimpanzee" "bonobo" "gorilla"
##
## [[3]]
## [1] "human" "chimpanzee" "bonobo"
##
## [[4]]
## [1] "sumatran" "orangutan"
apeTree
##
## Phylogenetic tree with 7 tips and 6 internal nodes.
##
## Tip labels:
## human, chimpanzee, bonobo, gorilla, orangutan, sumatran, ...
##
## Rooted; no branch lengths.
The order of the data must match in the minimumTimes
, maximumTimes
, and monophyleticGroups
objects. These data do not need to be given in the order they appear in the tree, but the order must match in each object. For example, if the minimum age for the root is the first element in minimumTimes
it must also be the first element in the minimumTimes
and monophyleticGroup
objects.
For this example there are four calibrated nodes. The MCMCtree function tipDes
can be used to recreate the monophyleticGroups
list object. First the ape
package is used to plot the tree and view the node label numbers.
plot(apeTree)
nodelabels()
tipDes(apeTree, 10)
## [1] "gorilla" "human" "chimpanzee" "bonobo"
The calibrated nodes in this example are 8, 10, 11, and 13. The function tipDes
takes the tree and node numbers as input and returns the taxon names that descend from that node; this output can be used directly in the functions below.
monophyleticGroups.user <- tipDes(apeTree, c(8, 10, 11, 13))
monophyleticGroups.user
## $`8`
## [1] "gibbon" "gorilla" "orangutan" "sumatran" "human"
## [6] "chimpanzee" "bonobo"
##
## $`10`
## [1] "gorilla" "human" "chimpanzee" "bonobo"
##
## $`11`
## [1] "human" "chimpanzee" "bonobo"
##
## $`13`
## [1] "orangutan" "sumatran"
This section includes information to estimate and plot prior age distributions for node(s) used in MCMCtree divergence time estimation. The data required to do this are a phylogeny, minimum and maximum ages for the nodes with prior distributions, and taxa that descend from each node (please see above for information on how to make this various objects).
The code can be used to simultaneously estimate the parameter values that reflect the a priori time information for nodes and write files ready for MCMCtree input. MCMCtreeR can produce output files with the same type of distributions used to summarise a priori time information for all nodes, or different distributions can be used to reflect uncertainty on different internal nodes.
The functions here estimates the distribution parameters so that the distribution spans for user-supplied minimum bounds (lower age) and maximum bounds (upper age). By default, minimum ages are treated as ‘hard’ constraints and maximum ages are ‘soft’. The function then ensures that 97.5% of the distribution falls between these minimum and maximum ages. The code can estimate paramaters for the Cauchy, Skew t, Skew-normal, and Gamma distirbutions shown in the MCMCtree manual on page 49, and calibrated node priors can also be placed on trees for uniform (bound), fixed, and upper age.
For each function, if only a single value is provided for each parameter by the user, the function outputs warnings to indicate these values are recycled for each node.
The default arguments in the estimateSkewT
assumes the user wants to estimate the scale of the distribution with a given shape value (the default shape value is 50). The function estimates the parameters with the user-supplied minimum and maximum ages for all nodes, and the monophyletic groups that define the nodes.The output skewT_results$MCMCtree
shows the estimated parameters in the input ready for MCMCtree. Here the parameters for the Skew t distributions are the location (lower node age), scale, shape, and degrees of freedom.
skewT_results <- estimateSkewT(minAge = minimumTimes, maxAge = maximumTimes,
monoGroups = monophyleticGroups, phy = apeTree, plot = FALSE)
## [1] "warning - minProb parameter value recycled"
## [1] "warning - maxProb parameter value recycled"
## [1] "warning - estimateScale argument recycled"
## [1] "warning - estimateShape argument recycled"
## [1] "warning - estimateMode argument recycled"
## [1] "warning - shape parameter value recycled"
## [1] "warning - addMode parameter value recycled"
skewT_results$MCMCtree
##
## 1 7 1
##
## 1 ((((human,(chimpanzee,bonobo))'ST(0.8,0.016,50,1)',gorilla)'ST(0.6,0.024,50,1)',(orangutan,sumatran)'ST(1.3,0.028,50,1)'),gibbon)'ST(1.5,0.059,50,1)';
##
## 1 //end of file
As explained above, if only a single value is provided for each parameter by the user, the function outputs warnings to indicate these values are recycled for each node.
The function plotMCMCtree
plots the estimated age distributions given these parameters.
par(mfrow = c(2, 2), family = "Palatino")
for (i in 1:4) plotMCMCtree(skewT_results$parameters[i, ], method = "skewT",
title = paste0("node ", i), upperTime = max(maximumTimes))