
Veronica Comper, PhD Candidate, Queen Mary, University of London

**Research interests:**

For any two species, there is no single time back to their common ancestor. Instead, we expect different times for different loci, because the time it takes two lineages to find a common ancestor within the ancestral population varies from locus to locus (the coalescent process). These differences in timing can be so large that, in some cases, different loci show different phylogenetic branching patterns. On top of these differences in timing, different loci can have different mutation rates. How, then, should we combine information from different loci?
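To illustrate why loci disagree about divergence times, here is a minimal sketch under the standard coalescent. It assumes (hypothetically, for illustration only) a single diploid ancestral population of constant effective size, no gene flow after the species split, and time measured in generations. Two lineages cannot coalesce until they meet in the ancestral population, so each locus adds an exponentially distributed waiting time, with mean 2N generations, to the species split time. The function name and parameter values are invented for the example.

```python
import random
import statistics


def simulate_locus_divergence_times(split_time, ancestral_ne, n_loci, seed=None):
    """Per-locus time to the common ancestor of one sequence from each of two species.

    Hypothetical assumptions: constant diploid ancestral effective size
    `ancestral_ne`, no gene flow after the split, times in generations.
    """
    rng = random.Random(seed)
    # Two lineages in a diploid population of size N coalesce at rate 1/(2N) per generation.
    coalescence_rate = 1.0 / (2 * ancestral_ne)
    # Gene divergence = species split time + exponential waiting time in the ancestor.
    return [split_time + rng.expovariate(coalescence_rate) for _ in range(n_loci)]


times = simulate_locus_divergence_times(
    split_time=250_000, ancestral_ne=50_000, n_loci=10_000, seed=1
)
# Expected mean is split_time + 2 * ancestral_ne; the spread across loci
# (standard deviation about 2 * ancestral_ne) is the variation discussed above.
print(f"mean {statistics.mean(times):.0f}, sd {statistics.stdev(times):.0f}")
```

Even with a single, sharp species split, the per-locus times spread over roughly 2N generations, which is why treating every locus as dating the same event can be misleading.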

The issue of rate variation has been addressed by more recent molecular-dating techniques, but because we know little about the underlying pattern of rate variation, it is difficult to evaluate their accuracy and precision. Techniques such as Bayesian relaxed-clock methods and autocorrelated-rate models attempt to model the pattern of rate variation between lineages without trying to understand the process, i.e., the causes of this variation.
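The two families of rate models mentioned above differ mainly in whether a branch's rate depends on its parent's. The sketch below contrasts them using one common lognormal formulation; the text does not specify which models are meant, so the lognormal choice, the function names and the parameters `sigma` and `sigma2` are all assumptions for illustration. In the uncorrelated version every branch draws its rate independently; in the autocorrelated version each child's log-rate is centred on its parent's, with variance growing with branch duration.

```python
import math
import random


def uncorrelated_rates(n_branches, sigma, rng):
    """Independent lognormal rate per branch, shifted so the mean rate is 1 (hypothetical)."""
    return [math.exp(rng.gauss(-sigma**2 / 2, sigma)) for _ in range(n_branches)]


def autocorrelated_rates(branch_durations, root_rate, sigma2, rng):
    """Rates along one root-to-tip path of branches (hypothetical lognormal model).

    The child's log-rate is Normal with variance v = sigma2 * t around the
    parent's log-rate; the -v/2 shift keeps the expected rate equal to the parent's.
    """
    rates = []
    parent = root_rate
    for t in branch_durations:
        v = sigma2 * t
        child = math.exp(rng.gauss(math.log(parent) - v / 2, math.sqrt(v)))
        rates.append(child)
        parent = child  # the next branch inherits from this one
    return rates


rng = random.Random(42)
print(uncorrelated_rates(5, sigma=0.3, rng=rng))
print(autocorrelated_rates([1.0] * 5, root_rate=1.0, sigma2=0.1, rng=rng))
```

Both sketches describe *how* rates might vary without saying *why*, which is exactly the limitation the paragraph above points out.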

All molecular genetic inferences need to take this variation in coalescence times into account, yet most ignore it completely. It bears on fundamental questions such as "Does 'speciation' occur at the same time for different parts of the genome?", "Do different loci have different rates of evolution?", "Are there mutation hotspots?", "Does the rate of substitution vary along different evolutionary branches?" and "What was the ancestral population size?"

The model samples parameters such as the species divergence time and the rate of coalescence while taking into account variation in the coalescence process, variation in the mutation rate between loci, and the stochastic nature of the substitution process. This will allow me to estimate the ancestral population size of the species in question and their divergence time. The model can then be extended to include the recombination rate, non-silent sites, functionality and position in the genome (e.g., active sites are under different evolutionary constraints than non-essential parts of a protein sequence).
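A minimal sketch of the three sources of variation listed above, combined into a per-locus generative model, follows. It is not the model described here; every distribution and name is a hypothetical choice. Time is measured in expected substitutions per site, `T` is the species divergence time, `coal_rate` is the coalescence rate in the ancestor, per-locus rate multipliers are gamma-distributed with mean 1 (shape `rate_shape`), and the number of differences between the two sequences is Poisson, with no correction for multiple hits. The per-locus likelihood integrates over the unknown coalescence time and rate using a crude Monte Carlo average.

```python
import math
import random


def poisson_sample(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals that fall in [0, mean]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def draw_locus_history(T, coal_rate, rate_shape, rng):
    """Hypothetical per-locus gene divergence time and relative mutation rate."""
    t = T + rng.expovariate(coal_rate)                  # coalescent waiting time in the ancestor
    r = rng.gammavariate(rate_shape, 1.0 / rate_shape)  # rate multiplier with mean 1
    return t, r


def simulate_locus(T, coal_rate, rate_shape, n_sites, rng):
    """Number of differences between the two sequences at one locus."""
    t, r = draw_locus_history(T, coal_rate, rate_shape, rng)
    # Both lineages accumulate substitutions for time t, hence the factor 2.
    return poisson_sample(2 * r * t * n_sites, rng)


def log_poisson_pmf(k, mean):
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def locus_log_marginal(k, T, coal_rate, rate_shape, n_sites, rng, n_draws=2000):
    """Monte Carlo estimate of log p(k | T, coal_rate, rate_shape)."""
    logs = []
    for _ in range(n_draws):
        t, r = draw_locus_history(T, coal_rate, rate_shape, rng)
        logs.append(log_poisson_pmf(k, 2 * r * t * n_sites))
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in logs) / n_draws)


rng = random.Random(7)
counts = [simulate_locus(0.01, 200.0, 2.0, 1000, rng) for _ in range(5)]
print(counts)
print(locus_log_marginal(counts[0], 0.01, 200.0, 2.0, 1000, rng))
```

Under these assumptions the expected count per locus is `2 * n_sites * (T + 1 / coal_rate)`, so divergence time and coalescence rate are only partly separable from a single locus; many loci are needed to tell them apart.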

I am currently using a Markov chain Monte Carlo (MCMC) method with a Metropolis-Hastings algorithm to accept or reject the proposed parameter values. A prior is placed on every parameter, and bivariate likelihood surfaces (marginal likelihood distributions of divergence time against rate of coalescence, given a rate of 1) are used to compute the likelihood at each iteration of the chain. The model currently uses data from three species comparisons, each with 4,375 orthologous sequences.
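The sampling scheme described above can be sketched as a generic Metropolis-Hastings loop over the divergence time `T` and the coalescence rate `coal_rate`. This is an illustration only. The exponential priors, the log-scale random-walk proposal and the step sizes are hypothetical choices, and `toy_log_lik` is a smooth stand-in for a lookup into precomputed likelihood surfaces, not real data. Proposing on the log scale keeps both parameters positive, but the proposal is then asymmetric, so the acceptance ratio includes the Hastings correction `x'/x` for each parameter.

```python
import math
import random


def log_prior(T, coal_rate):
    """Hypothetical exponential priors (means 1.0 and 1000.0), up to a constant."""
    return -T / 1.0 - coal_rate / 1000.0


def metropolis_hastings(log_lik, start, n_iter, step_sizes, seed=None):
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_lik(*current) + log_prior(*current)
    samples, accepted = [], 0
    for _ in range(n_iter):
        # Multiplicative random walk (normal step on the log scale).
        proposal = [x * math.exp(rng.gauss(0.0, s)) for x, s in zip(current, step_sizes)]
        prop_lp = log_lik(*proposal) + log_prior(*proposal)
        # Hastings correction for the log-scale proposal: q(x|x')/q(x'|x) = prod(x'/x).
        log_hastings = sum(math.log(p / c) for p, c in zip(proposal, current))
        log_alpha = prop_lp - current_lp + log_hastings
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            current, current_lp = proposal, prop_lp
            accepted += 1
        samples.append(tuple(current))
    return samples, accepted / n_iter


def toy_log_lik(T, coal_rate):
    # Stand-in for the likelihood surfaces: a smooth peak at T=0.01, coal_rate=200.
    return -0.5 * ((T - 0.01) / 0.002) ** 2 - 0.5 * ((coal_rate - 200.0) / 40.0) ** 2


samples, acceptance = metropolis_hastings(
    toy_log_lik, start=(0.05, 50.0), n_iter=20_000, step_sizes=(0.2, 0.2), seed=3
)
burned = samples[5_000:]  # discard burn-in
print(f"acceptance {acceptance:.2f}")
print(f"posterior mean T {sum(s[0] for s in burned) / len(burned):.4f}")
print(f"posterior mean coal_rate {sum(s[1] for s in burned) / len(burned):.1f}")
```

Swapping `toy_log_lik` for a function that interpolates the precomputed surfaces and sums over loci would leave the sampler itself unchanged.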