Alternatives to the Wright-Fisher Model- The Robustness of Mitochondrial Eve Dating

March 16, 2018 | Author: Kiran John | Category: Neanderthal, Evolutionary Biology, Homo Sapiens, Genetics, Biology


Comments



Description

ARTICLE IN PRESSTheoretical Population Biology ( ) – Contents lists available at ScienceDirect Theoretical Population Biology journal homepage: www.elsevier.com/locate/tpb Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating Krzysztof A. Cyran a , Marek Kimmel b,c,∗ a Institute of Informatics, Silesian University of Technology, Gliwice, Poland b Department of Statistics, Rice University, Houston, TX, United States c Systems Engineering Group, Silesian University of Technology, Gliwice, Poland a r t i c l e i n f o Article history: Received 18 March 2009 Available online xxxx Keywords: Coalescence distribution Branching processes genealogy Wright–Fisher model O’Connell model Computer simulations MtEve dating Neanderthal mtDNA a b s t r a c t Methods of calculating the distributions of the time to coalescence depend on the underlying model of population demography. In particular, the models assuming deterministic evolution of population size may not be applicable to populations evolving stochastically. Therefore the study of coalescence models involving stochastic demography is important for applications. One interesting approach which includes stochasticity is the O’Connell limit theory of genealogy in branching processes. Our paper explores howmany generations are needed for the limiting distributions of O’Connell to become adequate approximations of exact distributions. We perform extensive simulations of slightly supercritical branching processes and compare the results to the O’Connell limits. Coalescent computations under the Wright–Fisher model are compared with limiting O’Connell results and with full genealogy-based predictions. These results are used to estimate the age of the so-called mitochondrial Eve, i.e., the root of the mitochondrial polymorphisms of the modern humans based on the DNA from humans and Neanderthal fossils. ©2010 Elsevier Inc. All rights reserved. 1. Introduction The indices of genetic variation, including allele distribution, heterozygosity or linkage disequilibrium, are affected by popula- tion history. Therefore a lot of effort has been spent by popula- tion geneticists to estimate the long-term demographic history of populations belonging to various species. For this purpose many statistical tests detecting past population expansion have been proposed, including King et al. (2000), Bjorklund (2003), Laan et al. (2005) and Cyran and Myszor (2008). Most of the existing infer- ence techniques are based on the Wright–Fisher model of genetic drift, supplemented by models of mutation, recombination, selec- tion, and demography. This is in agreement with a long-termprac- tice in genetic research. However, there exists another tradition of modeling of demographic processes, which uses the branching process approach. Comparison between these two approaches is likely to determine how the different underlying assumptions in- fluence differing estimates of genetic and demographic parame- ters. This paper attempts to contribute to this task, using the data on modern human ancestry. ∗ Corresponding author at: Department of Statistics, Rice University, Houston, TX, United States. E-mail address: [email protected] (M. Kimmel). With the introduction of new sequencing techniques, DNA strands taken from many qualitatively different loci of Homo sapi- ens and even Homo Neanderthalensis fossils have been analyzed. Examples of these studies include maternally inherited mitochon- drial DNA (mtDNA) (Briggs et al., 2009; Green et al., 2008; Krings et al., 2000, 1999, 1997; Ovchinnikov et al., 2000; Serré et al., 2004), paternally inherited Y chromosomes (Jobling, 2001; Thompson et al., 2000), X chromosomes (Wooding and Rogers, 2000), autoso- mal DNA sequences (Green et al., 2010; Burbano et al., 2010; Green et al., 2006; Noonan et al., 2006; Pennisi, 2007; Yu et al., 2001), or nuclear short tandem repeats (STRs) (Kimmel et al., 1998). Despite these efforts, the problem of human population trajectory is still open, and thus there is a growing interest in the studies of sensitiv- ity of genetic variation indices to departures fromthe assumptions made in different models. In particular, it is interesting how these departures influence the distributions of the time to coalescence, and the paper addresses this problem. For a sample of DNA sequences not undergoing recombination, it is assumed that all sequences are descendants of an ancestral chromosome existing some generations ago. The time to this ancestor is referred to as the time to coalescence of the whole sample. Analogously, we define the time to the coalescence of two chromosomes randomly drawn from the sample, as a time to the most recent common ancestor (MRCA) of these chromosomes. The coalescent method, which is frequently used to infer the 0040-5809/$ – see front matter ©2010 Elsevier Inc. All rights reserved. doi:10.1016/j.tpb.2010.06.001 Please cite this article in press as: Cyran, K.A., Kimmel, M., Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating. Theoretical Population Biology (2010), doi:10.1016/j.tpb.2010.06.001 ARTICLE IN PRESS 2 K.A. Cyran, M. Kimmel / Theoretical Population Biology ( ) – time to the MRCA, is a powerful tool based on the reverse- time analysis of the Wright–Fisher model of genetic drift. For large populations, coalescent models are equivalent to diffusion process approximations, which depend only on the mean and on the variance of offspring number distribution. Consequently, coalescent models are robust for large populations, but during the population bottlenecks, the diffusion approximation may fail. How relevant for the model predictions are departures from a panmictic population (in the case of the Wright–Fisher model) and from large population size (the case of the coalescent method)? To answer this question we compare three different models for calculating the distribution of the time to coalescence of a pair of chromosomes. These include • the Wright–Fisher model with discrete generations, • the coalescent-based model with continuous time scaled by the size of population variable in time, and • the less known O’Connell model based on branching processes (see O’Connell, 1995). All three models are applied to stochastic population growth approximated by a slightly supercritical Galton–Watson branching process. To be able to compare these methodologies reliably, we designed a computational framework to estimate the two- chromosome coalescence distribution in any of these models as well as in a model based on a full record of the population history. After simulating several thousand genealogies we estimated parameters with great accuracy. There might be some doubt whether the use of the time to coalescence of two chromosomes is an adequate tool to estimate the age of the MRCA. Therefore, we also considered coalescence in a sample of n chromosomes randomly chosen from a population. However, the computational difficulties of the recursion involved as well as the difficulty of linking the results with the known genetic indices make the approach troublesome. Analysis of the aforementioned approach is beyond the scope of this article and will be discussed in a separate paper. We consider models suitable for the analysis of data having the form of numbers of pairwise differences between DNA sequences in the sample. Although phylogenetic methods, attempting to use all genetic information contained in a sample to build the geneal- ogy (e.g. Griffiths and Tavaré, 1995, Griffiths’ Genetree software, available from http://www.stats.ox.ac.uk/~griff/software.html), tend to give estimates with a smaller variance than those based on pairwise differences, they cannot be directly compared with the O’Connell model playing an important role in our paper. This model was originally proposed by O’Connell (1995) for dating the mitochondrial Eve (mtEve) based on a sample of mtDNA of humans and chimpanzees. The O’Connell’s limit results are based on the assumption that the population is growing as a slightly supercritical branching process with progeny distributions homogeneous in time. Though these are not quite realistic assumptions, especially that of time homogeneity, the model is important as an alternative to the Wright–Fisher model, since it does not assume any particular offspring distribution. Moreover, asymptotically, for a given expected number of offspring, the O’Connell model is independent of the shape of the progeny number distribution, and in particular of its variance as long as this variance is bounded. This property is interesting in the light of classical results in which the short- term inbreeding effective population size is proportional to the variance of the offspring number distribution, and therefore the variance influences the shape of the coalescence distribution. The invariance of the offspring number distribution in the O’Connell model is theoretically valid in a limit. It remains unknown how fast, in the terms of number of generations, the coalescence distributions in a real population converge to O’Connell’s limit. This could be answered only by time-forward simulation of the full branching process genealogy and then by comparing the actual distributions with limiting results. The growing interest in studies concerning genealogies of branching processes is reflected among others by the studies of Klebaner and Sagitov (2002) focused on the geometric distribution of progeny, and by the work of Lambert (2003) focused on subcritical cases. Still, we consider the O’Connell model as a standard because of its independence of the offspring number distribution and our interest in supercritical processes which can model the long-term growth of the human population size. In contrast to the O’Connell model, the Wright–Fisher model is not limited to any specific growth pattern. Yet except for some early classical work, such as that of Nagylaki (1990), relatively little effort has been expended in analyzing its relationship to other models in terms of sensitivity to departures fromthe models assumptions. Addressing this problem, this paper compares coalescence distributions under a range of Wright–Fisher models (including those which arose fromthe time continuous coalescent) to distributions obtained from the O’Connell model. Finally, using computer software designed by us for this purpose, the results of all these models are compared with the actual distributions obtained from simulations of several thousand full genealogies. As a real world application of our results, we report estimations of the time to mitochondrial MRCA of modern humans to show how sensitive these estimates are to the assumptions made by the various compared models. 2. Models The two population genetics models we employ are defined further on in Sections 2.1 and 2.2. Two approaches will be used by us to study how sensitive is the distribution of the time to coalescence to specific model assumptions. The first approach requires storing the entire simulated genealogy of a population. Then, by averaging over genealogies, the experimental distribution of the times to coalescence can be found and then compared to those obtained in the Wright–Fisher and the coalescent models. This approach is very general since it is not limited to any generation-to-generation sampling scheme or assumption about large population size. In particular it can be applied for an arbitrary progeny distribution (possibly changing in time) used to model the evolution of the population as a branching process. However, with the exception of small population sizes, it requires a large amount of memory for storing information about each generation, and therefore it seems practically infeasible for simulations of the required number of generations. In the alternative approach, the population history is simulated and only the population size is recorded as a function of time. Using this information and approaches such as in Bobrowski and Kimmel (2004) we compute the coalescence distributions of the pairs of sequences. The results of such experiments were reported by Cyran and Kimmel (2004) and Cyran (2007). They were also used in Cyran and Kimmel (2005) for conservative estimation of the parameter α (see Section 2.3 for the definition of how this parameter influences the expansion rate of the population) in a problem of hypothetical Neanderthal contribution to the modern human mtDNA gene pool. However, this methodology lacks one important feature which could be taken into consideration only in the first approach. Namely, by regarding only the sizes of the population and not its full genealogy, it is impossible to distinguish between the length of time of the entire simulation started from the ancestral individual, and the length of time to the MRCA. The problembecomes clear if we realize that the founder of the process, the common ancestor of the population evolved, is rarely the most recent common ancestor of the extant individuals. Having no Please cite this article in press as: Cyran, K.A., Kimmel, M., Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating. Theoretical Population Biology (2010), doi:10.1016/j.tpb.2010.06.001 ARTICLE IN PRESS K.A. Cyran, M. Kimmel / Theoretical Population Biology ( ) – 3 possibility to distinguish between the two, we assumed in earlier studies (Cyran and Kimmel, 2004, 2005) that the time between the founder and the MRCA is relatively short compared to the time of the whole process. Therefore we treated both as identical, not having information as to what extent this simplification can be justified. But now, with the increase of computational power and mem- ory capacity, we have returned to the problem of implementing the first approach indirectly. We designed software capable of simulating full genealogies of at least 10 2 generations under an arbitrarily chosen distribution of offspring number and with pa- rameters of a branching process identical to those which could reflect the long-time (about 10 4 generations) evolution of mod- ern humans. Subsequently, we checked whether the asymptotic properties of the O’Connell model hold for a small number of gen- erations such as 10 2 . If the asymptotic properties of the slightly supercritical branching process were already valid for simulations comprising as fewas 10 2 generations, then this also will be the case for simulations exceeding 10 2 generations. Therefore, we use the O’Connell model as a theoretical standard, which extrapolates the full genealogy simulation results for arbitrarily many generations. The comparison of the coalescence distributions in different models allows us to observe how sensitive the estimate of TMRCA, the time to the most recent common ancestor, is to specific model assumptions. We consider a sample of n DNAsequences takenfrom a population with the expected duration of a generation (in years) equal to λ. We denote the average pairwise mutation difference in such a sample by d avg and the mutation rate per nucleotide per generation by µ. In the infinitely many sites model the genetic divergence rate between two species, δ, is equal to µ/λ, so we can estimate the mutation rate using the identity ˆ µ = δλ. Then, denoting the expected time to coalescence of two individuals in a population by T 2c , it follows that E(d avg |K 0 = 1) = T MRCA ˆ µE T 2c T MRCA K 0 = 1 , (1) where K 0 is the number of these individuals at generation 0 whose descendants are presently alive. Assuming that T MRCA(y) = λT MRCA is the equivalent of T MRCA expressed in years, the moment-based estimate of T MRCA(y) is ˆ T MRCA(y) = d avg δE T 2c T MRCA K 0 = 1 . (2) In the Wright–Fisher model-based computations, we obtain the expectation E(T 2c /T| K 0 = 1) by performing computer simulations of a branching process starting fromone individual and calculating the required ratio of T 2c and T MRCA . After simulating several thousand processes we planned to obtain the expectation of the ratio. However, only in the model with the record of a full genealogy, the times T 2c and T MRCA were explicitly given and (2) could be applied directly. In other models, only the time T 2c could be computed and the time T MRCA was not available directly. Instead we have at our disposal time T, i.e., duration of the process. Certainly, time T, being the time tothe only individual initiating the branching process, is the time tothe commonancestor of the whole evolved population. However, as mentioned, it is rarely also the time to the most recent common ancestor. Nevertheless, we were able to estimate the ratio of T MRCA and T in simulations with a fully recorded genealogy, and moreover we confirmed that the limiting properties of coalescence distribution in the O’Connell model are valid for as little as 10 2 generations for which we could perform the full genealogy simulations. In this way we could relate in the O’Connell model T MRCA to T and T 2c , and applying the limit theorem we propagated this result to arbitrarily many generations using the equation ˆ T MRCA_y = d avg δE T 2c T T T MRCA K 0 = 1 = E T MRCA T K 0 = 1 d avg E T 2c T K 0 = 1 δ . (3) The parameters δ and d avg included in formula (3) and necessary for estimating ˆ T MRCA(y) can be retrieved from genetic diversity data. Remark. The Wright–Fisher model implies multinomial sampling of chromosomes from one generation to the other, which can be approximated by Poisson sampling. Since the inbreeding effective population size is proportional to the variance of the number of offspring, we also consider the binary fission (BF) distribution and linear fractional (LF) distribution. In this way, we investigate the effects of the departures from the model’s standard, Poisson (P) progeny number distribution, in the direction of under-dispersion (variance less than mean) and over-dispersion (variance greater than mean). 2.1. Wright–Fisher model for a pair of chromosomes Our presentation of the models starts from the Wright–Fisher model applied to the smallest sample exhibiting effects of the genetic drift, i.e. the sample composed of two chromosomes. The model assumes a population of haploid individuals, say mtDNA sequences, which at time t = 0 has the size N t . Since multinomial sampling is assumed, two individuals at generation t + 1 are descendants of the single member of generation t with probability p t = 1/N t . Consequently, with probability q t = 1 − p t they are descendants of two different members. This is reflected in the following distribution of the time to coalescence, T 2c , of two randomly drawn chromosomes (e.g. Bobrowski and Kimmel, 2004) P(T 2c = t) = T−1 ¸ k=T−t q k − T−1 ¸ k=T−t−1 q k = p T−t−1 T−1 ¸ k=T−t q k , (4) where T denotes the number of generations we consider, and for mathematical consistency we let q −1 = 0 and p −1 = 1. The average pairwise mutation difference within a sample, after scaling by the mutation rate µ, corresponds to the expectation of the coalescence distribution (2), and moreover, the discrete nature of generations makes it easy to simulate the demography. Therefore, using Monte Carlo techniques it is possible to estimate the unconditional coalescence distribution by averaging the conditional one using a series of N t realizations required to compute parameters in (4). 2.2. Coalescent model Let us assume a coalescent model with population size N τ variable in time and continuous time τ measured backwards. Suppose also that λ(τ) = N 0 /N τ and that τ 2c is the time to coalescence of a pair of chromosomes observed over N 0 genera- tions. Then, the tail of the distribution of τ 2c is given by P(τ 2c > τ) = exp ¸ − τ 0 λ(u)du ¸ , (5) which is the continuous analog of (4). To ensure existence of a unique common ancestor, λ(t) must satisfy ∞ 0 λ(u)du = ∞. (6) For the stochastic N t , and therefore λ(τ), the right side of the Eq. (5) should be averaged over the process realizations. In the context of our study it is also worth noticing that the continuous Please cite this article in press as: Cyran, K.A., Kimmel, M., Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating. Theoretical Population Biology (2010), doi:10.1016/j.tpb.2010.06.001 ARTICLE IN PRESS 4 K.A. Cyran, M. Kimmel / Theoretical Population Biology ( ) – coalescence model correctly approximates the discrete coalescent model as long as 1 − 1/N τ ≈ exp(−1/N τ ), which certainly is not true in the early phase of the branching process, when N t is not large and undergoes stochastic fluctuations. This fact is reflected in the differences between experimental distributions of the time to coalescence in the coalescent model and the O’Connell branching process model. 2.3. O’Connell model Consider a slightly supercritical time-homogeneous branching process with expected number of offspring E(ξ 0 ) = 1 + α/T + o(1/T) and variance Var(ξ 0 ) = σ 2 + O(1/T). For this model, an asymptotic property of the probability P x (Z t > 0) where P x denotes probabilities for the process started by x individuals, satisfies the O’Connell (1995) formula P x (Z t > 0) ∼ 2αx σ 2 ¸ 1 −exp −α t T ¸ , as T →∞. (7) From this it follows (Cyran and Kimmel, 2004) that E(Z T |Z T > 0, Z 0 = x) ∼ σ 2 T 2α [exp (α) −1] , as T →∞ (8) where symbol ∼ denotes asymptotic equivalence. Let us express the time interval [0, T] of variable t as a unit interval [0, 1] of variable r = t/T. Then (O’Connell, 1995), as corrected in Kimmel and Axelrod (2002), for times T long enough, we have the following equation describing the tail of the distribution of D T , the time of death of the last common ancestor of two individuals living at T, giventhat we start the populationhistory fromx individuals having descendants at T lim T→∞ P D T T > r K 0 = x = 2q x r (x −1)! ¸ (q r −1) −x (x −1)! −F (x −1, 1 −q r ) ¸ , (9) where q r = e −rα −e −α 1 −e −α (10) and F: Z + ×(0, 1) →R is defined as F(n, y) = ∂ n ∂y n ¸ ln(1 −y) y 2 ¸ . (11) The O’Connell original distribution is continuous, but to compare it to the discrete empirical distributions described below, we consider the discretized version, specified by the tail of the original distribution computed at points r corresponding to integer values of t = rT. For the sake of terminological simplicity, we will still refer to this discrete distribution as the O’Connell distribution. In the O’Connell model, E T 2c T K 0 = 1 = 1 T E [(T −D T ) |K 0 = 1] , (12) and therefore, the Eq. (3) becomes ˆ T MRCA(y) = E T MRCA T K 0 = 1 × d avg δ 1 −2 1 0 ˆ q r (1−ˆ q r ) 2 ˆ q r −1 −ln ˆ q r dr , (13) where ˆ q r = e −r ˆ α −e −ˆ α 1 −e −ˆ α . (14) Moreover, the expectation of the ratio T MRCA /T in (13) is obtained from the simulations with recorded full genealogies, and ˆ x denotes the estimate of parameter x. Therefore, to calculate the estimated MRCA time ˆ T MRCA(y) from the genetic variation data, we need ˆ α. However, from the simulation results concordant with limiting properties of the O’Connell model we can have the ratio E (T MRCA /T|K 0 = 1). Therefore, we can simultaneously estimate T MRCA(y) and α. From (8), if Z T is substituted as an estimate of its expected value, it follows that Z T = E T T MRCA K 0 = 1 σ 2 ˆ T MRCA(y) 2λˆ α ¸ exp ˆ α −1 ¸ (15) and estimates of T MRCA(y) and α are solutions of the system of Eqs. (13) and (15) for given short-term inbreeding effective population size of females Z T , and genetic data d avg and δ. 2.4. Simulations Simulation mode 1 The first simulation mode implements the full genealogy recording in the branching process model, thus allowing explicit access to any desired feature of the model. In particular, it is possible to trace back the genealogy of a pair of individuals and to find their MRCA, and therefore the actual time of coalescence. By randomly choosing from simulated population a sample of about 100 individuals and determining coalescence of each pair in a sample we obtain a histogram H T2c |tree of the times to coalescence, conditionally on the simulated tree. This histogram is the experimental approximation of the conditional coalescence distribution P (T 2c = t|tree) in the full genealogy model. Having obtained the distribution P (T 2c = t|tree) we also compute its expected value E (T 2c |tree) denoting it as T 2c_avg |tree. Additionally, we trace back the lineages of the whole population to the MRCA, and therefore we have the time T MRCA |tree as well as the ratios (T 2c_avg /T MRCA )|tree and (T MRCA /T)|tree. Finally, by simulating many branching processes and then by averaging over trees generated, we obtain corresponding unconditional characteristics. These characteristics include: the histogram H T2c , the distribution P (T 2c = t) and its expectation E (T 2c ), the distribution P (T 2c_avg = t) with the expectation E (T 2c_avg ), the distribution P (T MRCA = t) with the expected value E (T MRCA ), as well as the histograms and the expectations over genealogies of the ratios T 2c_avg /T MRCA , and T MRCA /T. It is important to notice that E (T 2c_avg /T MRCA ) obtained in the procedure described above, can be used in this model in the Eq. (2) instead of E (T 2c /T MRCA ), what yields a smaller variance estimator. Such substitution is justified from the genetic point of view by linking the expectation E (T 2c_avg ) (scaled by divergence rate δ = µ/λ in (1), with the average pairwise mutation difference in a sample d avg . Note also that the simulations which became extinct were excluded from computations since problems similar to those of dating MRCA of modern humans are always solved conditionally on non-extinction. Simulation mode 2 The second simulation mode stores only the course of population size change described by the branching process. This mode is used for numerical computation of the distribution in the Wright–Fisher (7) or the coalescent (8) models, conditional on N t . In the discrete Wright–Fisher model, the Eq. (7) can be directly applied if the history of N t is known from simulation. However, in the continuous coalescent model, instead of using (7) we apply a Monte Carlo approach by generation of coalescence times from the distribution (8) conditional on N t . This procedure was repeated 10 4 times for one simulated branching process. The conditional histogram is used as the approximation of the conditional distribution P (T 2c = t | N t , CM), in which CM denotes conditioning on the coalescent model. As in the case of Please cite this article in press as: Cyran, K.A., Kimmel, M., Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating. Theoretical Population Biology (2010), doi:10.1016/j.tpb.2010.06.001 ARTICLE IN PRESS K.A. Cyran, M. Kimmel / Theoretical Population Biology ( ) – 5 the full genealogy models, the unconditional with respect to N t , but conditional on the model used, distributions P (T 2c = t | CM) and P (T 2c = t | W–F) are obtained by averaging over many realizations of N t ; W–F denotes the Wright–Fisher model. 3. Genetic data To obtain estimates of the time to MRCA we considered the av- erage pairwise mutation differences d avg computed from a com- plete Neanderthal mitochondrial genome sequence determined by high-throughput sequencing (Green et al., 2008). We also com- puted the genetic divergence rate δ based on results reported by Briggs et al. (2009). We also compared the results based on the re- cent data to the results calculated from a sample of 663 mtDNA sequences of modern humans and their homolog sequenced from the Neanderthal fossils (Krings et al., 1999, a single Neanderthal se- quence was sequenced at that time). These latter sequences were takenfromthe hypervariable control regions I (HVRI) andII (HVRII) of mtDNA. As reported by Green et al. (2008), the complete mitochondrial genome of H. sapiens is composed of 16,568 nucleotides (the mitochondrial genome of H. neanderthalensis is 3 nucleotides shorter). The number of average pairwise differences between species is equal to 210.3 ± 4, and within H. sapiens sample 64.8 ± 26.8. This results in the average mutation difference between modern humans and Neanderthals of d avg ∼ = 0.013, and the average mutation difference within modern human population of d avg ∼ = 0.004. In the Krings et al. (1999) sample including the two hypervari- able regions the mutation rate is about 5 times larger. The average number of pairwise differences is equal to 35.3 ±2.3. Therefore the average mutation difference between modern humans and Nean- derthals is equal to d avg ∼ = 0.059. Divergence in contemporary hu- mans results in the average number of pairwise differences equal to 10.9 ± 5.1, and thus the average mutation difference among contemporary humans is equal to d avg ∼ = 0.018. The average mu- tation difference among modern humans calculated originally by O’Connell (1995) was equal to 0.028, but we do not use it because of O’Connell’s small sample of 19 humans. Interestingly, the average mutation difference between H. neanderthalensis and H. sapiens is about 3.25 times greater than the average mutation difference among contemporary humans based on the hypervariable regions and based on the complete mtDNA; the corresponding ratios differ by less than 1%. This suggests among other that the number of mutations in the hypervariable regions is small enough to allow ignoring reverse mutations occurring in both lineages from the time of their MRCA. We assume this time T d to be about 511,000 years ago, based on Briggs et al. (2009) results, which are based on analysis including the information about the dating of the Neanderthal fossils. Combining Briggs et al. (2009) and Green et al. (2008) data and applying the infinitely many sites model, we calculate the rate of divergence for the complete mtDNA as δ = d avg /T d(MN) ∼ = 0.013/511,000 = 2.5 ×10 −8 mutations per nucleotide per year. 4. Results For consistency of comparison, the results will be presented in discrete time measured backwards in generation units. Results of the models that traditionally use differently measured time have been scaled accordingly. Note that despite this, the distributions are depicted for visual convenience as continuous curves. We confirmed that the O’Connell limit model can approximate the models with full genealogy even if the number of generations is as small as 10 2 . The models with full genealogy yield visually indistinguishable distributions P(T 2c = t), P(T 2c(avg) = t), P(T MRCA = t) and P(T 2c(avg) /T MRCA = x) for the same mean Fig. 1. Distributions of T 2c computed in the full genealogy model. a b Fig. 2. Distributions in the full genealogy model: general comparison (a), comparison of distributions of T 2c in the full genealogy model with binary fission offspring number distribution and the limit O’Connell model (b). number of progeny regardless of the offspring number distribution. In particular, in the agreement with the O’Connell model, these distributions are independent of the variance of the progeny count. This fact is illustrated in Fig. 1 for P(T 2c = t). The comparison of the shapes of the coalescence distributions obtained in different stochastic models with the coalescence distribution conditional on the expected size of the branching process P(T 2c = t|E(N t )) for a Poisson offspring number distribution is given in Fig. 2(a). More importantly, the distributions P(T 2c = t) obtained for any offspring distribution are also visually identical to the O’Connell limiting distribution for as few as 10 2 generations with O’Connell parameter α = 10 (see Fig. 2(b) for the BF distribution). Specifi- cally, O’Connell (1995) argued that any value between 10 and 14 is not contradicted by demographic evidence and has little effect on estimates. We choose α = 10. Visual inspection of Fig. 2(b) and a comparison of expectations presented in Table 1 prove that the limiting O’Connell distribution P(T 2c = t|OC) almost perfectly approximates the distributions of T 2c obtainedfromthe full geneal- ogy model for 10 2 generations. Therefore we can map the expec- tation of TMRCA available directly from the full genealogy model on a time scale of the O’Connell process, and can obtain the expec- tation of the ratio T MRCA /T required in (3). Because of the asymp- totic character, the result becomes valid for an arbitrary number of Please cite this article in press as: Cyran, K.A., Kimmel, M., Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating. Theoretical Population Biology (2010), doi:10.1016/j.tpb.2010.06.001 ARTICLE IN PRESS 6 K.A. Cyran, M. Kimmel / Theoretical Population Biology ( ) – Table 1 Expectations of the ratio T 2c /T ±SDin the O’Connell and the full genealogy models. Model E (T 2c /T) O’Connell 0.8054 ±0.1591 Full genealogy with BF progeny 0.8097 ±0.1585 Full genealogy with P progeny 0.8008 ±0.1645 Full genealogy with LF progeny 0.8002 ±0.1662 a b Fig. 3. Comparison of the distributions of T 2c computed in the Wright–Fisher, the coalescent and the O’Connell models assuming the binary fission (a), and Poisson (b) progeny number distribution. generations exceeding 10 2 . Therefore, even if it is not possible to directly estimate E(T MRCA /T) in a full genealogy model, we may in- directly combine the results with the limiting O’Connell properties and obtain E(T MRCA /T) for the number of generations of the order 10 4 corresponding to the time elapsed fromthe death of the mtEve. In our work we have also characterized the relationship of the Wright–Fisher discrete model to the continuous coalescent model instochastic populationhistories approximatedby a slightly supercritical branching process. Both models considered deviate from the O’Connell model for offspring distributions other than Poisson (see Fig. 3(a) for the BF offspring number distribution). Since the continuous coalescent model is equivalent to the diffusion process limit, which in turn depends on the variance of progeny number, this result can be explained by the variances of binary fission and linear fractional distributions deviating from the variance of Poisson distribution in opposite directions. There is one more interesting fact which can be observed. Namely, for times t close to T (corresponding to the beginning of branching process) the continuous approximation assumed in coalescent theory loses validity and the distribution differs more and more from the Wright–Fisher distribution. This is finally reflected in the atom of probability at t = T required for probabilities to sum to one (see Fig. 3(b)). However, despite this visually striking feature, the expectations of T 2c |WF and T 2c |CM remain very similar (see Table 2). Table 2 Comparison of the expectations of T 2c /T computed in the Wright–Fisher and the coalescent models for different progeny number distributions. Progeny number distribution E (T 2c /T|WF) E (T 2c /T|CM) BF 0.7497 0.7585 P 0.8005 0.8078 LF 0.8454 0.8550 Table 3 Expectations of different ratios of the coalescence times and their standard deviations computed in the full genealogy model for various distributions of the number of progeny. Parameter Binary fission Poisson Linear fractional E (T 2c /T) 0.8097 ±0.1585 0.8008 ±0.1645 0.8002 ±0.1662 E (T 2c_avg /T) 0.8097 ±0.1057 0.8009 ±0.1124 0.8001 ±0.1150 E (T MRCA /T) 0.9094 ±0.0950 0.9032 ±0.1011 0.9017 ±0.1040 E (T 2c_avg /T MRCA ) 0.9068 ±0.0482 0.9027 ±0.0532 0.9035 ±0.0535 Table 4 Expected values of the time to the MRCA of modern humans computed in the O’Connell model, the branching process full genealogy models, the Wright–Fisher models, and the coalescent models. Model ˆ T MRCA(y) (years ×10 −3 ) O’Connell limit 176 Full genealogy, binary fission 174 Full genealogy, Poisson 174 Full genealogy, linear fractional 174 Wright–Fisher, binary fission 189 Wright–Fisher, Poisson 178 Wright–Fisher, linear fractional 168 Coalescent, binary fission 187 Coalescent, Poisson 175 Coalescent, linear fractional 165 The expected values and the standard deviations of various ratios of coalescence times available directly only in the full genealogy model are collected in Table 3. Using the O’Connell model with the T MRCA moment computed based on the full genealogy model, we can estimate the time to the mtEve. The estimates of this time, assuming δ = 2.5 × 10 −8 and d avg = 0.003, for different population histories, are given in Table 4. Comparison of the numbers in the Table 4 with the expectation based on the hypervariable regions equal to 163 × 10 3 years with the 95% confidence interval [111 × 10 3 , 260 × 10 3 ] as reported by Krings et al. (1999) shows that all stochastic model predictions based on the recent complete mtDNA data fall into the phylogenetically obtained interval, although particular coalescence time distributions vary among the models considered. Table 5 presents the data required to compute the expected value of T MRCA and the 95% confidence interval in the full genealogy model with the use of Eq. (2). As can be seen, we obtain the expectedvalue of T MRCA = 174×10 3 years withthe 95%confidence interval [96×10 3 , 449×10 3 ]. The confidence interval is computed in a conservative way; i.e., to compute the lower bound of T MRCA we used the lower bound of d avg and the upper bound of δ and T 2c /T MRCA , whereas for the upper bound of T MRCA we used the upper bound of d avg and the lower bound of δ, and T 2c /T MRCA , respectively. To compute the lower bound of δ we used the lower bound of the average mutation difference between modern humans and Neanderthals d HN and the date of the MRCA of the two species of 389 × 10 3 years ago, while for the upper bound of δ we used the upper bound of d HN and the date of the MRCA of modern humans and Neanderthals 641 ×10 3 years ago (Briggs et al., 2009). Please cite this article in press as: Cyran, K.A., Kimmel, M., Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating. Theoretical Population Biology (2010), doi:10.1016/j.tpb.2010.06.001 ARTICLE IN PRESS K.A. Cyran, M. Kimmel / Theoretical Population Biology ( ) – 7 Table 5 Expected value and the 95% confidence interval of the estimates. Parameter Lower confidence bound Expected value Upper confidence bound d avg 0.003 0.0039 0.0055 d HN 0.0122 0.0127 0.0131 T MRCAHN(y) 389 ×10 3 511 ×10 3 641 ×10 3 δ 2.04 ×10 −8 2.5 ×10 −8 3.14 ×10 −8 T 2c /T MRCA 0.6 0.9 1 T MRCA(y) 96 ×10 3 174 ×10 3 449 ×10 3 5. Discussion Until the last decade, estimation of the divergence rate in pre-modern and modern humans could rely only on hu- man–chimpanzee divergence data. Methods used were based on phylogenetic trees constructed either by maximum likelihood or maximum parsimony and rooted using the chimpanzee as an out- group. However, due to the relatively long time to this divergence, all estimates of this time were very inaccurate, ranging from 4 to 9 million years. Consequently, the estimated divergence rate and time to the MRCA of modern humans could not be accurate, with expected values ranging from200,000 years ago (Wilson and Cann, 1992; Vigilant et al., 1991) to 300,000 years ago (Hasegawa and Horai, 1991). Additionally, many possible patterns of human pop- ulation growth were assumed. The simplified exponential models were often used, but also the logistic growth of human popula- tion proved to be not inconsistent with the mtDNA variation data (Polanski et al., 1998). The majority of these estimates were in agreement with the out-of-Africa scenario and in contradiction to the multiregional theory of origin of modern humans, supported by some paleon- tologists (Thorne and Wolpoff, 1992). These researchers claimed that the time to the MRCA should be placed about a million years ago or even earlier. It should be emphasized that the genetic data did not necessarily contradict the multiregional theory, as shown by O’Connell (1995). He inferred, using the branching process model that the genetic diversity of modern humans was consistent with estimates of the mtEve existing between 700 thousand and 1.5 million years ago. These estimates depended on an inaccurate inference of the human–chimpanzee divergence time and on the methods of the inference. To validate his conclusions, O’Connell (1995) also indicated the weak points of the outgroup methods when the genetic distance between the outgroup and the sample was large. Since the application of different methods to the same genetic data gave results differing by almost one order of magni- tude, the multiregional hypothesis could not be rejected solely be- cause it was in contradiction with the majority of genetic models, while there were still models supporting it. The situation changed after 1997 (Krings et al., 1997), when, for the first time, the mtDNA from Neanderthals dated to live until about 40,000 years ago (Schmitz et al., 2002) was sequenced. However, fewer than 400 base pairs were sequenced; hence, any estimates based on this data were not very reliable. The next successful sequencings of Neanderthal mtDNA in 1999 (Krings et al., 1999, 2000; Ovchinnikov et al., 2000; Krings et al., 2000) confirmed the accuracy of the first experiment and radically changed the estimates of the time to the most recent common female ancestor of modernhumans. Since it seems fromthe genetic data (Krings et al., 1999; Green et al., 2008; Briggs et al., 2009) that Neanderthals did not contribute mtDNA to the lineages of presently living modern humans, the time of the mtEve should be placed after the H. sapiens – H. neanderthalensis divergence. Even if later studies (Serré et al., 2004; Cyran and Kimmel, 2005) indicated that interbreeding between two human forms could not be excluded, and moreover that there is an evidence of a small- scale gene flow (Green et al., 2010), it remains true that the root of mtDNA of all currently living humans should be placed after that of humans and Neanderthals. A slight admixture of at most 25% (Serré et al., 2004) or 15% (Cyran and Kimmel, 2005) disappeared as a result of the genetic drift. Therefore, even if the results of the Neanderthal Genome Project suggest possible interbreeding between the Neanderthals and the archaic Europeans yielding about 3% admixture of the nuclear DNA (Plagnol and Wall, 2006; Pennisi, 2006; Green et al., 2010), treating the Neanderthals as a mtDNA outgroup is justified. In the paper we compared the distributions of the time to coalescence of a pair of chromosomes obtained by conceptually different methods. In particular, we proved that a branching process evolving for as little as 10 2 generations is approximated by the O’Connell model with an accuracy of less than 2%. Moreover, the result holds for three offspring number distributions, and due to the asymptotic character of the O’Connell results, it also remains true for the evolution of branching processes with an arbitrarily large number of generations. Having this result, we were able to obtain the estimate of the expected value of the ratio of the coalescence times of two individuals and that of all individuals in the population for generation 10 4 , even if it is infeasible to apply a full genealogy model in this case. Finally, we applied our approach to estimate the age of the root of the mtDNA polymorphism of modern humans based on the genetic material belonging to contemporary humans and Neanderthal fossils, as reported by Krings et al. (1999), Green et al. (2008) and Briggs et al. (2009). For all stochastic trajectories we analyzed, the resulting time falls into a 95% confidence interval of the estimate based on phylogenetic trees (Krings et al., 1999). Moreover, the result depends mainly (in fact linearly) on the assumed time to the MRCA of Neanderthals and modern humans rather than the method that was applied. Therefore, we conclude that the stochastic models based on branching processes provide similar estimates to those obtained using phylogenetic analysis, each supporting the other. Hence, our results indicate that the estimates of the time to coalescence in the Wright–Fisher and the coalescent models with random population size are quite robust in terms of their insensitivity to the model assumptions. They deviate by less than 8% (see Table 4) from the O’Connell model predictions, and the asymptotic O’Connell prediction differs from the actual value computed in the full genealogy model by only 1.6%. Such small differences are likely to be negligible compared to the large range of confidence intervals obtained not only in pairwise difference-based methods considered in the paper, but also in the phylogenetic studies (Krings et al., 1999; Green et al., 2008; Briggs et al., 2009). The greatest uncertainty of the mathematical expectations is caused by the scaling factors such as the divergence rate between species (the rate of the molecular clock) and not by deviations from the particular assumption of the method used. This validates both the Wright–Fisher and the coalescent models with stochastic population sizes also for reproduction schemes not following assumptions of these models. In particular, this provides support to the results about inferring population trajectory from the genetic diversity data, as reviewed in Wooding and Rogers (2002); results which implicitly relied on the Wright–Fisher model assumption but which remain valid for a much larger spectrum of possible demographies. Please cite this article in press as: Cyran, K.A., Kimmel, M., Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating. Theoretical Population Biology (2010), doi:10.1016/j.tpb.2010.06.001 ARTICLE IN PRESS 8 K.A. Cyran, M. Kimmel / Theoretical Population Biology ( ) – Acknowledgments The authors would like to thank the anonymous reviewers for their suggestions and comments. KC was supported by a Grant from the Polish Ministry of Science and Higher Education No. NN519 31 9035 from funds for supporting science in 2008–2010. MK was supported by a CPRIT grant number RP101089 and by a grant fromthe Polish Ministry of Science and Higher Education No. NN519 579938. Assistance and collegiality of Dr. Jan Hewitt of Rice University in the editing of the final version of the manuscript is hereby gratefully acknowledged. References Bjorklund, M., 2003. Test for a population expansion after a drastic reduction in population size using DNA sequence data. Heredity 91, 481–486. Bobrowski, A., Kimmel, M., 2004. Asymptotic behavior of joint distributions of characteristics of a pair of randomly chosen individuals in discrete-time Fisher–Wright models withmutations anddrift. Theor. Popul. Biol. 66, 355–367. Briggs, A.W., Good, J.M., Green, R.E., Krause, J., Maricic, T., Stenzel, U., Lalueza-Fox, C., Rudan, P., Brajkovi, D., Kucan, Z., Gui, I., Schmitz, R., Doronichev, V.B., Golovanova, L.V., de la Rasilla, M., Fortea, J., Rosas, A., Pääbo, S., 2009. Targeted retrieval and analysis of five Neanderthal mtDNA genomes. Science 325, 318–321. Burbano, H.A., Hodges, E., Green, R.E., Briggs, A.W., Krause, J., Meyer, M., Good, J.M., Maricic, T., Johnson, Ph.L.F., Xuan, Z., Rooks, M., Bhattacharjee, A., Brizuela, L., Albert, F.W., de la Rasilla, M., Fortea, J., Rosas, A., Lachmann, M., Hannon, G.J., Pääbo, S., 2010. Targeted investigation of the Neanderthal genome by array- based sequence capture. Science 328, 723–725. Cyran, K.A., 2007. Simulating branching processed in the problem of Mitochondrial Eve dating based on coalescent distributions. Int. J. Math. Comput. Simul. 1, 268–274. Cyran, K.A., Kimmel, M., 2004. Robustness of the dating of the most recent common female ancestor of modern humans. In: Proc. Tenth National Conference on Application of Mathematics in Biology and Medicine, Swi ¸ ety Krzyż, Poland. pp. 19–24. Cyran, K.A., Kimmel, M., 2005. Interactions of Neanderthals and modern humans: what can be inferred from mitochondrial DNA? Math. Biosci. Eng. 2, 487–498. Cyran, K.A., Myszor, D., 2008. New artificial neural network based test for the detection of past population expansion using microsatellite loci. Int. J. Appl. Math. Inform. 2, 1–9. Green, R.E., Krause, J., Briggs, A.W., Maricic, T., Stenzel, U., Kircher, M., Patterson, N., Li, H., Zhai, W., Fritz, M.H.-Y., Hansen, N.F., Durand, E.Y., Malaspinas, A.-S., Jensen, J.D., Marques-Bonet, T., Alkan, C., Prüfer, K., Meyer, M., Burbano, H.A., Good, J.M., Schultz, R., Aximu-Petri, A., Butthof, A., Höber, B., Höffner, B., Siegemund, M., Weihmann, A., Nusbaum, Ch., Lander, E.S., Russ, C., Novod, N., Affourtit, J., Egholm, M., Verna, Ch., Rudan, P., Brajkovic, D., Kucan, Ž., Gušic, I., Doronichev, V.B., Golovanova, L.V., Lalueza-Fox, C., de la Rasilla, M., Fortea, J., Rosas, A., Schmitz, R.W., Johnson, Ph.L.F., Eichler, E.E., Falush, D., Birney, E., Mullikin, J.C., Slatkin, M., Nielsen, R., Kelso, J., Lachmann, M., Reich, D., Pääbo, S., 2010. A draft sequence of the Neanderthal genome. Science 328, 710–721. Green, R.E., Krause, J., Ptak, S.E., Briggs, A.W., Ronan, M.T., Simons, J.F., Du, L., Egholm, M., Rothberg, J.M., Paunovic, M., Pääbo, S., 2006. Analysis of one million base pairs of Neanderthal DNA. Nature 444, 330–336. Green, R.E., Malaspinas, A.-S., Krause, J., Briggs, A.W., Johnson, Ph.L.F., Uhler, C., Meyer, M., Good, J.M., Maricic, T., Stenzel, U., Pruefer, K., Siebauer, M., Burbano, H.A., Ronan, M., Rothberg, J.M., Egholm, M., Rudan, P., Brajkovic, D., Kucan, Z., Gusic, I., Wikstrom, M., Laakkonen, L., Kelso, J., Slatkin, M., Pääbo, S., 2008. A complete Neanderthal mitochondrial genome sequence determined by high-throughput sequencing. Cell 134, 416–426. Griffiths, R.C., Tavaré, S., 1995. Unrooted genealogical tree probabilities in the infinitely-many-sites model. Math. Biosci. 127, 77–98. Hasegawa, M., Horai, S., 1991. Time of the deepest root for polymorphismin human mitochondrial DNA. J. Mol. Evol. 32, 37–42. Jobling, M., 2001. In the name of the father: surnames and genetics. Trends Genet. 17, 353–357. Kimmel, M., Axelrod, D.E., 2002. Branching Processes in Biology. Springer-Verlag, New York. Kimmel, M., Chakraborty, R., King, J., Bamshad, M., Watkins, W., Jorde, L., 1998. Signatures of population expansion in microsatellite repeat data. Genetics 148, 1921–1930. King, J.P., Kimmel, M., Chakraborty, R., 2000. Apower analysis of microstallite-based statistics for inferring past population growth. Mol. Biol. Evol. 17, 1859–1868. Klebaner, F.C., Sagitov, S., 2002. The age of a Galton–Watson population with a geometric offspring distribution. J. Appl. Probab. 39, 816–828. Krings, M., Capelli, C., Tschentscher, F., Geisert, H., Meyer, S., von Haeseler, A., Grossschmidt, K., Possnert, G., Paunovic, M., Pääbo, S., 2000. A view of Neanderthal genetic diversity. Nat. Genet. 26, 144–146. Krings, M., Geisert, H., Schmitz, R., Krainitzki, H., Pääbo, S., 1999. DNA sequence of the mitochondrial hypervariable region II fromthe Neanderthal type specimen. Proc. Natl. Acad. Sci. USA 96, 5581–5585. Krings, M., Stone, A., Schmitz, R., Krainitzki, H., Stoneking, M., Pääbo, S., 1997. Neanderthal DNA sequences and the origin of modern humans. Cell 90, 19–30. Laan, M., Wiebe, V., Khusnutdinova, E., Remm, M., Pääbo, S., 2005. X-chromosome as a marker for population history: linkage disequilibriumand haplotype study in Euroasians populations. Eur. J. Hum. Genet. 13, 452–462. Lambert, A., 2003. Coalescence times for the branching process. Adv. Appl. Probab. 35, 1071–1098. Nagylaki, T., 1990. Models and approximations for random genetic drift. Theor. Popul. Biol. 37, 192–212. Noonan, J.P., Coop, G., Kudaravalli, S., Smith, D., Krause, J., Alessi, J., Chen, F., Platt, D., Pääbo, S., Pritchard, J.K., Rubin, E.M., 2006. Sequencing and analysis of Neanderthal genomic DNA. Science 314, 1113–1118. O’Connell, N., 1995. The genealogy of branching processes and the age of our most recent common ancestor. Adv. Appl. Probab. 27, 418–442. Ovchinnikov, I., Götherström, A., Romanova, G., Kharitonov, V., Lidén, K., Goodwin, W., 2000. Molecular analysis of Neanderthal DNA from the northern Caucasus. Nature 404, 490–493. Pennisi, E., 2007. No sex please, we’re Neanderthals. Science 318, 967. Pennisi, E., 2006. The dawn of the stone age genomics. Science 314, 1068–1071. Plagnol, V., Wall, J.D., 2006. Possible ancestral structure in human populations. PLoS Genet. 2, 972–979. Polanski, A., Kimmel, M., Chakraborty, R., 1998. Application of time-dependent coalescence process for inferring the history of population size changes from DNA sequence data. Proc. Natl. Acad. Sci. USA 95, 5456–5461. Schmitz, R., Bonani, G., Smith, F.H., 2002. Newresearch at the Neanderthal type site in the Neander Valley of Germany. J. Hum. Evol. 42, A32. Serré, D., Langaney, A., Chech, M., Teschler-Nicola, M., Paunovic, M., Mennecier, P., Hofreiter, M., Possnert, G., Pääbo, S., 2004. No evidence of Neanderthal mtDNA contribution to early modern humans. PLOS Biol. 2, 313–317. Thompson, R., Pritchard, J., Shen, P., Oefner, P., Feldman, M., 2000. Recent common ancestry of human Y chromosomes: evidence from DNA sequence data. Proc. Natl. Acad. Sci. USA 97, 7360–7365. Thorne, A., Wolpoff, M.H., 1992. The multiregional evolution of humans. Scientific American 266, 76–83. Vigilant, L., Stoneking, M., Harpending, H., Hawkes, K., Wilson, A.C., 1991. African populations and the evolution of human mitochondrial DNA. Science 253, 1503–1507. Wilson, A.C., Cann, R.L., 1992. The recent African genesis of humans. Scientific American 266, 68–73. Wooding, S., Rogers, A., 2000. A pleistocence population X-plosion? Hum. Biol. 72, 693–695. Wooding, S., Rogers, A., 2002. The matrix coalescence and an application to human single-nucleotide polymorphisms. Genetics 161, 1641–1650. Yu, N., Zhao, Z., Fu, Y., Sambuughin, N., Ramsay, M., Jenkins, T., Leskinen, E., Patthy, L., Jorde, L., Kuromori, T., Li, W., 2001. Global patterns of human DNA sequence variation in a 10-kb region on chromosome 1. Mol. Biol. Evol. 18, 214–222. Please cite this article in press as: Cyran, K.A., Kimmel, M., Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating. Theoretical Population Biology (2010), doi:10.1016/j.tpb.2010.06.001 However. but during the population bottlenecks. Moreover. the Wright–Fisher model is not limited to any specific growth pattern. the common ancestor of the population evolved. tend to give estimates with a smaller variance than those based on pairwise differences. Kimmel / Theoretical Population Biology ( ) – time to the MRCA. using computer software designed by us for this purpose. attempting to use all genetic information contained in a sample to build the genealogy (e. coalescent models are robust for large populations. Griffiths and Tavaré. it requires a large amount of memory for storing information about each generation. 2.1016/j. Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating. doi:10. is a powerful tool based on the reversetime analysis of the Wright–Fisher model of genetic drift.tpb.2.. Namely. This model was originally proposed by O’Connell (1995) for dating the mitochondrial Eve (mtEve) based on a sample of mtDNA of humans and chimpanzees. To be able to compare these methodologies reliably. The problem becomes clear if we realize that the founder of the process. For large populations. They were also used in Cyran and Kimmel (2005) for conservative estimation of the parameter α (see Section 2. for a given expected number of offspring. Two approaches will be used by us to study how sensitive is the distribution of the time to coalescence to specific model assumptions. we also considered coalescence in a sample of n chromosomes randomly chosen from a population. In the alternative approach.06. We consider models suitable for the analysis of data having the form of numbers of pairwise differences between DNA sequences in the sample. which depend only on the mean and on the variance of offspring number distribution.html). we report estimations of the time to mitochondrial MRCA of modern humans to show how sensitive these estimates are to the assumptions made by the various compared models. However. the computational difficulties of the recursion involved as well as the difficulty of linking the results with the known genetic indices make the approach troublesome. M. they cannot be directly compared with the O’Connell model playing an important role in our paper. This property is interesting in the light of classical results in which the shortterm inbreeding effective population size is proportional to the variance of the offspring number distribution. All three models are applied to stochastic population growth approximated by a slightly supercritical Galton–Watson branching process. with the exception of small population sizes. we consider the O’Connell model as a standard because of its independence of the offspring number distribution and our interest in supercritical processes which can model the long-term growth of the human population size. There might be some doubt whether the use of the time to coalescence of two chromosomes is an adequate tool to estimate the age of the MRCA. 1995. The results of such experiments were reported by Cyran and Kimmel (2004) and Cyran (2007). the model is important as an alternative to the Wright–Fisher model. Although phylogenetic methods. such as that of Nagylaki (1990). M. coalescent models are equivalent to diffusion process approximations. Yet except for some early classical work. After simulating several thousand genealogies we estimated parameters with great accuracy. and therefore it seems practically infeasible for simulations of the required number of generations. How relevant for the model predictions are departures from a panmictic population (in the case of the Wright–Fisher model) and from large population size (the case of the coalescent method)? To answer this question we compare three different models for calculating the distribution of the time to coalescence of a pair of chromosomes. Addressing this problem. and therefore the variance influences the shape of the coalescence distribution. the coalescence distributions in a real population converge to O’Connell’s limit.. The growing interest in studies concerning genealogies of branching processes is reflected among others by the studies of Klebaner and Sagitov (2002) focused on the geometric distribution of progeny. Models The two population genetics models we employ are defined further on in Sections 2. and the length of time to the MRCA.A. the population history is simulated and only the population size is recorded as a function of time. and by the work of Lambert (2003) focused on subcritical cases. In particular it can be applied for an arbitrary progeny distribution (possibly changing in time) used to model the evolution of the population as a branching process. This approach is very general since it is not limited to any generation-to-generation sampling scheme or assumption about large population size. Consequently. 1995). Theoretical Population Biology (2010). K.uk/~griff/software. Still. is rarely the most recent common ancestor of the extant individuals. by regarding only the sizes of the population and not its full genealogy. Though these are not quite realistic assumptions. this paper compares coalescence distributions under a range of Wright–Fisher models (including those which arose from the time continuous coalescent) to distributions obtained from the O’Connell model. Griffiths’ Genetree software.2010.3 for the definition of how this parameter influences the expansion rate of the population) in a problem of hypothetical Neanderthal contribution to the modern human mtDNA gene pool. Using this information and approaches such as in Bobrowski and Kimmel (2004) we compute the coalescence distributions of the pairs of sequences. Analysis of the aforementioned approach is beyond the scope of this article and will be discussed in a separate paper.ox. Then. and in particular of its variance as long as this variance is bounded.ac. The invariance of the offspring number distribution in the O’Connell model is theoretically valid in a limit. The first approach requires storing the entire simulated genealogy of a population. especially that of time homogeneity. • the coalescent-based model with continuous time scaled by the size of population variable in time. The O’Connell’s limit results are based on the assumption that the population is growing as a slightly supercritical branching process with progeny distributions homogeneous in time. since it does not assume any particular offspring distribution. available from http://www. It remains unknown how fast. asymptotically. Cyran. Kimmel. Finally.ARTICLE IN PRESS 2 K. the diffusion approximation may fail. the experimental distribution of the times to coalescence can be found and then compared to those obtained in the Wright–Fisher and the coalescent models. the O’Connell model is independent of the shape of the progeny number distribution. This could be answered only by time-forward simulation of the full branching process genealogy and then by comparing the actual distributions with limiting results. we designed a computational framework to estimate the twochromosome coalescence distribution in any of these models as well as in a model based on a full record of the population history. As a real world application of our results. These include • the Wright–Fisher model with discrete generations. However. and • the less known O’Connell model based on branching processes (see O’Connell.001 . Therefore.stats.g.1 and 2. in the terms of number of generations. In contrast to the O’Connell model. relatively little effort has been expended in analyzing its relationship to other models in terms of sensitivity to departures from the models assumptions. it is impossible to distinguish between the length of time of the entire simulation started from the ancestral individual. by averaging over genealogies.A. this methodology lacks one important feature which could be taken into consideration only in the first approach. Having no Please cite this article in press as: Cyran. the results of all these models are compared with the actual distributions obtained from simulations of several thousand full genealogies. . and applying the limit theorem where T denotes the number of generations we consider.e. as mentioned. Since multinomial sampling is assumed. the time to the most recent common ancestor. is equal to µ/λ. K. (5) which is the continuous analog of (4). But now. after scaling by the mutation rate µ. which extrapolates the full genealogy simulation results for arbitrarily many generations. λ(t ) must satisfy ∞ λ(u)du = ∞. In other models. the tail of the distribution of τ2c is given by P (τ2c > τ ) = exp − 0 τ λ(u)du . Subsequently. Bobrowski and Kimmel. with the increase of computational power and memory capacity. 2. using Monte Carlo techniques it is possible to estimate the unconditional coalescence distribution by averaging the conditional one using a series of Nt realizations required to compute parameters in (4).001 . (2) K0 = 1 In the Wright–Fisher model-based computations.. and moreover we confirmed that the limiting properties of coalescence distribution in the O’Connell model are valid for as little as 102 generations for which we could perform the full genealogy simulations.A. The model assumes a population of haploid individuals. In this way we could relate in the O’Connell model TMRCA to T and T2c . it follows that E (davg |K0 = 1) = TMRCA µE ˆ T2c TMRCA K0 = 1 . 2004) T −1 T −1 T −1 P (T2c = t ) = k=T −t qk − k=T −t −1 q k = p T −t −1 k=T −t qk . Poisson (P) progeny number distribution. Therefore we treated both as identical. it is rarely also the time to the most recent common ancestor. Certainly. In this way. and therefore λ(τ ). 2. is to specific model assumptions.1. Kimmel. However. Therefore. the discrete nature of generations makes it easy to simulate the demography. which can be approximated by Poisson sampling. The average pairwise mutation difference within a sample. the times T2c and TMRCA were explicitly given and (2) could be applied directly. Coalescent model Let us assume a coalescent model with population size Nτ variable in time and continuous time τ measured backwards. We designed software capable of simulating full genealogies of at least 102 generations under an arbitrarily chosen distribution of offspring number and with parameters of a branching process identical to those which could reflect the long-time (about 104 generations) evolution of modern humans. not having information as to what extent this simplification can be justified. we obtain the expectation E (T2c /T | K0 = 1) by performing computer simulations of a branching process starting from one individual and calculating the required ratio of T2c and TMRCA . (4) where K0 is the number of these individuals at generation 0 whose descendants are presently alive. Suppose also that λ(τ ) = N0 /Nτ and that τ2c is the time to coalescence of a pair of chromosomes observed over N0 generations. i.1016/j. After simulating several thousand processes we planned to obtain the expectation of the ratio.. 2004. The Wright–Fisher model implies multinomial sampling of chromosomes from one generation to the other. duration of the process.g. Cyran. we investigate the effects of the departures from the model’s standard. Kimmel / Theoretical Population Biology ( ) – 3 possibility to distinguish between the two. so we can estimate the mutation rate using the identity µ = δλ. (3) T2c T T TMRCA The parameters δ and davg included in formula (3) and necessary ˆ for estimating TMRCA(y) can be retrieved from genetic diversity data. Wright–Fisher model for a pair of chromosomes Our presentation of the models starts from the Wright–Fisher model applied to the smallest sample exhibiting effects of the genetic drift. and for mathematical consistency we let q−1 = 0 and p−1 = 1. doi:10. which at time t = 0 has the size Nt . Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating. time T . Remark. T2c . corresponds to the expectation of the coalescence distribution (2). Instead we have at our disposal time T . two individuals at generation t + 1 are descendants of the single member of generation t with probability pt = 1/Nt . Nevertheless. 0 (6) For the stochastic Nt . the moment-based estimate of TMRCA(y) is ˆ TMRCA(y) = δE davg T2c TMRCA . we also consider the binary fission (BF) distribution and linear fractional (LF) distribution. In the context of our study it is also worth noticing that the continuous Please cite this article in press as: Cyran. Assuming that TMRCA(y) = λTMRCA is the equivalent of TMRCA expressed in years.e. say mtDNA sequences. If the asymptotic properties of the slightly supercritical branching process were already valid for simulations comprising as few as 102 generations.A. To ensure existence of a unique common ancestor. The comparison of the coalescence distributions in different models allows us to observe how sensitive the estimate of TMRCA. (5) should be averaged over the process realizations. ˆ denoting the expected time to coalescence of two individuals in a population by T2c . Consequently. the right side of the Eq. Then. However. we have returned to the problem of implementing the first approach indirectly. we assumed in earlier studies (Cyran and Kimmel. (1) we propagated this result to arbitrarily many generations using the equation davg E TMRCA T T2c T K0 = 1 davg ˆ TMRCA_y = δE = K0 = 1 E K0 = 1 δ . M.06. 2005) that the time between the founder and the MRCA is relatively short compared to the time of the whole process. δ . only in the model with the record of a full genealogy. we were able to estimate the ratio of TMRCA and T in simulations with a fully recorded genealogy. in the direction of under-dispersion (variance less than mean) and over-dispersion (variance greater than mean). is the time to the common ancestor of the whole evolved population. then this also will be the case for simulations exceeding 102 generations. only the time T2c could be computed and the time TMRCA was not available directly. the sample composed of two chromosomes. M. This is reflected in the following distribution of the time to coalescence. In the infinitely many sites model the genetic divergence rate between two species. we checked whether the asymptotic properties of the O’Connell model hold for a small number of generations such as 102 . Then. and moreover.ARTICLE IN PRESS K. Theoretical Population Biology (2010).2.tpb. we use the O’Connell model as a theoretical standard.2010. We consider a sample of n DNA sequences taken from a population with the expected duration of a generation (in years) equal to λ. Since the inbreeding effective population size is proportional to the variance of the number of offspring. with probability qt = 1 − pt they are descendants of two different members. being the time to the only individual initiating the branching process. Therefore. i. of two randomly drawn chromosomes (e. We denote the average pairwise mutation difference in such a sample by davg and the mutation rate per nucleotide per generation by µ. This fact is reflected in the differences between experimental distributions of the time to coalescence in the coalescent model and the O’Connell branching process model. if ZT is substituted as an estimate of its expected value. 2. Additionally. we consider the discretized version. For this model. as well as the histograms and the expectations over genealogies of the ratios T2c_avg /TMRCA . By randomly choosing from simulated population a sample of about 100 individuals and determining coalescence of each pair in a sample we obtain a histogram HT 2c |tree of the times to coalescence. in which CM denotes conditioning on the coalescent model. by simulating many branching processes and then by averaging over trees generated. In particular. we obtain corresponding unconditional characteristics. T ] of variable t as a unit interval [0. when Nt is not large and undergoes stochastic fluctuations. we trace back the lineages of the whole population to the MRCA.tpb.A. it follows that ˆ σ 2 TMRCA(y) exp α − 1 ˆ (15) TMRCA 2λα ˆ and estimates of TMRCA(y) and α are solutions of the system of Eqs. 2α as T → ∞ (8) where symbol ∼ denotes asymptotic equivalence. (2) instead of E (T2c /TMRCA ). specified by the tail of the original distribution computed at points r corresponding to integer values of t = rT . Note also that the simulations which became extinct were excluded from computations since problems similar to those of dating MRCA of modern humans are always solved conditionally on non-extinction. M. Simulations Simulation mode 1 The first simulation mode implements the full genealogy recording in the branching process model. O’Connell model Consider a slightly supercritical time-homogeneous branching process with expected number of offspring E (ξ0 ) = 1 + α/T + o(1/T ) and variance Var(ξ0 ) = σ 2 + O(1/T ). the distribution P (T2c = t ) and its expectation E (T2c ). However. (9) = (x − 1)! where qr = e−r α − e−α 1 − e−α (10) and F : Z+ × (0. and genetic data davg and δ . 1) → R is defined as F (n. in the continuous coalescent model. which certainly is not true in the early phase of the branching process. what yields a smaller variance estimator. For the sake of terminological simplicity. conditional on Nt .4. the expectation of the ratio TMRCA /T in (13) is obtained ˆ from the simulations with recorded full genealogies. The conditional histogram is used as the approximation of the conditional distribution P (T2c = t | Nt . y) = ∂n ∂ yn ln(1 − y) y2 . Such substitution is justified from the genetic point of view by linking the expectation E (T2c_avg ) (scaled by divergence rate δ = µ/λ in (1). 1995). (11) The O’Connell original distribution is continuous. Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating..1016/j. we have the following equation describing the tail of the distribution of DT . Theoretical Population Biology (2010). Having obtained the distribution P (T2c = t |tree) we also compute its expected value E (T2c |tree) denoting it as T2c_avg |tree. from the simulation results concordant with ˆ limiting properties of the O’Connell model we can have the ratio E (TMRCA /T |K0 = 1). Finally. As in the case of σ 2 t 1 − exp −α T . (3) becomes ˆ TMRCA(y) = E × TMRCA T K0 = 1 davg . conditionally on the simulated tree. thus allowing explicit access to any desired feature of the model. Z0 = x) ∼ σ 2T [exp (α) − 1] . Therefore. as corrected in Kimmel and Axelrod (2002). Kimmel / Theoretical Population Biology ( ) – coalescence model correctly approximates the discrete coalescent model as long as 1 − 1/Nτ ≈ exp(−1/Nτ ). From (8). we need α . Simulation mode 2 The second simulation mode stores only the course of population size change described by the branching process. This procedure was repeated 104 times for one simulated branching process. an asymptotic property of the probability P x (Zt > 0) where P x denotes probabilities for the process started by x individuals.001 . (7) can be directly applied if the history of Nt is known from simulation. 2004) that E (ZT |ZT > 0. and therefore the actual time of coalescence. This mode is used for numerical computation of the distribution in the Wright–Fisher (7) or the coalescent (8) models. E T2c T K0 = 1 = 1 T E [(T − DT ) |K0 = 1] . satisfies the O’Connell (1995) formula P x (Zt > 0) ∼ 2α x Moreover. In the discrete Wright–Fisher model. to calculate the ˆ estimated MRCA time TMRCA(y) from the genetic variation data. (7) From this it follows (Cyran and Kimmel.ARTICLE IN PRESS 4 K. ZT = E T K0 = 1 (13) and (15) for given short-term inbreeding effective population size of females ZT . M.3. Then (O’Connell. 1] of variable r = t /T . we can simultaneously estimate TMRCA(y) and α . instead of using (7) we apply a Monte Carlo approach by generation of coalescence times from the distribution (8) conditional on Nt . In the O’Connell model. K. can be used in this model in the Eq. it is possible to trace back the genealogy of a pair of individuals and to find their MRCA. Kimmel. but to compare it to the discrete empirical distributions described below. (13) δ 1−2 where ˆ ˆ e− r α − e− α ˆ 1 − e−α 1 0 q (1−ˆ r )2 ˆ qr ˆ ˆ qr − 1 − ln qr dr ˆ qr = . These characteristics include: the histogram HT 2c . the Eq. and therefore we have the time TMRCA |tree as well as the ratios (T2c_avg /TMRCA )|tree and (TMRCA /T )|tree.2010. (12) and therefore. doi:10. 2. (14) Please cite this article in press as: Cyran.06. However. Therefore. we will still refer to this discrete distribution as the O’Connell distribution. the Eq. for times T long enough. and TMRCA /T . This histogram is the experimental approximation of the conditional coalescence distribution P (T2c = t |tree) in the full genealogy model. Cyran. Let us express the time interval [0. given that we start the population history from x individuals having descendants at T T →∞ lim P DT T 2qx r > r K0 = x (qr − 1)−x (x − 1)! − F (x − 1. CM). and x denotes the estimate of parameter x. It is important to notice that E (T2c_avg /TMRCA ) obtained in the procedure described above.A. the time of death of the last common ancestor of two individuals living at T . the distribution P (T2c_avg = t ) with the expectation E (T2c_avg ). 1 − qr ) . with the average pairwise mutation difference in a sample davg .. the distribution P (TMRCA = t ) with the expected value E (TMRCA ). as T → ∞. 028. sapiens is about 3..tpb. We assume this time Td to be about 511. Because of the asymptotic character.013/511.018. Distributions of T2c computed in the full genealogy model. W–F denotes the Wright–Fisher model. comparison of distributions of T2c in the full genealogy model with binary fission offspring number distribution and the limit O’Connell model (b). In particular. a single Neanderthal sequence was sequenced at that time). (2009) results.3 ± 2. sapiens is composed of 16. 1999. 3. As reported by Green et al. 2(a).001 . M. = In the Krings et al. the results will be presented in discrete time measured backwards in generation units. (2008). The comparison of the shapes of the coalescence distributions obtained in different stochastic models with the coalescence distribution conditional on the expected size of the branching process P (T2c = t |E (Nt )) for a Poisson offspring number distribution is given in Fig. Results For consistency of comparison. (2008) data and applying the infinitely many sites model.000 = = 2. sapiens sample 64.06. in the agreement with the O’Connell model. This suggests among other that the number of mutations in the hypervariable regions is small enough to allow ignoring reverse mutations occurring in both lineages from the time of their MRCA. and thus the average mutation difference among contemporary humans is equal to davg ∼ 0. This fact is illustrated in Fig. these distributions are independent of the variance of the progeny count.8 ± 26. the corresponding ratios differ by less than 1%.2010. the result becomes valid for an arbitrary number of Please cite this article in press as: Cyran.. We also compared the results based on the recent data to the results calculated from a sample of 663 mtDNA sequences of modern humans and their homolog sequenced from the Neanderthal fossils (Krings et al. but conditional on the model used. We also computed the genetic divergence rate δ based on results reported by Briggs et al. neanderthalensis is 3 nucleotides shorter).000 years ago. Therefore the average mutation difference between modern humans and Neanderthals is equal to davg ∼ 0. This results in the average mutation difference between modern humans and Neanderthals of davg ∼ 0. neanderthalensis and H. doi:10. which are based on analysis including the information about the dating of the Neanderthal fossils. but we do not use it because of O’Connell’s small sample of 19 humans.A.25 times greater than the average mutation difference among contemporary humans based on the hypervariable regions and based on the complete mtDNA. 1 for P (T2c = t ). Note that despite this. 2008). we calculate the rate of divergence for the complete mtDNA as δ = davg /Td(MN) ∼ 0. Therefore we can map the expectation of TMRCA available directly from the full genealogy model on a time scale of the O’Connell process. The number of average pairwise differences between species is equal to 210.ARTICLE IN PRESS K. Theoretical Population Biology (2010). We confirmed that the O’Connell limit model can approximate the models with full genealogy even if the number of generations is as small as 102 . distributions P (T2c = t | CM) and P (T2c = t | W–F) are obtained by averaging over many realizations of Nt .1016/j. 2.1. More importantly. Divergence in contemporary hu= mans results in the average number of pairwise differences equal to 10. (2009). Kimmel / Theoretical Population Biology ( ) – 5 the full genealogy models. a b Fig. the distributions are depicted for visual convenience as continuous curves. P (TMRCA = t ) and P (T2c (avg) /TMRCA = x) for the same mean Fig. the complete mitochondrial genome of H. The average mu= tation difference among modern humans calculated originally by O’Connell (1995) was equal to 0. and within H.059. P (T2c (avg) = t ).3. 2(b) for the BF distribution). and can obtain the expectation of the ratio TMRCA /T required in (3).5 × 10−8 mutations per nucleotide per year. These latter sequences were taken from the hypervariable control regions I (HVRI) and II (HVRII) of mtDNA. Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating. Interestingly. The average number of pairwise differences is equal to 35.9 ± 5.. Visual inspection of Fig. and = the average mutation difference within modern human population of davg ∼ 0. the average mutation difference between H. Cyran. based on Briggs et al.568 nucleotides (the mitochondrial genome of H. The models with full genealogy yield visually indistinguishable distributions P (T2c = t ). O’Connell (1995) argued that any value between 10 and 14 is not contradicted by demographic evidence and has little effect on estimates. Specifically. K. Results of the models that traditionally use differently measured time have been scaled accordingly. the unconditional with respect to Nt . We choose α = 10. 2(b) and a comparison of expectations presented in Table 1 prove that the limiting O’Connell distribution P (T2c = t |OC) almost perfectly approximates the distributions of T2c obtained from the full genealogy model for 102 generations. (2009) and Green et al. Genetic data To obtain estimates of the time to MRCA we considered the average pairwise mutation differences davg computed from a complete Neanderthal mitochondrial genome sequence determined by high-throughput sequencing (Green et al.8. 4.013. number of progeny regardless of the offspring number distribution..A. M. (1999) sample including the two hypervariable regions the mutation rate is about 5 times larger. Distributions in the full genealogy model: general comparison (a). the distributions P (T2c = t ) obtained for any offspring distribution are also visually identical to the O’Connell limiting distribution for as few as 102 generations with O’Connell parameter α = 10 (see Fig.004.3 ± 4. Combining Briggs et al. 1. Kimmel. 2009).9017 ± 0.ARTICLE IN PRESS 6 K. Therefore. Comparison of the distributions of T2c computed in the Wright–Fisher.. and the coalescent models.9094 ± 0.003. Kimmel / Theoretical Population Biology ( ) – Table 1 Expectations of the ratio T2c /T ± SD in the O’Connell and the full genealogy models. linear fractional Coalescent. M. However.8550 a Table 3 Expectations of different ratios of the coalescence times and their standard deviations computed in the full genealogy model for various distributions of the number of progeny. Namely. which in turn depends on the variance of progeny number. for times t close to T (corresponding to the beginning of branching process) the continuous approximation assumed in coalescent theory loses validity and the distribution differs more and more from the Wright–Fisher distribution. 449 × 103 ].1662 0.9068 ± 0.0950 0. this result can be explained by the variances of binary fission and linear fractional distributions deviating from the variance of Poisson distribution in opposite directions.1016/j. As can be seen.7497 0.9032 ± 0. Please cite this article in press as: Cyran. Since the continuous coalescent model is equivalent to the diffusion process limit. In our work we have also characterized the relationship of the Wright–Fisher discrete model to the continuous coalescent model in stochastic population histories approximated by a slightly supercritical branching process.1591 0. 260 × 103 ] as reported by Krings et al. The expected values and the standard deviations of various ratios of coalescence times available directly only in the full genealogy model are collected in Table 3. (1999) shows that all stochastic model predictions based on the recent complete mtDNA data fall into the phylogenetically obtained interval.8097 ± 0. Poisson Full genealogy. Parameter E E E E Binary fission 0.9035 ± 0.1040 0. the expectations of T2c |WF and T2c |CM remain very similar (see Table 2). K.8078 0. the branching process full genealogy models. Kimmel. Comparison of the numbers in the Table 4 with the expectation based on the hypervariable regions equal to 163 × 103 years with the 95% confidence interval [111 × 103 .1645 0. M.001 .2010.1585 0. (2). binary fission Coalescent. binary fission Wright–Fisher. whereas for the upper bound of TMRCA we used the upper bound of davg and the lower bound of δ . and Poisson (b) progeny number distribution.. Model O’Connell Full genealogy with BF progeny Full genealogy with P progeny Full genealogy with LF progeny E (T2c /T ) 0.8097 ± 0. we obtain the expected value of TMRCA = 174 × 103 years with the 95% confidence interval [96 × 103 . 3(b)).8054 ± 0.0532 Linear fractional 0. we can estimate the time to the mtEve.0482 Poisson 0. linear fractional Wright–Fisher. Poisson Coalescent. i. Table 5 presents the data required to compute the expected value of TMRCA and the 95% confidence interval in the full genealogy model with the use of Eq.tpb. doi:10.1150 0.0535 (T2c /T ) (T2c_avg /T ) (TMRCA /T ) (T2c_avg /TMRCA ) Table 4 Expected values of the time to the MRCA of modern humans computed in the O’Connell model.8005 0. respectively.1645 0. to compute the lower bound of TMRCA we used the lower bound of davg and the upper bound of δ and T2c /TMRCA .5 × 10−8 and davg = 0.7585 0. To compute the lower bound of δ we used the lower bound of the average mutation difference between modern humans and Neanderthals dHN and the date of the MRCA of the two species of 389 × 103 years ago. for different population histories. the coalescent and the O’Connell models assuming the binary fission (a). we may indirectly combine the results with the limiting O’Connell properties and obtain E (TMRCA /T ) for the number of generations of the order 104 corresponding to the time elapsed from the death of the mtEve.9027 ± 0.. 3(a) for the BF offspring number distribution).8008 ± 0.1662 Table 2 Comparison of the expectations of T2c /T computed in the Wright–Fisher and the coalescent models for different progeny number distributions. are given in Table 4.1585 0.1124 0. Progeny number distribution BF P LF E (T2c /T |WF) 0. Using the O’Connell model with the TMRCA moment computed based on the full genealogy model. while for the upper bound of δ we used the upper bound of dHN and the date of the MRCA of modern humans and Neanderthals 641 × 103 years ago (Briggs et al. and T2c /TMRCA . linear fractional ˆ TMRCA(y) (years × 10−3 ) 176 174 174 174 189 178 168 187 175 165 Fig. even if it is not possible to directly estimate E (TMRCA /T ) in a full genealogy model..8097 ± 0.06.8002 ± 0. The confidence interval is computed in a conservative way.8002 ± 0.8008 ± 0. generations exceeding 102 . Poisson Wright–Fisher.8454 E (T2c /T |CM) 0.A. b Model O’Connell limit Full genealogy.8009 ± 0. although particular coalescence time distributions vary among the models considered. This is finally reflected in the atom of probability at t = T required for probabilities to sum to one (see Fig. binary fission Full genealogy. despite this visually striking feature. 3. Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating. There is one more interesting fact which can be observed.1011 0.8001 ± 0. assuming δ = 2. the Wright–Fisher models.A. Theoretical Population Biology (2010).e.1057 0. Cyran. Both models considered deviate from the O’Connell model for offspring distributions other than Poisson (see Fig. The estimates of this time. sapiens – H.. 2008.1016/j. Therefore. Since the application of different methods to the same genetic data gave results differing by almost one order of magnitude.. Such small differences are likely to be negligible compared to the large range of confidence intervals obtained not only in pairwise difference-based methods considered in the paper. This validates both the Wright–Fisher and the coalescent models with stochastic population sizes also for reproduction schemes not following assumptions of these models. we proved that a branching process evolving for as little as 102 generations is approximated by the O’Connell model with an accuracy of less than 2%. 2005) disappeared as a result of the genetic drift. Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating.000 years ago (Hasegawa and Horai.6%. 2006.. any estimates based on this data were not very reliable.. Green et al. but also in the phylogenetic studies (Krings et al. 2009) that Neanderthals did not contribute mtDNA to the lineages of presently living modern humans.0131 641 × 103 3.0055 0. we conclude that the stochastic models based on branching processes provide similar estimates to those obtained using phylogenetic analysis. 2000) confirmed the accuracy of the first experiment and radically changed the estimates of the time to the most recent common female ancestor of modern humans.2010.. Theoretical Population Biology (2010). 1999). estimation of the divergence rate in pre-modern and modern humans could rely only on human–chimpanzee divergence data. (2008) and Briggs et al. due to the relatively long time to this divergence. 2006. the estimated divergence rate and time to the MRCA of modern humans could not be accurate. the mtDNA from Neanderthals dated to live until about 40.. 2008. 2004.9 174 × 103 Upper confidence bound 0. Briggs et al. 2000.04 × 10−8 0. Parameter davg dHN TMRCA HN(y) T2c /TMRCA TMRCA(y) Lower confidence bound 0. The next successful sequencings of Neanderthal mtDNA in 1999 (Krings et al. with expected values ranging from 200. as reviewed in Wooding and Rogers (2002). 2009). It should be emphasized that the genetic data did not necessarily contradict the multiregional theory.A. 1991). using the branching process model that the genetic diversity of modern humans was consistent with estimates of the mtEve existing between 700 thousand and 1. In particular. 2002) was sequenced. In particular.. it remains true that the root of mtDNA of all currently living humans should be placed after that of humans and Neanderthals. hence. Green et al. even if it is infeasible to apply a full genealogy model in this case.5 × 10−8 0.06. Methods used were based on phylogenetic trees constructed either by maximum likelihood or maximum parsimony and rooted using the chimpanzee as an outgroup. 1998).. and the asymptotic O’Connell prediction differs from the actual value computed in the full genealogy model by only 1. Pennisi. while there were still models supporting it. Since it seems from the genetic data (Krings et al. 1992. the multiregional hypothesis could not be rejected solely because it was in contradiction with the majority of genetic models. For all stochastic trajectories we analyzed..0127 511 × 103 2.. These researchers claimed that the time to the MRCA should be placed about a million years ago or even earlier. 2000. K. Discussion Until the last decade. The majority of these estimates were in agreement with the out-of-Africa scenario and in contradiction to the multiregional theory of origin of modern humans. we were able to obtain the estimate of the expected value of the ratio of the coalescence times of two individuals and that of all individuals in the population for generation 104 .tpb. it also remains true for the evolution of branching processes with an arbitrarily large number of generations. treating the Neanderthals as a mtDNA outgroup is justified. as reported by Krings et al. (2009).0122 389 × 103 2. as shown by O’Connell (1995). Please cite this article in press as: Cyran.. our results indicate that the estimates of the time to coalescence in the Wright–Fisher and the coalescent models with random population size are quite robust in terms of their insensitivity to the model assumptions. Having this result. this provides support to the results about inferring population trajectory from the genetic diversity data. Finally. and due to the asymptotic character of the O’Connell results. Moreover.. 2004) or 15% (Cyran and Kimmel. for the first time. The greatest uncertainty of the mathematical expectations is caused by the scaling factors such as the divergence rate between species (the rate of the molecular clock) and not by deviations from the particular assumption of the method used. Even if later studies (Serré et al..5 million years ago. Hence. Cyran. Green et al. 2005) indicated that interbreeding between two human forms could not be excluded. 2010).0039 0. Kimmel. supported by some paleontologists (Thorne and Wolpoff. Additionally. even if the results of the Neanderthal Genome Project suggest possible interbreeding between the Neanderthals and the archaic Europeans yielding about 3% admixture of the nuclear DNA (Plagnol and Wall. 1999. Moreover. These estimates depended on an inaccurate inference of the human–chimpanzee divergence time and on the methods of the inference. the result depends mainly (in fact linearly) on the assumed time to the MRCA of Neanderthals and modern humans rather than the method that was applied.001 . each supporting the other. we applied our approach to estimate the age of the root of the mtDNA polymorphism of modern humans based on the genetic material belonging to contemporary humans and Neanderthal fossils. They deviate by less than 8% (see Table 4) from the O’Connell model predictions. Green et al. (1999). the result holds for three offspring number distributions. M.. Consequently. A slight admixture of at most 25% (Serré et al. the resulting time falls into a 95% confidence interval of the estimate based on phylogenetic trees (Krings et al. 1992).. 1999. The situation changed after 1997 (Krings et al... the time of the mtEve should be placed after the H..ARTICLE IN PRESS K. 1991) to 300.. However. O’Connell (1995) also indicated the weak points of the outgroup methods when the genetic distance between the outgroup and the sample was large. doi:10. Krings et al. when. Cyran and Kimmel. but also the logistic growth of human population proved to be not inconsistent with the mtDNA variation data (Polanski et al. all estimates of this time were very inaccurate.003 0. He inferred.000 years ago (Wilson and Cann.14 × 10−8 1 449 × 103 ) – 7 δ 5. results which implicitly relied on the Wright–Fisher model assumption but which remain valid for a much larger spectrum of possible demographies. ranging from 4 to 9 million years. 2010). The simplified exponential models were often used. In the paper we compared the distributions of the time to coalescence of a pair of chromosomes obtained by conceptually different methods. Ovchinnikov et al. many possible patterns of human population growth were assumed. Therefore. M. To validate his conclusions. and moreover that there is an evidence of a smallscale gene flow (Green et al. However. 1997). neanderthalensis divergence. Vigilant et al. 1999.A. fewer than 400 base pairs were sequenced. Briggs et al.. Kimmel / Theoretical Population Biology ( Table 5 Expected value and the 95% confidence interval of the estimates.000 years ago (Schmitz et al.6 96 × 103 Expected value 0. K. 353–357. Targeted investigation of the Neanderthal genome by arraybased sequence capture. K.. Analysis of one million base pairs of Neanderthal DNA. Fu. Lalueza-Fox. The recent African genesis of humans.. J. V... Nagylaki. A. Acad. Wilson. 2007. Laakkonen. Krause. 2003.. Ž. Cyran. Horai. Targeted retrieval and analysis of five Neanderthal mtDNA genomes... J..S. Pääbo. Schmitz.. V. Science 328... Fritz. Inform. E.. 452–462.. Jenkins.... Kircher. 2006.. 5456–5461.. J. Weihmann.. M. A. D. Evol. Vigilant... Du... Branching Processes in Biology. King. M. Falush. 313–317. Oefner. 2000. L. A. M. M.. Popul.M. Johnson. R. Schmitz.... Possnert. P.. S.. D.. G. 2010...H. M. I.. 144–146.. Pääbo. S. 1999. Green.. Kimmel.E. Rothberg. 2006. A.. Theor.. Models and approximations for random genetic drift. U. 2003. 2002. de la Rasilla. M... Durand.... Int. J. J.001 . R. Lambert. A draft sequence of the Neanderthal genome. Possible ancestral structure in human populations. S. Ph. Kimmel. F. L. M. Kimmel.T. E.. Genet. V.. Evol.V. 77–98. Poland... E. D. Sci. J..tpb.. Krainitzki. 318–321.. Laan.. Evol. V.. Serré. Klebaner. A complete Neanderthal mitochondrial genome sequence determined by high-throughput sequencing. 416–426.. S. The age of a Galton–Watson population with a geometric offspring distribution.... Krings.. Maricic. Gušic.W. Nusbaum... Chakraborty. Malaspinas. Hannon. Probab. Biol.. 1859–1868. T. 1997.. R.... P. 192–212. M. Myszor. Kucan. Asymptotic behavior of joint distributions of characteristics of a pair of randomly chosen individuals in discrete-time Fisher–Wright models with mutations and drift.. Rudan.E. M. Eng. M. Patthy. A... Malaspinas. 490–493. Jobling. Pääbo.. J. 72. 5581–5585. Maricic. L. A... Wooding. Ramsay. J. A.. Science 328. Ph. P. Acad. A. 2000. Golovanova. Acad. Simul. Please cite this article in press as: Cyran. Appl. 127. J. J.A.1016/j. Genet. Cell 90.. Ovchinnikov. A. 1992.. von Haeseler. 32. Briggs. M. Y. 37. Affourtit. A. Chech. L. Lidén. Wooding.. Platt... M. Rubin. Slatkin. Watkins.. M. 2008. K. Golovanova.. Bobrowski. M. Theor. Biosci... J. S. H. M. 1992.. Sambuughin. S. Mennecier.A... Science 314. 18. Application of time-dependent coalescence process for inferring the history of population size changes from DNA sequence data. A. C..W. C.A. M.. 2007. 710–721. S. Paunovic..... Popul. Mullikin. Good... I. E. Langaney. Pääbo. K. Hofreiter. No evidence of Neanderthal mtDNA contribution to early modern humans.A. N N519 579938.. L.. Time of the deepest root for polymorphism in human mitochondrial DNA. Assistance and collegiality of Dr. A. Biosci.. J. A. 2.. Marques-Bonet. Jensen... J. A. Simons. J.. In: Proc. O’Connell. Meyer. Thorne. Fortea. Proc. M. 19–30. 1995. Yu. Shen. D....F. 2000.W. Egholm. Pääbo.... D. Theoretical Population Biology (2010)... Höber.. Nat. M.. Natl.. Neanderthal DNA sequences and the origin of modern humans. Biol. Rooks. Smith. C. Cell 134. Scientific American 266.ARTICLE IN PRESS 8 K. MK was supported by a CPRIT grant number RP101089 and by a grant from the Polish Ministry of Science and Higher Education No.. Sci. Scientific American 266. Nielsen. X -chromosome as a marker for population history: linkage disequilibrium and haplotype study in Euroasians populations. Johnson. R. Wiebe. R. Gusic. Sequencing and analysis of Neanderthal genomic DNA. L. 2008. Nature 404. Harpending. Kelso. Rosas. Robustness of the dating of the most recent common female ancestor of modern humans. Swiety Krzyż.C.D. G. C. 13.. J.W. 39.. Coop. 2009. P.. Bhattacharjee. Novod. Uhler. Kudaravalli. M. Cann. Griffiths. J. Stenzel... Test for a population expansion after a drastic reduction in population size using DNA sequence data. R. Briggs. 17... Natl. Appl..M.. 1.E.A. Götherström. Höffner. M.2010.. Paunovic.... A.. J.. S. D... Z. Hum.E.B.... Pääbo. New research at the Neanderthal type site in the Neander Valley of Germany.. ¸ pp. Stoneking. 2002.. R.. Rogers. Z..Y.F. Kimmel / Theoretical Population Biology ( ) – Acknowledgments The authors would like to thank the anonymous reviewers for their suggestions and comments..... Eur. M. 2001..E. Probab. J... R. Brizuela. we’re Neanderthals. Int. King. J..J. S. Recent common ancestry of human Y chromosomes: evidence from DNA sequence data. Coalescence times for the branching process. Sagitov.. R. J. Ph.. F. M. A..L. Gui. Heredity 91.. H.. Schmitz.. Briggs.L. Genetics 161. M. M.. Feldman. M. A. M..F. 723–725. Rosas. H. Hum.M. Burbano. N. 68–73.. R. Kuromori. Mol.. Remm. A view of Neanderthal genetic diversity. S.. Appl. J. 487–498.F. The dawn of the stone age genomics. M. Tavaré. Pritchard. N. Green. Global patterns of human DNA sequence variation in a 10-kb region on chromosome 1. 35.A..C.. Zhai. Adv. Krings. Aximu-Petri.. Interactions of Neanderthals and modern humans: what can be inferred from mitochondrial DNA? Math. R. PLOS Biol. M. Capelli. Bamshad. Prüfer. Chakraborty. Albert. S. Molecular analysis of Neanderthal DNA from the northern Caucasus. A. Tenth National Conference on Application of Mathematics in Biology and Medicine. Pruefer. D. 2. M. A. K. Genetics 148. M...M. Kimmel. Evol. 972–979. Kucan. 268–274. J. Natl. 2... M.E.. Teschler-Nicola..M. H. Jorde. Thompson. G. M.. J. Signatures of population expansion in microsatellite repeat data. NN519 31 9035 from funds for supporting science in 2008–2010. Eichler. K. I. Proc.W. F. 2006. Rothberg.. Rudan. Wall. J. de la Rasilla. A pleistocence population X -plosion? Hum. R.. USA 97. Bonani. R. S.. T.. Alternatives to the Wright–Fisher model: The robustness of mitochondrial Eve dating.A. M. Schultz. M.. Plagnol. USA 96.. A32. Stenzel. 1921–1930. D. N. A power analysis of microstallite-based statistics for inferring past population growth. 2010. 2002. S. A. Kimmel. S. W. Hansen... Nature 444. L.. Math. B. Egholm.. Khusnutdinova. H. Hawkes. Pääbo. Fortea. 37–42. P. de la Rasilla.. Z. Pennisi.D. Briggs.. In the name of the father: surnames and genetics. 2006.. Li. A. M.A. Green. Krause. Z. 816–828... Egholm. Brajkovi. Butthof. Siegemund. References Bjorklund. Jan Hewitt of Rice University in the editing of the final version of the manuscript is hereby gratefully acknowledged.E. S. Polanski. A. S... 66. P. J. Kimmel.M... Possnert. A. Smith. J.. Fortea. Krainitzki. 1503–1507. Proc. B. Good. Romanova..H.. Johnson. V. Pääbo.V... 2005. M. J. 2004.C. 1068–1071. J. K. 1990. E. New artificial neural network based test for the detection of past population expansion using microsatellite loci. H. N. Li. Wilson.. Axelrod. H. A. Good. K. J. African populations and the evolution of human mitochondrial DNA... T. Alkan. Pennisi. G. Doronichev. 1071–1098.. The multiregional evolution of humans..L. Lachmann. Noonan.. Pääbo. Biol. Green..C.F... M. Pääbo. G. Paunovic. Science 253. Trends Genet.. 2001.. 1991..-Y. 7360–7365..A. Meyer. 1–9.. E. 2. F. M. M. F. The genealogy of branching processes and the age of our most recent common ancestor. Grossschmidt. 76–83. T. K. KC was supported by a Grant from the Polish Ministry of Science and Higher Education No. 967. Krings. 27. Birney. Geisert. J. J. J. Wikstrom. T. E...P. Burbano. 17. DNA sequence of the mitochondrial hypervariable region II from the Neanderthal type specimen. doi:10.. 2005.W. 2004. 2004. Cyran.. Brajkovic... N. S.. J. Kharitonov. M. Xuan. M. PLoS Genet. L. K.-S. Briggs. Mol. Mol. W. 1113–1118. C. Doronichev.. Goodwin. D. R. A. Cyran.. Kimmel. Cyran. Reich.. Science 314. 355–367. M. Sci... 1998.. S. No sex please.. 2000.. Pääbo.. H.K. C. M. Burbano. M. Russ. The matrix coalescence and an application to human single-nucleotide polymorphisms..P.. Science 318. 481–486.. Tschentscher. Green... Stone. Lalueza-Fox. Probab. Unrooted genealogical tree probabilities in the infinitely-many-sites model.. D. Ch. Krause.. 693–695.W.. Good.. Schmitz.. Ronan.. R.M. S. Zhao.. M. Ronan. Cyran.. T. Kucan. Geisert. M. Brajkovic. Kimmel. 2000.C. Wolpoff.. Krause.. Leskinen.. Lachmann. Slatkin. Meyer.. 19–24. Meyer. M. Springer-Verlag. Rosas. 1995. Patterson.. I. Pritchard. U. Simulating branching processed in the problem of Mitochondrial Eve dating based on coalescent distributions. W. 1991. Lander. 1641–1650.. New York.. 330–336. Maricic. L. J. Hodges.. USA 95.. Kelso. G. Schmitz. Rogers. Chakraborty. T. Maricic. Adv..... R.. Rudan. E. Krause. Krause.. Biol.L..-S. 42. Alessi..H. Biol..06. E. Stoneking. R... Stenzel. T. Jorde. Comput.B. Science 325. Math. Chen. Math. H. N. E. A. Ch. 418–442. W.. Appl. 26... Verna.E. 1998. Siebauer. R.. U. 214–222. M.. 2002. Ptak. M. Hasegawa. M.
Copyright © 2024 DOKUMEN.SITE Inc.