Microbiology and Molecular Biology Reviews, March 2001, p. 151-185, Vol. 65, No. 1
1092-2172/01/$04.00+0   DOI: 10.1128/MMBR.65.1.151-185.2001
Copyright © 2001, American Society for Microbiology. All rights reserved.

Transition between Stochastic Evolution and Deterministic Evolution in the Presence of Selection: General Theory and Application to Virology

I. M. Rouzine,1,* A. Rodrigo,2 and J. M. Coffin1

Department of Molecular Biology and Microbiology, Tufts University, Boston, Massachusetts 02111,1 and School of Biological Sciences, University of Auckland, Auckland, New Zealand2

SUMMARY
INTRODUCTION
QUALITATIVE DISCUSSION AND COMPUTER SIMULATIONS
    Description of the Model and the Evolution Equation
        Virus population model.
        Stochastic equation of evolution.
        Boundary conditions: properties of almost monomorphic populations.
    Experiments on Evolution and Observable Parameters
    Steady State
        Neutral case: s ≪ µ.
        Case with selection: µ ≪ s ≪ 1.
    Deterministic Dynamics and Its Boundaries
        Deterministic dynamics.
        Boundaries of deterministic approximation.
    Stochastic Dynamics: the Drift Regime
        Decay of the polymorphic state and gene fixation.
        Transition from a monomorphic to a steady state.
        Divergence of populations which have been separated and the time correlation function.
    Stochastic Dynamics: the Selection-Drift Regime
        Accumulation.
        Divergence of separated populations and the time correlation function.
        Reversion (fixation of an advantageous variant).
    Sampling Effects
    Experimental Applications
        Virological studies in vitro.
        HIV populations in vivo.
        General applications.
    Many Loci and Other Aspects
    Conclusions
MATHEMATICAL RESULTS AND DERIVATIONS
    Description of the Model and the Evolution Equation
        Main results.
        Virus population model.
        Stochastic equation of evolution.
        (i) Discrete Markovian equation.
        (ii) Diffusion equation limit.
        Boundary conditions: properties of an almost monomorphic population.
    Experiments on Evolution and Observable Parameters
    Steady State
        General case.
        Neutral case: s ≪ µ.
        Case with selection: µ ≪ s ≪ 1.
    Deterministic Dynamics and Its Boundaries
        Main results and discussion.
        Deterministic dynamics.
        Boundaries of deterministic approximation.
    Stochastic Dynamics: the Drift Regime
        Main results and discussion.
        Decay of the polymorphic state and gene fixation.
        (i) Decay of strong polymorphism.
        (ii) Gene fixation.
        Transition from a monomorphic to a steady state.
        Divergence of separated populations and the time correlation function.
    Stochastic Dynamics: the Selection-Drift Regime
        Main results and discussion.
        Accumulation.
        Divergence of separated populations and the time correlation function.
        Reversion (fixation of advantageous variant).
    Sampling Effects
        Main results.
        Derivations.
ACKNOWLEDGMENTS
REFERENCES


SUMMARY

We present here a self-contained analytic review of the role of stochastic factors acting on a virus population. We develop a simple one-locus, two-allele model of a haploid population of constant size including the factors of random drift, purifying selection, and random mutation. We consider different virological experiments: accumulation and reversion of deleterious mutations, competition between mutant and wild-type viruses, gene fixation, mutation frequencies at the steady state, divergence of two populations split from one population, and genetic turnover within a single population. In the first part of the review, we present all principal results in qualitative terms and illustrate them with examples obtained by computer simulation. In the second part, we derive the results formally from a diffusion equation of the Wright-Fisher type and boundary conditions, all derived from first principles for the virus population model. We show that the leading factors and observable behavior of evolution differ significantly in three broad intervals of population size, N. The "neutral limit" is reached when N is smaller than the inverse selection coefficient. When N is larger than the inverse mutation rate per base, selection dominates and evolution is "almost" deterministic. If the selection coefficient is much larger than the mutation rate, there exists a broad interval of population sizes in which weakly diverse populations are almost neutral while highly diverse populations are controlled by selection pressure. We discuss in detail the application of our results to human immunodeficiency virus populations in vivo, sampling effects, and limitations of the model.


INTRODUCTION

The process of evolution is a consequence of the interplay of mutation, selection, and chance on a population of organisms, leading to an observable change in its genetic makeup. Since the time of Darwin, the influence of these factors on the evolution of organisms ranging from bacteria to humans has been intensively studied, both experimentally and theoretically, leading to a very large body of literature. Only recently, however, has attention been turned toward special problems in the evolution of viruses. Virus evolution is of particular interest and importance for three reasons. First, we desire to gain an understanding (usually in the absence of a fossil record) of how modern viruses have arisen from their earlier forms, both in recent times and in parallel with the evolution of their hosts. Second, the evolution of a virus during the course of infection of a single host, or along a short transmission chain, is of great importance in creating new populations with properties altered in important ways, such as evasion of the immune response, resistance to antiviral therapy, or altered virulence. Third, because of their high replication rates, simple genomes, large population sizes, and high mutation rates, viruses make good models for studying and testing evolutionary theory.

Particular attention has focussed on understanding the evolutionary forces that act on human immunodeficiency virus (HIV) during the course of infection of a single human host. HIV displays a remarkable extent of genetic variation concurrent with a high speed of evolution: in the most variable region of the genome (env), individual genomes within a population from an infected person can vary by as much as 3 to 5% (2, 43, 78); substitutions in env accumulate at a rate of approximately 1% per year (71), 50 million times faster than in the small subunit of rRNA (61). This variation has important consequences. It allows the virus to evolve to infect different cell types (9, 20, 30) and to rapidly become resistant to otherwise highly effective antiviral drugs (10, 47, 50); it may play a role in evading the immune system (4, 56, 73, 79). Furthermore, its high mutation rate (estimated to average about 3 × 10⁻⁵ per nucleotide site per replication cycle [49]), large population size (variously estimated from about 10⁷ to 10⁸ productively infected cells), and continuous steady state, in which the large majority of virions and productively infected cells turns over every day (25, 77), create a situation which, at least in principle, is amenable to (and requires) mathematical modeling.

To date, a number of modeling approaches have been applied to understand the evolution of HIV in vivo. These approaches use either population genetic (mutation frequency distribution) or phylogenetic inference using virus sequences obtained from HIV-infected individuals. In general, they are based on one of two different theoretical approaches to the evolution problem. Deterministic approaches, including quasispecies theory (15, 26), assume that the population size is very large, such that the frequency of a given mutation at any given time is completely predictable if one knows the initial frequency, the mutation rate, and the selection coefficient (i.e., the differential growth rate conferred by the different alleles). At first glance, such approaches would seem justified by the large number of infected cells at each generation (21); however, a number of factors, such as variation in the replication potential and generation times among infected cells, may lead to an effective population size much smaller than the actual number of infected cells. Stochastic models, as applied to HIV (to this point), proceed from the opposite assumption: that the effective population size is so small (or that selective forces are so weak) that random drift dominates over selection. The hypothesis of selectively neutral mutations has a long, successful history in describing the evolution of organisms where populations are small (and not uniformly distributed) and mutation rates are very low (36). The applicability of such neutral models to virus populations remains to be established. Many of the assumptions that underlie neutral theory are not appropriate for virus populations, and a number of characteristics of HIV genetic variation in vivo, such as the uneven ratio of synonymous to nonsynonymous changes in different regions of the genome (5, 44, 48), argue against simple application of neutral theory.
However, inclusion of selection effects in evolutionary analysis (for example, the coalescent method) presents a mathematical challenge that has not yet been fully solved in a practical fashion, although progress toward this goal has been made recently (42, 55).

As an example of the difference between deterministic and stochastic models, consider the question of the frequency in a population of a mutation that is slightly deleterious to virus replication. In a deterministic system, it can be easily calculated that the frequency of such a mutation in the population will come to equilibrium at a point equal to the mutation rate divided by the selection coefficient (24). In a stochastic system, the population will usually be completely uniform in one variant or the other (76), switching rarely but rapidly from one form to the other. This theoretical experiment is of great practical importance in that it describes the appearance of a mutation that can confer resistance to an antiviral drug even before treatment.
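The deterministic equilibrium quoted above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a per-generation update in which selection reduces the mutant share by a factor of 1 - s and mutation then acts symmetrically at rate µ, and the function names and parameter values are chosen only for illustration.

```python
def deterministic_step(f, s, mu):
    """One generation without random drift: selection, then symmetric mutation."""
    # Selection: mutant-infected cells yield (1 - s) times as many infectious virions.
    f_sel = f * (1.0 - s) / (1.0 - s * f)
    # Mutation: a fraction mu of infecting virions switches to the other variant.
    return f_sel * (1.0 - mu) + (1.0 - f_sel) * mu


def deterministic_equilibrium(s, mu, f0=0.5, generations=5000):
    """Iterate the deterministic update long enough to reach its fixed point."""
    f = f0
    for _ in range(generations):
        f = deterministic_step(f, s, mu)
    return f


s, mu = 0.01, 1e-5  # illustrative values satisfying mu << s << 1
f_eq = deterministic_equilibrium(s, mu)
print(f_eq, mu / s)  # the two numbers should nearly coincide
```

Under these assumptions the iterated frequency settles close to µ/s, provided that µ is much smaller than s and s is much smaller than 1.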

To solve this problem and many others, it is clear that a more general theoretical framework is needed: one that takes into account both selection and drift under a set of assumptions more appropriate to viruses than is found in theoretical works published to date. Our aim in this work was to develop, from first principles, a general theory that includes the effects of both selection and drift on a population. We use a set of assumptions appropriate to virus populations, focusing on the interplay between deterministic and stochastic behavior in the context of virologically realistic experiments. We apply these to the simplest possible model: mutation at a single site with only two alleles, replicating in a steady-state system (that is, a constant number of infected cells) under the influence of constant selective pressure in a single isolated population. Because we are dealing with a single locus, we do not consider recombination explicitly; because we are dealing with haploid populations, we do not have to consider allelic dominance. It should be noted that although we do not consider recombination explicitly, the presence of strong recombination must be, in fact, implied for the one-locus approximation to be quantitatively correct. Also, nonconserved loci must be spaced sufficiently far apart in the genome, depending on the recombination rate. Even in the absence of recombination, the one-locus approximation is a useful starting point for understanding interactions between selection and stochastic factors at a qualitative level. We present a complete model that considers the full range of possible values for population size, mutation rate, and selection effects. Despite its simplicity, the model is surprisingly rich in its descriptive power. 
At the extremes, the results of this model correspond to the standard results of deterministic or neutral theory; however, we have found that there is a large range of values for the key parameters in which the system behaves in an intermediate fashion: under some conditions its evolution is dominated by stochastic factors, whereas at other times it behaves in a nearly deterministic fashion. We refer to this range of parameter values as the "selection-drift" regime and describe its properties in detail.

This work is divided into two major parts. In the first, we present all the principal results in qualitative terms, using language appropriate for a reader trained in biology and with a moderate level of mathematical sophistication. This part is accompanied by a number of illustrative examples obtained by computer simulation. Although keyed to the mathematical formalism of the second part, it is designed to be read independently and to provide the reader with an understanding of the principal results and their biological significance, particularly in the context of virus populations. The second part is a formal mathematical derivation of the principal results of the model. These results are listed at the beginning of each section and derived in the following subsections. Although some of the derivation presented is not novel, in that it parallels classic work of a number of population biologists (18, 19, 23, 24, 31, 37, 81, 82), its formal application specifically to virus systems is, to the best of our knowledge, a new approach, and we present it in full for this reason, as well as to provide a thorough and self-contained review. Although some of our mathematical methods differ from the classic methods, the final results are identical.

The presentation in both parts of this work proceeds in parallel. We first develop the basic evolution equation, which describes, at least in a statistical sense, the change in frequency of a mutant allele as a function of time and the key parameters: mutation rate, selection coefficient, and population size. We then present the predicted results, for all three regimes, of a set of virological experiments: accumulation and reversion of deleterious mutations, competition between mutant and wild-type viruses, gene fixation, mutation frequencies at the steady state, divergence of two populations split from one population, and genetic turnover within a single population. Next, we discuss sampling statistics and the application of this theory to some specific real-world experimental issues of virus and organismal evolution. Finally, we discuss the application and extension of this theoretical framework to other problems, including multilocus evolution and phylogenetic analysis.


QUALITATIVE DISCUSSION AND COMPUTER SIMULATIONS

Description of the Model and the Evolution Equation

In this section, we introduce the population model and explain how to approach the problem of evolution when random factors enter the picture. First we describe a one-locus, two-allele population model based on the virus replication cycle and discuss briefly the main factors of evolution included in the model. This is followed by a discussion of the biological meaning of the evolution equation. Finally, the boundary conditions for the evolution equation describing the properties of a weakly polymorphic population are described.

Virus population model. First, we choose a basic model of virus evolution. For the purposes of simplicity, we consider the evolution of one nucleotide position at a time, and we assume that each nucleotide has a choice between only two alleles. (Such a model applies directly to multiple loci if the evolving loci are sufficiently distant and the recombination rate is sufficiently high. Evolution at closely situated loci or in the absence of efficient recombination is not independent [see "Many loci and other aspects" below].) Conventionally, we denote the better-fit allele as wild type and the less-fit allele as mutant. A deleterious mutation event (from wild type to mutant) will be referred to as forward mutation, and an advantageous mutation event will be referred to as reverse mutation. Each separate nucleotide will be characterized by two parameters, both of which are assumed to be much less than unity: the mutation cost (or selection coefficient), s, which is the relative difference in fitness between the two alleles, and the mutation rate per base per replication cycle, µ. We assume that mutations at different nucleotides have a weak additive effect on virus fitness. In doing so, we neglect epistasis (coselection) arising due to biological interaction between nucleotides at both the nucleotide and protein levels. We also ignore linkage disequilibrium between loci due to random drift, so that different nucleotides evolve independently (see the Introduction). The mutation rate is set, in our work, to be the same in the forward and reverse directions. For example, for HIV in infected cells the mutation rate per base is in the range of 5 × 10⁻⁶ to 5 × 10⁻⁵, depending on the type of substitution (49, 68). The selection coefficient will vary over a wide range according to the specific base and to the specific conditions of replication, but it is assumed to be constant over the period of observation; in other words, there is no selection for diversity.

The basic model of virus replication is illustrated in Fig. 1. Consider the dynamics of a cell population infected by two genetic variants of a virus: a fraction (f) of cells is infected by the mutant virus, and the remaining cells (1 - f) are infected by the wild type. The number of mutant-infected cells may change with time, i.e., with each new generation of cells. The total cell count is assumed to be constant. During a generation step, each cell produces a fixed (large) number of virions and then dies and is replaced by an uninfected cell. The number of virions produced and capable of infecting new cells differs, by a factor of 1 - s, between cells infected with different variants, creating selection for the better-fit (more prolific) variant. Since the total number of infected cells is fixed and the number of virions produced per cell is large, only a small fraction of the virions infect the next generation of cells. On infecting a cell, each virion has a small chance of mutating into the opposite genetic variant, given by the mutation rate introduced above. All the virions produced by a cell afterwards represent the same genetic variant. Thus, intracellular interference between variants does not occur. (This lack of intracellular competition is a reasonable assumption for retroviruses or when the proportion of infected cells in a tissue is much lower than 100%. It may vary in other virus models, when the multiplicity of infection is high.)
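One generation step of this model can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the simulation used for the figures: it assumes that the burst size is large enough for the infecting virions to be drawn independently (binomial sampling), and the function name `next_generation` and all parameter values are hypothetical.

```python
import random


def next_generation(f, n_cells, s, mu, rng):
    """Return the mutant frequency in the next generation of infected cells."""
    # Selection: mutant-infected cells contribute (1 - s) times as many
    # infectious virions, which lowers the mutant share of the virion pool.
    p = f * (1.0 - s) / (1.0 - s * f)
    # Mutation: on infection, a virion switches variant with probability mu.
    p = p * (1.0 - mu) + (1.0 - p) * mu
    # Random drift: each of the n_cells new cells is infected by one virion
    # drawn independently from the pool.
    mutants = sum(1 for _ in range(n_cells) if rng.random() < p)
    return mutants / n_cells


rng = random.Random(1)  # fixed seed so the run is reproducible
f = 0.5
trajectory = [f]
for _ in range(50):
    f = next_generation(f, n_cells=1000, s=0.1, mu=1e-4, rng=rng)
    trajectory.append(f)
print(trajectory[-1])
```

With selection this strong, the mutant frequency falls from 50% to a low level within a few tens of generations; the residual fluctuations reflect the random sampling step.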


FIG. 1.   (a) Drift of genetic composition due to random sampling of infecting virions. Circles denote infected cells, and small diamonds show free virus particles. Black and white denote virus genetic variants. (b) Full virus population model including random drift, selection, and mutation. Two consecutive generations of infected cells are shown. Lines radiating from a cell denote virions, some of which, as shown by arrows, infect new cells. Mutant cells yield fewer progeny per cell. A small fraction of infecting virions, m1 and m2, mutate to the other variant.

Some details of the model, such as fixed burst sizes and the point of the replication cycle at which mutation occurs, are of no consequence when long timescales are considered. Overlap in time between generations of infected cells was neglected but causes a factor of 2 increase in the rate of random drift (52). By contrast, such assumptions as two variants per base and the absence of both coselection and selection for diversity are essential. The model includes a minimal set of three factors of genetic evolution: random drift due to sampling of genomes, mutation, and selection. Let us characterize briefly the effect of each of these factors on the composition of the population as it changes with time.

The model assumes that the virions infecting each new generation of cells are chosen randomly from the virions produced by the mutant and wild-type subpopulations. As a result of this random sampling of genomes, the mutant frequency experiences random drift in time (18, 80), as shown in Fig. 1a. In the absence of mutation and selection, any population composed originally of a mixture of alleles eventually becomes uniform in either genotype (i.e., the allele is fixed), with the probabilities depending on the initial composition.
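The dependence of the fixation probabilities on the initial composition can be illustrated by simulation. For neutral drift with binomial sampling, a standard result is that the probability of fixing the mutant equals its initial frequency. The sketch below assumes binomial sampling with s = 0 and µ = 0; the population size, initial frequency, and number of runs are arbitrary illustrative choices.

```python
import random


def drift_until_fixation(f0, n_cells, rng):
    """Neutral drift (s = 0, mu = 0) until the population is monomorphic.

    Returns 1.0 if the mutant is fixed and 0.0 if it is lost.
    """
    mutants = round(f0 * n_cells)
    while 0 < mutants < n_cells:
        p = mutants / n_cells
        # Resample all n_cells infected cells from the current composition.
        mutants = sum(1 for _ in range(n_cells) if rng.random() < p)
    return mutants / n_cells


rng = random.Random(2)  # fixed seed so the run is reproducible
n_runs = 2000
fixed_mutant = sum(drift_until_fixation(0.3, 20, rng) for _ in range(n_runs))
fraction_fixed = fixed_mutant / n_runs
print(fraction_fixed)  # expected to be close to the initial frequency, 0.3
```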

Selection enters our model through the difference in the number of infectious progeny produced by cells infected with different genetic variants. Selection alone drives the system into a state consisting entirely of the better-fit variant.

Mutations, in contrast to random drift and selection, favor inhomogeneity. If the other two factors are absent, mutations push the system toward the equilibrium composition at which the total numbers of forward and reverse mutations per generation are in balance. For equal forward and reverse mutation rates assumed here, equilibrium occurs at 50% of each allele.
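The 50% equilibrium can be verified with a short worked equation. As an illustrative assumption, suppose that mutation alone changes the mutant frequency deterministically, so that per generation a fraction µ of wild-type genomes becomes mutant and a fraction µ of mutant genomes reverts (f_t denotes the mutant frequency in generation t):

```latex
f_{t+1} = f_t + \mu\,(1 - f_t) - \mu\, f_t = f_t + \mu\,(1 - 2 f_t)
\quad\Longrightarrow\quad
f_{t+1} - \tfrac{1}{2} = (1 - 2\mu)\left(f_t - \tfrac{1}{2}\right),
\qquad
f_t - \tfrac{1}{2} = (1 - 2\mu)^{t}\left(f_0 - \tfrac{1}{2}\right).
```

Under this assumption the deviation from 50% shrinks by a factor of 1 - 2µ in each generation, so equilibrium is approached on a timescale of about 1/(2µ) generations, whatever the initial composition.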

If all three factors are at work and there are no external perturbations, the population will eventually reach a dynamic steady state in which mutation, on average, is in balance with selection and/or random drift. In the steady state, the statistical properties of the population no longer vary with time; i.e., even though the genetic composition may fluctuate strongly with time, all the mean values, standard deviations, etc., remain constant. The whole model with the three factors of evolution is illustrated in Fig. 1b.

Stochastic equation of evolution. Different meanings can be assigned to the word "evolution." For the task at hand, evolution of the population is characterized by the dependence of the frequency of cells infected with mutant virus on time. In deterministic dynamics, which applies only in very large populations of infected cells, if one knows the initial mutant frequency and has the appropriate equations, one can, in principle, predict the mutant frequency at later times with arbitrary precision. (In practice, the equations are never known exactly, since there are many different factors in play, but this is a separate issue [68].) By contrast, in the presence of random factors, the time dependence of the mutant fraction cannot be predicted even in principle. Even if one knows its precise initial value, the error with which one can predict its value later grows with time. If random factors are strong, the error in the mutant frequency eventually becomes comparable to its value. Evolution of the mutant frequency, in other words, is a random process.

Randomness of mutations does not mean, however, that the evolution of a population is totally arbitrary. On the contrary, useful predictions can be made about its statistical properties even if its specific state cannot be predicted. Instead of time dependence of the mutant frequency, one has to consider the time-dependent probability density [ρ(f)], defined as the chance that a given population has a mutant frequency near a particular value. The probability density, which can be introduced if both subpopulations (mutant and wild type) are large, is closely related to a histogram derived by plotting the number of times the mutant frequency of a population is observed to lie within a certain range of values. When both the number of similar experiments and the number of histogram bars are very large, the histogram becomes, in the limit, a smooth function, which is the probability density. (The histogram and the probability density differ by a constant factor: the total area under the probability-density curve [integral] is, by definition, the total probability of having any value of the mutant frequency and is, of course, equal to 1.) The density function contains information about the most relevant statistical parameters (average values and standard deviations) which can be compared with experiment (see "Experiments on evolution and observable parameters" below). In particular, the characteristic width of the probability density peak indicates the error within which the mutant frequency can be predicted.
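The relation between the histogram and the probability density can be made concrete with a small numerical sketch. It rests on illustrative assumptions: the replicate "experiments" are simulated with neutral drift only, and the helper names, population size, and bin count are hypothetical. The histogram counts are divided by the number of replicates times the bin width, so that the total area equals 1.

```python
import random


def final_frequency(f0, n_cells, generations, rng):
    """Mutant frequency after a fixed number of generations of neutral drift."""
    mutants = round(f0 * n_cells)
    for _ in range(generations):
        p = mutants / n_cells
        mutants = sum(1 for _ in range(n_cells) if rng.random() < p)
    return mutants / n_cells


def density_from_samples(samples, n_bins):
    """Histogram of frequencies on [0, 1], rescaled so that its total area is 1."""
    width = 1.0 / n_bins
    counts = [0] * n_bins
    for f in samples:
        # The value f = 1 is assigned to the last bin.
        counts[min(int(f / width), n_bins - 1)] += 1
    return [c / (len(samples) * width) for c in counts]


rng = random.Random(3)  # fixed seed so the run is reproducible
samples = [final_frequency(0.5, 200, 10, rng) for _ in range(1000)]
rho = density_from_samples(samples, n_bins=20)
area = sum(rho) * (1.0 / 20)
print(area)  # total probability; equal to 1 up to rounding error
```

Increasing both the number of replicates and the number of bins makes the rescaled histogram approach a smooth curve, as described above.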

The stochastic evolution equation (equations 1 and 2) (Fig. 2a) expresses the rate of change in the probability density with time in terms of its form at the present moment. Using such an equation and knowing the initial probability density, one can predict its form, in principle, at any time in the future, similarly to how one would predict the mutant frequency itself for a deterministic process. The difference between the two cases is that the time-dependent variable is now a function rather than a number. We derive the evolution equation directly for the population model introduced in the previous subsection, at the beginning of the mathematical part of our work (see "Mathematical results and derivations" below). The rest of the mathematical part is devoted to solving the equation for different important cases. Here we only show how the equation looks when the probability density is localized in a small region near some value of the mutant frequency and comment on its meaning from a more qualitative perspective.


FIG. 2.   Illustration of the stochastic evolution equation. (a) The equation shown is derived from equations 1 and 2 in the particular case where the probability density, ρ(f,t), is a narrow peak. Its right-hand side is a composite of three terms describing the effects of random drift, selection, and mutation on the change in the probability density with time. (b to d) The lower panels show the local changes in ρ(f,t), corresponding to each part of the right-hand side of the equation in panel a, and the upper panels show the resulting effects: spread (b) and shift (c and d) of the peak. Solid and dashed lines show the peak at two adjacent moments in time.

The right-hand side of the equation shown in Fig. 2a is a sum of three terms, which together describe how the shape of the probability density function, ρ, changes over a short time interval, dt. The first term describes random drift, the second describes selection, and the third describes mutation. To clarify the roles of the three terms in describing evolution, we consider each of them separately, by setting the other two terms equal to 0 (Fig. 2b to d). As a convenient example, we examine a probability density localized in a small region near some value of the mutant frequency (fmax). In this example, the second term, by itself, means that the probability density increases with time on the left side of the peak and decreases on the right side of the peak. As a result, the probability density peak, whose shape stays constant, shifts to lower mutant frequencies, as it should in the presence of selection (Fig. 2c). The third term in the equation, by itself, causes a shift of the peak as well, but the direction of the shift is toward 50% composition, which is the expected effect of mutation when the forward and reverse mutation rates are, as assumed, equal (Fig. 2d). The effect of the first term in the equation is of a different kind. Due to this term, the probability density decreases in the interval between the inflection points A and B (Fig. 2b) and increases everywhere outside of the interval. As a result, the probability density spreads outward. This is random drift: the error within which one can predict the value of mutant frequency increases with time. A more general form of the stochastic equation, valid when the probability density, ρ(f), is spread over a broad interval of f, is given in equations 1 and 2.
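For orientation, the three terms described above can be written out explicitly. The expression below is a sketch in the standard Wright-Fisher diffusion form for a haploid population of N infected cells, with time t measured in generations and the coefficients evaluated near the peak position; the prefactors are an assumption made for illustration, and the equation actually shown in Fig. 2a may be normalized differently.

```latex
\frac{\partial \rho}{\partial t}
= \underbrace{\frac{f(1-f)}{2N}\,\frac{\partial^{2}\rho}{\partial f^{2}}}_{\text{random drift}}
+ \underbrace{s\,f(1-f)\,\frac{\partial \rho}{\partial f}}_{\text{selection}}
- \underbrace{\mu\,(1-2f)\,\frac{\partial \rho}{\partial f}}_{\text{mutation}},
\qquad f \approx f_{\max}.
```

The signs agree with the qualitative description. The selection term is positive where ρ rises with f (the left side of the peak), which moves the peak to lower f. The mutation term has the opposite sign for f below 50% and the same sign for f above 50%, which moves the peak toward 50%. The drift term is negative wherever the second derivative of ρ is negative, i.e., between the inflection points, which broadens the peak.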

In the equation in Fig. 2a, a physicist will recognize a particular case of the Fokker-Planck equation and a mathematician will recognize a case of the forward Kolmogorov equation (41). It was introduced into the field of population genetics by Wright (81) and then intensively used to study evolution in the presence of different factors (31-33, 37). As it turns out, the equation is much more general than the virus model we used for its derivation in the mathematical section of this review. It describes a broad range of population models, from a bacterial culture to a randomly mating population without allelic dominance (35). Originally, the approach of the Fokker-Planck equation was introduced into population genetics from a phenomenological perspective, based on analogy to gas kinetics (18). Later, the validity of this approach was confirmed for different population models (52, 75). Examples of essential factors which are not included in the equation but which may or may not be important, depending on the experimental system, are epistasis (biological interaction) and linkage between multiple loci, time variation of the selection coefficient and the population size, and allelic dominance in a diploid population (33).

A formal analogy for the system described by the evolution equation is a gas consisting of particles mixed with air and confined between two parallel walls (Fig. 3a). A value of the mutant frequency is analogous to a location between the walls, and the probability density is now the local gas density. The first term (Fig. 2a) describes the diffusion of the gas particles in the air, and the second and third terms combined describe the effect of directed force (an electric field, for example) acting on the gas particles in the presence of friction of the gas against the air. Another useful analogy is gel electrophoresis. The electrical force acting on polymer molecules and the friction against the gel matrix together create directed motion, which segregates the molecules into bands. Molecular diffusion leads to increasing bandwidths. Although the physics of the gel or gas system has nothing to do with viruses or evolution, the formal mathematical analogy between the two systems, as we shall see below, turns out to be very useful.


FIG. 3.   Stochastic evolution equation (equations 1 and 2) and its boundary conditions viewed through the formal analogy between the probability density and the local gas density. The walls at f = 0,1 correspond to the two monomorphic states. (a) Gas particles subject to diffusion and a directed force when far from the walls. (b) Boundary conditions at large population sizes: gas particles bounce off the walls; the total flux at a wall is 0 (equation 2). (c) Small population sizes: gas particles can condense on or evaporate from a wall; the total flux at a wall does not need to be 0 (equations 5 and 6).

Boundary conditions: properties of almost monomorphic populations. In the real world, the mutant frequency cannot be less than 0 or greater than 1, yet the master equation has no such restriction. Thus, the stochastic equation in Fig. 2a (and equations 1 and 2) is incomplete without a description of what happens near the ends of the allowed interval for the mutant frequency, 0 and 1. The analysis shown in Fig. 2 is for the case where there is a large number of minority allele copies (that is, f is not near 0 or 1) and treats the mutant frequency (f) as a continuous variable. In many important cases, one also needs to describe the evolution of a population with only a few copies of the minority variant. The boundary conditions where f is near 0 and 1 have to be derived independently from the virus population model described in Subsection A. The derivation given in the mathematical section of this review shows that the conditions differ depending on the interval of population size, as follows.

The boundary conditions can be conveniently expressed in terms of the probability density flux (q), which is exactly analogous to the flux of gas particles through unit area per unit time (Fig. 3). In very large virus populations (Fig. 3b), the boundary conditions state that the flux must vanish at the "walls" corresponding to two monomorphic states, i.e., 100% mutant or 100% wild type (equation 3). In small populations (Fig. 3c), the flux is not zero (equations 5 and 6). This is because the probability of finding the virus population in a completely monomorphic state is finite and can increase or decrease in time. In the gas analogy, in the first case (Fig. 3b) gas molecules bounce off the hot walls and in the second case (Fig. 3c) the walls are cold and gas forms a condensate which can decrease or increase with time. Figuratively speaking, the probability density, just like the gas condensing in or evaporating from the liquid on a wall, can "condense" in or "evaporate" from a monomorphic state.

The real, biological interpretation of the different sets of boundary conditions is as follows. In very large virus populations (which, as we shall see, roughly correspond to almost deterministic evolution), a purely monomorphic state is unlikely: mutations destroy it very quickly. In a small population, mutations are rare and the monomorphic state can occur with a finite probability. This argument also shows that mutations affect virus evolution in a different way depending on the number of infected cells. In a large population, mutations may be important even in a very polymorphic state (e.g., if selection is small). In small populations, the role of mutations is to create a copy of the new allele in an otherwise monomorphic population; once a copy is created, mutations can be neglected until the population becomes monomorphic again. Typically, as we discuss below in the section on steady state, a new allele is lost due to random drift and repeated introduction of mutations will be needed to restore diversity.

Experiments on Evolution and Observable Parameters

In this section, we describe a few gedanken experiments on genetic evolution important for virological applications and introduce quantitative parameters suitable for experimental comparison.

To make use of the evolution equation with boundary conditions (see "Description of the model and the evolution equation" above), one needs to know the state of the system or its statistics at the initial moment of time. The initial condition depends on a particular experimental or natural setup. Virological experiments, relevant for both in vivo and in vitro situations, are as follows.

(i) Accumulation of deleterious mutants (initial condition: a pure wild-type population, i.e., f = 0).

(ii) Reversion of a deleterious mutation (initial condition: a pure mutant population, i.e., f = 1).

(iii) Growth competition (initial composition: a 50%-50% population [f = 0.5] or any other strongly polymorphic mixture).

(iv) Gene fixation. This experiment, which has received a lot of attention in population biology (19, 24, 34, 38, 80) and which is very useful for understanding other stochastic experiments, is defined only in small populations in which the total mutation rate per population, µN, is much less than 1. Suppose that a single advantageous allele is introduced into an otherwise monomorphic population (f = 1/N). The allele will have one of two fates: either it will be lost due to random drift (Fig. 1a) or it will spread to the entire population, i.e., become "fixed." The questions are: what is the fixation probability, and, if the allele is fixed and does not become extinct, how much time will it take, counting from the moment it appeared? One can also ask a more general question: what is the probability that a new allele grows into a subpopulation of a given size before it becomes extinct?

(v) Steady state. Whatever the initial condition, after a sufficient time, the system passes to the stochastic steady state, in which the probability density no longer depends on time; we consider this relatively simple case separately.

(vi) Genetic divergence. One splits a steady-state population into two isolated parts. Initially, both populations have a random but identical genetic composition, from which they independently diverge. As time goes on, their respective random compositions correlate less and less. The question is, what is the characteristic time at which the loss of correlation occurs?

(vii) Genetic turnover. This experiment studies the average timescale associated with random fluctuations of the mutant frequency in the steady state.

The probability density (ρ) of the mutant frequency predicted by the stochastic equation is the main observable parameter. Unfortunately, to measure it directly, one would have to generate a histogram of mutant frequencies for a very large ensemble of populations. More amenable for experimental testing are the average (expectation) values (equation 36) and the standard deviations or variances (equation 37) of different stochastic parameters, which require a smaller number of populations to measure. Below we introduce some useful parameters whose statistics can be measured in the different experiments we outlined above. At the same time, their predicted statistics can be expressed via the probability density, as shown in the mathematical section of this review. In what follows, we assume that each parameter, for each given population, is measured with a high precision from a sufficiently large sample of sequences. The sampling effects will be discussed separately below.

The first parameter is the mutant frequency itself (f), which is self-explanatory. Its value can be compared directly with the experimental value, provided that the wild-type (best-fit) nucleotide is known.

The second is the intrapopulation genetic distance (T), defined as the proportion of sequence pairs (randomly sampled from the virus population) which differ at the base of interest. Although there are other ways to measure intrapopulation variability, we will use this definition, known in population biology as Nei's nucleotide diversity. It is equivalent to the standard definition of the genetic distance in virology as the average number of pairwise differences among randomly selected genomes, except that it applies to a single base rather than to a long genomic segment. By definition, T is calculated as 2f(1 - f), and varies between 0 (at f = 0 or 1) and 0.5 (at f = 0.5). The genetic distance is usually a more convenient measure of population diversity than the mutant frequency itself since it does not require knowledge of the wild type sequence.

The third is the interpopulation genetic distance (T12), which is defined in the same way as the intrapopulation genetic distance, except that the two sequences of each pair are sampled from two different populations (equation 40). The interpopulation distance is 0 when the two virus populations consist uniformly of the same genetic variant and 1 (100%) when the two virus populations are composed entirely of opposite genetic variants. The interpopulation distance, as one can show, cannot be smaller than the average of the two intrapopulation distances. Therefore, it is sometimes more convenient to consider instead the relative genetic distance between two populations (D), defined as the difference between the interpopulation distance and the average of the two intrapopulation distances [T12 - (T1 + T2)/2]. This parameter (equation 41) varies between 0 (two populations have an identical genetic composition) and 1 (one population is pure mutant, the other is pure wild type). There are alternative definitions of the relative distance (54). We find this definition intuitively clearer; also, its statistical moments (average, variance) are relatively easy to calculate.
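The three distance measures defined above can be written down for a single site in a few lines. The sketch below assumes that each population is fully described by its mutant frequency and that sampling error is negligible (the large-sample assumption made in the text). The closed form used for T12 follows from the verbal definition and is not copied from equation 40; the function names are illustrative only.

```python
def intrapopulation_distance(f):
    """T = 2 f (1 - f): proportion of random sequence pairs that differ at the site."""
    return 2.0 * f * (1.0 - f)


def interpopulation_distance(f1, f2):
    """T12: chance that one genome from each population differs at the site.

    Either the first genome is mutant and the second wild type, or vice versa.
    """
    return f1 * (1.0 - f2) + f2 * (1.0 - f1)


def relative_distance(f1, f2):
    """D = T12 - (T1 + T2) / 2, as defined in the text."""
    t1 = intrapopulation_distance(f1)
    t2 = intrapopulation_distance(f2)
    return interpopulation_distance(f1, f2) - 0.5 * (t1 + t2)


if __name__ == "__main__":
    for f1, f2 in [(0.0, 1.0), (0.3, 0.3), (0.2, 0.7)]:
        print(f1, f2, interpopulation_distance(f1, f2), relative_distance(f1, f2))
```

Under these assumptions, D reduces algebraically to (f1 - f2)^2, which makes the stated range from 0 (identical composition) to 1 (pure mutant versus pure wild type) explicit.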

All the previous parameters can be measured at one time point, both for dynamic experiments (the first three experiments listed above) and in the steady state. Since all of them are, in general, stochastic, an average and a standard deviation have to be calculated for each. The next parameter is more complex: it requires measurement at two different times. We define it on average and for a steady-state population only.

The fourth parameter, the time correlation function of mutant frequency [K(t)], describes how quickly the system "forgets" the preceding random fluctuation of the mutant frequency (equation 45). The time correlation function usually has a maximum when the time difference is 0 and vanishes at large time differences. The characteristic time at which it decays by 50% (or, say, by a factor of e = 2.718...) from its maximum gives the timescale of random fluctuations. The form of this decay (e.g., exponential or negative power) may be a good fingerprint of a virus population model or, within a given model, of a particular population size.
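As an illustration of how such a function might be estimated in practice, the sketch below computes an empirical time correlation from one long steady-state record of the mutant frequency. It assumes that averaging over time along a single trajectory can stand in for the ensemble average used in the text, and it uses the covariance of deviations from the mean as the correlation measure; equation 45 itself is not reproduced here. The function names are hypothetical.

```python
def time_correlation(series, max_lag):
    """Empirical K(lag): mean product of deviations from the mean, lag steps apart."""
    n = len(series)
    mean = sum(series) / n
    k = []
    for lag in range(max_lag + 1):
        pairs = n - lag  # number of (t, t + lag) pairs available in the record
        total = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(pairs))
        k.append(total / pairs)
    return k


def half_decay_lag(k):
    """First lag at which K has fallen to half of K(0) or below, else None."""
    if k[0] <= 0.0:
        return None  # a constant record has no fluctuations to time
    for lag, value in enumerate(k):
        if value <= 0.5 * k[0]:
            return lag
    return None


if __name__ == "__main__":
    record = [0.0, 1.0] * 50  # toy record that flips every time step
    k = time_correlation(record, 3)
    print(k, half_decay_lag(k))
```

The 50% criterion is used here because it is the one named in the text; a decay by a factor of e could be substituted by changing the threshold.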

In the mathematical section of this review, we calculate these parameters for different gedanken experiments and different intervals of population size. In this section of the review, we discuss these results qualitatively and illustrate them, when possible, with Monte Carlo simulations.

Steady State

In this section, we discuss properties of the steady-state, stochastic population in different intervals of the population size.

Neutral case: s ≪ µ. Selection is of little significance when the selection coefficient is much less than the mutation rate. This case is probably of little practical significance for RNA viruses, with their tightly organized genomes. However, the transition between stochastic and deterministic behavior is easier to analyze when the selection factor can be neglected. Hence we start our discussion here.

The main fact of stochastic theory is that fluctuations of mutant frequency between statistically identical populations are large if populations are small (stochastic behavior) and small if populations are large (nearly deterministic behavior). In the language of the probability density (equation 52), the density is spread over a broad interval of f in small populations and is a narrow peak at very large population sizes. Transition between the two limits is controlled mostly by a single parameter µN, the product of the population size and the mutation rate. The composite parameter µN, which features extensively in population genetics (usually as θ = 2Nµ), gives the total mutation rate for the entire population. For most RNA viruses, µN equals 1 when the number of infected cells is on the order of 10^5 (i.e., less than the number in a small culture dish).

As the mutation rate per population increases, the probability density gradually changes its shape, as illustrated in Fig. 4 (80). This results from competition between random drift, which drives the system to one of the uniform states, and mutations, which diversify the system. At values of µN much smaller than 1 (an interval we accordingly call the drift regime in Table 1), random drift wins and the usual population is only weakly polymorphic. The probability density is, accordingly, U shaped, with a minimum at 50% composition. At the smallest values of µN (the condition is given in equation 5), the system is most likely to be in either of the purely monomorphic states, without a single opposite allele present (see "Description of the model and the evolution equation" above, where the boundary conditions are described). The total probability of any polymorphic state will be much less than 1 and on the order of µN. This estimate gives the frequency of segregating sites in a genome segment.


FIG. 4.   The steady-state probability density in the neutral case. The curves show ρ_ss(f) when s ≪ µ at different population numbers, N. Numbers on the curves show the corresponding values of µN.

                              
TABLE 1.   Classification of regimes of genetic evolution

Let us move toward larger populations. As we increase the parameter µN, the U shape of the probability density flattens out (Fig. 4). The minimum at 50% composition becomes a maximum when µN is equal to 1/2. The probability density shrinks and becomes narrow as the population increases and µN becomes much larger than 1. This means that the mutant frequency is very close to the deterministic value of 1/2, owing to the balance between forward and reverse mutations. In Table 1, this limit of population sizes is denoted the mutation regime.
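The change of shape described above can be checked numerically with a simple closed form. The sketch below assumes a symmetric Wright-type density proportional to [f(1 - f)]^(2µN - 1), chosen because it reproduces the switch from a minimum to a maximum at µN = 1/2 stated in the text. It is an assumed stand-in and is not copied from equation 52. It also ignores the special boundary behavior at very small µN, so it should be read only for frequencies away from 0 and 1.

```python
import math


def neutral_steady_state_density(f, mu_n):
    """Assumed neutral density: a Beta(a, a) distribution with a = 2 * mu * N."""
    a = 2.0 * mu_n
    # log of the normalization constant 1 / B(a, a) = Gamma(2a) / Gamma(a)^2
    log_norm = math.lgamma(2.0 * a) - 2.0 * math.lgamma(a)
    return math.exp(log_norm + (a - 1.0) * math.log(f * (1.0 - f)))


def density_shape(mu_n, f_edge=0.05):
    """Compare the density at 50% composition with the density near an edge."""
    middle = neutral_steady_state_density(0.5, mu_n)
    edge = neutral_steady_state_density(f_edge, mu_n)
    if math.isclose(middle, edge, rel_tol=1e-9):
        return "flat"
    return "U-shaped" if edge > middle else "peaked"


if __name__ == "__main__":
    for mu_n in (0.1, 0.5, 5.0):
        print(mu_n, density_shape(mu_n))
```

In this assumed form, the peak at large µN is centered on f = 1/2, the balance point between forward and reverse mutations.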

Case with selection: µ ≪ s ≪ 1. The situation when the selection coefficient is less than 1 but still much larger than the mutation rate is more relevant for RNA viruses and more interesting theoretically. As in the neutral limit, the larger the population size the smaller the fluctuations.

The selection factor can be neglected only if a population is very small, much smaller than the inverse selection coefficient (Ns ≪ 1), a case that has the same properties as the above-described drift regime. At larger population sizes, selection is crucial and causes the probability density (equations 48 or 49 to 51) to be asymmetric in favor of a predominantly wild-type population.

In the limit of very large populations, when µN is much larger than 1 (termed the selection regime in Table 1), the probability density is narrow and localized near its deterministic value (equation 57). This value is given by the ratio of the mutation to the selection rate (µ/s), which we assumed to be small. At this value, mutations and selection against emerging mutants reach balance.

A result not sufficiently emphasized in the population biology literature is the existence of a wide interval in population size between the inverse selection coefficient and the inverse mutation rate (1/s ≪ N ≪ 1/µ), which we term the selection-drift regime, in which all three factors of evolution are critical. Specifically, mutations produce diversity, selection restricts mutants to a low level, and random drift causes strong fluctuations between populations. The structure of the probability density in this regime is shown schematically in Fig. 5. It consists of three components. The large peak (delta function) situated at exactly zero mutant frequency means that a population is, most probably, purely wild type. The weak continuous exponential tail which decays at mutant frequencies on the order of 1/Ns ≪ 1 (80) means that the chance of a population being polymorphic is low and that if a population happens to be polymorphic, the proportion of mutants is small and quite random. A small peak at f = 1 becomes important only close to the lower border of the interval, when N is on the order of 1/s. The probability of finding any mutants (which is given by the total area under this curve) is low and proportional to µN (equations 49 to 51).
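The three intervals of population size for the case µ ≪ s ≪ 1 can be summarized as a small classifier. This is only a sketch: the text defines the regimes by "much less than" and "much greater than" conditions, so the factor-of-10 margin used below, the "crossover" label, and the function name are illustrative assumptions and not part of the theory.

```python
def classify_regime(n, s, mu, margin=10.0):
    """Rough regime label for population size n, assuming mu << s << 1.

    margin is a hypothetical factor standing in for "much less/greater than".
    """
    ns = n * s    # population size relative to the inverse selection coefficient
    nmu = n * mu  # total mutation rate per population
    if ns <= 1.0 / margin:
        return "drift"            # N << 1/s: selection negligible
    if ns >= margin and nmu <= 1.0 / margin:
        return "selection-drift"  # 1/s << N << 1/mu: all three factors matter
    if nmu >= margin:
        return "selection"        # N >> 1/mu: nearly deterministic
    return "crossover"            # within the margin of a regime boundary


if __name__ == "__main__":
    for n in (5, 100, 3000, 10**7):
        print(n, classify_regime(n, s=0.01, mu=1e-5))
```

With s = 0.01 and µ = 10^-5, the selection-drift interval in this sketch spans roughly three orders of magnitude in N, which illustrates why the interval is described as wide.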


FIG. 5.   Schematic plot of the steady-state probability density in the selection-drift regime. The curve shows ρ_ss(f) for the case when 1/s ≪ N ≪ 1/µ. Note the very narrow peaks at f = 0 and 1, together with the tail extending from f = 0.

The selection-drift regime has rather interesting, seemingly contradictory properties. On the one hand, the shape of the probability density suggests a very stochastic behavior. On the other hand, the average mutant frequency and the average genetic distance happen to coincide, over most of the regime, with their deterministic values, as if the population were much larger. Figure 6 shows the average values and the relative standard deviations for both parameters at all the population sizes. As expected, in the selection-drift regime the relative standard deviations for both the mutant frequency and the genetic distance are much larger than unity (Fig. 6b). At the same time, the average values (in equation 59) are the same as in the selection regime (Fig. 6a). Notably, the fluctuations of the parameters are much stronger than could be expected from the Poisson statistics. This is a result of clonal amplification: if a single mutant appears in an otherwise wild-type population, it grows into a clone. In the sections on stochastic dynamics (see below), we will further clarify the structure of the steady state by presenting a Monte Carlo simulation of a stochastic dynamic evolution in a single population. Examples of the results of such simulations for each regime are shown in Fig. 6c.


FIG. 6.   Dependence of the observable parameters at steady state on the population number. N varies over the three main intervals. (a) Average mutant frequency, f̄, and genetic distance, T̄. (b) Relative standard deviations of the same two parameters. (c) Fragments of representative Monte Carlo simulations in the respective intervals of N (see Fig. 10 to 12 for details).

Deterministic Dynamics and Its Boundaries

As we have shown above (see "Steady state"), the steady-state mutant frequency approaches its deterministic value when µN is much larger than 1. The purpose of this section, small but with a large mathematical counterpart, is to gain insight into the transition between stochasticity and determinism in the more complex case, in which parameters of the system depend on time.

Deterministic dynamics. Deterministic and stochastic theories operate with different dynamic variables. The former considers the time dependence of the frequency of mutants, and the latter uses a more complex object, the time-dependent probability density of the mutant frequency. It is important to ensure that the two approaches converge to the same result in the limit of infinite population, when they are expected to describe deterministic evolution, albeit in a different way. For this purpose, in the mathematical section of this review we solve the dynamic stochastic equation (equation 1) for the case of large populations. The resulting probability density, as expected, is a very narrow peak located at the time-dependent mutant frequency (Fig. 7b), which satisfies the deterministic equation of evolution (equations 60 and 61).


FIG. 7.   Probability density of the mutant frequency in the deterministic limit. ρ(f) is represented by the mathematical expression for µN ≫ 1 (a) and a schematic plot (b).

The first term on the right-hand side of the deterministic equation (Fig. 7a) (equation 61) describes selection for the wild type, causing depletion of mutants. When one of the two subpopulations (f or 1 - f) is very small, the first term becomes small, since if there is no diversity, there is no selection. The second term, describing mutations, does not vanish in a uniform population. Instead, the term vanishes at 50% composition when the effects of forward and reverse mutations cancel each other. Mutations drive the system toward 50% composition. The same evolution equation can be obtained directly from deterministic first principles (equations 63 and 64).

The deterministic equation in Fig. 7a allows one to predict the genetic composition as a function of time for any initial condition set in an experiment (equation 62). Corresponding plots for the three cases matching the conditions of the accumulation, growth competition, and reversion experiments described above (see "Experiments on evolution and observable parameters") are shown in Fig. 8. In all cases, after a characteristic time proportional to the inverse selection coefficient (1/s), the population approaches a steady state in which the mutant frequency saturates at a small value, the mutation rate over the selection coefficient (µ/s) (see "Steady state" above). Reversion is somewhat delayed compared to that in the two other experiments since the system first has to diversify slowly due to mutations and then still has to cross the entire interval of the mutant frequencies. Note that in both the accumulation and reversion experiments, the initial slope of the time dependence of the mutant frequency is shallow and is determined by the mutation rate (Fig. 8). Selection becomes important and causes the plots to curve after a growing subpopulation becomes sufficiently large.
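The behavior of the three experiments can be reproduced with a short numerical integration. The sketch below assumes a continuous-time form, df/dt = -s f(1 - f) + µ(1 - 2f), reconstructed from the verbal description of the two terms (selection vanishing in a uniform population, mutation vanishing at 50% composition). The exact form of equation 61 is not shown in this section and may differ in detail. The Euler step size and the function name are illustrative choices.

```python
def deterministic_mutant_frequency(f0, s, mu, generations, dt=0.01):
    """Integrate the assumed equation df/dt = -s f (1 - f) + mu (1 - 2 f) by Euler steps."""
    f = f0
    steps = int(round(generations / dt))
    for _ in range(steps):
        selection = -s * f * (1.0 - f)  # depletes mutants; zero at f = 0 and f = 1
        mutation = mu * (1.0 - 2.0 * f)  # pushes toward 50%; zero at f = 0.5
        f += dt * (selection + mutation)
    return f


if __name__ == "__main__":
    s, mu = 0.1, 0.001  # mu/s is unrealistically high, as in Fig. 8, for clarity only
    for label, f0 in (("accumulation", 0.0), ("competition", 0.5), ("reversion", 1.0)):
        print(label, deterministic_mutant_frequency(f0, s, mu, generations=300))
```

All three runs settle near µ/s after a time that is several multiples of 1/s, with the reversion run arriving last, in line with the qualitative description above.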


FIG. 8.   Schematic dependence of the mutant frequency, f, on time in the deterministic limit. The three curves correspond to three different initial values of f(0): accumulation of mutations [f(0) = 0], growth competition [f(0) = 1/2], and reversion of a mutation [f(0) = 1]. The value of the ratio µ/s used in the figure is unrealistically high for viruses and is used for clarity of plot only. Dashed lines show initial slopes.

Boundaries of deterministic approximation. Random drift, always present even in very large populations, causes the frequency of mutants to fluctuate around its deterministic value. As the population size decreases, the magnitude of fluctuations becomes comparable to the average frequency of the minority allele (either mutant or wild type), and the deterministic description breaks down. The corresponding condition on the population size varies significantly depending on the initial conditions of the experiment (equation 65). When the population starts from a monomorphic state (reversion or accumulation), the deterministic criterion is met when µN is much larger than unity. A population that is strongly diverse to start with, as in the growth competition experiment, is already deterministic at a much smaller population size in the selection-drift regime. (The criterion for diversity is that the mutant frequency must be higher than its characteristic "tail" at steady state [Fig. 5].) The reason for this difference is that a small polymorphism is influenced by rare and random mutation events while a strongly polymorphic population is controlled by selection alone.

Stochastic Dynamics: the Drift Regime

At the smallest population sizes, smaller than the inverse selection coefficient, as we found out when considering the steady state, selection can be neglected altogether. In this section, we consider the nonequilibrium dynamics in this regime. The problems of interest are those listed above (see "Experiments on evolution and observable parameters"): the decay of a strongly polymorphic state, gene fixation, transition from a monomorphic to the steady state, divergence of populations which have been separated, and the rate of genetic turnover in the steady state.

Decay of the polymorphic state and gene fixation. We start our discussion from the population that is initially polymorphic, somewhere in the middle between 0 and 100%. As already discussed (see "Description of the model and the evolution equation"), mutations are not important in a polymorphic population, since they occur in the population with a frequency, µN, much less than 1 per generation. Therefore, random drift remains the only factor causing variation of the mutant frequency in time. As time passes, the mutant frequency drifts until the population accidentally ends up in either monomorphic state (cf. Fig. 1a). A representative random process is illustrated by computer simulation in Fig. 9b. The average time (the number of generations) it takes for a population to become monomorphic (i.e., for either variant to be fixed) is on the order of the population size (equations 81 and 82) (32, 80). The fixation time is quite random: its representative fluctuations are on the order of its average value. The same process can be understood in another way, from the time evolution of probability density. Figure 9a shows how the probability density, initially a narrow peak located, e.g., at 50% composition, gradually spreads out to the entire interval and then decays.
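The decay of polymorphism under pure drift is easy to reproduce in a toy simulation. The sketch below uses Wright-Fisher binomial resampling of a haploid population with no mutation and no selection, which is a textbook stand-in and not the virus population model used for Fig. 9b. The function names and parameter values are illustrative assumptions.

```python
import random


def drift_until_monomorphic(n, f0, rng):
    """Resample n genomes per generation until the mutant is lost or fixed.

    Returns the final mutant frequency (0.0 or 1.0) and the number of generations.
    """
    copies = round(n * f0)
    generations = 0
    while 0 < copies < n:
        f = copies / n
        # Each genome of the next generation picks a random parent.
        copies = sum(1 for _ in range(n) if rng.random() < f)
        generations += 1
    return copies / n, generations


def mean_decay_time(n, f0, runs, seed=0):
    """Average number of generations to reach a monomorphic state."""
    rng = random.Random(seed)
    times = [drift_until_monomorphic(n, f0, rng)[1] for _ in range(runs)]
    return sum(times) / runs


if __name__ == "__main__":
    for n in (25, 50, 100):
        print(n, mean_decay_time(n, f0=0.5, runs=200))
```

In this toy model, the mean decay time grows roughly in proportion to N, and individual runs scatter widely around the mean, consistent with the statement that the fluctuations of the fixation time are on the order of its average.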


FIG. 9.   Decay of polymorphism in the drift regime. A growth competition experiment for the initial condition f0 = 0.5 and Ns ≪ 1 is shown. (a) Change in the probability density in time (equations 81 and 82). (b) The two stochastic dependences f(t) were obtained by random runs of a Monte Carlo simulation program written for the virus population model described in the text. Parameters are shown in the figure.

The fact that, in a time not exceeding a few multiples of the population size, the population becomes uniform has general phylogenetic consequences. Let us divide arbitrarily a population into two groups of equal size and mark each group, say, by a different color. Then we divide each group (color) into two subgroups and mark them by two different shades. Then we divide each shade into two hues, and so on. If we continue the process of subdivision long enough, all individuals in the population will eventually have different tags. Consider now a group consisting of two subgroups. According to the above result, in a time not exceeding a few multiples of the group size, one of the two subgroups vanishes. Likewise, the surviving subgroup contains two smaller subgroups, one of which also becomes extinct in a time not exceeding a few multiples of the subgroup size, and so on. Therefore, in a time on the order of the total population size, the entire population will have the same tag, i.e., will comprise descendants from a single virus or organism. In other words, any two organisms in a population in the drift regime have a common ancestor at a past number of generations on the order of the population size. Phylogenetic methods of analyzing branching processes confirm this result, which is the basis of the coalescent method of estimating population size (39, 40, 65).

Related to the decay of polymorphism described above is gene fixation. Suppose that a single new allele is introduced into a monomorphic population at an initial moment. Eventually, after a number of replication steps, the allele will either disappear due to random drift (which is the most likely outcome) or spread to the entire population, i.e., become fixed. The questions are as follows. (i) What is the probability that the allele will get fixed? (ii) Given that the allele is lucky enough to become fixed, what is the average fixation time? As we show in the mathematical section of this review (equation 84 with f = 1), the fixation probability is the inverse of the population size (1/N) (34) and the fixation time is on the same order as the polymorphism decay time, i.e., on the order of the population size.
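The 1/N fixation probability can be checked by brute force. The sketch below again assumes neutral Wright-Fisher resampling of a haploid population (an illustrative stand-in for the virus population model), starts each trial from a single copy of the new allele, and counts the fraction of trials that end in fixation. Function names and parameter values are hypothetical.

```python
import random


def single_copy_fixes(n, rng):
    """Follow one new neutral allele until it is lost (False) or fixed (True)."""
    copies = 1
    while 0 < copies < n:
        f = copies / n
        copies = sum(1 for _ in range(n) if rng.random() < f)
    return copies == n


def fixation_probability(n, trials, seed=0):
    """Monte Carlo estimate of the fixation probability of a single neutral copy."""
    rng = random.Random(seed)
    fixed = sum(single_copy_fixes(n, rng) for _ in range(trials))
    return fixed / trials


if __name__ == "__main__":
    n = 20
    print("estimate:", fixation_probability(n, trials=4000), "1/N:", 1.0 / n)
```

Most trials end quickly with the loss of the allele, which is why the estimate needs several thousand trials even for a small population.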

One can also ask more general questions. What is the probability that a single mutant genome will ever grow into a subpopulation with a given size? What is the average time spent on this growth? The results are analogous to that for full fixation, except that the subpopulation size substitutes for the total population size (equations 84). As we show in the beginning of the sections on stochastic dynamics in the mathematical section of this review, this result allows us to interpret, at a semiquantitative level, all the important results on stochastic dynamics.

Transition from a monomorphic to a steady state. We also consider here the accumulation of mutations starting from a purely monomorphic state, e.g., wild type (which one of the two does not matter, since selection is negligible). Eventually, mutants will be generated, one of them will become fixed (as described), and the system will switch to pure mutant. Then wild-type alleles will be generated, etc., and, in the long run, the population will be, statistically speaking, in dynamic steady state in which it switches back and forth between two monomorphic states. The system will gradually "forget" its initial state, so that the probabilities of the two monomorphic states will be equal and will be close to 1/2.

In the probability density language, this process can be described as shown in Fig. 10a. The initial peak of the probability density is very narrow and is localized at the zero mutant frequency. As time goes on, a tail of the probability density spreads into the interval between 0 and 100% mutants (equations 85 and 86) and a new peak at 100% mutants appears, reflecting a chance of early fixation of a mutant genome. The first peak decays and the second peak grows, until they become equal in the steady state (Fig. 4) (equation 87). In the gas system analogy (see "Description of the model and the evolution equation" above), all water is initially condensed on the left wall and then evaporates. The vapors diffuse into the container and condense again on the right wall (analogous to what happens in a freezer over time). The system reaches equilibrium when the amount of condensate on both walls is the same and there remains some gas in between.


FIG. 10.   Time dependence of the mutant frequency in the drift regime on a long timescale. (a) Change in the probability density in time (equations 85 to 87). Sharp peaks at f = 0 and 1 correspond to the monomorphic states; their probabilities are shown by the relative peak heights (arbitrary units). (b) One Monte Carlo run is shown for Ns ≪ 1 and the initial condition f0 = 0. Parameters are shown in the figure.

In addition to the language of probability density, it is useful to visualize transition to the steady state directly, as a typical random process. If the probability density is analogous to the density of gas, the random dependence of the mutant frequency on time corresponds to the random trajectory of a separate gas particle. A representative Monte Carlo simulation of the equilibration process, together with the relevant timescales, is shown in Fig. 10b. The steady-state process looks like a telegraph signal between the two uniform states. The peaks in the mutant and wild-type frequencies correspond to alleles which were generated by mutations and started new subcolonies but failed to become fixed.

Two widely different timescales appear in both the representative random process and the evolution of the probability density. The typical waiting time for a switch from pure wild type to pure mutant or back is within an order of magnitude of the inverse mutation rate 1/µ. This corresponds to the time in which the probability density becomes symmetric between the wild type and mutant (Fig. 4) (equations 86 and 87). The actual time spent on a successful switch is much shorter, within an order of magnitude of the population size N. This corresponds to the time in which the tail of probability density is formed between 0 and 100% (equation 85). The two timescales can be derived either rigorously, from the evolution equation (equations 4 to 6), or approximately, from the gene fixation problem (equation 84). Both approaches are used in the mathematical section of this review. They agree with each other and with the simulation in Fig. 10b.
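The approximate route through the gene fixation problem can be sketched in two lines, using only results quoted in this section: about µN new mutant copies arise per generation in a monomorphic population, each is fixed with probability 1/N, and a successful fixation takes on the order of N generations. The labels t_wait and t_switch are introduced here for illustration and do not appear in the original equations; numerical factors of order unity are ignored.

```latex
\begin{align}
\text{rate of successful switches}
  &\sim \underbrace{\mu N}_{\text{new mutants per generation}}
   \times \underbrace{\frac{1}{N}}_{\text{fixation probability}} = \mu, \\
t_{\text{wait}} \sim \frac{1}{\mu}, \qquad
t_{\text{switch}} &\sim N, \qquad
\frac{t_{\text{switch}}}{t_{\text{wait}}} \sim \mu N \ll 1 .
\end{align}
```

The ratio of the two timescales is the same small parameter µN that defines the drift regime, which is why the representative trajectory looks like long quiet intervals separated by brief switches.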

The total probability of a polymorphic state (the frequency of segregating sites in the genome) is, at any time, much less than 1 and roughly on the order of µN. This agrees with the result we obtained directly for the steady state (see above). Interestingly, this value is reached on a timescale of approximately N generations, i.e., much sooner than the two probabilities of monomorphic states equilibrate.
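The drift-regime behavior described above can be reproduced qualitatively with a small Monte Carlo sketch. The code below is not the simulation used for Fig. 10b. It assumes a one-locus, two-allele Wright-Fisher population of fixed size N, symmetric mutation at rate µ per generation, and a mutant fitness of 1 - s relative to the wild type; the function name and the parameter values are illustrative only.

```python
import numpy as np


def simulate_mutant_frequency(N, mu, s, f0, generations, seed=0):
    """Return the mutant frequency f(t) for t = 0..generations.

    Assumed model (not taken from the article): Wright-Fisher resampling
    of N genomes, mutant fitness 1 - s, symmetric mutation rate mu.
    """
    rng = np.random.default_rng(seed)
    k = int(round(f0 * N))            # current number of mutant genomes
    f = np.empty(generations + 1)
    f[0] = k / N
    for t in range(1, generations + 1):
        x = k / N
        # Selection: mutants are weighted by their relative fitness 1 - s.
        x_sel = x * (1.0 - s) / (1.0 - s * x)
        # Symmetric mutation between wild type and mutant.
        x_mut = x_sel * (1.0 - mu) + (1.0 - x_sel) * mu
        # Random drift: binomial sampling of the next generation.
        k = rng.binomial(N, x_mut)
        f[t] = k / N
    return f


if __name__ == "__main__":
    # Hypothetical drift-regime parameters: mu * N << 1 and N * s << 1.
    N, mu = 50, 1e-4
    f = simulate_mutant_frequency(N, mu, s=0.0, f0=0.0, generations=100_000)
    polymorphic = np.mean((f > 0) & (f < 1))
    print("fraction of generations in a polymorphic state:", polymorphic)
    print("mu * N =", mu * N)
```

With µN much less than 1, one would expect most generations of such a run to be monomorphic, with rare switches between f = 0 and f = 1 separated by waiting times on the order of 1/µ.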

Divergence of populations which have been separated and the time correlation function. The longer timescale, 1/µ, also appears in the time correlation function of mutant frequency, which characterizes the timescale of random fluctuation in the steady state and the divergence of populations which have been separated (see "Experiments on evolution and observable parameters" above). The value of the relative genetic distance, D, gradually changes from 0 to a constant value corresponding to statistically independent populations (equation 90). (Note that some other measures of interpopulation genetic distance used in population biology do not have an upper limit [54].) As it turns out, the time of this transition, the half time of the correlation function decay (equation 91), and the time in which the probability density becomes symmetric (above) are on the same order, the inverse mutation rate. Indeed, all three times are determined by the waiting time for a successful gene fixation.

Stochastic Dynamics: the Selection-Drift Regime

Here we consider nonequilibrium experiments in the most interesting interval of population sizes (Table 1). The relative role of selection and stochasticity in population dynamics, as derived from the evolution equation in the mathematical section of this review, depends on the initial genetic composition. The dynamics of growth competition is almost deterministic (see "Deterministic dynamics and its boundaries" above), so that this experiment need not be discussed again. In the accumulation experiment, the overall dynamics is stochastic, except for the average values of the mutant frequency and the intrapopulation distance, which are, remarkably, the same as in the corresponding deterministic conditions.

Accumulation. As in the drift regime (see above), accumulation can be described as a spread of the peak of the probability density initially located at 0 (uniform wild type) into the interval between 0 and 1. However, unlike in the drift regime, the resulting steady state is not symmetric: it consists of a large peak at 0 and a small tail (Fig. 5) (equation 48 or 49 to 51). The process of accumulation reduces to the generation of this small tail, which describes rarely occurring, weakly polymorphic states (Fig. 5). As a result, the initial peak at 0 does not decay greatly, and the steady state is reached in the same time as in deterministic selection (see "Deterministic dynamics and its boundaries" above), given by the inverse selection coefficient (1/s), i.e., faster than all timescales in the drift regime (equations 103 and 104).

The simulated stochastic dependence for this experiment is shown in Fig. 11. The process starts from the generation of a single allele, which tries to grow into a clone. The growth initially occurs under the condition that random drift is more important than selection. The maximum frequency that this clone can reach is determined by the characteristic mutant frequency at equilibrium, ~1/(Ns), which corresponds to a clone size of 1/s copies (Fig. 5). Above this value, selection becomes the leading force and drift becomes a correction. Further growth of the deleterious clone cannot occur, and it soon becomes extinct. This appears as sparse peaks, the highest of which reach the length of the "tail" of the probability density, 1/(Ns) (Fig. 5) (equation 48 or 50). The half-life of a mutant clone (width of a large peak) is the inverse selection coefficient. Note that the typical time interval between peaks, 1/(µNs), is longer than 1/s. The former time is the waiting time for a new allele that will be lucky enough to reach the size 1/s. The latter time is the time that the lucky clone actually spends growing and contracting before it becomes extinct again. The ratio of the two times, µN, gives the probability of finding the population in a polymorphic state (the area under the tail in Fig. 5). As in the drift regime, all these estimates can be obtained from both the evolution equation (equation 101) and the more intuitive gene fixation approach (equation 84). For comparison, simulation of an accumulation experiment in the "selection" regime (µN = 20) is shown in Fig. 12.
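The scales quoted in this paragraph can be collected in a few lines of arithmetic. The sketch below simply evaluates the order-of-magnitude expressions from the text, with all numerical prefactors dropped; the function name, the dictionary keys, and the example parameter values are hypothetical.

```python
def selection_drift_scales(N, mu, s):
    """Order-of-magnitude scales for accumulation when 1/s << N << 1/mu.

    The expressions are the estimates quoted in the text; prefactors of
    order 1 (and logarithmic factors, if any) are ignored.
    """
    if not (1.0 / s < N < 1.0 / mu):
        raise ValueError("expected 1/s < N < 1/mu (selection-drift regime)")
    return {
        "peak_frequency": 1.0 / (N * s),      # highest mutant frequency reached
        "clone_size": 1.0 / s,                # copies in the largest clones
        "clone_lifetime": 1.0 / s,            # generations (width of a peak)
        "waiting_time": 1.0 / (mu * N * s),   # generations between high peaks
        "polymorphic_probability": mu * N,    # clone_lifetime / waiting_time
    }


if __name__ == "__main__":
    # Hypothetical example values, chosen only to satisfy 1/s << N << 1/mu.
    for name, value in selection_drift_scales(N=1000, mu=1e-5, s=0.1).items():
        print(f"{name}: {value:g}")
```

For these example values the clone lifetime is about 10 generations and the waiting time about 1,000 generations, so their ratio reproduces µN = 0.01.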


FIG. 11.   Simulated accumulation of mutants in the selection-drift regime. One random Monte Carlo run is shown for 1/s ≪ N ≪ 1/µ and the initial condition f0 = 0. The double-pointed arrows and dashed line show predicted scales in time and in the mutant frequency. The solid smooth line shows the deterministic dependence for comparison. Parameters are shown in the figure.


FIG. 12.   Simulated accumulation of mutants in the selection regime. Dashed lines show the average and the standard deviation at steady state for µN ≫ 1 calculated using equations 58. Parameters are shown in the figure.

Divergence of separated populations and the time correlation function. The characteristic time of divergence of separated populations (equation 105) and the decay time of the correlation function (equation 106) are on the order of the inverse selection coefficient, 1/s. Both experiments show for how long, on average, the system "remembers" its previous random fluctuation. The answer is the half-life of a typical mutant clone, i.e., the time before it becomes extinct. This is because separate clones appear, due to mutation, at independent random times.
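The "memory" time discussed here can be estimated from any sampled trajectory of the mutant frequency. The sketch below uses a standard normalized autocorrelation estimator, which may differ in normalization from the correlation function defined in equation 106; the function name is hypothetical, and the input is assumed to be f(t) sampled once per generation.

```python
import numpy as np


def time_correlation(f, max_lag):
    """Normalized autocorrelation of a frequency trajectory for lags 0..max_lag.

    This is a generic estimator (an assumption of this sketch), not the
    exact definition used in the article.
    """
    f = np.asarray(f, dtype=float)
    if max_lag >= len(f):
        raise ValueError("max_lag must be smaller than the trajectory length")
    d = f - f.mean()                 # fluctuation around the time average
    var = np.mean(d * d)
    if var == 0.0:
        raise ValueError("trajectory is constant; correlation is undefined")
    return np.array([
        np.mean(d[: len(d) - lag] * d[lag:]) / var
        for lag in range(max_lag + 1)
    ])
```

The lag at which the returned values fall to one half gives an empirical half time of the correlation decay, which can be compared with the 1/s scale stated above.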

Reversion (fixation of an advantageous variant). A reversion experiment, in which the initial population is uniformly mutant, behaves rather differently. Although the same scales for time and the minority allele frequency appear in this case, they have a different meaning. As in accumulation, random drift and selection dominate in smaller and larger wild-type colonies, respectively. However, in this case, selection accelerates rather than hinders the growth of a new clone. The probability that a single wild-type allele will manage to grow to a size equal to the inverse selection coefficient, 1/s, is low, on the order of s. However, above this critical size, the rest of its growth will be carried out by selection in a deterministic manner, i.e., with a probability close to 1 and over the deterministic timescale, 1/s (see "Deterministic dynamics and its boundaries" above). Hence, the bottleneck of reversion is in reaching the critical size despite random drift; after that, a clone is likely to be fixed in the population. Stochastic dynamics below the critical size is the same as in the accumulation experiment (selection is not important). The average waiting time for reversion to start is determined by the fixation probability, s, and by the rate at which single alleles are generated in a population per generation, µN, which gives the time ~1/(µNs), i.e., the same scale as the waiting time for a high peak in the accumulation experiment (Fig. 11) (equation 107) (51). A few examples of reversion curves are shown in Fig. 13. Evolution of the probability density is shown in Fig. 14, including evolution of the density of polymorphic states (Fig. 14a) (equation 108) and of the two probabilities of monomorphic states (Fig. 14b) (equation 107).
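The bottleneck argument can be checked numerically by estimating how often a single advantageous copy becomes fixed. The sketch below again assumes a Wright-Fisher population, which is not necessarily the virus population model of this review; in that model the fixation probability of a single copy is close to 2s for small s (Haldane's classical result), which is the same order of magnitude as the estimate s used in the text. Function names and parameter values are hypothetical.

```python
import numpy as np


def fixation_probability(N, s, trials=2000, seed=0):
    """Fraction of replicates in which one advantageous copy reaches fixation.

    Assumed model: Wright-Fisher resampling of N genomes, relative fitness
    1 + s for the advantageous (wild-type) allele, no further mutation.
    """
    rng = np.random.default_rng(seed)
    counts = np.ones(trials, dtype=np.int64)   # one wild-type copy per replicate
    active = (counts > 0) & (counts < N)       # replicates not yet absorbed
    while active.any():
        x = counts[active] / N
        p = x * (1.0 + s) / (1.0 + s * x)      # frequency after selection
        counts[active] = rng.binomial(N, p)    # drift: binomial resampling
        active = (counts > 0) & (counts < N)
    return float(np.mean(counts == N))


def mean_waiting_time(N, mu, p_fix):
    """Waiting time ~ 1 / (mu * N * p_fix): new copies per generation times success odds."""
    return 1.0 / (mu * N * p_fix)


if __name__ == "__main__":
    N, mu, s = 200, 1e-4, 0.1                  # hypothetical example values
    p_fix = fixation_probability(N, s)
    print("estimated fixation probability:", p_fix)
    print("waiting time for reversion (generations):", mean_waiting_time(N, mu, p_fix))
```

Most replicates lose the new allele within a few generations, which is the behavior the text attributes to drift below the critical clone size of 1/s copies.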


FIG. 13.   Simulated reversion (fixation of advantageous variant) in the selection-drift regime, 1/s ≪ N ≪ 1/µ. The reversion curve in the deterministic limit, N = ∞, is shown by the dashed curves for comparison. Parameters are shown in the figures. (a) Beginning of the reversion curves. Two random Monte Carlo runs are shown for each of two population sizes. (b) Full reversion curve at a smaller population size. Three random runs are shown. Solid lines show the average and the standard deviation of the mutant frequency calculated using equation 107.

