**DOI:**10.1128/MMBR.65.1.151-185.2001

## SUMMARY

We present here a self-contained analytic review of the role of stochastic factors acting on a virus population. We develop a simple one-locus, two-allele model of a haploid population of constant size including the factors of random drift, purifying selection, and random mutation. We consider different virological experiments: accumulation and reversion of deleterious mutations, competition between mutant and wild-type viruses, gene fixation, mutation frequencies at the steady state, divergence of two populations split from one population, and genetic turnover within a single population. In the first part of the review, we present all principal results in qualitative terms and illustrate them with examples obtained by computer simulation. In the second part, we derive the results formally from a diffusion equation of the Wright-Fisher type and boundary conditions, all derived from the first principles for the virus population model. We show that the leading factors and observable behavior of evolution differ significantly in three broad intervals of population size, N. The “neutral limit” is reached when N is smaller than the inverse selection coefficient. When N is larger than the inverse mutation rate per base, selection dominates and evolution is “almost” deterministic. If the selection coefficient is much larger than the mutation rate, there exists a broad interval of population sizes, in which weakly diverse populations are almost neutral while highly diverse populations are controlled by selection pressure. We discuss in detail the application of our results to human immunodeficiency virus population in vivo, sampling effects, and limitations of the model.

The process of evolution is a consequence of the interplay of mutation, selection, and chance on a population of organisms, leading to an observable change in its genetic makeup. Since the time of Darwin, the influence of these factors on the evolution of organisms ranging from bacteria to humans has been intensively studied, both experimentally and theoretically, leading to a very large body of literature. Only recently, however, has attention been turned toward special problems in the evolution of viruses. Virus evolution is of particular interest and importance for three reasons. First, we desire to gain an understanding (usually in the absence of a fossil record) of how modern viruses have arisen from their earlier forms, both in recent times and in parallel with the evolution of their hosts. Second, the evolution of a virus during the course of infection of a single host, or along a short transmission chain, is of great importance in creating new populations with properties altered in important ways, such as evasion of the immune response, resistance to antiviral therapy, or altered virulence. Third, because of their high replication rates, simple genomes, large population sizes, and high mutation rates, viruses make good models for studying and testing evolutionary theory.

Particular attention has focussed on understanding the evolutionary forces that act on human immunodeficiency virus (HIV) during the course of infection of a single human host. HIV displays a remarkable extent of genetic variation concurrent with a high speed of evolution: in the most variable region of the genome (*env*), individual genomes within a population from an infected person can vary by as much as 3 to 5% (2, 43, 78); substitutions in *env*accumulate at a rate of approximately 1% per year (71), 50 million times faster than in the small subunit of rRNA (61). This variation has important consequences. It allows the virus to evolve to infect different cell types (9, 20,30) and to rapidly become resistant to otherwise highly effective antiviral drugs (10, 47, 50); it may play a role in evading the immune system (4, 56, 73, 79). Furthermore, its high mutation rate (estimated to average about 3 × 10^{−5} per nucleotide site per replication cycle [49]), large population size (variously estimated from about 10^{7} to 10^{8} productively infected cells), and continuous steady state, in which the large majority of virions and productively infected cells turns over every day (25, 77), create a situation which, at least in principle, is amenable to (and requires) mathematical modeling.

To date, a number of modeling approaches have been applied to understand the evolution of HIV in vivo. These approaches use either population genetic (mutation frequency distribution) or phylogenetic inference using virus sequences obtained from HIV-infected individuals. In general, they are based on one of two different theoretical frameworks to the evolution problem. Deterministic approaches, including quasispecies theory (15, 26), assume that the population size is very large, such that the frequency of a given mutation at any given time is completely predictable if one knows the initial frequency, the mutation rate, and the selection coefficient (i.e., the differential growth rate conferred by the different alleles). At first glance, such approaches would seem justified by the large number of infected cells at each generation (21); however, a number of factors, such as variation in the replication potential and generation times among infected cells, may lead to an effective population size much smaller than the actual number of infected cells. Stochastic models, as applied to HIV (to this point), proceed from the opposite assumption: that the effective population size is so small (or that selective forces are so weak) that random drift dominates over selection. The hypothesis of selectively neutral mutations has a long, successful history in describing the evolution of organisms where populations are small (and not uniformly distributed) and mutation rates are very low (36). Their applicability to virus populations remains to be established. Many of the assumptions that underlie neutral theory are not appropriate for virus populations, and a number of characteristics of HIV genetic variation in vivo, such as the uneven ratio of synonymous to nonsynonymous changes in different regions of the genome (5, 44, 48), argue against simple application of neutral theory. However, inclusion of selection effects in evolutionary analysis (for example, the coalescent method) presents a mathematical challenge that has not yet been fully solved in a practical fashion, although progress toward this goal has been made recently (42, 55).

As an example of the difference between deterministic and stochastic models, consider the question of the frequency in a population of a mutation that is slightly deleterious to virus replication. In a deterministic system, it can be easily calculated that the frequency of such a mutation in the population will come to equilibrium at a point equal to the mutation rate divided by the selection coefficient (24). In a stochastic system, the population will usually be completely uniform in one variant or the other (76), switching rarely but rapidly from one form to the other. This theoretical experiment is of great practical importance in that it describes the appearance of a mutation that can confer resistance to an antiviral drug even before treatment.

To solve this problem and many others, it is clear that a more general theoretical framework is needed: one that takes into account both selection and drift under a set of assumptions more appropriate to viruses than is found in theoretical works published to date. Our aim in this work was to develop, from first principles, a general theory that includes the effects of both selection and drift on a population. We use a set of assumptions appropriate to virus populations, focusing on the interplay between deterministic and stochastic behavior in the context of virologically realistic experiments. We apply these to the simplest possible model: mutation at a single site with only two alleles, replicating in a steady-state system (that is, a constant number of infected cells) under the influence of constant selective pressure in a single isolated population. Because we are dealing with a single locus, we do not consider recombination explicitly; because we are dealing with haploid populations, we do not have to consider allelic dominance. It should be noted that although we do not consider recombination explicitly, the presence of strong recombination must be, in fact, implied for the one-locus approximation to be quantitatively correct. Also, nonconserved loci must be spaced sufficiently far apart in the genome, depending on the recombination rate. Even in the absence of recombination, the one-locus approximation is a useful starting point for understanding interactions between selection and stochastic factors at a qualitative level. We present a complete model that considers the full range of possible values for population size, mutation rate, and selection effects. Despite its simplicity, the model is surprisingly rich in its descriptive power. At the extremes, the results of this model correspond to the standard results of deterministic or neutral theory; however, we have found that there is a large range of values for the key parameters in which the system behaves in an intermediate fashion: under some conditions its evolution is dominated by stochastic factors, whereas at other times it behaves in a nearly deterministic fashion. We refer to this range of parameter values as the “selection-drift” regime and describe its properties in detail.

This work is divided into two major parts. In the first, we present all the principal results in qualitative terms, using language appropriate for a reader trained in biology and with a moderate level of mathematical sophistication. This part is accompanied by a number of illustrative examples obtained by computer simulation. Although keyed to the mathematical formalism of the second part, it is designed to be read independently and to provide the reader with an understanding of the principal results and their biological significance, particularly in the context of virus populations. The second part is a formal mathematical derivation of the principal results of the model. These results are listed at the beginning of each section and derived in the following subsections. Although some of the derivation presented is not novel, in that it parallels classic work of a number of population biologists (18, 19, 23, 24, 31, 37, 81, 82), its formal application specifically to virus systems is, to the best of our knowledge, a new approach, and we present it in full for this reason, as well as to provide a thorough and self-contained review. Although some of our mathematical methods differ from the classic methods, the final results are identical.

The presentation in both parts of this work proceeds in parallel. We first develop the basic evolution equation, which describes, at least in a statistical sense, the change in frequency of a mutant allele as a function of time and the key parameters: mutation rate, selection coefficient, and population size. We then present the predicted results, for all three regimes, of a set of virological experiments: accumulation and reversion of deleterious mutations, competition between mutant and wild-type viruses, gene fixation, mutation frequencies at the steady state, divergence of two populations split from one population, and genetic turnover within a single population. Next, we discuss sampling statistics and the application of this theory to some specific real-world experimental issues of virus and organismal evolution. Finally, we discuss the application and extension of this theoretical framework to other problems, including multilocus evolution and phylogenetic analysis.

## QUALITATIVE DISCUSSION AND COMPUTER SIMULATIONS

Description of the Model and the Evolution EquationIn this section, we introduce the population model and explain how to approach the problem of evolution when random factors enter the picture. First we describe a one-locus, two-allele population model based on the virus replication cycle and discuss briefly the main factors of evolution included in the model. This is followed by a discussion of the biological meaning of the evolution equation. Finally, the boundary conditions for the evolution equation describing the properties of a weakly polymorphic population are described.

Virus population model.First, we choose a basic model of virus evolution. For the purposes of simplicity, we consider the evolution of one nucleotide position at a time, and we assume that each nucleotide has a choice between only two alleles. (Such a model applies directly to multiple loci if the evolving loci are sufficiently distant and the recombination rate is sufficiently high. Evolution at closely situated loci or in the absence of efficient recombination is not independent [see “Many loci and other aspects” below].) Conventionally, we denote the better-fit allele as wild type and the less-fit allele as mutant. A deleterious mutation event (from wild type to mutant) will be referred to as forward mutation, and an advantageous mutation event will be referred to as reverse mutation. Each separate nucleotide will be characterized by two parameters, both of which are assumed to be much less than unity: the mutation cost (or selection coefficient), *s*, which is the relative difference in fitness between the two alleles, and the mutation rate per base per replication cycle, μ. We assume that mutations at different nucleotides have a weak additive effect on virus fitness. In doing so, we neglect epistasis (coselection) arising due to biological interaction between nucleotides at both the nucleotide and protein levels. We also ignore linkage disequilibrium between loci due to random drift, so that different nucleotides evolve independently (see the Introduction). The mutation rate is set, in our work, to be the same in the forward and reverse directions. For example, for HIV in infected cells the mutation rate per base is in the range of 5 × 10^{−6} to 5 × 10^{−5}, depending on the type of substitution (49, 68). The selection coefficient will vary over a wide range according to the specific base and to the specific conditions of replication, but it is assumed to be constant over the period of observation; in other words, there is no selection for diversity.

The basic model of virus replication is illustrated in Fig.1. Consider the dynamics of a cell population infected by two genetic variants of a virus: a fraction (*f*) of cells is infected by the mutant virus, and the remaining cells (1 − f) are infected by the wild type. The number of mutant-infected cells may change with time, i.e., with each new generation of cells. The total cell count is assumed to be constant. During a generation step, each cell produces a fixed (large) number of virions and then dies and is replaced by an uninfected cell. The number of virions produced and capable of infecting new cells differs, by a factor of 1 − s , between cells infected with different variants, creating selection for the better-fit (more prolific) variant. Since the total number of infected cells is fixed and the number of virions produced per cell is large, only a small fraction of the virions infect the next generation of cells. On infecting a cell, each virion has a small chance of mutating into the opposite genetic variant, given by the mutation rate introduced above. All the virions produced by a cell afterwards represent the same genetic variant. Thus, intracellular interference between variants does not occur. (This lack of intracellular competition is a reasonable assumption for retroviruses or when the proportion of infected cells in a tissue is much lower than 100%. It may vary in other virus models, when the multiplicity of infection is high.)

Some details of the model, such as fixed burst sizes and the point of the replication cycle at which mutation occurs, are of no consequence when long timescales are considered. Overlap in time between generations of infected cells was neglected but causes a factor of 2 increase in the rate of random drift (52). By contrast, such assumptions as two variants per base and the absence of both coselection and selection for diversity are essential. The model includes a minimal set of three factors of genetic evolution: random drift due to sampling of genomes, mutation, and selection. Let us characterize briefly the effect of each of these factors on the composition of the population as it changes with time.

The model assumes that the virions infecting each new generation of cells are chosen randomly from the virions produced by the mutant and wild-type subpopulations. As a result of this random sampling of genomes, the mutant frequency experiences random drift in time (18, 80), as shown in Fig. 1a. In the absence of mutation and selection, any population composed originally of a mixture of alleles eventually becomes uniform in either genotype (i.e., the allele is fixed), with the probabilities depending on the initial composition.

Selection enters our model through the difference in the number of infectious progeny produced by cells infected with different genetic variants. Selection alone drives the system into a state consisting entirely of the better-fit variant.

Mutations, in contrast to random drift and selection, favor inhomogeneity. If the other two factors are absent, mutations push the system toward the equilibrium composition at which the total numbers of forward and reverse mutations per generation are in balance. For equal forward and reverse mutation rates assumed here, equilibrium occurs at 50% of each allele.

If all three factors are at work and there are no external perturbations, the population will eventually reach a dynamic steady state in which mutation, on average, is in balance with selection and/or random drift. In the steady state, the statistical properties of the population no longer vary with time; i.e., even though the genetic composition may fluctuate strongly with time, all the mean values, standard deviations, etc., remain constant. The whole model with the three factors of evolution is illustrated in Fig. 1b.

Stochastic equation of evolution.Different meanings can be assigned to the word “evolution.” For the task at hand, evolution of the population is characterized by the dependence of the frequency of cells infected with mutant virus on time. In deterministic dynamics, which applies only in very large populations of infected cells, if one knows the initial mutant frequency and has the appropriate equations, one can, in principle, predict the mutant frequency at later times with arbitrary precision. (In practise, the equations are never known exactly, since there are many different factors in play, but this is a separate issue [68].) By contrast, in the presence of random factors, the time dependence of the mutant fraction cannot be predicted even in principle. Even if one knows its precise initial value, the error with which one can predict its value later grows with time. If random factors are strong, the error in the mutant frequency and its value become eventually comparable. Evolution of the mutant frequency, in other words, is a random process.

Randomness of mutations does not mean, however, that the evolution of a population is totally arbitrary. On the contrary, useful predictions can be made about its statistical properties even if its specific state cannot be predicted. Instead of time dependence of the mutant frequency, one has to consider the time-dependent probability density [ρ(*f*)], defined as the chance that a given population has a mutant frequency near a particular value. The probability density, which can be introduced if both subpopulations (mutant and wild type) are large, is closely related to a histogram derived by plotting the number of times the mutant frequency of a population is observed to lie within a certain range of values. When both the number of similar experiments and the number of histogram bars are very large, the histogram becomes, in the limit, a smooth function, which is the probability density. (The histogram and the probability density differ by a constant factor: the total area under the probability-density curve [integral] is, by definition, the total probability of having any value of the mutant frequency and is, of course, equal to 1.) The density function contains information about the most relevant statistical parameters (average values and standard deviations) which can be compared with experiment (see “Experiments on evolution and observable parameters” below). In particular, the characteristic width of the probability density peak indicates the error within which the mutant frequency can be predicted.

The stochastic evolution equation (equations 1 and 2) (Fig.2a) expresses the rate of change in the probability density with time in terms of its form at the present moment. Using such an equation and knowing the initial probability density, one can predict its form, in principle, at any time in the future, similarly to how one would predict the mutant frequency itself for a deterministic process. The difference between the two cases is that the time-dependent variable is now a function rather than a number. We derive the evolution equation directly for the population model introduced in the previous subsection, in the beginning of mathematical part of our work (see “Mathematical results and derivations” below). The rest of the mathematical part is devoted to solving the equation for different important cases. Here we only show how the equation looks when the probability density is localized in a small region near some value of the mutant frequency and comment on its meaning from a more qualitative perspective.

The right hand-side of the equation shown in Fig. 2a is a sum of three terms, which together describe how the shape of the probability density function, ρ, changes over a short time interval,*dt*. The first term describes random drift, the second describes selection, and the third describes mutation. To clarify the roles of the three terms in describing evolution, we consider each of them separately, by setting the other two terms equal to 0 (Fig. 2b to d). As a convenient example, we examine a probability density localized in a small region near some value of the mutant frequency (*f*
_{max}). In this example, the second term, by itself, means that the probability density increases with time on the left side of the peak and decreases on the right side of the peak. As a result, the probability density peak, whose shape stays constant, shifts to lower mutant frequencies, as it should in the presence of selection (Fig. 2c). The third term in the equation, by itself, causes a shift of the peak as well, but the direction of the shift is toward 50% composition, which is the expected effect of mutation when the forward and reverse mutation rates are, as assumed, equal (Fig. 2d). The effect of the first term in the equation is of a different kind. Due to this term, the probability density decreases in the interval between the inflection points A and B (Fig. 2b) and increases everywhere outside of the interval. As a result, the probability density spreads outward. This is random drift: the error within which one can predict the value of mutant frequency increases with time. A more general form of the stochastic equation when the probability density, ρ(*f*), is spread over a broad interval of*f*, is given in equations 1 and 2.

In the equation in Fig. 2a, a physicist will recognize a particular case of the Fokker-Planck equation and a mathematician will recognize a case of the forward Kolmogorov equation (41). It was introduced into the field of population genetics by Wright (81) and then intensively used to study evolution in the presence of different factors (31-33, 37). As it turns out, the equation is much more general than the virus model we used for its derivation in the mathematical section of this review. It describes a broad range of population models, from a bacterial culture to a randomly mating population without allelic dominance (35). Originally, the approach of the Fokker-Planck equation was introduced into population genetics from a phenomenological perspective, based on analogy to gas kinetics (18). Later, the validity of this approach was confirmed for different population models (52, 75). Examples of essential factors which are not included in the equation but which may or may not be important, depending on the experimental system, are epistasis (biological interaction) and linkage between multiple loci, time variation of the selection coefficient and the population size, and allelic dominance in a diploid population (33).

A formal analogy for the system described by the evolution equation is a gas consisting of particles mixed with air and confined between two parallel walls (Fig. 3a). A value of the mutant frequency is analogous to a location between the walls, and the probability density is now the local gas density. The first term (Fig.2a) describes the diffusion of the gas particles in the air, and the second and third terms combined describe the effect of directed force (an electric field, for example) acting on the gas particles in the presence of friction of the gas against the air. Another useful analogy is gel electrophoresis. The electrical force acting on polymer molecules and the friction against the gel matrix together create directed motion, which segregates the molecules into bands. Molecular diffusion leads to increasing bandwidths. Although the physics of the gel or gas system has nothing to do with viruses or evolution, the formal mathematical analogy between the two systems, as we shall see below, turns out to be very useful.

Boundary conditions: properties of almost monomorphic populations.In the real world, the mutant frequency cannot be less than 0 or greater than 1, yet the master equation has no such restriction. Thus, the stochastic equation in Fig. 2a (and equations 1and 2) is incomplete without describing what happens near ends of the allowed interval for the mutant frequencies, 0 and 1. The analysis shown in Fig. 2 is for the case where there is a large number of minority allele copies (that is, *f* is not near 0 or 1) and treats the mutant frequency (*f*) as a continuous variable. In many important cases, one also needs to describe the evolution of a population with only a few copies of the minority variant. The boundary conditions where *f* is near 0 and 1 have to be derived independently from the virus population model described in Subsection A. The derivation given in the mathematical section of this review shows that the conditions differ depending on the interval of population size, as follows.

The boundary conditions can be conveniently expressed in terms of the probability density flux (*q*), which is exactly analogous to the flux of gas particles through unit area per unit time (Fig. 3). In very large virus populations (Fig. 3b), the boundary conditions state that the flux must vanish at the “walls” corresponding to two monomorphic states, i.e., 100% mutant or 100% wild type (equation 3). In small populations (Fig. 3c), the flux is not zero (equations 5 and6). This is because the probability of finding the virus population in a completely monomorphic state is finite and can increase or decrease in time. In the gas analogy, in the first case (Fig. 3b) gas molecules bounce off the hot walls and in the second case (Fig. 3c) the walls are cold and gas forms a condensate which can decrease or increase with time. Figuratively speaking, the probability density, just like the gas condensing in or evaporating from the liquid on a wall, can “condense” in or “evaporate” from a monomorphic state.

The real, biological interpretation of the different sets of boundary conditions is as follows. In very large virus populations (which, as we shall see, roughly correspond to almost deterministic evolution), a purely monomorphic state is unlikely: mutations destroy it very quickly. In a small population, mutations are rare and the monomorphic state can occur with a finite probability. This argument also shows that mutations affect virus evolution in a different way depending on the number of infected cells. In a large population, mutations may be important even in a very polymorphic state (e.g., if selection is small). In small populations, the role of mutations is to create a copy of the new allele in an otherwise monomorphic population; once a copy is created, mutations can be neglected until the population becomes monomorphic again. Typically, as we discuss below in the section on steady state, a new allele is lost due to random drift and repeated introduction of mutations will be needed to restore diversity.

Experiments on Evolution and Observable ParametersIn this section, we describe a few gedanken experiments on genetic evolution important for virological applications and introduce quantitative parameters suitable for experimental comparison.

To make use of the evolution equation with boundary conditions (see “Description of the model and the evolution equation” above), one needs to know the state of the system or its statistics at the initial moment of time. The initial condition depends on a particular experimental or natural setup. Virological experiments, relevant for both in vivo and in vitro situations, are as follows.

(i) Accumulation of deleterious mutants (initial condition: a pure wild-type population, i.e., f = 0 ).

(ii) Reversion of a deleterious mutation (initial condition: a pure mutant population, i.e., f = 1 ).

(iii) Growth competition (initial composition: a 50%-50% population [f = 0.5] or any other strongly polymorphic mixture).

(iv) Gene fixation (this experiment, which has received a lot of attention in population biology [19, 24, 34, 38, 80] and which is very useful for understanding other stochastic experiments, is defined only in small populations in which the total mutation rate per population, μ*N*, is much less than 1; suppose that a single advantageous allele is introduced into an otherwise monomorphic population [ f = 1/N ]—the allele will have one of two fates: either it will be lost due to random drift [Fig. 1a] or it will spread to the entire population, i.e., become “fixed”; the questions are: what is the fixation probability, and, if the allele is fixed and does not become extinct, how much time will it take, counting from the moment it appeared? One can also ask a more general question: what is the probability of having a new allele to grow into a subpopulation of a given size before it becomes extinct?).

(v) Steady state. Whatever the initial condition, after a sufficient time, the system passes to the stochastic steady state, in which the probability density no longer depends on time; we consider this relatively simple case separately.

(vi) Genetic divergence. One splits a steady-state population into two isolated parts. Initially, both populations have a random but identical genetic composition, from which they independently diverge. As time goes on, their respective random compositions correlate less and less. The question is, what is the characteristic time at which the loss of correlation occurs?

(vii) Genetic turnover? This experiment studies the average timescale associated with random fluctuations of the mutant frequency in the steady state.

The probability density (ρ) of the mutant frequency predicted by the stochastic equation is the main observable parameter. Unfortunately, to measure it directly, one would have to generate a histogram of mutant frequencies for a very large ensemble of populations. More amenable for experimental testing are the average (expectation) values (equation 36) and the standard deviations or variances (equation 37) of different stochastic parameters, which require a smaller number of populations to measure. Below we introduce some useful parameters whose statistics can be measured in the different experiments we outlined above. At the same time, their predicted statistics can be expressed via the probability density, as shown in the mathematical section of this review. In what follows, we assume that each parameter, for each given population, is measured with a high precision from a sufficiently large sample of sequences. The sampling effects will be discussed separately below.

The first parameter is the mutant frequency itself (*f*), which is self-explanatory. Its value can be compared directly with the experimental value, provided that the wild-type (best-fit) nucleotide is known.

The second is the intrapopulation genetic distance (*T*), defined as the proportion of sequence pairs (randomly sampled from the virus population) which differ at the base of interest. Although there are other ways to measure intrapopulation variability, we will use this definition, known in population biology as Nei's nucleotide diversity. It is equivalent to the standard definition of the genetic distance in virology as the average number of pairwise differences among randomly selected genomes, except that it applies to a single base rather than to a long genomic segment. By definition, *T* is calculated as 2*f*(*1* − *f*), and varies between 0 (at f = 0 or 1) and 0.5 (at f = 0.5 ). The genetic distance is usually a more convenient measure of population diversity than the mutant frequency itself since it does not require knowledge of the wild type sequence.

The third is the interpopulation genetic distance (*T*
_{12}), which is defined in the same way as the intrapopulation genetic distance, except that the two sequences of each pair are sampled from two different populations (equation 40). The interpopulation distance is 0 when the two virus populations consist uniformly of the same genetic variant and 1 (100%) when the two virus populations are composed entirely of opposite genetic variants. The interpopulation distance, as one can show, cannot be smaller than the average of the two intrapopulation distances. Therefore, it is sometimes more convenient to consider instead the relative genetic distance between two populations (*D*), defined as the difference between the interpopulation distance and the average of the two intrapopulation distances [*T*
_{12} − (*T*
_{1} + *T*
_{2})/2]. This parameter (equation 41) varies between 0 (two populations have an identical genetic composition) and 1 (one population is pure mutant, another is pure wild type). There are alternative definitions of the relative distance (54). We find this definition more clear intuitively; also, its statistical moments (average, variance) are relatively easy to calculate.

All the previous parameters can be measured at one time point, both for dynamic experiments (the first three experiments in the beginning) and in the steady state. Since all of them are, in general, stochastic, an average and standard deviation has to be calculated for each. The next parameter is more complex: it requires measurement at two different times. We define it on average and for a steady state population only.

The fourth parameter, the time correlation function of mutant frequency [*K*(*t*)], describes how quickly the system “forgets” the preceding random fluctuation of the mutant frequency (equation 45). The time correlation function usually has a maximum when the time difference is 0 and vanishes at large time differences. The characteristic time at which it decays by 50% (or, say, by a factor of e = 2.78… ) from its maximum gives the timescale of random fluctuations. The form of this decay (e.g., exponential or negative power) may be a good fingerprint of a virus population model or, within a given model, of a particular population size.

In the mathematical section of this review, we calculate these parameters for different gedanken experiments and different intervals of population size. In this section of the review, we discuss these results qualitatively and illustrate them, when possible, with Monte Carlo simulations.

Steady StateIn this section, we discuss properties of the steady-state, stochastic population in different intervals of the population size.

Neutral case: *s* ≪ μ.Selection is of little significance when the selection coefficient is much less than the mutation rate. This case is probably of little practical significance for RNA viruses, with their tightly organized genomes. However, the transition between stochastic and deterministic behavior is easier to analyze when the selection factor can be neglected. Hence we start our discussion here.

The main fact of stochastic theory is that fluctuations of mutant frequency between statistically identical populations are large if populations are small (stochastic behavior) and small if populations are large (nearly deterministic behavior). In the language of the probability density (equation 52), the density is spread over a broad interval of *f* in small populations and is a narrow peak at very large population sizes. Transition between the two limits is controlled mostly by a single parameter μ*N*, the product of the population size and the mutation rate. The composite parameter μ*N*, which features extensively in population genetics (usually as Θ = 2*N*μ), gives the total mutation rate for the entire population. For most RNA viruses, μ*N* equals 1 when the number of infected cells is on the order of 10^{5}(i.e., less than the number in a small culture dish).

As the mutation rate per population increases, the probability density gradually changes its shape, as illustrated in Fig.4 (80). This results from competition between random drift, which drives the system to one of uniform states, and mutations, which diversify the system. At values of μ*N* much smaller than 1 (an interval we accordingly call the drift regime in Table 1), random drift wins and the usual population is only weakly polymorphic. The probability density is, accordingly, U shaped, with a minimum at 50% composition. At the smallest values of μ*N* (the condition is given in equation 5), the system is most likely to be in either of the purely monomorphic states, without a single opposite allele present (see “Description of the model and the evolution equations” above, where the the boundary conditions are described). The total probability of any polymorphic state will be much less than 1 and on the order of μ*N*. This estimate gives the frequency of segregating sites in a genome segment.

Let us move toward larger populations. As we increase the parameter μ*N*, the U shape of the probability density flattens out (Fig. 4). The minimum at 50% composition becomes a maximum when μ*N* is equal to 1/2. The probability density shrinks and becomes narrow as the population increases and μ*N* becomes much larger than 1. This means that the mutant frequency is very close to the deterministic value of 1/2, owing to the balance between forward and reverse mutations. In Table 1, this limit of population sizes is denoted the mutation regime.

Case with selection: μ ≪ *s*≪ 1.The situation when the selection coefficient is less than 1 but still much larger than the mutation rate is more relevant for RNA viruses and more interesting theoretically. As in the neutral limit, the larger the population size the smaller the fluctuations.

The selection factor can be neglected only if a population is very small, much smaller than the inverse selection coefficient (*Ns* ≪ 1), a case that has the same properties as the above-described drift regime. At larger population sizes, selection is crucial and causes the probability density (equations 48 or 49 to 51) to be asymmetric in favor of a predominantly wild-type population.

In the limit of very large populations, when μ*N* is much larger than 1 (termed the selection regime in Table 1), the probability density is narrow and localized near its deterministic value (equation57). This value is given by the ratio of the mutation to the selection rate (μ/*s*), which we assumed to be small. At this value, mutations and selection against emerging mutants reach balance.

A result not sufficiently emphasized in the population biology literature is the existence of a wide interval in population size between the inverse mutation rate and the selection coefficient, which we term the selection-drift regime, in which all three factors of evolution are critical. Specifically, mutations produce diversity, selection restricts mutants to a low level, and random drift causes strong fluctuations between populations. The structure of the probability density in this regime is shown schematically in Fig.5. It consists of three components. The large peak (delta function) situated at exactly zero mutant frequency means that a population is, most probably, purely wild type. The weak continuous exponential tail which decays at mutant frequencies on the order of 1/*Ns* ≪ 1 (80) means that the chance of a population being polymorphic is low and that if a population happens to be polymorphic, the proportion of mutants is small and quite random. A small peak at f = 1 becomes important only close to the lower border of the interval, when *N* is on the order of 1/*s*. The probability of finding any mutants (which is given by the total area under this curve) is low and proportional to μ*N* (equations 49 to 51).

The selection-drift regime has rather interesting, even controversial properties. On the one hand, the shape of the probability density suggests a very stochastic behavior. On the other hand, the average mutant frequency and the average genetic distance happen to coincide, over most of the regime, with their deterministic values, as if the population were much larger. Figure 6shows the average values and the relative standard deviations for both parameters at all the population sizes. As expected, in the selection drift regime the relative standard deviations for both the mutant frequency and the genetic distance are much larger than unity (Fig.6b). At the same time, the average values (in equation 59) are the same as in the selection regime (Fig. 6a). Notably, the fluctuations of the parameters are much stronger than could be expected from the Poisson statistics. This is a result of clonal amplification: if a single mutant appears in otherwise wild-type population, it grows into a clone. In the sections on stochastic dynamics (see below), we will further clarify the structure of the steady state by presenting a Monte Carlo simulation of a stochastic dynamic evolution in a single population. Examples of the results of such simulations for each regime are shown in Fig. 6c.

Deterministic Dynamics and Its BoundariesAs we have shown above (see “Experiments on evolution and observable parameters”), the steady-state mutant frequency approaches its deterministic value when μ*N* is much larger than 1. The purpose of this section, small but with a large mathematical counterpart, is to gain insight into the transition between stochasticity and determinism in the more complex case, in which parameters of the system depend on time.

Deterministic dynamics.Deterministic and stochastic theories operate with different dynamic variables. The former considers the time dependence of the frequency of mutants, and the latter uses a more complex object, the time-dependent probability density of the mutant frequency. It is important to ensure that the two approaches converge to the same result in the limit of infinite population, when they are expected to describe deterministic evolution, albeit in a different way. For this purpose, in the mathematical section of this review we solve the dynamic stochastic equation (equation 1) for the case of large populations. The resulting probability density, as expected, is a very narrow peak located at the time-dependent mutant frequency (Fig. 7b), which satisfies the deterministic equation of evolution (equations 60 and 61).

The first term in the right-hand side of the deterministic equation (Fig. 7a) (equation 61) describes selection for the wild type, causing depletion of mutants. When one of two subpopulations (*f* or 1 − f ) is very small, the first term becomes small, since if there is no diversity, there is no selection. The second term, describing mutations, does not vanish in a uniform population. Instead, the term vanishes at 50% composition when the effects of forward and reverse mutations cancel each other. Mutations drive the system toward 50% composition. The same evolution equation can be obtained directly from the deterministic first principles (equations 63 and 64).

The deterministic equation in Fig. 7a allows one to predict the genetic composition as a function of time for any initial condition set in an experiment (equation 62). Corresponding plots for the three cases matching the conditions of the accumulation, growth competition, and reversion experiments described above (see “Experiments on evolution and observable parameters”) are shown in Fig.8. In all cases, after a characteristic time proportional to the inverse selection coefficient (1/*s*), the population approaches a steady state in which the mutant frequency saturates at a small value, the mutation rate over the selection coefficient (μ/*s*) (see “Steady state” above). Reversion is somewhat delayed compared to that in the two other experiments since the system first has to diversify slowly due to mutations and then still has to cross the entire interval of the mutant frequencies. Note that in both the accumulation and reversion experiments, the initial slope of the time dependence of the mutant frequency is shallow and is determined by the mutation rate (Fig. 8). Selection becomes important and causes the plots to curve after a growing subpopulation becomes sufficiently large.

Boundaries of deterministic approximation.Random drift, always present even in very large populations, causes the frequency of mutants to fluctuate around its deterministic value. As the population size decreases, the magnitude of fluctuations becomes comparable to the average frequency of the minority allele (either mutant or wild type), and the deterministic description breaks down. The corresponding condition on the population size varies significantly depending on the initial conditions of the experiment (equation 65). When the population starts from a monomorphic state (reversion or accumulation), the deterministic criterion is met when μ*N*is much larger than unity. A population that is strongly diverse to start with, as in the growth competition experiment, is already deterministic at a much smaller population size in the selection-drift regime. (The criterion for diversity is that the mutant frequency must be higher than its characteristic “tail” at steady state [Fig. 5] ). The reason for this difference is that a small polymorphism is influenced by rare and random mutation events while a strongly polymorphic population is controlled by selection alone.

Stochastic Dynamics: the Drift RegimeAt the smallest population sizes, smaller than the inverse selection coefficient, as we found out when considering the steady state, selection can be neglected altogether. In this section, we consider the nonequilibrium dynamics in this regime. The problems of interest are those listed above (see “Experiments on evolution and observable parameters”): the decay of a strongly polymorphic state, gene fixation, transition from a monomorphic to the steady state, divergence of populations which have been separated, and the rate of genetic turnover in the steady state.

Decay of the polymorphic state and gene fixation.We start our discussion from the population that is initially polymorphic, somewhere in the middle between 0 and 100%. As already discussed (see “Description of the model and the evolution equation”), mutations are not important in a polymorphic population, since they occur in the population with a frequency, μ*N*, much less than 1 per generation. Therefore, random drift remains the only factor causing variation of the mutant frequency in time. As time passes, the mutant frequency drifts until the population accidentally ends up in either monomorphic state (cf. Fig. 1a). A representative random process is illustrated by computer simulation in Fig. 9b. The average time (the number of generations) it takes for a population to become monomorphic (i.e., for either variant to be fixed) is on the order of the population size (equations 81 and 82) (32, 80). The fixation time is quite random: its representative fluctuations are on the order of its average value. The same process can be understood in another way, from the time evolution of probability density. Figure 9a shows how the probability density, initially a narrow peak located, e.g., at 50% composition, gradually spreads out to the entire interval and then decays.

The fact that, in a time not exceeding a few multiples of the population size, the population becomes uniform has general phylogenetic consequences. Let us divide arbitrarily a population into two groups of equal size and mark each group, say, by a different color. Then we divide each group (color) into two subgroups and mark them by two different shades. Then we divide each shade into two hues, and so on. If we continue the process of subdivision long enough, all individuals in the population will eventually have different tags. Consider now a group consisting of two subgroups. According to the above result, in a time not exceeding a few multiples of the group size, one of the two subgroups vanishes. Likewise, the surviving subgroup contains two smaller subgroups, one of which also becomes extinct in a time not exceeding a few multiples of the subgroup size, and so on. Therefore, in a time on the order of the total population size, the entire population will have the same tag, i.e., will comprise descendants from a single virus or organism. In other words, any two organisms in a population in the drift regime have a common ancestor at a past number of generations on the order of the population size. Phylogenetic methods of analyzing branching processes confirm this result, which is the basis of the coalescent method of estimating population size (39, 40, 65).

Related to the decay of polymorphism described above is gene fixation. Suppose that a single new allele is introduced into a monomorphic population at an initial moment. Eventually, after a number of replication steps, the allele will either disappear due to random drift (which is the most likely outcome) or spread to the entire population, i.e., become fixed. The questions are as follows. (i) What is the probability that the allele will get fixed? (ii) Given that the allele is lucky enough to become fixed, what is the average fixation time? As we show in the mathematical section of this review (equation 84 with f = 1 ), the fixation probability is the inverse of the population size (1/*N*) (34) and the fixation time is on the same order as the polymorphism decay time, i.e., on the order of the population size.

One can also ask more general questions. What is the probability that a single mutant genome will ever grow into a subpopulation with a given size? What is the average time spent on this growth? The results are analogous to that for full fixation, except that the subpopulation size substitutes for the total population size (equations 84). As we show in the beginning of the sections on stochastic dynamics in the mathematical section of this review, this result allows us to interpret, at a semiquantitative level, all the important results on stochastic dynamics.

Transition from a monomorphic to a steady state.We also consider here the accumulation of mutations starting from a purely monomorphic state, e.g., wild type (which one of the two does not matter, since selection is negligible). Eventually, mutants will be generated, one of them will become fixed (as described), and the system will switch to pure mutant. Then wild-type alleles will be generated, etc., and, in the long run, the population will be, statistically speaking, in dynamic steady state in which it switches back and forth between two monomorphic states. The system will gradually “forget” its initial state, so that the probabilities of the two monomorphic states will be equal and will be close to 1/2.

In the probability density language, this process can be described as shown in Fig. 10a. The initial peak of the probability density is very narrow and is localized at the zero mutant frequency. As time goes on, a tail of the probability density spreads into the interval between 0 and 100% mutants (equations 85 and86) and a new peak at 100% mutants appears, reflecting a chance of early fixation of a mutant genome. The first peak decays and the second peak grows, until they become equal in the steady state (Fig. 4) (equation 87). In the gas system analogy (see “Experiments on evolution and observable parameters” above), all water is initially condensed on the left wall and then evaporates. The vapors diffuse into the container and condense again on the right wall (analogous to what happens in a freezer over time). The system reaches equilibrium when the amount of condensate on both walls is the same and there remains some gas in between.

In addition to the language of probability density, it is useful to visualize transition to the steady state directly, as a typical random process. If the probability density is analogous to the density of gas, the random dependence of the mutant frequency on time corresponds to the random trajectory of a separate gas particle. A representative Monte Carlo simulation of the equilibration process, together with the relevant timescales, is shown in Fig. 10b. The steady-state process looks like a telegraph signal between the two uniform states. The peaks in the mutant and wild-type frequencies correspond to alleles which were generated by mutations and started new subcolonies but failed to become fixed.

Two, widely different timescales appear in both the representative random process and the evolution of the probability density. The typical waiting time for a switch from pure wild type to pure mutant or back is within an order of magnitude of the inverse mutation rate 1/μ. This corresponds to the time in which the probability density becomes symmetric between the wild type and mutant (Fig. 4) (equations86 and 87). The actual time spent on a successful switch is much shorter, within an order of magnitude of the population size*N*. This corresponds to the time in which the tail of probability density is formed between 0 and 100% (equation 85). The two timescales can be derived either rigorously, from the evolution equation (equations 4 to 6), or approximately, from the gene fixation problem (equation 84). Both approaches are used in the mathematical section of this review. They agree with each other and with the simulation in Fig. 10b.

The total probability of a polymorphic state (the frequency of segregating sites in genome) is, at any time, much less than 1 and on the order, roughly, of μ*N*. This agrees with the result we obtained directly for the steady state (see above). Interestingly, this value is reached on a timescale of approximately *N*generations, i.e., much sooner than the two probabilities of monomorphic states equilibrate.

Divergence of populations which have been separated and the time correlation function.The longer timescale, 1/μ, also appears in the time correlation function of mutant frequency, which characterizes the timescale of random fluctuation in the steady state and the divergence of populations which have been separated (see “Experiments on evolution and observable parameters” above). The value of the relative genetic distance, *D*, gradually changes from 0 to a constant value corresponding to statistically independent populations (equation 90). (Note that some other measures of interpopulation genetic distance used in population biology do not have an upper limit [54].) As it turns out, the time of this transition, the half time of the correlation function decay (equation91), and the time in which the probability density becomes symmetric (above) are on the same order, the inverse mutation rate. Indeed, all three times are determined by the waiting time for a successful gene fixation.

Stochastic Dynamics: the Selection-Drift RegimeHere we consider nonequilibrium experiments in the most interesting interval of population sizes (Table 1). The relative role of selection and stochasticity in population dynamics, as derived from the evolution equation in the mathematical section of this review, depends on the initial genetic composition. The dynamics of growth competition is almost deterministic (see “Deterministic dynamics and its boundaries” above), so that this experiment need not be discussed again. In the accumulation experiment, the overall dynamics is stochastic, except for the average values of the mutant frequency and the intrapopulation distance, which are, remarkably, the same as in the corresponding deterministic conditions.

Accumulation.As in the drift regime (see above), accumulation can be described as a spread of the peak of the probability density initially located at 0 (uniform wild type) into the interval between 0 and 1. However, unlike in the drift regime, the resulting steady state is not symmetric of a large peak (Fig. 5) (equation 48 or 49 to 51). The process of accumulation is reduced to generation of a small tail describing rarely occurring weakly polymorphic states (Fig. 5). As a result, the initial peak at 0 does not decay greatly and the steady state is reached in the same time as in deterministic selection (see “Deterministic dynamics and its boundaries” above) given by the inverse selection coefficient (1/*s*), i.e., faster than all timescales in the drift regime (equations 103 and 104).

The simulated stochastic dependence for this experiment is shown in Fig. 11. The process starts from the generation of a single allele, which tries to grow into a clone. The growth initially occurs under the condition that random drift is more important than selection. The maximum frequency that this clone can reach is determined by the characteristic mutant frequency at equilibrium, ∼1/(*Ns*) which corresponds to the clone size, 1/*s* copies (Fig. 5). Above this value, selection becomes the leading force and drift becomes a correction. Further growth of the deleterious clone cannot occur, and it soon becomes extinct. This appears as sparse peaks, the highest of which reach to the length of the “tail” of the probability density, 1/(*Ns*) (Fig. 5) (equation 48 or 50). The half-life of a mutant clone (width of a large peak) is the inverse selection coefficient. Note that the typical time interval between peaks, 1/(μ*Ns*), is longer than 1/*s*. The former time is the waiting time for a new allele that will be lucky to reach the size 1/*s*. The latter time is the time that the lucky clone actually spends growing and contracting before it becomes extinct again. The ratio of the two times, μ*N*, gives the probability of finding the population in a polymorphic state (the area under the tail in Fig. 5). As in the drift regime, all these estimates can be obtained from both the evolution equation (equation 101) and the more intuitive gene fixation approach (equation 84). For comparison, simulation of an accumulation experiment in the “selection” regime (μ N = 20 ) is shown in Fig. 12.

Divergence of separated populations and the time correlation function.The characteristic times of divergence of separated populations (Eq. 105) and the decay time of the correlation function (Eq. 106) are on the order of the inverse selection coefficient, 1/*s*. Both experiments show for how long, on average, the system “remembers” its previous random fluctuation. The answer: for the half-life of a typical mutant clone, before it becomes extinct. This is because separate clones appear, due to mutation, at independent random times.

Reversion (fixation of an advantageous variant).A reversion experiment, in which the initial population is uniformly mutant, behaves rather differently. Although the same scales for time and the minority allele frequency appear in this case, they have different meaning. As in accumulation, random drift and selection dominate in smaller and larger wild-type colonies, respectively. However, in this case, selection accelerates rather than hinders the growth of a new clone. The probability that a single wild-type allele will manage to grow to a size equal to the inverse selection coefficient, 1/*s*, is low, *s*. However, above this critical size, the rest of its growth will be carried out by selection in a deterministic manner, i.e., with a probability close to 1 and over the deterministic timescale, 1/*s* (see “Deterministic dynamics and its boundaries” above). Hence, the bottleneck of reversion is in reaching the critical size despite random drift; after that, a clone is likely to be fixed in the population. Stochastic dynamics below the critical size is the same as in the accumulation regime (selection is not important). The average waiting time for reversion to start is determined by the fixation probability,*s*, and by the frequency at which single alleles are generated in a population at each generation, μ*N*, which gives the time ∼1/(μ*Ns*), i.e., the same scale as the waiting time for a high peak in accumulation regime (Fig. 11) (equation107) (51). A few examples of reversion curves are shown in Fig. 13. Evolution of the probability density is shown in Fig. 14, including evolution of the density of polymorphic states (Fig. 14a) (equation108) and of the two probabilities of monomorphic states (Fig. 14b) (equation 107).

Sampling EffectsIn the previous sections, we analyzed random fluctuations of the mutant frequency within an ensemble of populations of infected cells of the same finite size. We have assumed that the value of mutant frequency, genetic distance, etc., for each population was measured accurately by counting the numbers of mutant and wild-type alleles in a sufficiently large sample of genomes. The genome samples used in real experiments are, of course, not infinitely large. Hence, the experimental estimate of any quantity is approximate and sample dependent. The sampling effects may distort the experimental results if the samples are too small. In this section, we calculate how large a sample of genomes we need to achieve a given accuracy of measurements. We focus on the intrapopulation genetic distance (*T*) defined above (see “Experiments on evolution and observable parameters”) for a separate nucleotide. To obtain an experimental estimate of the distance, one isolates a fixed number of sequences from the population, determines the number of nucleotide differences for each pair of sequences, and averages the result over all pairs. (This procedure is specific for our choice of the genetic distance.) The accuracy of the estimate is characterized by the relative error (ɛ), defined as the standard deviation divided by the average. The result is shown in Fig.15a (equation 116). This formula is quite general and can be applied to any regime or particular experiment on genetic evolution. For instance, for the maximum possible intrapopulation distance, T = 0.5 (in absolute units), which corresponds to the half-and-half variant composition at the base of interest, 25% accuracy is approximately reached at a sample size of 6 genomes and 10% accuracy is predicted for a sample of 14 genomes. As the polymorphism decreases, the sample size required increases quickly. To reach, e.g., a 20% accuracy of measurement at the genetic distance T = 0.095 (0.95 and 0.05 composition at the base), one needs to sample ∼500 genomes (of which 25 ± 5 genomes will be mutant). Hence, to study rare genetic variants, it is undesirable to simply count sequences: one needs to employ alternative methods of quantitation like selective PCR. Of course, as is done often, the genetic distance can be averaged over a large number of bases; this saves the sample size. Such a simple solution will not work, however, if one does not know whether the bases used for averaging evolve under similar conditions or if one is interested in a specific base.

Experimental design requires making an educated prediction of the appropriate sample size and measurement methods, and one therefore needs to anticipate the intrapopulation genetic distance, at least to within an order of magnitude. At the same time, the actual value of the distance fluctuates between populations and is not known before the measurement is made. Therefore, one has to use some sort of theoretically predicted typical distance. Making such a prediction is not trivial. The expectation value is not a good choice, since, deep into the stochastic regime, a population is most probably found in a monomorphic state at any given allele. The sample size has to be optimized with respect to polymorphic states. These states have a low probability: if a population is completely uniform at a site, any size sample will be monomorphic as well. We propose to use the representative average distance (*T*
_{rep}), which differs from the standard average distance in that it is averaged over polymorphic states only. (Experimentally, this can be accomplished by examining many sites and focusing attention only on the few that are polymorphic.) Quantitative differences between the two averages can be rather large. The expressions for the representative average distance in the steady-state population, for all three intervals of the population size, are shown in Fig. 15b (equation 118). One can see that the smallest distance and therefore the largest samples required correspond to the deterministic limit and the smallest samples correspond to the drift regime. This advantage of stochastic regimes is, however, canceled by a large number of different populations (or, at least, similar sites) needed for a representative assessment of polymorphism (roughly 1/μ*N* populations or sites). In the steady state, assaying many populations or sites can be traded for sampling the same population or site at many time points spaced farther than the genetic turnover time (see the discussions of the time correlation function in the previous two sections).

Experimental ApplicationsThe theoretical considerations presented here have useful and important implications for understanding the evolution not only of viruses but also of organisms generally. Their practical application, however, requires the use of appropriate experiments, designed both to test the validity of the theory and to then apply it to specific situations. In the following subsections, we present three examples of such applications. Other important experimental issues which are outside of the scope of our basic analytic review (phylogenetic studies, multilocus effects, etc.) will be briefly discussed in the next section.

Virological studies in vitro.Viruses replicating in cell cultures offer a convenient experimental model for studying evolutionary processes. Compared to more traditional genetic models (fruit flies and bacteria), the advantages are a relatively easy control and sampling of genotypes and of external conditions, short generation times, and, especially, high mutation rates (for RNA viruses). Application of the results presented in this paper and testing of their validity to these systems are rather straightforward. We list recommendations for two kinds of experiments.

Experiments on growth competition, aimed at comparing the fitness of two chosen genetic variants, are common in the virological literature (54-56, 58-60). They are typically carried out by mixing a majority of mutant virions with wild-type virus and monitoring the change in proportion as a function of repeated passage in permissive cells. The selective advantage(s) can then be estimated from the slope of the curve relating the mutant frequency, *f*, to the number of generations (Fig. 8). Note that the slope of the curve where it crosses 50% is independent of whether the experiment is carried out in the selection or the selection-drift regime. New spontaneous mutations may arise, changing the virus fitness and distorting results. This problem can be avoided if the population of infected cells is chosen in the selection-drift interval (Table 1), 1/*s* ≪ *N* ≪ 1/μ. Then, on the one hand, competition dynamics is almost deterministic, until one of the two subpopulations becomes very small. On the other hand, the time in which a mutation (advantageous for the virus but unwanted by the experimentalist) will appear, 1/(μ*N*s), is much longer than the measurement time, 1/*s*, required to resolve the two growth rates (assuming that all selection coefficients are on the same order and that the advantageous mutant allele is not present in the initial population, i.e., a single ex vivo clone).

In the opposite experiment, one starts from a monomorphic population and monitors how fast a new advantageous mutation appears and outgrows the old genetic variant. To shorten the waiting time (Fig. 13), the population size must be large, at least in the selection-drift regime. After a new colony exceeds the critical size imposed by the stochastic bottleneck (see “Stochastic dynamics: the drift regime” above), the dynamics is almost deterministic.

Based on our results, the evolutionary experiment is not a suitable way to measure the spontaneous mutation rate. For example, attempted measurements of changes in the mutation rate in bacteria due to changes in external growth conditions (adaptive mutation) (61) are difficult to interpret. First, the selection coefficient is affected by the change in external conditions as well, and this effect is likely to be more important than the change in the mutation rate. As one can show (second equation 62), in a deterministically large population, even a substantial change in the mutation rate causes only a slight shift in the reversion curve (Fig. 8). Only the selection coefficient can be reliably assessed in such an experiment. Second, such experiments depend on the details of the evolutionary model. Third, if the population is small (Fig. 13), the time dependencies of the mutant frequency will fluctuate between different cultures and the changes in the mutation rate cannot be detected due to statistical error.

HIV populations in vivo.Data on the evolutionary behavior and genetic diversity of HIV, if understood in sufficient detail, could reveal vital information about major biological factors acting on the virus population in vivo. Of particular interest are the relative roles of stochastic factors and selective forces, the role of purifying selection versus selection for diversity, and possible variation of wild-type sequence between individuals and tissues.

One application of this model is to use HIV genetic variation as a tool to probe the underlying size and structure of the infected cell population. There has been considerable controversy in the literature about the effective population size of HIV in a representative untreated patient. The concept of the effective size was introduced by Wright as a means of referring the intensity of genetic drift in a real population to that in an ideal Wright-Fisher population (80); i.e., the effective size of a real population is the size an ideal population would have if it also had the same rate of genetic drift as that observed in the real population. Another, perhaps more intuitive, way of thinking about the effective size is to consider the inbreeding effective population size, defined as the inverse of the probability that two randomly selected individuals have a common ancestor in the previous generation. (This is conceptually close to the crude “virological” definition of the effective population size as the number of productively effective cells that produce most of the virions that infect the next generation of productively infected cells.) If this probability of identity by descent is low, it stands to reason that the population size must be large, and vice versa. One begins to see how the (inbreeding) effective population size influences the genetic diversity. If the effective size is small, the probability of identity by descent is high, and there is consequently low genetic diversity, since individuals tend to be closely related. Both definitions apply in the presence of weak selection as well, e.g., for the model system shown in Fig. 1. There are several other measures of effective size, e.g., the variance effective population size and the eigenvalue effective population size, but these rarely give different values. Of course, the usefulness of all these definitions depends on the hypothesis that the actual population is not too far, in the sense of its evolutionary properties, from an ideal Wright-Fisher model with suitable parameters.

Assuming that selection is not important and the neutral model applies, a coalescence-based approach (see “Many loci and other aspects” below) has been used to estimate an effective size as small as 100 to 1,000 cells (45), much smaller than the total number of productively infected cells per patient, 10^{7} to 10^{9} (22). At least one other study reported similar values (66). However, other lines of evidence, including differences among rates of accumulation in different genes (74) and very high (44, 84) or very low (4, 5) ratios of synonymous to nonsynonymous mutations in some genes, imply that HIV populations are subject to significant selective influences. Therefore, population genetic methods that assume a lack of selection may yield erroneous results.

Two of us recently developed and applied a robust method to estimate the effective HIV population size in vivo based on the genetic variation at close pairs of highly diverse sites (67). As follows from the simulation examples above (Fig. 8, 9, and 13), a site cannot preserve a high diversity (f ≃ 0.5) indefinitely. Early in infection, the HIV population is almost uniform genetically or has a limited number of sequences, due to the bottleneck that occurs at transmission and to early competition between clones (12, 27, 46, 86). Therefore, highly diverse sites are sites that are caught in the act of “reversion” from mutant to wild type (i.e., of advantageous substitution). The basis for this test was to select two such sites, A(a), and B(b), where the lowercase and capital letters denote mutant and wild type, respectively, and then classify all sequences in the population into four groups (haplotypes): ab, Ab, aB, and AB. During reversion, the population starts from an almost uniform haplotype ab and arrives at an almost uniform haplotype AB. The two other haplotypes are transient. The idea of the test is that, deep in a stochastic regime and given a limited sample size, one of the four haplotype groups will be empty at any time, because the time at which reversion ensues is random. Suppose that the population is deep in the selection-drift regime. Two sites revert typically at different random times, even if their selection coefficients are equal (Fig. 13). Nearly simultaneous reversion can happen accidentally. In all cases, as can be shown, the number of well-represented subclones (i.e., the number found in a sample of the usual size [10 to 30 clones]) is typically two, rarely three, and much more rarely four, at any time point.

Using sequence databases for HIV *pro* and *env*genes from drug-naive individuals (27, 44), we found that this effect is absent for close pairs of bases. We checked that this effect is not sensitive to variation of the initial genetic composition and some other factors assumed in the model and estimated the effect of recombination (derived from kinetic data) on the test to be numerically small as well. We therefore were able to conclude that a steady-state HIV population in an untreated individual, with respect to evolution of separate bases, is either in or at the border of the deterministic interval of population sizes.

Some authors considered a possibility that an HIV population may consist of weakly connected small populations. Shedding viruses from these subpopulations into the peripheral blood could explain the presence of all four haplotypes in the above test, in apparent contrast to our conclusion. Indeed, HIV-infected cells are located within lymphoid tissue in visually distinct islands (64). However, different islands exchange virions and infected cells and may or may not be weakly connected genetically. The strong overlap of the island patterns obtained for different virus strains (64) proves that the island structure is due to nonuniform distribution of infectible cells rather than to random seeding by the virus. Next, estimates based on studies of the clearance rate of free virus from peripheral blood (85), on HIV RNA quantitation in the lymphoid tissue (22), and on the decay rate of infectious virus titer under highly active antiretroviral therapy (62) suggest that a considerable portion of virus particles produced in the tissue drain into the blood (within a few hours or less, from where they are removed within a few minutes). This implies that a good portion of virus particles infect cells far from the cells that produced them, suggesting strong virus transfer between the islands within the same tissue. On the other hand, viruses isolated from some locations (semen and the central nervous system) show phylogeny distinct from that of the main virus reservoir in the body. Genetic sampling from different islands could clear the issue (28).

Given a relatively weak role of stochastic effects detected, we decided to test which factors shape evolution in the HIV protease (*pro*) gene (68). Using the same database of*pro* sequences, we observed that variation was restricted to rare bases: an average base was variable in about 16% of patients. The intrapatient distance per individual variable site, 27%, was similar for synonymous and nonsynonymous sites, although synonymous variable sites were twice as abundant, implying that purifying selection is the dominant kind of selection. We explained these facts within the one-locus model of evolution by assuming deterministic evolution within individuals and random sampling during the transmission between individuals. We considered different variants of the model with transmission of one and several genomes and with coinfection from independent sources. The model explained the variable sites as slightly deleterious mutants that are slowly being replaced with the better-fit variant during individual infection. In the case of a single-source transmission, genetic bottlenecks at the moment of transmission effectively suppress selection, allowing mutants to accumulate along the transmission chain to the high levels observed. However, we found that even very rare coinfections from independent sources are able to counteract the bottleneck effect and keep mutants at low levels. If such coinfections occur, the plausible explanation of the high level of mutants in an inoculum is variation of the best-fit sequence between individuals due to variation in the specific immune response, combined with coselection. In this model, variation in *pro* is due to a cascade of mutations compensating for early antigenic escape mutations. Note that our analysis was restricted to the single-locus approximation (see below).

General applications.The progress of evolution, in general, may be limited by the time it takes for a new advantageous (in our notation, wild-type) allele to appear and become fixed in the population. Figure 16 shows schematically the time required to reach a 50% composition as a function of the population size. An interesting conclusion about the relative role of selection and randomness follows from this diagram. The reversion half time depends on the population size. The shortest time, given mostly by the inverse selection coefficient, is reached in the deterministic limit, *μN* >> 1. This implies population sizes larger than 10^{8} to 10^{9} genomes for DNA viruses and bacteria and 10^{11} to 10^{12}for higher organisms. Such a population size exceeds the total size of any species higher on the evolutionary scale than insects. On the other hand, a mutation in a small population with a size smaller than the inverse selection coefficient (drift regime, in our terms) would be fixed only after a number of generations given by the inverse mutation rate, 1/μ, corresponding to timescales of planetary development. Put another way, nucleotides with a very small selective advantage compared to the inverse population size evolve very slowly. The characteristic values of selective advantage for the most important mutations (in an evolutionary sense) are unknown. Still, the above considerations suggest the possibility that the evolution of higher organisms may be driven, mostly, by nucleotides with *s* larger than the inverse population size (provided that they are sufficiently frequent in the genome), i.e., within the selection-drift interval (Table 1). Then the rate of evolution does not have to be unreasonably low and the size of a population does not have to be unreasonably large. If this is the case, one can conclude that the two factors, random drift and selection, are equally important for the rate of evolution on very large timescales (millions of years) and that neither is a small correction. The reversion (fixation) half time within the interval 1/*s* < N < 1/μ is ∼1/(μ*Ns*).

Many Loci and Other AspectsThe distinction between deterministic and stochastic evolutionary genetic processes is critical—under a deterministic regime, the fate of a novel mutation or allele can be known with certainty; under a stochastic regime, we are able to characterize only the statistical properties of allele frequencies and fixation times. The neutral and the deterministic cases represent the ends of a spectrum, and precisely where a population sits depends most strongly, on its effective size. However, much of the theory relating effective population size and the stochastic-deterministic continuum had been worked out for simple models involving two (as in the present review) or, at most, a handful of alleles (37). These developments have allowed population geneticists to characterize the qualitative behavior of genes in populations undergoing a variety of dynamic processes, e.g., population size change, subdivision, and selection. Nonetheless, the assumptions and restrictions of these simple models typically preclude their use as descriptors of real populations, except in some particular cases (below).

In 1982, Kingman (39, 40) introduced a new way of studying the stochastic behavior of genes in a population. His framework—the coalescent—characterizes the genealogies of genes (or gene fragments), specifically the statistical distribution of times to common ancestors. It assumes the neutral model of evolution so that phylogenetic branching and accumulation of mutations are independent processes. Just as we estimated above (see “Decay of the polymorphic state and gene fixation”), the average time to the most recent common ancestor of a sample of genes is, within a numerical factor which depends on the sample size, the effective population size. The coalescent incorporates the same information as allele-based methods but in a form more relevant for today's evolutionary geneticist. This is largely because the raw data of molecular evolutionary studies in the last 15 years have been the molecular sequences, and there is an extensive and well-developed literature on molecular phylogenetic reconstruction (reference 65 and references therein). Studies on the coalescent have also shown that there is an increase in the power of parameter estimation as more independent loci are analyzed. Selection—for a long time the thorn in the side of the coalescent theorist—can, in principle, be accommodated within a coalescent-like framework. The recently developed inclusion of selection in a genealogical framework (42, 55) requires the construction of an ancestral selection graph, akin to a coalescent genealogy, which has the unusual property of coalescing and splitting as one moves back in time. The reader has to keep in mind, though, that a mathematical theory on a network (the case with selection) is technically much harder to handle (with or without computer simulation) than a theory made for the tree topology as the neutral coalescent. The tree theory, but not the network theory, can be reduced to one-dimensional chain of equations. (Similar issues arise in physics, in theoretical studies of hopping transport of electrons [63, 72].) Still, this recent achievement should stimulate the development of novel methods that work with selection as well.

In some cases, recombination or point mutations break linkage disequilibrium and make loci almost independent, so that the one-locus approximation applies directly. One such case is when strong recombination is present in the system and the variable loci are spread far apart (for HIV, the respective length is predicted to be around 100 nucleotides or longer, if superinfection protection is efficient [67]). If the recombination rate is low, Muller's ratchet (6, 13, 14, 17, 53) may operate. With Muller's ratchet, random drift can lead to the elimination of the fittest genomes from a population. Once a better-fit haplotype is accidentally lost from a recombination-free population, it cannot reappear, thus clicking the ratchet another notch. Successive “clicks” cause the population to become successively less fit, on average. Back mutations can, in principle, restore the disappearing better-fit haplotypes. For a long segment of genome of a replication-competent virus, they are expected to be much less likely than the forward mutations, since the frequency of deleterious substitutions is expected to be low. As follows from Monte Carlo simulation, which we hope to discuss elsewhere, back mutations can prevent Muller's ratchet only for sites with large selection coefficients, s = 0.1 to 0.2, and provided that the gene segment is short.

Another important consideration is that selective pressures on some parts of the genome must also have an effect elsewhere. Such “background” selection may explain the very small effective size of HIV obtained from the *env* gene diversity under the neutral approximation. Charlesworth and colleagues (7, 8, 57) have shown that if an unsequenced region of a genome is under selection and is linked to the region under study, depressed estimates of effective size are obtained. Potentially, then, selective pressures on*gag* or *pol* can influence diversity in*env*. However, linkage disequilibrium obviously has an effect beyond simply lowering the effective size, and it speaks to the issue of fitness at the unit of the individual virion. If there is linkage disequilibrium, does it really make sense to look for the effects of immune-driven selection only in *env* or *gag*, as many studies have done (3, 70, 83)? To what extent is the fitness advantage of mutations in *env*, for instance, balanced by the loss of fitness due to mutations in *pol*? Such “interaction” between loci in a small population may be caused by linkage alone and applies even to mutations that additively affect the fitness of a genome. To make things more complex, nonadditive compensatory mutations (epistasis) exist, due to actual biological interaction between loci, at both the nucleotide and protein levels. Moreover, in vesicular stomatitis virus systems, epistasis may be the factor counteracting the loss of fitness due to Muller's ratchet (16). Compensatory mutations, which become advantageous only after initial resistance mutations occur, have also observed in HIV-infected patients treated with protease inhibitors (11). The development of molecular techniques that allow full-length HIV genomes to be sequenced (69) means that it is only a matter of time before genetic data become available to study the evolutionary processes of linkage disequilibrium, background selection, and compensation in infected individuals.

ConclusionsWe have analyzed in depth a broad range of problems in evolutionary dynamics in the framework of a simple one-locus, two-allele population model, which includes three basic factors: random drift, point mutation, and selection. We found that (as long as the mutation rate is lower than the selection coefficient) the dynamic properties differ drastically in three wide intervals of the population size that we call the drift, selection-drift, and selection regimes. Transition between stochastic and deterministic behavior of genetic evolution occurs in the intermediate selection-drift regime, which is expected to be very wide in the population size, especially for DNA systems. In this regime, deterministic laws govern genetically highly polymorphic populations, and almost uniform populations evolve stochastically.

Estimates of typical population sizes and of the time in which new advantageous alleles appear and become fixed in the population suggest that higher organisms may evolve while in the selection-drift regime. If this is the case, the speed of evolution depends on three parameters: mutation rate, selective advantage, and population size. Hence, selection pressure and random drift, whose relative importance for evolution is often disputed in the literature, are equally important, although they act differently: selection promotes evolution, and random drift slows it down.

The theory provides recommendations for the size of the population in different bacteriological and virological experiments in vitro aimed at either comparing the fitness of different mutants or measuring the mutation rate. For HIV populations in vivo, theory based on the purifying selection alone predicts either a weak diversity or a very low genetic turnover rate. Experimental searches for rapidly varying bases can provide biological evidence for selection for diversity due to different environments, a changing immune response, changes in host cell populations with time, and other important aspects of HIV infection.

Naturally, with any research program that requires theory to be integrated with data, there is an inevitable tension between experimental biologists, who deal daily with the complexity of real biological systems, and theoretical biologists, who “simplify, simplify, simplify” in the name of tractability. In this work, our analysis has been limited to the simplest possible case: evolution of a single locus with only two alleles. Many important aspects of evolution, including the effects of multiple loci, recombination, coselection, and migration, were not considered. Nevertheless, in-depth consideration of this simple system has yielded a surprisingly rich set of results, which should be very useful for the design of experiments in evolution and for the interpretation of patterns of genetic variation in natural infection. In the future, however, we see greater reliance on the fusion of analytic and computational methods as a means of simulating the complexity of real populations. By tying these computer-intensive methods to well-characterized mathematical and statistical methods, one has the advantage of using standard inferential procedures without sacrificing too much in the way of realism. However, the old adage that one has to walk before one can run applies to population genetics as it does elsewhere, and understanding simple evolutionary models is perhaps the surest route to coming to grips with the complexity of virus evolution.

## MATHEMATICAL RESULTS AND DERIVATIONS

Description of the Model and the Evolution EquationIn this section, we will derive the diffusion-type differential equation and complement it with the boundary conditions. First, we derive the discrete Markovian equation for the virus population model; second, we reduce it to the continuous diffusion equation; and third, we determine the boundary conditions for the diffusion equation, in different intervals of the population size. In other sections, we will solve the appropriate set of equations and boundary conditions for each interval of *N* and different initial conditions. Table2 contains a list of the principal notation used in this section.

Main results.We show that the stochastic evolution of the virus population is described by one of two different sets of differential equations and boundary conditions, depending on the interval of the population size, *N*. A large population, μ*N* ≫ 1/ln *N*, usually has many copies of both the wild-type and mutant genomes. The corresponding evolution equation and the boundary conditions have the form (32)*f* is the mutant frequency, ρ is the probability density, *t* is the generation number (time), and*s* is the selection coefficient. Equations 1 to 3 are valid under the conditions *s* ≪ 1, μ ≪ 1, and μ*N*≫ 1/ln *N*. In this case, *t* and *f* can be treated (approximately) as continuous variables. Effects of the three terms in the right-hand side of the evolution equation given by equations 1 and 2 are illustrated in Fig. 2.

A useful analogy between the probability density and the density of a gas between two walls is discussed in the qualitative section of this review (Fig. 3). In this analogy, equation 1 expresses the fact that gas particles do not appear or disappear but only travel from one location to another. The quantity *q(f,t)* in equation 2 is the “probability flux,” analogous to the gas flux density defined as the net number of particles crossing a plane at *f* from left to right per unit area per unit time. Thus, the evolution equation written in the form of equation 1 expresses the fact that the probability density is a locally conserved quantity, just like a gas density. The boundary conditions in equation 3 state that the probability flux vanishes at the boundaries of the allowed interval in*f*, similar to gas particles being prohibited from crossing the confining walls (Fig. 3b).

In small populations, where N ≪ 1/[μ ln(1/μ)] (see below), the population can be found, with a finite probability, in a purely monomorphic state, of f = 0 or 1 (similar to condensation of gas at cold walls). In this interval, we break up the total probability density into a sum of the continuous probability density and of two singular terms, as given by*p*
_{0} and*p*
_{1} are the probabilities of having pure wild-type and pure mutant, respectively; δ(*f*) denotes the Dirac delta function; and *g*(*f,t*), where*f*(1 − *f*) ≫ 1/*N*, is the continuous part of the probability density. The boundary conditions are*f = *1*/N* or (N − 1)/N , respectively, can occur due to a mutation only. This pair of equations has to be derived from the first principles, i.e., the discrete virus population model (below). The differential equation for the continuous part of the probability density, *g(f,t)*, has a form*N*(equations 1 and 2), in that the term with μ in equation 2 is absent. The mutation rate enters the problem through the boundary conditions (equation 6) only.

One can easily obtain the upper bound for *N*, within which the above boundary conditions apply, from the boundary conditions themselves. The probability of a polymorphic state is given by ∫_{0}
^{1} *g(f)df*. As follows from equation 6, near the boundaries *g(f)* diverges and is given by *g(f)* ∼ 2μ*N p _{0}/f* and

*g(f)*∼ 2μ

*N p*

_{1}/ (1 − f) . The integral of

*g(f)*is mostly contributed from f ≃ 1 or 0 and is truncated at f(1 − f) ∼ 1/

*N*. The resulting probability of a polymorphic state is comparable to the probability of a monomorphic state, p

_{0}+ p

_{1}, at μ

*N*log

*N*≃ 1, as we stated above.

Note that the validity of these equations, as we discussed in the qualitative section of this review, is not restricted to the virus population model. The same diffusion equation (equation 1 and 2) applies to many other haploid one-locus, two-allele populations, which include the same three factors: random sampling of genomes, symmetric mutations, and purifying selection. Should some other factors come into consideration, such as allelic dominance in a diploid population or time fluctuations of selection coefficient or of other parameters, a more general equation of similar form can be written (see “stochastic equation of evolution” below) (33). In principle, the approach can be generalized for many-allele or multiple loci using partial derivatives in haplotype frequencies (37).

Now we proceed with derivations of all these formulas.

Virus population model.The model (as discussed in the qualitative section of this review) considers an asexual population of*N* cells infected with two genetic variants of a virus:*n* cells are infected with “mutant” virus, and*N* − *n* cells are infected with “wild-type” virus. The total population size *N* is fixed, while *n* changes in time. During a generation step, each mutant-infected cell produces *b*
_{1} mutant virions and then dies, and each wild-type-infected cell produces*b*
_{2} wild-type virions and dies (Fig. 1b). The respective numbers of virions per cell, *b*
_{1} and*b*
_{2}, are assumed to be large,*b*
_{1} ≫ 1, *b*
_{2} ≫ 1, and differ slightly for the two alleles*s*, *s* ≪ 1, is, by definition, the selection coefficient (mutation cost), reflecting the difference in fitness. From all the virions produced per generation, *N*virions are sampled randomly to infect new generation of cells. Each virion, on infecting a cell, can mutate into the opposite genetic variant with a probability μ, μ ≪ 1. The virus population model described is a particular case of the Wright-Fisher population with discrete time.

(i) Discrete Markovian equation.Let *p(n,t)* be the probability of*n* mutant cells at time *t*, where *t* is an integer that numbers generations and *n* can change from 0 through *N*. If consecutive generations do not overlap,*p(n,t)* is a Markovian process described by a discrete evolution equation*P(n‖n′)* is the conditional probability of having *n* mutants, given that their number at the previous step was *n*′. In this section, we derive *P(n‖n′)*for the virus population model introduced in the qualitative section of this review.

First, we obtain the conditional probability *P(n‖n′)* in equation 9, neglecting mutation events and denoting the result*P*
_{0}
*(n‖n′)*. Suppose that the number of mutants in some generation is *n*′. The total numbers of virions produced by all mutant- and all wild-type-infected cells are*b*
_{1}, *b*
_{2} >> 1 (above). If*n* is the number of new mutant-infected cells, then the numbers of mutant and wild-type virions which infect must be*n* and *N* − *n* virions, respectively. The probability of *n* new mutant cells,*P*
_{0}(n‖n′), is proportional to the number of possible ways in which one can choose *n* mutant virions from *B*
_{1} possible mutant virions and*N* − *n* wild-type virions from*B*
_{2} possible wild-type virions*A* is determined by the condition Σ_{n}
*P*
_{0}(*n‖n′*) = 1.

We now take mutations into consideration. Suppose, at the moment of infection of new cells by *n* mutant and *N* −*n* wild-type virions, *m*
_{1} forward and*m*
_{2} reverse mutations occur (Fig. 1b). The resulting number of mutant-infected cells, *n*′′, will be*n*′′ = *n* + *m*
_{1} −*m*
_{2}. The probability of *m*
_{2}reverse mutations among *n* infecting virions, if *n*is large, is given by Poisson statistics with the average μ*n*
*n* is not large, equation 12 still is valid for *m*
_{2} = 0 and 1, which are the only important values in this case, since we assume μ ≪ 1 everywhere in the present work.) Analogously, the probability of*m*
_{1} forward mutations is π(*m*
_{1}‖*N* − *n*). As a result, for the conditional probability*P*(*n*′′‖*n*′), we obtain*P*
_{0}(*n*‖*n*′) is given by equation 11 and the Kroneker symbol δ_{i,j} is 1 if i = j and 0 otherwise.

(ii) Diffusion equation limit.The discrete evolution equation given by equations 9, 11, and 13 may be suitable for computer simulation, but is very inconvenient for analytic treatment. In addition, it contains many model-dependent details which are not important at long timescales and in large populations and could be best disregarded. When both subpopulations are large (*n* ≫ 1 and*N* − *n* ≫ 1), equation 9 can be transformed to a differential form. In this case, the conditional probability,*P*(*n*‖*n*′), changes slowly in*n* and *n*′, ‖*P*(*n*‖*n*′) −*P*(*n* + 1‖*n*′)‖ *≪ P*(*n*‖*n*′), ‖*P*(*n*‖*n*′ + 1) −*P*(*n*‖*n*′)‖ ≪*P*(*n*‖*n*′), and can be approximated by a continuous function of *n* and *n*′. Substituting the asymptotic formula, *n*! ≃ (2π*n*)^{1/2}(*n/e*)^{n}, into equation 11, we find that*P*
_{0}(*n‖n′*) has a narrow maximum at*n′ ≃ n*. Rewriting*P*
_{0}(*n*‖*n′*) = exp [ln (*P*
_{0}(*n*‖*n′*))] and expanding the argument of the exponential in *n′* up to the second-order terms in *n′* − *n*, we obtain*n* − *n′*‖ ≪ min (*n′*, *N* − *n′*), which confirms that *P*
_{0}(*n*‖*n′*) can be considered a smooth function of its variables.

Note that the style of the last paragraph is the approximate derivation “with a large parameter,” standard for theoretical physics. In this case, large parameters are *n, N, 1/s*, and 1/μ. One makes a guess that the function *p*(*n‖n′*) is smooth and verifies its consistency later, after the result was obtained. To avoid a vicious circle, one checks alternative assumptions as well. A more rigorous way, usual in population biology, would be to evaluate probability *p*(*n, n′*) in the limit N → ∞, μ → 0, s → 0 while μ*N* and *sN* are constant. In our experience, the method we employ almost always leads to a correct result and is easier to use, especially when there exists an independent verification of the result.

We now obtain a simplified expression for the full conditional probability in equation 13 using the fact that mutations are rare (μ ≪ 1), so that the probable values of *m*
_{1} and*m*
_{2} are much smaller than those of*N* − *n* and *n*. We substitute*n"* − *m*
_{1} +*m*
_{2} for *n* in the arguments of both π functions and of the function *P*
_{0} in the right-hand side of equation 13 and expand these functions in*m*
_{1} − *m*
_{2}, up to the first-order terms. The resulting double sum in*m*
_{1} and *m*
_{2} can be evaluated exactly with the use of equation 12, which yields, within the same accuracy,*p(n, t)*, with one exception discussed below separately, is a smooth function of *n*, it will be more convenient, from now on, to consider the probability density *p(f,t)* of the mutant frequency *f* ≡*n/N*, normalized by the usual condition ∫*df ρ(f,t*) = 1.The two definitions are related, as given by ρ(*f,t*) = *Np(Nf,t)*. In terms of ρ(*f,t*), the evolution equation, given by equations 9, 14, and 15, can be written as*M*(*f*) and *V*(*f*), introduced in equations 18 and 19, have meanings of the expectation value and of the variance of change in*f* per unit time, respectively. One can present them in the general form

As shown below, the integral equation, (equations 16 and 17) can be transformed to the Kolmogorov forward equation

We will derive equation 22 from equations 16 and 17 in a general form, without specifying the model-dependent parameters *M* and*V* from equations 1 and 2. We will assume that*V*(*f*) ≪ 1 and that the higher momenta(̅f̅ ̅−̅ ̅f̅′̅)̅3̅
̅,(̅f̅ ̅−̅ ̅f̅′̅)̅4̅
̅, … of the conditional probablity Π_{ε}(*f*‖*f′*) are proportional to powers of ε higher than 1. These assumptions are valid for our model, as can be checked using equations 17 to 19 (see “Virus population model” above). The derivation that follows is well known.

Consider an arbitrary function, *A(f)*, localized in the interval 0 < *f* < 1 far from its ends, so that*A(f)* and its derivative vanish at *f* = 0 and*f* = 1. We also introduce the expectation value*A(f)*and integrating in *f*, we have_{ε}(*f*‖*f′*) in terms of*f* − *f′* is small, we are allowed to expand*A(f)* in the integrand in equation 24 in a series of*f* − *f′*. Evaluating the resulting integral over *f* and discarding terms of higher than first order in ε, we get*A(f′)* can be neglected due to the above assumption. Evaluating the integral over*f′* in equation 25 by parts and using the definition of the time derivative, we get*A(f′)* = δ(*f′* − *f*). Here the width of the “delta function” is assumed to be much larger than*V(f)* but much smaller than the characteristic values of*f* at which the density function ρ(*f*) changes noticeably.

Boundary conditions: properties of an almost monomorphic population.In a large population, *N* ≫ 1/μ, the boundary conditions have a form 3, which states that the probability density flux *q(f,t)*, defined as such by equation 1, must vanish at the boundaries, *f* = 0 and *f* = 1. This follows from the continuity conditions at the boundaries and from the understanding that monomorphic states, in which *f*is exactly 0 or exactly 1, are very unlikely to occur when the population is large due to a high mutation rate per population (*N*μ ≫ 1). As we show now, the flux does not vanish at the boundaries in smaller (but still very large) systems.

Suppose, first, that boundary conditions in equation 3 do apply. As follows from equations 2 and 3, the function ρ(*f,t*) diverges near the boundaries at μ*N* < 1/2. Indeed, solving the equation *q(f,t)* ≡ 0 near *f* = 0 and near *f* = 1, one obtains*C*
_{0} and*C*
_{1} are constants. Integrating the first of equations 27 from *f* = 0 to *f* ∼ 1/2 and the second from *f* ∼ 1/2 to *f* = 1, one finds that the region of *f*, 1 − *f*, such that ln (1/*f*) or ln [1/(1 − *f*)] ∼ 1/μ*N*, contributes most to the normalization integral. If the population is not too small, *N*μ ln *N* ≫ 1, these values of *f* correspond to many copies of a minority allele, *f*, 1 − *f* ≫ 1/*N*. Therefore, for the probability of monomorphic states we have*p*
_{0}, *p*
_{1} ≪ 1. The boundary conditions given by equation 3 apply. If, however, the population is small, *N*μ ln *N* ≪ 1, the most probable values of *f* are in the region *f*, 1 − *f* ≪ 1/*N*, corresponding to much less than one minority copy per entire population. The above result indicates that the population can be found, with a finite and even high (close to 1) probability, in a state in which *f* is exactly 0 or 1. To account for this fact, we separate singular terms in ρ(*f*), as given by equation 4, and obtain new boundary conditions. Since we now have two more time-dependent variables, *P*
_{0}and *p*
_{1}, unlike in the case with a large*N*, we will need four rather than two conditions at the boundaries. The first pair of equations (equations 5) describe the continuity condition at the boundaries. We now derive the second pair.

We return to the discrete probability notation, *p(n)*. It suffices to consider only one of the boundary regions in *n*, say, *n* ≪ *N*; the conditions for the other region, *N* − *n* ≪ 1, are analogous. Therefore, the probability has two components: a large value*p*(0,*t*) ≡ *p*
_{0}(*t*) and a relatively small part *p(n,t), n*≠ 0, which changes slowly with *n* at *n* ≫ 1. [Strictly speaking, *p(n,t)* is diverging as 1/*n*as *n* → 0 (equation 27). Divergence of the integral ∫_{n} *p(n,t)dn,* however, is only logarithmic at best, which is sufficiently slow for what follows.]

We start by simplifying equations 11 and 13 for*P*
_{0}(*n*‖*n′*) and*P*(*n*‖*n′*). Using the condition*n* ≪ *N*, equation 11 gives*n* ≪ *N*) allows one to neglect the reverse mutations, keeping only terms with*m*
_{2} = 0 in equation 13. Next, the condition of small μ*N* means that even a single forward mutation per generation per entire population is a rare event. Hence, one can discard in equation 13 all terms with *m*
_{1} ≥ 1, except the term with *m*
_{1} = 1 for particular values *n"* = 1, *n′* = 0. The latter term has to be kept since the transition between states with *n*′ = 0 and *n*" = 1 cannot occur by means other than a mutation. As a result, the expression for*P*(*n*‖*n′*) simplifies to*x,t*)*x* < 1, and φ(*x,t*) is a sum over the continuous part of*p(n,t)* for *n* ≠ 0 only. The evolution equation for ϕ(*x,t*), which can be obtained from equations9 and 28 to 30, has a form*n* in*p(n,t)* for *n* ≠ 0 are large (*n*≫ 1). Hence, as one can see from equation 30, the characteristic scale of *x* for function ϕ(*x,t*) is small (*x* ≪ 1). We expand the right-hand side of equation 31 to the second-order terms in small *x*, which yields*p*
_{0} + φ and made use of the strong inequalities *s* ≪ 1, μ*N* ≪ 1, and φ ≪*p*
_{0}. We notice that at *x* ≪ 1, the function φ(*x,t*) represents the Laplace transform_{x}, one can rewrite equation 32 in the form*q(n,t)* is the probability flux*q* in equation 7 at *f* ≡ *n/N* ≪ 1. It is important that the probability function, *p(n,t)*, together with its derivatives at *n* ≠ 0, is a nonsingular function of *n*, meaning that it does not contain a delta function or its derivatives. The singular part of the probability function is already separated in the term *p*
_{0}. As is well known, a Laplace transform ℒ_{x} of a nonsingular function cannot be constant and cannot increase with*x* in the limit of large *x*. Therefore, both bracketed terms in equation 34 must be zero, and we arrive at the boundary conditions at *f* → 0 (equations 5 and 6). Since the left-hand side of equation 34 turns out to be zero, the argument of the Laplace operator in braces is zero as well. This yields the differential evolution equation (equation 7) at *f* ≪ 1. The boundary conditions at another boundary, *f* → 1 (*n* → *N*), are obtained in the same manner.

Experiments on Evolution and Observable ParametersIn this section, we define rigorously the observable parameters introduced in the qualitative section of this review.

Let parameter *A(f)* be a deterministic function of the random mutant frequency *f*. The expectation value of *A,*, and the variance,

*V*, are defined by

_{A}*V*is equal to the square of the standard deviation of parameter

_{A}*A*.

The intrapopulation genetic distance*f* differ at a given nucleotide; it varies between 0 and 0.5 (Nei's nucleotide diversity measure). The expectation values of parameters *f*and *T* are given by equations 36 and 37, with *A(f)*= *f* and *A(f)* = 2*f*(1 −*f*), respectively. The variance *V _{f}
* and the average

*are related:*T ¯

*T*

_{12}, which is defined as the probability that a pair of genomes sampled from two different populations with mutant frequencies

*f*

_{1}and

*f*

_{2}differ at a given nucleotide, and the relative distance between populations,

*D*, as follows:

*= ( ̅f̅ ̅*D ¯

_{1}−

f ¯

_{2})

^{2}+

*V*

_{f1}+

*V*

_{f2}. Other definitions for the genetic distance between populations have been proposed in the literature (see reference 54 and references therein).

Consider now the genetic divergence experiment. Suppose that two populations have been isolated, at *t* = 0, from the same parental population, which was at steady state. New populations are then allowed to grow quickly to the original size, so that their composition does not change from the (random) composition of the parental population (*f* = *f*
_{0}). Our aim is to monitor how the average relative distance between population,
̅D̅
̅, evolves after the moment of split. The expectation value,
̅D̅
̅
*, *is given by_{ss}(*f*
_{0}) denotes the steady-state probability density and ρ(*f,t*‖*f*
_{0}) is the probability density, which satisfies the initial condition*V _{f}
*(

*t*‖

*f*

_{0}) is the variance of

*f*, defined by equations 36 and 37, with

*A(f)*=

*f*and under the initial condition of equation 43. The variance

*V*(

_{f}*t*‖

*f*

_{0}) varies from 0 at

*t*= 0 to its equilibrium value

*V*

*t*= ∞. Note that equation 44 reduces the relative distance between two population to the properties of one population.

Consider now a single steady-state population. Random variation of*f* in time can be characterized by the time correlation function*t* = 0 in equation 45 is arbitrary since the system is at steady state. The function *K(t)* varies from 1 at *t* = 0 to 0 at*t* = ∞. The characteristic timescale of the random fluctuations, *t*
_{trn} (the genetic turnover time) is defined by *K(t*
_{trn}) = 1/e. The correlation function *K(t)* can also be expressed in terms of the expectation value*
*(

*t*‖

*f*

_{0}), as given by

*(*f ¯

*t*‖

*f*

_{0}) is defined by equation 36 with

*A(f)*=

*f*and under the initial condition given by equation 43.

Steady StateIn this section, we derive the general expression for the probability density in the steady state and discuss in detail the crossover between stochastic and deterministic behavior in two cases: in the neutral case (*s* ≪ μ) and in the opposite limit (*s* ≫ μ).

General case.From equations 1 and 3, which apply at large population sizes, *N* μ >> 1/ln *N*, and the condition of steady state, ∂*ρ*/∂*t* = 0, we obtain*q(f)* given by equation 2. Separating the variables *f* and *p* in the obtained differential equation and integrating both sides, we get (80)

As discussed above in the section on boundary conditions, at small*N* such that *N*μ ≪ 1/ln *N*, the probability density has singular components, equation 4, and obeys equations 5 to 7. From the steady-state conditions, ∂*g*/∂*t* = 0 and d*p*
_{0}/dt = *dp*
_{1} = 0, we obtain, again equation 47, with *q(f)* given by equation 2with μ = 0, which yields*p*
_{pol} ≃ 1 is the total probability of having a polymorphic population.

At small μ*N*, both forms of probability density (equations48 and 49), have singularities at f = 0 and f = 1 , although the singularities are of different kinds. The two forms happen to be, to some extent, inter-changeable in the entire interval *N* ≪ 1/μ. For example, equations48 and 49 can be shown to have, within relative error of ∼μ*N*, the same lower momenta of the density function: the expectation value, variance, etc. The form of equation 49, although less compact, is generally more convenient to use at *N* ≪ 1/μ. The form of equation 48, on the other hand, applies at μ*N* > 1 as well and is suitable for studies of transition to the deterministic limit.

Neutral case: *s* ≪ μ.We start from a simple case, *s* ≪ μ, when mutations can be considered neutral. We will use the form of equation 48 for the probability density, since we want to describe both small and large populations. Setting s = 0 in equation 48 and normalizing the resulting expression, we get (80)*x*) is the Euler gamma function, Γ(*x*) = ∫_{0}
^{∞} *dt t ^{x−1}e^{−t}
*, and we used the identity (1)

*N*is shown in Fig. 4.

The expectation values and variances of mutant frequency, *f*, and intrapatient distance, *T*, which can be obtained from equation 52 and the definitions in equations 36 to 38 are as follows:*f* in equations36 and 37, we used the identities in equation 53 and Γ (x + 1) = *x*Γ(*x*) (1). For small populations (μ*N* ≪ 1), equations 54 yield well-known results of the neutral theory:*V*
^{1/2}/
̅f̅
̅, is on the order of 1 at μ*N* ≲ 1 and small in the deterministic limit, μ*N* ≫ 1 (Fig. 6b).

Case with selection: μ ≪ *s*≪ 1.As in the neutral case considered above, the probability density ρ(*f*) shrinks as *N*increases. However, selection, accounted for by the factor exp(-2*Nsf*) in the right-hand side of equations 48 and 50, causes asymmetry of ρ(*f*). Another important difference is the appearance of an additional asymptotic interval in *N*(selection-drift regime in Table 1).

For the smallest populations, *N* ≪ 1/s (drift regime), we neglect *s* in equation 48 and arrive at the results obtained above for the neutral case. In the opposite limit, *N* ≫ 1/μ (the selection regime in Table 1), the probability density (equation 48) has a sharp maximum near *f* = μ/*s*. Expanding ln *ρ*
_{ss} near its maximum, one obtains a Gaussian curve (29)*s*, is the deterministic steady-state value of the mutant frequency (see below). Expectation values and variances of *f* and *T*(equations 36 to 38) are given by*s*≪ *N* ≪ 1/μ), the probability density is not narrow, since equations 58 yield *V _{f}
* ≫
̅f̅
̅. At the same time, one is not allowed to neglect selection by putting s = 0 in equation 48 or equations49 and 50. In this interval, we will analyze

*ρ*

_{ss}(

*f*) using the form given by equations 49 and 50, which can be shown to give asymptotically correct lower momenta even at 1/ln (1/μ) ≪

*N*μ ≪ 1. The function ρ

_{ss}(

*f*) has three components, as shown in Fig. 5: a large peak at f = 0 , a tiny peak at f = 1 , and a continuous exponential tail at f ∼ 1 /

*Ns*, which describes the density of polymorphic states. The total probability of polymorhism (equation 51) is given by a small value,

*p*

_{pol}≃ 2μ

*N*ln(1/

*s*). To obtain momenta for

*f*and

*T*, we substitute equation 49 into equations 36 and 37and note that integrals of

*g*

_{ss}(

*f*) are mostly contributed by small f ∼ 1 /

*Ns*. As a result, we obtain

*N*∼ 1/

*s*, the above four values match, to an order of magnitude, the corresponding neutral values in equations 55 and 56. At

*N*∼ (1/

*s*) ln(

*s*/μ), they asymptotically match equations 58 derived above in the deterministic limit,

*N*≫ 1/μ. Remarkably, in most of the selection-drift interval, (1/

*s*)ln(

*s*/μ) ≪

*N*≪ 1/μ, the average and variance of

*f*and

*T*, happen to coincide with their respective deterministic formulas, even though the relative standard deviations,

*V*‖ ̅f̅ ̅ and

_{f}*V*‖ ̅T̅ ̅are large, as they should be, given the shape of ρ(

_{T}*f*) (Fig. 6b).

Deterministic Dynamics and Its BoundariesAs we have shown above, the steady state is asymptotically deterministic in the limit *N* ≫ 1/μ. The purpose of this section is to consider a more general, time-dependent case. We derive the deterministic evolution equation in two independent ways, directly from deterministic first-principles, and from the stochastic equation in the limit *N* → ∞; we solve it for an arbitrary initial condition and thus obtain the boundaries of the deterministic approximation.

Main results and discussion.In the deterministic limit, *N* → ∞, the time-dependent probability density is given by*f _{d}
*(

*t*) (Fig.7). This agrees with the meaning of

*M(f*), defined by equation 18, as the average change in

*f*per generation (equation 20). Since the random factor, in this limit, is absent, the actual change in

*f*naturally coincides with the average change. The first term in the right-hand side of equation 61 describes selection for or against the minority allele and vanishes in a uniform population,

*f*= 0 or 1. The second term in equation 61, describing mutations, does not vanish at

_{d}*f*= 0 or 1 since mutations occur even in a uniform population. Instead, the term vanishes at

_{d}*f*= 1/2, when the effects of forward and reverse mutations (with equal rates) cancel each other.

_{d}The solution of equation 61 in two asymptotic cases, the neutral case and the opposite case with selection, has the form*f*
_{ss} = μ/*s* (23, 24).

In the following subsection, we derive the deterministic equation 61from the stochastic equations 1 and 2. Equation 61 can also be obtained directly from deterministic first principles. The initial set of equations appropriate for the virus population model (see “Description of the model and the evolution equation”) has the form*n*
_{1} and *n*
_{2} are the numbers of mutant- and wild-type-infected cells, respectively, κ is the replication constant for the wild type, and ω^{−1}is the average life span of an infected cell. To match the virus population model where generations of infected cells change at discrete moments in steps of Δ*t* = 1, we choose ω = 1. Using the condition *n*
_{1} +*n*
_{2} = *N* = *const*and the notation *f* =*n*
_{1}/*N*, equations 63 and 64 are replaced by a single equation, equation 61.

Note that the condition *N* = *const* requires that κ depends on *f*, as given by κ − 1 = κ(*sf* − μ), which explains why the resulting equation 61 is nonlinear. The reason behind this is that κ must depend on some hidden “fast” variable which adjusts quickly to relatively slow changes in *f* to keep *N* constant. An example of such a variable is the number of available target cells, which may be stable due to a balance between replenishment and killing by virus. If *f(t*), for example, increases slowly, the average replication rate will decrease, causing, due to decreased killing, an increase in the number of target cells, which will compensate for the initial decrease in replication rate and keep*N* constant. There are examples of biological systems in which such a feedback mechanism is absent and the condition*N* = *const* does not apply; we do not consider them in this work.

At large but finite *N*, the probability density has a finite width, *w*(*t*), due to random drift, which is calculated in the subsection below on boundaries of deterministic approximation. The deterministic approximation remains adequate as long as the ratio*w*(*t*)/*f _{d}
*(

*t*)[1 −

*f*(

_{d}*t*)] remains much less than 1. In the neutral limit, μ ≫

*s*, the deterministic boundary in

*N*is the same as in the steady state,

*N*≫ 1/μ. In the presence of selection, μ ≪

*s*, the deterministic boundary depends, in general, on the initial condition set in the experiment. We will list results for the first three experiments in the section on experiments on evolution and observable parameters (see above), assuming that the initial value of

*f*is known exactly, so that

*w*(0) = 0:

*f*≡

*f*(

_{d}*t*). We observe that in the first two experiments, when the initial population is monomorphic, the deterministic criterion is the same as in the steady state,

*N*≫ 1/μ. For the growth competition exeperiment, the criterion on

*N*is much softer,

*N*≫ 1/

*s*, as long as

*f*remains larger than its characteristic value in the steady state,

*f*≫ 1/

*Ns*(cf. Fig. 5).

Deterministic dynamics.In the limit of large population numbers, *N* → ∞, the term on the right-hand side of equation 2 corresponding to random drift is relatively small. Consequently, the density function must be narrow in *f*. We start by rewriting the master equation 1 and 2 in the equivalent form*M*′(*f*) ≡ *dM/df* and*M*(*f*) is defined by equation 18. Both terms on the right-hand side of equation 66 are relatively small: the first term is proportional to 1/*N*, and the second term is much smaller than the second term on the left-hand side since ρ(*f*) changes much more rapidly in *f* than*M*(*f*) does. In the zero approximation in 1/*N*, one can substitute 0 for the right-hand side. The resulting equation has a delta function for its partial solution, as given by equations 60 and 61. [The general solution for ρ(*f, t*) is any linear combination of different solutions of the form of equation 60, as determined by the initial condition ρ(*f,t*) ≡ ρ_{0}(*f*). If the initial mutant frequency is set by experiment and/or known exactly, ρ(*f,t*) is a single delta function at all times.] In the following section, we solve equation 66 in the next approximation in 1/*N*.

A solution, *f*(*t*), of the deterministic equation61 can be obtained in a general form. We rewrite equation 61 as*f _{ss}
* and

*f*

_{∗}, respectively (here and in the rest of this section, we omit the subscript in

*f*). The values of the parameters

_{d}*f*and

_{ss}*f*

_{∗}are restricted to the intervals 0 <

*f*< 1/2 and

_{ss}*f*

_{∗}> 1.

*f*represents the mutant frequency in the steady state;

_{ss}*f*

_{∗}has no particular meaning. As one can check,

*f*

_{ss}matches its asymptotic values of 1/2 and μ/

*s*in the respective limits μ ≪

*s*and μ ≪

*s*(see “Steady state” above). Separating the variables in equation 67 and integrating, one obtains

*f*

_{0}≡

*f*(0). From equation 69,

*f*(∞) =

*f*. Note that the function

_{ss}*f*(

*t*) is monotonous, so that it never crosses

*f*. Asymptotics of equation 69 in the two cases (the neutral case and the opposite case with selection) are listed in equations 63 and 64. The characteristic time it takes to reach steady state is given, in the two respective limits in equation62, by 1/μ and 1/

_{ss}*s*. Plots of

*f*(

*t*) for μ ≪

*s*for three particular initial conditions,

*f*

_{0}= 0, 1/2, and 1, which correspond to accumulation, reversion, and growth competition experiments (see “Experiments on evolution and observable parameters” in the qualitative section of this reviews), are shown schematically in Fig.8. The approximate expression for

*f*(

*t*) in the accumulation experiment is especially simple,

Boundaries of deterministic approximation.To establish conditions under which the deterministic description applies, one has to find the finite width of the probability density at finite*N*. For this, we use the perturbation method to solve equation 66 in the next approximation in 1/*N*. We will seek a solution in the automodel form*F*(*u*) is some normalized function, ∫*du F(u)* = 1, which does not depend on time explicitly and whose width in *u* is on the order of 1. We assume the width of the probability density *w* to be much less than *f _{d}
*. Later, we will obtain the interval of

*N*in which this assumption actually holds. Since we are interested in the region of

*f*such that ‖

*f*−

*f*‖ ∼

_{d}*w*, we can replace

*M*(

*f*) on the left-hand side of equation 66 by its linear expansion in

*f − f*. On the right-hand side, which is already small in 1/

_{d}*N*, we retain only the largest terms replacing

*f*(1 −

*f*) →

*f*(1 −

_{d}*f*),

_{d}*M(f)*→

*M(f*. Substituting equation71 into equation 66, we obtain

_{d})*f*and

_{d}*w*depend on time,

*t*, and primes denote the derivatives in the corresponding arguments (shown in parentheses). We observe that the left-hand side of this expression and the bracketed terms on the right-hand side depend only on

*t*while the factors multiplying the brackets are functions only of

*u*. Since

*u*and

*t*are independent variables, equation 72 can be satisfied only if the ratio of bracketed terms is a constant (which we denote λ) and the left-hand side is identically zero. As a result, we arrive at three separate differential equations: equation 61 for

*f*(

_{d}*t*) and the equation for

*F(u)*and

*w(t)*:

*u*→ λ

^{1/2}

*u*and

*w*→ λ

^{−1/2}

*w*, which does not change the probability density (equation 71). The normalized solution of equation 73 is a Gaussian:

*f*in equation 71, which does not change the resulting probability density.

_{d}To find *w(t)*, equation 74 can be reduced to two equations with separating variables, substituting *w(t)* =*y(t)φ(t)*, where *y* meets the equation*t*) and equation76, we obtain a general solution of the form:*f* ≡*f _{d}
*(

*t*). The integrals in

*t*in this expression can be rewritten as integrals in

*f*by using equation 61. As a result, the width

*w*can be expressed in terms of the deterministic value

*f(t)*and of the initial width,

*w*(0), as given by

*f*(

*t*) never crosses its steady-state level (see the previous subsection).

We now calculate the ratio *w/*[*f*(1 −*f*)] for a few cases of interest. The deterministic criterion requires that this ratio be less than 1. Let us start from the steady state. Since the integral in equation 78 diverges at*f* = *f*
_{ss}, one has to consider the upper limit of the integral, *f*, to be close to*f*
_{ss}, then expand *M*(*f*) ≃ *M′*(*f*
_{ss})(*f* −*f*
_{ss}), and then evaluate the limit *f*→ *f*
_{ss}. This yields*f*
_{ss} is, of course, different in the two cases. The deterministic criterion is met when *N*≫ 1/μ (see “steady state” above). At μ ≫ *s*, as one can show from equation 78, the same condition on *N*applies even far from steady state. At *s* ≫ μ and far from steady state, the condition on *N* depends on both*f*
_{0} and *f(t)* (equation 65).

Stochastic Dynamics: the Drift RegimeThe problems of interest are the decay of a polymorphic state and gene fixation, transition from a monomorphic state to the steady state, divergence of separated populations, and the rate of genetic turnover in the steady state.

Main results and discussion.As in the steady state, selection can be neglected if *N* ≪ min(1/*s*, 1/μ) (see “Steady state” above). In most of this interval, mutations enter only in the boundary conditions and are negligible in the polymorphic state. Dynamic experiments in this regime exhibit two main timescales: the shorter scale, at which mutation can be neglected,*t* ∼ *N*, and a much longer scale, associated with mutations, *t* ∼ 1/μ.

Consider a polymorphic population with an initial value of *f*close neither to 0 nor to 1 and focus on the shorter timescale. As discussed in the qualitative section of this review, *f(t)*drifts randomly until the population, at some point, hits a monomorphic state (Fig. 9b). In terms of the probability density,*g*(*f,t*) (equation 4) spreads from the point*f* = *f*
_{0} onto the whole interval of *f* and then decays, as a whole, in time (Fig. 9a) as given by the following equations (32, 78):*f* − *f*
_{0} ∼ (t*N*), the same as in a gas diffusion process. At*t* ≫ *N*, the probability is being “absorbed” by two monomorphic states, as gas is “absorbed” by two very cold walls. If, however, the initial polymorphism is very small (*f*
_{0} ≪ 1), the manner of spread of*g*(*f*) differs from the classical diffusion law:*A* ∼ 1.

Choosing *f*
_{0} = 1/*N* and using equation 83, one can estimate the probability that a single mutant introduced into a population will ever grow to frequency *f*before becoming extinct and the average time of such successful growth, as given by*G*(1/*N*) ∼ 1, since a single allele was present to start with. The gene fixation probability is *G*(1) ∼ 1/*N*, with the corresponding time *t _{G}(f)* ∼

*N*(34).

Transition from a monomorphic (e.g., *f* = 0) to steady state occurs in two stages, as shown in Fig. 10a. At the first stage,*t* ∼ *N*, a thin tail of the density*g*(*f,t*) spreads into the interval 0 <*f* < 1 while the probability *p*
_{0}remains close to 1. (This means that some rare populations acquire an admixture of mutants.) At the second stage, *t* ∼ 1/μ, the probability *p*
_{0}(*t*) drops slowly, and *p*
_{1}(*t*) slowly increases, both approaching 1/2:*f*, given by*
*(

*t*) (equation 88) is the same as in the deterministic limit (equation 62). It also noteworthy that the average intrapatient distance ̅T̅ ̅, and its variance

*V*as found from equation 86, do not depend on

_{T}*t*at

*t*≫

*N*. They approach their steady-state values (equation 56) much earlier than the two monomorphic states become equally probable.

The two timescales in accumulation and steady state can also be obtained from the results for the gene fixation experiment (equation84). A mutant genome appears in the population with the small probability μ*N* per generation. It gets fixed with the probability *G*(1) ∼ 1/*N*. Hence, an average time between the switch from pure wild type to pure mutant and back is on the order of *N*/(μ*N*) = 1/μ. The time taken by a separate switch is much shorter and is the same as the fixation time, *t _{G}
*(1) ∼

*N*(equation 84) (compare the simulation in Fig. 10a).

The longer timescale, 1/μ, also appears in the divergence of separated populations and the genetic turnover experiments (see “Experiments on evolution and observable parameters” in the qualitative section of this review. After two populations are isolated, at *t* = 0, from the same population, the relative genetic distance between them (equation 44) has a form

Decay of the polymorphic state and gene fixation.At the smallest *N* ≪ 1/μ and 1/*s*, we use the formalism in equations 4 to 7. Neglecting selection, equation 7 becomes*p*
_{0}(0), *p*
_{1}(0), and*g*(*f*, 0).

In this subsection, we consider an initial polymorphic population with a known mutant frequency*f*
_{0} ≠ 0 or 1. As shown in the next subsection, mutations (which enter the problem via equation 6) become important at much longer timescales than those involved in random drift. Hence, we can set μ = 0 in equation 6 implying that *g*(*f,t*) does not diverge at the boundaries [or diverges more slowly than 1/*f* and 1/(1 −*f*)]. The general solution of equation 92,*g*(*f,t*), can be written as a sum over its eigenfunctions *h*
_{i}(*f*) (32):_{i} corresponding to nondivergent solutions of equation 95 and the eigenfunctions*h*
_{i}(*f*) are given by (1)*C*
_{i}
^{(3/2)}(*x*) are Gegenbauer polynomials (1). The set of functions {*h*
_{i}} is orthogonal and normalized, as given by ∫*df f*(1 −*f*)*h _{i}
*(

*f*)

*h*(

_{j}*f*) = δ

_{ij}. Below, we evaluate

*g*(

*f*, 0) in asymptotic limits in time, for two cases: strong and weak initial polymorphism.

(i) Decay of strong polymorphism.Suppose that the initial population is strongly polymorphic,*f*
_{0} ∼ 1 −*f*
_{0} ∼ 1 in equation 93. For an initial period, the density *g*(*f,t*) is localized in a small region of *f* near *f* =*f*
_{0}. The factor *f*(1 −*f*) in equation 92 can be then approximated by a constant. It is easier to solve the resulting simplified equation directly rather than using the general solution in equation 94. Since the density is localized far from the boundaries, it is expected to have an automodel form. Substuting *g*(*f,t*) =*A*(*t*)F[B(*t*)(*f* −*f*
_{0})] into the approximate equation 92, one arrives at equation 81. The solution applies while *t* ≪*N*, when the density peak remains narrow.

In the opposite limit, *t* ≫ *N*, the sum in equation 94 can be approximated, with an exponentially small error, by its lowest term with *i* = 0. Finding λ_{0}and *h*
_{0}(*f*) from equations 97 and*a*
_{0} from equations 93 and 96, we arrive at equation 82 (32, 78).

(ii) Gene fixation.We consider now a weak initial polymorphism, *f*
_{0} ≪ 1 in equation 93. For example, *f*
_{0} = 1/*N* corresponds to a single new genome introduced into a monomorphic population. We estimate the probability, *G*(*f*), of having the new subpopulation grow to frequency *f* before it becomes extinct and the average time of such growth,*t _{G}
*(

*f*), if this event happens. This problem has received much attention in the literature (19, 24,34, 38, 78). A treatment based on the backward Kolmogorov equation (34) treats the final mutant frequency,

*f*, as a constant and the initial mutant frequency,

*f*(0), as a variable. We present here a semiquantitative derivation based on the forward Kolmogorov equation. In any approach, since the range of values

*f*∼ 1/

*N*is involved, one can estimate

*G*only up to a numerical constant depending on the finer details of a population model (a fact not emphasized in reference 34).

At large *t* ≫ *N*, the probability density,*g*(*f,t*), still decays as given by equation 82. However, most of the decay occurs much earlier. At *t* such that 1 ≪ *t* ≪ *N*, the density*g*(*f,t*) is localized at *f* ≪ 1. Unlike in the case of strong initial polymorphism, only the factor 1 −*f* in equation 92 can be replaced by a constant, 1: the factor *f* has to be preserved as a variable. The new automodel solution for equation 92 has the form of equation 83. Using equation 83, for the total probability of polymorphism we obtain.*t* ∼ 1 will most probably become extinct after a few generations. The normalization coefficient *A* is estimated from the condition *p*
_{pol}(t) = 1 at*t* ∼ 1, which yields *A* ∼ 1. This is an estimate since the continuous approach ceases to apply at*t* ∼ 1 and *f* ∼ 1/*N*. The probability *G*(*f,t*) that the frequency of new allele will exceed *f* at time *t* is given by*G*(*f,t*) has its maximum at *t* = 2*Nf*. The height and position of this maximum yield the desired estimates for the probability of growth*G*(*f*) and the average time of growth*t _{G}
*(

*f*) (equations 84).

Transition from a monomorphic to a steady state.We consider now the accumulation and reversion experiment (see “Steady state” above). Since the two alleles are symmetric when*s* = 0, it suffices to consider only one of these experiments. Let the population consist initially of the wild type only. This corresponds to initial conditions

At short times *t, g(f, t)* is localized near the left boundary, *f* ≪ 1, and, as verified below, we have*p*
_{0} ≃ 1, *p*
_{1}≃ 0. Using these facts, equations 6 and 92 can be simplified:*t* ∼*N*, the probability density spreads over the whole interval of *f*. The total change in *p*
_{0} can be estimated by integrating the first of equations 5 over the interval between *t* ∼ 1 and *t* ∼ *N*, which yields 1 − *p*
_{0} ∼ μ*N* ln *N* ≪ 1. This confirms our initial guess that *p*
_{0} remains close to 1 in this interval of time. The average mutant frequency, *f*, and polymorphism,*T* (equations 36 and 38), are given by*
*(

*t*) ≃ μ

*t*,

*(*T ¯

*t*) ≃ 2μ

*t*.

At long times, *t* ≫ *N*, the probability*p*
_{0} is slowly decaying and*p*
_{1} is slowly increasing. Since the characteristic diffusion times of *g*(*f,t*) are as short as *t* ∼ *N* (above), the density*g*(*f,t*) is at local equilibrium at any moment of time, statically following the slow changes in*p*
_{0}(*t*) and*p*
_{1}(t). Hence, we can put ∂*g*/∂*t* = 0 in equation 92 which yields*f*(1 − *f*) *g*(*f,t*) =*C*
_{1}(*t*) +*C*
_{2}(*t*)*f*. Finding parameters*C*
_{1}(*t*) and*C*
_{2}(*t*) from equations 5 and 6, we arrive at equations 86 and 87 for *g*(*f*) and*p*
_{0,1}.

Divergence of separated populations and the time correlation function.Suppose that a steady-state system is split in two populations at *t* = 0, which grow quickly to the initial size. The average relative distance between the two populations, *
*, (equation 44) is expressed via the conditional variance

*V*(

_{f}*t*‖

*f*

_{0}) for arbitrary value of

*f*

_{0}. In the drift regime, the task of finding

*is greatly simplified because the system is almost always either purely mutant or purely wild type, so that*D ¯

_{ss}(

*f*

_{0}). (Note that this approximation cannot be used to calculate the average polymorphism

*: it would yield*T ¯

*= 0. For*T ¯

*and*f ¯

*V*, however, the accuracy is sufficient.) Hence, we need to know the variance

_{f}*V*(

_{f}*t*‖

*f*

_{0}) at only two initial values,

*f*

_{0}= 0 and

*f*

_{0}= 1. The value of

*V*(

_{f}*t*‖0), was already obtained, equation 89, and from symmetry between the two alleles, we have

*V*(

_{f}*t*‖1) ≡

*V*(

_{f}*t*‖0). Using equations 44 and102, we arrive at equation 90.

The time correlation function, *K*(*t*), characterizing the speed of genetic turnover in a single steady-state population can be expressed in terms of the conditional expectation value *
*(

*t*‖

*f*

_{0}), as given by equation 46. Evaluation of

*K(t)*is similar to evaluation of the relative distance. The value of

*f*

_{0}which mostly contributes to the right-hand side of equation 46 is

*f*

_{0}= 1. We have

*(*f ¯

*t*‖1) = 1 −

*(*f ¯

*t*‖0) from symmetry, where

*(*f ¯

*t*‖0) is already known from equation 88. Substituting

*V*

Main results and discussion.In this section, we consider the interval of *N*, i.e., 1/s ≪ *N* ≪ 1/μ. The accumulation experiment consists of sprouting a weak probability density tail into the interval 0 < *f*< 1 from the main peak, δ(*f*) (Fig. 5). The relevant scales for *f* and *t* are easy to estimate from the results on gene fixation (equation 84). Consider a typical stochastic process *f(t)*, like one shown in Fig. 11. A single genome appears and grows, drifting randomly, to a frequency *f*∼ 1/*Ns* (Fig. 5). Further growth is efficiently prohibited by selection. The timescale of growth is*t _{G}
*(1/

*Ns*) = 1/

*s*(equation 84). Furthermore, mutant alleles are generated in the population with frequency μ

*N*. The probability of successful growth to

*f*∼ 1/

*Ns*is

*G*(

*f*

_{s}) ∼ s ≪ 1 (equation 84). The probability of having a polymorphic state is ∼μ

*N*(see “Steady state” above). Hence, the average time interval between such events (i.e., between high peaks in Fig. 11) is 1/μ

*Ns*. The exact expressions for the average frequency,

*, and its variance,*f ¯

*V*, are

_{f}*t*→ ∞, the two parameters cross over to the steady-state values obtained in equation 59. The average intrapatient distance and its variance are given by

*≃ 2 ̅f̅ ̅ and*T ¯

*V*≃ 4

_{T}*V*.

_{f}Note that the expectation value of the frequency*
*(

*t*‖0) in equation 103 coincides with the deterministic value in equation 70, although fluctuations of

*f*are very large. The same curious result can be demonstrated for any population model in which the function

*M*(

*f*) in Fokker-Planck equation 22 is linear in

*f*. In the virus population model, as in many other models, the linearity condition is met asymptotically in a nearly momomorphous state,

*f*≪ 1 or 1 −

*f*≪ 1, which is the case in the accumulation experiment. One can imagine systems, such as diploid populations with a very strong allelic dominance (34,38), in which

*M(f)*is not linear even at

*f*→ 0 and

*(*f ¯

*t*) does not equal the deterministic value at small

*N*.

The relative genetic distance between two populations split from one at*t* = 0 and the time correlation function (see below) are given by*s*, is much shorter than the timescale in the adjacent drift regime, 1/μ (see “Stochastic dynamics: the drift regime” above). Crossover between these two values occurs smoothly in the interval 1/*s* ≪*N* ≪ (1/*s*) ln (*s*/μ) (see “Steady state” above), as controlled by the second peak of the probability density at *f* = 1 (Fig. 5), which is exponentially small at *N* ≫ (1/*s*) ln (*s*/μ).

The reversion experiment (see below) exhibits a transition from the uniform mutant, *f*
_{0} = 1, to an almost wild-type population, *f* ∼ 1/*Ns* ≪ 1. The evolution of the probability density ρ(*f,t*) occurs in two stages and involves two different timescales, *t* ∼ 1/*s* and *t* ∼ 1/μ*Ns* (i.e., the same two scales as in the accumulation experiment). The first, short time is that in which a rare population becomes polymorphic. In terms of the probability density, this corresponds to a thin tail of ρ(*f,t*) spreading into the interval 0 <*f* < 1. The second, longer time is that on which a typical population switches to the wild type, i.e., the probability of the purely mutant state, *p*
_{1}(*t*), decays. At the second stage, different components of the probability density evolve, as shown in Fig. 12 and given by*N* (Fig. 14). The expectation values and variances of parameters *f,T* are given by*p*
_{1}(*t*) is given by equation107. Note that equations 109 have a small relative error. As a result, the three parameters do not vanish at *t* → ∞, as would follow from equations 109, but cross over to small steady state values given by equations 59 (Fig. 5). Note also that although the initial state is a pure mutant, we have a finite genetic distance,*
* (0), in equations 109. This is because the average polymorphism,

*(*T ¯

*t*), is already saturated at

*t*∼ 1/

*s*.

The most important result is that the average waiting-for-reversion time, 1/μ*Ns* (51), is longer than in the deterministic regime (Fig. 13). Dependence of the reversion time on*N* in all three intervals of *N* is shown schematically in Fig. 16.

Accumulation.For the sake of simplicity, we consider an interval in *N*, somewhat narrower in the log sense than the selection-drift interval: (1/*s*) ln (*s*/μ) ≪ *N* ≪ 1/[μ ln (1/*s*]. Then we can (i) use the more convenient formalism in equations 4 to 7 and (ii) neglect the second density peak at *f* = 1 (Fig. 5), not considering crossover to the drift regime in the interval 1/*s* ≪*N* ≪ (1/*s*) ln(*s*/μ) (see “Steady state” above).

We seek the probability density in the form*g(f*,0) ≡ 0 and *p*
_{0}(*t*) = 1. The density*g(f,t*) remains localized at small *f* and crosses over with time to the steady-state density,*g*(*f*,∞) = (2μ*N/f*)*e*
^{−2Nsf} (equation50). This process is described by the dynamic equation and the boundary condition*f* ≪ 1 and *p*
_{0} ≃ 1.

At short times, *t* ≪ 1/*s*, we have *f*≪ 1/*(Ns)*, and the selection term in equation 111 is negligible. Hence, we can use *g(f,t)* for the drift regime (equation 85). In principle, one could also solve equation 111 at any*t*, using an expansion over eigenfunctions which are related to the Laguerre polynomials (1). The lower momenta of ρ, however, can be more easily obtained without finding ρ(*f*) explicitly. Multiplying both sides of equation 111 first by*f* and second by *f*
^{2} and integrating both sides over *f*, we get*f*(0) = *V _{f}
*(0) = 0, we arrive at equations 103 and 104 for the expectation value,
̅f̅
̅(

*t*), and variance,

*V*(0) ≃ ̅f̅ ̅2̅ ̅.

_{f}Divergence of separated populations and the time correlation function.The time dependence of the average relative distance, *
*(

*t*), between two populations isolated at

*t*= 0 from a single population is given by general equation 44, in which

*V*(

_{f}*t*‖

*f*

_{0}) is defined by equations 36 and 37 with

*A*(

*f*) ≡

*f*and the initial condition ρ(

*f*,0) = δ(

*f*−

*f*

_{0}). As is clear from the structure of the density function (equation 110), the main contribution to the integral in

*f*

_{0}(equation 44) comes from

*f*

_{0}= 0. Hence, using equation 104, we obtain equation 105.

The time correlation function, *K*(*t*), as follows from equation 46, is contributed only from the polymorphic initial states, *f*
_{0} ≠ 0 or 1. Substituting equations 50 and 103 and the variance, *V*

Reversion (fixation of advantageous variant).We start from equations 4 to 7 and the initial conditions*g*(*f*, 0) ≡ 0,*p*
_{0}(0) = 0, and*p*
_{1}(0) = 1. We consider only the second, most interesting stage of evolution. On this timescale,*t* ∼ 1/(μ*Ns*), the probability*p*
_{1}(*t*) decays from 1 to almost 0 and*p*
_{0}(*t*) increases from 0 to almost 1. As we did with the drift regime (see above), we use the fact that the equilibration of a polymorphic state, *f* ∼ 1, does not involve the mutation rate and is relatively fast. Therefore,*g*(*f,t*) follows quasistatically the change in *p*
_{0}(*t*) and*p*
_{1}(*t*). Setting ∂*g*/∂*t* = 0 in equation 7 and solving the resulting equation, we get*q*(*t*) and*A*(*t*) change slowly in time. Substituting equation115 into the boundary conditions given by equations 5 and 6 and solving the resulting system of equations with respect to*q*(*t*), *A*(*t*),*p*
_{0}(*t*), and*p*
_{1}(*t*), we arrive at equations 107 and108 and curves shown in Fig. 14.

Main results.Suppose that to obtain an experimental estimate for the intrapatient genetic distance in a population,*T**, we isolate κ sequences from the population, determine the number of nucleotide differences for each pair of sequences, and average the result over all such pairs. The difference between*T** and the actual value, *T*, is characterized by the standard relative error of such a measurement, ɛ*T*varies randomly between populations. It is useful to know in equation116 the representative value of *T* in a polymorphic population:*p*
_{pol} = ∫*df g*(*f*) is the total probability of polymorphism.*T*
_{rep} differs from the standard average*
* by being averaged over polymorphic states only. For instance, in the steady state, using equations 50 and 57, we get

Derivations.Consider a sample of *k* genomes randomly selected from a population with the mutant frequency*f*. An experimental estimate for *f*, which we denote *f**, is the proportion of genomes in the sample that are mutant*i* numbers the genomes in the sample and the integer number *v _{i}
* assumes one of two values:

*v*= 1 if the

_{i}*i*th genome is mutant and

*v*= 0 if it is wild type. The probability

_{i}*P*(

*v*) of a particular allele is given by

*v*> =

_{i}*f*. (We will use angle brackets to denote the average over samples, reserving the overline for the average over populations.) The expectation value, <

*f*>*, as follows from equations 119 and120, is equal to

*f*, which confirms that

*f**is a correct estimate of

*f*. The sampling variance of the estimate,

*U*, is given by

_{f}*v*and

_{i}*v*at

_{j}*i*≠

*j*are independent random numbers, any term in equation 121 with

*i*≠

*j*reduces to a product of two averages and vanishes. Any term in equation 121 with

*i*=

*j*, as one can obtain from equation 120, is equal to

*f*(1 −

*f*), which yields

*T*= 2

*f*(1 −

*f*), could be calculated, in principle, from the estimate of the mutant frequency,

*f**. However, it is usually more convenient to measure

*T*directly as a fraction of polymorphic pairs among all possible pairs of genomes from the sample. Analogously to

*f**, the estimate

*T**can be written in terms of numbers

*v*

_{i}*k*(

*k*− 1)/2 is the total of all possible pairs. Each polymorphic pair, (

*v*, v

_{i}_{j}) = (0,1) or (1,0), contributes 1 to the sum. The expectation value of

*T**is <

*T**= 2

*f*(1 −

*f*) =

*T*. The sampling variance,

*U*, can be obtained from equation 123 by the same method we used to obtain equation 122:

_{T}*T*cannot be larger than 1/2, we have

*U*> 0 at any

_{T}*k*. The criterion of a sufficiently large sample is that the relative error of measurement, ɛ =

*U*/

_{T}*T*, is small. From equation124, we arrive at equation 116 for ɛ.

## ACKNOWLEDGMENTS

We thank Joe Felsenstein for 117 useful comments.

This work was partially supported by NIH grant 1K25AI01811 and grant R35CA44385.

- Copyright © 2001 American Society for Microbiology