Microbiology and Molecular Biology Reviews, March 2001, p. 151-185, Vol. 65, No. 1
Department of Molecular Biology and Microbiology, Tufts University, Boston, Massachusetts 02111,¹ and School of Biological Sciences, University of Auckland, Auckland, New Zealand²
1092-2172/01/$04.00+0 DOI: 10.1128/MMBR.65.1.151-185.2001
Copyright © 2001, American Society for Microbiology. All rights reserved.
Transition between Stochastic Evolution and Deterministic
Evolution in the Presence of Selection: General Theory and
Application to Virology
SUMMARY
INTRODUCTION
QUALITATIVE DISCUSSION AND COMPUTER SIMULATIONS
Description of the Model and the Evolution Equation
Virus population model.
Stochastic equation of evolution.
Boundary conditions: properties of almost monomorphic populations.
Experiments on Evolution and Observable Parameters
Steady State
Neutral case: s ≪ µ.
Case with selection: µ ≪ s ≪ 1.
Deterministic Dynamics and Its Boundaries
Deterministic dynamics.
Boundaries of deterministic approximation.
Stochastic Dynamics: the Drift Regime
Decay of the polymorphic state and gene fixation.
Transition from a monomorphic to a steady state.
Divergence of populations which have been separated and the time correlation function.
Stochastic Dynamics: the Selection-Drift Regime
Accumulation.
Divergence of separated populations and the time correlation function.
Reversion (fixation of an advantageous variant).
Sampling Effects
Experimental Applications
Virological studies in vitro.
HIV populations in vivo.
General applications.
Many Loci and Other Aspects
Conclusions
MATHEMATICAL RESULTS AND DERIVATIONS
Description of the Model and the Evolution Equation
Main results.
Virus population model.
Stochastic equation of evolution.
(i) Discrete Markovian equation.
(ii) Diffusion equation limit.
Boundary conditions: properties of an almost monomorphic population.
Experiments on Evolution and Observable Parameters
Steady State
General case.
Neutral case: s ≪ µ.
Case with selection: µ ≪ s ≪ 1.
Deterministic Dynamics and Its Boundaries
Main results and discussion.
Deterministic dynamics.
Boundaries of deterministic approximation.
Stochastic Dynamics: the Drift Regime
Main results and discussion.
Decay of the polymorphic state and gene fixation.
(i) Decay of strong polymorphism.
(ii) Gene fixation.
Transition from a monomorphic to a steady state.
Divergence of separated populations and the time correlation function.
Stochastic Dynamics: the Selection-Drift Regime
Main results and discussion.
Accumulation.
Divergence of separated populations and the time correlation function.
Reversion (fixation of advantageous variant).
Sampling Effects
Main results.
Derivations.
ACKNOWLEDGMENTS
REFERENCES
SUMMARY
We present here a self-contained analytic review of the role of stochastic factors acting on a virus population. We develop a simple one-locus, two-allele model of a haploid population of constant size including the factors of random drift, purifying selection, and random mutation. We consider different virological experiments: accumulation and reversion of deleterious mutations, competition between mutant and wild-type viruses, gene fixation, mutation frequencies at the steady state, divergence of two populations split from one population, and genetic turnover within a single population. In the first part of the review, we present all principal results in qualitative terms and illustrate them with examples obtained by computer simulation. In the second part, we derive the results formally from a diffusion equation of the Wright-Fisher type and boundary conditions, all derived from the first principles for the virus population model. We show that the leading factors and observable behavior of evolution differ significantly in three broad intervals of population size, N. The "neutral limit" is reached when N is smaller than the inverse selection coefficient. When N is larger than the inverse mutation rate per base, selection dominates and evolution is "almost" deterministic. If the selection coefficient is much larger than the mutation rate, there exists a broad interval of population sizes, in which weakly diverse populations are almost neutral while highly diverse populations are controlled by selection pressure. We discuss in detail the application of our results to human immunodeficiency virus population in vivo, sampling effects, and limitations of the model.
INTRODUCTION
The process of evolution is a consequence of the interplay of mutation, selection, and chance on a population of organisms, leading to an observable change in its genetic makeup. Since the time of Darwin, the influence of these factors on the evolution of organisms ranging from bacteria to humans has been intensively studied, both experimentally and theoretically, leading to a very large body of literature. Only recently, however, has attention been turned toward special problems in the evolution of viruses. Virus evolution is of particular interest and importance for three reasons. First, we desire to gain an understanding (usually in the absence of a fossil record) of how modern viruses have arisen from their earlier forms, both in recent times and in parallel with the evolution of their hosts. Second, the evolution of a virus during the course of infection of a single host, or along a short transmission chain, is of great importance in creating new populations with properties altered in important ways, such as evasion of the immune response, resistance to antiviral therapy, or altered virulence. Third, because of their high replication rates, simple genomes, large population sizes, and high mutation rates, viruses make good models for studying and testing evolutionary theory.
Particular attention has focussed on understanding the evolutionary
forces that act on human immunodeficiency virus (HIV) during the course
of infection of a single human host. HIV displays a remarkable extent
of genetic variation concurrent with a high speed of evolution: in the
most variable region of the genome (env), individual genomes
within a population from an infected person can vary by as much as 3 to
5% (2, 43, 78); substitutions in env
accumulate at a rate of approximately 1% per year (71), 50 million times faster than in the small subunit of rRNA
(61). This variation has important consequences. It allows
the virus to evolve to infect different cell types (9, 20,
30) and to rapidly become resistant to otherwise highly
effective antiviral drugs (10, 47, 50); it may play a role
in evading the immune system (4, 56, 73, 79). Furthermore,
its high mutation rate (estimated to average about 3 × 10⁻⁵ per
nucleotide site per replication cycle [49]), large population size
(variously estimated from about 10⁷ to 10⁸ productively infected cells),
and continuous steady state, in which the large majority of virions and
productively infected cells turns over every day (25, 77),
create a situation which, at least in principle, is amenable to (and
requires) mathematical modeling.
To date, a number of modeling approaches have been applied to understand the evolution of HIV in vivo. These approaches use either population genetic (mutation frequency distribution) or phylogenetic inference based on virus sequences obtained from HIV-infected individuals. In general, they are based on one of two different theoretical approaches to the evolution problem. Deterministic approaches, including quasispecies theory (15, 26), assume that the population size is very large, such that the frequency of a given mutation at any given time is completely predictable if one knows the initial frequency, the mutation rate, and the selection coefficient (i.e., the differential growth rate conferred by the different alleles). At first glance, such approaches would seem justified by the large number of infected cells at each generation (21); however, a number of factors, such as variation in the replication potential and generation times among infected cells, may lead to an effective population size much smaller than the actual number of infected cells. Stochastic models, as applied to HIV (to this point), proceed from the opposite assumption: that the effective population size is so small (or that selective forces are so weak) that random drift dominates over selection. The hypothesis of selectively neutral mutations has a long, successful history in describing the evolution of organisms where populations are small (and not uniformly distributed) and mutation rates are very low (36). The applicability of such neutral models to virus populations remains to be established. Many of the assumptions that underlie neutral theory are not appropriate for virus populations, and a number of characteristics of HIV genetic variation in vivo, such as the uneven ratio of synonymous to nonsynonymous changes in different regions of the genome (5, 44, 48), argue against simple application of neutral theory. 
However, inclusion of selection effects in evolutionary analysis (for example, the coalescent method) presents a mathematical challenge that has not yet been fully solved in a practical fashion, although progress toward this goal has been made recently (42, 55).
As an example of the difference between deterministic and stochastic models, consider the question of the frequency in a population of a mutation that is slightly deleterious to virus replication. In a deterministic system, it can be easily calculated that the frequency of such a mutation in the population will come to equilibrium at a point equal to the mutation rate divided by the selection coefficient (24). In a stochastic system, the population will usually be completely uniform in one variant or the other (76), switching rarely but rapidly from one form to the other. This theoretical experiment is of great practical importance in that it describes the appearance of a mutation that can confer resistance to an antiviral drug even before treatment.
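The deterministic equilibrium quoted above can be checked numerically with a short sketch. The recursion below is an assumption made for illustration (selection through a relative yield of 1 − s for the mutant, followed by symmetric mutation at rate µ); the function names and parameter values are hypothetical and are not taken from the cited work.

```python
def next_mutant_frequency(f, s, mu):
    """One deterministic generation: selection, then symmetric mutation."""
    # Selection (assumed form): mutant-infected cells contribute
    # (1 - s) times as many infectious virions as wild-type cells.
    after_selection = f * (1.0 - s) / (f * (1.0 - s) + (1.0 - f))
    # Mutation: each infecting virion switches allele with probability mu.
    return after_selection * (1.0 - mu) + (1.0 - after_selection) * mu


def deterministic_equilibrium(s, mu, generations=5000, f0=0.0):
    """Iterate the recursion from f0 and return the final mutant frequency."""
    f = f0
    for _ in range(generations):
        f = next_mutant_frequency(f, s, mu)
    return f


if __name__ == "__main__":
    s, mu = 0.01, 1e-5  # illustrative values chosen so that mu << s << 1
    print("iterated frequency:", deterministic_equilibrium(s, mu))
    print("mu / s:            ", mu / s)
```

For µ much smaller than s, the iterated frequency should settle close to µ/s; the agreement is approximate because the recursion keeps terms that the simple ratio neglects.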
To solve this problem and many others, it is clear that a more general theoretical framework is needed: one that takes into account both selection and drift under a set of assumptions more appropriate to viruses than is found in theoretical works published to date. Our aim in this work was to develop, from first principles, a general theory that includes the effects of both selection and drift on a population. We use a set of assumptions appropriate to virus populations, focusing on the interplay between deterministic and stochastic behavior in the context of virologically realistic experiments. We apply these to the simplest possible model: mutation at a single site with only two alleles, replicating in a steady-state system (that is, a constant number of infected cells) under the influence of constant selective pressure in a single isolated population. Because we are dealing with a single locus, we do not consider recombination explicitly; because we are dealing with haploid populations, we do not have to consider allelic dominance. It should be noted that although we do not consider recombination explicitly, the presence of strong recombination must be, in fact, implied for the one-locus approximation to be quantitatively correct. Also, nonconserved loci must be spaced sufficiently far apart in the genome, depending on the recombination rate. Even in the absence of recombination, the one-locus approximation is a useful starting point for understanding interactions between selection and stochastic factors at a qualitative level. We present a complete model that considers the full range of possible values for population size, mutation rate, and selection effects. Despite its simplicity, the model is surprisingly rich in its descriptive power. 
At the extremes, the results of this model correspond to the standard results of deterministic or neutral theory; however, we have found that there is a large range of values for the key parameters in which the system behaves in an intermediate fashion: under some conditions its evolution is dominated by stochastic factors, whereas at other times it behaves in a nearly deterministic fashion. We refer to this range of parameter values as the "selection-drift" regime and describe its properties in detail.
This work is divided into two major parts. In the first, we present all the principal results in qualitative terms, using language appropriate for a reader trained in biology and with a moderate level of mathematical sophistication. This part is accompanied by a number of illustrative examples obtained by computer simulation. Although keyed to the mathematical formalism of the second part, it is designed to be read independently and to provide the reader with an understanding of the principal results and their biological significance, particularly in the context of virus populations. The second part is a formal mathematical derivation of the principal results of the model. These results are listed at the beginning of each section and derived in the following subsections. Although some of the derivation presented is not novel, in that it parallels classic work of a number of population biologists (18, 19, 23, 24, 31, 37, 81, 82), its formal application specifically to virus systems is, to the best of our knowledge, a new approach, and we present it in full for this reason, as well as to provide a thorough and self-contained review. Although some of our mathematical methods differ from the classic methods, the final results are identical.
The presentation in both parts of this work proceeds in parallel. We first develop the basic evolution equation, which describes, at least in a statistical sense, the change in frequency of a mutant allele as a function of time and the key parameters: mutation rate, selection coefficient, and population size. We then present the predicted results, for all three regimes, of a set of virological experiments: accumulation and reversion of deleterious mutations, competition between mutant and wild-type viruses, gene fixation, mutation frequencies at the steady state, divergence of two populations split from one population, and genetic turnover within a single population. Next, we discuss sampling statistics and the application of this theory to some specific real-world experimental issues of virus and organismal evolution. Finally, we discuss the application and extension of this theoretical framework to other problems, including multilocus evolution and phylogenetic analysis.
QUALITATIVE DISCUSSION AND COMPUTER SIMULATIONS
Description of the Model and the Evolution Equation
In this section, we introduce the population model and explain how to approach the problem of evolution when random factors enter the picture. First we describe a one-locus, two-allele population model based on the virus replication cycle and discuss briefly the main factors of evolution included in the model. This is followed by a discussion of the biological meaning of the evolution equation. Finally, the boundary conditions for the evolution equation describing the properties of a weakly polymorphic population are described.
Virus population model.
First, we choose a basic model
of virus evolution. For the purposes of simplicity, we consider the
evolution of one nucleotide position at a time, and we assume that each
nucleotide has a choice between only two alleles. (Such a model applies
directly to multiple loci if the evolving loci are sufficiently distant
and the recombination rate is sufficiently high. Evolution at closely
situated loci or in the absence of efficient recombination is not
independent [see "Many loci and other aspects" below].)
Conventionally, we denote the better-fit allele as wild type and the
less-fit allele as mutant. A deleterious mutation event (from wild type
to mutant) will be referred to as forward mutation, and an advantageous
mutation event will be referred to as reverse mutation. Each separate
nucleotide will be characterized by two parameters, both of which are
assumed to be much less than unity: the mutation cost (or selection
coefficient), s, which is the relative difference in fitness
between the two alleles, and the mutation rate per base per replication
cycle, µ. We assume that mutations at different nucleotides have a
weak additive effect on virus fitness. In doing so, we neglect
epistasis (coselection) arising due to biological interaction between
nucleotides at both the nucleotide and protein levels. We also ignore
linkage disequilibrium between loci due to random drift, so that
different nucleotides evolve independently (see the
Introduction). The mutation rate is set, in our work, to be the same in
the forward and reverse directions. For example, for HIV in infected
cells the mutation rate per base is in the range of 5 × 10⁻⁶ to
5 × 10⁻⁵, depending on the type
of substitution (49, 68). The selection coefficient will
vary over a wide range according to the specific base and to the
specific conditions of replication, but it is assumed to be constant
over the period of observation; in other words, there is no selection
for diversity.
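Of the N infected cells in the population, a fraction f are infected by the mutant virus and the remainder (a fraction 1 − f) are infected by the wild type.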
The number of mutant-infected cells may change with time, i.e., with
each new generation of cells. The total cell count is assumed to be
constant. During a generation step, each cell produces a fixed (large)
number of virions and then dies and is replaced by an uninfected cell.
The number of virions produced and capable of infecting new cells
differs, by a factor of 1 − s, between cells infected
with different variants, creating selection for the better-fit (more
prolific) variant. Since the total number of infected cells is fixed
and the number of virions produced per cell is large, only a small
fraction of the virions infect the next generation of cells. On
infecting a cell, each virion has a small chance of mutating into the
opposite genetic variant, given by the mutation rate introduced above.
All the virions produced by a cell afterwards represent the same
genetic variant. Thus, intracellular interference between variants does not occur. (This lack of intracellular competition is a reasonable assumption for retroviruses or when the proportion of infected cells in
a tissue is much lower than 100%. It may vary in other virus models,
when the multiplicity of infection is high.)
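The generation step described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the simulation code used for the figures: each new cell is assumed to be infected by one virion drawn independently from a very large virion pool, and the names `next_generation` and `simulate` are hypothetical.

```python
import random


def next_generation(n_mutant, n_cells, s, mu, rng):
    """Return the number of mutant-infected cells in the next generation."""
    f = n_mutant / n_cells
    # Share of mutant virions in the pool, given the (1 - s) yield ratio.
    p = f * (1.0 - s) / (f * (1.0 - s) + (1.0 - f))
    # On infection, a virion mutates to the opposite variant with probability mu.
    p = p * (1.0 - mu) + (1.0 - p) * mu
    # Assumption: each of the n_cells new cells is infected by one
    # independently drawn virion, so the total cell count stays constant.
    return sum(1 for _ in range(n_cells) if rng.random() < p)


def simulate(n_cells, s, mu, generations, n_mutant=0, seed=0):
    """Return the mutant frequency at each generation, starting from n_mutant."""
    rng = random.Random(seed)
    trajectory = [n_mutant / n_cells]
    for _ in range(generations):
        n_mutant = next_generation(n_mutant, n_cells, s, mu, rng)
        trajectory.append(n_mutant / n_cells)
    return trajectory


if __name__ == "__main__":
    # Illustrative parameter values only.
    print(simulate(n_cells=200, s=0.05, mu=0.001, generations=20))
```

Running `simulate` with different seeds gives different trajectories from the same starting point, which is the randomness that the following subsections describe statistically.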
Stochastic equation of evolution. Different meanings can be assigned to the word "evolution." For the task at hand, evolution of the population is characterized by the time dependence of the frequency of cells infected with mutant virus. In deterministic dynamics, which applies only in very large populations of infected cells, if one knows the initial mutant frequency and has the appropriate equations, one can, in principle, predict the mutant frequency at later times with arbitrary precision. (In practice, the equations are never known exactly, since there are many different factors in play, but this is a separate issue [68].) By contrast, in the presence of random factors, the time dependence of the mutant fraction cannot be predicted even in principle. Even if one knows its precise initial value, the error with which one can predict its value later grows with time. If random factors are strong, the error in the mutant frequency eventually becomes comparable to its value. Evolution of the mutant frequency, in other words, is a random process.
Randomness of mutations does not mean, however, that the evolution of a population is totally arbitrary. On the contrary, useful predictions can be made about its statistical properties even if its specific state cannot be predicted. Instead of the time dependence of the mutant frequency, one has to consider the time-dependent probability density [ρ(f)], defined as the chance that a given population has a mutant frequency near a particular value. The probability density, which can be introduced if both subpopulations (mutant and wild type) are large, is closely related to a histogram derived by plotting the number of times the mutant frequency of a
population is observed to lie within a certain range of values. When
both the number of similar experiments and the number of histogram bars
are very large, the histogram becomes, in the limit, a smooth function,
which is the probability density. (The histogram and the probability
density differ by a constant factor: the total area under the
probability-density curve [integral] is, by definition, the total
probability of having any value of the mutant frequency and is, of
course, equal to 1.) The density function contains information about
the most relevant statistical parameters (average values and standard
deviations) which can be compared with experiment (see "Experiments
on evolution and observable parameters" below). In particular, the
characteristic width of the probability density peak indicates the
error within which the mutant frequency can be predicted.
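The relation between a histogram of replicate measurements and the probability density can be illustrated with a short sketch. The function name `density_histogram` is hypothetical, and the replicate values in the usage example are synthetic placeholders, not data from any experiment.

```python
import random


def density_histogram(frequencies, n_bins):
    """Estimate the probability density of f on [0, 1] from replicate values."""
    width = 1.0 / n_bins
    counts = [0] * n_bins
    for f in frequencies:
        # A value of exactly 1.0 is placed in the last bin.
        index = min(int(f / width), n_bins - 1)
        counts[index] += 1
    total = len(frequencies)
    # Dividing by (total * width) makes the area under the bars equal to 1.
    return [c / (total * width) for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic placeholder values standing in for replicate populations.
    replicates = [rng.betavariate(2, 5) for _ in range(1000)]
    density = density_histogram(replicates, n_bins=20)
    print("area under histogram:", sum(d / 20 for d in density))
```

The only difference between the raw counts and the density estimate is the constant normalization factor, which is the point made in the parenthetical remark above.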
The stochastic evolution equation (equations 1 and 2) (Fig.
2a) expresses the rate of change in the
probability density with time in terms of its form at the present
moment. Using such an equation and knowing the initial probability
density, one can predict its form, in principle, at any time in the
future, similarly to how one would predict the mutant frequency itself
for a deterministic process. The difference between the two cases is
that the time-dependent variable is now a function rather than a
number. We derive the evolution equation directly for the population
model introduced in the previous subsection, at the beginning of the
mathematical part of our work (see "Mathematical results and
derivations" below). The rest of the mathematical part is devoted to
solving the equation for different important cases. Here we only show
how the equation looks when the probability density is localized in a
small region near some value of the mutant frequency and comment on its
meaning from a more qualitative perspective.
The equation in Fig. 2a shows how the probability density, ρ, changes over a short time interval, dt. The first term describes random drift, the second
describes selection, and the third describes mutation. To clarify the
roles of the three terms in describing evolution, we consider each of them separately, by setting the other two terms equal to 0 (Fig. 2b to
d). As a convenient example, we examine a probability density localized
in a small region near some value of the mutant frequency (fmax). In this example, the second term, by
itself, means that the probability density increases with time on the
left side of the peak and decreases on the right side of the peak. As a
result, the probability density peak, whose shape stays constant,
shifts to lower mutant frequencies, as it should in the presence of
selection (Fig. 2c). The third term in the equation, by itself, causes
a shift of the peak as well, but the direction of the shift is toward 50% composition, which is the expected effect of mutation when the
forward and reverse mutation rates are, as assumed, equal (Fig. 2d).
The effect of the first term in the equation is of a different kind.
Due to this term, the probability density decreases in the interval
between the inflection points A and B (Fig. 2b) and increases
everywhere outside of the interval. As a result, the probability
density spreads outward. This is random drift: the error within which
one can predict the value of mutant frequency increases with time. A
more general form of the stochastic equation, which applies when the
probability density, ρ(f), is spread over a broad interval of
f, is given in equations 1 and 2.
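For orientation, the three terms discussed above can be written out as a sketch. The form below is the standard Wright-Fisher diffusion approximation for a haploid population with symmetric mutation, reconstructed from the verbal description rather than copied from Fig. 2a or equations 1 and 2. The symbol ρ, the measurement of time t in generations, and the numerical factor in the drift term are assumptions and may differ from the conventions of the mathematical part.

```latex
% Localized form: \rho is concentrated near f = f_{\max}, so the
% coefficients are treated as constants evaluated at that frequency.
\frac{\partial \rho}{\partial t}
  = \underbrace{\frac{f(1-f)}{2N}\,
      \frac{\partial^{2}\rho}{\partial f^{2}}}_{\text{random drift}}
  + \underbrace{s\,f(1-f)\,
      \frac{\partial \rho}{\partial f}}_{\text{selection}}
  - \underbrace{\mu\,(1-2f)\,
      \frac{\partial \rho}{\partial f}}_{\text{mutation}}
```

With this sign convention, the selection term alone raises ρ where its slope is positive (left of the peak) and lowers it where the slope is negative, so the peak moves toward lower f; the mutation term moves the peak toward f = 0.5; and the drift term lowers ρ between the inflection points and raises it outside them, in agreement with the description above.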
In the equation in Fig. 2a, a physicist will recognize a particular
case of the Fokker-Planck equation and a mathematician will recognize a
case of the forward Kolmogorov equation (41). It was
introduced into the field of population genetics by Wright (81) and then intensively used to study evolution in the
presence of different factors (31-33, 37). As it turns
out, the equation is much more general than the virus model we used for
its derivation in the mathematical section of this review. It describes
a broad range of population models, from a bacterial culture to a
randomly mating population without allelic dominance
(35). Originally, the approach of the Fokker-Planck
equation was introduced into population genetics from a
phenomenological perspective, based on analogy to gas kinetics
(18). Later, the validity of this approach was confirmed
for different population models (52, 75). Examples of
essential factors which are not included in the equation but which may
or may not be important, depending on the experimental system, are
epistasis (biological interaction) and linkage between multiple loci,
time variation of the selection coefficient and the population size,
and allelic dominance in a diploid population (33).
A formal analogy for the system described by the evolution equation is
a gas consisting of particles mixed with air and confined between two
parallel walls (Fig. 3a). A value of the
mutant frequency is analogous to a location between the walls, and the
probability density is now the local gas density. The first term (Fig.
2a) describes the diffusion of the gas particles in the air, and the second and third terms combined describe the effect of directed force
(an electric field, for example) acting on the gas particles in the
presence of friction of the gas against the air. Another useful analogy
is gel electrophoresis. The electrical force acting on polymer
molecules and the friction against the gel matrix together create
directed motion, which segregates the molecules into bands. Molecular
diffusion leads to increasing bandwidths. Although the physics of the
gel or gas system has nothing to do with viruses or evolution, the
formal mathematical analogy between the two systems, as we shall see
below, turns out to be very useful.
Boundary conditions: properties of almost monomorphic populations. In the real world, the mutant frequency cannot be less than 0 or greater than 1, yet the master equation has no such restriction. Thus, the stochastic equation in Fig. 2a (and equations 1 and 2) is incomplete without a description of what happens near the ends of the allowed interval for the mutant frequency, 0 and 1. The analysis shown in Fig. 2 is for the case where there is a large number of minority allele copies (that is, f is not near 0 or 1) and treats the mutant frequency (f) as a continuous variable. In many important cases, one also needs to describe the evolution of a population with only a few copies of the minority variant. The boundary conditions where f is near 0 and 1 have to be derived independently from the virus population model described above (see "Virus population model"). The derivation given in the mathematical section of this review shows that the conditions differ depending on the interval of population size, as follows.
The boundary conditions can be conveniently expressed in terms of the probability density flux (q), which is exactly analogous to the flux of gas particles through unit area per unit time (Fig. 3). In very large virus populations (Fig. 3b), the boundary conditions state that the flux must vanish at the "walls" corresponding to two monomorphic states, i.e., 100% mutant or 100% wild type (equation 3). In small populations (Fig. 3c), the flux is not zero (equations 5 and 6). This is because the probability of finding the virus population in a completely monomorphic state is finite and can increase or decrease in time. In the gas analogy, in the first case (Fig. 3b) gas molecules bounce off the hot walls and in the second case (Fig. 3c) the walls are cold and gas forms a condensate which can decrease or increase with time. Figuratively speaking, the probability density, just like the gas condensing in or evaporating from the liquid on a wall, can "condense" in or "evaporate" from a monomorphic state. The real, biological interpretation of the different sets of boundary conditions is as follows. In very large virus populations (which, as we shall see, roughly correspond to almost deterministic evolution), a purely monomorphic state is unlikely: mutations destroy it very quickly. In a small population, mutations are rare and the monomorphic state can occur with a finite probability. This argument also shows that mutations affect virus evolution in a different way depending on the number of infected cells. In a large population, mutations may be important even in a very polymorphic state (e.g., if selection is small). In small populations, the role of mutations is to create a copy of the new allele in an otherwise monomorphic population; once a copy is created, mutations can be neglected until the population becomes monomorphic again. 
Typically, as we discuss below in the section on steady state, a new allele is lost due to random drift and repeated introduction of mutations will be needed to restore diversity.
Experiments on Evolution and Observable Parameters
In this section, we describe a few gedanken experiments on genetic evolution important for virological applications and introduce quantitative parameters suitable for experimental comparison.
To make use of the evolution equation with boundary conditions (see "Description of the model and the evolution equation" above), one needs to know the state of the system or its statistics at the initial moment of time. The initial condition depends on a particular experimental or natural setup. Virological experiments, relevant for both in vivo and in vitro situations, are as follows.
(i) Accumulation of deleterious mutants (initial condition: a pure wild-type population, i.e., f = 0).
(ii) Reversion of a deleterious mutation (initial condition: a pure mutant population, i.e., f = 1).
(iii) Growth competition (initial composition: a 50%-50% population [f = 0.5] or any other strongly polymorphic mixture).
(iv) Gene fixation. This experiment, which has received a lot of attention in population biology (19, 24, 34, 38, 80) and which is very useful for understanding other stochastic experiments, is defined only in small populations in which the total mutation rate per population, µN, is much less than 1. Suppose that a single advantageous allele is introduced into an otherwise monomorphic population (f = 1/N). The allele will have one of two fates: either it will be lost due to random drift (Fig. 1a) or it will spread to the entire population, i.e., become "fixed." The questions are as follows. What is the fixation probability? If the allele is fixed rather than lost, how much time does fixation take, counting from the moment the allele appeared? One can also ask a more general question: what is the probability that a new allele grows into a subpopulation of a given size before it becomes extinct?
(v) Steady state. Whatever the initial condition, after a sufficient time, the system passes to the stochastic steady state, in which the probability density no longer depends on time; we consider this relatively simple case separately.
(vi) Genetic divergence. One splits a steady-state population into two isolated parts. Initially, both populations have a random but identical genetic composition, from which they independently diverge. As time goes on, their respective random compositions correlate less and less. The question is, what is the characteristic time at which the loss of correlation occurs?
(vii) Genetic turnover. This experiment studies the average timescale associated with random fluctuations of the mutant frequency in the steady state.
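As an illustration of how one of these experiments can be set up on a computer, the gene fixation experiment (iv) is sketched below. This is a minimal sketch under stated assumptions: mutation is neglected entirely (consistent with µN much less than 1), each new cell is infected by one independently drawn virion, and the code tracks the number of copies of the advantageous allele. The function names and parameter values are hypothetical.

```python
import random


def fixation_outcome(n_cells, s, rng):
    """Follow one advantageous copy until it is lost (False) or fixed (True)."""
    n_adv = 1  # a single copy of the better-fit allele
    generations = 0
    while 0 < n_adv < n_cells:
        f = n_adv / n_cells
        # The less-fit allele yields (1 - s) times as many infectious virions.
        p = f / (f + (1.0 - f) * (1.0 - s))
        # Mutation is neglected, so only sampling noise and selection act.
        n_adv = sum(1 for _ in range(n_cells) if rng.random() < p)
        generations += 1
    return n_adv == n_cells, generations


def estimate_fixation(n_cells, s, trials, seed=0):
    """Return (fraction of trials ending in fixation, mean time to fixation)."""
    rng = random.Random(seed)
    fixed_times = []
    for _ in range(trials):
        fixed, generations = fixation_outcome(n_cells, s, rng)
        if fixed:
            fixed_times.append(generations)
    probability = len(fixed_times) / trials
    mean_time = sum(fixed_times) / len(fixed_times) if fixed_times else None
    return probability, mean_time


if __name__ == "__main__":
    # Illustrative parameter values only.
    print(estimate_fixation(n_cells=50, s=0.2, trials=300))
```

The other experiments differ mainly in the initial condition (for example, starting from a pure wild-type or pure mutant population) and in whether mutation is included in the generation step.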
The probability density (ρ) of the mutant frequency predicted by the
stochastic equation is the main observable parameter. Unfortunately, to
measure it directly, one would have to generate a histogram of mutant
frequencies for a very large ensemble of populations. More amenable for
experimental testing are the average (expectation) values (equation 36)
and the standard deviations or variances (equation 37) of different
stochastic parameters, which require a smaller number of populations to
measure. Below we introduce some useful parameters whose statistics can
be measured in the different experiments we outlined above. At
the same time, their predicted statistics can be expressed via the
probability density, as shown in the mathematical section of this
review. In what follows, we assume that each parameter, for each given population, is measured with a high precision from a sufficiently large
sample of sequences. The sampling effects will be discussed separately below.
The first parameter is the mutant frequency itself (f), which is self-explanatory. Its value can be compared directly with the experimental value, provided that the wild-type (best-fit) nucleotide is known.
The second is the intrapopulation genetic distance (T),
defined as the proportion of sequence pairs (randomly sampled from the
virus population) which differ at the base of interest. Although there
are other ways to measure intrapopulation variability, we will use this
definition, known in population biology as Nei's nucleotide diversity.
It is equivalent to the standard definition of the genetic distance in
virology as the average number of pairwise differences among randomly
selected genomes, except that it applies to a single base rather than
to a long genomic segment. By definition, T is calculated as
2f(1 - f) and varies between 0 (at f = 0 or 1) and 0.5 (at f = 0.5). The
genetic distance is usually a more convenient measure of population
diversity than the mutant frequency itself since it does not require
knowledge of the wild-type sequence.
The third is the interpopulation genetic distance
(T12), which is defined in the same way as the
intrapopulation genetic distance, except that the two sequences of each
pair are sampled from two different populations (equation 40). The
interpopulation distance is 0 when the two virus populations consist
uniformly of the same genetic variant and 1 (100%) when the two
virus populations are composed entirely of opposite genetic variants.
The interpopulation distance, as one can show, cannot be smaller than
the average of the two intrapopulation distances. Therefore, it is
sometimes more convenient to consider instead the relative genetic
distance between two populations (D), defined as the
difference between the interpopulation distance and the average of the
two intrapopulation distances [T12 - (T1 + T2)/2]. This
parameter (equation 41) varies between 0 (two populations have an
identical genetic composition) and 1 (one population is pure mutant,
another is pure wild type). There are alternative definitions of
the relative distance (54). We find the present definition intuitively
clearer; in addition, its statistical moments (average, variance)
are relatively easy to calculate.
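As a minimal sketch, assuming that each population is summarized by its mutant frequency at the base of interest, the three distance measures introduced above can be computed as follows. The function names are illustrative only, and the expression used for T12, f1(1 - f2) + f2(1 - f1), is our reading of the verbal definition (the chance that one sequence drawn from each population differs at the base); it is not a transcription of equation 40.

```python
def intrapopulation_distance(f):
    """T = 2f(1 - f): chance that two random sequences differ at the base."""
    return 2.0 * f * (1.0 - f)


def interpopulation_distance(f1, f2):
    """T12: chance that one sequence from each population differs (assumed form)."""
    return f1 * (1.0 - f2) + f2 * (1.0 - f1)


def relative_distance(f1, f2):
    """D = T12 - (T1 + T2)/2, following the definition in the text."""
    t12 = interpopulation_distance(f1, f2)
    t_avg = 0.5 * (intrapopulation_distance(f1) + intrapopulation_distance(f2))
    return t12 - t_avg
```

Under these assumptions D reduces algebraically to the squared difference of the two mutant frequencies, which makes its lower bound of 0 and upper bound of 1 explicit.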
All the previous parameters can be measured at one time point, both in dynamic experiments (the first three experiments listed above) and in the steady state. Since all of them are, in general, stochastic, an average and a standard deviation have to be calculated for each. The next parameter is more complex: it requires measurement at two different times. We define it on average and for a steady-state population only.
The fourth parameter, the time correlation function of mutant frequency [K(t)], describes how quickly the system "forgets" the preceding random fluctuation of the mutant frequency (equation 45). The time correlation function usually has a maximum when the time difference is 0 and vanishes at large time differences. The characteristic time at which it decays by 50% (or, say, by a factor of e = 2.718...) from its maximum gives the timescale of random fluctuations. The form of this decay (e.g., exponential or negative power) may be a good fingerprint of a virus population model or, within a given model, of a particular population size.
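The following sketch shows one way such a correlation function could be estimated from a single long steady-state time series of mutant frequencies. It assumes that K(t) is the average product of deviations from the mean taken a time t apart, which may differ in normalization from equation 45; the function names and the default 50% threshold are illustrative.

```python
def time_correlation(freqs, max_lag):
    """Estimate K(lag) = <df(t) * df(t + lag)> for lag = 0..max_lag.

    freqs is a list of mutant frequencies sampled at equal time steps;
    max_lag must be smaller than len(freqs).
    """
    n = len(freqs)
    mean = sum(freqs) / n
    dev = [x - mean for x in freqs]
    corr = []
    for lag in range(max_lag + 1):
        pairs = n - lag
        corr.append(sum(dev[i] * dev[i + lag] for i in range(pairs)) / pairs)
    return corr


def decay_lag(corr, factor=0.5):
    """First lag at which K has dropped to factor * K(0), or None if never."""
    threshold = factor * corr[0]
    for lag, value in enumerate(corr):
        if value <= threshold:
            return lag
    return None
```

In practice the estimate is only meaningful when the series is much longer than the decay time being measured.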
In the mathematical section of this review, we calculate these parameters for different gedanken experiments and different intervals of population size. In this section of the review, we discuss these results qualitatively and illustrate them, when possible, with Monte Carlo simulations.
Steady State
In this section, we discuss properties of the steady-state, stochastic population in different intervals of the population size.
Neutral case: s ≪ µ.
Selection is of little significance when the selection coefficient is
much less than the mutation rate. This case is probably of little
practical significance for RNA viruses, with their tightly organized
genomes. However, the transition between stochastic and deterministic
behavior is easier to analyze when the selection factor can be
neglected. Hence we start our discussion here.
= 2Nµ), gives the total mutation rate for the entire population. For most RNA viruses, µN equals 1 when the number of infected cells is on the order of 10^5
(i.e., less than the number in a small culture dish).
As the mutation rate per population increases, the probability density
gradually changes its shape, as illustrated in Fig. 4 (80). This results from
competition between random drift, which drives the system to one of the
uniform states, and mutations, which diversify the system. At values of
µN much smaller than 1 (an interval we accordingly call
the drift regime in Table 1), random
drift wins and the usual population is only weakly polymorphic. The
probability density is, accordingly, U shaped, with a minimum at 50%
composition. At the smallest values of µN (the condition is given in equation 5), the system is most likely to be in either of
the purely monomorphic states, without a single opposite allele present
(see "Description of the model and the evolution equation" above,
where the boundary conditions are described). The total probability
of any polymorphic state will be much less than 1 and on the order of
µN. This estimate gives the frequency of segregating sites
in a genome segment.
Case with selection: µ ≪ s ≪ 1.
The situation when the selection coefficient is
less than 1 but still much larger than the mutation rate is more
relevant for RNA viruses and more interesting theoretically. As in the neutral limit, the larger the population size the smaller the fluctuations.
1), a case that has the same properties as the
above-described drift regime. At larger population sizes, selection is
crucial and causes the probability density (equations 48 or 49 to 51)
to be asymmetric in favor of a predominantly wild-type population.
In the limit of very large populations, when µN is much
larger than 1 (termed the selection regime in Table 1), the probability density is narrow and localized near its deterministic value (equation 57). This value is given by the ratio of the mutation to the selection rate (µ/s), which we assumed to be small. At this value,
mutations and selection against emerging mutants reach balance.
A result not sufficiently emphasized in the population biology
literature is the existence of a wide interval in population size
between the inverse selection coefficient, 1/s, and the inverse mutation rate, 1/µ, which
we term the selection-drift regime, in which all three factors of
evolution are critical. Specifically, mutations produce diversity, selection restricts mutants to a low level, and random drift causes strong fluctuations between populations. The structure of the probability density in this regime is shown schematically in Fig. 5. It consists of three components. The
large peak (delta function) situated at exactly zero mutant frequency
means that a population is, most probably, purely wild type. The weak
continuous exponential tail which decays at mutant frequencies on the
order of 1/(Ns) ≪ 1 (80) means that the chance
of a population being polymorphic is low and that if a population
happens to be polymorphic, the proportion of mutants is small and quite
random. A small peak at f = 1 becomes important only
close to the lower border of the interval, when N is on the
order of 1/s. The probability of finding any mutants (which
is given by the total area under this curve) is low and proportional to
µN (equations 49 to 51).
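The three intervals of population size discussed in this section can be summarized in a short sketch. It assumes µ ≪ s ≪ 1 and replaces each "much less than" condition by an arbitrary factor (margin, here 10), which is our choice for illustration and not a threshold taken from the review; population sizes that satisfy none of the strong inequalities are labeled "crossover".

```python
def classify_regime(n, mu, s, margin=10.0):
    """Classify population size n into the regimes named in the text.

    drift:           N << 1/s
    selection-drift: 1/s << N << 1/mu
    selection:       N >> 1/mu
    'margin' is an illustrative stand-in for 'much larger/smaller than'.
    """
    if not (0.0 < mu < s < 1.0):
        raise ValueError("expected 0 < mu < s < 1")
    ns = n * s      # population size in units of 1/s
    mun = n * mu    # total mutation rate per population
    if ns * margin <= 1.0:
        return "drift"
    if mun >= margin:
        return "selection"
    if ns >= margin and mun * margin <= 1.0:
        return "selection-drift"
    return "crossover"
```

For example, with µ = 10^-5 and s = 0.01, a population of 2,000 falls in the selection-drift interval, whereas a population of 10^7 falls in the selection regime.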
Deterministic Dynamics and Its Boundaries
As we have shown above (see "Steady state"), the steady-state mutant frequency approaches its deterministic value when µN is much larger than 1. The purpose of this section, small but with a large mathematical counterpart, is to gain insight into the transition between stochasticity and determinism in the more complex case, in which parameters of the system depend on time.
Deterministic dynamics.
Deterministic and stochastic
theories operate with different dynamic variables. The former considers
the time dependence of the frequency of mutants, and the latter uses a
more complex object, the time-dependent probability density of the
mutant frequency. It is important to ensure that the two approaches
converge to the same result in the limit of infinite population, when
they are expected to describe deterministic evolution, albeit in a different way. For this purpose, in the mathematical section of this
review we solve the dynamic stochastic equation (equation 1) for the
case of large populations. The resulting probability density, as
expected, is a very narrow peak located at the time-dependent mutant
frequency (Fig. 7b), which satisfies the
deterministic equation of evolution (equations 60 and 61).
f) is very small, the first term becomes small, since if there is no diversity, there is no selection. The second term, describing mutations, does not vanish in a uniform population. Instead,
the term vanishes at 50% composition when the effects of forward and
reverse mutations cancel each other. Mutations drive the system toward
50% composition. The same evolution equation can be obtained directly
from deterministic first principles (equations 63 and 64).
The deterministic equation in Fig. 7a allows one to predict the genetic
composition as a function of time for any initial condition set in an
experiment (equation 62). Corresponding plots for the three cases
matching the conditions of the accumulation, growth competition, and
reversion experiments described above (see "Experiments on evolution
and observable parameters") are shown in Fig.
8. In all cases, after a characteristic
time proportional to the inverse selection coefficient
(1/s), the population approaches a steady state in which the
mutant frequency saturates at a small value, the mutation rate over the
selection coefficient (µ/s) (see "Steady state"
above). Reversion is somewhat delayed compared to that in the two other
experiments since the system first has to diversify slowly due to
mutations and then still has to cross the entire interval of the mutant
frequencies. Note that in both the accumulation and reversion
experiments, the initial slope of the time dependence of the mutant
frequency is shallow and is determined by the mutation rate (Fig. 8).
Selection becomes important and causes the plots to curve after a
growing subpopulation becomes sufficiently large.
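The qualitative behavior described above can be reproduced with a simple numerical integration. The sketch below assumes that the deterministic equation has the form df/dt = -s f(1 - f) + µ(1 - 2f), which is inferred from the verbal description of its two terms (a selection term that vanishes without diversity and a mutation term that vanishes at 50% composition) and should be checked against Fig. 7a and equations 60 and 61. The forward Euler scheme, the step size, and the function names are our choices for illustration.

```python
def integrate_mutant_frequency(f0, s, mu, t_end, dt=0.1):
    """Forward Euler integration of the assumed deterministic equation.

    Returns the list of mutant frequencies at times 0, dt, 2*dt, ...
    """
    steps = int(round(t_end / dt))
    f = f0
    trajectory = [f]
    for _ in range(steps):
        # Selection term (against mutants) plus two-way mutation term.
        dfdt = -s * f * (1.0 - f) + mu * (1.0 - 2.0 * f)
        f += dfdt * dt
        trajectory.append(f)
    return trajectory


def first_time_below(trajectory, level, dt=0.1):
    """Earliest time at which the mutant frequency is below 'level'."""
    for i, f in enumerate(trajectory):
        if f < level:
            return i * dt
    return None


# Accumulation (f0 = 0), growth competition (f0 = 0.5), reversion (f0 = 1).
s, mu = 0.1, 1e-4
accumulation = integrate_mutant_frequency(0.0, s, mu, t_end=400.0)
competition = integrate_mutant_frequency(0.5, s, mu, t_end=400.0)
reversion = integrate_mutant_frequency(1.0, s, mu, t_end=400.0)
```

With these illustrative parameter values, all three trajectories end close to µ/s, and the reversion curve crosses any given low mutant frequency later than the growth competition curve does.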
Boundaries of deterministic approximation. Random drift, always present even in very large populations, causes the frequency of mutants to fluctuate around its deterministic value. As the population size decreases, the magnitude of fluctuations becomes comparable to the average frequency of the minority allele (either mutant or wild type), and the deterministic description breaks down. The corresponding condition on the population size varies significantly depending on the initial conditions of the experiment (equation 65). When the population starts from a monomorphic state (reversion or accumulation), the deterministic criterion is met when µN is much larger than unity. A population that is strongly diverse to start with, as in the growth competition experiment, is already deterministic at a much smaller population size in the selection-drift regime. (The criterion for diversity is that the mutant frequency must be higher than its characteristic "tail" at steady state [Fig. 5].) The reason for this difference is that a small polymorphism is influenced by rare and random mutation events while a strongly polymorphic population is controlled by selection alone.
Stochastic Dynamics: the Drift Regime
At the smallest population sizes, smaller than the inverse selection coefficient, as we found out when considering the steady state, selection can be neglected altogether. In this section, we consider the nonequilibrium dynamics in this regime. The problems of interest are those listed above (see "Experiments on evolution and observable parameters"): the decay of a strongly polymorphic state, gene fixation, transition from a monomorphic to the steady state, divergence of populations which have been separated, and the rate of genetic turnover in the steady state.
Decay of the polymorphic state and gene
fixation.
We start our discussion from the population that is
initially polymorphic, somewhere in the middle between 0 and 100%.
As already discussed (see "Description of the model and the
evolution equation"), mutations are not important in a
polymorphic population, since they occur in the population with a
frequency, µN, much less than 1 per generation. Therefore,
random drift remains the only factor causing variation of the mutant
frequency in time. As time passes, the mutant frequency drifts until
the population accidentally ends up in either monomorphic state (cf.
Fig. 1a). A representative random process is illustrated by computer
simulation in Fig. 9b. The average time
(the number of generations) it takes for a population to become
monomorphic (i.e., for either variant to be fixed) is on the order of
the population size (equations 81 and 82) (32, 80). The
fixation time is quite random: its representative fluctuations are on
the order of its average value. The same process can be understood in
another way, from the time evolution of probability density. Figure 9a
shows how the probability density, initially a narrow peak located,
e.g., at 50% composition, gradually spreads out to the entire interval
and then decays.
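A minimal Monte Carlo sketch of this decay is given below. It uses plain Wright-Fisher resampling (each generation, N genomes are drawn at random from the previous generation) as a stand-in for random drift and neglects mutation and selection; it is not the virus population model used for Fig. 9, and the function names, population size, and seed are illustrative.

```python
import random


def generations_to_fixation(n, f0, rng):
    """Resample n genomes per generation until one variant is fixed."""
    mutants = int(round(f0 * n))
    generations = 0
    while 0 < mutants < n:
        f = mutants / n
        # Each genome of the next generation is a mutant with probability f.
        mutants = sum(1 for _ in range(n) if rng.random() < f)
        generations += 1
    return generations


def mean_fixation_time(n, f0=0.5, replicates=200, seed=1):
    """Average number of generations until the population is monomorphic."""
    rng = random.Random(seed)
    times = [generations_to_fixation(n, f0, rng) for _ in range(replicates)]
    return sum(times) / len(times)
```

Calling mean_fixation_time for several values of n should show the average growing in proportion to the population size, with individual runs scattered by an amount comparable to the average itself.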
Transition from a monomorphic to a steady state. We also consider here the accumulation of mutations starting from a purely monomorphic state, e.g., wild type (which one of the two does not matter, since selection is negligible). Eventually, mutants will be generated, one of them will become fixed (as described), and the system will switch to pure mutant. Then wild-type alleles will be generated, etc., and, in the long run, the population will be, statistically speaking, in dynamic steady state in which it switches back and forth between two monomorphic states. The system will gradually "forget" its initial state, so that the probabilities of the two monomorphic states will be equal and will be close to 1/2.
In the probability density language, this process can be described as shown in Fig. 10a. The initial peak of the probability density is very narrow and is localized at the zero mutant frequency. As time goes on, a tail of the probability density spreads into the interval between 0 and 100% mutants (equations 85 and 86) and a new peak at 100% mutants appears, reflecting a chance of early fixation of a mutant genome. The first peak decays and the second peak grows, until they become equal in the steady state (Fig. 4) (equation 87). In the gas system analogy (see "Experiments on evolution and observable parameters" above), all water is initially condensed on the left wall and then evaporates. The vapors diffuse into the container and condense again on the right wall (analogous to what happens in a freezer over time). The system reaches equilibrium when the amount of condensate on both walls is the same and there remains some gas in between.
Divergence of populations which have been separated and the time correlation function. The longer timescale, 1/µ, also appears in the time correlation function of mutant frequency, which characterizes the timescale of random fluctuation in the steady state and the divergence of populations which have been separated (see "Experiments on evolution and observable parameters" above). The value of the relative genetic distance, D, gradually changes from 0 to a constant value corresponding to statistically independent populations (equation 90). (Note that some other measures of interpopulation genetic distance used in population biology do not have an upper limit [54].) As it turns out, the time of this transition, the half time of the correlation function decay (equation 91), and the time in which the probability density becomes symmetric (above) are on the same order, the inverse mutation rate. Indeed, all three times are determined by the waiting time for a successful gene fixation.
Stochastic Dynamics: the Selection-Drift Regime
Here we consider nonequilibrium experiments in the most interesting interval of population sizes (Table 1). The relative role of selection and stochasticity in population dynamics, as derived from the evolution equation in the mathematical section of this review, depends on the initial genetic composition. The dynamics of growth competition is almost deterministic (see "Deterministic dynamics and its boundaries" above), so that this experiment need not be discussed again. In the accumulation experiment, the overall dynamics is stochastic, except for the average values of the mutant frequency and the intrapopulation distance, which are, remarkably, the same as in the corresponding deterministic conditions.
Accumulation. As in the drift regime (see above), accumulation can be described as a spread of the peak of the probability density initially located at 0 (uniform wild type) into the interval between 0 and 1. However, unlike in the drift regime, the resulting steady state is not symmetric and is dominated by a large peak at zero mutant frequency (Fig. 5) (equation 48 or 49 to 51). The process of accumulation is reduced to generation of a small tail describing rarely occurring weakly polymorphic states (Fig. 5). As a result, the initial peak at 0 does not decay greatly and the steady state is reached in the same time as in deterministic selection (see "Deterministic dynamics and its boundaries" above) given by the inverse selection coefficient (1/s), i.e., faster than all timescales in the drift regime (equations 103 and 104).
The simulated stochastic dependence for this experiment is shown in Fig. 11. The process starts from the generation of a single allele, which tries to grow into a clone. The growth initially occurs under the condition that random drift is more important than selection. The maximum frequency that this clone can reach is determined by the characteristic mutant frequency at equilibrium, ~1/(Ns), which corresponds to a clone size of 1/s copies (Fig. 5). Above this value, selection becomes the leading force and drift becomes a correction. Further growth of the deleterious clone cannot occur, and it soon becomes extinct. This appears as sparse peaks, the highest of which reach to the length of the "tail" of the probability density, 1/(Ns) (Fig. 5) (equation 48 or 50). The half-life of a mutant clone (width of a large peak) is the inverse selection coefficient. Note that the typical time interval between peaks, 1/(µNs), is longer than 1/s. The former time is the waiting time for a new allele that will be lucky enough to reach the size 1/s. The latter time is the time that the lucky clone actually spends growing and contracting before it becomes extinct again. The ratio of the two times, µN, gives the probability of finding the population in a polymorphic state (the area under the tail in Fig. 5). As in the drift regime, all these estimates can be obtained from both the evolution equation (equation 101) and the more intuitive gene fixation approach (equation 84). For comparison, simulation of an accumulation experiment in the "selection" regime (µN = 20) is shown in Fig. 12.
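The scales quoted in this paragraph can be collected in a few lines of arithmetic. The sketch below simply evaluates the order-of-magnitude expressions given in the text; numerical prefactors are omitted, and the function name and dictionary keys are illustrative.

```python
def selection_drift_scales(n, mu, s):
    """Order-of-magnitude scales for accumulation in the selection-drift regime."""
    return {
        "clone_size": 1.0 / s,               # copies in a large mutant clone
        "tail_frequency": 1.0 / (n * s),     # mutant frequency at a high peak
        "clone_lifetime": 1.0 / s,           # generations a large clone persists
        "waiting_time": 1.0 / (mu * n * s),  # generations between high peaks
        "polymorphic_probability": mu * n,   # chance of finding any mutants
    }


scales = selection_drift_scales(n=10_000, mu=1e-5, s=0.01)
```

For these illustrative values (N = 10^4, µ = 10^-5, s = 0.01), a large clone contains about 100 copies and lasts about 100 generations, high peaks are separated by about 1,000 generations, and the ratio of the two times recovers µN = 0.1.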
Divergence of separated populations and the time correlation function. The characteristic times of divergence of separated populations (equation 105) and the decay time of the correlation function (equation 106) are on the order of the inverse selection coefficient, 1/s. Both experiments show for how long, on average, the system "remembers" its previous random fluctuation. The answer: for the half-life of a typical mutant clone, before it becomes extinct. This is because separate clones appear, due to mutation, at independent random times.
Reversion (fixation of an advantageous variant).
A
reversion experiment, in which the initial population is uniformly
mutant, behaves rather differently. Although the same scales for time
and the minority allele frequency appear in this case, they have
different meanings. As in accumulation, random drift and selection
dominate in smaller and larger wild-type colonies, respectively.
However, in this case, selection accelerates rather than hinders the
growth of a new clone. The probability that a single wild-type allele
will manage to grow to a size equal to the inverse selection
coefficient, 1/s, is low, on the order of s. However, above this
critical size, the rest of its growth will be carried out by selection
in a deterministic manner, i.e., with a probability close to 1 and over
the deterministic timescale, 1/s (see "Deterministic dynamics and its boundaries" above). Hence, the bottleneck of reversion is in reaching the critical size despite random drift; after
that, a clone is likely to be fixed in the population. Stochastic dynamics below the critical size is the same as in the accumulation experiment (selection is not important). The average waiting time for
reversion to start is determined by the fixation probability, s, and by the frequency at which single alleles are
generated in a population at each generation, µN, which
gives the time ~1/(µNs), i.e., the same scale as the
waiting time for a high peak in the accumulation experiment (Fig. 11) (equation
107) (51). A few examples of reversion curves are shown in
Fig. 13. Evolution of the probability
density is shown in Fig. 14, including
evolution of the density of polymorphic states (Fig. 14a) (equation
108) and of the two probabilities of monomorphic states (Fig. 14b)
(equation 107).
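The estimate of the waiting time for reversion can be illustrated with a standard textbook model. The sketch below uses the classical fixation probability of a single advantageous allele in a Moran process, (1 - 1/r)/(1 - r^(-N)) with relative fitness r = 1 + s, as a stand-in for the fixation probability of about s quoted above; the Moran process is not the virus population model of this review, equation 107 is not reproduced here, and the function names are illustrative.

```python
def moran_fixation_probability(n, s):
    """Fixation probability of one allele with fitness 1 + s (Moran process)."""
    r = 1.0 + s
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))


def reversion_waiting_time(n, mu, s):
    """Generations until a successful allele appears: 1 / (mu * N * p_fix)."""
    return 1.0 / (mu * n * moran_fixation_probability(n, s))


p_fix = moran_fixation_probability(10_000, 0.01)
wait = reversion_waiting_time(10_000, 1e-5, 0.01)
```

For N much larger than 1/s the fixation probability in this stand-in model is s/(1 + s), i.e., close to s, so the waiting time agrees with the estimate 1/(µNs) to within about 1% for the values used here.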