Microbiology and Molecular Biology Reviews, December 2002, p. 592-616, Vol. 66, No. 4
1092-2172/02/$04.00+0     DOI: 10.1128/MMBR.66.4.592-616.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.

Microbial Biodiversity: Approaches to Experimental Design and Hypothesis Testing in Primary Scientific Literature from 1975 to 1999

Cindy E. Morris,1* Marc Bardin,1 Odile Berge,2 Pascale Frey-Klett,3 Nathalie Fromin,4 Hélène Girardin,5 Marie-Hélène Guinebretière,5 Philippe Lebaron,6 Jean M. Thiéry,7 and Marc Troussellier8

Station de Pathologie Végétale,1 Station de Technologie de Produits Végétaux, INRA, Avignon,5 CEA Cadarache, DSV DEVM LEMIR, UMR 163 CNRS-CEA, Univ-Méditerranée, St Paul-Lez-Durance,2 UMR INRA-UHP Interactions Arbres-Microorganismes, Centre INRA de Nancy, Nancy,3 University of Paris VI, UMR 7621 CNRS, Laboratoire ARAGO, Banyuls-sur-Mer,6 ModLibre.org, Eguilles,7 Laboratoire Ecosystèmes Lagunaires, UMR 5119 CNRS-Université Montpellier II, Montpellier, France,8 Laboratoire de Microbiologie, University of Neuchâtel, Neuchâtel, Switzerland4

SUMMARY
INTRODUCTION
METHOD FOR ANALYZING THE SCIENTIFIC LITERATURE
    Establishing the Database
    Sampling and Descriptive Analysis of Publications
    Critical Analysis
DYNAMICS OF RESEARCH INTEREST IN MICROBIAL BIODIVERSITY SINCE 1975
    Dynamics of Publication Rates
    Major Research Themes and Objectives
        Composition and structure of microbial populations and communities.
        Impact of environmental factors on microbial biodiversity.
        Microbial biodiversity as an epidemiological tool.
        Markers to facilitate diagnosis and identification.
        Methods for studying microbial biodiversity.
        Phylogenetic and taxonomic studies.
        Discovery of new taxa.
    Methods Used for Characterizing Biodiversity
APPROACHES TO EXPERIMENTAL DESIGN AND HYPOTHESIS TESTING
    Measuring the Impact of the Environment, Space, and Time
    Snapshotting the Composition and Structure of Microbial Populations and Communities
    Discerning Markers for Diagnosis and Identification
    Defining Relatedness among Microorganisms
    Evaluating Methods for Studying Microbial Biodiversity
    Discovering New Species and Other Taxa
CONCLUSIONS AND FUTURE DIRECTIONS
ACKNOWLEDGMENTS
REFERENCES

   SUMMARY
 
Research interest in microbial biodiversity over the past 25 years has increased markedly as microbiologists have become interested in the significance of biodiversity for ecological processes and as the industrial, medical, and agricultural applications of this diversity have evolved. One major challenge for studies of microbial habitats is how to account for the diversity of extremely large and heterogeneous populations with samples that represent only a very small fraction of these populations. This review presents an analysis of the way in which the field of microbial biodiversity has exploited sampling, experimental design, and the process of hypothesis testing to meet this challenge. This review is based on a systematic analysis of 753 publications randomly sampled from the primary scientific literature from 1975 to 1999 concerning the microbial biodiversity of eight habitats related to water, soil, plants, and food. These publications illustrate a dominant and growing interest in questions concerning the effect of specific environmental factors on microbial biodiversity, the spatial and temporal heterogeneity of this biodiversity, and quantitative measures of population structure for most of the habitats covered here. Nevertheless, our analysis reveals that descriptions of sampling strategies or other information concerning the representativeness of the sample are often missing from publications, that there is very limited use of statistical tests of hypotheses, and that only a very few publications report the results of multiple independent tests of hypotheses. Examples are cited of different approaches and constraints to experimental design and hypothesis testing in studies of microbial biodiversity. 
To prompt a more rigorous approach to unambiguous evaluation of the impact of microbial biodiversity on ecological processes, we present guidelines for reporting information about experimental design, sampling strategies, and analyses of results in publications concerning microbial biodiversity.


   INTRODUCTION
 
In the mid-1900s, the provocative publications of R. H. MacArthur and G. E. Hutchinson (115, 165, 166) spurred the field of ecology into intense research and debate about the significance of biodiversity. These and other workers claimed that biodiversity is a measure of important ecological processes such as resource partitioning, competition, succession, and community productivity and is also an indicator of community stability. The bulk of this new wave of biodiversity research concerned plant and animal communities. In the 1960s, following in the footsteps of plant and animal ecologists, microbiologists began investigating the impact of biodiversity on the function and structure of microbial communities (97, 257). These questions served to intensify interest in biodiversity, a concept that has been a longstanding foundation of microbiology. Genetic variation among individuals within a population has long been recognized as the starting block for adaptation and evolution among microorganisms as well as among other organisms. Likewise, the consequences of phenotypic variability for the accuracy of disease diagnosis and for establishing taxonomic relationships among microorganisms have been well studied. Following the new research wave of the 1960s, interest in microbial biodiversity has been further bolstered by (i) creation of the Diversitas international research program in 1991 to promote scientific investigations into the origins and conservation of biodiversity and the impact of biodiversity on ecological functions, (ii) the Biodiversity Treaty that issued from the United Nations Conference on Environment and Development in 1992 in Rio de Janeiro, Brazil, and (iii) subsequent initiatives launched by science foundations, scientific societies, and research institutions in a wide range of countries. 
Microbial biodiversity has also received particular attention in areas where industrial applications are evident—such as for marine, medical, and food biotechnology—and where microbial activity has important implications for Earth's climate and for the bioremediation of polluted sites (46). Nevertheless, in spite of the research devoted to microbial biodiversity and to biodiversity in general, the consequences of biodiversity on the ecological processes cited above are still the object of debate and analysis (263, 264, 290).

If one takes a superficial glance at the study of biodiversity, it seems to be largely a descriptive endeavor: trap, identify, and count. However, as suggested above, the motivation behind these studies arises from specific hypotheses about the nature of biodiversity and its impact on ecological processes. In some studies, the hypotheses are explicit. For example, Hairston et al. (97) sought to test the hypothesis that increasing biodiversity is positively correlated with increasing community stability. Likewise, Kaneko and Atlas (130) focused on the hypothesis that biodiversity is correlated with the density of populations of bacteria in ice, sediment, and water from the Beaufort Sea. In other studies, the hypotheses are implicit, such as for a census of species or groups in a community or for studies of techniques for characterizing biodiversity. The hypotheses in these cases concern the accuracy of the census, the efficiency and biases of the techniques, and, above all, the notion that the observations made are not artifacts.

Tests of such hypotheses are based on demonstrating that one's observations are not due to random error or to factors other than those evoked in the hypothesis. In general, this involves three precautions: (i) experimental design and sampling procedures to reduce the influence of unwanted factors on the resulting observations, (ii) statistical tests to eliminate biases in the judgment of the observer, and (iii) multiple independent tests of a hypothesis to reduce the possibility that random error is the cause of the results observed. The process of hypothesis testing presents some important challenges for the field of microbial biodiversity. The foremost of these is the problem of constituting a sample. How can samples account for the diversity of extremely large and heterogeneous microbial populations when they represent only a very small fraction of these populations or when the methods used have very low resolution? Ecologists studying the diversity of macroscopic organisms have produced a large body of literature devoted to sampling strategies (45, 102, 167, 168, 210). This literature testifies to the important impact that intrinsic properties of a population, such as spatial aggregation, rates of immigration, birth, and death, and the relative frequency of rare species, have on the extent to which the sample represents the population. Unfortunately, microbial ecologists rarely have a priori knowledge about these properties of the populations they study. Furthermore, the work reviewed by Swift (257) concerning the biodiversity of fungi during successional sequences in communities of decomposers provides an early illustration of the wide range of sources of variability that can be encountered on small scales in microbial biodiversity studies. Hence, the initial steps of designing studies of microbial diversity could require considerable reflection and preliminary investigations. 
A second obvious challenge that one could expect for microbial biodiversity studies resides in multiple, independent tests of hypotheses. This is a challenge because the complexity of identifying microorganisms, and bacteria in particular, can lead to considerable investment in time and labor. As a consequence, the number of strains or samples that can be analyzed is sometimes below the minimal number needed for a single unambiguous test of the hypothesis, and additional tests may be prohibitive.

The field of microbial biodiversity has grown markedly since the Diversitas initiative in 1991 and has resulted in a large body of scientific literature. Through this literature, we have witnessed the development of techniques for characterizing diversity, in particular at the molecular level for both culturable and nonculturable microorganisms (223, 262). This literature has also contributed to a general consensus that the microbial world is much more diverse than we can appreciate at present. However, the abundance of literature published in the field of microbial diversity does not seem proportional to our understanding of the significance of biodiversity for ecological processes in the microbial world or of the ways in which we can manipulate or manage this diversity. The authors of this review wondered about a potential cause of this discrepancy. What types of questions, or hypotheses, are being addressed in the field of microbial biodiversity, and how have these hypotheses been tested? Our objective is to elucidate the cause of this discrepancy in practical terms: how has the field of microbial biodiversity employed sampling, experimental design, and, ultimately, the process of testing hypotheses over the past 25 years?

To review the approaches used for sampling, experimental design, and hypothesis testing in studies of microbial biodiversity we have, as a first step, traced the growth of research interest in this field for the past 25 years for a wide array of habitats including aquatic systems, soil and rhizosphere systems, mycorrhizae, the phyllosphere, food products, and food-processing factories. These habitats represent the fields of competence of the authors. We established a database of over 2,000 relevant publications and systematically characterized a randomly sampled subset as described in the first part of this review. The second part of the review is dedicated to illustrating the trends in the themes of research concerning the biodiversity of these habitats. This part of the analysis allowed us to identify eight principal themes that have been the main focus of microbial biodiversity studies over the past 25 years. Each theme was then used as a point of reference for analyzing how experimental design and hypothesis testing were treated in each publication. The third part of the review presents our critical analysis of how experimental design and hypothesis testing have been exploited in microbial biodiversity studies for each of the themes identified. We conclude with a series of guidelines for information that should be specified in microbial diversity publications with regard to experimental design and sampling strategies. These guidelines are intended to enhance the contribution of biodiversity studies to elucidating the significance of microbial biodiversity in ecological processes.


   METHOD FOR ANALYZING THE SCIENTIFIC LITERATURE
 

Establishing the Database

This review is based on studies published in the primary literature concerning eight types of microbial habitats: (i) aquatic systems (continental and marine), (ii) soil, (iii) the rhizosphere, (iv) mycorrhizae, (v) fungus-plant pathosystems, (vi) bacterium-plant systems, (vii) food, and (viii) food-processing factories. For soil and rhizosphere systems, we considered only symbiotic or nonpathogenic bacteria. Plant pathogens were included in the fungus-plant pathosystems and bacterium-plant systems. The latter system also included nonpathogenic bacteria associated with aerial plant parts, i.e., epiphytic bacteria in the phyllosphere. Literature searches were conducted, as illustrated in Fig. 1, using at least one of five bibliographic databases for the years 1975 to 1999: CAB, MedLine, Scifinder, PASCAL, and Food Science and Technology Abstracts. Each database was searched for the scientific fields for which it was relevant: CAB was interrogated for fungus-plant pathosystems, bacterium-plant systems, mycorrhizae, and food systems; MedLine was searched for food and aquatic systems; Scifinder was searched for soil and rhizosphere systems; PASCAL was searched for soil systems; and Food Science and Technology Abstracts was searched for food-processing factory systems. Trial searches were conducted to establish a set of consensus terms and phrases to be used for all habitats. The consensus terms and phrases were diversity, population structure, community structure, variability, dominance, and numerical taxonomy. References in the CAB, MedLine, Scifinder, and PASCAL databases with at least one of these terms in the title, abstract, or key words were retained for further consideration. However, most of these terms and phrases were not appropriate for the Food Science and Technology Abstracts database. This database was searched based on the words diversity, contamination, incidence, prevalence, occurrence, presence, characteristics, and characterization. 
References were then further selected based on the presence of key words relevant to each habitat. For each habitat, we also established lists of terms and phrases used to exclude the majority of irrelevant references. To exclude the irrelevant references remaining after this step, the database was then screened manually based on titles and contents of abstracts. The references were compiled in EndNote (version 3.0; Niles Software, Inc., Berkeley, Calif.), and duplicates were eliminated. Our analysis of trends is based on the 2,200 references compiled. The database of references may be obtained by contacting the corresponding author.
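The automatic part of this selection procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the authors ran in the bibliographic databases and EndNote: the record fields (`title`, `abstract`, `keywords`), the function names, and the title-based duplicate key are hypothetical, and the habitat and exclusion term lists are left as parameters because the review does not reproduce them. Only the consensus terms are taken from the text.

```python
import re

# Consensus terms quoted in the text for CAB, MedLine, Scifinder, and PASCAL.
CONSENSUS_TERMS = (
    "diversity", "population structure", "community structure",
    "variability", "dominance", "numerical taxonomy",
)


def mentions_any(text, terms):
    """Return True if any term occurs in text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def screen(records, habitat_terms, exclusion_terms):
    """Apply the automatic selection steps; manual screening is not modeled."""
    kept, seen = [], set()
    for rec in records:
        searchable = " ".join((rec["title"], rec["abstract"], rec["keywords"]))
        if not mentions_any(searchable, CONSENSUS_TERMS):
            continue  # no consensus term in title, abstract, or key words
        if not mentions_any(searchable, habitat_terms):
            continue  # not relevant to the habitat (hypothetical term list)
        if mentions_any(searchable, exclusion_terms):
            continue  # matches an exclusion term (hypothetical term list)
        # Crude duplicate key based on the normalized title (an assumption).
        key = re.sub(r"\W+", " ", rec["title"].lower()).strip()
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

Substring matching is deliberately permissive here, so a term such as "diversity" also matches "biodiversity"; a real search interface may behave differently.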



 
FIG. 1. Procedure for establishing and analyzing a database of publications from the primary scientific literature from 1975 to 1999 concerning microbial biodiversity.

 
Sampling and Descriptive Analysis of Publications

From the database of 2,200 references, 753 publications were randomly selected for analysis. This was accomplished by selecting about 100 of the total publications for each habitat according to random-number tables. Sampling was stratified over the years of publication so that each year was represented in the sample in the same proportion that it was represented in the database. Hence, between 12 and 56% of the publications were analyzed for all habitats except foods and food-processing factories. For these latter fields, all 40 and 39 references, respectively, in the database were analyzed.
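A stratified draw of this kind can be sketched as follows. The authors used random-number tables; the Python sketch below swaps in a seeded pseudorandom generator and a largest-remainder rule for rounding the per-year quotas, both of which are assumptions, as are the function names and the `year` field.

```python
import random
from collections import defaultdict


def proportional_allocation(stratum_sizes, total):
    """Split `total` across strata in proportion to their sizes (largest remainder)."""
    population = sum(stratum_sizes.values())
    total = min(total, population)
    exact = {k: total * n / population for k, n in stratum_sizes.items()}
    alloc = {k: int(x) for k, x in exact.items()}
    leftover = total - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    ranked = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in ranked[:leftover]:
        alloc[k] += 1
    return alloc


def stratified_sample(references, total, seed=0):
    """Draw `total` references at random, stratified by publication year."""
    rng = random.Random(seed)
    by_year = defaultdict(list)
    for ref in references:
        by_year[ref["year"]].append(ref)
    sizes = {year: len(refs) for year, refs in by_year.items()}
    alloc = proportional_allocation(sizes, total)
    sample = []
    for year, refs in by_year.items():
        sample.extend(rng.sample(refs, alloc[year]))  # sampling without replacement
    return sample
```

For one habitat, a call such as `stratified_sample(habitat_references, 100)` would return about 100 references in which each year keeps roughly the share it has in the full list.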

For each habitat, the randomly selected publications were characterized according to 17 criteria describing (i) the origin (culture collections, direct environmental samples, etc.) of the microorganisms or microbial nucleic acids studied and the overall approach to collecting samples, (ii) the precise parameters of the reported sampling strategy (use of random sampling protocols; size, weight, and volume of samples taken; and the number of individuals characterized), (iii) the microbiological and molecular biological techniques used to characterize diversity (Table 1), (iv) the calculation of diversity indices and the use of statistics to test the hypotheses evoked concerning microbial diversity, and (v) the use of replicated tests of the principal hypotheses evoked. Furthermore, the principal themes of each study were identified. For the publications analyzed, eight different themes were identified (Table 2). Most publications had only one principal theme, but some had two (27%) or three (3%) principal themes. A descriptive quantitative analysis of the database was conducted to determine the relationships among the different parameters characterizing the publications (co-occurrence of the characteristics described, changes in occurrence with time, etc.). This analysis was based on procedures written in Scilab (http://www-rocq.inria.fr/scilab), a free general-purpose scientific software package.
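As an illustration of what a co-occurrence tally over such characteristics might look like, here is a minimal sketch. It is written in Python rather than Scilab and is not the authors' procedure; the function name and the characteristic labels used in the usage note are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(publications):
    """Count, for every pair of characteristics, the publications showing both.

    `publications` is an iterable of collections of characteristic labels,
    one collection per publication.
    """
    pair_counts = Counter()
    for traits in publications:
        # Sort so that each pair is always stored under the same key order.
        for a, b in combinations(sorted(set(traits)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts
```

Given publications coded with labels such as `"random_sampling"` or `"statistical_test"`, `cooccurrence(pubs)[("random_sampling", "statistical_test")]` is the number of publications that report both.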


 
TABLE 1. Techniques employed for characterizing microbial biodiversity in studies published from 1975 to 1999

 

 
TABLE 2. Synopsis of the objectives of publications in the primary literature on microbial biodiversity from 1975 to 1999 concerning eight different microbial habitats associated with water, soil, plants, foods, and food-processing factories

 
Critical Analysis

For each of the eight themes identified in our analysis, we constructed a stepwise procedure to evaluate the approaches taken in addressing the principal objectives of the publications. This procedure took into account the overall experimental approach that we would expect for studies under each of the themes. For example, for some of the themes we expected to find specific information in the publication concerning the sampling strategy and statistical tests used as well as descriptions of multiple independent experiments relative to the central hypotheses. The questions we addressed concerning the experimental approaches used for each theme are presented in Fig. 2. Further details concerning these questions are presented below.



 
FIG. 2. Procedure for analyzing the role of experimental design and hypothesis testing for eight different themes of research concerning microbial biodiversity. The themes are described in detail in Table 2.

 

   DYNAMICS OF RESEARCH INTEREST IN MICROBIAL BIODIVERSITY SINCE 1975
 
Dynamics of Publication Rates

For the fields surveyed, the number of publications concerning microbial biodiversity in the primary scientific literature showed a marked increase in the early 1990s (Fig. 3A). This increase was particularly striking for fields concerning plant-pathogenic fungi and aquatic, soil, and rhizosphere systems (including mycorrhizae) (Fig. 3B). For systems implicating food industries, interest in microbial biodiversity did not develop until the 1990s and was prompted by food safety issues and the establishment of hazard analysis protocols (120). The overall increase in the number of publications relevant to microbial biodiversity does not simply reflect the increase in the total number of studies published by the 525 different journals covered in this census. For the 10 journals publishing most frequently in the field of microbial biodiversity (Table 3), the percentage of studies devoted to microbial biodiversity among all the studies published also showed a marked increase in the early 1990s. For example, the publication rate of microbial biodiversity studies increased from only 0.5% of the 589 articles published in 1988 by Applied and Environmental Microbiology to 6% of the 876 studies published by this journal in 1999. Overall, Applied and Environmental Microbiology published the greatest number of studies concerning microbial biodiversity in general (Table 3), as well as for aquatic, soil, and rhizosphere habitats, and was the second most frequent publisher of studies concerning bacterium-plant systems. However, studies concerning the biodiversity of fungus-plant pathosystems were published primarily in Phytopathology and in Mycological Research, those concerning mycorrhizae were published primarily in New Phytologist and Canadian Journal of Botany, and those concerning foods and food industries were published primarily in Journal of Food Protection, Journal of Applied Bacteriology, and International Journal of Food Microbiology.
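The normalization used in this comparison, the share of a journal's annual output devoted to microbial biodiversity, is simple arithmetic. The sketch below back-calculates approximate article counts from the percentages quoted for Applied and Environmental Microbiology; because those percentages are rounded in the text, the counts are estimates, not figures reported by the authors. The function names are hypothetical.

```python
def share_percent(biodiversity_articles, total_articles):
    """Percentage of a journal's articles that concern microbial biodiversity."""
    return 100.0 * biodiversity_articles / total_articles


def implied_count(percent, total_articles):
    """Approximate article count implied by a (rounded) percentage."""
    return round(percent / 100.0 * total_articles)


# Totals and percentages as quoted in the text; the counts are estimates.
count_1988 = implied_count(0.5, 589)  # about 3 articles
count_1999 = implied_count(6, 876)    # about 53 articles
fold_change_in_share = 6 / 0.5        # the share rose about 12-fold
```

Comparing shares rather than raw counts is what separates growth of the field from growth of the journal, whose total output rose from 589 to 876 articles over the same interval.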



 
FIG. 3. Rate of publication in the primary literature from 1975 to 1999 of studies concerning the biodiversity of eight microbial habitats. (A) Total number of publications per year; (B) number of publications per year concerning fungus-plant pathosystems (◊), the rhizosphere (including mycorrhizae) (▲), microbial habitats in soil (△), aquatic systems (◆), bacterium-plant systems (---), and the microbiology of food and food-processing factories (■).

 

 
TABLE 3. Number of publications concerning the biodiversity of microorganisms (bacteria and fungi) in eight different microbial habitats associated with water, soil, plants, foods, and food-processing factories published from 1975 to 1999 by the 10 journals publishing these types of studies most frequently

 
Major Research Themes and Objectives

The classification of publications according to themes (Table 2) was an essential step in the elaboration of this review since it permitted us to clearly develop the analysis of approaches used for experimental design and hypothesis testing. We present these themes not only to provide an overview of the types of questions preoccupying the field of microbial biodiversity but also to give insight into how we have classified the studies analyzed here. Each theme has been accorded an acronym to facilitate reference to the theme (Table 2).

Composition and structure of microbial populations and communities. The description of the composition and structure of microbial populations and communities is an important starting point in studies of microbial biodiversity and sets the stage for fundamental studies concerning how these populations and communities function. Hence, it is not surprising that the basic description of community composition or structure (theme Describe) coupled with the impact of space and time on these parameters (theme Dynamics) was the central theme of over half of the publications analyzed here and was relevant to all habitats surveyed (Table 4). Although descriptive studies (theme Describe) have been dominant in the microbial biodiversity literature considered as a whole, publication of such studies has declined over time whereas reports of spatial and temporal dynamics of microbial biodiversity (theme Dynamics) have become more abundant (Fig. 4).


 
TABLE 4. Distribution of publications among the principal themes of studies of microbial biodiversity in the primary literature from 1975 to 1999 concerning eight different microbial habitats associated with water, soil, plants, foods, and food-processing factories

 


 
FIG. 4. Changes over time in the frequency of publications addressing each of eight different research themes in microbial biodiversity from 1975 to 1999. The themes Effect (◆), Dynamics (◊), Source (▲), Describe (△), Relatedness (■), Markers (□), Methods (+), and New (×) are described in Table 2. Each value represents the percentage of publications addressing a given theme among the total publications for that period. The periods depicted were chosen to represent approximately equal numbers of publications per period.
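The two computations described in this caption, grouping years into periods of roughly equal size and expressing each theme as a percentage of the publications in a period, could be sketched as below. The review does not state how the period boundaries were chosen, so the greedy cumulative rule used here is an assumption, as are the function names and the `year` and `themes` fields.

```python
from collections import Counter


def equal_count_periods(years, n_periods):
    """Group consecutive years into periods holding roughly equal numbers of publications."""
    per_year = Counter(years)
    target = len(years) / n_periods
    periods, current, running = [], [], 0
    for year in sorted(per_year):
        current.append(year)
        running += per_year[year]
        # Close the period once the cumulative count reaches the next target multiple.
        if running >= target * (len(periods) + 1) and len(periods) < n_periods - 1:
            periods.append((current[0], current[-1]))
            current = []
    if current:
        periods.append((current[0], current[-1]))
    return periods


def theme_percentages(publications, periods):
    """Percentage of the publications in each period that address each theme."""
    result = {}
    for start, end in periods:
        in_period = [p for p in publications if start <= p["year"] <= end]
        if not in_period:
            continue  # no publications fall in this period
        counts = Counter(t for p in in_period for t in set(p["themes"]))
        result[(start, end)] = {
            theme: 100.0 * n / len(in_period) for theme, n in counts.items()
        }
    return result
```

Because a publication with two or three principal themes is counted under each of them, the percentages within a period can sum to more than 100.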

 
Microbial populations and communities have been described in terms of a wide range of phenotypic traits, many of which are related to the practical interest in the habitat studied, as listed in Table 2. Numerous studies have also exploited molecular characterization of nucleic acids in order to describe microorganisms; examples of these are cited throughout this review. In aquatic systems in particular, an important set of studies related to the theme Describe has concerned exploratory investigations using culture-independent methods (clone libraries) to determine the species composition of bacterial communities (Table 2).

The impact of time and space on population composition or structure (theme Dynamics) has been investigated to understand the origin and spread of populations of beneficial and of deleterious microorganisms or the probable relationship among organisms from different geographical locations. For systems concerning plant or human pathogens or bacteria involved in food spoilage, these questions have been linked to an interest in disease epidemiology, in the efficiency and durability of practices for control of crop and food losses, and in food safety issues (Table 2). For aquatic systems, studies under this theme have concerned vertical and horizontal spatial variability at a range of scales and short-term as well as seasonal dynamics (Table 2).

Impact of environmental factors on microbial biodiversity. The effect of environmental factors on microbial diversity (theme Effect) has been a major theme of study for soil and rhizosphere microbial systems and for mycorrhizae (Table 4). Interest in this theme has increased markedly over the past 25 years (Fig. 4). Studies under this theme are apparently motivated by the possibility of modifying the factor of interest via soil or forest management practices, for example. For rhizosphere and soil communities, there has been considerable interest in the relationship between microbial biodiversity and plant health, plant productivity, and the efficiency of ecological processes such as nutrient cycling and phytoremediation (Table 2). For mycorrhizal fungi, interest in their biodiversity arises from the observations that these fungi play a major role in the floristic diversity and structure of plant communities (273).

Studies under this theme relative to soil, rhizosphere, and mycorrhizal systems have addressed the impact of two principal factors on microbial diversity: the properties of the soil and the nature of the plant species. The soil properties studied have been overwhelmingly those related to agricultural and other land management practices and to pollution (Table 2). In studies of aquatic habitats, only a few papers concern the effect of specific factors on the structure of bacterial communities. They are related to the effect of anthropogenic disturbances (mainly pollutants) and to the response of natural communities to natural environmental changes (temperature, salinity, etc.) (Table 2).

Microbial biodiversity as an epidemiological tool. For all habitats considered together, relatively few publications have clearly linked the concepts of epidemiology and microbial biodiversity (Table 4; Fig. 4). However, studies of microbial biodiversity addressing epidemiological questions (theme Source) have been of interest for habitats concerning human pathogens and occasionally for those containing plant pathogens. Studies under this theme have compared phenotypic and genotypic profiles of strains from different origins as a means of determining possible sources of contamination of foods or sources of outbreaks of plant disease (Table 2).

Markers to facilitate diagnosis and identification. Characterization of microbial biodiversity as a means of discerning markers useful for diagnosis or microbial identification (theme Markers) has been a major theme of studies concerning plant-associated bacteria (Table 4), where the development of such markers is a matter of particular interest for tracing bacteria in ecological studies and for avoiding the need for time-consuming tests of pathogenicity and host range (Table 2). Markers have also been sought for other types of microorganisms as a means of rapidly identifying phenotypes whose laboratory characterization is laborious, such as the efficiency of mycorrhization, or as a means of facilitating the identification of organisms that are complicated to identify, such as certain mycorrhizal fungi (Table 2).

Methods for studying microbial biodiversity. Interest in methods for studying microbial biodiversity (theme Methods) has grown steadily since 1975 (Fig. 4). However, among the 94 publications analyzed that evaluated methods applicable to the characterization of microbial biodiversity, 90 concerned the evaluation of laboratory methods for characterizing microorganisms and over half of these focused on evaluating methods for the analysis of total DNA or targeted DNA sequences in a sample. Only four studies focused on the effect of experimental designs or sampling procedures on measures of biodiversity (Table 2). Methodological information was also presented in other studies that we analyzed, but to a much more limited extent than in the four publications just mentioned, and it was not the principal objective of those studies.

Phylogenetic and taxonomic studies. Biodiversity studies motivated by phylogenetic or taxonomic comparisons of a given group of microorganisms (theme Relatedness) have focused primarily on bacteria and reflect the historical tribulations surrounding the definition of bacterial species and the relatedness among individuals of different genotypes. In our database, the frequency of publications under this theme decreased with time (Fig. 4). Since interest in microbial phylogeny probably is not waning, this trend likely reflects the fact that the key words used for our bibliographic searches are employed less and less frequently in publications addressing microbial phylogeny and taxonomy. Nearly half of the biodiversity studies concerning aquatic systems were relevant to the theme Relatedness (Table 4). An important motivation for studies of this habitat is that less than 1% of aquatic picoplankton—the organisms that constitute the fundamental basis for the functioning of these systems—can be cultured. Hence, there is a prevailing interest in describing organisms that were inaccessible prior to the advent of culture-independent techniques. Culture-independent direct retrieval of 16S rDNA was used extensively in the early 1990s to characterize unknown and uncultured species in aquatic systems (Table 2). Most of these studies provide sequences of the isolated 16S rDNA and localize these "phylotypes" in the tree of life. For bacteria that are pathogenic to plants or animals, studies under the theme Relatedness have been motivated by very practical concerns. The taxonomic variability of causal agents of plant and animal diseases has been investigated because it is intimately linked to questions concerning epidemiology (disease origin) and can be useful in subsequent identification of markers for detection (Table 2).

Discovery of new taxa. The search for new species or taxa (theme New) was a relatively rare theme (Table 4), which has increased slightly in importance over the past few years (Fig. 4). For aquatic systems, studies under this theme generally were derived from phylogenetic or taxonomic studies (theme Relatedness) and focused on the occurrence of new species in a few samples with the aim of finding new DNA sequences (Table 2). For soil and rhizosphere systems, studies under this theme have sought to demonstrate the high degree of bacterial diversity and the presence of nonculturable new species by using direct soil extraction followed by molecular phylogenetic analysis (Table 2). However, the search for new species or taxa in soil and rhizosphere systems has also addressed culturable bacteria (Table 2).

Methods Used for Characterizing Biodiversity

To describe how the different techniques presented in Table 1 were used in biodiversity studies over the last 25 years, we calculated the frequency at which these techniques were reported and their coincidence with the different themes (Table 5). For all publications considered as a whole, and for many of the habitats or themes considered independently, DNA-based characterization techniques, and in particular those based on targeted DNA sequences such as 16S rDNA, specific repeated sequences, or genes for virulence of pathogens, played the dominant role in biodiversity studies relative to all other types of techniques (Table 5). Furthermore, the use of DNA-based characterization techniques grew faster over the past 25 years than that of any other type of technique for characterizing microbial diversity. The percentage of published articles that reported the use of DNA-based characterization techniques was 9% in the period from 1975 to 1988, 27% from 1989 to 1993, 35% from 1994 to 1995, and 40 to 50% thereafter. Nevertheless, in spite of the important role of DNA-based characterization techniques, tests based on the characterization of phenotypes played a predominant role for four of the habitats. Characterization of microorganisms based on their metabolism, morphology, or ability to grow on selective media was the predominant approach in studies of soil, mycorrhizae, food, and food-processing factories. The role of phenotypic tests for these systems is related to the frequent use of the BIOLOG system for characterizing soil microorganisms, the ease of characterizing certain mycorrhizal fungi by their morphological features, and the importance of standardized culture methods for food-borne pathogens.


TABLE 5. Percentage of articles published from 1975 to 1999 reporting the results of studies of microbial biodiversity based on analysis of total or targeted DNA sequences or on characterization of phenotypic properties

 
Although DNA-based characterization techniques had an overall dominant role in studies of microbial biodiversity, their importance was more restricted when studies were classed by theme. In fact, DNA-based techniques were most dominant in studies of the phylogenetic and taxonomic relationships among organisms (theme Relatedness) and in the search for new species or taxa (theme New) (Table 5). As indicated above, the evaluation of methods (theme Methods) has also focused principally on characterization of microbial DNA. Conversely, studies concerning the effect of an environmental factor and the description of structure and composition relied more heavily on characterization of microbial phenotypes than on characterization of genotypes. This trend was consistent for all systems except aquatic systems.

These trends led us to wonder if the predominance of phenotypic tests in studies of microbial biodiversity is related to the role of known, specific microbial functions in a habitat. Studies of plant and animal pathogens and of symbiotic microorganisms provided a large enough database to permit us to analyze trends concerning the use of direct tests of pathogenicity and of symbiosis. We based this analysis on 117 publications that concerned the characterization of processes related to pathogenesis or symbiosis: 45 publications for fungus-plant pathosystems, 37 for bacterium-plant systems, 14 for rhizosphere systems, 13 for mycorrhizae, and the remainder for soils, foods, and food-processing factories. The objectives of these studies were, for example, to determine the similarity of populations of a given plant pathogen from different origins or the overall diversity within a specific group of a plant pathogen or food spoilage organism (11, 16, 32, 51, 62, 121, 142, 152, 189, 221). We did not consider studies focused solely on molecular phylogenetics or on the diversity of specific alleles within pathogen or symbiont populations. For the database considered as a whole, pathogenicity or symbiotic host range testing was frequent for studies in all the habitats represented in the 117 publications. However, over time there has been a general tendency for fewer and fewer studies to carry out this type of characterization, as illustrated in Fig. 5. This tendency could possibly be explained by the increasing availability of markers for properties related to pathogenesis (72, 196, 277) or to symbiosis (2, 148, 159, 185). However, we did not find many discussions concerning how markers were good estimates of the pathogenicity or the symbiotic potential of the organisms studied when direct tests of these properties were not used in the studies described in the 117 publications analyzed. 
This tendency suggests that direct characterization of the pathogenicity or symbiotic potential of microorganisms in biodiversity studies is being progressively abandoned and is not necessarily being replaced by unambiguous markers of these microbial functions. Part of the impetus for reducing the use of pathogenicity tests may reside in the intensification of quarantine and biosecurity constraints in the handling of plant pathogens such as Ralstonia solanacearum and certain species of Cercospora or Fusarium, for example, or for genetically engineered microorganisms. The manipulation of human and animal pathogens in general and the handling of indicator animals used in evaluating pathogenicity are also subject to increasing constraints that may contribute to this trend. For pathogens and symbionts alike, constraints of time and labor may also orient research programs away from approaches requiring these types of phenotypic characterization.



FIG. 5. Changes over time in the frequency of publications reporting the use of tests of pathogenicity or symbiosis in studies of microbial biodiversity from 1975 to 1999 for systems where these microbial properties are particularly pertinent: fungus-plant pathosystems (■), bacterium-plant systems (□), mycorrhizae (▲), the rhizosphere (x), and all habitats considered together (◆).

 

   APPROACHES TO EXPERIMENTAL DESIGN AND HYPOTHESIS TESTING
The objectives of three of the themes described above involve the testing of hypotheses about the effect of the environment, space, or time on microbial biodiversity (themes Effect, Dynamics, and Source). Examples of hypotheses under these themes, stated in a general way, include the following: there is no significant variation in the structure of the population of a given plant pathogen over time or between geographic regions (32, 39), and plant species have a significant effect on the biodiversity of microorganisms in the rhizosphere (19, 95, 171, 205, 252). The test of such hypotheses necessitates well-developed experimental design leading to statistical tests. Because of the analogies that can be made among these three themes, we have grouped them for the analysis presented below. The objectives of the remaining themes focus—to various degrees—on description of the composition or structure of microbial populations or communities. The principal questions we addressed in the analysis of experimental approaches were unique for each of these themes (Fig. 2), and hence, we present each of them independently.

Measuring the Impact of the Environment, Space, and Time

The objectives of more than half of the microbial biodiversity studies published over the past 25 years were to measure the effect of a specific environmental factor on biodiversity, to measure the changes in biodiversity over time and space, or to compare the similarity of a population of pathogens at a given site with that of the population at a suspected source of contamination (themes Effect, Dynamics, and Source) (Table 4). These objectives are analogous to a wide range of seemingly straightforward objectives such as determining the effect of fertilizers on crop yield or the changes in lichen density with increasing distance from a refinery. Hence, we extrapolated the main principles guiding experimental design and hypothesis testing from these latter systems to the analysis of microbial biodiversity in the themes Effect, Dynamics, and Source. As a first step in characterizing the experimental designs reported in studies under these three themes, we noted whether a sampling strategy was clearly described (Fig. 2). Specifically, we determined if either (i) the space and time within which samples were collected were defined or (ii) information was presented concerning how the samples accounted for the variability inherent to the sampling procedure. For these studies, we then noted if statistical tests were used to evaluate the hypotheses evoked concerning biodiversity and if the study reported multiple independent tests of these hypotheses (Fig. 2).

For the themes Effect, Dynamics, and Source considered as a whole, one-third of the 471 publications analyzed did not report a sampling strategy (Table 6). These publications lacked information about the basis for choosing the samples or strains characterized and/or information about the size, weight, volume, or frequency of samples taken. For these studies, there was insufficient information to determine how an independent but comparable sample could be taken. Hence, we could not ascertain the extent to which the characterized organisms represented the population studied or to what extent the results could be compared to those of studies of similar microorganisms. For 6% of these 471 publications, the study was based solely on the use of strains from culture collections. In nearly all studies that employed strains from culture collections, the only information reported concerned their taxonomic identification and the date, location, and environment from which they were isolated. Nevertheless, a few publications indicated how strains from culture collections had been isolated, and we considered this to be part of the description of a sampling strategy. The frequency of studies reporting sampling strategies varied among the habitats. Publications concerning soil and rhizosphere systems, mycorrhizae, fungus-plant pathosystems, and food most frequently reported sampling strategies (Table 6). For all habitats considered together, there was no discernible effect of time on the frequency at which sampling strategies were reported in publications.


TABLE 6. Number of publications reporting sampling strategies, use of statistical analyses to test hypotheses concerning microbial biodiversity, and replicated tests of hypotheses for the themes Effect, Dynamics, and Source

 
Among the 314 publications in the themes Effect, Dynamics, and Source that reported sampling strategies, about 50% also reported values of diversity indices and subsequently employed statistics to test hypotheses concerning biodiversity (Table 6). There were marked differences among the three themes in terms of the frequency with which diversity indices and subsequent statistical tests were employed: 60% of the publications reporting sampling strategies in the theme Effect also employed indices and statistical tests, while 47% of those in the theme Dynamics and 27% of those in the theme Source did so. For the publications in which statistical tests of hypotheses were not employed, many reported experimental designs and data equivalent to those reported in studies employing statistics. For most of these publications, it was not clear why statistical tests were not exploited. A few indicated that they did not have a sufficient number of replications (83). Another (213) clearly stated that statistical tests were considered inappropriate in light of the high variability of mycorrhizal types observed in the large number of samples analyzed. About 8% of the 146 publications that lacked descriptions of sampling strategies under these three themes nevertheless reported statistical evaluations of the hypotheses evoked.

A wide variety of approaches were used for calculating diversity indices and for performing statistical tests of hypotheses. To summarize these approaches, we classified measures of diversity as either primary indices or secondary or composite indices. Primary indices were direct measures of population parameters not requiring any particular calculation or confounding of species (or taxon) richness with relative abundance and identity. In general, these were direct measures of the abundance or relative frequency of phenotypes or genotypes observed in a sample. All other types of indices, such as the Shannon-Wiener diversity index (Hs) (239), Simpson's diversity index (λ) (244), and Nei's index for genetic diversity (IN) (195), were considered to be composite indices. In Table 7 we summarize how primary and composite indices have been coupled with the use of parametric and nonparametric statistical tests.
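The distinction between primary and composite indices can be made concrete with a minimal Python sketch. The function names and the sample counts are hypothetical and serve only as illustration. The formulas are the textbook forms of the indices named above, with natural logarithms for the Shannon-Wiener index and the uncorrected form 1 - Σp² for Nei's gene diversity. The publications analyzed may have used other logarithm bases or small-sample corrections.

```python
import math

def proportions(counts):
    """Primary index: relative frequency of each group observed in a sample."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]

def shannon_wiener(counts):
    """Composite index: Hs = -sum(p_i * ln(p_i)) over the observed groups."""
    return -sum(p * math.log(p) for p in proportions(counts))

def simpson_lambda(counts):
    """Composite index: lambda = sum(p_i ** 2) over the observed groups."""
    return sum(p * p for p in proportions(counts))

def nei_gene_diversity(allele_counts):
    """Composite index at one locus: 1 - sum(p_i ** 2), uncorrected for sample size."""
    return 1.0 - simpson_lambda(allele_counts)

# Hypothetical sample: 100 isolates assigned to three genotypes.
sample = [50, 30, 20]
print(proportions(sample))
print(round(shannon_wiener(sample), 3))
print(round(simpson_lambda(sample), 2))
print(round(nei_gene_diversity(sample), 2))
```

Note that each composite index collapses the whole sample into a single number. The identity of the groups is lost, and no variance is attached to the value unless further samples are taken.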


TABLE 7. Measures of diversity and statistical tests used to evaluate the effect of environmental factors, space, or time (themes Effect and Dynamics) on microbial biodiversity or to compare microbial populations with the objective of identifying sources of contamination or inoculum (theme Source)

 
The work of Chen et al. (40) clearly illustrates how the nature of primary and composite diversity indices constrains the choice of parametric and nonparametric statistics for hypothesis testing in microbial biodiversity studies. These workers sought to determine if there were changes in the genetic diversity of populations of the plant pathogenic fungus Mycosphaerella graminicola during disease epidemics on wheat. Genetic diversity was measured in terms of the frequency of alleles at different restriction fragment length polymorphism (RFLP) loci in isolates of the fungus collected at different times in each of three seasons at one experimental site. For example, to test the difference in genetic diversity between early-season and late-season populations, a contingency χ² test was employed to compare allelic frequencies at each of eight different RFLP loci determined for each of 444 strains collected in 1990 (about one-third of which were collected early in the season, and the rest were collected late in the season). Likewise, the difference in genetic diversity of populations from three different years was evaluated by the same statistical test by comparing allelic frequencies for each of 10 RFLP loci for the late-season strains collected in 1990 and 58 late-season strains collected in 1991 and 1992. No significant differences were revealed at any of the loci. The authors then calculated various indices of genetic diversity including Nei's index for the different loci at each date. Nei's index is analogous to the Shannon-Wiener index and summarizes the number and relative abundance of different genotypes in a sample without accounting for the identity of the genotypes encountered. Furthermore, the value for such an index calculated from data for a single sample is not associated with a measure of variance. Population variance must be measured from multiple samples from the same population. 
Alternatively, some authors have exploited methods proposed for estimating the theoretical variance of composite diversity indices via simulation or calculations (25, 276, 277). Hence, for diversity based on Nei's index, Chen et al. (40) used parametric tests (t tests) to evaluate differences in genetic diversity for situations in which they had measures of variance. By using t tests, they compared genetic diversity at different dates or different times in a single year by considering the indices for each of the loci to be replicate measures within a given time. However, the indices for the multiple loci of a single set of strains are not necessarily bona fide independent replicate measures.
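The contingency χ² comparison of allelic frequencies described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the allele counts are hypothetical (the row totals merely mimic 444 strains, about one-third of them collected early), no continuity correction is applied, and every row and column total is assumed to be positive. It is not a reconstruction of the analysis of Chen et al. (40).

```python
def contingency_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if allele frequency is independent of sampling date.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Hypothetical counts at one RFLP locus.
# Rows: early-season and late-season strains; columns: allele A and allele B.
counts = [[60, 88],
          [125, 171]]
statistic, dof = contingency_chi2(counts)

# 3.841 is the critical value of the chi-square distribution for 1 degree of
# freedom at the 5% level; a smaller statistic gives no evidence of a difference.
print(statistic, dof, statistic > 3.841)
```

Because the test operates directly on counts of alleles (a primary index), it needs no separate estimate of variance, which is why nonparametric tests of this kind pair naturally with primary indices.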

Other studies can also serve to illustrate how the types of replication and measures of variability inherent to experimental design influence the coupling of diversity indices to parametric and nonparametric tests, as illustrated in Table 7. Hallmann et al. (93) conducted parametric tests based on composite diversity indices by establishing replicated measures, sensu stricto, of these indices. For chitin-amended and nonamended soils, these authors calculated the number of different bacterial genera and Hill's diversity number (N1) for samples of 35 bacterial strains from each of four replicate plots per treatment. For each index, least-significant-difference tests were used to compare diversity based on the four values measured per treatment. Helm et al. (105) used one-way analysis of variance and the nonparametric Kruskal-Wallis test to evaluate differences in fungal diversity in terms of the infection percentages for different ectomycorrhizal types along a chronosequence of different plant species. Xia et al. (288) employed a nonparametric test to evaluate heterogeneity among primary indices of genetic diversity. To determine variation in diversity of the plant pathogen Magnaporthe grisea among sites within individual fields of rice, these authors determined the DNA fingerprints of 7 to 21 strains collected at each of five sites within two commercial rice fields. Within-field spatial heterogeneity of diversity was determined by a χ² test based on the frequency of RFLP fingerprint groups in the five locations per field. Although Nei's index of genetic similarity was calculated in this study, it was not employed in statistical tests. The use of nonparametric tests for primary indices was also illustrated by Carraminana et al. (34). These authors used a χ² test to evaluate the difference in frequency of serotypes among strains of Salmonella collected in poultry slaughterhouses before and after evisceration of carcasses.
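To show why replicate plots make a composite index amenable to parametric tests, the following Python sketch computes Hill's diversity number N1, taken here as the exponential of the Shannon-Wiener index, for each of four replicate plots per treatment and then summarizes the replicates by their mean and sample variance. The genus counts are invented for illustration (35 strains per plot, echoing the design described above) and are not data from Hallmann et al. (93). The helper names are hypothetical.

```python
import math
import statistics

def hill_n1(counts):
    """Hill's diversity number N1 = exp(H), with H the Shannon-Wiener index."""
    total = sum(counts)
    shannon = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return math.exp(shannon)

def summarize(plots):
    """Mean and sample variance of N1 across replicate plots of one treatment."""
    values = [hill_n1(plot) for plot in plots]
    return statistics.mean(values), statistics.variance(values)

# Hypothetical counts of strains per bacterial genus, one list per replicate plot.
amended = [[12, 9, 7, 4, 3], [15, 8, 6, 4, 2], [11, 10, 6, 5, 3], [14, 9, 5, 4, 3]]
nonamended = [[20, 10, 5], [18, 9, 8], [22, 8, 5], [19, 11, 5]]

for name, plots in (("amended", amended), ("nonamended", nonamended)):
    mean_n1, var_n1 = summarize(plots)
    print(name, round(mean_n1, 2), round(var_n1, 4))
```

The variance here comes from independent plots, not from repeated laboratory measurements of one sample, which is what allows a comparison of treatment means.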

Another aspect of experimental design that we noted for the themes Effect, Dynamics, and Source was the reporting of multiple independent tests of hypotheses (Fig. 2). Of the 471 publications analyzed in these three themes, about 5% reported multiple independent tests of hypotheses (Table 6). For the 168 that described sampling strategies and employed statistics for testing hypotheses, about 20% reported multiple independent tests of these hypotheses. Nearly all studies under the themes Effect, Dynamics, and Source employed replication to ensure the reliability of the microbial traits characterized in the laboratory. However, most of these studies were based solely on samples from a single location and/or a single date. Hence, multiple independent tests of the principal hypotheses were not possible for these studies.

Examples of studies involving multiple tests of hypotheses include the work of Mahaffee and Kloepper (169), who evaluated the effect of a genetically modified strain of a plant growth-promoting Pseudomonas fluorescens on microbial communities of the cucumber rhizosphere and endorhiza. The microbial diversity of the rhizosphere and endorhiza of inoculated and noninoculated experimental plots did not differ significantly between treatments but was different between 0 and 70 days after planting. Similar results were obtained in each of two field seasons. Jonsson et al. (127) evaluated the effect of wildfires on the ectomycorrhizal fungal communities of Scots pine. To conduct replicated tests of the hypothesis that wildfires have significant effects on mycorrhizal population structure, these authors compared the mycorrhizal populations in burned and adjacent unburned late-successional stands of Scots pine at four different sites in northern Sweden. Asher (11) tested the hypothesis that the aggressiveness of strains of the plant pathogen Gaeumannomyces graminis from fields continuously cultivated with cereal crops is lower than that of strains from fields in which cereals are grown only occasionally. Measures of the aggressiveness of the populations were made from two sites, one representing a short-term (2 or 3 preceding cereal crops) and the other a long-term (12 to 15 preceding cereal crops) cereal sequence. Four tests were done, each with isolates from the two different types of sites. Statistical comparisons among populations from the two types of sites led to the rejection of the hypothesis. Zhan et al. (293) sought to determine the relative contribution of immigration to the genetic structure of Mycosphaerella graminicola populations during the course of an epidemic. They used neutral DNA markers to compare the population structure in wheat plots inoculated with known isolates to that in naturally infected control plots. 
Field plots were arranged in a randomized complete block design with four replications, and comparisons of allelic frequencies were based on a contingency χ² test. Significant differences were observed in allelic frequencies in populations from control and inoculated plots, suggesting that immigration was low. These differences were evaluated for numerous different alleles and pairs of alleles for both mid-season and late-season populations but for only one field experiment. Heuer and Smalla (109) compared the diversity of bacteria on leaves of common potato cultivars to that on leaves of genetically modified potato plants expressing the bacteriophage T4 lysozyme. Comparisons were made in a greenhouse experiment as well as in a field trial. Other examples of multiple tests of hypotheses under the themes Effect, Dynamics, and Source are reported by Bever et al. (19), Burdon and Jarosz (32), Chen et al. (39), Garland (75), Handley et al. (95), Hartmann et al. (99), Maloney et al. (171), Marilley et al. (173), Paffetti et al. (205), Safir et al. (227), Sanders (231), and Strain et al. (252).

One of the central purposes of experimental design is to ensure that the measures reported in a publication are repeatable and not due simply to random error. This notion is clearly evoked in the Instructions to Authors of the major journals publishing studies of microbial biodiversity. However, our analysis of the trends in experimental design for publications in the themes Effect, Dynamics, and Source suggests that few publications (about 5%) report repeated measures of the phenomena that are central to the principal objectives of the studies. In other words, publications reporting multiple independent tests of hypotheses were rare. Perhaps verifications of the hypotheses addressed in these publications are, or will be, the subject of subsequent publications. Alternatively, the investment in time and labor for multiple tests of hypotheses may have been prohibitive for some studies. For certain studies, such as those addressing changes in population structure or composition over decades (196) or across a wide range of geographic regions (189), multiple independent tests are virtually impossible or very impractical. However, for other types of studies, we could not identify what led to the publication of results of only a single experiment, a single sampling campaign, or a single set of strains.

The lack of repeated tests of hypotheses and the infrequent use of statistical tests led to some ambiguity in the conclusions presented in many of the publications in the themes Effect, Dynamics, and Source. It is important to note that the conclusions stated in most of the publications analyzed here were cautiously elaborated with regard to interpretation of the results. Hence, we encountered relatively few publications that presented unambiguous conclusions about the effect of specific factors, time, or space on microbial biodiversity. The least ambiguous conclusions were presented in publications reporting sampling strategies and the use of statistical tests. The Discussion sections of many of the publications in these three themes focused on the utility of the techniques employed for microbial characterization rather than on the major ecological theme of the study.

Above, we speculated about why multiple independent tests of hypotheses are not commonly reported in the microbial biodiversity literature. We also wonder why statistical tests are infrequently employed in publications relevant to the themes Effect, Dynamics, and Source. We encountered numerous publications with clearly defined sampling strategies and experimental design for which statistics were not reported but for which they would have been appropriate to test the hypotheses evoked. The reporting of sampling strategies coupled to the use of statistical tests of hypotheses was clearly more typical of studies of certain habitats than of others. In particular, publications concerning soil, the rhizosphere, and fungi in general most frequently reported these aspects of experimental design and hypothesis testing. This tendency has led us to wonder if the scientific literature concerning the microbial biodiversity of a given habitat has a broad impact, in particular on studies of apparently unrelated habitats.

What are the possible barriers to a broad impact of the microbial biodiversity literature among different habitats? One barrier may result from the techniques typically used for isolating and characterizing the microorganisms of a given habitat. These techniques may sharply orient experimental design and may make it difficult to extrapolate experimental designs from one system to another. In particular, for habitats where microorganisms are readily culturable and identifiable, it may be relatively easy to design experiments leading to data sets compatible with the calculation of indices, measures of variability, and statistics. It is interesting that none of the studies analyzed for the themes Effect, Dynamics, and Source concerning aquatic systems employed statistics; most of these studies were based on the characterization of the DNA representing nonculturable organisms. Characterization of microorganisms based on DNA extraction from environmental samples followed by PCR amplification, cloning, and sequencing, as was employed for most of the studies in the themes Effect, Dynamics, and Source for aquatic systems, is very demanding of time and labor. Furthermore, strategies to randomly sample the generated clones so as to represent the relative proportions of organisms in the environment are not obvious. However, this type of characterization has been of great interest because it can lead to identification at the species and strain levels. Recently developed approaches to discriminating isolated nucleotide sequences, such as single-strand conformation polymorphism, terminal RFLP, denaturing gradient gel electrophoresis, and related techniques, may lead to better compatibility of characterization of DNA extracted from the environment with experimental designs leading to statistical tests, as illustrated by the work of van Hannen et al. (274).

Alternatively, other approaches to characterizing noncultured microorganisms have proven to be compatible with statistical analyses. For example, communities of microorganisms have been characterized by direct analysis of the fatty acids in phospholipids or in lipopolysaccharides extracted from environmental samples. The ease of this technique and the obvious relationship between the fatty acid composition in the sample to that in the environment have led to statistical analyses of the resulting fatty acid profiles (37, 69, 117, 211, 242, 292). Nevertheless, this technique is less discriminating and therefore less interesting because the resulting profiles correspond to broad taxonomic groups such as actinomycetes, gram-positive or gram-negative bacteria, or anaerobic bacteria. Nonculturable mycorrhizal fungi, as well as fungi that are obligate plant parasites, have also been the object of studies exploiting statistical analyses (33, 227). Quantitative measures of the biodiversity of these fungi in particular might be readily accessible because they can be identified by morphological characteristics even though they are not culturable. Nevertheless, statistical tests are, in general, infrequently employed even for publications concerning habitats where microorganisms are readily culturable and identifiable. Hence, there is a need for considerable reflection about the barriers to statistical analyses and multiple independent tests of hypotheses in publications concerning microbial biodiversity.

Snapshotting the Composition and Structure of Microbial Populations and Communities

Studies in the theme Describe focused on either describing the composition of microbial populations or evaluating the structure of these populations independent of the effects of time and space. In our analysis, we made a clear distinction between the notions of composition and structure. We defined structural studies as those reporting quantitative measures of the relative abundance of the different groups of organisms characterized and compositional studies as those that either enumerated or identified the different groups detectable in a population or community. Although numerous publications employed "structure" in the title or key words, they were not considered to be studies of structure unless quantitative measures of relative abundances were presented. In the absence of such quantitative measures, publications were considered to represent compositional studies for this analysis. For studies of population structure, our analysis consisted of determining if a sampling strategy was described in the publication and if there was an estimation of the variance associated with the structure of the described population. For studies of population composition in the theme Describe, we noted if the authors attempted to justify or evaluate how well the sample represented the population being studied (Fig. 2).

Studies of microbial populations and communities under the theme Describe have focused primarily on quantification of population structure (Table 8). These studies have quantified population structure by measuring parameters such as (i) the relative abundance of ribotypes or of strains resistant to various antibiotics in rhizosphere bacterial populations to which strains of Pseudomonas fluorescens active as biocontrol agents were introduced (193), (ii) the relative abundance of DNA fingerprint groups of Aspergillus parasiticus from a corn field (177), and (iii) the frequency of alleles within the ribosomal DNA intergenic spacer of Hebeloma cylindrosporum from populations of this haploid ectomycorrhizal basidiomycete (90). Hence, the representativeness of a sample relative to the overall population is a central issue for these studies. Nevertheless, only two-thirds of the 118 studies of population structure that we analyzed reported sampling strategies (Table 8).


TABLE 8. Characteristics of publications in the theme Describe concerning the evaluation of the composition or structure of a microbial population or community independent of time and location

 
For studies describing the composition of microbial populations, only 16% reported information concerning how the sample was taken or in what way it represented the population (Table 8). Among the publications presenting information concerning the representativeness of the sample, a few included rarefaction curves. Rarefaction curves represent the rate at which new groups (species, genotypes, etc.) are detected as the sample size increases. The asymptote of the curve is considered to be an estimate of the number of different groups in the total population (230, 265). The use of rarefaction curves is standard practice in studies of the biodiversity of macroscopic organisms, and their potential use in microbial diversity studies has recently been well illustrated (114). Among the publications analyzed here, that of Goodman and Trofymow (83) used rarefaction curves to estimate the rate at which new mycorrhizal types were found and also to estimate total richness. Ingleby et al. (119) used similar curves to compare mycorrhizal diversity (i.e., the cumulative number of different mycorrhizal types) under different plant canopy types. Other examples include the studies by Hantula et al. (96), Ravenschlag et al. (215), and Sakano and Kerkhof (228) for aquatic systems, Clawson and Benson (42), Dunbar et al. (56), and Handley et al. (95) for rhizosphere systems, and Torsvik et al. (266) for soil systems.

The notion of representativeness has a twofold implication for experimental design. First, it is clear that experimental design must take into account the extent to which laboratory tests represent microbial phenotypes and genotypes. This has been of overwhelming interest to microbiologists, as illustrated by the abundance of publications that evaluate assays for phenotypic and genotypic characterizations. Furthermore, the principal types of replication reported in studies of microbial biodiversity pertain to the confirmation of results of such assays. The second implication of representativeness is the extent to which the sample of organisms or their DNA represents a population or a community. For both of these aspects of representativeness, one is compelled to ask if we are studying artifacts and in what way our estimate of the properties of the population or the community is biased. However, it should be kept in mind that the representativeness of a sample vis-à-vis a population is the foundation on which the representativeness of characterization assays rests.

What does "representativeness of a sample" mean? Clearly, for practical and technical reasons, it cannot mean that every taxon in the community is represented in the sample. The growing interest in identifying new taxa and the development of techniques to account for nonculturable microorganisms—major objectives of the Diversitas program and of recent research programs (223)—may be helpful in broadening the scope of biodiversity studies, but they are not intended to address representativeness per se. "Representativeness of a sample" simply means that we can define what the sample represents. The aim of a sampling strategy is to render this definition compatible with the objective of the study. For studies where, for example, the objective is to quantify population structure, the sampling strategy employed should lead to samples in which the proportion of each type or group detected is equivalent to that in the population, within the limits of detection corresponding to the sample size.
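As a rough illustration of what a detection limit tied to sample size implies, assume (an idealization that is not part of the analysis above) that the n organisms in a sample are drawn independently and at random from a large population in which a given group has relative abundance p. The probability of detecting that group at least once, and the sample size needed to reach a chosen detection probability α, would then be:

```latex
P(\text{detected}) = 1 - (1 - p)^{n},
\qquad
n \ge \frac{\ln(1 - \alpha)}{\ln(1 - p)}
\quad \text{for } P(\text{detected}) \ge \alpha .
```

Under this idealization, a group making up 1% of the population (p = 0.01) requires n ≥ 299 sampled organisms for a 95% chance of detection, whereas a group at 10% requires n ≥ 29. Sampling biases of the kind discussed below would shift these numbers.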

Of course, this is a proposition for an ideal world with no sampling biases. Sampling errors and biases are inevitable, and they limit the extent to which approaches such as rarefaction can be used to investigate the representativeness of a sample. Kinkel et al. (139) clearly illustrated how sampling of microbial systems misrepresents the underlying species frequency distributions in the population. Their analysis was based on random sampling of simulated microbial populations having different species abundance distributions (lognormal and negative binomial distributions with different degrees of skewness) but each having 100 species. Random sampling revealed that, depending on sample size, communities with distinctly different species abundance distributions could not be differentiated. Likewise, extrapolations made from identical populations could lead to the conclusion that the populations had distinctly different species abundances. Real-world sampling combines the basic probabilistic errors described by Kinkel et al. with the biological biases of microbiological and molecular biological sampling techniques. Rarefaction and simulated estimates of variance of diversity measures are constrained by the information contained within the sample and do not account for sampling biases unless they have been measured experimentally. Hence, one can understand the importance of replicated sampling and of multiple independent experiments for measures of population structure. Furthermore, it is evident that the results of studies describing the composition of microbial populations are also affected by the way in which the sampling strategy conditions the representativeness of a sample.
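The purely probabilistic part of this problem can be explored with a small simulation. The sketch below is not the analysis of Kinkel et al. (139); it is a minimal illustration that assumes lognormal relative abundances for 100 species, with hypothetical skewness parameters, sample sizes, and function names chosen only for the example. It reports the range of observed richness across replicated random samples.

```python
import random

def simulate_community(n_species=100, sigma=1.0, seed=0):
    """Relative abundances drawn from a lognormal distribution (assumed parameters)."""
    rng = random.Random(seed)
    weights = [rng.lognormvariate(0.0, sigma) for _ in range(n_species)]
    total = sum(weights)
    return [w / total for w in weights]

def observed_richness(abundances, sample_size, rng):
    """Number of distinct species seen in one random sample of individuals."""
    drawn = rng.choices(range(len(abundances)), weights=abundances, k=sample_size)
    return len(set(drawn))

def richness_range(abundances, sample_size, n_replicates=10, seed=1):
    """Smallest and largest observed richness over replicated samples."""
    rng = random.Random(seed)
    counts = [observed_richness(abundances, sample_size, rng)
              for _ in range(n_replicates)]
    return min(counts), max(counts)

# Two hypothetical communities of 100 species with mild and strong skew
mild = simulate_community(sigma=0.5, seed=10)
strong = simulate_community(sigma=2.0, seed=11)
for size in (20, 100, 500):
    print(size, richness_range(mild, size), richness_range(strong, size))
```

Comparing the ranges for the two communities at each sample size gives a sense of how much replicate samples from one population can differ, and of whether differently structured populations can be told apart at that sample size. The simulation covers random sampling error only, not the biological biases of isolation or molecular methods.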

Discerning Markers for Diagnosis and Identification

The notion of representativeness is also particularly pertinent to the search for markers. Many of these studies have relied heavily on characterization of strains from culture collections in order to span the geographical and temporal variability of the microorganisms of interest. Hence, we did not consider sampling strategy per se to be a fundamental issue in the overall approach taken for publications in the theme Markers. Rather, for publications in this theme, we noted if the authors explained the basis for choosing the strains or sample from which they identified markers or if there was any discussion about what the strains or samples represented. Furthermore, we also noted the use of strains outside of the taxonomic group for which the markers were identified and the use of independent samples (or collections) to validate the accuracy of the markers (Fig. 2).

The search for markers was dominated by studies of plant-pathogenic and phyllosphere bacteria and of mycorrhizal fungi, although there were examples of such studies for nearly all habitats. For bacteria, 88% of the studies published under this theme sought markers of taxonomic groups, whereas the studies of mycorrhizae were equally divided between the search for markers of taxonomic groups and the search for markers of functional groups. The experimental design and sampling strategies associated with these studies clearly influenced how we interpreted the results. In particular, for nearly all of the studies under the theme Markers, it is difficult to evaluate the potential utility of the markers in terms of the fraction of the naturally occurring population they could detect or identify. This difficulty arises from the fact that none of the 69 studies concerning markers presented information describing how the sample represented the inherent variability of the population and only 2 of these studies tested the validity of the reported markers against a collection or sample of strains independent of that used to develop the markers (158, 251).

To illustrate the impact of representativeness for studies in the theme Markers, we calculated rarefaction curves based on data presented in two studies in which markers of bacterial identity were developed (Fig. 6). In presenting these curves, we keep in mind the limits of rarefaction cited above. These curves suggest that the 19 different RFLP groups described for a collection of 137 strains of Erwinia carotovora (104) represent a value much closer to the asymptotic value than the 28 groups described for the 62 strains of Ralstonia (formerly Pseudomonas) solanacearum (47) (Fig. 6). Hence, in the application of these markers to the identification of strains of these bacteria, there would be a high probability that newly collected strains of R. solanacearum would fall into RFLP groups not described by Cook et al. (47) whereas newly collected strains of E. carotovora would be much more likely to be represented by one of the RFLP groups described by Hélias et al. (104). Such a comparison is based on the assumption that both sets of strains are equally representative of their respective populations. Nevertheless, such estimates could be useful in orienting studies toward the number of strains necessary for characterizing markers with the desired level of accuracy. Ultimately, the validity of such estimates can be confirmed by testing markers against additional samples or collections of strains.



FIG. 6. Rarefaction curves of the number of different RFLP profiles detected within collections of increasing size for strains of E. carotovora (wide gray line) and R. solanacearum (narrow black line) based on data presented by Hélias et al. (104) and Cook et al. (47), respectively. Rarefaction curves were calculated by sequentially selecting each of the 137 strains presented by Hélias et al. (104) or the 62 strains presented by Cook et al. (47) in a random order and determining the number of different RFLP profiles obtained as the number of strains increased. The curves represent the mean of five independent randomizations for each data set.
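The randomization procedure described in the legend can be sketched in a few lines. The example below is a minimal illustration rather than the code used to produce Fig. 6: the function name, the random seed, and the small collection of strains are hypothetical and do not reproduce the data of Hélias et al. (104) or Cook et al. (47).

```python
import random

def rarefaction_curve(profiles, n_randomizations=5, seed=0):
    """Mean number of distinct profiles seen after drawing 1..N strains
    in random order, averaged over independent randomizations."""
    rng = random.Random(seed)
    totals = [0] * len(profiles)
    for _ in range(n_randomizations):
        order = list(profiles)
        rng.shuffle(order)              # one random ordering of the strains
        seen = set()
        for i, profile in enumerate(order):
            seen.add(profile)
            totals[i] += len(seen)      # cumulative count of distinct profiles
    return [t / n_randomizations for t in totals]

# Hypothetical collection: 12 strains assigned to 4 RFLP profiles
strains = ["A"] * 6 + ["B"] * 3 + ["C"] * 2 + ["D"]
curve = rarefaction_curve(strains)
print(curve)
```

The first point of the curve is always 1 and the last point equals the number of distinct profiles in the collection. How quickly the curve flattens between those two points is what the comparison in Fig. 6 relies on.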

 
Defining Relatedness among Microorganisms

Many of the publications concerning microbial phylogeny and taxonomy (theme Relatedness), similar to those concerning the theme Markers, were based on the use of strains from culture collections. Nevertheless, there were also numerous examples of studies in this theme involving the isolation of strains via culture methods and of DNA representing nonculturable microorganisms. Hence, in the analysis of publications under the theme Relatedness, we looked for information concerning sampling strategies or other descriptions of the representativeness of the organisms characterized. Furthermore, because of the nature of the numerical analyses employed in these studies, we also noted the number of strains used to represent each operational taxonomic unit