Chemical and Biochemical Engineering, Thayer School of Engineering and Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire 03755,1 USDA Agricultural Research Service, U.S. Dairy Forage Research Center and Department of Bacteriology, Madison, Wisconsin, 53706,2 Department of Microbiology,3 Institute for Wine Biotechnology, University of Stellenbosch, Stellenbosch 7600, South Africa4
SUMMARY
INTRODUCTION
FUNDAMENTALS
  Structure and Composition of Cellulosic Biomass
  Taxonomic Diversity
  Cellulase Enzyme Systems
    Noncomplexed cellulase systems.
    Complexed cellulase systems.
    Glycoside hydrolase families.
  Molecular Biology of Cellulase Enzymes
    Regulation of cellulase production.
    Organization of cellulase genes.
    Gene duplication and horizontal gene transfer.
  Physiology of Cellulolytic Microorganisms
    Substrate preference.
    Adhesion and formation of cellulose-enzyme-microbe complexes.
    Uptake and phosphorylation of cellulose hydrolysis products.
    Fermentative catabolism and end products.
  Ecological Aspects of Cellulose-Degrading Communities
  Rate-Limiting Factors in Nature
METHODOLOGICAL BASIS FOR STUDY
  Quantification of Cells and Enzymes in the Presence of Solids
  Continuous Culture and Substrate Delivery
QUANTITATIVE DESCRIPTION OF CELLULOSE HYDROLYSIS
  Adsorption
  Rates of Enzymatic Hydrolysis
  Bioenergetics of Microbial Cellulose Utilization
  Kinetics of Microbial Cellulose Utilization
  Contrast to Soluble Substrates
PROCESSING OF CELLULOSIC BIOMASS: A BIOLOGICAL PERSPECTIVE
  Pretreated Substrates
  Process Configurations
ORGANISM DEVELOPMENT FOR CONSOLIDATED BIOPROCESSING
  Strategies
  Native Cellulolytic Strategy
    Metabolic engineering.
    Growth inhibition by ethanol and other factors.
    Genetic system development.
  Recombinant Cellulolytic Strategy
    Heterologous cellulase expression in bacteria.
      (i) Zymomonas mobilis.
      (ii) Enteric bacteria.
    Heterologous cellulase expression in yeast.
      (i) Endogenous saccharolytic enzymes of S. cerevisiae.
      (ii) Expression of heterologous cellulase genes in S. cerevisiae.
      (iii) Growth on nonnative substrates by virtue of heterologous expression of saccharolytic enzymes.
CONCLUDING DISCUSSION
  Fundamentals
  Biotechnology
  Alternative Cellulose Hydrolysis Paradigms
ACKNOWLEDGMENTS
REFERENCES
SUMMARY
INTRODUCTION
Plant biomass is the only foreseeable sustainable source of fuels and materials available to humanity (410). Cellulosic materials are particularly attractive in this context because of their relatively low cost and plentiful supply. The central technological impediment to more widespread utilization of this important resource is the general absence of low-cost technology for overcoming the recalcitrance of cellulosic biomass. A promising strategy to overcome this impediment involves the production of cellulolytic enzymes, hydrolysis of biomass, and fermentation of resulting sugars to desired products in a single process step via a cellulolytic microorganism or consortium. Such "consolidated bioprocessing" (CBP) offers very large cost reductions if microorganisms can be developed that possess the required combination of substrate utilization and product formation properties (405).
Notwithstanding its importance in various contexts, fundamental understanding of microbial cellulose utilization is in many respects rudimentary. This is a result of the inherent complexity of microbial cellulose utilization as well as methodological challenges associated with its study. Understanding of cellulose hydrolysis can be approached at several levels of aggregation: components of cellulase enzyme systems, unfractionated cellulase systems, pure cultures of cellulolytic microorganisms, and mixed cultures of cellulolytic microorganisms. In general, our understanding is progressively less complete at more highly aggregated levels of study. Thus, although much remains to be elucidated at the level of enzyme components and the underlying genetics of such components, understanding of cellulose hydrolysis by unfractionated cellulase systems is still less complete, understanding of hydrolysis by pure cultures is more limited yet, and hydrolysis in multispecies cultures and mixed communities is least understood of all. There is a natural tendency for science to proceed over time toward a finer level of aggregation (e.g., from pathways to enzymes to genes), and this "reductionist" approach has yielded tremendous insights with respect to the life sciences generally and cellulose hydrolysis in particular. An alternative "integrative" approach, involving the development of an understanding of aggregated systems based on an understanding of their less aggregated components, is also a valid and important focus for scientific endeavor. With respect to cellulose hydrolysis, such integration is essential for research advances to be translated into advances in technological, ecological, and agricultural domains.
The great majority of cellulose hydrolysis research to date has focused on the genetics, structure, function, and interaction of components of cellulase enzyme systems. Several recent and comprehensive reviews address this large body of work (see "Cellulase enzyme systems" below). Whereas hydrolysis of cellulosic biomass has been approached in prior reviews and the research literature primarily as an enzymatic phenomenon, this review approaches the subject primarily as a microbial phenomenon. Thus, we intend our review to embody the integrative approach described in the previous paragraph.
The goals of this review are to collect and synthesize information from the literature on microbial cellulose utilization in both natural and technological contexts, to point out key unresolved issues, and to suggest approaches by which such issues can be addressed. In seeking to consider microbial cellulose utilization from an integrative perspective, we endeavor to consider a diversity of cellulolytic organisms and enzyme systems. This effort is, however, constrained by the information available, which is much more extensive for some types of systems and some levels of consideration than for others. Both aerobic and anaerobic organisms and enzymes are considered in our discussion of fundamentals (see "Fundamentals" below) and methodological aspects (see "Methodological basis for study" below). Our treatment of quantitative aspects of microbial cellulose utilization (see "Quantitative description of cellulose hydrolysis" below) of necessity focuses primarily on aerobic organisms and their enzymes. Information on anaerobic organisms and their enzymes is included in this section as possible, but is much more limited. In considering processing of cellulosic biomass (see "Processing of cellulosic biomass: a biological perspective" below) and organism development for consolidated bioprocessing (see "Organism development for consolidated bioprocessing" below), we focus on organisms producing reduced metabolic products via an effectively anaerobic metabolism because this is responsive to the needs, constraints, and opportunities associated with microbial conversion of cellulosic feedstocks (see "Processing of cellulosic biomass: a biological perspective" below).
Literature pertaining to noncellulolytic organisms is included in cases where it provides important foundational understanding for topics involving cellulolytic organisms, as in the case of metabolic engineering of end product formation in cellulolytic anaerobes and expression of heterologous saccharolytic enzymes in noncellulolytic hosts (see "Organism development for consolidated bioprocessing" below). We conclude with a discussion of the genesis, status, and future direction of the microbial cellulose utilization field from both fundamental and biotechnological perspectives.
FUNDAMENTALS
An important feature of cellulose, relatively unusual in the polysaccharide world, is its crystalline structure. Cellulose is synthesized in nature as individual molecules (linear chains of glucosyl residues) which undergo self-assembly at the site of biosynthesis (86). There is evidence that associated hemicelluloses regulate this aggregation process (19). Approximately 30 individual cellulose molecules are assembled into larger units known as elementary fibrils (protofibrils), which are packed into larger units called microfibrils, and these are in turn assembled into the familiar cellulose fibers.
The arrangement of individual chains within the elementary fibrils has largely been inferred from the fitting of X-ray diffraction data to statistical models that calculate structure based on minimum conformational energy. Individual models are a source of considerable controversy, even in terms of such fundamentals as the orientation of adjacent chains (parallel up versus parallel down) (354, 355, 510). Regardless of their orientation, the chains are stiffened by both intrachain and interchain hydrogen bonds. Adjacent sheets overlie one another and are held together (in cellulose I, the most abundant form of cellulose in nature) by weak intersheet van der Waals forces; despite the weakness of these interactions, their total effect over the many residues in the elementary fibril is considerable (538). The crystalline nature of cellulose implies a structural order in which all of the atoms are fixed in discrete positions with respect to one another. An important feature of the crystalline array is that the component molecules of individual microfibrils are packed sufficiently tightly to prevent penetration not only by enzymes but even by small molecules such as water.
Although cellulose forms a distinct crystalline structure, cellulose fibers in nature are not purely crystalline. The degree of departure from crystallinity is variable and has led to the notion of a "lateral order distribution" of crystallinity, which portrays a population of cellulose fibers in statistical terms as a continuum from purely crystalline to purely amorphous, with all degrees of order in between (427). In addition to the crystalline and amorphous regions, cellulose fibers contain various types of irregularities, such as kinks or twists of the microfibrils, or voids such as surface micropores, large pits, and capillaries (63, 127, 178, 428). The total surface area of a cellulose fiber is thus much greater than the surface area of an ideally smooth fiber of the same dimension. The net effect of structural heterogeneity within the fiber is that the fibers are at least partially hydrated by water when immersed in aqueous media, and some micropores and capillaries are sufficiently spacious to permit penetration by relatively large molecules, including, in some cases, cellulolytic enzymes (647, 648).
Purified celluloses used for studies of hydrolysis and microbial utilization vary considerably in fine structural features, and the choice of substrate for such studies undoubtedly affects the results obtained. Holocelluloses such as Solka Floc are produced by delignification of wood or other biomass materials. These materials contain substantial amounts of various hemicelluloses and often have a low bulk density suggestive of some swelling of cellulose fibers. Microcrystalline celluloses (e.g., Avicel and Sigmacell) are nearly pure cellulose, and the dilute-acid treatment used in their preparation removes both hemicelluloses and the more extensive amorphous regions of the cellulose fibers. Commercial microcrystalline celluloses differ primarily in particle size distribution, which (as indicated below) has significant implications for the rate of hydrolysis and utilization. Cellulose synthesized by the aerobic bacterium Acetobacter xylinum has been tremendously useful as a model system for studying cellulose biosynthesis, but has been used for only a few studies of microbial cellulose utilization. Like plant cellulose, bacterial cellulose is highly crystalline, but the two celluloses differ in the arrangement of glucosyl units within the unit cells of the crystallites (20), and genetic evidence suggests that the two celluloses are synthesized by enzymatic machinery that differs considerably at the molecular level (86). The two celluloses also differ substantially in rate of hydrolysis by fungal cellulases (246) and in rate of utilization by mixed ruminal bacteria (602, 731). The variable structural complexity of pure cellulose and the difficulty of working with insoluble substrates have led to the wide use of the highly soluble cellulose ether, carboxymethylcellulose (CMC), as a substrate for studies of endoglucanase production.
Unfortunately, the use of CMC as an enzymatic substrate has weakened the meaning of the term "cellulolytic," since many organisms that cannot degrade cellulose can hydrolyze CMC via mixed β-glucan enzymes (185). Because of the substituted nature of the hydrolytic products, relatively few microbes (including some fungi and Cellulomonas strains) can use CMC as a growth substrate.
Utilization of cellulosic biomass is more complex than is that of pure cellulose, not only because of the former's complex composition (i.e., presence of hemicelluloses and lignin) but also because of the diverse architecture of plant cells themselves. Plant tissues differ tremendously with respect to size and organization. Some plant cell types (e.g., mesophyll) have thin, poorly lignified walls that are easily degraded by polysaccharide-hydrolyzing enzymes. Others, like sclerenchyma, have thick cell walls and a highly lignified middle lamella separating cells from one another. These cell walls must be attacked from the inside (luminal) surface out through the secondary wall (as opposed to particles of pure cellulose, which are degraded from the outside inward). Thus, in addition to constraints imposed by the structure of cellulose itself, additional limitations are imposed by diffusion and transport of the cellulolytic agent to the site of attack. These constraints may severely limit utilization in some habitats (750).
The broad distribution of cellulolytic capability could suggest conservation of a cellulose-degrading capability acquired by a primordial ancestor early in evolutionary development; however, this would seem unlikely, given that the capacity for cellulose biosynthesis did not evolve until much later, with the development of algae, land plants and the bacterium A. xylinum. More likely is the convergent evolution toward a cellulolytic capability under the selective pressure of abundant cellulose availability following the development of cellulose biosynthesis. Evidence for such convergent evolution is discussed below (see "Molecular biology of cellulase enzymes").
Fungi are well-known agents of decomposition of organic matter in general and of cellulosic substrates in particular (94, 462). Fungal taxonomy is based largely on the morphology of mycelia and reproductive structures during various stages of the fungal life cycle rather than on substrate utilization capability. Indeed, systematic characterization of growth substrates has not been carried out for many described fungal species. Therefore, it is currently unclear how broadly and deeply cellulolytic capability extends through the fungal world, and a consideration of the taxonomy of cellulolytic fungi may ultimately prove to be only a slightly narrower topic than consideration of fungal taxonomy in its entirety. Nevertheless, some generalizations can be made regarding the distribution of cellulolytic capabilities among these organisms.
A number of species of the most primitive group of fungi, the anaerobic Chytridomycetes, are well known for their ability to degrade cellulose in gastrointestinal tracts of ruminant animals. Although taxonomy of this group remains controversial (94), members of the order Neocallimastigales have been classified based on the morphology of their motile zoospores and vegetative thalli; they include the monocentric genera Neocallimastix, Piromyces, and Caecomyces and the polycentric genera Orpinomyces and Anaeromyces (376). Cellulolytic capability is also well represented among the remaining subdivisions of aerobic fungi. Within the approximately 700 species of Zygomycetes, only certain members of the genus Mucor have been shown to possess significant cellulolytic activity, although members of this genus are better known for their ability to utilize soluble substrates. By contrast, the much more diverse subdivisions Ascomycetes, Basidiomycetes, and Deuteromycetes (each of which numbers over 15,000 species [94]) contain large numbers of cellulolytic species. Members of genera that have received considerable study with respect to their cellulolytic enzymes and/or wood-degrading capability include Bulgaria, Chaetomium, and Helotium (Ascomycetes); Coriolus, Phanerochaete, Poria, Schizophyllum, and Serpula (Basidiomycetes); and Aspergillus, Cladosporium, Fusarium, Geotrichum, Myrothecium, Paecilomyces, Penicillium, and Trichoderma (Deuteromycetes). For a more detailed consideration of fungal taxonomy and some of its unresolved issues, see reference 94.
When viewed through the lens of microbial physiology, the cellulolytic bacteria comprise several diverse physiological groups (Table 1): (i) fermentative anaerobes, typically gram positive (Clostridium, Ruminococcus, and Caldicellulosiruptor) but containing a few gram-negative species, most of which are phylogenetically related to the Clostridium assemblage (Butyrivibrio and Acetivibrio) but some of which are not (Fibrobacter); (ii) aerobic gram-positive bacteria (Cellulomonas and Thermobifida); and (iii) aerobic gliding bacteria (Cytophaga and Sporocytophaga). Generally, only a few species within each of the above-named genera are actively cellulolytic. The distribution of cellulolytic capability among organisms differing in oxygen relationship, temperature, and salt tolerance is a testament to the wide availability of cellulose across natural habitats. Complicating the taxonomic picture is the recent genomic evidence that the noncellulolytic solventogenic Clostridium acetobutylicum contains a complete cellulosomal gene cluster system that is not expressed, due in part to disabled promoter sequences (606). Examination of the rapidly expanding genomics database may reveal similar surprises in the future.
Aerobic cellulose degraders, both bacterial and fungal, utilize cellulose through the production of substantial amounts of extracellular cellulase enzymes that are freely recoverable from culture supernatants (554, 606), although enzymes are occasionally present in complexes at the cell surface (67, 715). The individual enzymes often display strong synergy in the hydrolysis of cellulose. While many aerobic bacteria adhere to cellulose, physical contact between cells and cellulose does not appear to be necessary for cellulose hydrolysis. Kauri and Kushner (322) have shown that separating Cytophaga cells from cellulose via an agar layer or membrane filters appears to enhance cellulose utilization; they suggest that this separation may dilute hydrolytic products, thus relieving catabolite repression of enzyme synthesis. Aerobic cellulolytic bacteria and fungi produce high cell yields characteristic of aerobic respiratory growth, and this has led to considerable technological interest in producing microbial cell protein from waste cellulosic biomass (175, 567, 594, 623). In addition, many studies of aerobic cellulolytic microbes have focused on improving the yield and characteristics of cellulase enzymes. The physiology of the organisms themselves has received surprisingly little study, apart from studies on the effect of growth conditions on enzyme secretion (see, e.g., reference 236).
An interesting point suggested from Table 1 is that cellulose utilization generally proceeds via organisms that are either aerobic or anaerobic, but not both. Indeed, despite the wide distribution of facultatively anaerobic bacteria in general, members of the genus Cellulomonas are the sole reported facultatively anaerobic cellulose degraders (25, 26, 113, 150). Whether the general paucity of facultatively anaerobic groups is a consequence of a physiological or ecological incompatibility of two fundamentally different strategies for cellulose utilization employed by the two groups remains an interesting open question.
It is also notable that most aerobic cellulolytic bacterial species common in soil are classified within genera well known for secondary (non-growth-associated) metabolism, including the formation of distinct resting states (Bacillus, Micromonospora, and Thermobifida) and/or production of antibiotics (Bacillus and Micromonospora) and other secondary metabolites. While antibiotic production in cellulolytic species has not been systematically investigated, production of such compounds might provide additional selective fitness to compensate for their rather modest maximum growth rate on cellulose. An ability to form resting states relatively resistant to starvation or other environmental insult also provides a selective advantage in nature.
For microorganisms to hydrolyze and metabolize insoluble cellulose, extracellular cellulases must be produced that are either free or cell associated. The biochemical analysis of cellulase systems from aerobic and anaerobic bacteria and fungi has been comprehensively reviewed during the past two decades. Components of cellulase systems were first classified based on their mode of catalytic action and have more recently been classified based on structural properties (260). Three major types of enzymatic activities are found: (i) endoglucanases or 1,4-β-D-glucan-4-glucanohydrolases (EC 3.2.1.4), (ii) exoglucanases, including 1,4-β-D-glucan glucanohydrolases (also known as cellodextrinases) (EC 3.2.1.74) and 1,4-β-D-glucan cellobiohydrolases (cellobiohydrolases) (EC 3.2.1.91), and (iii) β-glucosidases or β-glucoside glucohydrolases (EC 3.2.1.21). Endoglucanases cut at random at internal amorphous sites in the cellulose polysaccharide chain, generating oligosaccharides of various lengths and consequently new chain ends. Exoglucanases act in a processive manner on the reducing or nonreducing ends of cellulose polysaccharide chains, liberating either glucose (glucanohydrolases) or cellobiose (cellobiohydrolase) as major products. Exoglucanases can also act on microcrystalline cellulose, presumably peeling cellulose chains from the microcrystalline structure (672). β-Glucosidases hydrolyze soluble cellodextrins and cellobiose to glucose (Fig. 1). Cellulases are distinguished from other glycoside hydrolases by their ability to hydrolyze β-1,4-glucosidic bonds between glucosyl residues. The enzymatic breakage of the β-1,4-glucosidic bonds in cellulose proceeds through an acid hydrolysis mechanism, using a proton donor and nucleophile or base. Hydrolysis can result in either inversion or retention (double replacement mechanism) of the anomeric configuration of carbon-1 at the reducing end (58, 751).
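The division of labor among the three activities described above can be illustrated with a toy bookkeeping sketch. It is not a kinetic model and is not drawn from the cited work: chains are reduced to integer lengths in glucosyl units, and crystallinity, adsorption, product inhibition, and the reducing versus nonreducing end preference are all ignored. The function names (`endo_cut`, `exo_step`, `beta_glucosidase`, `hydrolyze`) and the parameter `endo_cuts_per_round` are hypothetical and introduced only for illustration.

```python
import random


def endo_cut(chains, rng):
    """Endoglucanase-like step: break one random internal bond,
    turning one chain into two and so creating new chain ends."""
    cuttable = [i for i, n in enumerate(chains) if n >= 2]
    if not cuttable:
        return chains
    i = rng.choice(cuttable)
    n = chains[i]
    k = rng.randint(1, n - 1)  # bond position; both fragments are non-empty
    return chains[:i] + [k, n - k] + chains[i + 1:]


def exo_step(chains):
    """Cellobiohydrolase-like step: clip one cellobiose (2 units)
    from one end of every chain.

    Returns (remaining_chains, cellobiose_released, glucose_left_over).
    """
    remaining, cellobiose, glucose = [], 0, 0
    for n in chains:
        if n >= 2:
            cellobiose += 1
            n -= 2
        if n == 1:
            glucose += 1  # a lone residue is already glucose
        elif n >= 2:
            remaining.append(n)
    return remaining, cellobiose, glucose


def beta_glucosidase(cellobiose):
    """Beta-glucosidase-like step: each cellobiose yields two glucose."""
    return 2 * cellobiose


def hydrolyze(chain_lengths, endo_cuts_per_round=1, seed=0):
    """Alternate endo cuts and exo clipping until no chains remain.

    Returns (total_glucose, rounds_needed).
    """
    rng = random.Random(seed)
    chains = list(chain_lengths)
    glucose, rounds = 0, 0
    while chains:
        for _ in range(endo_cuts_per_round):
            chains = endo_cut(chains, rng)
        chains, cellobiose, leftover = exo_step(chains)
        glucose += beta_glucosidase(cellobiose) + leftover
        rounds += 1
    return glucose, rounds
```

With no endo cuts, `hydrolyze([100], endo_cuts_per_round=0)` returns `(100, 50)`: a single chain offers one site of attack per round, so 50 rounds are needed. Allowing endo cuts never increases the number of rounds in this sketch, because each cut adds chain ends on which the exo step can act in parallel, which is the intuition behind the generation of "new chain ends" noted in the text.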
Cellulase systems are not merely an agglomeration of enzymes representing the three enzyme groups (endoglucanases, exoglucanases, and β-glucosidases, with or without CBMs), but rather act in a coordinated manner to efficiently hydrolyze cellulose. Microorganisms have adopted different approaches to effectively hydrolyze cellulose, which occurs naturally in insoluble particles or embedded within hemicellulose and lignin polymers (683). Cellulolytic filamentous fungi (and actinomycete bacteria) have the ability to penetrate cellulosic substrates through hyphal extensions, thus often presenting their cellulase systems in confined cavities within cellulosic particles (176). The production of "free" cellulases, with or without CBMs, may therefore suffice for the efficient hydrolysis of cellulose under these conditions. The enzymes in these cellulase systems do not form stable high-molecular-weight complexes and therefore are called "noncomplexed" systems (Fig. 1A). By contrast, anaerobic bacteria lack the ability to effectively penetrate cellulosic material and perhaps had to find alternative mechanisms for degrading cellulose and gaining access to products of cellulose hydrolysis in the presence of competition from other microorganisms and with limited ATP available for cellulase synthesis. This could have led to the development of "complexed" cellulase systems (called "cellulosomes"), which position cellulase-producing cells at the site of hydrolysis, as observed for clostridia and ruminal bacteria (Fig. 1B). Noncomplexed cellulase systems are discussed first, highlighting the cellulase systems of the aerobic filamentous fungi Trichoderma reesei and Humicola insolens as well as aerobic actinomycetes belonging to the genera Cellulomonas and Thermobifida. The interesting multidomain cellulase systems of anaerobic hyperthermophilic bacteria are mentioned briefly.
Thereafter the complexed cellulase systems of anaerobic Clostridium species, Ruminococcus species, and anaerobic fungi are considered.
Noncomplexed cellulase systems. Cellulases from aerobic fungi have received more study than have those of any other physiological group, and fungal cellulases currently dominate the industrial applications of cellulases (235, 492, 614). In particular, the cellulase system of T. reesei (teleomorph: Hypocrea jecorina, initially called Trichoderma viride) has been the focus of research for 50 years (424, 561, 562, 563). T. reesei produces at least two exoglucanases (CBHI and CBHII), five endoglucanases (EGI, EGII, EGIII, EGIV, and EGV), and two β-glucosidases (BGLI and BGLII) (358, 494, 664). Intensive efforts over several decades to enhance cellulase yields have resulted in strains that produce up to 0.33 g of protein/g of utilizable carbohydrate (177). The necessity for the two exoglucanases (cellobiohydrolases) has been attributed to their particular preferences for the reducing (CBHI) and nonreducing (CBHII) ends of cellulose chains of microcrystalline cellulose. This notion has also been supported by the exo-exo synergy observed between these two enzymes (259, 438, 489). Crystallography has elucidated the three-dimensional structures of the two cellobiohydrolases (163, 574). CBHI contains four surface loops that give rise to a tunnel with a length of 50 Å; CBHII contains two surface loops that give rise to a tunnel of 20 Å. These tunnels proved to be essential to the cellobiohydrolases for the processive cleavage of cellulose chains from the reducing or nonreducing ends. The three-dimensional (3-D) structure of CBHI confirmed that cellobiose is the major hydrolytic product as the cellulose chain passes through the tunnel. Occasionally, cellotriose or glucose is released during initial stages of hydrolysis (163). The structure of EGI (structurally related to CBHI) also has been resolved (345) to reveal the presence of shorter loops that create a groove rather than a tunnel. The groove presumably allows entry of the cellulose chain for subsequent cleavage.
A similar groove was shown for the structure of EGIII (592), an endoglucanase that lacks a CBM.
Cellobiohydrolase activity is essential for the hydrolysis of microcrystalline cellulose. CBHI and CBHII are the principal components of the T. reesei cellulase system, representing 60 and 20%, respectively, of the total cellulase protein produced by the fungus on a mass basis (756). The important role of CBMs for both enzymes to ensure binding and processivity has been shown clearly (512). However, both cellobiohydrolases are very slow at decreasing the degree of polymerization of cellulose. Endoglucanases are thought to be primarily responsible for decreasing the degree of polymerization by internally cleaving cellulose chains at relatively amorphous regions, thereby generating new cellulose chain ends susceptible to the action of cellobiohydrolases (673). The need for five endoglucanase species in the T. reesei cellulase system has not been clearly explained, particularly considering that endoglucanases (with EGI and EGII as major species) represent less than 20% of the total cellulase protein of T. reesei. Synergism between endoglucanases and cellobiohydrolases has been shown for EGI (693), EGII (437), and EGIII (489). However, synergism between endoglucanases has not been clearly demonstrated. Part of the problem may be that natural cellulosic substrates are not used for laboratory experiments due to their heterogeneous nature, and the true functions of the different endoglucanases may not be observed on purified cellulose. It is noteworthy that some endoglucanases, such as EGI, have broad substrate specificity (e.g., xylanase activity [358]). The presence of CBMs is not essential for endoglucanase activity or for endo-exo synergism (592). Cellobiose, the major product of CBHI and CBHII activity, inhibits the activity of the cellobiohydrolases and endoglucanases (279, 437, 470).
The production of at least two β-glucosidases by T. reesei facilitates the hydrolysis of cellobiose and small oligosaccharides to glucose. Both BGLI and BGLII have been isolated from culture supernatants, but a large fraction of these enzymes remains cell wall bound (442, 690). The presence of β-glucosidases in close proximity to the fungal cell wall may limit loss of glucose to the environment following cellulose hydrolysis. T. reesei produces β-glucosidases at low levels compared to other fungi such as Aspergillus species (560). Furthermore, the β-glucosidases of T. reesei are subject to product (glucose) inhibition (102, 217, 417) whereas those of Aspergillus species are more glucose tolerant (138, 231, 724, 768). The levels of T. reesei β-glucosidase are presumably sufficient for growth on cellulose, but not sufficient for extensive in vitro saccharification of cellulose. T. reesei cellulase preparations, supplemented with Aspergillus β-glucosidase, are considered most often for cellulose saccharification on an industrial scale (560, 644).
The cellulase system of the thermophilic fungus H. insolens possesses a battery of enzymes that allows the efficient utilization of cellulose. The H. insolens cellulase system is homologous to the T. reesei system and also contains at least seven cellulases (two cellobiohydrolases [CBHI and CBHII] and five endoglucanases [EGI, EGII, EGIII, EGV, and EGVI]) (603). However, differences exist, such as the absence of a CBM in EGI of H. insolens. The enzymatic activity of the low-molecular-weight EGIII (also lacking a CBM) observed on different soluble cellulosic substrates was very low, and the natural function of this enzyme still remains unclear (603). Boisset et al. (65) studied the hydrolysis of bacterial microcrystalline cellulose (BMCC), using recombinant CBHI, CBHII, and EGV produced in Aspergillus oryzae, and elegantly showed that a mixture of the three enzymes allows efficient saccharification of crystalline cellulose. Moreover, optimal saccharification was observed when the mixture contained about 70 and 30% of total protein as CBHI and CBHII, respectively. Although the endoglucanase EGV was essential for efficient crystalline cellulose hydrolysis by either CBHI or CBHII, only 1 to 2% of the total protein was needed for maximum efficiency. The combination of all three enzymes yielded more than 50% microcrystalline cellulose hydrolysis. In comparison, the individual enzymes yielded less than 10% microcrystalline cellulose hydrolysis, whereas EGV plus CBHI, EGV plus CBHII, and CBHI plus CBHII yielded approximately 25, 14, and 33% hydrolysis from 3 g of BMCC per liter, respectively (65).
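The yields quoted above can be turned into a rough synergy estimate. The sketch below uses one common convention, the degree of synergism (mixture yield divided by the sum of the yields of the individual enzymes); this metric is an assumption introduced here and is not stated in the passage. Because the individual enzymes are reported only as "less than 10%", the sketch takes 10% as an upper limit for each, so the results are lower bounds, and the pairwise figures are approximate because the text gives them as approximate. The function name `synergism_lower_bound` is hypothetical.

```python
def synergism_lower_bound(mixture_yield, n_enzymes, single_max):
    """Lower bound on degree of synergism.

    Degree of synergism = mixture yield / sum of single-enzyme yields.
    Each single-enzyme yield is only known to be below `single_max`,
    so the denominator is at most n_enzymes * single_max.
    """
    return mixture_yield / (n_enzymes * single_max)


# Percent hydrolysis of BMCC as quoted in the text: (yield, enzymes in mix).
# The three-enzyme value is itself a floor ("more than 50%").
reported = {
    "EGV + CBHI": (25, 2),
    "EGV + CBHII": (14, 2),
    "CBHI + CBHII": (33, 2),
    "EGV + CBHI + CBHII": (50, 3),
}

SINGLE_MAX = 10  # each individual enzyme gave less than 10% hydrolysis

bounds = {
    name: round(synergism_lower_bound(y, n, SINGLE_MAX), 2)
    for name, (y, n) in reported.items()
}

for name, value in bounds.items():
    print(f"{name}: degree of synergism > {value}")
```

A bound above 1 (here for EGV + CBHI, CBHI + CBHII, and the three-enzyme mixture) indicates that the mixture outperforms the sum of its parts under these assumptions. A bound below 1, as for EGV + CBHII, does not rule out synergy; it only means the available figures are too coarse to demonstrate it.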
The white rot fungus Phanerochaete chrysosporium has been used as a model organism for lignocellulose degradation (78). P. chrysosporium produces complex arrays of cellulases, hemicellulases, and lignin-degrading enzymes for the efficient degradation of all three major components of plant cell walls: cellulose, hemicellulose, and lignin (79, 80, 118, 126, 698). Cellulose and hemicellulose degradation occur during primary metabolism, whereas lignin degradation is a secondary metabolic event triggered by limitation of carbon, nitrogen, or sulfur (80). P. chrysosporium produces a cellulase system with CBHII and six CBHI-like homologues, of which CBHI-4 is the major cellobiohydrolase (125, 698). Recently, a 28-kDa endoglucanase (EG28) lacking a CBM was isolated from P. chrysosporium (252). Synergism between the EG28 and the cellobiohydrolases was demonstrated, and it has been suggested that EG28 is homologous to EGIII of T. reesei and H. insolens. No other typical endoglucanase has been isolated from P. chrysosporium. However, Birch et al. (55) reported differential splicing in the CBM-encoding region of the cbh1.2 gene, depending on whether microcrystalline cellulose (Avicel) or amorphous cellulose (CMC) was used as the substrate. They proposed that differential splicing of the cbhI-like genes of P. chrysosporium could yield cellobiohydrolase and endoglucanase activity. Apart from cellobiohydrolases and possible endoglucanase activities, P. chrysosporium also produces cellobiose dehydrogenase that, in the presence of O2, oxidizes cellobiose to cellobionolactone, which reacts spontaneously with water to form cellobionic acid (251, 695). The biological function of cellobiose dehydrogenase has not been clarified, but its binding to microcrystalline cellulose and the enhancement of cellulose hydrolysis have been reported (28, 253). Cellobiose dehydrogenase may help generate hydroxyl radicals that could assist in lignin and cellulose depolymerization (251).
The best-studied species of cellulolytic aerobic bacteria belong to the genera Cellulomonas and Thermobifida (formerly Thermomonospora). Cellulomonas species are coryneform bacteria that produce at least six endoglucanases and at least one exoglucanase (Cex) (99). The individual cellulases of Cellulomonas resemble the cellulase systems of aerobic fungi and contain CBMs; however, cellulosome-like protuberant structures have been noted on Cellulomonas cells grown with cellulose and cellobiose as carbon sources (370, 714). The thermophilic filamentous bacterium Thermobifida fusca (formerly Thermomonospora fusca) is a major cellulose degrader in soil. Six cellulases have been isolated: three endoglucanases (E1, E2, and E5), two exoglucanases (E3 and E6), and an unusual cellulase with both endoglucanase and exoglucanase activity (E4). The latter enzyme has high activity on BMCC and also exhibits synergism with both the other T. fusca endoglucanases and exoglucanases (304). The E4 enzyme also contains a family III CBM that assists the enzyme in processivity (303). Factorial optimization of the T. fusca cellulase system was undertaken, and the highest synergistic effect was shown with the addition of CBHI from T. reesei (335).
The thermophilic and hyperthermophilic procaryotes represent a unique group of microorganisms that grow at temperatures that may exceed 100°C. Several cellulolytic hyperthermophiles have been isolated during the past decade (48). Surprisingly, no cellulolytic thermophilic archaea have been described, although archaea that can grow on cellobiose and degrade other abundant polysaccharides, such as starch, chitin, and xylan, have been isolated (172, 656). Only two aerobic thermophilic bacteria have been described that produce cellulases: Acidothermus cellulolyticus (an actinomycete) and Rhodothermus (238, 584).
Complexed cellulase systems. Microorganisms producing complexed cellulase systems (cellulosomes) are typically found in anaerobic environments, where they exist in consortia with other microorganisms, both cellulolytic and noncellulolytic. The cellulosome is thought to allow concerted enzyme activity in close proximity to the bacterial cell, enabling optimum synergism between the cellulases presented on the cellulosome. Concomitantly, the cellulosome also minimizes the distance over which cellulose hydrolysis products must diffuse, allowing efficient uptake of these oligosaccharides by the host cell (33, 606).
Cellulosomes are protuberances produced on the cell wall of cellulolytic bacteria when growing on cellulosic materials. These protuberances are stable enzyme complexes that are firmly bound to the bacterial cell wall but flexible enough to also bind tightly to microcrystalline cellulose. Cellulosomes from different clostridia (Clostridium thermocellum, Clostridium cellulolyticum, Clostridium cellulovorans, and Clostridium josui) and Ruminococcus species in the rumen have been studied in detail. The architecture of cellulosomes is similar among these organisms, although cellulosome composition varies from species to species. The cellulosome of the thermophilic C. thermocellum is discussed and briefly compared to those of the mesophilic C. cellulolyticum, C. cellulovorans, and R. albus (606).
The cellulosome structure of C. thermocellum was resolved through a combination of biochemical, immunochemical, ultrastructural, and genetic techniques (33). The cellulosome consists of a large noncatalytic scaffoldin protein (CipA) of 197 kDa that is multimodular and includes nine cohesins, four X-modules (hydrophilic modules), and a family III CBM. The scaffoldin is anchored to the cell wall via type II cohesin domains. A total of 22 catalytic modules, at least 9 of which exhibit endoglucanase activity (CelA, CelB, CelD, CelE, CelF, CelG, CelH, CelN, and CelP), 4 of which exhibit exoglucanase activity (CbhA, CelK, CelO, CelS), 5 of which exhibit hemicellulase activity (XynA, XynB, XynV, XynY, XynZ), 1 of which exhibits chitinase activity (ManA), and 1 of which exhibits lichenase activity (LicB), have dockerin moieties that can associate with the cohesins of the CipA protein to form the cellulosome. The assembly of the catalytic modules onto the scaffoldin, their composition, and their synergistic activity are still poorly understood. It is assumed that the cellulosome composition can vary and that the catalytic domains do not bind to specific cohesins (39). Preferred proximity relationships between specific catalytic domains cannot be excluded. The major exoglucanase, CelS, is always present in the cellulosome (466). CelS is a processive cellulase with a preference for microcrystalline or amorphous cellulose but not for CMC. CelS is thus defined as an exoglucanase and produces predominantly cellobiose with cellotriose as a minor product. Cellobiose acts as a strong inhibitor of CelS (356, 357). CelA is the major endoglucanase associated with the cellulosome (13, 606). Cellulosomes are remarkably stable, large complexes that can vary from 2 to 16 MDa and even up to 100 MDa in the case of polycellulosomes (39, 122, 606). The cellulosomes are extensively glycosylated (6 to 13% carbohydrate content), particularly on the scaffoldin moiety. The glycosyl groups may protect the cellulosome against proteases but may also play a role in cohesin-dockerin recognition (208).
Cellulosome preparations from C. thermocellum are very efficient at hydrolyzing microcrystalline cellulose (see "Rates of enzymatic hydrolysis" below). The high efficiency of the cellulosome has been attributed to (i) the correct ratio between catalytic domains that optimize synergism between them, (ii) appropriate spacing between the individual components to further favor synergism, and (iii) the presence of different enzymatic activities (cellulolytic or hemicellulolytic) in the cellulosome that can remove "physical hindrances" of other polysaccharides in heterogeneous plant cell materials.
Electron microscopy indicated that cellulosomes are compact "fist"-like structures that open when attaching to microcrystalline cellulose, allowing local spreading of the catalytic domains (Fig. 1B). Between the cellulosome and the cell wall is a stagnant region in which contact corridors and/or glycocalyces may be present, through which oligosaccharides remain in close proximity to the cell, restricting diffusion into the environment (31). Cellobiose and soluble cellodextrin transport are considered below (see "Physiology of cellulolytic microorganisms").
Cellulosome architecture in other clostridia is less complex. Taxonomically, the mesophilic C. cellulolyticum and C. josui belong, together with the thermophilic C. thermocellum, to group III within the Clostridiaceae. C. cellulovorans and C. acetobutylicum belong to group I within the Clostridiaceae and are more distant from C. thermocellum and C. cellulolyticum; however, the cellulosome components of C. cellulovorans are surprisingly similar to those of C. cellulolyticum. The cellulosome genes of these clostridia are clustered, and Tamaru et al. (665) suggested that C. cellulovorans could have acquired its cellulosome gene cluster through horizontal gene transfer from a common ancestor. The cellulosome of C. cellulolyticum is the best understood among mesophilic clostridia and is discussed as a model system here (43).
C. cellulolyticum cellulosomes vary from 600 kDa to about 16 MDa and, apart from the scaffoldin (CipC), may contain at least 13 distinct catalytic proteins. The CipC scaffoldin contains eight cohesins, two X-modules, and a family III CBM. As with C. thermocellum, exoglucanases form the major catalytic domains present in the cellulosome. CelE and CelF, exoglucanases (cellobiohydrolases) with opposite processivity, are always present in the C. cellulolyticum cellulosome (206, 513). The crystal structure of CelF revealed the presence of a tunnel (characteristic of processive exoglucanases); however, the tunnel may open into a cleft to allow endoglucanase-like entry of a cellulose chain in amorphous cellulose. CelF is thus considered a processive endoglucanase (514) with cellobiose as major product. Initially, small amounts of cellotriose are released, as observed for CelS of C. thermocellum. The ability of CelF to act on the interior of a cellulose chain may shed light on the question of how the cellulosome retains two processive enzymes attached to the scaffoldin and working in opposite directions.
ExgS is the major exoglucanase in the cellulosome of C. cellulovorans (166). ExgS is homologous to CelS of C. thermocellum and CelF of C. cellulolyticum (family 48 cellulases) and probably fulfills a similar function in the cellulosome. It is important to note that the cohesin-dockerin recognition is species specific (436). Fiérobe et al. (188) used this feature of C. thermocellum and C. cellulolyticum cellulosomes to engineer chimeric miniscaffoldins and chimeric catalytic domains, and they elegantly demonstrated two- to threefold synergism between the CelA endoglucanase and CelF exoglucanase of C. cellulolyticum when associated with miniscaffoldins. Determination of the genome sequence of the noncellulolytic C. acetobutylicum surprisingly revealed a cellulosome gene cluster (495). The noncellulolytic C. acetobutylicum can hydrolyze CMC but not amorphous or microcrystalline cellulose (621). It is tempting to speculate that C. acetobutylicum was once cellulolytic or that it fortuitously acquired the cellulosome gene cluster through horizontal gene transfer. Clostridium stercorarium is the only species from group III for which no cellulosome has been observed (606).
Ruminal bacteria of the genus Ruminococcus are phylogenetically related to, but do not fall within, the family Clostridiaceae. Recently, the presence of dockerin-like sequences in at least seven of the cellulase and xylanase genes of Ruminococcus flavefaciens and the production of 1.5-MDa cellulosome-like structures on the R. albus cell surface in the presence of cellobiose and organic acids (phenylacetic and phenylpropionic acid) suggested that Ruminococcus species indeed produce cellulosomes (162). A large protein of 250 kDa was isolated from R. albus cellulosomes, suggesting a possible large scaffoldin. The structure of the R. albus cellulosomes differs from that of the clostridia, suggesting an independent evolutionary path (336, 500). Fibrobacter succinogenes S85 is another efficient cellulolytic bacterium isolated from the rumen that, like the ruminococci, actively adheres to cellulose (184). Although the cellulases of F. succinogenes are cell associated, no cellulosome structures have been identified, and it would be interesting to know whether cellulose hydrolysis is mediated by cellulosomes in this actively cellulolytic anaerobe.
Anaerobic chytrid fungi are only found in the rumens of herbivorous animals (509) and produce highly active cellulases (68, 103, 745, 759). High-molecular-weight complexes with high affinity for microcrystalline cellulose have been isolated from Piromyces sp. strain E2. Conserved noncatalytic repeat peptide domains have been identified in cellulases and xylanases from Neocallimastix and Piromyces species and are thought to provide a docking function (180, 385). Recently, Steenbakkers et al. (639) used PCR primers based on DNA sequences that encode these 40-amino-acid cysteine-rich docking domains to recover the genes of several cellulosome-like components. Preliminary data indicate the presence of multiple scaffoldins; however, they have not yet been isolated from culture fractions (639). Evidence is thus mounting that anaerobic fungi also utilize cellulosomes for hydrolysis of crystalline cellulose. Evolutionary convergence might have occurred between the anaerobic fungi and clostridia. However, the 40-amino-acid dockerin sequence of the anaerobic fungi differs significantly from those of the clostridia, suggesting independent development of the cellulosomes of anaerobic fungi.
Highly cellulolytic anaerobic hyperthermophiles are found in the genera Thermotoga (386) and Caldicellulosiruptor (549), and cellulases isolated from these organisms are often highly thermostable (66). A peculiar feature of the Caldicellulosiruptor hydrolases is the multidomain and multicatalytic nature of these "megazymes." Many of these megazymes contain five or more domains, which can include a variety of cellulases, hemicellulases, and CBMs (48, 211). The megazymes differ in the number and position of catalytic domains and CBMs and could have evolved via domain shuffling. It is also tempting to speculate that the megazymes are primitive alternatives to operons, and could realize advantages associated with cellulosomes, such as facilitating synergism between different catalytic domains firmly attached to microcrystalline cellulose via multiple CBMs.
Glycoside hydrolase families. Proteins are designated according to their substrate specificity, based on the guidelines of the International Union of Biochemistry and Molecular Biology (IUBMB). The cellulases are grouped with many of the hemicellulases and other polysaccharidases as O-glycoside hydrolases (EC 3.2.1.x). However, some of the auxiliary enzymes involved, particularly in hemicellulose hydrolysis, also belong to the group of glycosyltransferases (EC 2.4.1.x). Traditionally, the glycoside hydrolases and their genes were named at random. The classification of the glycoside hydrolases has become insufficient, with several thousand glycoside hydrolases identified during the last decade alone. An alternative classification of glycoside hydrolases into families was suggested based on amino acid sequence similarity (254). This classification has been updated several times (255, 256), but with the exponential growth in the number of glycoside hydrolases identified, Coutinho and Henrissat have begun to maintain and update the classification of glycoside hydrolase families at the Expasy server (http://afmb.cnrs-mrs.fr/~pedro/CAZY/db.html) (124). Families were defined based on amino acid sequence similarities. There is usually a direct relationship between the amino acid sequence and the folding of an enzyme, and as the tertiary structures of many proteins were added, it became clear that the families contain basic enzyme folds (257). At the latest update (26 July 2001), more than 5000 glycoside hydrolases were grouped into 86 families. Thus far, CBMs have been divided into 28 families.
Classification of glycoside hydrolases into structurally determined families provides valuable insights that extend and complement the functionally oriented IUBMB classification. The family classification scheme reflects the structural features of the enzymes, which are more informative than substrate specificity alone because the complete range of substrates is only rarely determined for individual enzymes. Once a 3-D structure in a family is known, it can be used to infer the structures of other members of the family. Tertiary structure, particularly at the active site, dictates the enzyme mechanism, and thus families also contain members whose enzyme mechanism is either inverting or retaining. Often enzymes contain multiple domains that belong to the glycoside hydrolase and glycosyltransferase groups. Classification into families defines the modules of such enzymes and resolves the contradiction about substrate specificity for multifunctional enzymes. The family classification also sheds light on the evolution of the glycoside hydrolases. Some families contain enzymes with different substrate specificities; for example, family 5 contains cellulases, xylanases, and mannanases. This suggests divergent evolution of a basic fold at the active site to accommodate different substrates. At the same time cellulases (hydrolyze β-1,4-glycosidic bonds) are found in several different families [families 5, 6, 7, 8, 9, (10), 12, 44, 45, 48, 61, and 74], suggesting convergent evolution of different folds resulting in the same substrate specificity. Some families are deeply rooted evolutionarily, such as family 9, which contains cellulases of bacteria (aerobic and anaerobic), fungi, plants (141), and animals (protozoa and termites [723]). In contrast, family 7 contains only fungal hydrolases whereas family 8 contains only bacterial hydrolases. Furthermore, cellulases from several families, and thus from different folds with either an inverting or retaining mechanism, are found in the same microorganism (for example, the C. thermocellum cellulosome contains endoglucanases and exoglucanases from families 5, 8, 9, and 48 [621]). Cellulases are thus a complex group of enzymes that appear to have evolved through convergence from a repertoire of basic folds. It is tempting to speculate that the pervasive diversity within the cellulase families reflects the heterogeneity of cellulose and associated polysaccharides within plant materials and diversity of niches where hydrolysis takes place. It might be also that nature has not yet fully optimized enzymes for the efficient hydrolysis of recalcitrant insoluble microcrystalline cellulose. In the era of microbial genomics, the large body of information obtained about sequence-structure relationships of existing members of the glycoside hydrolase families allows for the searching of putative glycoside hydrolases in cellulolytic and noncellulolytic microorganisms for which the genome sequence has been determined (258).
The classification of cellulases and other plant cell wall-hydrolyzing enzymes into families not only allows access to information on the structure, mechanism, and evolutionary origin but also structurally orders the ever-increasing list of newly identified hydrolases into functional groups. Henrissat et al. (260) recently proposed a new nomenclature for hydrolases in which the first three letters designate the preferred substrate, the next digits designate the glycoside hydrolase family, and the following capital letters indicate the order in which the enzymes were first reported. For example, the three enzymes CBHI, CBHII, and EGI of T. reesei are designated Cel7A (CBHI), Cel6A (CBHII), and Cel6B (EGI). When more than one catalytic domain is present, it is reflected in the name, such as Cel9A-Cel48A for the two catalytic domains of CelA of Caldicellulosiruptor saccharolyticus. However, researchers have still not completely embraced this new nomenclature. Two possible reasons for this could be (i) hesitance to let go of the established nomenclature and (ii) lack of substrate specificity, for instance the distinction between endoglucanase and exoglucanase/cellobiohydrolase activity. Because this discussion of cellulase systems and their components focuses primarily on catalytic functionality rather than structural relationships, the older, functionally based nomenclature is used below as we consider the molecular biology of cellulase enzymes.
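The naming rule described above (three-letter substrate code, family number, capital letter for order of report, with hyphens joining multiple catalytic domains) is regular enough to parse mechanically. The following is a minimal sketch under that reading of the rule. The pattern, the `CatalyticModule` type, and the `parse_name` function are illustrative assumptions and not part of any published tool.

```python
import re
from typing import List, NamedTuple


class CatalyticModule(NamedTuple):
    substrate: str  # three-letter substrate code, e.g. "Cel"
    family: int     # glycoside hydrolase family number
    order: str      # capital letter(s) giving the order of first report


# One module: capital letter + two lowercase letters, digits, capital letter(s).
# This pattern is an assumption inferred from the examples in the text.
_MODULE = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z]+)$")


def parse_name(name: str) -> List[CatalyticModule]:
    """Split a family-based enzyme name into its catalytic modules."""
    modules = []
    for part in name.split("-"):
        match = _MODULE.match(part)
        if match is None:
            raise ValueError(f"not a family-based name: {part!r}")
        substrate, family, order = match.groups()
        modules.append(CatalyticModule(substrate, int(family), order))
    return modules


print(parse_name("Cel7A"))
print(parse_name("Cel9A-Cel48A"))
```

Older functional names such as CBHI do not match the pattern and raise `ValueError`, which mirrors the point that the two nomenclatures encode different information (family membership versus mode of action).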
cbh1 mutant increased the transcription of cbh2 more than twofold. Fowler and Brown (194) revealed that deletion of the bgl1 gene, which encodes the extracellular β-glucosidase BGL1, resulted in decreased endoglucanase activities and a lag in the transcription of cbh1, cbh2, egl1, and egl3, suggesting that a β-glucosidase may be partially responsible for formation of the inducer as well. As early as 1962, sophorose (β-1,2-glucobiose) was identified as a strong inducer of cellulases in T. reesei (423). It is assumed that sophorose is formed via the transglycosylation of cellobiose by a β-glucosidase, possibly BGLII of T. reesei (358, 691). However, it has not been demonstrated conclusively that sophorose is the natural inducer of cellulase production. Cellobiose, δ-cellobiose-1,5-lactone, and other oxidized products of cellulose hydrolysis, or even xylobiose resulting from xylan hydrolysis, have not been ruled out as the natural inducer(s). Moreover, the possibility that cellobiose functions as an inducer is more complex because at high levels it inhibits cellulase production (358). It should also be noted that not only the production of cellulases but also the production of hemicellulases is induced, presumably reflecting the intertwined occurrence of these polymers in nature (429). Production of cellulases by T. reesei is regulated at the transcriptional level. Expression of the cellulase genes (cbh1, cbh2, egl1, egl2, and egl5) of T. reesei QM9414 is coordinated through transcription factors (296). The genes encoding the transcription factors ACEI (585) and ACEII (18) were identified based on their ability to bind to the T. reesei cbh1 promoter region and subsequently their DNA sequences were determined. ACEII is homologous to XlnR (700), a transcriptional activator identified in Aspergillus niger, and ACEII also stimulates the expression of cellulase and xylanase genes.
The general carbon catabolite repressor protein CRE1 represses the transcription of cellulase genes in T. reesei (295, 649, 663). The cellulase-hyperproducing T. reesei strain Rut C-30 has a cre1 mutation and still produces cellulases in the presence of glucose (295). The production of cellulases is repressed by CRE1 in the presence of glucose, but a basal level of production occurs in the absence of glucose (93). A link between catabolite repression and the energy status of the cell may exist. A study of four filamentous fungi revealed that extracellular cellulase was repressed at intracellular ATP concentrations above 10^-7 mg/ml and that cyclic AMP (cAMP) played a role in derepression of enzyme synthesis (720). Basal levels of cellulase production presumably allow the production of the inducer through limited cellulose hydrolysis, which in turn mediates further induction of cellulase production. The mechanism by which sophorose or other inducers stimulate transcription through the transcriptional activators ACEI and ACEII is not yet clear.
Expression of the cellulase genes of T. fusca is also regulated at two levels: induction by cellobiose and catabolite repression in the presence of glucose (746). CelR represses cellulase production in the absence of cellulose or cellobiose. However, cellobiose acts as an inducer and inactivates CelR, thereby facilitating its dissociation from promoters and allowing transcription of cellulase genes (635). Catabolite repression of cellulase genes occurs in the presence of glucose and may be regulated by cAMP levels, as indicated by studies done with Thermobifida curvata (747, 760). Various strains of Cellulomonas have been reported to produce high yields of cellulase on cellulosic substrates and lower yields on xylan, galactomannan, starch, and sugars (543). These data suggest that constitutive production of cellulases at basal levels occurs in the absence of glucose and that cellulase production is subjected to catabolite repression. Cellulosic substrates, as well as cellobiose and xylose, at moderate levels of 0.05 to 0.2 g/liter, serve as inducers for cellulase production (566).
Cellulosome formation in C. thermocellum occurs under carbon-limited conditions, with conflicting statements in the literature about whether induction is important (39, 458, 621). In cellobiose-grown C. thermocellum, celA, celD, and celF were detected during late exponential and early stationary phase, whereas celC occurred primarily in early stationary phase (455). Cellulase production is thus presumably down-regulated via catabolite repression. However, the composition of the cellulosome may be influenced by the carbon source used; for example, the major exoglucanase CelS is more prominent when cells are grown on cellulose than when they are grown on cellobiose (315, 458, 621). The cellulosome of C. cellulovorans is produced on cellulose but not on soluble carbohydrates such as glucose, fructose, cellobiose, or even CMC (60). However, C. cellulovorans grown on cellobiose and CMC does exhibit a high cellulase activity and transcription of cellulase genes. This suggests that cellulases may be produced on certain soluble carbohydrates, such as cellobiose, but that (poly)cellulosome assembly and detachment from the cell wall need some "triggering" from the presence of insoluble microcrystalline cellulose (60, 166, 665). Analysis of mRNA transcripts in the ruminal bacterium R. flavefaciens FD-1 has revealed somewhat contradictory results. Doerner et al. (164) have reported that the celA and celC genes were expressed constitutively while expression of the celB and celD was induced by cellulose. However, Wang et al. (721) have reported that the cellodextrinase celA and celE genes are both induced by cellulose.
Organization of cellulase genes. The genes encoding cellulases are chromosomal in both bacteria and fungi. In the fungi, cellulase genes are usually randomly distributed over the genome, with each gene having its own transcription regulatory elements (683). Only in exceptional cases, such as for P. chrysosporium, are the three cellobiohydrolase-like genes clustered (126). A comparison of the promoter regions of cbh1, cbh2, eg1, and eg2 of T. reesei revealed the presence of CRE1-binding sites through which catabolite repression is exerted (358). ACEI and ACEII activate transcription by binding to at least the cbh1 promoter region (18, 585).
In bacteria, the cellulase genes are either randomly distributed (e.g., in C. thermocellum [228]) or clustered on the genome (e.g., in C. cellulolyticum, C. cellulovorans, and C. acetobutylicum [42, 43]). The cellulase gene cluster of C. cellulovorans is approximately 22 kb in length and contains nine cellulosomal genes with a putative transposase gene in the 3' flanking region. Similar arrangements have also been found in the chromosome of C. cellulolyticum and C. acetobutylicum, suggesting the presence of a common bacterial ancestor to these mesophilic clostridia or the occurrence of transposon-mediated horizontal gene transfer events. Transcriptional terminators could be identified within these large gene clusters; however, promoter sequences have not yet been found (665).
Both cellulolytic bacteria and fungi (aerobic and anaerobic) primarily contain multidomain cellulases, with single-domain cellulases being the exception (e.g., EGIII of T. reesei and EG28 of P. chrysosporium [252, 592]). The most common modular arrangements involve catalytic domains attached to CBMs through flexible linker-rich regions. The CBM module can be either at the N or C terminus; the position is of little relevance when considering the tertiary structure of the protein. This arrangement is found predominantly in noncomplexed cellulase systems. The enzymes of complexed systems (anaerobic bacteria and fungi) are more diverse. Cellulosomal enzymes contain at least a catalytic domain linked to a dockerin. However, some enzymes contain multiple CBMs, an immunoglobulin-like domain (e.g., for CelE of C. cellulolyticum) (206), and a fibronectin type III domain (CbhA of C. thermocellum) (785). The most complex enzymes are those of the extremely thermophilic bacteria (48). The megazymes of the anaerobic hyperthermophile Caldicellulosiruptor isolate Tok7B.1 often have two catalytic domains, usually a cellulase and a hemicellulase domain (combinations from glycoside hydrolase families 5, 9, 10, 43, 44, and 48), linked through several CBM domains (211).
Gene duplication and horizontal gene transfer. The large number of homologous cellulase genes observed within cellulolytic organisms, between related organisms, or between distant organisms within a niche environment, such as the rumen, suggests that chromosomal rearrangements and horizontal gene transfer contributed to the current rich repertoire of cellulase systems available. The presence of CBH1-like gene clusters in P. chrysosporium (126) and the highly homologous CelK and CbhA exoglucanases in C. thermocellum (786) suggest more recent gene duplication events. The formation of cellulases from the same family within a species but with different cellulase activity, such as EGI (Cel6B) and CBHII (Cel6A) of T. reesei, could represent more distant gene duplications, followed by substrate specificity divergence. The development of polyspecific families, such as the cellulases and hemicellulases in family 5, may represent common ancestor genes that underwent gene duplication followed by substantial divergence with regard to substrate specificity. Examples are the CelE (endoglucanase) and CelO (cellobiohydrolase) of C. thermocellum (621), as well as EGIII (endoglucanase) (587) and MANI (mannanase) in T. reesei (638). The different arrangement of catalytic domains and CBMs in the megazymes of the hyperthermophilic bacteria in all likelihood originated from intergenic domain shuffling through homologous or unequal crossover recombination events (48).
The role of horizontal gene transfer in the evolution of cellulase systems has been expected, but only recently has evidence of such events started to accumulate. The possibility that the cellulosomal gene cluster of C. cellulovorans could have been acquired through a transposase-mediated transfer event was discussed (665). The absence of introns in the glycoside hydrolase genes of the anaerobic fungi (in contrast to aerobic fungi, which contain introns in their glycoside hydrolase genes) raised suspicion that the anaerobic fungi acquired their genes from bacteria. Garcia-Vallvé et al. (203) systematically performed sequence homology analysis between the glycoside hydrolase protein and DNA sequences of the anaerobic fungi and the ruminal bacterium F. succinogenes. They also examined the G+C content and codon bias of the glycoside hydrolase genes of anaerobic fungi as well as the phylogenetic trees derived from the multialignment of orthologous sequences. Their analysis showed that the anaerobic fungi in all likelihood acquired the genes for cellulase systems from bacteria. The high microbial density in the rumen (10^10 to 10^11 bacteria per ml of ruminal fluid) and the consequent close proximity between ruminal bacteria and fungi provide ideal conditions for horizontal gene transfer events to occur. Horizontal gene transfer has been demonstrated in the rumen (468, 483), suggesting genome plasticity in this niche environment that could also allow the anaerobic fungi to acquire new genes (430).
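G+C content and codon bias, the two compositional measures mentioned above, are simple to compute for a coding sequence. The sketch below is a generic illustration and does not reproduce the procedure of the cited study. The function names and the short example sequence are invented for demonstration. Here GC3 means the fraction of codons whose third position is G or C, one common indicator of codon bias.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(b in "GC" for b in bases) / len(bases)


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons, ignoring any incomplete trailing codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def gc3_content(cds: str) -> float:
    """Fraction of codons with G or C at the third position."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than one codon")
    return sum(n for codon, n in counts.items() if codon[2] in "GC") / total


# Invented toy coding sequence (six codons), for demonstration only.
toy_cds = "ATGGCCAAATTTGGGTAA"
print(f"G+C content: {gc_content(toy_cds):.3f}")
print(f"GC3 content: {gc3_content(toy_cds):.3f}")
print(codon_counts(toy_cds).most_common(3))
```

In a comparison of the kind described, such values for a candidate transferred gene would be set against those of the recipient genome and of the putative donor; a gene whose composition resembles the donor more than its host is one line of evidence for horizontal transfer.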
Some of the more recently described anaerobic cellulolytic species (Anaerocellum thermophilum [659], Caldicellulosiruptor saccharolyticus [549], and Halocella cellulolytica [622]) display somewhat wider carbohydrate utilization spectra, with compounds such as starch and various monosaccharides variously reported to serve as substrates.