Charles A. Dana Research Institute for Scientists Emeriti, Drew University, Madison, New Jersey; Department of Chemical Engineering, University of Rochester, Rochester, New York
SUMMARY INTRODUCTION Need for Alternative Energy Source THE CLOSTRIDIUM THERMOCELLUM CELLULASE SYSTEM Strains and Media Properties of the Cellulase System THE CLOSTRIDIUM THERMOCELLUM CELLULOSOME Components SL and SS Cellulosome Structure Cohesins and Dockerins Cellulose-Binding Domain Attachment of Cellulosomes to the Cell Surface Assembly of the Cellulosome Genes and Enzymes Dockerin-Containing Proteins CARBON SOURCE NUTRITION REGULATION OF CELLULASE PRODUCTION Carbon Source Regulation Cellulase Gene Clusters Negative Regulation CLOSTRIDIAL COCULTURES APPLICATION OF RECOMBINANT DNA TECHNOLOGY TO THE SELECTIVITY PROBLEM CLOSING COMMENTS ACKNOWLEDGMENTS REFERENCES
| SUMMARY |
|---|
| INTRODUCTION |
|---|
With the hike in oil prices around the world in the 1970s and the realization that the world's oil supply is finite, the quest for alternative fuels began in 1975, and researchers looked for economical ways to produce ethanol, preferably from abundantly available, biodegradable, and renewable raw materials. Ethanol is an excellent transportation fuel, in some respects superior to gasoline (199, 200). In particular, with respect to gasoline, neat (unblended) ethanol burns more cleanly, has a higher octane rating, can be burned with greater efficiency, is thought to produce smaller amounts of ozone precursors (thus decreasing urban air pollution), and is particularly beneficial with respect to low net CO2 put into the atmosphere. Furthermore, ethanol by fermentation offers a more favorable trade balance, enhanced energy security, and a major new crop for a depressed agricultural economy. Ethanol is considerably less toxic to humans than is gasoline (or methanol). Ethanol also reduces smog formation because of low volatility; its photochemical reactivity and that of its combustion products are low, and only low levels of smog-producing compounds are formed by its combustion (359). Its high heat of vaporization, high octane rating, and low flame temperature yield good engine performance. Cellulosics, at $42 per dry ton, cost the same as petroleum at $6 to $7 per barrel based on equivalent mass or $12 to $13 per barrel based on equivalent energy content (203, 356); today, the price of petroleum is much higher. It appears probable that ethanol from cellulose could become competitive with gasoline if the processing costs for the former could be lowered. Ethanol is also used as an oxygenate to reduce automobile emissions. With the current and impending phase-out of methyl tert-butyl ether as an oxygenate in many states in the United States, ethanol will fill the void.
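The feedstock cost comparison quoted above can be checked with a rough calculation. The sketch below is only illustrative: the source does not state its conversion factors, so the mass of a dry ton (taken here as a U.S. short ton) and the mass of a barrel of crude oil (taken as about 136 kg) are assumptions. It also backs out the oil-to-biomass energy density ratio implied by the midpoints of the two quoted price ranges.

```python
# Rough check of the feedstock-cost comparison quoted in the text.
# ASSUMPTIONS (not from the source): a "dry ton" is a U.S. short ton
# (907.185 kg) and one barrel of crude oil weighs about 136 kg.
KG_PER_SHORT_TON = 907.185  # assumed unit for "dry ton"
KG_PER_BARREL = 136.0       # assumed mass of one barrel of crude oil


def mass_equivalent_barrel_price(usd_per_dry_ton: float) -> float:
    """Cost in USD of a mass of biomass equal to one barrel of oil."""
    return usd_per_dry_ton / KG_PER_SHORT_TON * KG_PER_BARREL


def implied_energy_ratio(energy_basis_price: float, mass_basis_price: float) -> float:
    """Oil-to-biomass energy-density ratio implied by the two quoted prices."""
    return energy_basis_price / mass_basis_price


mass_price = mass_equivalent_barrel_price(42.0)
print(f"mass basis: ${mass_price:.2f} per barrel")

# Midpoints of the ranges quoted in the text: $12-13 (energy) and $6-7 (mass).
ratio = implied_energy_ratio(12.5, 6.5)
print(f"implied energy-density ratio (oil/biomass): {ratio:.2f}")
```

Under these assumptions the mass-basis figure falls inside the quoted $6 to $7 range, and the two quoted ranges together imply that petroleum carries roughly twice the energy of cellulosic biomass per unit mass.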
Useful reviews on the biological conversion of lignocellulosic biomass to ethanol have been published (183, 196, 198, 200, 201, 203, 204, 206, 224, 356).
The potential quantity of ethanol that could be produced from cellulose is over an order of magnitude larger than that producible from corn. In contrast to the corn-to-ethanol conversion, the cellulose-to-ethanol route involves little or no contribution to the greenhouse effect and has a clearly positive net energy balance (five times better). As a result of such considerations, microorganisms that metabolize cellulose have gained prominence in recent years (38, 329).
Lignocellulose is difficult to hydrolyze because (i) it is associated with hemicellulose, (ii) it is surrounded by a lignin seal which has a limited covalent association with hemicellulose, and (iii) much of it has a crystalline structure with a potential formation of six hydrogen bonds, four intramolecular and two intermolecular, giving it a highly ordered, tightly packed structure (344). Pretreatments aim at increasing the surface area of cellulose by (i) removing the lignin seal, (ii) solubilizing hemicellulose, (iii) disrupting crystallinity, and/or (iv) increasing pore volume. The value of a cellulase system that attacks crystalline cellulose lies in the observation that many of the pretreatments which increase surface area also increase crystallinity. These include dilute sulfuric acid, alkali, and ethylenediamine.
The rate-limiting step in the conversion of cellulose to fuels is its hydrolysis, especially the initial attack on the highly ordered, insoluble structure of crystalline cellulose, since the products of this attack are readily solubilized and converted to sugars. A great deal of effort has gone into the development of methods for conversion of cellulose to sugars. Most of this work has emphasized the biochemistry, genetics, and process development of fungi (especially Trichoderma reesei) coupled to the further conversion of the sugars produced to ethanol by yeast (Saccharomyces cerevisiae). After many years of study, it is becoming apparent that such a process is not the only potential solution.
The large amounts of attention and resources devoted over the past 30 years to the Trichoderma-Saccharomyces concept have perhaps interfered with the industrial development of the potential of the cellulolytic, thermophilic anaerobic bacteria such as Clostridium thermocellum (49, 51, 66, 81, 216) and their cellulases and hemicellulases (28, 32, 78, 195, 204, 274, 301, 315). C. thermocellum breaks down cellulose, with the formation of cellobiose and cellodextrins as main products. Cellobiose, a disaccharide of two glucose moieties held together by a β-1,4 linkage, can be further utilized by the organism, and the final end products are ethanol, acetic acid, lactic acid, hydrogen, and carbon dioxide (181) (Fig. 1). Small cellodextrins can also be taken into the cell, broken down further, and metabolized (203). The interest in this organism is due to several factors (Table 1). First, C. thermocellum can utilize lignocellulosic waste and generate ethanol, a rare property among living organisms. Indeed, the cellulase system of C. thermocellum mediates hydrolysis of lignin-containing materials such as hardwood pretreated with dilute acid as well as model substrates not containing lignin (196, 198). Second, for large-scale culture, anaerobiosis is an advantage because one of the most expensive steps in industrial fermentations is that of providing adequate oxygen transfer, e.g., for cellulase production. Third, since the optimum temperature for the cultivation of the organism is 60°C, the problems of contamination are lessened and the cooling of large fermentors is much simplified. Fourth, growth at a high temperature facilitates the recovery of ethanol. Fifth, thermophiles are thought to be robust microorganisms and contain stable enzymes. Sixth, anaerobes generally have a low cell yield, and hence more of the substrate is converted to ethanol.
Experience suggests that even more advantageous are in situ cellulase production and the high rates of growth on and metabolism of cellulose and hemicellulose. Since the carbon dioxide produced during fermentation and fuel use of ethanol is recycled by growth of plants, ethanol does not contribute to carbon dioxide accumulation in the atmosphere and possible global warming (359).
Cellulosic biomass in the early 1980s, at $20 to $30/dry ton, was much cheaper than corn, at $110/dry ton, yet cellulose hydrolysis was not competitive with starch hydrolysis to produce sugar. The problems included cellulose crystallinity and lignin content, requiring expensive pretreatment. Also, corn usage provided oil and protein by-products, which were of much higher value than lignin (247). By 1990, the cost of ethanol production from cellulose (by Trichoderma plus yeast) was thought to be approaching that of production from corn (by yeast) on an unsubsidized basis (130, 196, 200). However, neither ethanol produced from corn nor ethanol produced from cellulose is currently competitive with conventional liquid fuels. A state-of-the-art process designed for ethanol production from cellulose had a selling price of U.S. $1.35 per gallon in 1988 (200), $1.18 per gallon in 1996 (201), and $1.20 per gallon in 2003 (356). For production from corn, the price in 1994 was $1.22 per gallon and that in 2001 was $1.65 per gallon (355, 357). However, a price of 50 cents per gallon would have been required for ethanol to compete with gasoline in the early 1990s, 67 cents per gallon would have been required in 1994, and 70 cents per gallon would have been required in 2000. With improvements to microbial biomass-to-ethanol technology, it was projected that the selling price of pure ethanol could be 50 cents per gallon and in the best case could be as low as 34 cents per gallon (201). Thus, substantial cost reductions are possible and could make biological ethanol a competitive neat transportation fuel (206).
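The size of the gap between these selling prices and the gasoline-competitive price can be tabulated directly from the figures in the preceding paragraph. The following is a minimal sketch; pairing each selling price with the competitive price from the nearest benchmark year is an assumption made here for illustration (the source does not pair them), and 1990 is used only as a label for "the early 1990s".

```python
# Selling prices (USD per gallon) quoted in the text.
cellulose_ethanol = {1988: 1.35, 1996: 1.18, 2003: 1.20}
corn_ethanol = {1994: 1.22, 2001: 1.65}

# Price at which ethanol would have competed with gasoline (text values).
# ASSUMPTION: 1990 stands in for "the early 1990s".
competitive = {1990: 0.50, 1994: 0.67, 2000: 0.70}


def nearest_benchmark(year: int) -> float:
    """Competitive price from the benchmark year closest to `year`.

    ASSUMPTION: nearest-year pairing is an illustrative choice.
    """
    closest = min(competitive, key=lambda y: abs(y - year))
    return competitive[closest]


def cost_ratio(price: float, year: int) -> float:
    """Selling price expressed as a multiple of the competitive price."""
    return price / nearest_benchmark(year)


for label, series in (("cellulose", cellulose_ethanol), ("corn", corn_ethanol)):
    for year, price in sorted(series.items()):
        print(f"{label} {year}: {cost_ratio(price, year):.1f}x the competitive price")
```

With this pairing, every quoted selling price is between roughly 1.7 and 2.7 times the price needed to compete with gasoline, which is the gap the projected improvements would have to close.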
The cost of the cellulase in the Trichoderma-yeast process is still prohibitively high, whereas for a direct clostridial coculture process, the enzymes cost very little because they are produced by the fermenting organism in the course of ethanol production. The direct fermentation of cellulose to ethanol could save 50 cents per gallon compared to a state-of-the-art Trichoderma-yeast simultaneous saccharification and fermentation process, since the former process combines cellulase production, hydrolysis, and fermentation in a single bioreactor (130). Conversion of mixed hardwood flour to ethanol in a continuous fermentor was 2.5 times higher with C. thermocellum than with the simultaneous saccharification and fermentation process using Trichoderma cellulase, β-glucosidase, and S. cerevisiae (310); furthermore, the rate of conversion was four times higher. Mixed cultures of thermophilic anaerobic bacteria offer the further potential of decreasing the production costs of lignocellulose conversion to ethanol by twofold (198). With resources dedicated to the exploitation of these bacteria, the conversion of agricultural, forest, and urban resources into ethanol could become an economic substitute for petroleum fuels when oil prices are about U.S. $30 or more per barrel. This new technology could provide a profitable outlet for renewable resources sooner than would occur by waiting for the Trichoderma-yeast process to become economical by gradual increases in the price of petroleum. The primary advantages of a direct clostridial conversion include elimination of capital or operating costs for enzyme production, greatly reduced diversion of substrate for enzyme production, and compatible enzyme and fermentation systems.
Moreover, with the increased use of ethanol as an octane enhancer replacing tetraethyl lead and as an oxygenate replacing methyl tert-butyl ether, the development of an anaerobic bacterial process using waste feedstocks is likely to be justified at current petroleum prices.
| THE CLOSTRIDIUM THERMOCELLUM CELLULASE SYSTEM |
|---|
Unlike fungal cellulases, the C. thermocellum cellulase complex has very high activity on crystalline cellulose (140); i.e., it has "true cellulase activity" (also called Avicelase) which is characterized by its ability to completely solubilize crystalline forms of cellulose such as cotton and Avicel (141). This unique cellulase system has been studied by a number of groups biochemically, immunologically, and via molecular biological techniques. The complex comprises (i) numerous endo-β-glucanases which are responsible for the random breakdown of amorphous types of cellulose, including CMC and TNP-CMC (4, 30, 31, 41, 62, 80, 93, 122, 143-145, 162, 221, 239, 240, 249, 267, 268, 276, 295, 296, 304, 307, 312); (ii) at least four exoglucanases (152, 222, 234, 308, 342, 372); (iii) a cellobiose phosphorylase that breaks down cellobiose to glucose and glucose-1-phosphate (5); (iv) a cellodextrin phosphorylase that phosphorolyzes β-1,4-oligoglucans (300); and (v) two β-glucosidases that hydrolyze cellobiose to glucose (3, 4, 104). C. thermocellum also possesses at least six xylanases (107, 232), two lichenases (367), two laminarinases (332), and minor activities of β-xylosidase, β-galactosidase, and β-mannosidase (164). A strain of C. thermocellum has been shown to degrade pectin and probably produces pectin lyase, polygalacturonate hydrolase, and pectin methylesterase (314). We found the activity responsible for hydrolysis of crystalline cellulose, unlike the endoglucanases, to be inhibited by cellobiose (141); glucose had no such inhibitory effect.
In our experiments (141), filter paper was found to be the preferred substrate for Trichoderma cellulase, whereas cotton was the best substrate for the clostridial enzyme complex. This high activity on a highly ordered substrate reflects C. thermocellum's ability to proliferate under thermophilic, anaerobic conditions on partially digested plant tissues. As an anaerobe depending solely on glycolysis for its cellular energy, C. thermocellum cannot afford to produce large quantities of extracellular proteins. Thus, it makes a cellulase complex with about 50-fold-higher specific activity than that of Trichoderma under the assay conditions used; the latter is excreted at a high extracellular concentration but is rather weak in activity.
For a long time, it was puzzling that C. thermocellum could grow well on crystalline cellulose whereas its extracellular fluid only poorly degraded crystalline cellulose (e.g., Avicel). When the organism was grown on agar containing Avicel, we found clear zones around the colonies, indicating production of an active, extracellular enzyme. We finally were able to solve this problem. We found the extracellular cellulase activity from C. thermocellum to have three unusual properties that had not been previously seen with cellulase preparations from other bacteria or fungi and which made it a uniquely active enzyme complex. (i) This thermophilic cellulase complex contains sulfhydryl groups that are essential either for the saccharification of crystalline celluloses such as cotton fibers, filter paper, or Avicel or for its structural stability (142). This property requires that the cellulase complex be used under reducing conditions and renders the activity susceptible to oxidation and sulfhydryl inactivation. When the cellulase activity is protected by reducing agents such as dithiothreitol (DTT), cysteine, sodium dithionite, glutathione, or mercaptoethanol, it displays high specific activity. (ii) The complex has a requirement for Ca2+. Cotton, Avicel, and filter paper are completely solubilized under reducing conditions in the presence of Ca2+. Although its activity was stimulated by 10 mM DTT, it was inhibited by a lower DTT concentration (0.1 to 1 mM). We found this to be due to autooxidation of DTT at low concentrations, which produces H2O2 that inactivates the enzyme under aerobic or anaerobic conditions (139). The inactivation was prevented by catalase but not by superoxide dismutase or hydroxyl radical scavengers. As expected, sulfhydryl oxidizing agents [e.g., o-iodosobenzoate and 5,5'-dithio-bis-(2-nitrobenzoic acid)] inhibited the activity, and the inhibition was reversed by 10 mM DTT.
Furthermore, nonoxidizing SH reagents such as N-ethylmaleimide, iodoacetate, and p-chloromercuribenzoate were also inhibitory. Since oxidation of thiol groups involves metals, it was not surprising that a low concentration of EDTA (1 mM) inhibited inactivation by low DTT concentration whereas H2O2 and Cu2+ stimulated inactivation. The component of the cellulase complex that was susceptible to sulfhydryl inactivation appeared not to be an endoglucanase, since endoglucanase activity is unaffected by oxidation or thiol reagents. It probably is exoglucanase CelS (also known as SS, equivalent to component S8 of the complex) whose stability is increased by Ca2+ and thiols and whose activity is inhibited by cellobiose (234). Another exoglucanase (CBH3) has a molecular mass (78 kDa) similar to that of CelS but some different properties, such as substrate specificity and less inhibition by cellobiose; it also is protected by Ca2+ and DTT from high temperature inactivation (308). It should be noted, however, that some endoglucanases (e.g., CelD) are stimulated and stabilized at high temperature by Ca2+ (55). (iii) The third unusual property of the C. thermocellum true cellulase system is the possible involvement of a transition metal ion, presumably iron (139). Cellulose saccharification under anaerobic (reducing) conditions was inhibited by the chelators o-phenanthroline and dipyridyl; this was reversed by preincubation with Fe2+ plus Fe3+. The presence of a catalytic metal ion and/or sulfhydryl groups could provide an explanation for the high specific activity of clostridial cellulase on crystalline cellulose.
Breakdown products of cellulose are both cellodextrins and cellobiose. When these enter the cell, they can be broken down extracytoplasmically via phosphorolytic cleavage by cellodextrin phosphorylase and/or cellobiose phosphorylase or via hydrolytic cleavage by β-glucosidase. Two intracellular β-glucosidases have been purified, characterized, and cloned (3, 103, 148, 153, 294). Rates of phosphorolytic cleavage have been found to be 20-fold higher than those of hydrolytic cleavage (365). From a bioenergetic standpoint, phosphorolytic cleavage is more beneficial because it provides a route to ATP synthesis. Cellodextrins appear to be more favorable to phosphorolysis than hydrolysis; the apparent Km for cellodextrin phosphorylase (measured with cellopentaose) was found to be considerably lower (0.61 mM) than that for cellobiose phosphorylase (3.3 mM). It appears that the mean cellodextrin length assimilated during growth on cellulose is equal to or larger than four hexose units (Y.-H. P. Zhang and L. R. Lynd, Abstr. 102nd Gen. Meet. Am. Soc. Microbiol., abstr. I-002, 2002).
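To illustrate what the difference between the two apparent Km values means in practice, the sketch below computes the fraction of maximal rate reached by each phosphorylase at a few substrate concentrations. It assumes simple single-substrate Michaelis-Menten kinetics, which is a simplification not stated in the source (phosphorylases also depend on phosphate), and the substrate concentrations are arbitrary illustrative values. It says nothing about the relative Vmax of the two enzymes.

```python
def fractional_saturation(substrate_mM: float, km_mM: float) -> float:
    """Michaelis-Menten rate as a fraction of Vmax: v/Vmax = S / (Km + S)."""
    if substrate_mM < 0 or km_mM <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return substrate_mM / (km_mM + substrate_mM)


# Apparent Km values quoted in the text.
KM_CELLODEXTRIN_PHOSPHORYLASE = 0.61  # mM, measured with cellopentaose
KM_CELLOBIOSE_PHOSPHORYLASE = 3.3     # mM

# ASSUMPTION: illustrative substrate concentrations, not from the source.
for s in (0.1, 1.0, 10.0):
    cdp = fractional_saturation(s, KM_CELLODEXTRIN_PHOSPHORYLASE)
    cbp = fractional_saturation(s, KM_CELLOBIOSE_PHOSPHORYLASE)
    print(f"S = {s:5.1f} mM: cellodextrin phosphorylase {cdp:.2f} of Vmax, "
          f"cellobiose phosphorylase {cbp:.2f} of Vmax")
```

Under this simplified model, the enzyme with the lower Km runs closer to its maximal rate at every substrate concentration, and the difference is largest when substrate is scarce.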
C. thermocellum cells as well as its extracellular cellulase complex completely solubilize the model crystalline substrate Avicel and a more realistic substrate, dilute-acid-pretreated mixed hardwoods (198, 202). With either cells or extracellular filtrate, the performances on the two substrates were similar, indicating that insoluble lignin does not interfere with growth or hydrolysis. The pretreated hardwood supports growth not only in batch culture but also in continuous culture. A new procedure, an enzyme-linked immunosorbent assay, is now available for quantitation of cell and cellulose mass concentrations during batch fermentations of C. thermocellum (364).
| THE CLOSTRIDIUM THERMOCELLUM CELLULOSOME |
|---|
The highly ordered arrangement of the cellulosome gives it stability. This resistance to environmental insults correlates with its resistance to dissociation into individual components even in the presence of urea or nonionic detergent; hence, purification of individual proteins was extremely difficult (174, 175). However, we accomplished a breakthrough in the purification of this complex aggregate of cellulolytic proteins (353, 354). Using Avicel breakdown as a turbidimetric assay for true cellulase activity and CMC hydrolysis as an assay for endoglucanases, we found the aggregate to contain at least four endoglucanases of different molecular weights accompanying true cellulase activity. We dissociated the aggregate by mild sodium dodecyl sulfate (SDS) treatment plus EDTA and DTT, but the resulting individual fractions exhibited only endoglucanase activity, the true cellulase activity being lost. However, we were able to reconstitute true cellulase activity by combining two of the major components, which we called SS (Mr = 82,000) and SL (Mr = 250,000). SS and SL are more abundant than any other cellulosomal components (179, 351). They were purified by gel filtration chromatography and by elution from an SDS-polyacrylamide gel, respectively. The reconstituted true cellulase activity yielded cellobiose as the predominant product of hydrolysis, was inhibited by cellobiose, required Ca2+ and reducing conditions, and thus behaved like the crude cell-free supernatant. In 1984, we proposed that an exoglucanase was the component subject to oxidation (139). SS, an exoglucanase, appeared to be that component responsible for cellobiose inhibition, calcium dependency, and oxidation sensitivity of the true cellulase activity (169, 170, 231, 234). These results indicate that SS plays an important role in the cellulolytic activity of the cellulosome and that it may be the rate-limiting cellulosomal component. 
SL is glycosylated, a rare modification for a bacterial extracellular protein. As many as 13 cellulosomal proteins may be glycosylated (164, 236, 249), but SL has the major part of the sugar, with about 40% of this component being carbohydrate (97, 99). The oligosaccharides that have been characterized are O-glycosidically attached via galactopyranose to threonine residues of SL (98). These threonine residues are in the Thr/Pro-rich regions which link the cellulose-binding domain (CBD) to the enzyme receptor regions (i.e., cohesins; see below) of SL. The major carbohydrate has (i) a basic tetrasaccharide structure containing two galactose units, one galactitol unit, and one 3-O-methyl-N-acetylglucosamine unit and (ii) a disaccharide structure containing D-galactose.
SS alone acted on CMC, but SL alone had little to no enzymatic activity (179, 354). The enzymatic activity of SS on CMC was not enhanced by SL, but its adsorption to crystalline cellulose was (354). We hypothesized that the cooperative degradation of crystalline cellulose involves an interaction between SS (and presumably other cellulases), SL, and the insoluble substrate. SL (an anchorage subunit) would function to bind SS (and other catalytic proteins of the complex) to the cellulose surface in a manner optimal for hydrolysis (351, 353). As discussed below, the DNA sequence of the SL gene (cipA) reveals that SL indeed contains a CBD and multiple enzyme receptor domains, consistent with the anchor-enzyme hypothesis. The anchor-enzyme model has been further confirmed by using recombinant forms of SS and SL (168). The anchorage function of SL is the basis of our current understanding of the cellulosome structure (see next section).
We and others devoted considerable efforts to the isolation and purification of a number of proteins of the C. thermocellum system as well as the cloning, expression, and sequencing of their relevant genes in Escherichia coli and other hosts (80, 96, 148, 163, 276, 277). Of particular interest was our gene sequencing of SL (96, 277) and the gene sequencing of SS (341).
Protein SL, which is equivalent to component S1 described by Lamed et al. (179), is now called the cellulosome-integrating protein (CipA), the scaffolding protein, or scaffoldin (18). It contains approximately 1,850 amino acid residues and is the most important protein of the cellulosome. In addition to its function of binding other members of the cellulase complex to itself, it also binds to cellulose (33). Our first cloning and sequencing of cipA involved a truncated 5' region (277). Later, the remainder of the gene was cloned by chromosome walking and the entire sequence was determined (96). Its nucleotide sequence revealed a deduced protein size of 196,800 Da, a CBD, and nine domains of about 150 to 166 amino acid residues each. The CBD is of type 3 (28, 96). The nine repeated sequences, called cohesins by Bayer et al. (18), are quite similar to each other, i.e., exhibiting between 60 and 100% identity, with six of the nine domains being at least 90% identical. They are the receptors that bind the individual cellulases, xylanases, and other enzymes to CipA (88, 96, 327) (Fig. 2).
CelS contains a duplicated 24-amino-acid-residue dockerin, the site of binding to scaffoldin. The recombinant enzyme produced in E. coli behaves like an exoglucanase, hydrolyzing phosphoric acid-swollen cellulose faster than Avicel and hydrolyzing Avicel more rapidly than CMC (169, 342). Like the cellulosome itself, recombinant CelS is inhibited by cellobiose and only marginally so by glucose (167). Inhibition by cellobiose was found to be competitive.
In collaboration with Alzari's group at the Pasteur Institute, we determined the crystal structure of CelS (111). The overall structure resembles that of C. cellulolyticum CelF (262, 263, 271). The protein folds into an (α/α)6 barrel with a tunnel-shaped substrate-binding region. The most salient feature of the CelS structure is that its tunnel-shaped substrate-binding site, which is capable of binding seven glucose moieties, is adjacent to an open cleft that accommodates two glucose moieties. It is proposed that the cellulose chain threads through the tunnel and that the glycosidic bond between the second and third glucose residues is hydrolyzed to produce cellobiose as the product. Upon the release of cellobiose from the open cleft, the cellulose chain slides forward through the tunnel with a distance of two glucose units. This "processivity," consisting of alternating steps of hydrolysis and sliding-threading, explains the cellobiohydrolase nature of CelS. Structural comparisons with other (α/α)6 barrel glycosidases indicate that CelS and endoglucanase CelA, a family 8 glycosidase whose sequence is unrelated to that of CelS and which has a groove-shaped substrate-binding region, use the same catalytic machinery to hydrolyze the glycosidic linkage, despite a low sequence similarity and a different endo-exo mode of action. CelS and CelA can therefore be classified in a new clan of glycoside hydrolases. For the substrate-binding site, CelA has an open cleft while CelS has a closed tunnel, explaining their different endo-exo modes of action, as also observed in other endo-exo pairs (see reference 324 for a review).
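The proposed processive mechanism of CelS (cleave the bond between the second and third glucose residues, release cellobiose, slide forward by two units, repeat) can be pictured with a toy model. The sketch below is purely illustrative and makes simplifying assumptions not found in the source: it treats a single isolated chain, ignores the seven-residue length of the tunnel, and assumes the enzyme never dissociates from the chain.

```python
def processive_hydrolysis(chain_length: int) -> tuple[int, int]:
    """Toy model of a processive exoglucanase acting on one cellulose chain.

    Each cycle hydrolyzes the bond between the second and third glucose
    residues from the chain end, releases one cellobiose (two glucose
    units), and slides the chain forward by two units.

    Returns (cleavage_events, units_in_final_fragment). A final fragment
    of 2 is itself cellobiose; a final fragment of 1 is glucose.
    """
    if chain_length < 0:
        raise ValueError("chain_length must be non-negative")
    remaining = chain_length
    cleavages = 0
    # A bond between residues 2 and 3 exists only if at least 3 residues remain.
    while remaining >= 3:
        remaining -= 2  # cellobiose leaves through the open cleft
        cleavages += 1  # the chain then slides forward by two units
    return cleavages, remaining


for n in (10, 7):
    events, left = processive_hydrolysis(n)
    print(f"chain of {n} glucose units: {events} cleavage events, "
          f"final fragment of {left} unit(s)")
```

In this simplified picture an even-length chain is converted entirely to cellobiose, while an odd-length chain leaves a single glucose, which is consistent with cellobiose being the predominant product of a cellobiohydrolase.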
Thus, an association is formed by a synergistic cassette of catalytic proteins, which is optimal for hydrolysis of insoluble polymers to the level of soluble oligosaccharides. Synergism between two cloned C. thermocellum endoglucanases and one cloned exoglucanase has been observed in vitro (333). The proximity of these synergistic enzymes to their cellulosic substrate as mediated by the scaffolding protein CipA may provide the structural basis for the high specific activity of the cellulosome.
Researchers at the University of Georgia (81, 131) found even larger aggregates, of ca. 108 x 10^6 Da (polycellulosomes). Such protuberances covering the surface of the cell are packed with (poly)cellulosomes; each protuberance seems to contain several hundred cellulosomes (176). A mutant which did not bind cellulose was found to lack cellulosomes and protuberances (21). When cells are grown on cellobiose, cellulosome complexes are packed into discrete exocellular structures. When grown on cellulose, these polycellulosome-containing organelles (protuberances with diameters of 60 to 200 nm) undergo extensive structural modification (17). After attachment to the insoluble substrate, the protuberances rapidly aggregate into "contact corridors" that physically mediate between the cellulosome, which is attached to the cellulose, and the bacterial cell surface. Protuberances are not produced when the organism is grown under cellulase-repressing conditions (178). The proteins of the cellulosome are arranged in a highly ordered chain-like array (215). The cellulose-bound cellulosome clusters appear to be the sites of active cellulolysis, and the products may be channeled down the fibrous structures to the cell. Cellulosomes also contain lipids with a high concentration of unsaturated fatty acids (44). The lipid material is thought to be localized mainly at the contact point between cellulosomes and crystalline cellulose. Both the cellulosomes described by the Israeli group (177) and the polycellulosomes described by the Georgia group display the same requirement for reducing agents and Ca2+ that we had found for true cellulase activity of the C. thermocellum culture supernatant (139).
The most important component of the cellulosome is the nonenzymatic scaffoldin (96, 233). It is a unique scaffolding protein subunit, which assembles cellulases and related enzyme subunits (Fig. 2). The catalytic subunits, on the other hand, contain different modules (dockerins) which are responsible for their attachment to the scaffold (96, 327). Important in this relationship are (i) cohesin domains on scaffoldin, (ii) dockerin domains on the enzymes, and (iii) a CBD on the scaffoldin, binding the complex to cellulose. Cellulosomes and scaffoldin have been found in many bacteria, such as Clostridium cellulovorans (306), Clostridium cellulolyticum (258), Clostridium josui (149), Clostridium acetobutylicum (254, 279), Acetivibrio cellulolyticus (360), Bacteroides cellulosolvens, Ruminococcus albus, Ruminococcus flavefaciens (273), Vibrio sp., and the anaerobic fungal genera Neocallimastix, Piromyces, and Orpinomyces (305). At least eight scaffoldin genes from cellulolytic bacteria have been sequenced (74). These include cipA from C. thermocellum, cbpA from C. cellulovorans (306), cipC from C. cellulolyticum (258), cipA from C. josui (149), cipA from C. acetobutylicum (254, 280), scaB from R. flavefaciens (71), cipBc from B. cellulosolvens (70), and cipV from A. cellulolyticus (69). The last two named organisms are closely related to the clostridia (191). Quite similar to the C. thermocellum cellulosomal complex are those of the mesophilic anaerobes C. cellulovorans (72) and C. cellulolyticum (90). In C. cellulovorans, the cellulosome contains a large, nonenzymatic scaffoldin, called CbpA, which has a signal peptide, a CBD, a hydrophilic domain (HLD) present four times, and a hydrophobic domain present nine times. The hydrophobic domains are the cohesins of this species. Although C. acetobutylicum is not known to degrade cellulose, the genome sequence reveals the presence of a large cellulosome gene cluster (254).
This cluster contains the genes encoding the scaffolding protein CipA, the processive endocellulase Cel48A, several endoglucanases of families 5 and 9, the mannanase Man5G, and a hydrophobic protein, OrfXp. The genetic organization of this large cluster is very similar to that of C. cellulolyticum. An inactive cellulosome with an apparent molecular mass of 665 kDa has been subsequently reported (279). Recently, the entire scaffoldin (CipA) of this bacterium has been cloned and successfully expressed in E. coli (280). Chimeric miniscaffoldins consisting of domains derived from C. cellulolyticum and C. thermocellum have been expressed in C. acetobutylicum (266). In C. papyrosolvens, a mesophilic anaerobe, the cellulase system (53) can be fractionated by ion-exchange chromatography into seven high-molecular-weight multiprotein complexes, with the molecular weights ranging from 500,000 to 600,000. Each complex has a different ultrastructure (269) and a unique profile of enzymatic activities (270). The common protein appearing in each fraction is a glycoprotein (Mr = 125,000) that lacks any enzymatic activity. Whether this protein serves as the scaffoldin of the complexes and how these complexes are assembled remain to be investigated.
A number of studies on the dissociation of the cellulosome have been done. We used a mild SDS-EDTA-DTT treatment (352, 354) to separate the components of the cellulosome. Morag and coworkers (235) found that cellulosomes were dissociated under nondenaturing conditions by incubation at 60°C in the presence of EDTA and crystalline cellulose. During this dissociation, scaffoldin remained tightly bound to the cellulose but the enzyme subunits were released. A mixture of the dissociated free subunits minus scaffoldin had activity equal to that of undissociated cellulosomes on soluble or acid-swollen cellulose but had only 25 to 30% of the activity on crystalline cellulose. Bhat and Bhat (39) reported that cellulosomes can be dissociated without much loss of the ability to degrade crystalline cellulose by use of 50 mM sodium acetate buffer (pH 5.0) containing 10 mM DTT, 10 mM EDTA, and 0.2% SDS at 30°C for 25 min.
Cloning and DNA sequencing showed that genes encoding at least nine cellulosomal endoglucanases, one exoglucanase (169), one xylanase, and one lichenase (367) from C. thermocellum contain a highly conserved, noncatalytic region of 50 to 60 residues which is usually found at the carboxy terminus. These duplicated sequences, now called docking sequences or dockerins (18), are not essential for catalytic activity (113) but are responsible for the binding of the respective cellulosomal enzymes, e.g., endoglucanase D (CelD) and xylanase Z (XynZ), to one or more of the nine cohesins of CipA (186). Dockerin domains consist of two very similar segments of 22 to 24 residues connected by a peptide containing 8 to 17 amino acid residues; they are over 65% identical among the different subunits of the cellulosome (27, 28, 292, 327). Two types of dockerin exist: type I (186), which anchors catalytic subunits to scaffoldin, and type II, which anchors scaffoldin and free enzymes to the cell surface. Both depend on Ca2+. All cellulosomal enzymes contain dockerin modules. Based on current understanding, if an enzyme does not contain a dockerin, it is not part of a cellulosome. For example, endoglucanase CelC does not contain dockerins, does not bind to scaffoldin, and is not cellulosomal. When the dockerin of CelD was grafted onto CelC, it gained the ability to bind to CipA (325).
The anchor-enzyme model that we proposed was verified by data obtained from recombinant proteins (168), showing that (i) recombinant CelS, via its dockerin, forms a stable complex with cohesin 3 (also known as R3) of CipA but not with the CBD of CipA and (ii) the attachment of recombinant CelS to cellulose is dependent on the presence of a protein sequence containing both cohesin 3 and CBD but not on either alone. In both of these cases, the binding of CelS was dependent on its dockerin, since removal of the dockerin eliminated binding. The binding of endoglucanase CelD to a recombinant "mini-CipA" containing a cohesin and a CBD enhanced catalytic activity, as did the binding of CelS to CipA (89). CelD is the most active endoglucanase of the C. thermocellum cellulosome (92). Scaffoldin enhanced the activity of CelD by at least 10-fold on Avicel but had no effect on the activity of a truncated CelD lacking an intact dockerin. Similarly, the activity enhancement of an endoglucanase from C. thermocellum against Avicel by the presence of scaffoldin was found to be the result of the attachment of the enzyme to a structure bearing a CBD (151). The anchorage function of the scaffoldin has also been demonstrated by using the truncated forms of CipC of C. cellulolyticum (260). In C. cellulovorans, mini-CbpA could help cellulase components degrade insoluble cellulose but not soluble cellulose (243), as would be expected from the anchor-enzyme model. In this work, it was also demonstrated that an endoglucanase initiates the attack on cellulose, followed by ExgS, which is homologous to CelS.
The CelS dockerin can bind any of the cohesins of CipA (207). In C. thermocellum, the draft genome sequence indicates that there are at least 72 dockerin-containing proteins (see "Dockerin-Containing Proteins" below) but only nine cohesins per scaffoldin molecule. These cohesins are highly homologous, and five of them have more than 90% identity. The cohesins of a strain recognize nearly all of the dockerins of the same strain. These observations indicate that, at least in individual species, binding between catalytic dockerins and scaffoldin cohesins is relatively nonspecific (32, 168, 207, 362; P. Beguin, Abstr. Pasteur Symp., p. 37-40, 1995), with the exception of one report on binding efficiency or specificity for the C. cellulovorans CbpA (261). In C. josui, the affinity of a dockerin for various cohesins can vary, and up to a 34-fold difference has been observed (136). Although cellulosomes in a particular species appear to be heterogeneous and their assembly does not appear to follow a single pattern, there is specificity between species. For example, there is no interaction between the cohesin domain of C. thermocellum and dockerins of C. cellulolyticum and vice versa (257) (see details in "Assembly of the Cellulosome" below).
Calcium is the main metal of the cellulosome, and the reaction between dockerins and cohesins requires calcium in C. thermocellum (59, 362) and C. cellulolyticum (259); dockerins bind calcium (see "Assembly of the Cellulosome" below). This is the reason that the cellulosome can be dissociated by mild conditions if EDTA is present (25, 37, 352-354). EDTA inhibits cellulosomal activity due to its ability to chelate Ca2+ (131, 137, 139). Ca2+ is tightly bound in the cellulosome once it is taken up (58). If incubated in 50 mM Tris buffer (pH 7.5), 0.1 M NaCl, and 5 mM EDTA at 37°C, the cellulosome breaks up into polypeptides and the Ca2+ is released. Bands with masses of 160, 98, 76, and 54 kDa are lost, and new bands of 150, 132, 91, 71, 57, and 46 kDa appear. The 91- and 71-kDa polypeptides represent CelS and truncated CelS, respectively. Cleavage occurs at asparagine residue 681, eliminating 60 residues from the C terminus. All catalytic subunits examined contain a similar asparagine residue which is part of their dockerin regions. Thus, the dissociated catalytic subunits may be susceptible to proteolytic degradation at this position.
In view of the calcium requirement for optimal true cellulase activity, it is of interest that endoglucanase D has three binding sites for calcium and that calcium lowers the dissociation constant (KD) for CMC and increases thermal stability (81, 147). This endoglucanase also has one possible binding site for zinc (147). Calcium also increases the thermostability of exoglucanase CelS, the most abundant enzyme of the cellulosome (170). Finally, protein folding of the dockerin is dependent on Ca2+, explaining why Ca2+ is essential for the cohesin-dockerin interaction and hence the structural stability of the cellulosome (210, 211).
Individual domains of CipA, obtained by protease or spontaneous degradation, bind to cellulose, to cellulosomal enzymes, or both (288). Since certain fragments as large as 200 kDa which failed to bind cellulose were obtained, it was concluded that the CBD was at one of the termini of CipA and distinct from the catalytic domain (301). The experimental observation is therefore consistent with the CipA domain structure deduced from its DNA sequence. Although CBDs are also found in some but not all catalytic subunits, the CBD on CipA binds cellulose much more tightly than CBDs on individual cellulosomal enzymes that have been studied. The C. thermocellum CipA CBD belongs to family IIIa (19, 329) as do the CBDs on all other known scaffoldins except ScaB of R. flavefaciens, which does not have a CBD. Purified C. thermocellum CipA CBD binds crystalline cellulose with a KD of 0.4 µM and a maximum binding capacity of 10 mg of CBD per g of microcrystalline cellulose (0.54 µmol/g) (230). The capacity for amorphous cellulose is 20-fold higher. The KDs of other clostridial scaffoldin family IIIa CBDs are as follows: 0.6 µM for CbpA of C. cellulovorans (102), 0.038 µM for CipA of C. acetobutylicum (280), and 0.14 µM for C. cellulolyticum (91).
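The binding parameters quoted above can be related by a short calculation. The Python sketch below assumes a single-site Langmuir isotherm, which the text does not state explicitly, and the function names are illustrative only. It also checks that the two quoted capacity figures (10 mg/g and 0.54 µmol/g) together imply a CBD mass of roughly 18.5 kDa.

```python
def langmuir_bound(free_um, kd_um=0.4, bmax_umol_per_g=0.54):
    """Bound CBD (umol per g cellulose) at a free CBD concentration (uM).

    Assumes a single-site Langmuir isotherm (an assumption, not a claim
    from the text). Defaults are the values quoted for the
    C. thermocellum CipA CBD on microcrystalline cellulose.
    """
    if free_um < 0:
        raise ValueError("free concentration must be non-negative")
    return bmax_umol_per_g * free_um / (kd_um + free_um)


def implied_mass_kda(capacity_mg_per_g=10.0, capacity_umol_per_g=0.54):
    """Molecular mass implied by the two capacity figures (mg/umol = kDa)."""
    return capacity_mg_per_g / capacity_umol_per_g


if __name__ == "__main__":
    # At a free concentration equal to KD, half of the capacity is occupied.
    print(langmuir_bound(0.4))
    print(round(implied_mass_kda(), 1))
```

Under this model, a lower KD (for example, the 0.038 µM value quoted for C. acetobutylicum) means that half-saturation is reached at a correspondingly lower free CBD concentration.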
The C. thermocellum CipA CBD has been cloned and overexpressed in E. coli, and its crystal structure has been determined (330). Like the cohesin, the CBD assumes a nine-stranded β sandwich with a jelly-roll topology. Cellulose binding likely involves interactions between a planar strip of aromatic amino acid residues on a surface of the CBD and glucose moieties on a cellulose chain and between polar amino acids and two adjacent glucose chains of crystalline cellulose. The CBD binds a calcium ion whose function is unknown. The crystal structure of the C. cellulolyticum CipC CBD has also been determined (302). It is very similar to that of the CBD from CipA, with minor differences. It includes a well-conserved calcium-binding site, a putative cellulose-binding surface, and a conserved shallow groove of unknown function. It is clear that the function of the scaffoldin CBD is to anchor the catalytic subunits to the substrate surface. An additional function of the CBD in modifying the cellulose surface to facilitate enzymatic hydrolysis has also been suggested (68, 260).
Each of the known scaffoldins except the R. flavefaciens ScaB has one internal or N-terminal CBD. The catalytic subunit may also have its own CBD. The CBD in a nonscaffoldin subunit may further enhance binding of the cellulosome to cellulose. However, in some family 9 cellulases, their respective catalytic domains are immediately adjacent to a family IIIc CBD, of which many aromatic amino acid residues on the planar strip thought to be crucial for binding to cellulose are not conserved (19). Examples include CelI (101, 123, 373), CelN (373), CelQ (9), CelF of C. thermocellum (19), CelG of C. cellulolyticum (90), EngH of C. cellulovorans (194, 321), CelZ of C. stercorarium (134), and cellulase E4 of Thermomonospora fusca (132). Family IIIc CBDs are considered to play a very different role from that of the family IIIa CBDs. The crystal structure of cellulase E4 revealed the novel feature of the catalytic domain and adjacent family IIIc CBD interacting with each other (285). Sakon et al. (285) first proposed that the CBD may act by binding to a single cellulose chain and feeding it into the active-site cleft of the catalytic domain and thus that it participates directly in the catalytic function of the enzyme (285, 348). Thus, family IIIc CBDs are better considered as a "cellulose-binding subsite" of the catalytic domain (19, 132). Indeed, deletion of the family IIIc CBD from these enzymes rendered the enzyme almost completely inactive (9, 90, 101, 132, 285). It is interesting that each cellulosomal catalytic subunit (CelF, CelN, CelG, and EngH) has a family IIIc CBD only but each noncellulosomal enzyme (CelI, CelZ, and E4) has an additional family IIIa CBD. These cellulosomal proteins may thus depend on the scaffoldin's family IIIa CBD for binding to cellulose. 
It is also interesting that all except 2 of 13 identified or putative cellulosomal cellulases containing a family 9 glycosyl hydrolase domain have a family IIIc CBD (7 enzymes) or an immunoglobulin (Ig)-like domain (4 enzymes) immediately adjacent to the catalytic domain (370). The Ig-like domain likely participates in the catalytic function, as does the family IIIc CBD. Bayer et al. (23) have classified family 9 glycosyl hydrolases into four themes of molecular architecture: (i) the theme A enzymes lack any accessory module, (ii) the theme B enzymes possess a family IIIc CBD fused to the C-terminal end of the catalytic domain, (iii) the theme C enzymes possess an N-terminal Ig-like domain, and (iv) the theme D enzymes contain both an Ig-like domain and a family 4 CBD. Recently, a new type of CBD (i.e., the family 30 CBD) was found to be N terminal to the family 9 catalytic domain of C. thermocellum CelJ, both being linked by an Ig-like domain (8). The family 30 CBD binds to cellulose and is crucial for catalytic activity. Although the involvement of the CBD in the catalytic function remains to be characterized, it is clear that CelJ belongs to a new theme of family 9 enzymes (8). The family 4 CBD is generally not essential for catalytic function, except for C. cellulolyticum CelE (94). The family 4 CBD of the C. thermocellum CelK has a binding capacity of 4 mmol/g of cellulose (154), similar to those of the family 3 CBDs which have been characterized. It is interesting that LicA, a noncellulosomal C. thermocellum β-1,3-glucanase, has four family 4 CBDs at its C-terminal end (86).
An interesting application of the CBD is its use in cloning cellulosomal genes in E. coli (242). The recombinant products of such genes from C. cellulovorans unfortunately are expressed as insoluble inclusion bodies. However, when the catalytic domain contained the CBD of the noncellulosomal EngD enzyme, the recombinant protein was produced in E. coli in soluble form.
CipA itself contains, at its COOH terminus, a dockerin, which suggested that it may self-associate or play a role in anchoring the cellulosome to the surface of the cell (28, 88, 96). Gel scanning densitometry indicated the cellulosome to contain at least two CipA components per 2.1-MDa complex (179, 180). Evidence against CipA self-association (287) was obtained by finding that the seventh cohesin of CipA did not bind to the dockerin of CipA. Indeed, the dockerin of CipA does not bind to any of the CipA cohesins (207).
Immediately downstream from cipA in the C. thermocellum genome are three open reading frames (ORFs) encoding cell surface proteins, forming a three-gene cluster on DNA. These four genes form two operons (87). All three ORFs encode C-terminal domains (three repeats of about 65 amino acid residues) responsible for the cell surface layer (S-layer) location of the encoded proteins. These are called S-layer homology (SLH) repeats, i.e., modules which are present in polypeptides associated with bacterial cell surfaces and promote a noncovalent attachment of the protein to peptidoglycan of the cell wall (214). The three genes are called olpA (olp for outer layer protein; previously known as ORF3p), olpB (previously ORF1p), and orf2p. A fourth gene, sdbA, encoding another cell surface protein, is in another part of the genome (186). The proteins encoded by these genes anchor cellulosomes or free enzymes to the cell.
The gene furthest from cipA, i.e., olpA, encodes a protein containing a cohesin sequence in its NH2-terminal region which can bind the dockerins of the catalytic subunits (88, 286). Protein OlpA has been localized to the cell surface of C. thermocellum (286). It was hypothesized that this region may anchor the cellulosome to the cell surface (87). However, there was no binding found between the dockerin of CipA and the receptor domain of OlpA (287). Thus, CipA does not appear to anchor itself to cell surface protein OlpA. OlpA has a single type I cohesin domain, which binds cellulosomal enzymes via their type I dockerins. Since the OlpA receptor binds to the dockerin of CelD, it has been suggested that the catalytic proteins bind to OlpA before they bind to CipA of the cellulosome. Gene olpB encodes OlpB, which is also located at the cell surface (189).
Protein SdbA of C. thermocellum has a type II cohesin domain in its NH2-terminal domain which specifically binds the type II CipA dockerin domain (187); its carboxy terminus contains SLH repeats. A model of the attachment of CipA to the cell involving use of its type II dockerin binding to a cohesin domain in the SdbA cell surface protein was proposed by Beguin and Alzari (27). SdbA is anchored via its SLH domain to the S-layer of the cell, which is external to the peptidoglycan layer of the cell wall. Actually, the binding between CipA and the cell surface probably occurs between the carboxy-terminal dockerin domain of CipA and the type II cohesin domains at the amino-terminal ends of OlpB, ORF2p, and SdbA (32). SdbA has one cohesin domain, Orf2p has two, and OlpB has four cohesins, presumably binding one, two, and four molecules of scaffoldin, respectively. The binding is a type II interaction (186), which involves calcium, as does the type I interaction. Binding of CipA dockerin to SdbA as a function of Ca2+ concentration is sigmoid, corresponding to a Hill coefficient of 2, suggesting the presence of two cooperatively bound Ca2+ ions in the cohesin-dockerin complex. Western blotting of C. thermocellum subcellular fractions and electron microscopy of immunocytochemically labeled cells indicated that SdbA is not only a cell surface protein but also a cellulosome component (187). It has been determined that OlpB binds to the C. thermocellum cell wall with a dissociation constant on the order of 10⁻⁷ M (56).
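The sigmoid Ca2+ dependence described above can be illustrated with the Hill equation. The Python sketch below is only an illustration: the half-saturation concentration `k_half` is a hypothetical placeholder, since the text reports the Hill coefficient (2) but no half-saturation value, and the function name is likewise an assumption.

```python
def hill_fraction(ca_conc, k_half, n=2.0):
    """Fractional binding predicted by the Hill equation.

    ca_conc and k_half must share the same (arbitrary) concentration unit.
    n = 2 is the Hill coefficient reported for CipA dockerin binding to
    SdbA; k_half is a hypothetical value chosen for illustration.
    """
    if ca_conc < 0 or k_half <= 0:
        raise ValueError("concentrations must be non-negative, k_half positive")
    return ca_conc ** n / (k_half ** n + ca_conc ** n)


if __name__ == "__main__":
    # Compare cooperative (n = 2) with noncooperative (n = 1) binding
    # at concentrations expressed relative to k_half = 1.
    for c in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(c, round(hill_fraction(c, 1.0, 2.0), 3),
              round(hill_fraction(c, 1.0, 1.0), 3))
```

With n = 2, binding rises more steeply around the half-saturation point than with n = 1, which is the behavior expected if two Ca2+ ions bind cooperatively.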
Figure 3 summarizes our current understanding of how the cellulosome is attached to the cell surface in C. thermocellum. Some noncellulosomal enzymes may bind directly to the cell surface, e.g., xylanase XynX. It contains no dockerin domain but does have SLH segments (32). Similarly, LicA contains three SLH domains at its N-terminal end that have been shown to mediate its association with cells (86). Many scaffoldins, including that of C. cellulovorans (CbpA), do not have a dockerin. It has been postulated that the four HLDs of CbpA, each having partial homology with the SLH domain, play a role in binding the cellulosome to the cell surface (320). Furthermore, EngE, one of the three major subunits of the C. cellulovorans cellulosome, has three SLH domains at the N-terminal half, a family 5 glycosyl hydrolase domain, and a dockerin at the C terminus (320). It has been shown that EngE bridges the cellulosome and the cell by binding to the cellulosome via its dockerin and to the cell surface via its SLH domains (166). Thus, a cellulosomal enzyme may serve as an anchor for the cellulosome on the cell surface. Most scaffoldins carry HLDs with copy numbers ranging from one to six (74). The HLDs are sometimes called the X domains (19). The solution structure of one such domain, the X2-1 domain of C. cellulolyticum, has been determined (238). It has an immunoglobulin-like fold with two β-sheets packed against each other.
The type I cohesin-dockerin interaction is extremely strong, with dissociation constants on the order of 10⁻¹⁰ M in the presence of calcium (82, 136, 219, 290). This high affinity explains the stability of the quaternary structure of the cellulosome in the extracellular environment. Our study (208) and that of Fierobe and coworkers (82) showed that both duplicated segments of the dockerin domain are required for binding to cohesin. As mentioned, it has also been demonstrated that within a given species, the interaction between the various dockerins and cohesins is nonselective (207, 362). However, the interaction displays species specificity between dockerins and cohesins from C. thermocellum and the mesophile C. cellulolyticum (257). Based on sequence comparisons of dockerins from these two species, two residues in each duplicated segment, i.e., positions 10 and 11 of each calcium-binding loop, were pinpointed as being likely recognition determinants in the binding of dockerin to cohesin (257). This prediction has been corroborated by experiments in which the species specificity of the interaction was altered through site-directed mutagenesis of these residues (219, 220). However, the results also indicated that additional residues are involved in the interaction, since binding between the mutated dockerins and the complementary cohesins from the same species was not disrupted.
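To put the strength of this interaction in thermodynamic terms, a dissociation constant can be converted to a standard binding free energy with ΔG° = RT ln(KD), taking a 1 M standard state. The Python sketch below is a rough illustration only: the temperature of 25°C is an assumption (the cited measurements may have been made under other conditions), the KD is read as 10⁻¹⁰ M, and the function name is illustrative.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol*K)


def binding_free_energy_kj(kd_molar, temp_k=298.15):
    """Standard binding free energy (kJ/mol) from a dissociation constant.

    Uses dG = R*T*ln(KD) with a 1 M standard state. The default
    temperature (25 C) is an assumption, not a value from the text.
    """
    if kd_molar <= 0:
        raise ValueError("KD must be positive")
    return R_GAS * temp_k * math.log(kd_molar) / 1000.0


if __name__ == "__main__":
    # Type I cohesin-dockerin pair, KD on the order of 1e-10 M.
    print(round(binding_free_energy_kj(1e-10), 1))
```

Each 10-fold decrease in KD contributes a further RT ln(10), about 5.7 kJ/mol at 25°C, so an interaction with a KD of 10⁻¹⁰ M is roughly 17 kJ/mol more favorable than one with a KD of 10⁻⁷ M.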
As a further step toward understanding the cohesin-dockerin interaction, we undertook a solution nuclear magnetic resonance (NMR) study to determine the structure of the dockerin domain from the C. thermocellum CelS. To date, CelS is the only cellulosomal protein with three-dimensional structures of both the catalytic and dockerin domains determined. Interestingly, two-dimensional 15N-1H heteronuclear single quantum correlation NMR spectroscopy revealed that Ca2+ induces folding of the dockerin into its tertiary structure (210). The calcium-dependent folding may be a mechanism that evolved to prevent folding into an active form in the cytoplasm, where the free Ca2+ concentration is only on the order of 1 mM. The Ca2+ requirement could potentially safeguard against binding of the catalytic subunits to scaffoldin prior to secretion (210). It is remarkable that the unfolded dockerin remains soluble at the concentration typically used for NMR analysis (1 mM). The solution structure of the folded CelS dockerin in the presence of Ca2+ has been determined by protein NMR spectroscopy (209, 337). Although the highly conserved dockerin domain is characterized by two Ca2+-binding sites with sequence similarity to the EF-hand motif, the dockerin domain assumes a completely different structure. The structure consists of two Ca2+-binding loop-helix motifs connected by a linker; the E helices entering each loop of the classical EF-hand motif are absent from the dockerin domain. This result agrees with the secondary structure prediction reported by Pages et al. (257). Each dockerin Ca2+-binding subdomain is stabilized by a cluster of buried hydrophobic side chains. Structural comparisons reveal that, in its noncomplexed state, the dockerin fold displays a dramatic departure from that of Ca2+-bound EF-hand domains and represents a novel Ca2+-binding domain.
The dockerin structure is symmetric, as expected from its duplicated primary sequence. How the symmetric dockerin docks to the nonsymmetric cohesin had been a puzzle. While experimental data indicate that the type I cohesin and dockerin form a 1:1 complex, results of site-directed mutagenesis on positions 10 and 11 of the dockerin on either segment suggest that either segment of the dockerin is capable of binding to the cohesin (290). The notion that only one of the two halves of the symmetric dockerin is used to bind to cohesin was confirmed when the cohesin-dockerin complex structure was determined by coexpressing cohesin 2 of CipA and the dockerin domain of a xylanase (Xyn 10B) in E. coli (52). While the cohesin structure remains essentially unchanged in the complex, the dockerin undergoes conformational change and ordering compared with its solution structure to display a near-perfect internal twofold symmetry. On the other hand, the classical 12-residue EF-hand coordination to two calcium ions is maintained. Significantly, the cohesin binds predominantly to the second segment of the dockerin, suggesting that the first segment of the dockerin may provide the second binding site, enabling the formation of a 1:2 complex (one dockerin to two cohesins) and thus a higher order of the cellulosome structure. On the other hand, a few amino acid residues on the first segment also participate in the complex formation, through either hydrophobic interactions or hydrogen bonding. Therefore, both segments of the dockerin participate in binding even in a 1:1 complex. The results agree with a report demonstrating that the two segments (subdomains) of the CelS dockerin are both required for interaction with a cohesin (208). Similar results were obtained for the C. cellulolyticum dockerin, the two segments of which are homologous enough to replace each other (82). 
The structure of the complex further revealed that protein-protein recognition is mediated by hydrophobic interactions between one face of the cohesin and the helices of the dockerin. Many of these hydrophobic amino acids of the dockerin have been predicted to be involved in protein docking, since they are solvent accessible in the dockerin solution structure (209). Isothermal titration calorimetry also showed that the cohesin-dockerin binding is mainly hydrophobic (290). Five hydrogen bonds between the two proteins are found, which are dominated by a Ser-Thr pair (positions 10 and 11 of the Ca2+-binding loop). Notably, the Ser-Thr pairs are strictly conserved in the C. thermocellum dockerins but not in the C. cellulolyticum dockerins. Other amino acid residues forming direct hydrogen bonds with the cohesin, such as Arg (or Lys) and Leu (or Ile), are also well conserved. Conservation of the amino acid residues involved in hydrophobic interactions and hydrogen bonding in both the cohesin and dockerin explains the lack of binding selectivity in cohesin-dockerin binding in C. thermocellum (207, 362). On the other hand, replacements of the hydrogen bonding residues (Ser, Thr, and Arg) with other residues (Ala, Ile, and Lys) in C. cellulolyticum dockerins provide a partial explanation for the cross-species specificity of the dockerin in recognizing the cognate cohesin. Bioinformatic analysis and site-specific mutagenesis have previously indicated the important roles of the dockerin's Ser-Thr pair and Arg (position 22 based on numbering of the Ca2+-binding loop) in complex formation (219, 220, 257). The cohesin amino acids previously identified to be involved in dockerin-cohesin interaction by site-directed mutagenesis (226) were found to be present in the interface of the complex (52).
The structures of the type I dockerin, cohesin, and their complex shed light on how the cellulosome is assembled and provide a blueprint for reengineering the cellulosome for biotechnological explorations. Unfortunately, to date, no three-dimensional structures of type II components have been reported. The dissociation constant for a type II cohesin-dockerin pair (i.e., the CipA dockerin and the SdbA cohesin) is on the order of 10⁻⁹ M (135). Secondary-structure assignments of the SdbA cohesin have been made by 1H, 13C, and 15N protein NMR (309). It was found that the positioning of the secondary structural elements is very similar to that observed in the type I cohesin except that an α-helix was present between β-strands 6 and 7 of SdbA, which may contribute to the type I-type II specificity. Further elucidation of the structures of type II components is essential for understanding the molecular basis of this specificity.
Although species specificity is generally observed, exceptions have been reported. For example, in C. thermocellum, the dockerin of Cel9D-Cel44A (formerly CelJ [1]) does not seem to bind to cohesins 1, 4, and 7 (136) or to cohesins 2 and 3 (362) of CipA. Its binding specificity remains to be determined. Furthermore, in C. josui, the affinities of binding of a dockerin to various cohesins can vary as much as 34-fold (136). Another exception to species specificity was reported in the same study, revealing that the Xyn11A dockerin of C. thermocellum binds to various cohesins of C. josui with high affinities (KD of 10⁻⁸ M) (136). Thus, binding specificity and affinity may be influenced by subtle differences in the primary or tertiary structures of the dockerin and cohesin. Further studies are needed to completely understand these subtleties.
Four cellulosomal cellobiohydrolases have been reported in C. thermocellum. They are CbhA (formerly Cbh3; component S3) (222, 308, 332, 375), CelS (S8) (343), CelK (S5) (152), and CelO (372). Genes cbhA and celK have been cloned and sequenced. Gene celK is upstream of cbhA with an intervening sequence of 524 bp (374). The two genes are highly homologous, both encoding a family 4 CBD, an Ig-like domain, a family 9 glycosyl hydrolase, and a dockerin. The only difference is that CbhA contains two fibronectin-like domains and a family 3b CBD, both N terminal to the dockerin. These two genes are likely the result of gene duplication and recombination. CelO consists of a family 3b CBD, a family 5 glycosyl hydrolase domain, and a dockerin. It produces cellobiose as the only product from cellulose. The crystal structures of CelS in complex with its substrate or inhibitor indicate that CelS hydrolyzes the cellulose chain from the reducing end (111). CelO activities on various substrates indicate that it also attacks the cellulose chain from the same end (372). On the other hand, the same study revealed that CbhA hydrolyzes the cellulose chain from the nonreducing end. The facts that the catalytic domain of CelK is highly homologous to that of CbhA and that CelK is active on p-nitrophenyl-β-D-cellobioside indicate that its mode of action is like that of CbhA (13). Why the bacterium needs two sets of exocellulases remains to be explained. It appears that CbhA and CelO are capable of reducing the viscosity of a CMC solution (332, 372), whereas CelS is not (170, 234), indicating that the former two enzymes have some endoglucanase activity. A recent reexamination of the mode of action of CbhA showed that it had a very high activity on CMC relative to that on ball-milled crystalline cellulose, it rapidly reduced the viscosity of CMC, and it produced 40% insoluble reducing sugars from filter paper. Thus, in these tests CbhA behaved like an endocellulase (C. McGrath and D. B. Wilson, personal communication), and it should be reclassified as such. The results illustrate the importance of using multiple criteria to evaluate an exoglucanase. CelS behaved like an exoglucanase in these same tests (170, 342). The behaviors of CelK and CelO in these tests remain to be studied. CelF, the CelS equivalent in C. cellulolyticum, also displays significant activity in reducing the viscosity of a CMC solution (271).
Some of the cloned endoglucanase genes are celA (26), celB (105), celC (298), celD (143, 326), celE (113), celF (245), celG (188), celH (361), celI (101, 123, 373), celJ (encoding component S2 of the complex) (1, 8), celM (162), celN (373), celQ (9), celT (172), and celX (113).
Six of the xylanases present in the C. thermocellum cellulosome are XynA, XynB, XynC, XynX, XynY, and XynZ (107), the genes of which have all been cloned (121, 146). Despite the large number of xylanases, the organism cannot grow on xylan or xylose.
The cellulosomal pectate lyase gene in C. cellulovorans has an ORF containing 2,742 bp and encoding a protein of 914 amino acid residues with a molecular mass of 94,458 Da. The protein contains a dockerin at its C terminus. It breaks down polygalacturonic acid into di- and trigalacturonic acids. This is the first reported pectate lyase in cellulolytic clostridia (319). The mannanase gene man26B of the C. thermocellum cellulosome has been cloned and sequenced (171).
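The figures quoted for the pectate lyase gene can be checked against each other with simple arithmetic. The Python sketch below is only a consistency check on the numbers given in the text; the function names are illustrative, and whether the reported ORF length counts the stop codon is not stated in the text.

```python
def codons_in_orf(orf_bp):
    """Number of complete codons in an ORF of the given length (bp)."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3


def mean_residue_mass(protein_da, residues):
    """Average mass per amino acid residue (Da)."""
    return protein_da / residues


if __name__ == "__main__":
    # 2,742 bp corresponds to 914 codons, matching the 914 residues
    # reported, i.e., the quoted length leaves no room for a stop codon.
    print(codons_in_orf(2742))
    # Average residue mass implied by 94,458 Da over 914 residues.
    print(round(mean_residue_mass(94458, 914), 1))
```

The implied average of about 103 Da per residue is somewhat below the roughly 110 Da often used as a rule of thumb for proteins, which would be consistent with a composition relatively rich in small residues, although the text gives no composition data.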
Sequencing of celS has been described above (see "Components SL and SS<