Microbiology and Molecular Biology Reviews, June 2006, p. 362-439, Vol. 70, No. 2
1092-2172/06/$08.00+0     doi:10.1128/MMBR.00036-05
Copyright © 2006, American Society for Microbiology. All Rights Reserved.

The Escherichia coli Proteome: Past, Present, and Future Prospects†

Mee-Jung Han1 and Sang Yup Lee1,2*

Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering and BioProcess Engineering Research Center,1 Department of BioSystems and Bioinformatics Research Center, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, Republic of Korea2

SUMMARY
INTRODUCTION
PROGRESS IN E. COLI PROTEOMIC TECHNOLOGY
    Gel-Based Approaches
    Non-Gel-Based Approaches
    Predictive Proteomics
CURRENT STATUS OF THE E. COLI PROTEOME
    Proteomics for Biology
        Stationary-phase response.
        Temperature response.
        pH response.
        Oxidative stress response.
        Starvation response.
        Other environmental responses.
    Proteomics for Biotechnology
CONCLUSIONS AND FUTURE PROSPECTS
ACKNOWLEDGMENTS
REFERENCES

   SUMMARY
Proteomics has emerged as an indispensable methodology for large-scale protein analysis in functional genomics. The Escherichia coli proteome has been extensively studied and is well defined in terms of biochemical, biological, and biotechnological data. Even before the entire E. coli proteome was fully elucidated, the largest available data set had been integrated to decipher regulatory circuits and metabolic pathways, providing valuable insights into global cellular physiology and the development of metabolic and cellular engineering strategies. With the recent advent of advanced proteomic technologies, the E. coli proteome has been used for the validation of new technologies and methodologies such as sample prefractionation, protein enrichment, two-dimensional gel electrophoresis, protein detection, mass spectrometry (MS), combinatorial assays with n-dimensional chromatographies and MS, and image analysis software. These important technologies will not only provide a great amount of additional information on the E. coli proteome but also synergistically contribute to other proteomic studies. Here, we review the past development and current status of E. coli proteome research in terms of its biological, biotechnological, and methodological significance and suggest future prospects.


   INTRODUCTION
Escherichia coli, one of the best-characterized prokaryotes, has served as a model organism for countless biochemical, biological, and biotechnological studies. Since the completion of the E. coli genome-sequencing project (28), this organism has been characterized on the genome-wide scale in terms of its transcriptome, proteome, interactome, metabolome, and physiome by use of DNA microarray, two-dimensional (2-D) gel electrophoresis (2-DE) coupled with mass spectrometry (MS), liquid and gas chromatography coupled with MS, and bioinformatics (34, 176, 217, 226, 325). Recent advances in these functional genomics studies have facilitated understanding of global metabolic and regulatory alterations caused by genotypic and/or environmental changes. DNA microarray has proven to be a successful tool for monitoring whole-genome-wide expression profiles at the mRNA level (176). Similarly, proteomics can be employed to compare changes in the expression levels of many proteins under particular genetic and environmental conditions. Unlike transcriptomics, which focuses on gene expression, proteomics examines the levels of proteins and their changes in response to different genotypes and conditions. The studies on proteomes under well-defined conditions can provide a better understanding of complex biological processes and may allow inference of unknown protein functions. Most of all, proteomic approaches provide information about posttranslational modifications which cannot be obtained from mRNA expression profiles; these approaches have proven critical to our understanding of proper physiological protein function, translocation, and subcellular localization.

The most prominent developments within the field of proteomics to date are shown in Fig. 1. Although the first proteomic analyses were conducted 30 years ago, renewed interest in this field has been fueled by several recent advances, including the availability of public genome and protein databases, the development of database search engines capable of exploiting these databases, and the introduction of high-sensitivity, easy-to-use MS techniques. Other important recent advances include improved 2-DE, computer programs for analysis of the 2-D gel images, protocols for proteolytic digestion of proteins in excised gel pieces, and low-flow chromatography methods. Recently, in order to reduce complexity and detect low-abundance proteins, proteomics researchers have become increasingly aware of non-gel-based technologies combined with subcellular fractionation by n-dimensional chromatographies.


FIG. 1. Major developments in the history of proteomics. Since the beginning of proteome studies in 1975, proteomics and the associated technologies have evolved dramatically, resulting in almost exponential increases in the number of resolved proteins and their identification and greatly enhancing our understanding of complex biological processes in a variety of organisms.

 
These advances in proteomics technologies have led to the generation of unprecedentedly large amounts of proteome data, which are used in fundamental as well as applied research. Here, we review the technological and methodological advances in proteome research in terms of the E. coli proteome. Gel-based and non-gel-based approaches and predictive proteomics, including 2-DE, MS, tandem mass spectrometry (MS/MS), and computational tools, are reviewed. Applications of MS combined with pulldown methods to investigate the E. coli interactome are also reviewed. In addition, physiological responses to growth stage, temperature, pH, oxidative stress, and other environmental conditions revealed by proteome analysis are reviewed. Following the review of the applications of proteome studies in biotechnology, the future direction of proteomic studies is suggested. For topics not covered in this paper, readers are referred to the following excellent review articles on E. coli: for phage or bacterial display, see reference 65; for protein microarrays, see reference 21; and for the two-hybrid system, see reference 119.


   PROGRESS IN E. COLI PROTEOMIC TECHNOLOGY
The exploration of the E. coli proteome can be divided roughly into three phases: (i) the gel-based approaches, (ii) the non-gel-based approaches, and (iii) predictive proteomics (bioinformatics tools). The gel-based and non-gel-based approaches are defined as being based on separation of complex protein mixtures in gel and non-gel matrices, respectively, whereas predictive proteomics covers functional proteomic studies performed by computational tools in silico. These approaches overlap in time, and their evolutions have resulted in an almost exponential increase in the number and quality of resolved protein spots over the past 30 years (287) as increasingly complex separations have been developed to continue forward progress. In recent years, the E. coli proteome has been used as a standard for evaluating and validating new technologies and methodologies such as sample prefractionation, protein enrichment, 2-DE, protein detection, MS, combinatorial assays with n-dimensional chromatography and MS, and image analysis (Table 1). In comparison to the proteomes of other organisms, the E. coli proteome provides an excellent model for various research needs based on the following advantages (161): (i) the availability of public databases such as SWISS-PROT (http://www.expasy.ch/ch2d/) and NCBI (http://www.ncbi.nlm.nih.gov/), which contain rich information on the proteins and corresponding genes; (ii) the existence of the E. coli SWISS-2DPAGE maps, which are based on a great deal of biochemical and biological data; and (iii) the fact that the E. coli proteome is less complex than those of other organisms such as humans and plants, boasting smaller open reading frame (ORF) products and less protein modification. Furthermore, as summarized in Fig. 2, the basic processes and strategies for an E. coli proteomic analysis have been well defined and optimized.


TABLE 1. Summary of proteomic technologies used to study E. coli

 

FIG. 2. General steps for proteomic analysis and tips for success. Once the project objective is set, E. coli cells are cultured and sampled for proteome profiling. During this process, protein samples can be prefractionated or labeled differentially for better comparison of the results. Proteome profiles can be obtained by gel-based and/or non-gel-based approaches. Also, predictive proteomic studies can be performed to analyze a priori the characteristics of proteins in the proteome. Gel-based approaches and non-gel-based approaches are complementary and should be combined if possible to maximize the total number of proteins detected and identified. sHsps IbpA and IbpB were from E. coli and Hsp26 was from Saccharomyces cerevisiae (96). SDS-PAGE, sodium dodecyl sulfate-polyacrylamide gel electrophoresis; AEBSF, aminoethyl benzenesulfonyl fluoride or Pefabloc SC; BCA, bicinchoninic acid; delta Cn, correlation value (difference between the first hit and the second hit); DTE, dithioerythritol; DTT, dithiothreitol; iTRAQ, a multiplexed set of isobaric reagents that yield amine-derivatized peptides (iTRAQ reagents; Applied Biosystems, CA) (253); PMSF, phenylmethylsulfonyl fluoride; RSp, rank preliminary score; SELDI-TOF-MS, surface-enhanced laser desorption ionization-time of flight mass spectrometry; TCA, trichloroacetic acid; Xcorr, cross-correlation (measures how closely the spectrum fits the ideal spectrum).

 
Gel-Based Approaches

2-DE is currently the most widely used proteomic approach for analyzing the protein composition of cells, tissues, or biofluids and might even be called "classic" or "blue-collar" proteomics (316). 2-DE was first independently introduced by O'Farrell and Klose in 1975 (147, 220) and was first used for analyzing basic proteins (222). VanBogelen and colleagues (294) then pioneered the use of 2-DE for determining the protein composition of E. coli, and the technique has been intensively pursued by others since then (25, 83, 287). However, these initial studies of the E. coli proteome were limited by the fact that the complex protein mixtures were displayed only with respect to their positions on the 2-D gels and also by the lack of reproducibility among different laboratories. The later use of an immobilized pH gradient (IPG) gel instead of the carrier ampholyte method allowed researchers to apply 2-DE for easier and more-reproducible proteome analyses (25, 83). The current use of commercially available 18-cm IPG strips (pH 3 to 10) along with high-sensitivity staining is generally able to resolve up to 1,000 to 1,500 protein spots in the case of the E. coli proteome (286). However, a large number of the protein spots found in a 2-D gel of the E. coli proteome cluster at an isoelectric point of 4 to 7 and a molecular weight (MW) of 10 to 100 kDa (294), representing a limitation of 2-D gel separation of unfractionated samples on IPG strips. Furthermore, despite the excellent sensitivity of MS, only the most abundant proteins from 2-D gels can be analyzed, leading to the exclusion of many low-abundance proteins.

One strategy for enhancing the capacities of 2-D gels involves parallel separation of replicate aliquots from unfractionated samples on a series of narrow-pH-range IPG gels (or zoom gels). The E. coli 3.5-10 SWISS-2DPAGE map shows 40% of the E. coli proteome (286), among which 231 proteins have been identified by techniques such as gel comparison, microsequencing, N-terminal sequencing, and amino acid composition analysis (Table 2). In contrast, the use of narrow-range pH gradients (pH 4 to 5, 4.5 to 5.5, 5 to 6, 5.5 to 6.7, 6 to 9, and 6 to 11) was shown to potentially display proteins existing at low levels (up to a few protein molecules per cell), resulting in the discrimination of >70% of the entire E. coli proteome (Table 2; reference 287). The number of displayed proteins was higher than that identified by non-gel-based approaches, but not all of the proteins could be identified. The main benefit of using narrow-pH-range IPG strips is that the total number of protein spots per pH unit that can be separated increases due to higher spatial resolution. However, in practice this approach results in only a moderate increase in the number of proteins detected compared to that detected by use of a single broad-pH-range gel. Narrow-pH-range IPG gels show variable and unreliable separation of proteins, especially when unfractionated complex protein samples are analyzed, because proteins having pIs outside the pH range of the IPG strip usually cause massive precipitation and aggregation on the gel.
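
The spatial-resolution argument can be made concrete with a small calculation. The following Python sketch computes the strip length available per pH unit for the broad-range strip and for the narrow ranges listed above. It assumes, purely for illustration, that every strip is 18 cm long; the text gives this length only for the broad-range strip, and the function name is hypothetical.

```python
# Strip length available per pH unit for broad- and narrow-range IPG strips.
# Assumption: all strips are 18 cm long (the text states this length only
# for the broad-range pH 3 to 10 strip).
STRIP_LENGTH_CM = 18.0


def cm_per_ph_unit(ph_low, ph_high, strip_length_cm=STRIP_LENGTH_CM):
    """Return the separation distance (cm) available per pH unit."""
    if ph_high <= ph_low:
        raise ValueError("ph_high must be greater than ph_low")
    return strip_length_cm / (ph_high - ph_low)


# Broad range first, then the narrow ranges quoted in the text.
ranges = [(3, 10), (4, 5), (4.5, 5.5), (5, 6), (5.5, 6.7), (6, 9), (6, 11)]
for low, high in ranges:
    print(f"pH {low} to {high}: {cm_per_ph_unit(low, high):.1f} cm per pH unit")
```

Under this assumption, a one-unit zoom strip spreads each pH unit over the full strip length, whereas the broad-range strip shares the same length among seven pH units. This is only the geometric part of the argument; it says nothing about the precipitation problems noted above.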


TABLE 2. E. coli proteins identified on 2-D gels

 
As another interesting strategy for enhancing the separation capacity of 2-D gels, researchers have employed sample prefractionation methods, such as sequential extractions with increasingly stronger solubilization solutions, subcellular fractionation, selective removal of the most abundant protein components, preparative isoelectric focusing (IEF) separations, and chromatographic fractionation of sample mixtures. This strategy offers the benefits of high protein-loading capability along with the ability to discriminate two or more proteins migrating together. For example, since membrane proteins have proven difficult to solubilize with common solubilization agents such as urea, thiourea, 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonic acid (CHAPS), and dithiothreitol, Molloy et al. (201) introduced a new isolation method of sequential extractions with increasing concentrations of sodium carbonate in analyzing E. coli outer membrane proteins. This led to the successful identification of 21 out of 26 of the predicted integral outer membrane proteins. Similarly, Lai et al. (153) identified more than 200 E. coli membrane proteins by use of the method described by Molloy et al. (201), after modifying it to minimize nonmembrane protein contamination. The largest database of E. coli membrane proteins constructed to date is that reported by Fountoulakis and Gasser (68), who identified 394 different gene products using a method identical to that described by Molloy et al. (201). Notably, these studies demonstrate that membrane proteins, which are commonly absent from 2-D gel maps, are amenable to 2-DE separation using specific techniques.

As an alternative method, high-resolution preparative IEF separation can be combined with the use of narrow-pH-range IPG strips. Several preparative electrophoresis devices, such as Rotofor (Bio-Rad, Hercules, CA), IsoPrime (Amersham Biosciences, Uppsala, Sweden), and the ZOOM IEF fractionator (Invitrogen, Carlsbad, CA), have been developed for increasing the number of proteins separated and detecting less abundant proteins (334). For example, Herbert and Righetti (108) used a multicompartment electrolyzer (MCE) to prefractionate E. coli prior to 2-DE analysis and observed many more spots than with the standard maps available in databases such as SWISS-2DPAGE. This device appears simple, but it still contains large sample chambers (~100 ml), which are not compatible with samples available in small quantities. Zuo and Speicher (333) prefractionated E. coli using a ZOOM IEF fractionator and found that this initial step greatly enhanced the loading ability, resolution, and detection sensitivity of their 2-D gels. This method greatly conserves proteome samples compared with direct analyses of unfractionated samples on a series of narrow-pH-range 2-D gels. Most interestingly, MicroSol IEF prefractionation is compatible with most downstream proteome-profiling methods, including 1-DE, narrow-pH-range 2-DE, 2-D difference gel electrophoresis (2-D DIGE), and liquid chromatography (LC)-MS/MS methods.

Sample fractionation by chromatography can generate hundreds of fractions for individual 2-DE analysis, allowing enrichment of low-abundance proteins. This results in better qualitative and quantitative analysis of 2-D gels. The combination of LC, 2-DE, and MS/MS has expanded the upper limits of protein visibility typically obtainable by gel-based approaches, but this method has higher costs in terms of price, labor, and time.

Recently, some researchers have focused on subcellular proteomics (or organelle proteomics), which is proteome analysis of the macromolecular architecture of a cell, e.g., subcellular compartments, organelles, macromolecular structures, and multiprotein complexes. This technique has the added benefits of reducing sample complexity, identifying additional unique proteins, localizing newly discovered proteins to specific organelles, and, in some cases, allowing functional validation (121, 281). In terms of the E. coli proteome, subcellular proteomics based on 2-DE can be used to assign various proteins to the cytosol, periplasm, inner membrane, or outer membrane by biochemical fractionation; this method was used to assemble the largest proteome database to date, as shown in Table 2 (179). Analysis of 2,160 spots revealed 575 unique ORF entries, including 151 hypothetical ORF entries, 76 proteins of completely unknown functions, and 222 proteins currently not assigned in the SWISS-PROT database. Of the 575 different entries identified, 241 (42%) were found to exist in more than 1 form, at an average of 7.5 forms per entry. These findings indicate that proteomics involving sample fractionation and 2-DE can be a valuable research technique. However, we have to choose carefully an appropriate fractionation method that prevents substantial and variable protein cross-contamination among the multiple fractions, as this severely complicates the quantitative comparison of protein profiles. A more important factor for quantitative proteome analysis is the need to control separation quality and reproducibility.

The development of improved methodologies for the detection of protein spots has formed the basis for a number of remarkable advances in 2-DE research. A number of general protein detection methods have been developed using organic dyes, silver staining, radiolabeling, reverse staining, fluorescent staining, and chemiluminescent staining. Typically, the majority of researchers have used Coomassie brilliant blue and silver staining for protein detection, but these stains have low sensitivity and narrow linearity, respectively. Radiolabeling is the most sensitive detection method, but the potential hazards of working with radioactive material, the limited shelf life, the costs of disposal, and problems with handling mixed waste have decreased its popularity.

Fluorescent dyes provide great sensitivity and broad, linear, dynamic responses compared to their colorimetric counterparts and are compatible with modern downstream protein identification and characterization procedures, such as MS. In comparison to their colorimetric counterparts, fluorophores are easy to handle, have long shelf lives, and have minimal disposal issues. Thus, fluorescence-based protein detection has become a more common practice in recent years. For example, 2-D DIGE was first introduced by Ünlü et al. (289) in 1997 and has been further developed by GE Healthcare (Chalfont St. Giles, Bucks, United Kingdom; formerly Amersham Biosciences, Uppsala, Sweden). The basis of the technique is the use of two or three mass- and charge-matched N-hydroxy succinimidyl ester derivatives of the fluorescent cyanine dyes Cy2, Cy3, and Cy5, which possess distinct excitation and emission spectra. Each labeled sample is then mixed and run simultaneously on a single 2-D gel. However, it should be noted that the use of amino group labels will favor detection of basic proteins over acidic proteins. This technology allows two or three samples to be coseparated under identical electrophoretic conditions, reducing the number of gels required while allowing more-accurate comparative proteome profiling (100). In a case study on the E. coli proteome after benzoic acid treatment (321), 2-D DIGE was shown to produce quantitative results more accurate than those produced with conventional 2-DE. As shown in Table 2 (DIGE pH range, 4.5 to 6.5), a total of 179 differentially expressed E. coli protein spots could be identified by use of matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) and quadrupole-time of flight MS, indicating that this technique not only avoids the complications of gel-to-gel variation but also enables a more accurate and rapid analysis of differences and reduces the number of gels that need to be run. 
Furthermore, since the gels can be directly scanned and imaged after electrophoresis, this process reduces artifactual features, and the image has a wider dynamic range and more sensitivity than other detection methods.
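
To illustrate how coseparated samples on a single gel can be compared, the following Python sketch computes log2 abundance ratios between two dye channels. The spot volumes are hypothetical, and the total-volume normalization is a simple illustrative choice that is not described in the text; commercial DIGE software may use different normalization and statistics.

```python
import math


def normalized(volumes):
    """Scale spot volumes so that each channel sums to 1.

    Total-volume normalization is an assumption made for this sketch.
    """
    total = sum(volumes.values())
    return {spot: volume / total for spot, volume in volumes.items()}


def log2_ratios(channel_a, channel_b):
    """Return log2(a / b) for every spot detected in both channels of one gel."""
    a, b = normalized(channel_a), normalized(channel_b)
    return {spot: math.log2(a[spot] / b[spot]) for spot in a if spot in b}


# Hypothetical spot volumes (arbitrary units) read from the same gel.
cy3_control = {"spot1": 1000.0, "spot2": 500.0, "spot3": 500.0}
cy5_treated = {"spot1": 1000.0, "spot2": 1000.0, "spot3": 2000.0}

for spot, ratio in sorted(log2_ratios(cy5_treated, cy3_control).items()):
    print(f"{spot}: log2(treated/control) = {ratio:+.2f}")
```

Because both channels come from one gel, the same spot coordinates refer to the same protein in each sample, so no gel-to-gel spot matching is needed before taking the ratio.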

Recently, researchers have sought to develop detection methods suitable for revealing posttranslational protein modifications, such as glycosylation, phosphorylation, proteolytic modification, S nitrosylation, arginine methylation, and ADP ribosylation (229). For example, the multiplexed proteomics platform allows different samples to be run on separate 2-D gels that are individually stained, thus allowing parallel determination of protein expression levels and certain functional attributes, such as levels of glycosylation or drug-binding and -metabolizing capabilities. These multiplexing techniques have facilitated the use of 2-DE to examine fundamental proteome-wide changes in protein expression and posttranslational modifications in the past few years.

Together, the gel-based methods form the core of proteomic technology and the source of most of the published work on the E. coli proteome, despite their technical shortcomings. To date, 715 E. coli proteins (336 proteins available in the current E. coli SWISS-2DPAGE database plus an additional 379 nonredundant proteins reported in the literature) have been identified on 2-D gels (Fig. 3 and Table 2), with the number of identified proteins continuously increasing. However, it is important to note that an organism will not synthesize all the proteins under a given condition; for example, alkaline phosphatase (PhoA) is not synthesized by E. coli grown in normal growth medium but is significantly induced under a phosphate-limited condition (Table 2). While a great deal of progress in elucidating the E. coli proteome has been made, it is still extremely difficult (if not impossible) to examine the whole proteome of an organism under a given condition. More importantly, 2-DE will likely remain a key technology for the detection of protein variants that undergo proteolytic processing and posttranslational modifications such as phosphorylation or glycosylation. More protein spots will be identified as advanced MS technologies such as MALDI-TOF-MS, electrospray ionization (ESI)-MS, and MS/MS are paired with functional genomic studies based on the complete genome sequence. Thus, the gel-based techniques are, and will likely remain, highly useful tools for assessing differential protein expression.


FIG. 3. Distribution of E. coli proteins identified by gel-based and non-gel-based approaches. These figures plot the theoretical pI versus the theoretical MW (Mw) of the open reading frame products in E. coli. Shown are images of E. coli proteins identified by gel-based approaches (a) and non-gel-based approaches (b) and the virtual 2-D image of 4,237 E. coli K-12 ORF entries predicted by a predictive proteomic tool (c). Each crossbar represents a protein spot. The numbers of proteins found by gel-based and/or non-gel-based approaches and by predictive proteomic tools are compared in panel d. The total number of E. coli proteins nonredundantly identified by experiments is 1,627 (~38% of 4,237 ORF entries). For alkaline proteins (pI, >8.0), only 253 proteins (~19%) out of 1,356 ORF entries have been identified so far. For the names and the exact locations of all these protein spots, see Fig. S1 in the supplemental material. The theoretical pI and MW values were calculated using the Compute pI/Mw tool (http://www.kr.expasy.org/tools/pi_tool.html).
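
As a rough illustration of how a theoretical pI can be estimated from sequence alone, the following Python sketch finds the pH at which the net charge computed from the Henderson-Hasselbalch equation is zero. The pKa values are an illustrative set chosen for this sketch and are not necessarily those used by the Compute pI/Mw tool, so the results should not be expected to match that tool exactly. The function names and test sequences are hypothetical.

```python
# Illustrative side-chain and terminal pKa values (an assumption of this
# sketch; different tools use different pKa sets).
PK_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PK_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PK_N_TERM = 8.6
PK_C_TERM = 3.6


def net_charge(sequence, ph):
    """Net charge of an unmodified protein at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PK_N_TERM))    # free N terminus
    charge -= 1.0 / (1.0 + 10 ** (PK_C_TERM - ph))   # free C terminus
    for residue in sequence:
        if residue in PK_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PK_POSITIVE[residue]))
        elif residue in PK_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PK_NEGATIVE[residue] - ph))
    return charge


def theoretical_pi(sequence, tolerance=1e-4):
    """Locate the pH of zero net charge by bisection between pH 0 and 14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        # Net charge decreases monotonically with pH, so bisection applies.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical toy sequences: one acidic, one basic.
for name, seq in [("acidic", "DDDDEE"), ("basic", "KKKKRR")]:
    print(f"{name}: estimated pI = {theoretical_pi(seq):.2f}")
```

The sketch ignores posttranslational modifications and local structural effects on pKa, which is one reason experimentally observed spot positions can differ from theoretical ones.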

 
Non-Gel-Based Approaches

MS has been used for identifying proteins resolved by 2-DE and other methods and also for direct analysis of complex protein mixtures. MS has essentially replaced the classical technique of Edman degradation, even in traditional protein chemistry (1, 111), because it is much more sensitive, can deal with protein mixtures, and offers much higher throughput. The use of MS techniques to identify proteins in complex samples depends on the existence of large protein sequence databases generally derived from DNA-sequencing efforts. There are two main approaches for mass spectrometric protein identification. First, the peptide mass fingerprinting method, initially suggested by Henzel and coworkers (107), involves measurement of the mass spectrum of an eluted peptide mixture, which is then compared with theoretically derived peptide mass databases generated by applying specific enzymatic cleavage rules to predicted and known protein sequences. Typically, protein mixtures are first separated by use of 2-DE, and protein spots are subsequently excised from the gel (251). The proteins contained in the gel pieces are digested using a sequence-specific protease, such as trypsin, and then the resulting peptides are analyzed by MS. When MALDI is used, the samples of interest are solidified within an acidified matrix, which absorbs energy in a specific UV range and dissipates the energy thermally. This rapidly transferred energy generates a vaporized plume of matrix and thereby simultaneously ejects the analytes into the gas phase, where they acquire charge. A strong electrical field between the MALDI plate and the entrance of the MS tube forces the charged analytes to rapidly reach the entrance at different speeds based on their mass-to-charge (m/z) ratios. Because trypsin cleaves the protein backbone at the arginine and lysine residues, the masses of tryptic peptides can be predicted theoretically from protein sequence databases. 
These predicted peptide masses are compared with those obtained experimentally by MALDI analysis. The protein can be identified correctly if there are sufficient peptide matches with a protein in the databases, resulting in a high score. A high degree of mass accuracy is critical for the unambiguous identification and elimination of the false positives. This technique allows rapid identification of proteins when a fully decoded genome is available. A disadvantage of this approach is that it does not directly provide a sequence-based identification, which results in clustering of proteins with similar masses and necessitates additional effort for the identification.
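
The matching step of peptide mass fingerprinting can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scoring scheme of any particular search engine: cleavage is taken to occur after every lysine or arginine (missed cleavages and the commonly applied proline exception are ignored), the score is a simple count of matched masses, the 50-ppm tolerance is arbitrary, and the two-entry database is hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(sequence):
    """Cleave after every K or R (simplified rule; no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(MONO[r] for r in peptide) + WATER + PROTON


def score(observed, sequence, tol_ppm=50.0):
    """Count observed masses that match a theoretical tryptic peptide."""
    theoretical = [mh_plus(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - t) / t * 1e6 <= tol_ppm for t in theoretical)
        for obs in observed
    )


def best_match(observed, database, tol_ppm=50.0):
    """Return the database entry with the most matched peptide masses."""
    return max(database, key=lambda name: score(observed, database[name], tol_ppm))


# Hypothetical two-entry database and a simulated peak list.
database = {"protA": "AGKLMRPW", "protB": "GGKSSRTT"}
observed = [mh_plus("AGK"), mh_plus("LMR")]
print(best_match(observed, database))
```

Tightening `tol_ppm` reduces the chance that an unrelated peptide of similar mass is counted as a match, which is why mass accuracy matters for avoiding false positives.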

To solve this problem, a sequence-based approach has been applied to protein identification. In this method, there are two major mass spectrometric strategies that use ESI. The unique feature of ESI is that at atmospheric pressure it allows the rapid transfer of analytes from the liquid phase to the gas phase. The spray device creates droplets, which once in the MS go through a repetitive process of solvent evaporation until the solvent disappears and charged analytes are left in the gas phase. In one strategy, the unseparated mixture of peptides is applied to a low-flow nanoelectrospray device. The peptide mixture is electrosprayed from a very fine needle into the mass spectrometer. Individual peptides from the mixture are isolated in the first step and fragmented during the second step to sequence the peptides (hence MS/MS). Peptide fragments obtained by this method contain either the N or the C terminus of the peptide and are designated "b" and "y" ions, respectively (322). The other strategy uses liquid chromatography for initial separation of peptides followed by sequencing as they elute into the electrospray ion source. This method can also be used without gel electrophoresis; in this case, a mixture of proteins is digested in solution and the scrambled sets of peptides are sequenced, ideally resulting in the identification of the proteins in the mixture. A great deal of data can be obtained from a single run done in an automated fashion. The fragmentation data can be used to find matches in various protein and nucleotide sequence databases, including the expressed sequence tag and raw genomic sequence databases.
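
The "b" and "y" ion series can be illustrated with a short Python sketch that computes the singly charged fragment masses for a peptide. It assumes unmodified residues and monoisotopic masses; the function name and the example peptide are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b, y) lists of singly charged fragment m/z values.

    b[i-1] is the N-terminal fragment of i residues;
    y[i-1] is the C-terminal fragment of i residues.
    """
    masses = [MONO[r] for r in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


# Hypothetical example peptide.
b_ions, y_ions = fragment_ions("LMR")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

Mass differences between consecutive ions of one series correspond to single residue masses, which is what allows the sequence to be read from the fragment spectrum. Leucine and isoleucine have identical masses and cannot be distinguished this way.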

The most significant breakthrough in non-gel-based approaches was the development of methods involving the combination of n-dimensional prefractionation methods (1-D or 2-D LC) with MS, as shown in Table 1. In these methods, chromatographic separations by affinity, covalent chromatography, strong anion/cation exchange, size exclusion, or the use of packed reactive dye compound or reverse-phase columns are used to reduce the complexity of digested protein mixtures, and this is followed by an MS technique such as MALDI-TOF-MS, ESI-MS, or MS/MS for high-throughput identification of the fractionated peptides. Gevaert et al. (78) identified 800 E. coli proteins from sorted methionine-containing peptides by use of a combination of technologies consisting of combined fractional diagonal chromatography (COFRADIC), LC-MS/MS, and MALDI-TOF-MS. More than 1,100 E. coli proteins (a quarter of those encoded in the E. coli genome) were identified by high-performance liquid chromatography (HPLC)-MS/MS analysis (49). Perhaps the most popular of these techniques to date is multidimensional protein identification technology, often referred to as MudPIT (193). In this method, mixtures of trypsin-digested peptides are loaded onto a biphasic microcapillary column containing a strong-cation-exchange resin upstream of a reverse-phase resin directly coupled to an MS/MS. Peptides are displaced from the strong-cation-exchange resin using a salt step gradient and subsequently bind to the reverse-phase resin. Elution from the reverse-phase resin is accomplished using an acetonitrile gradient, and the peptides are analyzed online by MS/MS. Repeated rounds of step and gradient elutions can result in analysis and identification of a large number of peptides in a single run. Vollmer et al. (301) used this approach for the analysis of E. coli cellular extracts originating from lactose- and glucose-grown cultures, which resulted in the identification of 305 and 450 proteins, respectively, from a single experiment within the 95% confidence level. Results with these approaches can be achieved rapidly with small amounts of cell extract, and the software can quickly and accurately analyze the mass and/or sequence data. However, because of the complexity of any given proteome and the separation limits of 1-D or 2-D LC, it is still necessary to reduce sample complexity prior to protein separation and characterization.

An advanced instrument that combines the benefits of high mass accuracy and highly sensitive detection is the Fourier transform ion cyclotron resonance (FTICR) mass spectrometer. FTICR-MS has recently been applied to identify low-abundance compounds or proteins in complex mixtures and to resolve species of closely related m/z ratios (261). Coupled with HPLC and ESI, FTICR-MS is able to characterize single compounds (up to 500 Da) from large combinatorial chemistry libraries and to accurately detect the masses of peptides in a complex protein sample in a high-throughput mode. Jensen and colleagues identified more than 1,000 E. coli proteins using capillary IEF (CIEF) combined with FTICR-MS (126, 127).

Another strategy for monitoring differential protein expression and identifying low-abundance proteins was introduced by Weinberger et al. (309). In this approach, proteins of E. coli lysates were digested, and the resultant peptides were selectively extracted by covalent attachment of methionine residues with bromoacetyl-reactive groups tethered to the surface of glass beads packed in small reaction vessels. The recovered methionine-containing peptides were profiled using the surface-enhanced laser desorption ionization retentate chromatography-MS method. The parent proteins of the selected peptides were then identified using ProteinChip MS/MS (Ciphergen Biosystems, Inc.). Of 34 proteins identified by this method (309), at least 5 (BglX, ParD, YeaM, YfiO, and YhgF; 12% of the total) were low-abundance proteins, demonstrating that this method is capable of visualizing proteins having low expression levels. However, this method does not seem to be suitable for detecting proteins with posttranslational modifications, such as proteolytic truncation, glycosylation, and phosphorylation.

In non-gel-based approaches, it should be noted that the quantities of extracted peptides described above may not truly represent nascent protein abundance, as it is possible that the peptide extraction and liberation steps could be biased by peptide properties such as hydrophobicity. For quantitative comparison, two samples may be labeled with stable isotopes prior to sample separation, either by metabolic incorporation or through chemical derivatization. In this way, proteins derived from the different samples (e.g., normal versus abnormal or untreated versus treated samples) can be directly separated, identified, and quantified using n-D LC-MS/MS (36, 303, 305). A recently developed, attractive method for quantitative comparison of two proteomes is the isotope-coded affinity tag (ICAT) method (331). The ICAT reagent has a protein-reactive group, a biotin tag, and an ethylene glycol linker connecting the two functional groups, which can be synthesized with hydrogen (light ICAT) or deuterium (heavy ICAT). For comparison, one sample is reacted with the light reagent and the second sample is reacted with the heavy reagent under identical labeling conditions. After trypsin digestion, the extremely complex tryptic peptide mixture is simplified by affinity purification of the cysteine-containing derivatized peptides on an avidin affinity resin. The eluted peptides are then analyzed using LC-MS/MS for simpler samples or LC/LC-MS/MS for more-complex samples. The ratios of MS signals from the light and heavy ICAT-labeled forms of the same peptide are compared to determine the relative abundances of the parent protein in the respective samples, and MS/MS is used to identify the proteins. A typical ICAT-MS experiment was used to measure proteome changes in E. coli cells treated with triclosan, an inhibitor of fatty acid biosynthesis (202). The technique provided good quantitative reproducibility and on average identified more than 450 unique proteins per experiment. 
Furthermore, ICAT-MS identified a number of E. coli proteins that had not previously been identified on 2-DE gels. However, the method was limited in that it was strongly biased to detect acidic proteins (pI, <7), underrepresented small proteins (MW, <10 kDa), and failed to detect hydrophobic proteins. Another weakness of the current ICAT method is that it requires the proteins to contain cysteine residues flanked by appropriately spaced protease cleavage sites (102). This problem was highlighted in the study of a multisubunit membrane protein, E. coli FoF1 ATP synthase (20), in which none of the membrane-embedded proteins in the Fo complex could be visualized by ICAT. In the E. coli genome, about 10 to 15% of the proteins do not contain cysteine residues, precluding the use of a cysteine-specific technology as a total-protein indicator. This cysteine-labeling problem could be overcome by devising ICAT reagents that react with other amino acid residues. Chakraborty and Regnier (36) introduced a new isotope-labeling method as a global internal standard technology for identifying and quantifying protein changes during overexpression of β-galactosidase in E. coli. They used N-acetoxysuccinimide and N-acetoxy-[2H3]succinimide to differentially derivatize primary amino groups in tryptic peptides derived from proteins extracted from cultures treated with 0.5 nM or 2 mM isopropyl-β-D-thiogalactopyranoside. However, these authors tested the efficacy of their strategy only with β-galactosidase; this work has not yet been extended to a large-scale proteomic analysis. In another use of the isotopic labeling method, Veenstra et al. (300) identified intact proteins from genomic databases with a combination of accurate molecular mass measurements and partial amino acid content analysis. Proteins extracted from E. coli cells grown in natural-isotopic-abundance minimal medium or minimal medium containing isotopically labeled leucine (Leu-D10) were mixed and analyzed by CIEF coupled with FTICR. 
The difference in the molecular masses between proteins labeled with the natural isotope or Leu-D10 was used to determine the number of Leu residues present in each protein. Information on the molecular mass and the number of Leu residues present could be used to unambiguously identify intact proteins (e.g., CspE, Mdh, and YggX).
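The leucine-counting step described above can be sketched as simple arithmetic. This is an illustrative sketch only, not the procedure used in the cited study: it assumes complete incorporation of Leu-D10, a mass shift of ten hydrogen-to-deuterium substitutions per leucine, and a hypothetical in-memory database mapping protein names to (mass, sequence) pairs. The function names and the placeholder entries are invented for illustration.

```python
# Mass gained when one hydrogen (1H) is replaced by deuterium (2H), in Da.
H_TO_D_SHIFT = 2.014102 - 1.007825
# Leu-D10 carries ten deuterium atoms, so each leucine adds about 10.0628 Da.
LEU_D10_SHIFT = 10 * H_TO_D_SHIFT


def leucine_count(mass_natural, mass_labeled):
    """Estimate the number of Leu residues from the labeled/unlabeled mass pair.

    Assumption: every leucine in the labeled protein is fully deuterated.
    """
    return round((mass_labeled - mass_natural) / LEU_D10_SHIFT)


def matching_entries(database, mass_natural, n_leu, tolerance_da=1.0):
    """Return names of database entries consistent with both constraints.

    `database` is a hypothetical dict: name -> (intact mass in Da, sequence).
    """
    return [
        name
        for name, (mass, sequence) in database.items()
        if abs(mass - mass_natural) <= tolerance_da
        and sequence.count("L") == n_leu
    ]


if __name__ == "__main__":
    # Hypothetical entries with nearly identical masses but different Leu content.
    toy_database = {
        "protein_A": (7000.2, "MLKALLG"),   # 3 Leu
        "protein_B": (7000.5, "MLKAGGG"),   # 1 Leu
    }
    n = leucine_count(7000.0, 7030.19)
    print(n, matching_entries(toy_database, 7000.0, n))
```

The point of the sketch is that two proteins indistinguishable by mass alone can be separated once the leucine count is added as a second constraint.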

Recently, a multiplexed protein quantitation strategy that provides relative and absolute measurements of proteins in complex mixtures was developed by Ross et al. (253). The multiplex strategy simultaneously determines the relative levels of proteins at multiple states (e.g., several experimental controls or time-course studies) for up to four samples in parallel. A multiplexed set of isobaric reagents that yield amine-derivatized peptides (iTRAQ reagents; Applied Biosystems, CA) was used for labeling at the N termini and lysine side chains of peptides in a digest mixture. The derivatized peptides are indistinguishable in MS but exhibit intense low-mass MS/MS signature ions that support quantitation. Absolute quantitation of targeted proteins can also be achieved using synthetic peptides tagged with one of the members of the multiplex reagent set. Aggarwal et al. (2) used this approach to study rhsA expression in E. coli. They were able to quantify 780 proteins, including several low-abundance proteins, such as transcription factors (DnaB and DnaG).
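Relative quantitation from reporter ions can be sketched as follows. This is a minimal illustration under stated assumptions, not the vendor's algorithm: it assumes the four-plex reporter channels at nominal m/z 114 to 117, uses channel 114 as the reference, and summarizes each protein by the median of its peptide-level ratios. The function names and intensity values are hypothetical, and real workflows also apply isotope-impurity correction and normalization, which are omitted here.

```python
from statistics import median

# Nominal reporter-ion channels of the four-plex reagent set.
REPORTER_CHANNELS = (114, 115, 116, 117)


def peptide_ratios(intensities, reference=114):
    """Ratio of each reporter intensity to the reference channel for one spectrum."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel has no signal")
    return {ch: intensities[ch] / ref for ch in REPORTER_CHANNELS}


def protein_ratios(peptide_spectra, reference=114):
    """Median peptide-level ratio per channel for all spectra of one protein."""
    per_peptide = [peptide_ratios(s, reference) for s in peptide_spectra]
    return {
        ch: median(r[ch] for r in per_peptide)
        for ch in REPORTER_CHANNELS
    }


if __name__ == "__main__":
    # Hypothetical reporter intensities from three peptides of the same protein.
    spectra = [
        {114: 100.0, 115: 200.0, 116: 50.0, 117: 100.0},
        {114: 10.0, 115: 30.0, 116: 5.0, 117: 10.0},
        {114: 50.0, 115: 100.0, 116: 25.0, 117: 50.0},
    ]
    print(protein_ratios(spectra))
```

The median is used here because a single outlying peptide then has little effect on the protein-level estimate; other summaries, such as intensity-weighted means, are equally plausible.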

In addition to identifying proteins, characterizing interactions among proteins is important to understand dynamic biological processes in response to changes in cellular environment, since proteins often function as components of multisubunit complexes. Indeed, protein interactions are observed in nearly all cellular processes, and protein complexes are so ubiquitous that the biological function of an unknown protein can often be predicted from the functions of the proteins with which it is associated. Classically, ligand-binding methods, such as radioreceptor assays, were standard methods of determining protein interactions. Additionally, coimmunoprecipitation studies are commonly used to assess protein-protein interactions. High-throughput analysis of protein-protein interactions is now possible by pulldown assay coupled with MS; this method serves as an important alternative to the yeast two-hybrid system. In pulldown assays, a target is expressed in a cell (in vivo) or added to a cell lysate (in vitro), usually fused with a tag, such as glutathione S-transferase (269), polyhistidine (43, 180), or a tandem affinity purification (TAP) tag (34, 93) or its various relatives, including the sequential peptide affinity tags (34, 327) and split tag (92). The glutathione S-transferase or polyhistidine tag is immunoprecipitated, and associating proteins are then identified by immunological methods, sequencing, or MS. The TAP method was first used to purify complexes containing the acyl carrier protein (ACP) from E. coli. Besides the identification of several known partners of ACP, three additional proteins (SpoT, IscS, and MukB) were found to interact with ACP. This method has recently been used to its full potential to build the interaction network of E. coli (34). The TAP procedure for isolating protein complexes makes use of site-specific recombination to introduce a dual tagging cassette into chromosomal loci. E. coli does not readily recombine exogenous linear DNA fragments into its chromosome, but the expression of the lambda general recombination system (λ-Red) markedly enhances integration. This system was used to integrate a DNA cassette bearing a selectable marker and either the TAP or sequential peptide affinity tag at the C termini of ORFs in E. coli. A total of 857 proteins, including 198 of the most highly conserved, soluble nonribosomal proteins essential in at least one bacterial species, were tagged successfully. Also, 648 proteins could be purified to homogeneity, and their interacting protein partners were identified by using MS and MS/MS. This network includes many new interactions as well as interactions predicted based solely on genomic inference or limited phenotypic data. However, it is important to verify the interactions observed this way, as there may be false positives.

Taken together, many of the proteins in the E. coli proteome have been identified by using more than one method, whereas others have been uniquely identified by one particular method, indicating that these techniques are complementary to each other. More than 1,486 E. coli proteins from the two major databases (49, 78) were identified using non-gel-based approaches (Fig. 3). A total of 1,627 proteins, which correspond to more than one-third of the E. coli proteins (e.g., the ~4,237 proteins of E. coli K-12 from the NCBI database), were identified by gel-based and non-gel-based approaches. Among them, 574 proteins were identified in common by gel-based and non-gel-based approaches. The non-gel-based approaches showed clear superiority over 2-DE methods in monitoring alkaline proteins (pI, >8.0) but still need technical improvement.

Non-gel-based analyses can be done for the samples with or without tags, which cause different problems. The former condition results in poor recovery of peptides and proteins by specific amino acid residue labeling, while the latter causes higher complexity and inaccurate quantitation. These problems can lead to the identification of proteins with low confidence (or false positives). Thus, it would be helpful to develop a multiple labeling system for a given sample, which would allow MS analyses of each tag to eliminate false positives and increase confidence. The development of search algorithms and databases with high accuracy is of continued interest and importance. They have been continuously developed and updated, as shown in Table 3. The proper assignment of the MS/MS data to the sequences in the databases would enhance the quality and quantity of data collected by non-gel-based approaches.


TABLE 3. Useful databases for proteomic and related studies

 
Predictive Proteomics

Although the complete proteome of an organism cannot be obtained by gel-based or non-gel-based approaches, it can be predicted from the complete genome sequence. The predictive proteome of E. coli MG1655 was examined in this manner and was found to consist of 4,288 ORF products (28). The predictive proteome can be displayed like a 2-D gel, as shown in Fig. 3c, represented by the predicted isoelectric point (pI) versus the predicted molecular masses of the putative ORF products by use of the Compute pI/Mw tool in ExPASy (http://www.expasy.org/tools/pi_tool.html) (295) or virtual 2-D databases (Table 3). Predictive data readily can be compared with the experimental data from actual proteome analyses. For example, alkaline proteins (pI, >8.0), which include 253 proteins (~19%) out of 1,356 ORF entries identified, are currently underestimated. Recently, a more realistic virtual 2-D gel was created based on the relationship between expression-level-dependent features in codon usage and protein abundance (194). Compared with results from a real 2-D gel experiment conducted with a protein extract from exponentially growing E. coli cells, many abundant proteins identified in the real gel corresponded to abundant proteins in the virtual 2-D gel. This computational approach can help researchers to determine the appropriate 2-D gel composition for optimal separation of proteins. Thus, predictive proteomics can be used to extract valuable information on the function, topology, localization, and structure of E. coli proteins. In recent years, many bioinformatics researchers have created and developed computer-based tools and databases, as shown in Table 3. For example, protein topology prediction methods allow identification of possible membrane-bound proteins, allowing researchers to predict protein location and sometimes even function and structure. 
Several programs for predicting transmembrane segments exist, with prediction accuracies reportedly as high as 80% (124, 199, 330). Predicting the subcellular localization of proteins by computational method has been attracting much research interest during recent years. Computational methods for predicting protein subcellular localization can generally be divided into the following four categories based on the prediction method (58): (i) by the overall amino acid composition, (ii) by known targeting sequences, (iii) by sequence homology and/or motifs, and (iv) by a hybrid method which combines the above three elements. Several tools listed in Table 3 allow researchers to readily identify protein localizations and functions and to estimate the efficiencies of different methods, such as subcellular fractionation.
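The coordinates of a protein on a virtual 2-D gel (predicted pI versus predicted molecular mass) can be estimated from sequence alone. The sketch below is illustrative and is not the algorithm of the ExPASy Compute pI/Mw tool: the pK values are assumed, rounded textbook-style figures, the average residue masses are approximate, and the function names are hypothetical. It finds the pH at which the Henderson-Hasselbalch net charge is zero by bisection.

```python
# Assumed, illustrative pK values (NOT the set used by ExPASy Compute pI/Mw).
POSITIVE_PK = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PK = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}

# Approximate average residue masses (Da).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVERAGE = 18.01528


def net_charge(sequence, ph):
    """Net charge of an unmodified polypeptide at a given pH."""
    charge = 0.0
    for group in ["Nterm", "Cterm"] + list(sequence):
        if group in POSITIVE_PK:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PK[group]))
        elif group in NEGATIVE_PK:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PK[group] - ph))
    return charge


def isoelectric_point(sequence, tolerance=0.001):
    """Bisection search for the pH where the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive, so the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


def molecular_mass(sequence):
    """Average molecular mass (Da) of the unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_AVERAGE


if __name__ == "__main__":
    for seq in ("KKKK", "DDDD", "MKTAYIAKQR"):
        print(seq, round(isoelectric_point(seq), 2), round(molecular_mass(seq), 1))
```

Applying the two functions to every predicted ORF product and plotting pI against mass would give a virtual gel of the kind described above. Predicted values ignore posttranslational modifications and signal peptide cleavage, so they can differ from the positions observed on real gels.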

Recently, a neural network-based method was used to predict the bonding state of cysteines from the protein sequence (187), allowing researchers to predict the entire content of disulfide-rich proteins in a proteome (the so-called disulfide proteome). The formation of disulfide bonds between the paired cysteine residues is a key step in the folding process of many proteins. This method predicted the percentage of proteins with disulfide bonds (6% of 4,173 proteins) in E. coli K-12 with 86% accuracy. The percentage of proteins with disulfide bonds is higher in the extracytoplasmic compartment (18% of 405 proteins) than in the cytoplasmic space (5% of 2,796 proteins), confirming that the extracytoplasmic proteins are more likely to form disulfide bridges due to a more oxidizing environment.

In addition, predictive proteomics can identify a significant number of previously unknown candidate proteins within an organism or might reveal interesting characteristics of the organism. For example, the histograms of pI values computationally estimated for all predicted ORF products encoded by the fully sequenced genomes revealed bimodality in bacterial and archaeal genomes and trimodality in eukaryotic genomes (265). The nuclear proteins have a broader distribution that accounts for the third mode observed in eukaryotes. This distribution suggests that whole-proteome pI values correlate with subcellular localization of proteins. However, even with all the benefits of computational approaches, the probable functional relations obtained in silico must still be confirmed at least in vitro and ideally in vivo.


   CURRENT STATUS OF THE E. COLI PROTEOME
 
The recent studies on the E. coli proteome can be classified into two main topics: proteomics for biology and proteomics for biotechnology. An enormous number of E. coli proteome studies have focused on improving our biological knowledge regarding proteins and finding members of regulons and/or stimulons under particular conditions (290, 292); these studies are referred to as "proteomics for biology." Other groups have studied the E. coli proteome under various genetic and/or environmental perturbations in an effort to develop strategies for improving cellular properties and enhancing the production of bioproducts based on comparative proteome profiles (95); these studies are referred to as "proteomics for biotechnology."

Proteomics for Biology

Proteomics has changed the way in which cellular physiology is studied. Previously, one or more proteins were chosen as models for understanding local physiological phenomena. These days, proteomic studies allow researchers to identify large numbers of members of stimulons or regulons and to obtain information that indicates which specific proteins should be studied further. When subjected to environmental perturbations, E. coli cells undergo fundamental changes in cellular physiology and/or morphology, as reflected and directed by changes in the global gene and protein expression patterns. Up- and downregulation of specific protein sets is seen in response to a number of chemical and physical stresses, such as heat, oxidative agents, and hyperosmotic shock; these responses are thought to act as protective mechanisms leading to elimination of the stress agent and/or repair of cellular damage. Thus, the cellular responses, as reflected by the proteome, can differ widely according to the stresses imposed. Comparative proteome profiling under various genotypic and environmental conditions can reveal new regulatory circuits and the relative abundances of protein sets at the system-wide level.

In one of the first studies using proteomics, comparison of 2-D gels allowed identification of a large group of E. coli heat shock proteins (166, 208, 210, 296). In the following years, many E. coli proteomic studies revealed changes in proteome profiles in response to various stresses, such as changes in pH (24, 27, 274, 323), cell density (70, 325), and temperature (109, 291); organic solvents (321); nutrient starvation (293, 311); and anaerobic conditions (271) (Table 2). These studies resulted in the identification of various E. coli stress-induced stimulons (Fig. 4). The applied stresses were found to affect the observable proteome size by anywhere from a few proteins to nearly half of the proteins in the cell. Some of the altered proteins appear to be general stress-induced proteins, while others appear specific to particular environmental stimuli. More importantly, these studies also showed that the responses of an organism to an environmental stimulus are not simply the sum of independent responses of individual genes but rather seem to be a coordinated series of linked events leading to cross-adaptation among the stress responses.


FIG. 4. The cascade-like regulation observed with various stimulons and/or regulons in a complex regulatory network. The circles indicate regulons, while the rectangles indicate stimulons. Stimulons in which proteins are induced by stimuli such as stationary phase, temperature shock, pH variation, oxidative stress, and starvation are shown in the respectively labeled panels. Regulons shown in large circles are accompanied by small circles which represent major regulators for the corresponding stimuli. One signal activates or represses many regulators, as shown in small circles, to control the transcription and translation of various genes, leading to complex interactions in the cell. For example, E. coli cells enter the stationary phase in response to complex stresses such as cell growth, increased cell density, the presence of byproducts or toxic substances, and inappropriate conditions (restriction of oxygen, low/high temperature and pH, and limitation of nutrients). This complex response is mediated by a variety of specific regulators in addition to the master regulator, RpoS, which is controlled by itself or by other proteins (see the text for a detailed explanation). Abbreviations: HNS, nucleoid-associated protein; IHF, integration host factor; Lrp, leucine-responsive protein; Fis, factor for inversion stimulation.

 
The complex and physiologically far-reaching responses of E. coli are often under the control of master regulators located at the interface between upstream signal processing and downstream regulatory mechanisms. These master regulators, which serve as the decisive information-processing units, connect complex signaling networks with the downstream regulatory cascades or networks that ultimately control expression of the response-associated genes. A regulon is a set of proteins whose synthesis is regulated by the same regulatory proteins, while a stimulon is a set of proteins whose amount or synthesis rate changes in response to a certain stimulus. At the molecular level, each stimulon may be composed of more than one regulon, each controlled by a different molecular factor. The dissection of stimulons into regulons is based on comparison of the induction patterns in wild-type cells versus those of strains having mutations in known regulatory elements. In this way, regulators such as the RNA polymerase sigma factor RpoS (192), the histone-like protein H-NS (19, 118, 158), and the leucine-responsive regulatory protein Lrp (61) have been studied based on the comparative analysis of wild-type and mutant or double mutant strains.
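The logic of dissecting a stimulon into regulons by comparing wild-type and regulatory-mutant strains can be expressed with set operations. The following is a schematic sketch under simplifying assumptions, not a method from the cited studies: a protein counts as induced when its fold change meets a fixed threshold, and a stimulon member is assigned to a regulator when its induction is lost in the corresponding mutant. The protein labels, regulator labels, fold-change values, and function names are all hypothetical placeholders.

```python
def induced(fold_changes, threshold=2.0):
    """Set of proteins whose fold change upon stimulation meets the threshold."""
    return {protein for protein, fc in fold_changes.items() if fc >= threshold}


def dissect_stimulon(wild_type, mutants, threshold=2.0):
    """Split a stimulon into regulator-dependent subsets.

    wild_type: dict protein -> fold change in the wild-type strain.
    mutants:   dict regulator -> (dict protein -> fold change in that mutant).
    A protein missing from a mutant's data is treated as not induced there.
    """
    stimulon = induced(wild_type, threshold)
    regulons = {
        regulator: stimulon - induced(fold_changes, threshold)
        for regulator, fold_changes in mutants.items()
    }
    # Stimulon members still induced in every mutant tested.
    unassigned = stimulon - set().union(*regulons.values()) if regulons else set(stimulon)
    return stimulon, regulons, unassigned


if __name__ == "__main__":
    # Hypothetical fold changes (stimulated / unstimulated) from 2-D gel spots.
    wt = {"P1": 5.0, "P2": 3.0, "P3": 4.0, "P4": 1.1}
    mut = {
        "regX_mutant": {"P1": 1.0, "P2": 3.2, "P3": 4.1, "P4": 1.0},
        "regY_mutant": {"P1": 4.8, "P2": 1.2, "P3": 3.9, "P4": 0.9},
    }
    print(dissect_stimulon(wt, mut))
```

With these placeholder values, P1 and P2 fall into different regulator-dependent subsets of the same stimulon, while P3 remains induced in both mutants and would point to an additional, untested regulator.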

The highly complex and nonlinear behaviors of these networks have complicated their studies, but proteomic analysis of cells stimulated by one or more signals (stimulons) or cells lacking one or more global regulators (regulons) has provided new insight into the stimulus-response processes in E. coli. Proteomics has allowed researchers to isolate new stress-associated and stress-specific protein markers, identify all proteins controlled by a certain regulatory protein, and understand the integrated cellular metabolic and regulatory networks. Here, several main cellular responses of E. coli under different conditions are described in more detail based on proteome analyses. Moreover, a main regulator and/or its coordinated regulators in response to each stress are discussed.

Stationary-phase response. At the onset of the stationary phase, E. coli cells undergo a global modification of their protein expression pattern, leading to the acquisition of resistance to complex stresses such as increased cell density, the presence of toxic byproducts, and nutrient limitation. Overall, these properties result in better cell survival under adverse conditions. One top-level master regulator of this genetic program is an RNA polymerase sigma factor called RpoS (σS, σ38, or KatF), which is encoded by the rpoS gene (104, 155). This σS has been reported to control an E. coli regulon comprised of 70 or more genes expressed in response to starvation or during the transition to stationary phase (104). These genes were identified by transcriptional analysis of specific genes as well as by proteomic approaches (Fig. 4).

In terms of upstream signal processing, the σS regulon can be divided into subfamilies of genes regulated by specific stresses and/or additional global regulatory proteins. As shown in Fig. 4, many of these subsets of σS-dependent genes or proteins may also be induced by stresses such as anaerobiosis (12), oxidative stress (6), and osmotic stress (105). Additionally, these genes are regulated by transcription factors specific for certain stress responses (e.g., OxyR, which is involved in the oxidative stress response) or more-global regulators, such as H-NS, IHF, cyclic AMP (cAMP)-cAMP receptor protein (CRP), and Lrp (104). These regulators can individually or coordinately affect many σS-dependent stationary-phase-responsive proteins. For instance, comparative proteome analysis of an H-NS deletion mutant and the wild-type strain revealed that some of the σS-dependent proteins or genes, including rpoS itself, were controlled by the H-NS regulon (19).

In terms of downstream regulatory mechanisms, the starvation-induced DNA protection protein Dps, which is one of the σS-dependent stationary-phase-responsive proteins, was also found to affect the expression of other proteins in E. coli (5). Dps was rapidly degraded during exponential growth by the protease ClpXP (which is regulated by σ32 or RpoH) but was stabilized under conditions of carbon starvation or oxidative stress. This, along with increased Dps synthesis, results in the high-level accumulation of Dps during the stationary phase (276), showing that Dps levels are specifically controlled under certain stress conditions. In addition, studies have shown that σS itself is also controlled by the subregulatory protease ClpXP and the recognition factor RssB (332). Collectively, these findings indicate that σS is controlled by a complex signal transduction network with redundancy, additivity, and internal feedback regulatory loops, resulting in its sophisticated regulation.

Temperature response. Protein expression in E. coli can be altered significantly when cells are grown at temperatures outside the normal range. This response plays a critical role in protecting the cells from temperature stress, producing tolerance, or repairing cellular systems. The E. coli cellular response to high temperature includes the synthesis of a set of highly conserved proteins known as the heat shock proteins (249). Similarly, a separate, nonoverlapping group of proteins known as cold shock proteins are produced during the period of growth cessation following a shift from 37°C to 10°C (283).

Many of the heat shock proteins are molecular chaperones that function to bind newly synthesized partially folded or unfolded proteins and promote their folding and refolding by limiting the nonproductive interactions that lead to aggregation and misfolding. Some of the other heat shock proteins are proteases that function to degrade misfolded or abnormal proteins (209). These proteins were first recognized as being highly abundant when examined with 2-D gels in 1978 (166). Later, Neidhardt and colleagues (208, 210, 211, 296) used 2-D gels to monitor the synthesis rates of individual proteins before and after heat shock and identified a number of proteins whose synthesis rates were dramatically increased following the temperature increase. Initially, these proteins were named by their positions on 2-D gels. Later, many of the spots were identified as known gene products, including DnaK (77), GroEL (211), GroES (284), GrpE (335), La (lon) protease (237), and LysU (168). For E. coli, at least 34 heat shock proteins have been identified to date by use of a combination of genomics, transcriptional analysis of specific genes, and proteomics (Fig. 4). The characterized proteins include the main cellular chaperones, DnaK and GroELS; the ATP-dependent proteases ClpP, DegP, FtsH (HflB), HslVU, and La; and other proteins involved in protein folding, refolding, quality control, and degradation (249). Other important heat shock proteins include HTS (homoserine transsuccinylase), which is a key enzyme in methionine biosynthesis (23); protein pairs involved in protein isomerization, such as HtrM (240) and PpiD (55); and the vegetative sigma factor σ70 (33, 91, 282).

The synthesis of the major heat shock protein regulon is controlled by the alternative sigma factor σ32 (encoded by the rpoH gene), which guides RNA polymerase to the heat shock promoters (91). In addition, E. coli contains a second heat-responsive regulon, which is controlled by an alternative sigma factor, σE (encoded by the rpoE gene) (91). It is thus possible that the heat-mediated induction of some genes may occur via other mechanisms and regulators that remain to be elucidated. For example, members of the phage shock protein (Psp) family, which are induced in response to filamentous phage infection as well as in response to heat, ethanol, and osmotic shock, do not require the action of σ32 (32). Furthermore, a large set of heat shock proteins was found to be induced by other stimuli, such as exposure to denaturing conditions (i.e., the presence of alcohols or of heavy metals) (91). The proteins induced under various stress conditions can overlap with one another to degrees ranging from complete overlap to no overlap at all. For example, in E. coli, the heat shock and ethanol stimulons overlap, while the heat shock and cold shock responses have no shared proteins (133).

Additional information on the heat shock response has been obtained by examination of subproteomes. For example, proteins damaged or unfolded by elevated temperatures during heat shock tend to aggregate (198); thus, a proteomic study of the aggregates can be used to define the thermally unstable proteins. This study is also important for the elucidation of cellular protein quality control mechanisms, because damaged proteins can be refolded with the aid of chaperones or can be degraded by proteases (84). An example of one such study is the investigation of E. coli aggregates at various temperatures, which contained 350 to 400 protein species that were all classifiable as substrates of the ClpB and DnaK chaperones (285). Another proteomic study on the DnaKJ- and ribosome-associated trigger factor mutant strains revealed approximately 340 spots of aggregated proteins in two mutant strains. All major aggregated proteins were shared between the two mutants, indicating that they cooperatively assist the folding of newly synthesized proteins in E. coli (57). A similar study indicated that the major cytosolic energy-dependent proteases are involved in preventing aggregation, because protein aggregates formed in their absence showed increased concentrations and contained more protein species (250). These studies suggest that it may be possible to use proteomic analysis of aggregates to identify specific substrates for the various chaperones and proteases.

One major advantage of using 2-D gels to analyze the heat shock proteome is the ability to discriminate posttranslationally modified proteins from their nonmodified forms. The major chaperones, DnaK and GroEL, appear as multiple spots on narrow-range 2-D gels (pH 4.5 to 5.5), indicating that they may be present in several forms within the E. coli cell (Table 2). In addition, MALDI-TOF peptide mass fingerprinting allowed identification of E. coli ribosomal proteins with posttranslational modifications such as acetylation, methylation, β-methylthiolation, multiple methylations, and amino acid cleavage (11). These findings provide important insights into the regulatory mechanisms of heat shock response by protein modification.

Proteome analysis of E. coli cells adapting to low temperatures has also been carried out. In E. coli, a downshift in temperature causes transient inhibition of most protein synthesis, resulting in a growth lag called the acclimation phase. During the acclimation phase, a group of cold shock proteins is dramatically induced (Fig. 4), while the heat shock proteins remain repressed (133). A single regulatory factor for the cold shock response has not been identified, but some cold-induced proteins are essential for the cells to resume growth at low temperature and have been shown to function in various cellular processes. CspA, CspB, and CspG, which belong to a family of structurally related cold shock proteins, show the highest induction in response to cold shock (207). CspA has long been suspected to play a regulatory role in this response by destabilizing mRNA secondary structures, allowing more-efficient translation at low temperatures (81). The CspA mRNA appears to become more stable during a shift to temperatures below 30°C. In addition to members of the Csp family, 12 out of 16 cold shock proteins in E. coli have been further identified by 2-DE (283). As shown in Fig. 4, these include the following proteins: GyrA, the α-subunit of the topoisomerase DNA gyrase (132, 134); H-NS, a nucleoid-associated DNA-binding protein required for optimal growth at low temperature (157); Hsc66, a homolog of Hsp70 (165); NusA, which is involved in both termination and antitermination of transcription (132); PNP, which is an exoribonuclease (132); and RecA, which is involved in recombination and induction of the SOS response (283). In addition, three cold shock proteins have been shown to be ribosome associated: initiation factor 2 (IF2); CsdA, which is an RNA-unwinding protein (132); and ribosome-binding factor A (RbfA) (130, 132).

The latter group is interesting in that cells experiencing the transition to a low temperature showed accumulation of 70S monosomes, a decreased number of polysomes, and stabilized RNA and DNA secondary structures (283). This transition reduces the efficiency of mRNA translation and leads to a transient reduction in cell growth rate (131). Both CsdA and RbfA are required for optimal growth at low temperatures, indicating that these two newly produced ribosome-associated proteins, along with enhanced synthesis of IF2, are required for efficient ribosomal function at low temperatures. Comparative 2-DE proteomic analyses of E. coli cultures treated with various antibiotics or with temperature shock (291) showed that the heat and cold shock responses could be mimicked by different sets of ribosome-targeting antibiotics. For example, streptomycin, puromycin, and kanamycin induced protein expression patterns that resemble the heat shock and stringent responses, while tetracycline, chloramphenicol, erythromycin, fusidic acid, and spiramycin invoked the cold shock and relaxed ribosomal responses (291). Based on these findings, researchers have suggested that translational blocks induce heat shock-like or cold shock-like responses, indicating that the state of the ribosome or a ribosomal product may signal these responses.

A recent study of E. coli cells experiencing a temperature shift from 37°C to 4°C, using improved 2-DE methods, showed that 69 proteins were overexpressed and that the total number of proteins decreased by 40% (233). Further 2-DE studies on cold shock responses were carried out under several different conditions, including suspended cells at the exponential phase, suspended cells at the stationary phase, and cells immobilized on 2% (wt/vol) agar. Comparative analysis of the proteomes obtained with and without cold shock, using principal component analysis, allowed identification of 203 protein spots showing expression changes during the temperature downshift (between 10°C and 4°C or when 4°C was reached) compared with synthesis at 37°C (234). In suspended E. coli cells, the synthesis levels of 91 proteins (65 newly synthesized proteins and 26 induced proteins) were altered at the exponential phase after cold shock, and the synthesis levels of 59 proteins (34 newly synthesized proteins and 25 induced proteins) were changed at the stationary phase after cold shock. In immobilized E. coli cells, the synthesis of 53 proteins (33 newly synthesized proteins and 20 induced proteins) was induced by cold shock. These results suggest that the number of cold shock-induced proteins was originally underestimated and that further proteomics work will likely uncover a large cohort of cold shock response proteins.
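
As a quick arithmetic check, the per-condition totals quoted above can be tallied against the overall figure of 203 spots. The following is a minimal sketch that uses only numbers given in the text; the condition labels are illustrative, and the assumption that each spot is counted under a single condition is not stated in the source.

```python
# Per-condition totals of cold shock-responsive protein spots, as quoted in the text.
# The dictionary keys are illustrative labels, not names taken from the cited study.
counts = {
    "suspended, exponential phase": 91,
    "suspended, stationary phase": 59,
    "immobilized on 2% agar": 53,
}

# Sum the three conditions and report each one's share of the total.
total = sum(counts.values())
for condition, n in counts.items():
    print(f"{condition}: {n} spots ({n / total:.0%} of total)")
print(f"total: {total} spots")
```

Under that assumption, the three conditions together account for the 203 spots reported from the comparative analysis.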

pH response. Since E. coli requires homeostasis of internal pH in the range from 7.4 to 7.9, the pH response is important for cellular growth and survival under conditions of fluctuating pH levels. In addition, pH plays an important role in pathogenic bacteria. Pathogenic strains of E. coli, Salmonella enterica serovar Typhimurium, Shigella flexneri, and Vibrio cholerae often encounter extreme pH conditions both within and outside human hosts. During pathogenesis, cells are exposed to low pH in the stomach or within the phagosomes and phagolysosomes of intestinal epithelial cells or macrophages. Consequently, low pH induces virulence factors that contribute to pathogenesis, such as the virulence regulator ToxR in V. cholerae (268). Perturbations in pH can exert many significant effects on cell growth and induce different classes of proteins, such as stress proteins, redox modulators, and envelope proteins (223). The major pH-responsive proteins that have been identified by genomics, proteomics, and other technologies (268) are shown in Fig. 4. Among them, the acid stress chaperones HdeA and HdeB enhance survival under extremely acidic conditions (10, 72), while the membrane-bound Na+/H+ antiporter NhaA protects cells from excess Na+ under high-external-pH conditions (141). Proteome analyses have allowed identification of a number of other pH-responsive proteins in E. coli. Initially, 2-D gels were used to elucidate individually the cellular protein responses to changes in pH (294), to aerobiosis, and to anaerobiosis (270, 271). The pH-dependent response is often coinduced by other environmental factors, such as growth phase, anaerobiosis, and various metabolites. Thus, most of the proteomic studies on pH responses have been performed under specific aerobic and/or anaerobic conditions, allowing identification of new classes of acid- and base-dependent regulators and dissection of the relationship between pH and oxygen levels. For example, Blankenhorn et al. (27) identified a total of nine pH-responsive proteins from 18 spots during aerobic or anaerobic growth: five acid-induced proteins (GatY, ManX, PtsH, YfiD, and the aerobic acid-induced protein AhpC); three base-induced proteins (MalE, TnaA, and the anaerobic base-induced protein GadA); and one aerobic acid- or base-induced protein (AceA). Stancik et al. (274) identified a total of 22 pH-dependent proteins from 44 spots during aerobic or anaerobic growth: 9 acid-induced proteins (DeaD, LuxS, RibB, SodB, SucB, SucC, YcdO, YdiY, and YfiD); 11 base-induced proteins (AroK, CysK, DksA, DsbA, MalE, OmpA, Pta, RpiA, Ssb, TnaA, and YceI); and 2 acid- or base-induced proteins (OmpX and Tpx). Furthermore, Yohannes et al. (323) identified a total of 28 pH-dependent proteins from 32 spots in anaerobic cultures: 11 acid-induced proteins (AckA, GatY, GuaB, HdeA, Hmp, Lpd, NikA, OmpA, Ppa, TolC, and Tsf) and 17 base-induced proteins (AccB, DegQ, DhaK [YcgT], DhaL [YcgS], DhaM [YcgC], DsbA, GapA, HisC, HisD, MalB, MglB, OppA, ProX, Tig, TnaA, UspA, and YjgF). These findings indicate that low pH accelerates acid consumption and proton export while coinducing the oxidative stress and heat shock regulons. In contrast, high pH accelerates proton import while repressing the energy-expensive flagellar and chemotaxis regulons. Furthermore, pH differentially regulates a large number of periplasmic and envelope proteins as well as enzymes involved in several pH-dependent amino acid and carbohydrate catabolism pathways. High pH was shown to favor the catabolic pathways that generate NH3 and fermentation acids (AstD, CysK, DhaKLM, GabT, GadAB, GapA, SdaA, and TnaA), whereas low pH favored pathways that generate CO2 and amines (Adi, CadA, GadAB, and SpeF). Among these, Adi, AstD, CadA, CysK, GabT, SpeF, and TnaA were also significantly induced by anaerobiosis. Recently, researchers have used enrichment by column chromatography on reactive dye columns for the proteomic investigation of low-abundance acid-responsive proteins in E. coli grown at either pH 7.0 or pH 5.8 (24). This work allowed identification of new pH-responsive proteins: six acid-induced proteins (EF-Ts, GdhA, PanC, ProC, TkrA, and YodA) and five acid-repressed proteins (AroG, EF-Tu, FabI, GlyA, and PurA).
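
The protein lists reported by Blankenhorn et al. (27), Stancik et al. (274), and Yohannes et al. (323) can be cross-tabulated to see which proteins recur across studies. The following is a minimal sketch: the protein names and directions are transcribed from the text above, whereas the dictionary layout, the study labels, and the variable names are illustrative assumptions and not part of the cited work.

```python
from collections import defaultdict

# Protein lists transcribed from the text; keys combine first author and citation number.
# "either" marks proteins described as acid- or base-induced.
studies = {
    "Blankenhorn (27)": {
        "acid": {"GatY", "ManX", "PtsH", "YfiD", "AhpC"},
        "base": {"MalE", "TnaA", "GadA"},
        "either": {"AceA"},
    },
    "Stancik (274)": {
        "acid": {"DeaD", "LuxS", "RibB", "SodB", "SucB", "SucC",
                 "YcdO", "YdiY", "YfiD"},
        "base": {"AroK", "CysK", "DksA", "DsbA", "MalE", "OmpA",
                 "Pta", "RpiA", "Ssb", "TnaA", "YceI"},
        "either": {"OmpX", "Tpx"},
    },
    "Yohannes (323)": {
        "acid": {"AckA", "GatY", "GuaB", "HdeA", "Hmp", "Lpd",
                 "NikA", "OmpA", "Ppa", "TolC", "Tsf"},
        "base": {"AccB", "DegQ", "DhaK", "DhaL", "DhaM", "DsbA",
                 "GapA", "HisC", "HisD", "MalB", "MglB", "OppA",
                 "ProX", "Tig", "TnaA", "UspA", "YjgF"},
    },
}

# Map each protein to the (study, direction) pairs in which it is reported.
reports = defaultdict(list)
for study, groups in studies.items():
    for direction, proteins in groups.items():
        for protein in proteins:
            reports[protein].append((study, direction))

# Number of proteins listed per study.
study_totals = {s: sum(len(p) for p in g.values()) for s, g in studies.items()}
# Proteins listed by more than one study (each protein appears once per study).
recurrent = sorted(p for p, r in reports.items() if len(r) > 1)
# Proteins listed by every study.
in_all_studies = sorted(
    p for p, r in reports.items() if len({s for s, _ in r}) == len(studies)
)
# Proteins whose reported direction differs between studies.
mixed_direction = sorted(
    p for p, r in reports.items() if len({d for _, d in r}) > 1
)

print(study_totals)
print("recurrent:", recurrent)
print("in all studies:", in_all_studies)
print("mixed direction:", mixed_direction)
```

A protein flagged with a mixed direction is not necessarily a contradiction, because the studies differ in oxygen availability and other growth conditions.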

During the growth of E. coli, the external pH can be substantially changed by fermentative generation of acids or through aerobic consumption of acids. External acids that show an amplified uptake in response to increased pH gradients, such as acetic and formic acids, have been shown to induce heat shock proteins, oxidative stress proteins, and the RpoS regulon (146, 256). Several studies using proteomic approaches have revealed that benzoate induced heat shock and universal stress proteins (154), while propionate induced pH-responsive proteins such as AhpC, GatY, ManX, and YfiD (27). Treatment of E. coli cells with acetic acid increased the expression levels of 37 proteins, including periplasmic transporters for amino acids and peptides (ArtI, FliY, OppA, and ProX), metabolic enzymes (GatY and YfiD), the RpoS growth phase regulon, and the autoinducer synthesis protein LuxS (146). In contrast, acetic acid repressed 17 proteins, including phosphotransacetylase (Pta) (146). Similarly, an ackA-pta deletion, which abrogated the interconversion between acetate and acetyl-coenzyme A (CoA), led to elevated basal levels of 16 of the acetate-inducible proteins, including the RpoS regulon. Consistent with RpoS activation, the ackA-pta strain also showed constitutive extreme-acid resistance (146). On the other hand, treatment of E. coli cells with formic acid repressed 10 of the acetate-inducible proteins, including the RpoS regulon (146). Acetic and formic acids appear to exert opposite effects on proteins such as arginine-binding periplasmic protein 1 (ArtI), DNA-protecting protein during starvation (Dps), cysteine-binding periplasmic protein (FliY), tagatose-bisphosphate aldolase (GatY), the extreme-acid periplasmic chaperones (HdeA and HdeB), hyperosmotically inducible protein Y (OsmY), and 6-phosphofructokinase isozyme 2 (PfkB). Membrane-permeable acids also induce the Mar multiple drug resistance regulon, which is coregulated by the SoxRS superoxide stress system (252). Several genetic systems are coregulated by pH and growth phase; for example, the RpoS growth phase sigma factor regulates several components of resistance to both acids and bases. Thus, the effects of pH on global cellular regulation are complex because they overlap with those of other environmental factors such as oxygenation, growth phase, and various metabolites. Current proteomic studies continue the effort to separate the individual effects of pH, oxygen level, and osmolarity when these stimuli are applied in combination.

Oxidative stress response. Reactive oxygen species (ROS) are produced as an inescapable consequence of aerobic life and are maintained at low, tolerable levels within cells by the actions of specific enzymes, such as superoxide dismutase (SodA). The expression levels of these defense enzymes are modulated in response to the environmental oxidative threat. However, this basic protection is not sufficient to protect cells against sudden large increases in ROS, which can act negatively on important cellular materials, including lipids, proteins, certain enzyme prosthetic groups, and DNA (183). To cope with oxidative stress, E. coli cells trigger rapid global responses designed to eliminate ROS, repair oxidative damage, bypass damaged functions, and induce adapted metabolism, thus allowing the cells to persist under high-ROS conditions. In E. coli, the SoxRS and OxyRS regulatory systems are known to control many of the oxidative stress-responsive proteins.

The soxRS regulon is induced in a two-stage process. Upon activation, SoxR induces expression of the soxS gene in response to superoxide-generating agents, and then SoxS activates transcription of genes within the regulon. About 40 E. coli proteins are induced, including the following: superoxide dismutase (SodA), which might associate with DNA to provide special protection from superoxide damage (275); endonuclease IV (Nfo), which is involved in DNA repair (40, 169); glucose-6-phosphate dehydrogenase (Zwf), whose increase is expected to elevate the pool of NADPH (254); fumarase (FumC) (174); aconitase (AcnA) (73); and NADPH:ferredoxin oxidoreductase (Fpr), which may serve to maintain FeS groups in the reduced form (173). The E. coli acn mutants were shown to be hypersensitive to the redox stress reagents H2O2 and methyl viologen (278). Physiological and enzymological studies have shown that AcnB is a major citric acid cycle enzyme synthesized during the exponential phase, whereas AcnA is a more stable stationary-phase enzyme, which is also specifically induced by iron and oxidative stress (53). Proteomic analyses have further revealed that the level of SodA is enhanced in acnB and acnAB mutants and by exposure to methyl viologen (278). The amounts of other proteins, including thioredoxin reductase, 2-oxoglutarate dehydrogenase, succinyl-CoA synthetase, and chaperones, were also affected in the acn mutants. These studies demonstrated that AcnA enhances the stability of the sodA transcript, whereas AcnB lowers its stability. Thus, aconitases serve as a protective buffer against the basal level of oxidative stress that accompanies aerobic growth by acting as a sink for ROS and modulating translation of the sodA transcript.

Similarly, the OxyRS regulon is also induced in a two-stage process. Exposure of cells to H2O2 in a range from 5 to 200 µM activates OxyR and enhances the synthesis of ~40 proteins (183), including HPI catalase (KatG) (279), an NADPH-dependent alkyl hydroperoxide reductase (AhpCF) (279), glutathione reductase (GorA) (183), and Dps, which nonspecifically binds DNA to protect cells from H2O2 toxicity (6). In an oxyR-deleted mutant strain, 20 to 30 enzymes were found to remain H2O2 inducible (86). Some of these enzymes are also elevated during other stress responses, including exposure to redox cycling agents and heat shock.

These enzyme responses to oxidative stress are underpinned by metabolites or proteins such as NADPH, NADH, thioredoxin, and glutathione, which remove harmful oxygen species by stoichiometric reactions. In particular, thioredoxin, a ubiquitous and evolutionarily conserved protein, modulates the structure and activity of proteins involved in a spectrum of processes, such as gene expression and the oxidative stress response (183). A comprehensive analysis of the thioredoxin-linked E. coli proteome was performed by using tandem affinity purification and nanospray microcapillary MS/MS (151). A total of 80 thioredoxin-associated proteins were identified, and their various functions suggest that thioredoxin is involved in at least 26 distinct cellular processes, including transcription regulation, cell division, energy transduction, and several biosynthetic pathways. These thioredoxin-associated proteins either participate directly (AhpC, KatG, and SodA) or have key regulatory functions (AcnB and Fur) in cellular detoxification. Transcription factors including NusG, OmpR, and RcsB, which are generally not considered to be under redox control, were also associated with thioredoxin, providing compelling evidence for an extensively coupled network of redox regulation in E. coli.

E. coli cells treated with nontoxic levels of the superoxide-generating redox cycling agents menadione and paraquat showed dramatic changes in protein composition as monitored by 2-D gel analysis (87). The distribution of proteins synthesized after treatment with these agents overlapped significantly with that seen after H2O2 treatment. In addition, the redox cycling agents elicited the synthesis of at least 33 other proteins that were not H2O2 responsive. These include three heat shock proteins (C41.7, C62.5, and GroES), the Mn-containing superoxide dismutase (SodA), the DNA repair protein endonuclease IV (Nfo), and glucose-6-phosphate dehydrogenase (Zwf). At least some of these redox-inducible proteins appear to be part of a specific response to intracellular superoxide, indicating that E. coli cells are equipped with a network of inducible responses against oxidative damage that are controlled via multiple regulatory pathways.

Starvation response. The complex response of E. coli to nutrient starvation includes the sequential synthesis of starvation-inducible proteins. Although starvation for different individual nutrients generally provokes unique and individual patterns of protein expression, there are some