SUMMARY
Proteomics has emerged as an indispensable methodology for large-scale protein analysis in functional genomics. The Escherichia coli proteome has been extensively studied and is well defined in terms of biochemical, biological, and biotechnological data. Even before the entire E. coli proteome was fully elucidated, the largest available data set had been integrated to decipher regulatory circuits and metabolic pathways, providing valuable insights into global cellular physiology and the development of metabolic and cellular engineering strategies. With the recent advent of advanced proteomic technologies, the E. coli proteome has been used for the validation of new technologies and methodologies such as sample prefractionation, protein enrichment, two-dimensional gel electrophoresis, protein detection, mass spectrometry (MS), combinatorial assays with n-dimensional chromatographies and MS, and image analysis software. These important technologies will not only provide a great amount of additional information on the E. coli proteome but also synergistically contribute to other proteomic studies. Here, we review the past development and current status of E. coli proteome research in terms of its biological, biotechnological, and methodological significance and suggest future prospects.
INTRODUCTION
Escherichia coli, one of the best-characterized prokaryotes, has served as a model organism for countless biochemical, biological, and biotechnological studies. Since the completion of the E. coli genome-sequencing project (28), this organism has been characterized on the genome-wide scale in terms of its transcriptome, proteome, interactome, metabolome, and physiome by use of DNA microarray, two-dimensional (2-D) gel electrophoresis (2-DE) coupled with mass spectrometry (MS), liquid and gas chromatography coupled with MS, and bioinformatics (34, 176, 217, 226, 325). Recent advances in these functional genomics studies have facilitated understanding of global metabolic and regulatory alterations caused by genotypic and/or environmental changes. DNA microarray has proven to be a successful tool for monitoring whole-genome-wide expression profiles at the mRNA level (176). Similarly, proteomics can be employed to compare changes in the expression levels of many proteins under particular genetic and environmental conditions. Unlike transcriptomics, which focuses on gene expression, proteomics examines the levels of proteins and their changes in response to different genotypes and conditions. The studies on proteomes under well-defined conditions can provide a better understanding of complex biological processes and may allow inference of unknown protein functions. Most of all, proteomic approaches provide information about posttranslational modifications which cannot be obtained from mRNA expression profiles; these approaches have proven critical to our understanding of proper physiological protein function, translocation, and subcellular localization.
The most prominent developments within the field of proteomics to date are shown in Fig. 1. Although the first proteomic analyses were conducted 30 years ago, renewed interest in this field has been fueled by several recent advances, including the availability of public genome and protein databases, the development of database search engines capable of exploiting these databases, and the introduction of high-sensitivity, easy-to-use MS techniques. Other important recent advances include improved 2-DE, computer programs for analysis of the 2-D gel images, protocols for proteolytic digestion of proteins in excised gel pieces, and low-flow chromatography methods. Recently, in order to reduce complexity and detect low-abundance proteins, proteomics researchers have become increasingly aware of non-gel-based technologies combined with subcellular fractionation by n-dimensional chromatographies.
Fig. 1. Major developments in the history of proteomics. Since the beginning of proteome studies in 1975, proteomics and the associated technologies have evolved dramatically, resulting in almost exponential increases in the number of resolved proteins and their identification and greatly enhancing our understanding of complex biological processes in a variety of organisms.
These advances in proteomics technologies have led to the generation of unprecedentedly large amounts of proteome data, which are used in fundamental as well as applied research. Here, we review the technological and methodological advances in proteome research in terms of the E. coli proteome. We cover gel-based and non-gel-based approaches and predictive proteomics, including 2-DE, MS, tandem mass spectrometry (MS/MS), and computational tools, as well as applications of MS combined with pulldown methods to investigate the E. coli interactome. We also review physiological responses to growth stage, temperature, pH, oxidative stress, and other environmental conditions as revealed by proteome analysis. After reviewing the applications of proteome studies in biotechnology, we suggest future directions for proteomic studies. For topics not covered in this paper, readers are referred to the following excellent review articles on E. coli: for phage or bacterial display, see reference 65; for protein microarrays, see reference 21; and for the two-hybrid system, see reference 119.
PROGRESS IN E. COLI PROTEOMIC TECHNOLOGY
The exploration of the E. coli proteome can be divided roughly into three phases: (i) the gel-based approaches, (ii) the non-gel-based approaches, and (iii) predictive proteomics (bioinformatics tools). The gel-based and non-gel-based approaches are defined as being based on separation of complex protein mixtures in gel and non-gel matrices, respectively, whereas predictive proteomics covers functional proteomic studies performed in silico with computational tools. These approaches overlap in time, and their evolution has resulted in an almost exponential increase in the number and quality of resolved protein spots over the past 30 years (287) as increasingly complex separations have been developed. In recent years, the E. coli proteome has been used as a standard for evaluating and validating new technologies and methodologies such as sample prefractionation, protein enrichment, 2-DE, protein detection, MS, combinatorial assays with n-dimensional chromatography and MS, and image analysis (Table 1). In comparison to the proteomes of other organisms, the E. coli proteome provides an excellent model for various research needs based on the following advantages (161): (i) the availability of public databases such as SWISS-PROT (http://www.expasy.ch/ch2d/) and NCBI (http://www.ncbi.nlm.nih.gov/), which contain rich information on the proteins and corresponding genes; (ii) the existence of the E. coli SWISS-2DPAGE maps, which are based on a great deal of biochemical and biological data; and (iii) the fact that the E. coli proteome is less complex than those of other organisms such as humans and plants, with smaller open reading frame (ORF) products and less protein modification. Furthermore, as summarized in Fig. 2, the basic processes and strategies for an E. coli proteomic analysis have been well defined and optimized.
Fig. 2. General steps for proteomic analysis and tips for success. Once the project objective is set, E. coli cells are cultured and sampled for proteome profiling. During this process, protein samples can be prefractionated or labeled differentially for better comparison of the results. Proteome profiles can be obtained by gel-based and/or non-gel-based approaches. Also, predictive proteomic studies can be performed to analyze a priori the characteristics of proteins in the proteome. Gel-based approaches and non-gel-based approaches are complementary and should be combined if possible to maximize the total number of proteins detected and identified. sHsps IbpA and IbpB were from E. coli and Hsp26 was from Saccharomyces cerevisiae (96). SDS-PAGE, sodium dodecyl sulfate-polyacrylamide gel electrophoresis; AEBSF, aminoethyl benzylsulfonyl fluoride or Pefabloc SC; BCA, bicinchoninic acid; delta Cn, correlation value (difference between the first hit and the second hit); DTE, dithioerythritol; DTT, dithiothreitol; iTRAQ, a multiplexed set of isobaric reagents that yield amine-derivatized peptides (iTRAQ reagents; Applied Biosystems, CA) (253); PMSF, phenylmethylsulfonyl fluoride; RSp, rank preliminary score; SELDI-TOF-MS, surface-enhanced laser desorption ionization-time of flight mass spectrometry; TCA, trichloroacetic acid; Xcorr, cross-correlation (measures how close the spectrum fits to the ideal spectrum).
Table 1. Summary of proteomic technologies used to study E. coli
Gel-Based Approaches
2-DE is currently the most widely used proteomic approach for analyzing the protein composition of cells, tissues, or biofluids and might even be called “classic” or “blue-collar” proteomics (316). 2-DE was introduced independently by O'Farrell and by Klose in 1975 (147, 220) and was first used for analyzing basic proteins (222). VanBogelen and colleagues (294) then pioneered the use of 2-DE for determining the protein composition of E. coli, and the technique has been intensively pursued by others since then (25, 83, 287). However, these initial studies of the E. coli proteome were limited by the fact that the complex protein mixtures were displayed only with respect to their positions on the 2-D gels and also by the lack of reproducibility among different laboratories. The later use of an immobilized pH gradient (IPG) gel instead of the carrier ampholyte method allowed researchers to apply 2-DE for easier and more-reproducible proteome analyses (25, 83). The current use of commercially available 18-cm IPG strips (pH, 3 to 10) along with high-sensitivity staining is generally able to resolve up to 1,000 to 1,500 protein spots in the case of the E. coli proteome (286). However, a large number of the protein spots found in a 2-D gel of the E. coli proteome cluster at an isoelectric point of 4 to 7 and a molecular weight (MW) of 10 to 100 kDa (294), representing a limitation of 2-D gel separation of unfractionated samples on IPG strips. Furthermore, despite the excellent sensitivity of MS, only the most abundant proteins from 2-D gels can be analyzed, leading to the exclusion of many low-abundance proteins.
One strategy for enhancing the capacities of 2-D gels involves parallel separation of replicate aliquots from unfractionated samples on a series of narrow-pH-range IPG gels (or zoom gels). The E. coli 3.5-10 SWISS-2DPAGE map shows 40% of the E. coli proteome (286), among which 231 proteins have been identified by techniques such as gel comparison, microsequencing, N-terminal sequencing, and amino acid composition analysis (Table 2) . In contrast, the use of narrow-range pH gradients (pH 4 to 5, 4.5 to 5.5, 5 to 6, 5.5 to 6.7, 6 to 9, and 6 to 11) was shown to potentially display proteins existing at low levels (up to a few protein molecules per cell), resulting in the discrimination of >70% of the entire E. coli proteome (Table 2; reference 287). The number of displayed proteins was higher than that identified by non-gel-based approaches, but not all of the proteins could be identified. The main benefit of using narrow-pH-range IPG strips is that the total number of protein spots per pH unit that can be separated increases due to higher spatial resolution. However, in practice this approach results in only a moderate increase in the number of proteins detected compared to that detected by use of a single broad-pH-range gel. Narrow-pH-range IPG gels show variable and unreliable separation of proteins, especially when unfractionated complex protein samples are analyzed, because proteins having pIs outside the pH range of the IPG strip usually cause massive precipitation and aggregation on the gel.
Table 2. E. coli proteins identified on 2-D gels
As another interesting strategy for enhancing the separation capacity of 2-D gels, researchers have employed sample prefractionation methods, such as sequential extractions with increasingly stronger solubilization solutions, subcellular fractionation, selective removal of the most abundant protein components, preparative isoelectric focusing (IEF) separations, and chromatographic fractionation of sample mixtures. This strategy offers the benefits of high protein-loading capability along with the ability to discriminate two or more proteins migrating together. For example, since membrane proteins have proven difficult to solubilize with common solubilization agents such as urea, thiourea, 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonic acid (CHAPS), and dithiothreitol, Molloy et al. (201) introduced a new isolation method of sequential extractions with increasing concentrations of sodium carbonate in analyzing E. coli outer membrane proteins. This led to the successful identification of 21 out of 26 of the predicted integral outer membrane proteins. Similarly, Lai et al. (153) identified more than 200 E. coli membrane proteins by use of the method described by Molloy et al. (201), after modifying it to minimize nonmembrane protein contamination. The largest database of E. coli membrane proteins constructed to date is that reported by Fountoulakis and Gasser (68), who identified 394 different gene products using a method identical to that described by Molloy et al. (201). Notably, these studies demonstrate that membrane proteins, which are commonly absent from 2-D gel maps, are amenable to 2-DE separation using specific techniques.
As an alternative method, high-resolution preparative IEF separation can be combined with the use of narrow-pH-range IPG strips. Several preparative electrophoresis devices, such as Rotofor (Bio-Rad, Hercules, CA), IsoPrime (Amersham Biosciences, Uppsala, Sweden), and the ZOOM IEF fractionator (Invitrogen, Carlsbad, CA), have been developed for increasing the number of proteins separated and detecting less abundant proteins (334). For example, Herbert and Righetti (108) used a multicompartment electrolyzer (MCE) to prefractionate E. coli prior to 2-DE analysis and observed many more spots than with the standard maps available in databases such as SWISS-2DPAGE. This device appears simple, but it still contains large sample chambers (∼100 ml), which are not compatible with samples available in small quantities. Zuo and Speicher (333) prefractionated E. coli using a ZOOM IEF fractionator and found that this initial step greatly enhanced the loading ability, resolution, and detection sensitivity of their 2-D gels. This method greatly conserves proteome samples compared with direct analyses of unfractionated samples on a series of narrow-pH-range 2-D gels. Most interestingly, MicroSol IEF prefractionation is compatible with most downstream proteome-profiling methods, including 1-DE, narrow-pH-range 2-DE, 2-D difference gel electrophoresis (2-D DIGE), and liquid chromatography (LC)-MS/MS methods.
Sample fractionation by chromatography can generate hundreds of fractions for individual 2-DE analysis, allowing enrichment of low-abundance proteins. This results in better qualitative and quantitative analysis of 2-D gels. The combination of LC, 2-DE, and MS/MS has expanded the upper limits of protein visibility typically obtainable by gel-based approaches, but this method has higher costs in terms of price, labor, and time.
Recently, some researchers have focused on subcellular proteomics (or organelle proteomics), which is proteome analysis of the macromolecular architecture of a cell, e.g., subcellular compartments, organelles, macromolecular structures, and multiprotein complexes. This technique has the added benefits of reducing sample complexity, identifying additional unique proteins, localizing newly discovered proteins to specific organelles, and, in some cases, allowing functional validation (121, 281). In terms of the E. coli proteome, subcellular proteomics based on 2-DE can be used to assign various proteins to the cytosol, periplasm, inner membrane, or outer membrane by biochemical fractionation; this method was used to assemble the largest proteome database to date, as shown in Table 2 (179). Analysis of 2,160 spots revealed 575 unique ORF entries, including 151 hypothetical ORF entries, 76 proteins of completely unknown functions, and 222 proteins currently not assigned in the SWISS-PROT database. Of the 575 different entries identified, 241 (42%) were found to exist in more than 1 form, at an average of 7.5 forms per entry. These findings indicate that proteomics involving sample fractionation and 2-DE can be a valuable research technique. However, we have to choose carefully an appropriate fractionation method that prevents substantial and variable protein cross-contamination among the multiple fractions, as this severely complicates the quantitative comparison of protein profiles. A more important factor for quantitative proteome analysis is the need to control separation quality and reproducibility.
The development of improved methodologies for the detection of protein spots has formed the basis for a number of remarkable advances in 2-DE research. A number of general protein detection methods have been developed using organic dyes, silver staining, radiolabeling, reverse staining, fluorescent staining, and chemiluminescent staining. Most researchers have used Coomassie brilliant blue and silver staining for protein detection, but these stains have low sensitivity and narrow linearity, respectively. Radiolabeling is the most sensitive detection method, but the potential hazards of working with radioactive material, the limited shelf life, the costs of disposal, and problems with handling mixed waste have decreased its popularity.
Fluorescent dyes provide great sensitivity and broad, linear, dynamic responses compared to their colorimetric counterparts and are compatible with modern downstream protein identification and characterization procedures, such as MS. In comparison to their colorimetric counterparts, fluorophores are easy to handle, have long shelf lives, and have minimal disposal issues. Thus, fluorescence-based protein detection has become a more common practice in recent years. For example, 2-D DIGE was first introduced by Ünlü et al. (289) in 1997 and has been further developed by GE Healthcare (Chalfont St. Giles, Bucks, United Kingdom; formerly Amersham Biosciences, Uppsala, Sweden). The basis of the technique is the use of two or three mass- and charge-matched N-hydroxy succinimidyl ester derivatives of the fluorescent cyanine dyes Cy2, Cy3, and Cy5, which possess distinct excitation and emission spectra. Each labeled sample is then mixed and run simultaneously on a single 2-D gel. However, it should be noted that the use of amino group labels will favor detection of basic proteins over acidic proteins. This technology allows two or three samples to be coseparated under identical electrophoretic conditions, reducing the number of gels required while allowing more-accurate comparative proteome profiling (100). In a case study on the E. coli proteome after benzoic acid treatment (321), 2-D DIGE was shown to produce quantitative results more accurate than those produced with conventional 2-DE. As shown in Table 2 (DIGE pH range, 4.5 to 6.5), a total of 179 differentially expressed E. coli protein spots could be identified by use of matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) and quadrupole-time of flight MS, indicating that this technique not only avoids the complications of gel-to-gel variation but also enables a more accurate and rapid analysis of differences and reduces the number of gels that need to be run. 
Furthermore, since the gels can be directly scanned and imaged after electrophoresis, this process reduces artifactual features, and the image has a wider dynamic range and more sensitivity than other detection methods.
Recently, researchers have sought to develop detection methods suitable for revealing posttranslational protein modifications, such as glycosylation, phosphorylation, proteolytic modification, S nitrosylation, arginine methylation, and ADP ribosylation (229). For example, the multiplexed proteomics platform allows different samples to be run on separate 2-D gels that are individually stained, thus allowing parallel determination of protein expression levels and certain functional attributes, such as levels of glycosylation or drug-binding and drug-metabolizing capabilities. These multiplexing techniques have facilitated the use of 2-DE to examine fundamental proteome-wide changes in protein expression and posttranslational modifications in the past few years.
Together, the gel-based methods form the core of proteomic technology and the source of most of the published work on the E. coli proteome, despite their technical shortcomings. To date, 715 E. coli proteins (336 proteins available in the current E. coli SWISS-2DPAGE database plus an additional 379 nonredundant proteins reported in the literature) have been identified on 2-D gels (Fig. 3 and Table 2), with the number of identified proteins continuously increasing. However, it is important to note that an organism will not synthesize all the proteins under a given condition; for example, alkaline phosphatase (PhoA) is not synthesized by E. coli grown in normal growth medium but is significantly induced under a phosphate-limited condition (Table 2). While a great deal of progress in elucidating the E. coli proteome has been made, it is still extremely difficult (if not impossible) to examine the whole proteome of an organism under a given condition. More importantly, 2-DE will likely remain a key technology for the detection of protein variants that undergo proteolytic processing and posttranslational modifications such as phosphorylation or glycosylation. More protein spots will be identified as advanced MS technologies such as MALDI-TOF-MS, electrospray ionization (ESI)-MS, and MS/MS are paired with functional genomic studies based on the complete genome sequence. Thus, the gel-based techniques are, and will likely remain, highly useful tools for assessing differential protein expression.
Fig. 3. Distribution of E. coli proteins identified by gel-based and non-gel-based approaches. These figures plot the theoretical pI versus the theoretical MW (Mw) of the open reading frame products in E. coli. Shown are images of E. coli proteins identified by gel-based approaches (a) and non-gel-based approaches (b) and the virtual 2-D image of 4,237 E. coli K-12 ORF entries predicted by a predictive proteomic tool (c). Each crossbar represents a protein spot. The numbers of proteins found by gel-based and/or non-gel-based approaches and by predictive proteomic tools are compared in panel d. The total number of E. coli proteins nonredundantly identified by experiments is 1,627 (∼38% of 4,237 ORF entries). For alkaline proteins (pI, >8.0), only 253 proteins (∼19%) out of 1,356 ORF entries have been identified so far. For the names and the exact locations of all these protein spots, see Fig. S1 in the supplemental material. The theoretical pI and MW values were calculated using the Compute pI/Mw tool (http://www.kr.expasy.org/tools/pi_tool.html).
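The theoretical pI and MW values mentioned in the caption above can be approximated directly from an amino acid sequence. The following is a minimal sketch under stated assumptions, not a reimplementation of the Compute pI/Mw tool: the pKa values are an illustrative set chosen for this example, and the function names `theoretical_mw`, `net_charge`, and `theoretical_pi` are hypothetical. MW is the sum of average residue masses plus one water; pI is found by bisection as the pH at which the estimated net charge is zero.

```python
# Average residue masses in Da (standard values) and the mass of one water.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01528

# ASSUMPTION: illustrative pKa values; the Compute pI/Mw tool uses its own set,
# so results from this sketch will differ somewhat from the published values.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def theoretical_mw(sequence):
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence) + WATER_AVG


def net_charge(sequence, ph):
    """Estimated net charge at a given pH (Henderson-Hasselbalch)."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa in sequence:
        if aa in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def theoretical_pi(sequence, tolerance=1e-4):
    """Bisection search for the pH at which the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid  # negatively charged: pI lies at lower pH
    return (low + high) / 2.0


if __name__ == "__main__":
    toy = "MKTAYIAKQRDDEEGLSK"  # hypothetical sequence, not a real E. coli ORF
    print(round(theoretical_mw(toy), 1), round(theoretical_pi(toy), 2))
```

Applying such a calculation to every predicted ORF product would give a virtual 2-D map of the kind described in panel c, although posttranslational modifications and processing will shift real spots away from the predicted positions.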
Non-Gel-Based Approaches
MS has been used for identifying proteins resolved by 2-DE and other methods and also for direct analysis of complex protein mixtures. MS has essentially replaced the classical technique of Edman degradation, even in traditional protein chemistry (1, 111), because it is much more sensitive, can deal with protein mixtures, and offers much higher throughput. The use of MS techniques to identify proteins in complex samples depends on the existence of large protein sequence databases generally derived from DNA-sequencing efforts. There are two main approaches for mass spectrometric protein identification. First, the peptide mass fingerprinting method, initially suggested by Henzel and coworkers (107), involves measurement of the mass spectrum of an eluted peptide mixture, which is then compared with theoretically derived peptide mass databases generated by applying specific enzymatic cleavage rules to predicted and known protein sequences. Typically, protein mixtures are first separated by use of 2-DE, and protein spots are subsequently excised from the gel (251). The proteins contained in the gel pieces are digested using a sequence-specific protease, such as trypsin, and then the resulting peptides are analyzed by MS. When MALDI is used, the samples of interest are solidified within an acidified matrix, which absorbs energy in a specific UV range and dissipates the energy thermally. This rapidly transferred energy generates a vaporized plume of matrix and thereby simultaneously ejects the analytes into the gas phase, where they acquire charge. A strong electrical field between the MALDI plate and the entrance of the MS tube forces the charged analytes to rapidly reach the entrance at different speeds based on their mass-to-charge (m/z) ratios. Because trypsin cleaves the protein backbone at the arginine and lysine residues, the masses of tryptic peptides can be predicted theoretically from protein sequence databases. 
These predicted peptide masses are compared with those obtained experimentally by MALDI analysis. The protein can be identified correctly if there are sufficient peptide matches with a protein in the databases, resulting in a high score. A high degree of mass accuracy is critical for the unambiguous identification and elimination of the false positives. This technique allows rapid identification of proteins when a fully decoded genome is available. A disadvantage of this approach is that it does not directly provide a sequence-based identification, which results in clustering of proteins with similar masses and necessitates additional effort for the identification.
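The peptide mass fingerprinting logic described above (digest in silico, predict peptide masses, count matches against observed masses) can be sketched as follows. This is a simplified illustration, not the scoring scheme of any particular search engine: the function names, the 50-ppm tolerance, and the toy sequences are assumptions made for this example, missed cleavages and modifications are ignored, and the rule of not cleaving before proline is a common simplification that goes beyond what the text states.

```python
import re

# Monoisotopic residue masses in Da (standard values).
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except before P (ASSUMED simplification)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mh(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(MONO_RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, theoretical, tol_ppm=50.0):
    """Number of observed masses within tol_ppm of any theoretical mass."""
    hits = 0
    for m_obs in observed:
        if any(abs(m_obs - m_th) / m_th * 1e6 <= tol_ppm for m_th in theoretical):
            hits += 1
    return hits


def rank_candidates(observed, database, tol_ppm=50.0):
    """Rank database entries by the number of matching peptide masses."""
    scores = {}
    for name, sequence in database.items():
        masses = [peptide_mh(p) for p in tryptic_digest(sequence)]
        scores[name] = count_matches(observed, masses, tol_ppm)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy database; these are not real E. coli sequences.
    toy_db = {"protA": "MKTAYIAKQRGLSK", "protB": "MSDEWKPLLR"}
    observed = [peptide_mh(p) for p in tryptic_digest(toy_db["protA"])]
    print(rank_candidates(observed, toy_db))
```

A simple match count like this also shows why mass accuracy matters: widening `tol_ppm` lets unrelated proteins accumulate chance matches, which is the source of the false positives mentioned above.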
To solve this problem, a sequence-based approach has been applied to protein identification. This approach comprises two major mass spectrometric strategies, both of which use ESI. The unique feature of ESI is that it allows the rapid transfer of analytes from the liquid phase to the gas phase at atmospheric pressure. The spray device creates droplets, which, once in the mass spectrometer, go through a repetitive process of solvent evaporation until the solvent disappears and charged analytes are left in the gas phase. In one strategy, the unseparated mixture of peptides is applied to a low-flow nanoelectrospray device. The peptide mixture is electrosprayed from a very fine needle into the mass spectrometer. Individual peptides from the mixture are isolated in the first step and fragmented during the second step to sequence the peptides (hence MS/MS). Peptide fragments obtained by this method that contain the N or C terminus of the peptide are designated “b” and “y” ions, respectively (322). The other strategy uses liquid chromatography for initial separation of peptides followed by sequencing as they elute into the electrospray ion source. This method can also be used without gel electrophoresis; in this case, a mixture of proteins is digested in solution and the scrambled sets of peptides are sequenced, ideally resulting in the identification of the proteins in the mixture. A great deal of data can be obtained from a single run done in an automated fashion. The fragmentation data can be used to find matches in various protein and nucleotide sequence databases, including the expressed sequence tag and raw genomic sequence databases.
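To make the “b” and “y” ion nomenclature concrete, the following minimal sketch computes the expected singly charged fragment masses for a peptide. It assumes unmodified residues and backbone cleavage at each peptide bond only; the function name `fragment_ions` and the example peptide are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of (label, m/z) for charge 1+.

    b_i contains the first i residues (N-terminal fragment);
    y_i contains the last i residues plus one water (C-terminal fragment).
    """
    masses = [MONO_RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions, y_ions = [], []
    for i in range(1, n):
        b_ions.append((f"b{i}", sum(masses[:i]) + PROTON))
        y_ions.append((f"y{i}", sum(masses[n - i:]) + WATER + PROTON))
    return b_ions, y_ions


if __name__ == "__main__":
    b_series, y_series = fragment_ions("GLSK")  # hypothetical tryptic peptide
    for label, mz in b_series + y_series:
        print(label, round(mz, 4))
```

Within one series, the mass difference between consecutive ions equals the mass of a single residue, which is how a sequence can be read from the fragment spectrum. Leucine and isoleucine have identical masses and cannot be distinguished this way.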
The most significant breakthrough in non-gel-based approaches was the development of methods involving the combination of n-dimensional prefractionation methods (1-D or 2-D LC) with MS, as shown in Table 1. In these methods, chromatographic separations by affinity, covalent chromatography, strong anion/cation exchange, size exclusion, or the use of packed reactive dye compound or reverse-phase columns are used to reduce the complexity of digested protein mixtures, and this is followed by an MS technique such as MALDI-TOF-MS, ESI-MS, or MS/MS for high-throughput identification of the fractionated peptides. Gevaert et al. (78) identified 800 E. coli proteins from sorted methionine-containing peptides by use of a combination of technologies consisting of combined fractional diagonal chromatography (COFRADIC), LC-MS/MS, and MALDI-TOF-MS. More than 1,100 E. coli proteins (a quarter of those encoded in the E. coli genome) were identified by high-performance liquid chromatography (HPLC)-MS/MS analysis (49). Perhaps the most popular of these techniques to date is multidimensional protein identification technology, often referred to as MudPIT (193). In this method, mixtures of trypsin-digested peptides are loaded onto a biphasic microcapillary column containing a strong-cation-exchange resin upstream of a reverse-phase resin directly coupled to an MS/MS instrument. Peptides are displaced from the strong-cation-exchange resin using a salt step gradient and subsequently bind to the reverse-phase resin. Elution from the reverse-phase resin is accomplished using an acetonitrile gradient, and the peptides are analyzed online by MS/MS. Repeated rounds of step and gradient elutions can result in analysis and identification of a large number of peptides in a single run. Vollmer et al. (301) used this approach for the analysis of E. coli cellular extracts originating from lactose- and glucose-grown cultures, which resulted in the identification of 305 and 450 proteins, respectively, from a single experiment within the 95% confidence level. Results with these approaches can be achieved rapidly with small amounts of cell extract, and the software can quickly and accurately analyze the mass and/or sequence data. However, because of the complexity of any given proteome and the separation limits of 1-D or 2-D LC, it is still necessary to reduce sample complexity prior to protein separation and characterization.
An advanced instrument that combines the benefits of high mass accuracy and highly sensitive detection is the Fourier transform ion cyclotron resonance (FTICR) mass spectrometer. FTICR-MS has recently been applied to identify low-abundance compounds or proteins in complex mixtures and to resolve species of closely related m/z ratios (261). Coupled with HPLC and ESI, FTICR-MS is able to characterize single compounds (up to 500 Da) from large combinatorial chemistry libraries and to accurately detect the masses of peptides in a complex protein sample in a high-throughput mode. Jensen and colleagues identified more than 1,000 E. coli proteins using capillary IEF (CIEF) combined with FTICR-MS (126, 127).
Another strategy for monitoring differential protein expression and identifying low-abundance proteins was introduced by Weinberger et al. (309). In this approach, proteins of E. coli lysates were digested, and the resultant peptides were selectively extracted by covalent attachment of methionine residues with bromoacetyl-reactive groups tethered to the surface of glass beads packed in small reaction vessels. The recovered methionine-containing peptides were profiled using the surface-enhanced laser desorption ionization retentate chromatography-MS method. The parent proteins of the selected peptides were then identified using ProteinChip MS/MS (Ciphergen Biosystems, Inc.). Of 34 proteins identified by this method (309), at least 5 (BglX, ParD, YeaM, YfiO, and YhgF; 12% of the total) were low-abundance proteins, demonstrating that this method is capable of visualizing proteins having low expression levels. However, this method does not seem to be suitable for detecting proteins with posttranslational modifications, such as proteolytic truncation, glycosylation, and phosphorylation.
In non-gel-based approaches, it should be noted that the quantities of extracted peptides described above may not truly represent nascent protein abundance, as it is possible that the peptide extraction and liberation steps could be biased by peptide properties such as hydrophobicity. For quantitative comparison, two samples may be labeled with stable isotopes prior to sample separation, either by metabolic incorporation or through chemical derivatization. In this way, proteins derived from the different samples (e.g., normal versus abnormal or untreated versus treated samples) can be directly separated, identified, and quantified using n-D LC-MS/MS (36, 303, 305). A recently developed, attractive method for quantitative comparison of two proteomes is the isotope-coded affinity tag (ICAT) method (331). The ICAT reagent has a protein-reactive group, a biotin tag, and an ethylene glycol linker connecting the two functional groups, which can be synthesized with hydrogen (light ICAT) or deuterium (heavy ICAT). For comparison, one sample is reacted with the light reagent and the second sample is reacted with the heavy reagent under identical labeling conditions. After trypsin digestion, the extremely complex tryptic peptide mixture is simplified by affinity purification of the cysteine-containing derivatized peptides on an avidin affinity resin. The eluted peptides are then analyzed using LC-MS/MS for simpler samples or LC/LC-MS/MS for more-complex samples. The ratios of MS signals from the light and heavy ICAT-labeled forms of the same peptide are compared to determine the relative abundances of the parent protein in the respective samples, and MS/MS is used to identify the proteins. A typical ICAT-MS experiment was used to measure proteome changes in E. coli cells treated with triclosan, an inhibitor of fatty acid biosynthesis (202). The technique provided good quantitative reproducibility and on average identified more than 450 unique proteins per experiment. 
Furthermore, ICAT-MS identified a number of E. coli proteins that had not previously been identified on 2-DE gels. However, the method was limited in that it was strongly biased to detect acidic proteins (pI, <7), underrepresented small proteins (MW, <10 kDa), and failed to detect hydrophobic proteins. Another weakness of the current ICAT method is that it requires the proteins to contain cysteine residues flanked by appropriately spaced protease cleavage sites (102). This problem was highlighted in the study of a multisubunit membrane protein, E. coli FoF1 ATP synthase (20), in which none of the membrane-embedded proteins in the Fo complex could be visualized by ICAT. About 10 to 15% of the proteins encoded by the E. coli genome do not contain cysteine residues, which precludes the use of a cysteine-specific technology as a total-protein indicator. This cysteine-labeling problem could be overcome by devising ICAT reagents that react with other amino acid residues. Chakraborty and Regnier (36) introduced a new isotope-labeling method as a global internal standard technology for identifying and quantifying protein changes during overexpression of β-galactosidase in E. coli. They used N-acetoxysuccinimide and N-acetoxy-[2H3]succinimide to differentially derivatize primary amino groups in tryptic peptides prepared from proteins extracted from cultures treated with 0.5 nM or 2 mM isopropyl-β-d-thiogalactopyranoside. However, these authors tested the efficacy of their strategy only with β-galactosidase; this work has not yet been extended to a large-scale proteomic analysis. In another use of the isotopic labeling method, Veenstra et al. (300) identified intact proteins from genomic databases with a combination of accurate molecular mass measurements and partial amino acid content analysis. Proteins extracted from E. coli cells grown in natural-isotopic-abundance minimal medium or minimal medium containing isotopically labeled leucine (Leu-D10) were mixed and analyzed by CIEF coupled with FTICR.
The difference in the molecular masses between proteins labeled with the natural isotope or Leu-D10 was used to determine the number of Leu residues present in each protein. Information on the molecular mass and the number of Leu residues present could be used to unambiguously identify intact proteins (e.g., CspE, Mdh, and YggX).
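The leucine-counting step described above can be sketched as follows. This is a minimal illustration, assuming complete incorporation of Leu-D10 so that each leucine residue carries ten deuterium-for-hydrogen substitutions; the function names, the mass tolerance, and the candidate-table layout are hypothetical and are not taken from the cited study.

```python
# Mass gained by one leucine residue when ten hydrogens are replaced by deuterium.
DEUTERIUM_MASS = 2.014102  # Da
HYDROGEN_MASS = 1.007825   # Da
SHIFT_PER_LEU = 10 * (DEUTERIUM_MASS - HYDROGEN_MASS)  # about 10.06 Da


def count_leucines(mass_unlabeled, mass_labeled):
    """Estimate the number of Leu residues from the mass shift between the
    natural-abundance and Leu-D10 forms of the same intact protein."""
    shift = mass_labeled - mass_unlabeled
    if shift < 0:
        raise ValueError("labeled mass must not be below unlabeled mass")
    return round(shift / SHIFT_PER_LEU)


def matching_candidates(candidates, measured_mass, n_leu, tol_da=1.0):
    """Keep database entries whose predicted mass and Leu count both match.

    `candidates` maps a name to a (predicted_mass, sequence) pair
    (a hypothetical layout chosen for this sketch).
    """
    return [
        name
        for name, (mass, seq) in candidates.items()
        if abs(mass - measured_mass) <= tol_da and seq.count("L") == n_leu
    ]
```

Requiring the leucine count to match as well as the mass is what narrows the candidate list: two ORF products with nearly identical masses rarely share the same number of Leu residues.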
Recently, a multiplexed protein quantitation strategy that provides relative and absolute measurements of proteins in complex mixtures was developed by Ross et al. (253). The multiplex strategy simultaneously determines the relative levels of proteins at multiple states (e.g., several experimental controls or time-course studies) for up to four samples in parallel. A multiplexed set of isobaric reagents that yield amine-derivatized peptides (iTRAQ reagents; Applied Biosystems, CA) was used for labeling at the N termini and lysine side chains of peptides in a digest mixture. The derivatized peptides are indistinguishable in MS but exhibit intense low-mass MS/MS signature ions that support quantitation. Absolute quantitation of targeted proteins can also be achieved using synthetic peptides tagged with one of the members of the multiplex reagent set. Aggarwal et al. (2) used this approach to study rhsA expression in E. coli. They were able to quantify 780 proteins, including several low-abundance proteins, such as transcription factors (DnaB and DnaG).
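Relative quantitation from the MS/MS signature ions can be sketched as below. This is an illustrative outline only: it assumes the four reporter channels sit at nominal m/z 114, 115, 116, and 117 (values not given in the text above), sums reporter intensities over all peptides assigned to a protein, and reports each channel relative to a chosen reference channel. The function name and input layout are hypothetical.

```python
from collections import defaultdict

CHANNELS = (114, 115, 116, 117)  # nominal reporter m/z values (assumed 4-plex)


def protein_ratios(peptide_spectra, reference=114):
    """Sum reporter-ion intensities per protein, then express each channel
    relative to the reference channel.

    `peptide_spectra` is an iterable of (protein, {channel: intensity}) pairs,
    one per identified peptide MS/MS spectrum.
    """
    totals = defaultdict(lambda: dict.fromkeys(CHANNELS, 0.0))
    for protein, reporters in peptide_spectra:
        for channel in CHANNELS:
            totals[protein][channel] += reporters.get(channel, 0.0)

    ratios = {}
    for protein, summed in totals.items():
        ref = summed[reference]
        if ref > 0:  # skip proteins with no signal in the reference channel
            ratios[protein] = {ch: summed[ch] / ref for ch in CHANNELS}
    return ratios
```

Summing intensities before taking the ratio weights intense spectra more heavily than weak ones; averaging per-peptide ratios is an alternative design with different noise behavior.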
In addition to identifying proteins, characterizing interactions among proteins is important to understand dynamic biological processes in response to changes in cellular environment, since proteins often function as components of multisubunit complexes. Indeed, protein interactions are observed in nearly all cellular processes, and protein complexes are so ubiquitous that the biological function of an unknown protein can often be predicted from the functions of the proteins with which it is associated. Classically, ligand-binding methods, such as radioreceptor assays, were standard methods of determining protein interactions. Additionally, coimmunoprecipitation studies are commonly used to assess protein-protein interactions. High-throughput analysis of protein-protein interactions is now possible by pulldown assay coupled with MS; this method serves as an important alternative to the yeast two-hybrid system. In pulldown assays, a target is expressed in a cell (in vivo) or added to a cell lysate (in vitro), usually fused with a tag, such as glutathione S-transferase (269), polyhistidine (43, 180), or a tandem affinity purification (TAP) tag (34, 93) or its various relatives, including the sequential peptide affinity tags (34, 327) and split tag (92). The glutathione S-transferase or polyhistidine tag is immunoprecipitated, and associating proteins are then identified by immunological methods, sequencing, or MS. The TAP method was first used to purify complexes containing the acyl carrier protein (ACP) from E. coli. Besides the identification of several known partners of ACP, three proteins, SpoT, IscS, and MukB, were found to interact with ACP. This method has recently been used to its full potential to build the interaction network of E. coli (34). The TAP procedure for isolating protein complexes makes use of site-specific recombination to introduce a dual tagging cassette into chromosomal loci. E. coli does not readily recombine exogenous linear DNA fragments into its chromosome, but the expression of the lambda general recombination system (λ-Red) markedly enhances integration. This system was used to introduce a DNA cassette bearing a selectable marker and either the TAP or sequential peptide affinity tag at the C termini of ORF products in E. coli. A total of 857 proteins, including 198 of the most highly conserved, soluble nonribosomal proteins essential in at least one bacterial species, were tagged successfully. Also, 648 proteins could be purified to homogeneity, and their interacting protein partners were identified by using MS and MS/MS. This network includes many new interactions as well as interactions predicted based solely on genomic inference or limited phenotypic data. However, it is important to verify the interactions observed in this way, as there may be false positives.
Taken together, many of the proteins in the E. coli proteome have been identified by using more than one method, whereas others have been uniquely identified by one particular method, indicating that these techniques are complementary to each other. More than 1,486 E. coli proteins from the two major databases (49, 78) were identified using non-gel-based approaches (Fig. 3). A total of 1,627 proteins, which correspond to more than one-third of the E. coli proteins (e.g., the ∼4,237 proteins of E. coli K-12 from the NCBI database), were identified by gel-based and non-gel-based approaches. Among them, 574 proteins were identified in common by gel-based and non-gel-based approaches. The non-gel-based approaches showed clear superiority over 2-DE methods in monitoring alkaline proteins (pI, >8.0) but still need technical improvement.
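The counts quoted above can be cross-checked with inclusion-exclusion. The sketch below treats the figure of "more than 1,486" non-gel-based identifications as exactly 1,486, which is an assumption, so the derived numbers should be read as approximate; the function name is hypothetical.

```python
def gel_based_count(total, non_gel, shared):
    """Inclusion-exclusion: total = gel + non_gel - shared."""
    return total - non_gel + shared


TOTAL, NON_GEL, SHARED, PREDICTED = 1627, 1486, 574, 4237

gel = gel_based_count(TOTAL, NON_GEL, SHARED)  # proteins seen by gel-based methods
gel_only = gel - SHARED                        # seen only on gels
non_gel_only = NON_GEL - SHARED                # seen only by non-gel-based methods
coverage = TOTAL / PREDICTED                   # fraction of the predicted proteome

print(gel, gel_only, non_gel_only, round(coverage, 3))
```

Under that assumption, roughly 715 proteins were identified by gel-based methods, of which about 141 were not seen by any non-gel-based approach, and the combined coverage is about 38% of the predicted proteome, consistent with "more than one-third."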
Non-gel-based analyses can be done for the samples with or without tags, which cause different problems. The former condition results in poor recovery of peptides and proteins by specific amino acid residue labeling, while the latter causes higher complexity and inaccurate quantitation. These problems can lead to the identification of proteins with low confidence (or false positives). Thus, it would be helpful to develop a multiple labeling system for a given sample, which would allow MS analyses of each tag to eliminate false positives and increase confidence. The development of search algorithms and databases with high accuracy is of continued interest and importance. They have been continuously developed and updated, as shown in Table 3. The proper assignment of the MS/MS data to the sequences in the databases would enhance the quality and quantity of data collected by non-gel-based approaches.
Table 3. Useful databases for proteomic and related studies
Predictive Proteomics
Although the complete proteome of an organism cannot be obtained by gel-based or non-gel-based approaches, it can be predicted from the complete genome sequence. The predictive proteome of E. coli MG1655 was examined in this manner and was found to consist of 4,288 ORF products (28). The predictive proteome can be displayed like a 2-D gel, as shown in Fig. 3c, by plotting the predicted isoelectric points (pI) against the predicted molecular masses of the putative ORF products by use of the Compute pI/Mw tool in ExPASy (http://www.expasy.org/tools/pi_tool.html) (295) or virtual 2-D databases (Table 3). Predictive data can readily be compared with the experimental data from actual proteome analyses. For example, alkaline proteins (pI, >8.0), which account for 253 proteins (∼19%) out of 1,356 ORF entries identified, are currently underrepresented. Recently, a more realistic virtual 2-D gel was created based on the relationship between expression-level-dependent features in codon usage and protein abundance (194). Compared with results from a real 2-D gel experiment conducted with a protein extract from exponentially growing E. coli cells, many abundant proteins identified in the real gel corresponded to abundant proteins in the virtual 2-D gel. This computational approach can help researchers to determine the appropriate 2-D gel composition for optimal separation of proteins. Thus, predictive proteomics can be used to extract valuable information on the function, topology, localization, and structure of E. coli proteins. In recent years, many bioinformatics researchers have created and developed computer-based tools and databases, as shown in Table 3. For example, protein topology prediction methods allow identification of possible membrane-bound proteins, allowing researchers to predict protein location and sometimes even function and structure.
Several programs for predicting transmembrane segments exist, with prediction accuracies reportedly as high as 80% (124, 199, 330). Predicting the subcellular localization of proteins by computational methods has attracted considerable research interest in recent years. Computational methods for predicting protein subcellular localization can generally be divided into the following four categories based on the prediction method (58): (i) by the overall amino acid composition, (ii) by known targeting sequences, (iii) by sequence homology and/or motifs, and (iv) by a hybrid method which combines the above three elements. Several tools listed in Table 3 allow researchers to readily identify protein localizations and functions and to estimate the efficiencies of different methods, such as subcellular fractionation.
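The idea behind a predicted pI can be sketched as follows: the net charge of the polypeptide is computed from the Henderson-Hasselbalch relation for each ionizable group, and the pI is the pH at which that charge is zero. The pKa values below are illustrative assumptions and are not necessarily those used by the Compute pI/Mw tool; the function names are hypothetical, and modifications and local structural effects are ignored.

```python
# Illustrative side-chain and terminal pKa values (assumed; tools differ).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0, "N_TERM": 9.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_TERM": 2.0}


def net_charge(sequence, ph):
    """Net charge of an unmodified polypeptide at a given pH."""
    counts = {aa: sequence.count(aa) for aa in "KRHDECY"}
    counts["N_TERM"] = counts["C_TERM"] = 1
    positive = sum(
        counts[g] / (1.0 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items()
    )
    negative = sum(
        counts[g] / (1.0 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items()
    )
    return positive - negative


def isoelectric_point(sequence, tol=1e-4):
    """Bisection on pH: net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid  # still positive, so the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0
```

Applying `isoelectric_point` to every predicted ORF product and pairing the result with a computed molecular mass gives the coordinates of a virtual 2-D gel; entries with a value above 8.0 would fall in the alkaline class discussed above.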
Recently, a neural network-based method was used to predict the bonding state of cysteines from the protein sequence (187), allowing researchers to predict the entire content of disulfide-rich proteins in a proteome (the so-called disulfide proteome). The formation of disulfide bonds between the paired cysteine residues is a key step in the folding process of many proteins. This method predicted the percentage of proteins with disulfide bonds (6% of 4,173 proteins) in E. coli K-12 with 86% accuracy. The percentage of proteins with disulfide bonds is higher in the extracytoplasmic compartment (18% of 405 proteins) than in the cytoplasmic space (5% of 2,796 proteins), confirming that the extracytoplasmic proteins are more likely to form disulfide bridges due to a more oxidizing environment.
In addition, predictive proteomics can identify a significant number of previously unknown candidate proteins within an organism or might reveal interesting characteristics of the organism. For example, the histograms of pI values computationally estimated for all predicted ORF products encoded by the fully sequenced genomes revealed bimodality in bacterial and archaeal genomes and trimodality in eukaryotic genomes (265). The nuclear proteins have a broader distribution that accounts for the third mode observed in eukaryotes. This distribution suggests that whole-proteome pI values correlate with subcellular localization of proteins. However, even with all the benefits of computational approaches, the probable functional relations obtained in silico must still be confirmed at least in vitro and ideally in vivo.
CURRENT STATUS OF THE E. COLI PROTEOME
The recent studies on the E. coli proteome can be classified into two main topics: proteomics for biology and proteomics for biotechnology. An enormous number of E. coli proteome studies have focused on improving our biological knowledge regarding proteins and finding members of regulons and/or stimulons under particular conditions (290, 292); these studies are referred to as “proteomics for biology.” Other groups have studied the E. coli proteome under various genetic and/or environmental perturbations in an effort to develop strategies for improving cellular properties and enhancing the production of bioproducts based on comparative proteome profiles (95); these studies are referred to as “proteomics for biotechnology.”
Proteomics for Biology
Proteomics has changed the way in which cellular physiology is studied. Previously, one or more proteins were chosen as models for understanding local physiological phenomena. These days, proteomic studies allow researchers to identify many members of stimulons or regulons and to obtain information that indicates which specific proteins should be studied further. When subjected to environmental perturbations, E. coli cells undergo fundamental changes in cellular physiology and/or morphology, as reflected and directed by changes in the global gene and protein expression patterns. Up- and downregulation of specific protein sets is seen in response to a number of chemical and physical stresses, such as heat, oxidative agents, and hyperosmotic shock; these responses are thought to act as protective mechanisms leading to elimination of the stress agent and/or repair of cellular damage. Thus, the cellular responses, as reflected by the proteome, can differ widely according to the stresses imposed. Comparative proteome profiling under various genotypic and environmental conditions can reveal new regulatory circuits and the relative abundances of protein sets at the system-wide level.
In one of the first studies using proteomics, comparison of 2-D gels allowed identification of a large group of E. coli heat shock proteins (166, 208, 210, 296). In the following years, many E. coli proteomic studies revealed changes in proteome profiles in response to various stresses, such as changes in pH (24, 27, 274, 323), cell density (70, 325), and temperature (109, 291); organic solvents (321); nutrient starvation (293, 311); and anaerobic conditions (271) (Table 2). These studies resulted in the identification of various E. coli stress-induced stimulons (Fig. 4). The applied stresses were found to affect the observable proteome size by anywhere from a few proteins to nearly half of the proteins in the cell. Some of the altered proteins appear to be general stress-induced proteins, while others appear specific to particular environmental stimuli. More importantly, these studies also showed that the responses of an organism to an environmental stimulus are not simply the sum of independent responses of individual genes but rather seem to be a coordinated series of linked events leading to cross-adaptation among the stress responses.
Fig. 4. The cascade-like regulation observed with various stimulons and/or regulons in a complex regulatory network. The circles indicate regulons, while the rectangles indicate stimulons. Stimulons in which proteins are induced by stimuli such as stationary phase, temperature shock, pH variation, oxidative stress, and starvation are shown in the respectively labeled panels. Regulons shown in large circles are accompanied by small circles which represent major regulators for the corresponding stimuli. One signal activates or represses many regulators, as shown in small circles, to control the transcription and translation of various genes, leading to complex interactions in the cell. For example, E. coli cells enter the stationary phase in response to complex stresses such as cell growth, increased cell density, the presence of byproducts or toxic substances, and inappropriate conditions (restriction of oxygen, low/high temperature and pH, and limitation of nutrients). This complex response is mediated by a variety of specific regulators in addition to the master regulator, RpoS, which is controlled by itself or by other proteins (see the text for a detailed explanation). Abbreviations: HNS, nucleoid-associated protein; IHF, integration host factor; Lrp, leucine-responsive protein; Fis, factor for inversion stimulation.
The complex and physiologically far-reaching responses of E. coli are often under the control of master regulators located at the interface between upstream signal processing and downstream regulatory mechanisms. These master regulators, which serve as the decisive information-processing units, connect complex signaling networks with the downstream regulatory cascades or networks that ultimately control expression of the response-associated genes. A regulon is a set of proteins whose synthesis is regulated by the same regulatory proteins, while a stimulon is a set of proteins whose amount or synthesis rate changes in response to a certain stimulus. At the molecular level, each stimulon may be composed of more than one regulon, each controlled by a different molecular factor. The dissection of stimulons into regulons is based on comparison of the induction patterns in wild-type cells versus those of strains having mutations in known regulatory elements. In this way, regulators such as the RNA polymerase sigma factor RpoS (192), the histone-like protein H-NS (19, 118, 158), and the leucine-responsive regulatory protein Lrp (61) have been studied based on the comparative analysis of wild-type and mutant or double mutant strains.
The highly complex and nonlinear behaviors of these networks have complicated their studies, but proteomic analysis of cells stimulated by one or more signals (stimulons) or cells lacking one or more global regulators (regulons) has provided new insight into the stimulus-response processes in E. coli. Proteomics has allowed researchers to isolate new stress-associated and stress-specific protein markers, identify all proteins controlled by a certain regulatory protein, and understand the integrated cellular metabolic and regulatory networks. Here, several main cellular responses of E. coli under different conditions are described in more detail based on proteome analyses. Moreover, a main regulator and/or its coordinated regulators in response to each stress are discussed.
Stationary-phase response. At the onset of the stationary phase, E. coli cells undergo a global modification of their protein expression pattern, leading to the acquisition of resistance to complex stresses such as increased cell density, the presence of toxic byproducts, and nutrient limitation. Overall, these properties result in better cell survival under adverse conditions. One top-level master regulator of this genetic program is an RNA polymerase sigma factor called RpoS (σS, σ38, or KatF), which is encoded by the rpoS gene (104, 155). This σS has been reported to control an E. coli regulon comprising 70 or more genes expressed in response to starvation or during the transition to stationary phase (104). These genes were identified by transcriptional analysis of specific genes as well as by proteomic approaches (Fig. 4).
In terms of upstream signal processing, the σS regulon can be divided into subfamilies of genes regulated by specific stresses and/or additional global regulatory proteins. As shown in Fig. 4, many of these subsets of σS-dependent genes or proteins may also be induced by stresses such as anaerobiosis (12), oxidative stress (6), and osmotic stress (105). Additionally, these genes are regulated by transcription factors specific for certain stress responses (e.g., OxyR, which is involved in the oxidative stress response) or more-global regulators, such as H-NS, IHF, cyclic AMP (cAMP)-cAMP receptor protein (CRP), and Lrp (104). These regulators can individually or coordinately affect many σS-dependent stationary-phase-responsive proteins. For instance, comparative proteome analysis of an H-NS deletion mutant and the wild-type strain revealed that some of the σS-dependent proteins or genes, including rpoS itself, were controlled by the H-NS regulon (19).
In terms of downstream regulatory mechanisms, the starvation-induced DNA protection protein Dps, which is one of the σS-dependent stationary-phase-responsive proteins, was also found to affect the expression of other proteins in E. coli (5). Dps was rapidly degraded during exponential growth by the protease ClpXP (which is regulated by σ32 or RpoH) but was stabilized under conditions of carbon starvation or oxidative stress. This stabilization, along with increased Dps synthesis, results in the high-level accumulation of Dps during the stationary phase (276), showing that Dps levels are specifically controlled under certain stress conditions. In addition, studies have shown that σS itself is also controlled by the subregulatory protease ClpXP and the recognition factor RssB (332). Collectively, these findings indicate that σS is controlled by a complex signal transduction network with redundancy, additivity, and internal feedback regulatory loops, resulting in its sophisticated regulation.
Temperature response. Protein expression in E. coli can be altered significantly when cells are grown at temperatures outside the normal range. This response plays a critical role in protecting the cells from temperature stress, producing tolerance, or repairing cellular systems. The E. coli cellular response to high temperature includes the synthesis of a set of highly conserved proteins known as the heat shock proteins (249). Similarly, a separate, nonoverlapping group of proteins known as cold shock proteins is produced during the period of growth cessation following a shift from 37°C to 10°C (283).
Many of the heat shock proteins are molecular chaperones that function to bind newly synthesized partially folded or unfolded proteins and promote their folding and refolding by limiting the nonproductive interactions that lead to aggregation and misfolding. Some of the other heat shock proteins are proteases that function to degrade misfolded or abnormal proteins (209). These proteins were first recognized as being highly abundant when examined with 2-D gels in 1978 (166). Later, Neidhardt and colleagues (208, 210, 211, 296) used 2-D gels to monitor the synthesis rates of individual proteins before and after heat shock and identified a number of proteins whose synthesis rates were dramatically increased following the temperature increase. Initially, these proteins were named by their positions on 2-D gels. Later, many of the spots were identified as known gene products, including DnaK (77), GroEL (211), GroES (284), GrpE (335), La (lon) protease (237), and LysU (168). For E. coli, at least 34 heat shock proteins have been identified to date by use of a combination of genomics, transcriptional analysis of specific genes, and proteomics (Fig. 4). The characterized proteins include the main cellular chaperones, DnaK and GroELS; the ATP-dependent proteases ClpP, DegP, FtsH (HflB), HslVU, and La; and other proteins involved in protein folding, refolding, quality control, and degradation (249). Other important heat shock proteins include HTS (homoserine transsuccinylase), which is a key enzyme in methionine biosynthesis (23); protein pairs involved in protein isomerization, such as HtrM (240) and PpiD (55); and the vegetative sigma factor σ70 (33, 91, 282).
The synthesis of the major heat shock protein regulon is controlled by the alternative sigma factor σ32 (encoded by the rpoH gene), which guides RNA polymerase to the heat shock promoters (91). In addition, E. coli contains a second heat-responsive regulon, which is controlled by an alternative sigma factor, σE (encoded by the rpoE gene) (91). It is thus possible that the heat-mediated induction of some genes may occur via other mechanisms and regulators that remain to be elucidated. For example, members of the phage shock protein (Psp) family, which are induced in response to filamentous phage infection as well as in response to heat, ethanol, and osmotic shock, do not require the action of σ32 (32). Furthermore, a large set of heat shock proteins was found to be induced by other stimuli, such as exposure to denaturing conditions (i.e., the presence of alcohols or of heavy metals) (91). The proteins induced under various stress conditions can overlap with one another to degrees ranging from complete overlap to no overlap at all. For example, in E. coli, the heat shock and ethanol stimulons overlap, while the heat shock and cold shock responses have no shared proteins (133).
Additional information on the heat shock response has been obtained by examination of subproteomes. For example, proteins damaged or unfolded by elevated temperatures during heat shock tend to aggregate (198); thus, a proteomic study of the aggregates can be used to define the thermally unstable proteins. This study is also important for the elucidation of cellular protein quality control mechanisms, because damaged proteins can be refolded with the aid of chaperones or can be degraded by proteases (84). An example of one such study is the investigation of E. coli aggregates at various temperatures, which contained 350 to 400 protein species that were all classifiable as substrates of the ClpB and DnaK chaperones (285). Another proteomic study on the DnaKJ- and ribosome-associated trigger factor mutant strains revealed approximately 340 spots of aggregated proteins in two mutant strains. All major aggregated proteins were shared between the two mutants, indicating that they cooperatively assist the folding of newly synthesized proteins in E. coli (57). A similar study indicated that the major cytosolic energy-dependent proteases are involved in preventing aggregation, because protein aggregates formed in their absence showed increased concentrations and contained more protein species (250). These studies suggest that it may be possible to use proteomic analysis of aggregates to identify specific substrates for the various chaperones and proteases.
One major advantage of using 2-D gels to analyze the heat shock proteome is the ability to discriminate posttranslationally modified proteins from their nonmodified forms. The major chaperones, DnaK and GroEL, appear as multiple spots on a narrow range of 2-D gels (pH 4.5 to 5.5), indicating that they may be present in several forms within the E. coli cell (Table 2). In addition, MALDI-TOF peptide mass fingerprinting allowed identification of E. coli ribosomal proteins with posttranslational modifications such as acetylation, methylation, β-methylthiolation, multiple methylations, and amino acid cleavage (11). These findings provide important insights into the regulatory mechanisms of heat shock response by protein modification.
Proteome analysis of E. coli cells adapting to low temperatures has also been carried out. In E. coli, a downshift in temperature causes transient inhibition of most protein synthesis, resulting in a growth lag called the acclimation phase. During the acclimation phase, a group of cold shock proteins is dramatically induced (Fig. 4), while the heat shock proteins remain repressed (133). A single regulatory factor for the cold shock response has not been identified, but some cold-induced proteins are essential for the cells to resume growth at low temperature and have been shown to function in various cellular processes. CspA, CspB, and CspG, which belong to a family of structurally related cold shock proteins, show the highest induction in response to cold shock (207). CspA has long been suspected to play a regulatory role in this response by destabilizing mRNA secondary structures, allowing more-efficient translation at low temperatures (81). The CspA mRNA appears to become more stable during a shift to temperatures below 30°C. In addition to members of the Csp family, 12 out of 16 cold shock proteins in E. coli have been further identified by 2-DE (283). As shown in Fig. 4, these include the following proteins: GyrA, the α-subunit of the topoisomerase DNA gyrase (132, 134); H-NS, a nucleoid-associated DNA-binding protein required for optimal growth at low temperature (157); Hsc66, a homolog of Hsp70 (165); NusA, which is involved in both termination and antitermination of transcription (132); PNP, which is an exoribonuclease (132); and RecA, which is involved in recombination and induction of the cold shock response (283). In addition, three cold shock proteins have been shown to be ribosome associated: initiation factor 2 (IF2); CsdA, which is an RNA-unwinding protein (132); and ribosome-binding factor A (RbfA) (130, 132).
The latter group is interesting in that cells experiencing the transition to a low temperature showed accumulation of 70S monosomes, a decreased number of polysomes, and stabilized RNA and DNA secondary structures (283). This transition decreases the efficiency of mRNA translation and leads to a transient reduction in cell growth rate (131). Both CsdA and RbfA are required for optimal growth at low temperatures, indicating that these two newly produced ribosome-associated proteins, along with enhanced synthesis of IF2, are required for efficient ribosomal function at low temperatures. Comparative 2-DE proteomic analyses of E. coli cultures treated with various antibiotics or with temperature shock (291) showed that the heat and cold shock responses could be mimicked by different sets of ribosome-targeting antibiotics. For example, streptomycin, puromycin, and kanamycin induced protein expression patterns that resemble the heat shock and stringent responses, while tetracycline, chloramphenicol, erythromycin, fusidic acid, and spiramycin invoked the cold shock and relaxed ribosomal responses (291). Based on these findings, researchers have suggested that translational blocks induce heat shock-like or cold shock-like responses, indicating that the state of the ribosome or a ribosomal product may signal these responses.
A recent study on the E. coli cells experiencing a temperature shift from 37°C to 4°C using improved 2-DE methods showed that 69 proteins were overexpressed and that the total number of proteins decreased by 40% (233). Further 2-DE studies on cold shock responses were carried out under several different conditions, including suspended cells at the exponential phase, suspended cells at the stationary phase, and cells immobilized on 2% (wt/vol) agar. Comparative analysis of the proteomes obtained with and without cold shock allowed identification of 203 protein spots showing expression changes during the temperature downshift (between 10 °C and 4°C or when 4°C was reached) compared with synthesis at 37°C using a principal component analysis (234). In suspended E. coli cells, the synthesis levels of 91 proteins (71 newly synthesized proteins and 26 induced proteins) were altered at the exponential phase after cold shock, and the synthesis levels of 59 proteins (34 newly synthesized proteins and 25 induced proteins) were changed at the stationary phase after cold shock. In immobilized E. coli cells, the synthesis of 53 proteins (33 newly synthesized proteins and 20 induced proteins) was induced by cold shock. These results suggest that the number of cold shock-induced proteins was originally underestimated and that further proteomics work will likely uncover a large cohort of cold shock response proteins.
pH response. Since E. coli requires homeostasis of internal pH in the range from 7.4 to 7.9, the pH response is important for cellular growth and survival under conditions of fluctuating pH levels. In addition, pH plays an important role in pathogenic bacteria. Pathogenic strains of E. coli, Salmonella enterica serovar Typhimurium, Shigella flexneri, and Vibrio cholerae often encounter extreme pH conditions both within and outside human hosts. During pathogenesis, cells are exposed to low pH in the stomach or within the phagosomes and phagolysosomes of intestinal epithelial cells or macrophages. Consequently, low pH induces virulence factors that contribute to pathogenesis, such as the virulence regulator ToxR in V. cholerae (268). Perturbations in pH can exert many significant effects on cell growth and induce different classes of proteins, such as stress proteins, redox modulators, and envelope proteins (223). The major pH-responsive proteins that have been identified by genomics, proteomics, and other technologies (268) are shown in Fig. 4. Among them, the acid stress chaperones HdeA and HdeB enhance survival under extremely acidic conditions (10, 72), while the membrane-bound Na+/H+ antiporter NhaA protects cells from excess Na+ under high-external-pH conditions (141). Proteome analyses have allowed identification of a number of other pH-responsive proteins in E. coli. Initially, 2-D gels were used to elucidate individually the cellular protein responses to changes in pH (294), to aerobiosis, and to anaerobiosis (270, 271). The pH-dependent response is often coinduced by other environmental factors, such as growth phase, anaerobiosis, and various metabolites. Thus, most of the proteomic studies on pH responses have been performed under specific aerobic and/or anaerobic conditions, allowing identification of new classes of acid- and base-dependent regulators and dissection of the relationship between pH and oxygen levels. For example, Blankenhorn et al. 
(27) identified a total of nine pH-responsive proteins from 18 spots during aerobic or anaerobic growth: five acid-induced proteins (GatY, ManX, PtsH, YfiD, and the aerobic acid-induced protein AhpC); three base-induced proteins (MalE, TnaA, and the anaerobic base-induced protein GadA); and one aerobic acid- or base-induced protein (AceA). Stancik et al. (274) identified a total of 22 pH-dependent proteins from 44 spots during aerobic or anaerobic growth: 9 acid-induced proteins (DeaD, LuxS, RibB, SodB, SucB, SucC, YcdO, YdiY, and YfiD); 11 base-induced proteins (AroK, CysK, DksA, DsbA, MalE, OmpA, Pta, RpiA, Ssb, TnaA, and YceI); and 2 acid- or base-induced proteins (OmpX and Tpx). Furthermore, Yohannes et al. (323) identified a total of 28 pH-dependent proteins from 32 spots in anaerobic cultures: 11 acid-induced proteins (AckA, GatY, GuaB, HdeA, Hmp, Lpd, NikA, OmpA, Ppa, TolC, and Tsf) and 17 base-induced proteins (AccB, DegQ, DhaK [YcgT], DhaL [YcgS], DhaM [YcgC], DsbA, GapA, HisC, HisD, MalB, MglB, OppA, ProX, Tig, TnaA, UspA, and YjgF). These findings indicate that low pH accelerates acid consumption and proton export while coinducing the oxidative stress and heat shock regulons. In contrast, high pH accelerates proton import while repressing the energy-expensive flagellar and chemotaxis regulons. Furthermore, pH differentially regulates a large number of periplasmic and envelope proteins as well as enzymes involved in several pH-dependent amino acid and carbohydrate catabolism pathways. High pH was shown to favor the catabolic pathways that generate NH3 and fermentation acids (AstD, CysK, DhaKLM, GabT, GadAB, GapA, SdaA, and TnaA), whereas low pH favored pathways that generate CO2 and amines (Adi, CadA, GadAB, and SpeF). Among these, Adi, AstD, CadA, CysK, GabT, SpeF, and TnaA were also significantly induced by anaerobiosis. 
Recently, researchers have used enrichment by column chromatography on reactive dye columns for the proteomic investigation of low-abundance acid-responsive proteins in E. coli grown at either pH 7.0 or pH 5.8 (24). This work allowed identification of new pH-responsive proteins: six acid-induced proteins (EF-Ts, GdhA, PanC, ProC, TkrA, and YodA) and five acid-repressed proteins (AroG, EF-Tu, FabI, GlyA, and PurA).
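Because the three 2-DE studies above (27, 274, 323) were run under different aeration conditions, it can help to tabulate which proteins recur across them. The following is a minimal sketch, not a method used by any of the cited studies: it simply transcribes the acid-induced and base-induced lists given in the text into Python sets and counts recurrences. The dictionary layout and the names `studies` and `tally` are illustrative assumptions; proteins reported as acid- or base-induced (AceA, OmpX, and Tpx) are left out, and alternative names in brackets are ignored.

```python
from collections import defaultdict

# Protein lists transcribed from the text; keys are illustrative labels.
studies = {
    "Blankenhorn (27)": {
        "acid": {"GatY", "ManX", "PtsH", "YfiD", "AhpC"},
        "base": {"MalE", "TnaA", "GadA"},
    },
    "Stancik (274)": {
        "acid": {"DeaD", "LuxS", "RibB", "SodB", "SucB", "SucC",
                 "YcdO", "YdiY", "YfiD"},
        "base": {"AroK", "CysK", "DksA", "DsbA", "MalE", "OmpA",
                 "Pta", "RpiA", "Ssb", "TnaA", "YceI"},
    },
    "Yohannes (323)": {
        "acid": {"AckA", "GatY", "GuaB", "HdeA", "Hmp", "Lpd",
                 "NikA", "OmpA", "Ppa", "TolC", "Tsf"},
        "base": {"AccB", "DegQ", "DhaK", "DhaL", "DhaM", "DsbA",
                 "GapA", "HisC", "HisD", "MalB", "MglB", "OppA",
                 "ProX", "Tig", "TnaA", "UspA", "YjgF"},
    },
}


def tally(direction):
    """Map each protein to the list of studies reporting it for `direction`."""
    seen = defaultdict(list)
    for study, lists in studies.items():
        for protein in lists[direction]:
            seen[protein].append(study)
    return seen


acid = tally("acid")
base = tally("base")

# Proteins reported with the same direction in at least two studies.
recurrent_acid = sorted(p for p, s in acid.items() if len(s) >= 2)
recurrent_base = sorted(p for p, s in base.items() if len(s) >= 2)
# Proteins listed as acid induced in one study and base induced in another.
conflicting = sorted(set(acid) & set(base))

print("acid induced in >=2 studies:", recurrent_acid)
print("base induced in >=2 studies:", recurrent_base)
print("opposite directions across studies:", conflicting)
```

With the lists as transcribed, GatY and YfiD recur among the acid-induced proteins; DsbA, MalE, and TnaA recur among the base-induced proteins (TnaA in all three studies); and OmpA appears with opposite directions in two studies. Such an apparent conflict may simply reflect the different aerobic and anaerobic conditions used, not a contradiction between the studies.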
During the growth of E. coli, the external pH can be substantially changed by fermentative generation of acids or through aerobic consumption of acids. External acids which show an amplified uptake in response to increased pH gradients, such as acetic and formic acids, have been shown to induce heat shock proteins, oxidative stress proteins, and the RpoS regulon (146, 256). Several studies using proteomic approaches have revealed that benzoate induced heat shock and universal stress proteins (154), while propionate induced pH-responsive proteins such as AhpC, GatY, ManX, and YfiD (27). Treatment of E. coli cells with acetic acid increased the expression levels of 37 proteins, including periplasmic transporters for amino acids and peptides (ArtI, FliY, OppA, and ProX), metabolic enzymes (GatY and YfiD), the RpoS growth phase regulon, and the autoinducer synthesis protein LuxS (146). In contrast, acetic acid repressed 17 proteins, including phosphotransacetylase (Pta) (146). Similarly, an ackA-pta deletion, which abrogated the interconversion between acetate and acetyl-coenzyme A (CoA), led to elevated basal levels of 16 of the acetate-inducible proteins, including the RpoS regulon. Consistent with RpoS activation, the ackA-pta strain also showed constitutive extreme-acid resistance (146). On the other hand, treatment of E. coli cells with formic acid repressed 10 of the acetate-inducible proteins, including the RpoS regulon (146). Acetic and formic acids appear to exert opposite effects on proteins such as arginine-binding periplasmic protein 1 (ArtI), DNA-protecting protein during starvation (Dps), cysteine-binding periplasmic protein (FliY), tagatose-bisphosphate aldolase (GatY), the extreme-acid periplasmic chaperones (HdeA and HdeB), hyperosmotically inducible protein Y (OsmY), and 6-phosphofructokinase isozyme 2 (PfkB). Membrane-permeable acids also induce the Mar multiple drug resistance regulon, which is coregulated by the SoxRS superoxide stress system (252). 
Several genetic systems are coregulated by pH and growth phase; for example, the RpoS growth phase sigma factor regulates several components of resistance to both acids and bases. Thus, the effects of pH on global cellular regulation are complex because they overlap with other environmental factors such as oxygenation, growth phase, and various metabolites. Current proteomic studies continue in an effort to dissect the relationships among the effects of pH, oxygen level, and osmolarity from combinatorial stimuli.
Oxidative stress response. Reactive oxygen species (ROS) are produced as an inescapable consequence of aerobic life and are maintained at low, tolerable levels within cells by the actions of specific enzymes, such as superoxide dismutase (SodA). The expression levels of these defense enzymes are modulated in response to the environmental oxidative threat. However, this basic protection is not sufficient to protect cells against sudden large increases in ROS, which can act negatively on important cellular materials, including lipids, proteins, certain enzyme prosthetic groups, and DNA (183). To cope with oxidative stress, E. coli cells trigger rapid global responses designed to eliminate ROS, repair oxidative damage, bypass damaged functions, and induce adapted metabolism, thus allowing the cells to persist under high-ROS conditions. In E. coli, the SoxRS and OxyRS regulatory systems are known to control many of the oxidative stress-responsive proteins.
The soxRS regulon is induced in a two-stage process. Upon activation, SoxR induces expression of the soxS gene in response to superoxide-generating agents, and then SoxS activates transcription of genes within the regulon. About 40 E. coli proteins are induced, including the following: superoxide dismutase (SodA), which might associate with DNA to provide special protection from superoxide damage (275); endonuclease IV (Nfo), which is involved in DNA repair (40, 169); glucose-6-phosphate dehydrogenase (Zwf), whose increase is expected to elevate the pool of NADPH (254); fumarase (FumC) (174); aconitase (AcnA) (73); and NADPH:ferredoxin oxidoreductase (Fpr), which may serve to maintain FeS groups in the reduced form (173). The E. coli acn mutants were shown to be hypersensitive to the redox stress reagents H2O2 and methyl viologen (278). Physiological and enzymological studies have shown that AcnB is a major citric acid cycle enzyme synthesized during the exponential phase, whereas AcnA is a more stable stationary-phase enzyme, which is also specifically induced by iron and oxidative stress (53). Proteomic analyses have further revealed that the level of SodA is enhanced in acnB and acnAB mutants and by exposure to methyl viologen (278). The amounts of other proteins, including thioredoxin reductase, 2-oxoglutarate dehydrogenase, succinyl-CoA synthetase, and chaperones, were also affected in the acn mutants. These studies demonstrated that AcnA enhances the stability of the sodA transcript, whereas AcnB lowers its stability. Thus, aconitases serve as a protective buffer against the basal level of oxidative stress that accompanies aerobic growth by acting as a sink for ROS and modulating translation of the sodA transcript.
Similarly, the OxyRS regulon is also induced in a two-stage process. Exposure of cells to H2O2 in a range from 5 to 200 μM activates OxyR and enhances the synthesis of ∼40 proteins (183), including HPI catalase (KatG) (279), an NADPH-dependent alkyl hydroperoxide reductase (AhpCF) (279), glutathione reductase (GorA) (183), and Dps, which nonspecifically binds DNA to protect cells from H2O2 toxicity (6). In an oxyR-deleted mutant strain, 20 to 30 enzymes were found to remain H2O2 inducible (86). Some of these enzymes are also elevated during other stress responses, including exposure to redox cycling agents and heat shock.
These enzyme responses to oxidative stress are underpinned by metabolites or proteins such as NADPH, NADH, thioredoxin, and glutathione, which remove harmful oxygen species by stoichiometric reactions. In particular, thioredoxin, a ubiquitous and evolutionarily conserved protein, modulates the structure and activity of proteins involved in a spectrum of processes, such as gene expression and the oxidative stress response (183). A comprehensive analysis of the thioredoxin-linked E. coli proteome was performed by using tandem affinity purification and nanospray microcapillary MS/MS (151). A total of 80 thioredoxin-associated proteins were identified, and their various functions suggest that thioredoxin is involved in at least 26 distinct cellular processes, including transcription regulation, cell division, energy transduction, and several biosynthetic pathways. These thioredoxin-associated proteins either participate directly (AhpC, KatG, and SodA) or have key regulatory functions (AcnB and Fur) in cellular detoxification. Transcription factors including NusG, OmpR, and RcsB, which are generally not considered to be under redox control, were also associated with thioredoxin, providing compelling evidence for an extensively coupled network of redox regulation in E. coli.
E. coli cells treated with nontoxic levels of the superoxide-generating redox cycling agents menadione and paraquat showed dramatic changes in protein composition as monitored by 2-D gel analysis (87). The distribution of proteins synthesized after treatment with these agents overlapped significantly with that seen after H2O2 treatment. In addition, the redox cycling agents elicited the synthesis of at least 33 other proteins that were not H2O2 responsive. These include three heat shock proteins (C41.7, C62.5, and GroES), the Mn-containing superoxide dismutase (SodA), the DNA repair protein endonuclease IV (Nfo), and glucose-6-phosphate dehydrogenase (Zwf). At least some of these redox inducible proteins appear to be part of a specific response to intracellular superoxide, indicating that E. coli cells are equipped with a network of inducible responses against oxidative damage which are controlled via multiple regulatory pathways.
Starvation response. The complex response of E. coli to nutrient starvation includes the sequential synthesis of starvation-inducible proteins. Although starvation for different individual nutrients generally provokes unique and individual patterns of protein expression, there are some overlaps among the starvation stimulons. Proteome analyses revealed that the subset of proteins involved in protein synthesis in E. coli was greatly increased during growth inhibition caused by depletion of various nutrients, such as carbon, nitrogen, phosphate, sulfate, and amino acids. For example, SspA expression increased with decreasing growth rate and was induced by glucose, nitrogen, phosphate, or amino acid starvation. Furthermore, the proteome profiles during the exponential growth phase showed that the expression levels of at least 11 proteins were altered in sspA mutant strains (314). These findings indicate that SspA acts as a transcription factor and is essential for starvation stress-induced tolerance (e.g., stationary phase) in E. coli.
At the onset of glucose starvation, cyclic AMP and its receptor protein (cAMP-CRP) were found to play important roles in the expression of a number of genes. An early 2-DE study identified five glucose-responsive outer membrane proteins (four upregulated and one downregulated) (186). A comparison with membrane proteins from mutant strains revealed that two of the upregulated proteins were the receptors for λ and T6, and coelectrophoresis of the outer membrane fraction identified the downregulated protein as OmpA. The glucose starvation stimulon was further examined using 2-DE followed by comparison to the E. coli gene-protein database (218). Members of this stimulon were found to include enzymes of the Embden-Meyerhof-Parnas pathway, phosphotransacetylase (Pta) and acetate kinase (AckA) in the acetic acid pathway, and formate transacetylase. Tricarboxylic acid cycle enzymes were repressed, whereas enzymes involved in acetate and formate production and the Embden-Meyerhof-Parnas pathway were induced. These modulations suggest that a glucose-starved cell increases the relative flow of carbon through the Pta-AckA pathway. Indeed, pta and pta-ackA mutants were found to be impaired in their abilities to survive glucose starvation, indicating that the capacity to synthesize acetyl phosphate, an intermediate of this pathway, is indispensable for glucose-starved cells. The pta mutant failed to induce several proteins of the glucose starvation stimulon. More recently, proteome studies revealed that glucose limitation upregulates the levels of proteins such as AceA, AldA, ArgT, AtpA, DppA, GatY, LivJ, MalE, MglB, RbsB, UgpB, and YdcS (311). Of these, ArgT, DppA, LivJ, MalE, MglB, RbsB, UgpB, and YdcS are periplasmic binding proteins of the ABC transporters, suggesting that in addition to the central metabolism proteins, periplasmic binding proteins are involved in the carbohydrate and amino acid uptakes that are important during glucose limitation.
Inorganic phosphate is the preferred source of phosphorus for E. coli. Phosphonates are commonly found in nature and can serve as an alternate phosphorus source when inorganic phosphate is depleted, but this causes a significant decrease in growth rate. Researchers have used 2-DE to examine the effects of inorganic phosphate limitation and the use of phosphonates as the sole phosphorus source (293). Depletion of inorganic phosphate was shown to induce the expression of 208 proteins and reduce the levels of 205 proteins, whereas growth on phosphonate induced 227 proteins and reduced the levels of 30. Comparison of these stimulons revealed that 118 of the induced proteins and 19 of the proteins with reduced levels were shared, suggesting that these may be involved in the adaptive response to phosphate limitation. The large number of downregulated proteins (205 proteins) involved in inorganic phosphate starvation compared with the number involved in growth on phosphonate indicates that the starvation response is more strongly characterized by repression.
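The degree of overlap between the two stimulons can be made explicit with a short calculation. The sketch below uses only the spot counts quoted above (293); the helper name `overlap` and the choice of the Jaccard index as a summary statistic are illustrative assumptions and not part of the cited study.

```python
def overlap(shared, n_a, n_b):
    """Summarize the overlap between two sets given only their sizes.

    shared: number of members common to both sets
    n_a, n_b: total sizes of set A and set B
    """
    return {
        "fraction_of_a": shared / n_a,
        "fraction_of_b": shared / n_b,
        # Jaccard index: |A and B| / |A or B|
        "jaccard": shared / (n_a + n_b - shared),
    }


# A = inorganic phosphate limitation, B = growth on phosphonate.
induced = overlap(shared=118, n_a=208, n_b=227)
reduced = overlap(shared=19, n_a=205, n_b=30)

for label, stats in (("induced", induced), ("reduced", reduced)):
    print(label, {k: round(v, 3) for k, v in stats.items()})
```

On these counts, the 118 shared induced proteins account for roughly 57% of the phosphate limitation set and 52% of the phosphonate set, whereas the 19 shared reduced proteins make up only about 9% of the proteins reduced under phosphate limitation but about 63% of those reduced during growth on phosphonate.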
In E. coli, sulfur limitation leads to derepression of the cysteine regulon (cysB, cysE, cysDNC, cysJIH, cysK, cysM, cysPTWA, and sbp) and subsequent upregulation of cysteine biosynthesis (149). Maximal expression of the cys genes is seen during growth in limiting sulfur sources such as glutathione or L-djenkolic acid. On the other hand, growth in sulfate, sulfite, or thiosulfate leads to partial repression of these genes, and growth with sulfide, L-cysteine, and L-cystine leads to full repression (149). Comparative proteomic analyses have revealed that several proteins are induced in E. coli grown in media offering compounds other than sulfate or cysteine as the sole sulfur source (239, 299). Wild-type E. coli cells showed upregulation of sulfate starvation-induced proteins, such as CysK, Sbp, Ssi4, Ssi6, TauA, and TauD, during growth with lanthionine or glutathione as the sulfur source (Fig. 4). These sulfate starvation-induced proteins were significantly reduced or wholly absent in cbl mutants (299), indicating that the cbl gene product, a transcription factor governing the genes required for sulfonate-sulfur utilization, is required for the synthesis of sulfate-regulated proteins. Interestingly, although the cbl mutant grew on sulfate, it lacked production of CysK and Sbp, which are involved in the sulfate assimilation pathway. The ability of the cbl mutant to assimilate sulfate may be explained by the fact that E. coli contains CysM and CysP, which act as functional backups for CysK and Sbp, respectively (149). Additional sulfate starvation-induced proteins include the products of the tauABCD genes, which are required for utilization of taurine as the sulfur source for growth (298). These findings indicate that most of the genes involved are coordinately regulated as the cysteine regulon, and high-level expression of these genes requires sulfur limitation and the transcriptional regulator(s) CysB and/or Cbl.
The leucine-responsive regulatory protein (Lrp) has been shown to both positively and negatively regulate transcription of a number of genes in response to exogenous leucine (215); Lrp action is significantly activated by the absence of L-leucine in the growth medium, whereas it is repressed in the presence of L-leucine. On the other hand, exogenous leucine has little or no effect on the expression of some other Lrp-responsive proteins, such as glutamine synthetase (GlnA) and glutamate synthase (GltD) (61, 215). The total number of genes in the L-leucine/Lrp regulon was estimated to be between 35 and 75. The lower estimate comes from a comparison of 2-D gels from extracts of wild-type and lrp mutant strains grown with and without leucine. Some 30 proteins were clearly affected up or down by the absence of Lrp (61, 66, 172). The higher estimate is from a study of random λ placMu insertions in the E. coli genome with subsequent screening for Lrp-responsive proteins affected by L-leucine (171). Among the well-known proteins that are regulated by Lrp (Fig. 4) are upregulated proteins including DaaABCDE, FanABC, FimB, GcvTHP, GltBDF, IlvIH, LacZYA, LeuABCD, MalEFG, MalK-LamB-MalM, MalT, OmpF, PapBA, PapI, PntAB, SdaC, SerA, and SfaA and downregulated proteins including Fae, GlyA, Kbl-Tdh, LivJ, LivKHMG, Lrp, LysU, OmpC, OppABCDF, OsmY, and SdaA. These findings collectively indicate that the expression of many genes required for the transport and catabolism of amino acids and peptides is negatively regulated by Lrp, while the expression of genes required for amino acid biosynthesis and ammonia assimilation in a nitrogen-poor environment is positively regulated by Lrp (215).
Finally, the stringent response is a general starvation response mediated by guanosine 3′,5′-bispyrophosphate (ppGpp). RelA and SpoT strictly regulate the levels of ppGpp during growth-favorable conditions (263), while starvation increases the levels of ppGpp, leading to an abrupt decrease in rRNA and tRNA transcription and blockade of purine biosynthesis. Early studies showed that starvation and subsequent increases of ppGpp decreased the fidelity of protein translation (221). Later mutant studies suggested that the stringent response reduces the concentration of mistranslated proteins, which is critical for survival (185, 218). High ppGpp levels also increase the stationary-phase regulator RpoS (σS), accelerate protein degradation, and impair initiation of DNA replication (104). In contrast, depletion of ppGpp induces the so-called “relaxed response,” where transcription and translation factor synthesis remains high despite a growth lag. RpoS is involved in the signaling of many cell responses, including starvation, multiple stress responses, and inhibition of glycogen and trehalose synthesis (155, 192, 205). Induction of the stationary phase in response to starvation is also dependent on the ClpAP protease, which plays a key role in the degradation of growth phase proteins (308).
Other environmental responses. Cadmium is used in a variety of industrial applications and is a potential source of environmental contamination. Cadmium is readily taken up by bacterial cells, presumably by the Mn2+ uptake system, and can seriously damage the cell via its activity as a potent oxidative agent (297) and inhibitor of DNA replication (196, 219). At low cadmium concentrations, cells are able to adapt and resume growth after a period of stasis. This period appears to involve the repair of cadmium-mediated cellular damage and adjustment of the cell physiology to limit the distribution of the toxic ion in the cell (196). During cadmium-induced growth arrest, E. coli cells increase the synthesis of the cadmium-induced proteins (CDPs), which form the cadmium stress stimulon. Most of the CDPs are of unknown function, and only limited information as to the identities of the specific sensors or signals responsible for triggering the synthesis of these proteins is available (297). Some CDPs are members of well-characterized stress regulons (297). Only a limited number of proteins in these regulons are induced during cadmium exposure, and the synthesis of these CDPs constitutes a minor fraction of the overall cellular response (297). The CDPs identified by 2-DE include Adk, ArgI, ClpB, DnaK, H-NS, HtpG, MaoA, MetK, RecA, Tig, TyrA, UspA, W-protein, XthA, the cold shock protein G041.2, and five unknown proteins (64). Some CDPs were found to be members of the heat shock, oxidative stress, SOS, and stringent response regulons, while others appeared to be general stress-inducible proteins (e.g., H-NS and UspA). The synthesis rates of most of the immediate responders to cadmium exposure decreased when cell growth resumed, but seven CDPs, including ArgI, TyrA, and XthA, were found to maintain a high production rate during growth in the presence of cadmium (64). This type of E. coli response to cadmium may be employed to monitor cadmium contamination in the environment.
The effects of low concentrations of monochlorophenol, pentachlorophenol, and cadmium chloride as industrial pollutants on total cellular proteins in E. coli have been studied using 2-DE (62). Induction of previously identified stress-responsive proteins was noted, as were transient decreases in the synthesis rates of several other proteins, including OmpF and aspartate transcarbamoylase (ATCase). Their transient repression appears to be an overall response to stress elicited by different pollutants and may prove useful as a general and sensitive early warning system for pollutant stress.
Proteomics for Biotechnology
Researchers have found engineered E. coli to be of enormous value for both scientific and practical applications. To enhance the production of bioproducts and improve the performance of E. coli strains in various biotechnological processes, native or foreign genes have been amplified or deleted through recombinant DNA technology. These efforts initially involved trial-and-error approaches, in which various genetic modifications are repeatedly tried until a desired objective is achieved. However, since bioproducts are formed by coordinated enzyme functions acting through the metabolic pathways, it is essential to understand the metabolism and regulation that occur during cell growth and product formation. Recently, these investigations have been streamlined with the use of new high-throughput analytical, molecular biological, and mathematical tools, all of which have been combined to facilitate the development of “custom-made” production systems in E. coli. In this important context, proteome analysis enables measurement of whole-protein (enzyme) expression levels, facilitating the construction of metabolic pathways that researchers can use to elucidate which molecules supply the energy and building blocks or precursors (e.g., amino acids and other metabolic intermediates) necessary for cell function and product formation.
As described by VanBogelen et al. (292), several E. coli proteomic signatures can be used to monitor cellular states. First, the L7 (modified form)-to-L12 (unmodified form) ratio of ribosomal protein RplL, which is highly correlated with the growth rate, can serve as a specific biospectrophotometric marker for monitoring cell growth. Second, some heat and cold shock proteins, which are increased at the temperature extremes, can be used as cellular thermometers. Third, the RecA protein can be used as an initial indicator of loss of chromosome function. Furthermore, conditional promoters activated by environmental changes such as stationary phase, pH, temperature, and nutrient limitation may be used for efficient production of heterologous proteins in bacteria and also for developing strains for bioremediation purposes (190). Indeed, the heat-inducible and inorganic phosphorus-responsive promoters have been widely used in numerous laboratories (3, 37, 38, 115, 139). In addition, the genes encoding proteins that confer tolerances to acid, heat, and toxic substances have been successfully used for the improvement of cellular properties and enhanced degradation of toxic chemicals. Survival under extremely acidic conditions may be associated with the viability of pathogenic bacteria in the stomach (195), so an improved understanding of pH sensors in virulence may lead to the development of therapeutic strategies targeting these functions.
Proteomic analysis has been used to directly monitor cellular changes occurring during the production of heterologous proteins in E. coli and develop efficient strains for the enhanced production of bioproducts (3, 37, 38, 97, 98, 114, 115, 116, 135, 138, 160, 162, 235, 306) and biodegradable polymers (99, 139). Furthermore, many of these proteomic studies have been performed in large-scale processes employing E. coli and recombinant E. coli for industrial applications (3, 37, 38, 97, 114, 115, 116, 135, 138, 139, 145, 241, 245, 325). In addition, proteomic studies for analyzing the composition of inclusion bodies (IBs) (98, 101, 135, 138, 246, 247) have been carried out in order to improve the quality (or uniformity) of the desired product and the downstream process of recombinant proteins such as protein purification and refolding. Unfortunately, most of the results from proteome analysis cannot be clearly compared, since they differ in terms of growth conditions, strains and genotypes, target products, sampling times, and bioprocesses.
Among these studies, those that led to the enhanced production of recombinant proteins, including IBs and secretory proteins, and improved industrial processes are described below. Jordan and Harcum (135) analyzed the proteome profiles of soluble and insoluble IB fractions to detect and characterize proteases upregulated during the production of Axokine in recombinant E. coli cells. Exposure to EDTA reduced protease activity, indicating that Axokine degradation was likely mediated by metalloproteases. In addition, two small heat shock proteins (sHsps), IbpA and IbpB, were first identified by conventional biochemical techniques as the major proteins associated with the IBs of recombinant proteins produced in E. coli (4). IbpA and IbpB were later demonstrated to facilitate the production of recombinant proteins in E. coli and to play important roles in protecting recombinant proteins from degradation by cytoplasmic proteases (98). Amplification of the ibpA and/or ibpB genes enhanced production of recombinant proteins as IBs, whereas ibpAB gene knockout enhanced the secretory production of recombinant proteins as soluble forms (98). More recently, LeThanh et al. (167) reported results similar to those of Han et al. (98), i.e., that α-glucosidase production was enhanced at elevated IbpA and IbpB levels and reduced in an ibpAB-negative mutant strain in a temperature-dependent manner. They also showed that IbpA and IbpB protect IBs of α-glucosidase from degradation in a temperature-dependent manner. These findings suggest that manipulation of ibpAB gene expression may prove to be a valuable new technique for fine-tuning the production of recombinant proteins in E. coli. In addition, these results demonstrate the effectiveness of employing proteome profiling in the development of production strains suitable for industrial applications.
The use of sHsps has recently been extended to significantly enhance the performance of 2-DE (96). Proteolytic degradation is one of the critical problems in 2-DE. Loss of protein spots in 2-D gels due to residual protease activity is commonly observed when using immobilized pH gradient gels for isoelectric focusing. Three sHsps, IbpA and IbpB from E. coli and Hsp26 from Saccharomyces cerevisiae, were found to be able to protect proteins in vitro from proteolytic degradation. The addition of sHsps during 2-DE of human serum or whole-cell extracts of bacteria (E. coli and Mannheimia succiniciproducens), the plant Arabidopsis thaliana, and human kidney cells allowed detection of up to 50% more protein spots than were obtainable with currently available protease inhibitors. This may change the way proteome profiling is carried out by generally enabling the detection of many protein spots that could not be seen previously.
Recently, the physiological changes of recombinant E. coli during secretory production of a recombinant humanized antibody fragment were monitored by 2-DE (3). Twenty-five protein spots were differentially expressed between the control and production fermentations at 72 h, while 19 other protein spots were present only in the control or only in the production fermentation at this time. The synthesis of the stress protein phage shock protein A (PspA) was strongly correlated with the synthesis of the recombinant product. Coexpression of the pspA gene with a recombinant antibody fragment in E. coli significantly improved the yield of the secreted biopharmaceutical (3). In another study, a combined analysis of proteome, transcriptome, and mathematical models was used to engineer an E. coli strain (162). This E. coli mutant strain, obtained by random mutagenesis and secreting fourfold more active alpha-hemolysin (HlyA) than its parent strain, was characterized using both high-density microarrays for mRNA profiling and a proteomic strategy for protein expression. The relative mRNA and protein expression levels of tRNA synthetases, including AsnS, AspS, LysS, PheT, and TrpS, were lower in the mutant than in the parent. This combined examination of the mRNA and protein expression profiles showed that downregulation of the tRNA synthetases in the mutant lowered the general translation rate and, more specifically, lowered the rate of HlyA synthesis. The improved secretion of alpha-hemolysin at a low synthesis rate is attributable to a better balance between translation and secretion. The use of rare codons in the hlyA gene has been shown to reduce its rate of translation, because the number of available aminoacyl-tRNAs is limited. A variant of the hlyA gene involving the alteration of five bases but encoding the same amino acid sequence was designed using a mathematical model of prokaryotic translation. In this way, the rate of translation could be artificially slowed down, leading to further improved secretory production of alpha-hemolysin.
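The idea of a gene variant that differs in a few bases yet encodes the same amino acid sequence can be illustrated with a minimal Python sketch. The sequences below are short toy examples and not the actual hlyA gene or the variant reported in reference 162; the function names `translate` and `compare_variants` are hypothetical and introduced only for illustration. The sketch counts the altered bases and checks that the change is synonymous; it does not model translation rates.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def compare_variants(original: str, variant: str) -> tuple[int, bool]:
    """Return (number of altered bases, whether both encode the same protein)."""
    if len(original) != len(variant):
        raise ValueError("sequences must have the same length")
    altered = sum(a != b for a, b in zip(original.upper(), variant.upper()))
    return altered, translate(original) == translate(variant)


if __name__ == "__main__":
    # Toy sequences (assumption): three third-position changes, same protein.
    toy_original = "ATGCTGGCGAAAGGC"
    toy_variant = "ATGCTAGCAAAAGGA"
    print(translate(toy_original))                      # MLAKG
    print(compare_variants(toy_original, toy_variant))  # (3, True)
```

A check of this kind only confirms that a redesigned sequence is synonymous; choosing which codons to alter would still depend on a translation model such as the one used in the cited study.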
An important factor to consider in the production of recombinant proteins is the direct and indirect influence of the metabolic pathways that supply the energy and precursors required for protein synthesis. Global proteome profiling of recombinant E. coli during the overproduction of human leptin was used to identify cysK as a target gene; coexpression of cysK led to successful metabolic engineering for increased productivity of leptin and other serine-rich proteins (97). Thus, proteomic analysis can be used to examine changes in protein (enzyme) expression levels or to identify rate-controlling steps in metabolic pathways and thereby to develop a systematic strategy for optimizing the relevant metabolic pathways (95). It should be noted that the amount of a protein is not always proportional to its activity, which in turn does not necessarily correlate with the corresponding metabolic reaction rate. However, protein abundance data obtained from proteome profiles have been reported to correlate to some extent with enzyme activities in E. coli, with a few exceptions (231). Proteomics therefore appears to be an effective means of identifying candidates for metabolic engineering aimed at improved bioproduct yield.
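One simple way to ask whether protein abundance tracks enzyme activity is a rank correlation, which tolerates the nonproportional relationship noted above. The following Python sketch computes Spearman's rank correlation from scratch. The abundance and activity values are invented for illustration, and the choice of Spearman's coefficient is an assumption; the source does not state how the correlation in reference 231 was assessed.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical relative spot volumes and enzyme activities for five enzymes.
    abundance = [1.0, 2.5, 0.8, 4.0, 3.1]
    activity = [12.0, 30.0, 15.0, 55.0, 28.0]
    print(round(spearman(abundance, activity), 3))
```

A coefficient near 1 would indicate that enzymes with more abundant spots also tend to show higher activity, while the exceptions mentioned in the text would lower it.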
Proteomic analysis can also be used to detect the presence or absence of host proteins in recombinant protein products. A proteomic study of recombinant E. coli cells expressing different biopharmaceutical proteins showed that the host cell protein profiles were highly similar (85 to 90%) at the end of their production runs, indicating that a multiproduct host cell immunoassay is a feasible method for detecting host cell protein contaminants during the downstream processing of recombinant protein products (37, 38). These findings and other reports underscore that E. coli proteomics is likely to become increasingly important not only in biological research but also in various biotechnological applications.
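A percent similarity between two host cell protein profiles can be sketched as the overlap between their sets of matched spots. The Python example below uses the Jaccard index (shared spots divided by all spots seen in either profile). This metric and the spot identifiers are assumptions made for illustration; the source does not specify how the 85 to 90% figure was calculated.

```python
def profile_similarity(spots_a, spots_b) -> float:
    """Percent similarity of two spot profiles as shared spots / all spots."""
    a, b = set(spots_a), set(spots_b)
    union = a | b
    if not union:
        raise ValueError("at least one profile must contain a spot")
    return 100.0 * len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical matched spot identifiers from two production runs.
    run_product_1 = range(1, 21)   # spots 1..20
    run_product_2 = range(3, 22)   # spots 3..21
    print(round(profile_similarity(run_product_1, run_product_2), 1))  # 85.7
```

Here 18 spots are shared out of 21 observed in total, so the two toy profiles are about 85.7% similar.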
CONCLUSIONS AND FUTURE PROSPECTS
A major goal of proteomics is the complete description of the entire protein spectrum underlying cell physiology. A large number of small-scale and more-recent large-scale experiments have contributed to expanding our understanding of the nature of whole-protein networks, though there are still some limitations regarding the use of proteomic methods. Many initial proteome studies were applied to E. coli, yielding a collection of extremely well characterized proteome databases including 1,627 proteins identified using gel-based or non-gel-based approaches. Extensive gel-based studies have given researchers a solid understanding of the global protein network and well-established 2-D gel databanks. Recently, many non-gel-based approaches have been validated with E. coli strains, leading to the identification of additional proteomic components. As these two approaches are complementary, they will likely contribute to identifying more proteins in the future. A well-defined E. coli proteome will have direct applications in biochemical, biological, and biotechnological research fields in the following ways. (i) The E. coli proteome underpins our understanding not only of the prokaryotic regulatory network but also of complex eukaryotic regulatory networks including stimulon, regulon, and cascade-like networks. (ii) The E. coli proteome can provide invaluable information for designing metabolic engineering strategies to enhance production of various bioproducts, including recombinant proteins, biopolymers, and metabolites. (iii) The E. coli proteome can be used as a model system to help accelerate the development of advanced high-resolution, high-throughput, and high-sensitivity proteomic technologies.
Looking ahead, proteomic studies will likely evolve in a number of ways. First, proteomic studies can be expected to transition toward a miniaturized platform, allowing scale-down analysis. In the case of microorganisms such as E. coli, single-cell proteome analysis (rather than that of a population of cells) may be realized. Toward these goals, new analytical protocols capable of processing nanoliter to picoliter volumes and femtomole to attomole quantities of proteins or peptides are being developed (170, 260). Advances in microfluidics and processes for handling minute sample volumes without adsorptive losses and with improved reaction kinetics should make it possible to carry out proteome analysis on a microchip (175). Second, proteomic analysis will become more automated. So far, the 2-DE technologies have proven difficult to automate due to several issues, including sample contamination and degradation, loss of proteins, and generally poor-quality data. However, progress has been made in recent years, including the development of programmable IEF units for automated overnight IPG strip rehydration and focusing and even partially or fully automated 2-DE units. Even greater progress is being made in the post-gel-handling steps, including the use of robots for spot excision, in-gel trypsin digestion, postdigestion cleanup and concentration, sample mounting onto MALDI-MS targets, and sample injection for LC-MS analysis. Non-gel-based methods can be more easily automated using nanoscale-compatible autosamplers, sophisticated HPLC pumping systems, and automated switching valves for multidimensional separations. As automation methods become more robust, they are expected to enhance throughput and reproducibility, particularly across different laboratories. Third, instruments of higher sensitivity and accuracy for the detection of proteins will continue to be developed. For example, the tremendous volumes of data generated by traditional mass spectrometers include large numbers of false positives and/or false negatives (at least 20%, depending on the mass spectrometer). Recently developed or in-production instruments are expected to improve the accuracy of MS data. Finally, more-robust bioinformatic tools will be developed for the analysis of the large data sets generated by proteomics. High-quality software is required for the accurate detection, quantification, and identification of protein spots. In the future, the software will likely be equipped with algorithms for heuristic clustering and neural network analysis, which are currently used in other disciplines to analyze large data sets. These improved data analysis techniques can be expected to yield more-accurate mass measurements, unambiguous protein identification, and discrimination between artifactual modifications and true posttranslational modifications.
Many cutting-edge biological and biotechnological studies are currently driven by the high-throughput acquisition and examination of proteomic data supported by systematic biological and bioinformatic analyses (164). E. coli has been and will continue to be a model organism for these global-scale studies, which are aimed at understanding the cell and organism as a whole. Considering that proteins mediate most cellular activities, proteomics will play a central role in achieving this ambitious goal. During this exciting expansion of data and understanding in the coming years, the E. coli proteome will continue to serve as a reference platform and as the gold standard among model organisms.
ACKNOWLEDGMENTS
This work was supported by a Korean Systems Biology Research Grant (M10309020000-03B5002-00000) of the Ministry of Science and Technology. Further support by the LG Chem Chair Professorship, the IBM SUR program, Microsoft, KOSEF through the Center for Ultramicrochemical Process Systems, and the Brain Korea 21 project is appreciated.
Copyright © 2006 American Society for Microbiology