Departamento de Genética, Universidad de Sevilla, Seville 41080, Spain¹; Molecular, Cellular, and Developmental Biology, University of California, Santa Barbara, California 93106²
SUMMARY
INTRODUCTION
FOUNDATIONS
  Origins: R-M Systems
  Orphan DNA MTases
    Dam.
    CcrM.
  Regulation of Cellular Events by the Hemimethylated DNA State
  DNA Methylation Patterns
DNA ADENINE METHYLATION-DEPENDENT REGULATORY SYSTEMS
  Pap Pili
    The Pap OFF- to ON-phase transition.
    Environmental mechanisms for switch control.
    The Pap ON- to OFF-phase transition.
  Pap-Related Systems
    PapI homologue acting as a positive regulator of pilus expression.
    PapI homologue acting as a negative regulator of pilus expression.
  Phase-Variable Outer Membrane Protein Ag43
  VSP Repair
  Bacteriophage Infection
    Regulation of DNA packaging in bacteriophage P1.
    Regulation of the cre gene in bacteriophage P1.
    Regulation of the mom operon in bacteriophage Mu.
  Conjugal Transfer in the Virulence Plasmid of Salmonella enterica
    Regulation of traJ transcription.
    Regulation of finP transcription.
  Bacterial Virulence
    Roles of Dam methylation in Salmonella virulence.
    Attenuation of bacterial virulence by Dam methylase overproduction.
  CcrM Methylation and Regulation of Cell Cycle in Alphaproteobacteria
    Regulation of ccrM transcription.
    Regulation of ctrA transcription.
CONCLUDING REMARKS
ACKNOWLEDGMENTS
REFERENCES
SUMMARY
INTRODUCTION
Epigenetic phenomena include prions, in which protein structure is heritably transmitted (223, 231, 235, 259); genomic imprinting, characterized by monoallelic repression of maternally or paternally inherited genes (52, 84, 128, 195, 213); histone modification, such as methylation of lysines by histone methyltransferases (MTases) that maintain active and silent chromatin states (132, 273); and DNA methylation patterns formed as a result of inhibition of methylation of specific DNA bases by protein binding (29, 41, 118, 262, 263). Each of these phenomena involves self-perpetuating states, be they protein or DNA related (116, 155, 230-232), and the particular state that the molecule is in affects gene expression.
Epigenetic regulation can enable unicellular organisms to respond rapidly to environmental stresses or signals. For example, the yeast prion PSI+ is generated by a conformational change of the Sup35p translation termination factor, which is then inherited by daughter cells. The PSI+ form of Sup35p allows readthrough of nonsense codons that can provide a survival advantage under adverse conditions such as growth in paraquat or caffeine (259). The PSI+ prion is a metastable element that is generated and lost spontaneously at low rates, and thus within a population of yeast, some yeast cells will carry the prion and others will not. This situation provides potential flexibility in the response of the yeast population to environmental changes, orchestrated through the ability of the PSI+ prion to act upon native Sup35p protein and convert it to prion protein (223).
Methylation of specific DNA sequences by DNA methyltransferases provides another mechanism by which epigenetic inheritance can be orchestrated. For example, in certain eukaryotes, including mammals, methylation of cytosine residues at 5'-CG-3' (CpG) sequences facilitates binding of methyl-CpG binding proteins (134, 156, 187). In turn, methyl-CpG binding proteins affect the transcription state of a local DNA region through further interaction with chromatin-remodeling proteins (145). Methylation of CpG can affect gene expression, and the methylated state is usually correlated with transcriptional repression. The methylation pattern of a DNA region is defined as the collective presence or absence of methyl groups on specific target sites. DNA methylation patterns can vary between cells, tissues, and individuals. DNA methylation patterns are established via de novo methylation during the first stages of embryonic development (28, 81, 213). Such patterns are propagated by DNA methyltransferases known as maintenance methylases (Dnmt1), which are active on hemimethylated DNA substrates generated by DNA replication. Thus, if a DNA region contains methylated CpG sequences, they will be propagated in the methylated state. Nonmethylated CpG sequences, however, are not substrates for the maintenance DNA methylases. Thus, if a DNA region contains nonmethylated CpGs, they will tend to remain nonmethylated. A major area of research in eukaryotic epigenetic regulation is directed at understanding the mechanisms by which DNA methylation patterns are erased following cleavage of the fertilized egg and then established via de novo methylation (74, 81, 141, 180).
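The maintenance logic described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not a biochemical model: the maintenance methylase is taken to be perfectly efficient, de novo methylation and demethylation are ignored, and each CpG site is represented as a pair of booleans (one per strand). All function names are hypothetical.

```python
from typing import List, Tuple

# A CpG site is a pair (top_strand_methylated, bottom_strand_methylated).
Site = Tuple[bool, bool]


def replicate(duplex: List[Site]) -> Tuple[List[Site], List[Site]]:
    """Semiconservative replication: each daughter keeps one parental strand
    and receives a newly synthesized, nonmethylated partner strand."""
    daughter_a = [(top, False) for top, _ in duplex]        # keeps parental top strand
    daughter_b = [(False, bottom) for _, bottom in duplex]  # keeps parental bottom strand
    return daughter_a, daughter_b


def maintain(duplex: List[Site]) -> List[Site]:
    """Idealized maintenance methylation (assumption: 100% efficient).
    Hemimethylated sites become fully methylated; nonmethylated sites are
    not substrates and therefore stay nonmethylated."""
    return [(True, True) if (top or bottom) else (False, False)
            for top, bottom in duplex]


def pattern(duplex: List[Site]) -> str:
    """Render a pattern: M = fully methylated, h = hemimethylated, - = nonmethylated."""
    symbols = {2: "M", 1: "h", 0: "-"}
    return "".join(symbols[int(top) + int(bottom)] for top, bottom in duplex)


parent = [(True, True), (False, False), (True, True), (False, False)]
for daughter in replicate(parent):
    # Each daughter is transiently hemimethylated, then restored to the parental pattern.
    print(pattern(daughter), "->", pattern(maintain(daughter)))
```

In this idealized picture both daughters recover the parental pattern, which is the sense in which methylated and nonmethylated CpGs are each propagated in their own state.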
DNA methylation plays important roles in the biology of bacteria: phenomena such as timing of DNA replication, partitioning nascent chromosomes to daughter cells, repair of DNA, and timing of transposition and conjugal transfer of plasmids are sensitive to the methylation states of specific DNA regions (16, 160, 172, 178, 202, 285). All of these events use as a signal the hemimethylated state of newly replicated DNA, generated by semiconservative replication of a fully methylated DNA molecule. In the case of DNA replication, the protein SeqA binds preferentially to hemimethylated DNA target sites (GATC sequence) clustered in the origin of replication (oriC) and sequesters the origin from replication initiation. In addition, SeqA also transiently blocks synthesis of the DnaA protein, which is necessary for replication initiation, by binding to hemimethylated GATC sites in the dnaA promoter (36, 49, 100, 140, 146, 163, 179, 249). In DNA repair, the methyl-directed mismatch repair protein MutH recognizes hemimethylated DNA sites and cuts the nonmethylated daughter DNA strand, ensuring that the methylated parental strand will be used as the template for repair-associated DNA synthesis (8, 12, 25, 178, 227, 237). In transposition of Tn10, hemimethylated DNA plays two roles: enhancing binding of RNA polymerase to the transposase promoter and enhancing binding of transposase to its DNA target sites (144, 181, 219). DNA methylation appears to play similar roles in regulating Tn5 transposition (73, 161, 175, 217, 253, 292). None of these phenomena are heritable since the hemimethylated state of DNA is not heritable, occurring transiently in newly replicated DNA.
Phenomena involving inheritance of DNA methylation patterns are also known in bacteria, and the best-known examples involve phase variation. In phase variation, gene expression alternates between active (ON phase) and inactive (OFF phase) states. For example, uropathogenic Escherichia coli (UPEC) cells undergo pilus phase variation, which can be observed using immunoelectron microscopy with antipilus antibodies marked with colloidal gold (Fig. 1). Phase variation can occur through a variety of genetic mechanisms involving changes in nucleotide sequence (e.g., site-specific recombination and mutation) which result in heritably altered gene expression (1, 4, 26, 32, 33, 42, 53, 69, 75, 79, 86, 98, 113, 119, 122, 133, 164, 191, 229, 240, 244, 256, 265, 298). Bacteria also use epigenetic mechanisms to control phase variation. In all cases examined, these systems use DNA methylation patterns to pass information regarding the phenotypic expression state of the mother cell on to the daughter cells. A DNA methylation pattern is formed by binding of a regulatory protein(s) to a site that overlaps a methylation target, blocking methylation. This pattern can control gene expression if methylation, in turn, affects binding of the regulatory protein(s) to its DNA target site, which could occur by steric hindrance or alteration of DNA structure due to methylation (206, 207). Notably, most adhesin genes in E. coli are regulated by epigenetic mechanisms involving DNA methylation patterns (32, 115, 116, 262).
FOUNDATIONS
Work by Kobayashi and colleagues has suggested that R-M systems have attributes of selfish genes (148-150). Nakayama and Kobayashi showed that a plasmid containing the type II R-M EcoRV system could not be displaced from cells by an incompatible plasmid due to the death of cells that lost the EcoRV-containing plasmid, a form of postsegregational killing (186). In cells that have lost the R-M gene complex, the levels of methylase and cognate restriction enzyme drop to a point where insufficient methylase is present to protect all chromosomal target sites; the restriction enzyme then cleaves one or more sites, killing the cell. This scenario is similar to that for addiction modules such as hok-sok, in which the sok gene expresses an antisense RNA that inhibits translation of the hok toxin gene. When cells lose a plasmid containing hok-sok, they die: because hok mRNA is stable but sok RNA is unstable (half-life [t1/2], <30 s), translation of hok ensues, which leads to cell death (91, 92). Other addiction modules are made of two proteins, a toxin and an antitoxin (82, 90, 106).
Further analysis of the EcoRV system has shown that a regulatory gene designated "C," sandwiched between the R and M genes, codes for a product that activates R gene expression (186). The C gene appears to be required for expression of the R gene, since postsegregational killing does not occur in C gene mutants. One function of the C gene is in establishment of an R-M system in a new host. In this case the M gene is immediately activated, allowing modification of host DNA sites. At the same time, C gene expression is also activated, building up the C protein level to a point that allows activation of R gene expression. This temporal delay in expression of the restriction enzyme is critical in allowing time for all chromosomal sites to be methylated and protected from digestion. In addition, C also functions as a suicide immunity gene, forcing expression of the R gene of an incoming closely related R-M complex with different restriction specificity, resulting in host cell death. This would be expected to prevent spread of a competing R-M complex of the same C gene immunity group (any R-M complex in which the resident C protein activates expression of an incoming R gene) within a bacterial population (250).
A second regulatory strategy used by R-M systems utilizes methylation of the cognate restriction site to control R-M transcription via a direct effect on RNA polymerase binding. For example, in the CfrBI system of Citrobacter freundii, methylation of a cytosine (underlined) within the 5'-CCATGG-3' DNA restriction site decreases expression of the CfrBI methylase (CfrBIM) and concomitantly increases expression of the CfrBI restriction enzyme (CfrBIR) (18, 294). This appears to occur as a result of the location of the cfrBI site within the −35 RNA polymerase σ70 binding site of the cfrBIM gene. Since the cfrBIM promoter is stronger than that of cfrBIR, any bacterial cell receiving the CfrBI system will be methylated before restriction can occur. As the intracellular methylase level increases, the cfrBI site is methylated, decreasing expression of cfrBIM and enabling expression of cfrBIR. The latter may protect the cell from incoming foreign DNA lacking methylated sequences.
A third R-M regulatory mechanism utilizes the methylase itself as a feedback regulator. In a number of cases binding of the methylase to DNA occurs via an N-terminal extension containing a helix-turn-helix motif (142, 196, 197). For example, in the SsoII R-M system of Shigella sonnei, the SsoII methyltransferase (SsoIIM) represses its own synthesis and stimulates expression of the cognate restriction endonuclease (SsoIIR). Similar N-terminal extensions are present on a number of 5-methylcytosine methyltransferases, including those in the EcoRII, dcm, MspI, and LlaJI systems (142). The last system, present in Lactococcus lactis, encodes two methylases, M1.LlaJI and M2.LlaJI, recognizing the complementary and asymmetric sequences 5'-GACGC-3' and 5'-GCGTC-3', respectively, with methylation of the internal cytosine in each case. Two LlaJI restriction sites are present 8 bp apart within the regulatory region of the llaJI operon, with one site overlapping the −35 RNA polymerase σ70 recognition site of the operon. Notably, methylation of both 5'-GCGTC-3' sites by M2.LlaJI enhances binding of M1.LlaJI, repressing transcription of the llaJI operon. The ability of the M1.LlaJI methylase to distinguish methylated and nonmethylated target sites provides a feedback mechanism by which expression of the llaJI operon is controlled by DNA methylation.
The analysis of regulation of the EcoRV, CfrBI, and LlaJI R-M systems described above has provided insight into the evolution of epigenetic control systems that are predominantly controlled by "orphan" methyltransferases, including DNA cytosine methylase (Dcm) (202) in E. coli. It has been postulated that orphan methylases such as Dcm may have arisen by selection as vaccines against invasion of a restriction-modification complex (250). In the case of Dcm, which methylates the duplex sequence 5'-CCWGG-3' (top strand shown; W = A or T) at the second cytosine, this methylation protects against cleavage by EcoRII. It was shown that postsegregational killing by the EcoRII R-M complex was diminished by the presence of dcm (250), which partially protected host chromosomal DNA from restriction attack. This function of Dcm as a possible molecular vaccine may be analogous to the function of cytosine methylation in certain eukaryotes, including mammals, where methylation has been postulated to inactivate transposons (293), although this hypothesis has been challenged (30). Dcm is not known to be involved in gene regulatory control. However, the other orphan methylase in E. coli, DNA adenine methylase (Dam), with homologues in other Gammaproteobacteria, does play an essential role in regulating epigenetic circuits. As well, Alphaproteobacteria have a cell cycle-regulated methylase (CcrM) which plays a major role in the control of chromosome replication and regulates expression of certain genes. In the next section we describe the biochemical properties of these DNA methylases and additional components of epigenetic switches before discussing specific epigenetic systems in detail.
group of DNA MTases based on the organization of 10 domains (167). The E. coli dam gene (accession no. J01600) is 834 bp and codes for a 32-kDa monomeric protein (114). Dam homologues are present in Salmonella spp., Haemophilus influenzae, and additional gram-negative bacteria (16, 204, 254). Dam binds to DNA nonspecifically as a monomer, moving by linear diffusion and specifically methylating 5'-GATC-3' sequences. At GATC sites the adenine base is flipped out 180° into the active site of the enzyme, where it is stabilized by hydrophobic stacking with a tyrosine in the DPPY motif, which is conserved among adenine methyltransferases (123, 157). The methyl group donor, S-adenosyl-L-methionine (AdoMet), is required for stable binding of the flipped adenine in the active-site pocket of the enzyme and binds to Dam after the methylase binds DNA, transferring a methyl group to the exocyclic N6 nitrogen of adenine (261). AdoMet binds to two sites in the Dam protein: one is the catalytic center, and the other seems to be involved in an allosteric change that may increase specific binding of Dam to DNA (22). Dam appears to methylate only one of the adenosines of duplex GATC DNA sequence at a time (261). Notably, Dam shows high processivity for most DNAs; that is, after one methylation event, it slides on the same DNA molecule and carries out additional methylation events (turnovers). This high processivity effectively increases the rate of Dam methylation and may reflect the fact that there are few (<100) Dam molecules present in a single E. coli cell, yet there are about 19,000 GATC sites to methylate. Dam levels vary according to growth rate as a result of increased transcription from one of five dam gene promoters, designated P2 (158). Based on the estimated numbers of Dam and GATC target sites per cell, each Dam molecule modifies between 20 and 100 GATC sites per minute (kcat) (261). 
This number is about 100-fold higher than the turnover number observed in vitro using an oligonucleotide substrate with one GATC site, indicating that there is likely some difference(s) in vivo that enables Dam to be more efficient at methylation (261). One possibility, suggested by Urig et al. (261), is that Dam is associated with the DNA polymerase III machine, scanning DNA for GATC sites as DNA replication proceeds and thus methylating DNA much more efficiently than it would in a random walk.
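The in vivo workload implied by these numbers can be checked with a back-of-envelope calculation. The sketch below is not the calculation used in reference 261. It assumes that every GATC site needs one methyl transfer on each of the two newly synthesized strands per generation and that the work is spread evenly over the doubling time. The doubling time and the specific Dam copy numbers tried here are illustrative assumptions, and the function name is hypothetical.

```python
def required_turnover_per_min(gatc_sites: int,
                              dam_molecules: int,
                              doubling_time_min: float,
                              new_strands_per_generation: int = 2) -> float:
    """Average methyl transfers each Dam molecule must perform per minute.

    Assumption: after replication each GATC site carries one nonmethylated
    adenine on each newly made strand, so one generation requires
    gatc_sites * new_strands_per_generation methyl transfers.
    """
    if dam_molecules <= 0 or doubling_time_min <= 0:
        raise ValueError("dam_molecules and doubling_time_min must be positive")
    methyl_transfers = gatc_sites * new_strands_per_generation
    return methyl_transfers / (dam_molecules * doubling_time_min)


# About 19,000 GATC sites (from the text); Dam copy numbers below 100 and a
# 30-minute doubling time are assumed values chosen only for illustration.
for dam in (25, 50, 100):
    rate = required_turnover_per_min(19_000, dam, doubling_time_min=30.0)
    print(f"{dam:>3} Dam molecules -> {rate:.1f} methyl transfers per Dam per minute")
```

Under these assumptions the required rate is on the order of tens of methyl transfers per Dam molecule per minute, and it rises as the assumed copy number or doubling time falls.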
The processive nature of Dam contrasts sharply with DNA methylases associated with R-M systems, such as the EcoRV methylase (MEcoRV), which methylates its GATATC recognition sites distributively (95). In this case and for other R-M systems, incoming DNA needs to be restricted (cut) by the restriction enzyme before every site is methylated. The restriction enzyme has the advantage, since if just one restriction site in an incoming phage genome is left unmodified, the enzyme can cleave the DNA and block its replication. Note that restriction could be hampered if R-M DNA methylases were highly processive like Dam: processivity would increase the chances that all restriction sites in an incoming phage, for example, would be modified before restriction could occur.
Other gram-negative Gammaproteobacteria besides E. coli, including Salmonella spp., Serratia marcescens, Yersinia spp., Vibrio cholerae, Haemophilus influenzae, and Neisseria meningitidis, code for orphan MTases with significant sequence identity to EcoDam and which target adenosine of the GATC DNA sequence (162). Although Dam is not essential for growth of E. coli and Salmonella on laboratory media (14, 172, 254), the Dam homologues in Yersinia pseudotuberculosis, Yersinia enterocolitica, and Vibrio cholerae are essential gene products (135). However, a strain of Y. pseudotuberculosis in which dam mutations are viable has been described (252). It is not known what essential function(s) Dam plays in the pathogens in which it is essential, but it is provocative that both Yersinia and Vibrio contain two chromosomes, in contrast to the single chromosomes in E. coli and Salmonella spp., where Dam is not essential. A speculation is that Dam may be essential to coordinate DNA replication in bacteria with two or more chromosomes (78).
Dam homologues without a restriction enzyme counterpart are also present in bacteriophages, including Sulfolobus neozealandicus droplet-shaped virus (7), halophilic phage φCh1 (15), H. influenzae phage HP1 (204), phage P1 (61), phage T1 (9), and phage T4 (226). The last MTase, T4Dam, has been well characterized biochemically, primarily by Hattman and colleagues (123, 228). T4Dam, like EcoDam, is highly processive (169) and complements a dam mutant E. coli mutator phenotype (226). T4Dam and EcoDam may have a common evolutionary origin, sharing up to 64% sequence identity in four different regions (11 to 33 amino acids long) (105). After methylation, with resulting formation of S-adenosyl-L-homocysteine, T4Dam can bind a new AdoMet molecule without dissociating from the DNA duplex (299). Like EcoDam, T4Dam appears to flip out the adenosine of the GATC sequence, facilitating its methylation (168).
CcrM. The cell cycle-regulated DNA MTase family (CcrM) constitutes a second important group of orphan methyltransferases, classified in the β group of MTases and originally identified in Caulobacter crescentus (167, 242, 300). CcrM binds to and methylates adenosine in the sequence 5'-GANTC-3', where "N" is any nucleotide (167, 300). Like EcoDam, CcrM is a functional monomer and acts processively (20), although evidence suggests that it is a dimer at physiologic concentration (234). However, unlike EcoDam, CcrM has a distinct preference for hemimethylated DNA as a substrate, based on the observation that the turnover rate for hemimethylated DNA containing a GANTC target site(s) was significantly higher than that for DNA containing nonmethylated sites (20). The GANTC sequence is also the target of HinfM methylase, which shares 49% identity with CcrM and whose cognate restriction enzyme HinfI from H. influenzae cuts at nonmethylated GANTC sites (300).
In Caulobacter, CcrM is an essential cell component and plays a crucial role in cell cycle regulation (20, 139, 170, 214-216, 242, 243, 300). CcrM homologues, which are likewise essential, have been found in Agrobacterium tumefaciens, the causative agent of crown gall disease in plants (137); in Rhizobium meliloti, the nitrogen-fixing symbiont of alfalfa and other legumes (286); and in the animal pathogen Brucella abortus (222). In B. abortus, aberrant CcrM expression impairs the pathogen's ability to proliferate in murine macrophages, raising the possibility that CcrM methylation might control the synthesis of virulence factors (222).
The presence of hemimethylated GATC sites provides a signal that DNA replication has just occurred and plays a role in diverse cellular processes. For example, in methyl-directed mismatch repair the MutH protein binds to hemimethylated GATC sites and cleaves the nonmethylated DNA strand, ensuring that mutations in the daughter DNA strand are repaired using the parental strand as a template. In the absence of Dam, MutH can cleave the daughter strand, the parental strand, or both DNA strands. If the cell survives double-strand DNA breakage, 50% of the time the mutant daughter strand is used as a template to "repair" the parental strand, resulting in fixation of a mutation into the DNA (172, 285). Hemimethylated GATC sites are also used to control rates of transposition of insertion sequences IS3, IS10, IS50, and IS903 as well as transposons Tn5, Tn10, and Tn903 (73, 217, 219, 292). Elegant studies from Kleckner's laboratory showed that hemimethylated GATC sites control IS10 transposition in two different ways (181, 219). First, a GATC site present at bp 67 to 70 (here designated GATC-68) within the −10 module of the transposase promoter pIN controls transcription of the transposase gene. Full methylation of GATC-68 inhibits RNA polymerase binding, reducing the level of tnp IS10 transcription. A second GATC site at bp 1320 to 1323 (GATC-1321) near the inner terminus of IS10 controls binding of transposase. Full methylation of GATC-1321 blocks transposition by inhibiting transposase binding. These two effects of DNA methylation on transposase expression and binding effectively limit IS10 transposition to a brief period immediately following DNA replication when GATC-68 and GATC-1321 are hemimethylated. Remarkably, the two hemimethylated IS10 DNAs have different transposition activities: IS10 methylated on the template strand is about 330 times more active than IS10 methylated on the nontemplate strand and 1,000 times more active than fully methylated IS10 (219).
The majority of this difference is due to increased binding of transposase at the inner IS10 terminus; in addition, activation of the transposase promoter is more efficient in the IS10 hemimethylated species whose template strand is methylated. Since transposition of Tn10 does not involve the inner terminus, stimulation of Tn10 transposition following DNA replication is less efficient than for IS10 (219).
Like that of Tn10, transposition of IS50 and of Tn5 is stimulated by DNA replication (175). GATC sites are present within the inside end (IE) of IS50, similar to the case for IS10, and within the −10 region of the transposase regulatory region (73, 253, 292). In both IS50 and Tn5, Dam methylation represses tnp promoter activity and transposase binding to the IS50 IE (73, 253, 292). Increased transposition of IS50 and Tn5 in a Dam− host requires integration host factor (IHF), probably to compensate for a DNA conformational defect associated with the lack of Dam (165). In turn, binding of Fis (factor for inversion stimulation) to the IE inhibits IS50 transposition (276). Methylation of three GATC sites within the Fis recognition sequence inhibits Fis binding. Thus, immediately following DNA replication, Fis binds to the IE, inhibiting IS50 transposition, and counteracts the positive effects of the hemimethylated state on IS50 transposition. In contrast, Tn5 transposition is not inhibited by Fis, since it does not use the IE (276).
DNA hemimethylation may regulate transcription of additional genes that contain GATC sites within their promoter regions. The list includes glnS, sulA, trpS, trpR, and tyrR of E. coli and cre of bacteriophage P1 (16, 172, 205, 246). Expression of these genes was increased in the absence of Dam, suggesting that GATC methylation may decrease binding of RNA polymerase. The possible physiologic significance of methylation of these sites is not known, but it could tie gene expression to the replication state of the cell, increasing transcription immediately after passage of the replication fork. In the case of the trpR gene, which encodes the repressor of the trp operon, an attractive speculation has been proposed by M. G. Marinus: because trpR is located between the origin of replication and the trp operon, a transient boost in trpR transcription might provide the increased concentration of repressor necessary to maintain repression when chromosome replication doubles trp operon dosage (171).
Further analysis of DNA methylation patterns in E. coli showed that multiple GATC sequences (ca. 36 sites) in the genome of E. coli K-12, which lacks pap DNA sequences, were stably nonmethylated (218, 272). These sites were identified by digestion of chromosomal DNA with MboI, which cuts at nonmethylated GATC sites. Since nonmethylated GATC sites are rare, the DNA fragments generated by MboI digestion are too large to be resolved by conventional agarose gel electrophoresis. Pulsed-field gel electrophoresis was used to resolve these fragments; however, the DNA sequences flanking the nonmethylated GATC sites were not determined. Ringquist and Smith (218) also showed for the first time that a number of Dcm target sites [CC(A/T)GG; the second cytosine is methylated at the C-5 position] were stably nonmethylated.
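The reasoning behind the MboI assay can be illustrated with a small in silico digest. The sketch below is a simplification for illustration only: it treats the DNA as a linear molecule, considers only the top strand, places the cut immediately 5' of the G in GATC, and assumes that any site not listed as nonmethylated is fully protected. The toy sequence and the function names are hypothetical.

```python
import re
from typing import Iterable, List


def gatc_positions(seq: str) -> List[int]:
    """Return the 0-based start position of every GATC site in seq."""
    return [m.start() for m in re.finditer("GATC", seq.upper())]


def mboi_fragment_lengths(seq: str, nonmethylated: Iterable[int]) -> List[int]:
    """Fragment lengths after cutting only at the listed nonmethylated sites.

    Assumption: methylated sites are never cut, and each nonmethylated site
    is cut to completion on a linear molecule.
    """
    sites = set(gatc_positions(seq))
    cuts = sorted(set(nonmethylated))
    for pos in cuts:
        if pos not in sites:
            raise ValueError(f"no GATC site starts at position {pos}")
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces that would arise from a site at the very start.
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


seq = "AAGATCTTTTGATCCCGATCGG"  # toy sequence with three GATC sites
print(gatc_positions(seq))
# Only one site is nonmethylated: few, large fragments.
print(mboi_fragment_lengths(seq, nonmethylated=[10]))
# A fully nonmethylated molecule: many small fragments.
print(mboi_fragment_lengths(seq, nonmethylated=gatc_positions(seq)))
```

The fewer nonmethylated sites there are, the fewer and larger the fragments, which is why rare nonmethylated sites in a chromosome call for pulsed-field rather than conventional gel electrophoresis.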
Wang and Church analyzed Dam DNA methylation patterns to assess the binding of proteins to chromosomal DNA sites. Chromosomal DNA was digested with MboI and ClaI and cloned into pBluescript, which enabled the nonmethylated GATC sites to be sequenced (272). Since binding of proteins such as catabolite gene activator protein (CAP) is dependent upon environmental conditions via the secondary regulator cyclic AMP (cAMP), DNA methylation patterns within the regulatory regions of genes bound by cAMP-CAP and other regulatory factors were found to be environmentally controlled (218, 251). For example, a GATC sequence within the regulatory region of the car operon, controlling carbamoyl phosphate synthetase and involved in arginine and pyrimidine anabolism, was found to be protected from Dam methylation (272). This nonmethylated GATC site and others are listed in Table 1, with the chromosomal position (bp 29444 for the GATC near the carA gene) in E. coli MG1655 (a K-12 isolate) also shown. No protection of the car GATC site was detected in the absence of pyrimidines, consistent with the hypothesis that a pyrimidine repressor(s) binds to the car promoter region near or overlapping the GATC site, protecting it from methylation. Indeed, CarP and IHF were shown to bind in the regulatory region of carAB and protect GATC-207 (Table 1) from methylation (54).
crp cells. These data supported the hypothesis that CAP contributes to methylation protection of GATC-44.5 in vivo. However, further analysis of the gut operon showed that although cAMP-CAP binds to sites overlapping GATC-44.5, CAP does not protect this site from Dam methylation (263). Instead, the GutR repressor, which also binds at GATC-44.5, blocks methylation of this site both in vitro and in vivo. GutR-dependent protection of methylation of GATC-44.5 in vivo was not observed in the presence of glucitol, an activator of gut transcription, indicating that under these conditions GutR was no longer bound at GATC-44.5, allowing methylation of this site by Dam. However, methylation of GATC-44.5 did not affect binding of GutR to the gut regulatory region. These results led to the conclusion that although methylation protection indicates the presence of a DNA binding site in vivo, the absence of methylation protection of a GATC site does not prove the absence of binding of a protein at that site (263). Wang and Church also identified nonmethylated GATC sites within the mtl (mannitol, bp 3769597), cdd (deoxycytidine deaminase, bp 2229798), flh (flagellar synthesis, bp 1976481), psp (stress response, bp 1366007), and fep (iron transport, bp 621523) operons (272). Using a similar approach in which nonmethylated GATC sites in the E. coli chromosome were cloned by digestion with MboI and AvaI, Hale et al. identified four nonmethylated GATC sites in the regulatory regions of the ppiA (bp 3490085), yhiP (bp 3638351), rspA (bp 1653241), and b1776 (bp 1859455) genes (99). Protection of the ppiA GATC site was dependent upon growth phase and carbon source. Protection of a GATC site near yhiP required leucine-responsive regulatory protein (Lrp) and was leucine responsive, similar to the case for some operons controlled by this global regulator (44, 68, 188, 189). The other GATC sites were protected under all the environmental conditions examined (99). 
A more comprehensive approach to identification of nonmethylated GATC sites was undertaken by Tavazoie and Church (251); this approach allowed 12 additional sites to be identified, all of which were located within 5' noncoding regions of genes and open reading frames (Table 1).
Recent work by Blomfield's group on fim regulation controlling type 1 pili has identified two stably nonmethylated GATC sites at bp 4537512 and 4538525 in the E. coli chromosome near yjhA, separated from the fim locus by 1.4 kilobase pairs (80). These GATC sites are located near cis-active element regions 1 and 2, both of which play positive roles in transcription of the fimB recombinase gene, controlling type 1 pilus phase variation together with FimE (239). Binding of two regulatory proteins, the NanR sialic acid-responsive regulator and NagC, the N-acetylglucosamine-responsive regulatory protein, is required to activate fimB expression. Binding of NanR to region 1 blocks methylation of one adjacent GATC site, and binding of NagC to region 2 blocks methylation of the second GATC site. Only a fraction of the two GATC sites are nonmethylated after growth in glycerol minimal medium (239). Methylation protection of these GATC sites is not observed after addition of sialic acid (also known as N-acetylneuraminic acid). This likely occurs via inhibition of NanR binding, which is sensitive to sialic acid, and inhibition of NagC binding by N-acetylglucosamine-6-phosphate generated by sialic acid catabolism. Thus, binding of NanR and NagC controls methylation of two GATC sites adjacent to yjhA, likely by steric hindrance of Dam. However, mutation of the GATC site adjacent to region 1 did not affect fimB expression (239), indicating that methylation of this GATC site does not, in turn, modulate NagC binding. Moreover, in a dam mutant, expression of fimB is decreased, the opposite of what would be expected if GATC methylation inhibits NagC and NanR binding. These results indicate that the reported regulation of fim expression by Dam (199) does not occur via methylation of the GATC sites located near regions 1 and 2 adjacent to fim.
In summary, a small fraction of the approximately 20,000 GATC sites in the E. coli chromosome are totally or partially nonmethylated in any given growth state and environmental condition. The protection of GATC sites from methylation by Dam is dependent upon competition between Dam and specific DNA binding proteins. Dam appears to methylate most GATC sites in a highly processive manner, as discussed above. Recently, however, analysis of methylation of the regulatory GATC sites in the pap operon has indicated that they are not methylated processively (32). That is, Dam binds to pap DNA, methylates one GATC site, and then dissociates before methylating the second site. This effectively reduces the ability of Dam to compete with proteins that bind to DNA sequences containing one or more GATC sites. Bergerat et al. first proposed that DNA sequences surrounding GATC sites may dictate the avidity of Dam for its target sites (23). Mutation of the AT-rich flanking sequences of the pap GATC sites to CG sequences increased processivity, which appeared to be due to changes in the kinetics of methyl transfer and not in binding affinity (203). Analysis of known nonmethylated GATC sites tentatively suggests a trend toward having AT-rich flanking sequences, though this is not always the case (Table 1).
Since DNA methylation patterns are formed as a result of binding of proteins primarily at gene regulatory regions, they are altered by growth conditions that affect regulatory protein level(s) and/or DNA binding properties. As discussed above, identification of nonmethylated GATC sites has been used as a sort of natural in vivo footprint system to track binding of regulatory proteins under different environmental conditions (251, 272). In addition, it is clear that a subset of nonmethylated GATC sites (for example within the pap, sfa, daa, agn43, and other operons [see below]) play important roles in epigenetic regulation. In these systems, not only is a DNA methylation pattern established by protection of specific GATC sites by a regulatory protein(s), but methylation of the GATC site(s), in turn, modulates regulatory protein binding (263). This results in two heritable states: either the regulatory protein is bound to a specific DNA sequence containing a GATC site(s), protecting it from methylation, or the regulatory protein is not bound due to a reduction of binding affinity for target sequence(s) caused by GATC methylation. Clearly, only a subset of all nonmethylated GATC sites have these particular properties and are involved in epigenetic control systems. For example, as shown in Table 1, DNA methylation patterns have been shown to directly control expression of agn43 (111, 271) but do not control the gut (srl) operon (263) and do not appear to directly regulate fim (239). Further study will be necessary to determine if any of the other genes containing nonmethylated GATC sites in their regulatory regions are under methylation pattern control (Table 1).
| DNA ADENINE METHYLATION-DEPENDENT REGULATORY SYSTEMS |
|---|
|
|
|---|
DNA adenine methylase controls Pap phase variation by methylation of two GATC sites, one proximal to the pap pilin promoter (GATCprox), located 53 bp from the papBA transcription start site, and the other located 102 bp upstream of GATCprox, designated GATCdist (Fig. 3A). Note that these two GATC sites are located within Lrp DNA binding site 2 and site 5, respectively. Methylation at these two pap GATC sites controls the binding of the global regulator Lrp (44, 189) and the coregulatory protein PapI (118, 138) to pap DNA sites 1, 2, and 3 proximal to the papBA pilin promoter and to sites 4, 5, and 6 distal to papBA. Lrp appears to bind cooperatively to sites 1, 2, and 3 or to sites 4, 5, and 6 (193). Binding to all six sites can be achieved in vitro by addition of sufficient Lrp but rarely occurs in vivo based on analysis of the methylation states of GATCprox and GATCdist (41). In ON-phase cells GATCdist is nonmethylated and GATCprox is methylated (41) (Fig. 3D). Protection of GATCdist from Dam methylation requires both Lrp and PapI based on the observation that GATCdist is fully methylated in either an lrp or a papI mutant (40, 41). In contrast, OFF-phase cells display the converse DNA methylation pattern in which GATCprox is nonmethylated and GATCdist is methylated (Fig. 3A). Protection of GATCprox requires Lrp but not PapI (41, 263). Based on these in vivo DNA methylation patterns together with in vitro studies of Lrp binding, it was concluded that in ON-phase cells PapI-Lrp binds to sites 4, 5, and 6, protecting GATCdist from Dam, and in OFF-phase cells Lrp binds to sites 1, 2, and 3, protecting GATCprox from Dam (41). These DNA methylation patterns result from competition between Dam and Lrp for binding at sites 1, 2, and 3 and at sites 4, 5, and 6, containing GATCprox and GATCdist, respectively, as discussed in detail below.
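The correspondence between the two methylation patterns and the two expression phases can be written as a small lookup. The sketch below is only a restatement of the patterns described in this paragraph; the function name and the "other" label for patterns not assigned to a phase here are assumptions made for illustration.

```python
def pap_phase(prox_methylated, dist_methylated):
    """Classify a pap methylation pattern as described in the text.

    ON phase:  GATCdist nonmethylated, GATCprox methylated.
    OFF phase: GATCprox nonmethylated, GATCdist methylated.
    Any other combination is returned as "other" (a label chosen for
    this sketch, not a term used in the literature).
    """
    if prox_methylated and not dist_methylated:
        return "ON"
    if dist_methylated and not prox_methylated:
        return "OFF"
    return "other"


print(pap_phase(prox_methylated=True, dist_methylated=False))   # ON-phase pattern
print(pap_phase(prox_methylated=False, dist_methylated=True))   # OFF-phase pattern
```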
The transition from the OFF to ON phase requires that GATCprox be methylated by Dam; either a dam mutant E. coli strain or a GCTCprox A-to-C transversion mutant that cannot be methylated by Dam but does not significantly alter the affinity of Lrp for sites 1, 2, and 3 is locked in the OFF phase (41). In contrast, methylation of GATCdist has an inhibitory effect on the OFF-to-ON switch: overexpression of Dam by just fourfold prevents the OFF-to-ON switch. Moreover, E. coli containing a GCTCdist mutation that blocks Dam methylation is locked in the ON phase, even under conditions of Dam overexpression (41). These data support the hypothesis that OFF-to-ON switching requires DNA replication to generate a hemimethylated GATCdist intermediate, which is bound by PapI-Lrp with a higher affinity than DNA with a fully methylated GATCdist (118). A low level of the coregulatory protein PapI, required for Pap pilus expression (138, 193, 194), increases the affinity of Lrp for pap DNA hemimethylated at GATCdist but does not enhance binding of Lrp to pap DNA fully methylated at GATCdist (118). Notably, the hemimethylation state of pap matters: PapI increases Lrp's affinity for DNA methylated on the top strand at GATCdist about fourfold more than for DNA methylated on the bottom strand (118). These results raise the intriguing possibility that Pap phase switching may be biased: daughter cells receiving DNA methylated on the top strand may have a higher probability of switching to the ON phase than cells receiving DNA methylated on the bottom strand.
PapI is a small (ca. 9-kDa) coregulatory protein expressed from the papI promoter divergent to the papBA pilin promoter (Fig. 3A, top). PapI increases the affinity of Lrp for pap site 5, and to a lesser extent site 2, but has no effect on binding of Lrp to any of the other four Lrp binding sites (118) (Fig. 3C). pap Lrp binding sites 5 and 2 share the sequence "ACGATC," which differs from the other four pap Lrp binding sites and the ilvIH Lrp binding site 2 (65, 129, 138), which do not display PapI-dependent Lrp binding (118). All pap Lrp binding sites share the sequence "GNNNTTT" with the Lrp binding consensus determined by systematic evolution of ligands by exponential enrichment (64).
PapI does not appear to bind specifically to pap DNA by itself, based on gel shift analysis (138) and DNA cross-linking (118). DNA methylation interference indicated that methylation of bases in the sequence 5'-GNCGAT-3' overlapping GATCdist in the top strand and 3'-TGCTAG-5' in the bottom strand significantly reduced PapI-dependent Lrp binding compared with binding of Lrp alone. Methylation of the bottom-strand cytosine complementary to the guanine of "GATC" (meC9) blocked formation of the ternary PapI-Lrp-pap site 5 complex without affecting Lrp binding (118). These results support the hypothesis that enhancement of Lrp binding to site 5 occurs via formation of a PapI-dependent ternary complex with Lrp and pap DNA. Cross-linking with a photoactivatable 9-Å azidophenacyl cross-linker three bases from the presumptive PapI binding sequence "ACGATC" showed that PapI and Lrp were both cross-linked to pap DNA in the ternary complex with nonmethylated DNA, while only Lrp was cross-linked with DNA methylated at C9 (118). These results indicate that PapI is located near the pap ACGATC sequence in the PapI-Lrp-pap site 5 ternary complex and may directly contact this sequence.
The observation that PapI (100 nM) increases Lrp's affinity for pap site 2 (which contains the ACGATC PapI-specific sequence identical to site 5) (118) presents an apparent paradox, since this should block pap transcription due to its close proximity to the papBA pilin promoter (278). Further analysis showed that at low PapI levels significant enhancement of Lrp binding occurred at sites 4, 5, and 6 (CGATCdist) but not at sites 1, 2, and 3 (CGATCprox) (118). At 5 nM PapI, the affinity of Lrp was fourfold higher for pap sites 4, 5, and 6 (Kd = 0.25 nM) than for sites 1, 2, and 3 (Kd = 1.0 nM). Conversely, in the absence of PapI, the affinity of Lrp for sites 1, 2, and 3 (Kd = 1.2 nM) was about twofold higher than that for sites 4, 5, and 6 (Kd = 2.5 nM). Thus, binding of Lrp at sites 4, 5, and 6 should be favored at low PapI levels, resulting in activation of papBA transcription. This, in turn, would increase the PapI level via a PapB-mediated positive feedback loop whereby PapB binds upstream of the papI promoter and helps activate PapI expression (11, 85, 288) (Fig. 3B). High PapI levels could potentially shut off pap transcription by increasing the binding of PapI-Lrp complexes at promoter-proximal sites 1, 2, and 3. However, this is prevented by methylation of GATCprox by Dam, which specifically blocks PapI-dependent Lrp binding without affecting binding of Lrp alone (118).
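The shift in site preference can be made concrete with the dissociation constants quoted above. The following Python sketch uses a simple one-site binding isotherm, occupancy = [Lrp] / ([Lrp] + Kd), which is an assumption made for illustration: Lrp binds these sites cooperatively, so real occupancies will differ. The 1 nM free Lrp concentration and all names in the code are likewise hypothetical.

```python
# Kd values (nM) as quoted in the text; the dictionary keys are labels
# chosen for this sketch.
KD_NM = {
    ("no PapI", "sites 1-3"): 1.2,
    ("no PapI", "sites 4-6"): 2.5,
    ("5 nM PapI", "sites 1-3"): 1.0,
    ("5 nM PapI", "sites 4-6"): 0.25,
}


def occupancy(lrp_nm, kd_nm):
    """Fractional occupancy from a simple one-site isotherm (a simplification)."""
    return lrp_nm / (lrp_nm + kd_nm)


def distal_preference(condition):
    """Kd(proximal) / Kd(distal); values above 1 mean tighter binding at sites 4-6."""
    return KD_NM[(condition, "sites 1-3")] / KD_NM[(condition, "sites 4-6")]


LRP_NM = 1.0  # hypothetical free Lrp concentration
for (condition, sites), kd in KD_NM.items():
    print(condition, sites, round(occupancy(LRP_NM, kd), 2))
print(distal_preference("no PapI"), distal_preference("5 nM PapI"))
```

Under these assumptions the preference ratio moves from about 0.5 without PapI (proximal sites favored) to 4 at 5 nM PapI (distal sites favored), which is the reversal the paragraph describes.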
To determine if the essential role of methylation of GATCprox in the OFF- to ON-phase transition is to specifically block PapI-dependent Lrp binding to sites 1, 2, and 3, the wild-type CGATCprox sequence was mutated to TGATCprox to specifically inhibit PapI-dependent Lrp binding. It was reasoned that under conditions in which PapI-dependent binding of Lrp to sites 1, 2, and 3 was blocked, switching from OFF to ON phase should occur in the absence of Dam. Analysis of the TGATCprox mutant showed that PapI-dependent Lrp binding to sites 1, 2, and 3 was inhibited but binding of Lrp was unaffected both in vitro and in vivo. Switch frequency analysis of E. coli containing the TGATCprox mutation showed that the OFF-to-ON rate (5.6 × 10^-4/cell/generation) was about sevenfold higher than that of wild-type cells (8.2 × 10^-5/cell/generation). Notably, in a dam null mutant background cells were locked in the ON-phase state, showing that methylation is not required for pap transcription under conditions in which PapI-dependent binding of Lrp to pap site 2 containing GATCprox is blocked. These results support the conclusion that methylation at GATCprox is required for the OFF- to ON-phase transition by specifically inhibiting PapI-dependent Lrp binding to sites 1, 2, and 3 (Fig. 3C, top).
Environmental mechanisms for switch control.
Binding of Lrp at sites 4, 5, and 6, together with binding of cAMP-CAP at -215.5 (relative to the papBA transcription start site) (277), enhances papBA transcription via contact between CAP activating region 1 and the α C-terminal domain of RNA polymerase (277). In this way, Pap pilus expression is environmentally controlled by carbon source via the cAMP level. The role of Lrp may be structural, bending pap DNA between the CAP binding site at -215.5 and the papBA promoter to facilitate contact between cAMP-CAP and the α C-terminal domain. This results in transcription initiation from papBA and expression of PapB, which has been reported to bind with highest affinity to a site between the papI promoter and the CAP binding site (85), stimulating papI transcription, which constitutes a positive feedback loop (Fig. 3D). The high PapI level ensures binding of PapI-Lrp to sites 4, 5, and 6, and methylation of GATCprox prevents binding of PapI-Lrp to sites 1, 2, and 3, which would shut off papBA transcription and turn the switch OFF (278). The fact that both PapI and PapB are required for switching from the OFF to ON phase raises a chicken-and-egg problem that has not been adequately addressed: which regulatory factor initiates the switch? We speculate that regulation is at the level of PapB expression and that a low level of papBA mRNA is made following DNA replication and Lrp/H-NS dissociation from sites 1, 2, and 3 (266). If this papBA mRNA is rapidly translated, it would induce papI transcription, initiating the OFF-to-ON switch cascade. There is indirect evidence to support the idea that there may be translational control involved in Pap pilus expression, since a rimJ mutation affects pap gene regulation (280-282). RimJ acetylates ribosomal protein S5 in the 30S subunit. Thus, it is possible that ultimately the initiation of the Pap OFF-to-ON switch may be dependent upon the translation of a basal level of papBA mRNA present immediately following DNA replication.
The global regulatory protein H-NS is not required for Pap phase variation (266), but it does modulate Pap gene expression and Pap switch rates. H-NS represses papBA transcription in response to low temperature (94), high osmolarity (283), and rich medium (283). This may occur by specific binding of H-NS to the pap regulatory region, as evidenced by blocking of methylation of both pap regulatory GATC sites in vitro and in vivo (279). Binding of H-NS near the papBA promoter could inhibit binding of RNA polymerase, repressing transcription. Notably, at 37°C H-NS appears to positively affect Pap phase variation, since the OFF-to-ON switch rate is reduced in an hns mutant (266, 283). This positive effect of H-NS on the OFF- to ON-phase transition could occur via competition with Lrp at sites 1, 2, and 3, which would help to move PapI-Lrp to sites 4, 5, and 6, analogous to the role of methylation of GATCprox (Fig. 3C).
Another environmental input into Pap phase variation is mediated by the CpxAR response regulatory system (117, 127). Under certain conditions that stress the cell envelope, including high pH, CpxA located in the inner membrane autophosphorylates and then transfers a phosphate group to CpxR to yield CpxR-phosphate (CpxR-P) (176, 211). CpxR-P binds to sites overlapping all six pap Lrp binding sites, competes with Lrp for binding to these sites, and shuts off papBA transcription and Pap pilus expression (115, 117). Notably, CpxR-P binding to pap sites 1 to 6 is not inhibited by DNA methylation, in contrast to Lrp, even though CpxR-P, like Lrp, binds at sites overlapping the pap GATCprox and GATCdist sites. The biological role of CpxAR regulation of Pap pilus expression is not fully clear. One possibility is that under conditions of envelope stress it makes sense to curtail pilus expression to prevent further damage to the membrane. Another provocative possibility is that under conditions of stress UPEC cells stop making Pap pili, making them susceptible to contact-dependent growth inhibition (3). The physiologic significance of this is unknown, but it might contribute to survival under harsh conditions by slowing bacterial metabolism and growth (3).
The Pap ON- to OFF-phase transition. The Pap ON- to OFF-phase transition occurs at about a 100-fold-higher rate than the OFF- to ON-phase transition (35, 266). Notably, factors including H-NS, carbon source, and osmolarity do not affect the ON- to OFF-phase transition rate (35, 266, 283); therefore it appears that the ON- to OFF-phase transition is relatively constant under different environmental conditions. The ON- to OFF-phase transition has not been thoroughly examined, but based on knowledge of the OFF-to-ON switch mechanism (116-118) (see above), the following model is postulated. Starting with a cell in the ON-phase state (Fig. 4A), DNA replication is postulated to dissociate PapI-Lrp from sites 4, 5, and 6, enabling Dam to compete with Lrp for binding at GATCdist (Fig. 4C). Methylation of GATCdist is essential for the OFF-phase state (41). DNA replication also generates two hemimethylated GATCprox sites, one methylated on the top strand and one on the bottom strand (Fig. 4B). Whether a cell remains in the ON phase or transitions to the OFF state may be dictated by competition of Lrp for binding to pap promoter-proximal sites 1, 2, and 3 versus distal sites 4, 5, and 6 (Fig. 4B). Lrp has about a twofold-higher affinity for the proximal sites than for distal sites, and methylation of GATCprox does not affect Lrp binding to these proximal sites (118). In contrast, methylation of GATCdist inhibits binding of Lrp and PapI-Lrp to the distal sites (118, 194). These two factors should favor binding of Lrp to the proximal sites over the distal sites, which may account in part for the high ON-to-OFF rate observed. Following one additional round of DNA replication, the OFF-phase state is attained (Fig. 4D).
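The consequence of a 100-fold asymmetry between the two switch rates can be illustrated with a two-state model. The Python sketch below assumes constant per-generation switching probabilities, equal growth of ON and OFF cells, and no selection; these simplifications, the function names, and the OFF-to-ON rate of 1e-4 per cell per generation (a round number chosen for illustration) are all assumptions, not measured values.

```python
def steady_state_on_fraction(off_to_on, on_to_off):
    """Equilibrium fraction of ON-phase cells in a two-state switching model."""
    return off_to_on / (off_to_on + on_to_off)


def simulate_on_fraction(off_to_on, on_to_off, generations, start_on=0.0):
    """Iterate the ON fraction one generation at a time from start_on."""
    on = start_on
    for _ in range(generations):
        # ON cells that stay ON plus OFF cells that switch ON.
        on = on * (1.0 - on_to_off) + (1.0 - on) * off_to_on
    return on


OFF_TO_ON = 1e-4              # assumed illustrative rate per cell per generation
ON_TO_OFF = 100 * OFF_TO_ON   # about 100-fold higher, as stated in the text

print(steady_state_on_fraction(OFF_TO_ON, ON_TO_OFF))
print(simulate_on_fraction(OFF_TO_ON, ON_TO_OFF, generations=5000))
```

In this simplified model the population settles at roughly 1% ON-phase cells whatever the starting composition, because the equilibrium depends only on the ratio of the two rates.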
PapI homologue acting as a positive regulator of pilus expression. The regulatory regions of many pilus operons in E. coli, including Pap-related fimbriae (Prf), foo (F165₁ pili), clp (CS31A pili), sfa (S pili), daa (F1845), fae (K88), and afa (afimbrial adhesin), share two GATC sites analogous to GATCprox and GATCdist and spaced 102 base pairs apart as in pap (151) (Fig. 5). Moreover, these GATC sites are present within additional conserved sequences, "CGATCdistTTTT" and "CGATCproxTT," with the entire sequence called a "GATC box" (note the inverse orientations of the GATC boxes in the pilus regulatory sequences shown in Fig. 5). Since the GATC box sequence contains binding sites for Lrp and Dam, as well as a portion of the PapI response element "ACGATC," this provides the means by which these various pilus operons are controlled by DNA methylation patterns.
PapI homologue acting as a negative regulator of pilus expression. Two methylation-controlled pilus operons in E. coli, clp (CS31A) and fae (K88), and one pilus operon in Salmonella enterica serovar Typhimurium, pef, share common regulatory features with pap but have distinct differences as well. The regulatory regions of clp, fae, and pef contain conserved GATC box sites and spacing identical to that in pap (Fig. 5). Also similar to pap, binding of Lrp to regulatory DNA is controlled by DNA methylation and a PapI homologue. However, all three methylation-controlled operons are carried on plasmids, and in each case PapI homologues negatively control phase variation and transcription.
K88 pili, expressed by enterotoxigenic E. coli infecting pigs, are not under phase variation control, in contrast to the case for all other Pap family members (124). The fae regulatory region shares GATC box sequences with pap, spaced 102 bp apart, as well as a PapI homologue, FaeA, and a PapB homologue, FaeB (124). A third regulatory GATC site (GATC-III) is present 28 bp downstream (toward the faeB promoter) of GATCprox, and two IS1 sequences are present between faeB and faeA (Fig. 5). In contrast to the case for pap, FaeA and Lrp act to negatively control fae transcription. Data from Huisman et al. indicated that in the absence of FaeA, Lrp binds at sites overlapping GATCprox, protecting it from methylation by Dam (124, 125). However, in contrast to the case for pap, this Lrp binding has little effect on pilin transcription. In the presence of FaeA, the PapI homologue, additional binding of Lrp near GATC-III occurs, blocking methylation of both GATCprox and GATC-III and reducing fae transcription. This GATC-III site shares the "CGATCTTTTA" sequence of the pap and fae GATCdist sites, though in opposite orientation, possibly accounting for FaeA-mediated binding of Lrp to this region. However, FaeA-mediated binding of Lrp to GATCdist was not observed. In fact, mutation of the GATCdist site to a GTTC sequence was lethal due to overproduction of K88 pili, indicating that methylation of GATCdist normally blocks binding of FaeA-Lrp. Whether FaeA-Lrp binds to GATCdist under normal physiologic conditions is not clear, but it is possible that binding to a hemimethylated GATCdist site might occur immediately following DNA replication, stimulating K88 expression under certain conditions. Another difference between regulation of fae and pap is in control of faeA and of papI transcription. In the case of pap, papI is regulated by PapB via a positive feedback mechanism (116), whereas in fae, an IS1 insertion apparently disrupts this positive feedback. Instead, FaeA may bind to its own promoter, acting as a positive autoregulator (125).
Regulation of the clp operon, coding for CS31A pili, which are expressed by enterotoxigenic E. coli, shares common regulatory features with pap but, like for fae and pef, has distinct differences as well. In E. coli isolate CS31A harboring clp, CS31A pili are under phase variation control, yet the plasmid-carried clp operon does not have a papI homologue associated with it (62, 173). It seems likely that a pap operon identified on the chromosome of E. coli CS31A supplies PapI in trans, but this has not been confirmed. Analysis of clp regulation in E. coli K-12 (no papI homologue present) showed that Lrp and the PapB homologue ClpB repressed clp transcription. However, even in the presence of Lrp and ClpB, a moderate level of clp pilin transcription was observed. In addition, in lrp+ clpB+ cells lacking Dam, transcription was almost maximally derepressed. Introduction of the PapI homologue AfaF resulted in phase variation of CS31A expression: instead of a normally distributed transcription of CS31A among the cell population, individual cells either transcribed (ON phase) or did not transcribe (OFF phase) the clp operon, with the methylation patte