Fumio Arisaka,4 Takashi Kunisawa,5 and Wolfgang Rüger6
Department of Microbiology, North Carolina State University, Raleigh, North Carolina 27695-7615,1 The Evergreen State College, Olympia, Washington 98505,2 Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37232,3 Department of Molecular and Cellular Assembly, Tokyo Institute of Technology, Yokohama 226-8501,4 Department of Applied Biological Sciences, Science University of Tokyo, Noda 278-8510, Japan,5 Faculty for Biology, Ruhr-University-Bochum, 44780 Bochum, Germany6
SUMMARY T4 GENES TO GENOME NUCLEOTIDE SKEW IN THE T4 GENOME IDENTIFYING T4 GENES Computational Strategies for Gene Assignment Characterized T4 Genes and the Early Genetics ORFs of Unknown Function and Host Lethality PROMOTERS AND TRANSCRIPTION FUNCTIONS Early Transcription Middle Transcription Late Transcription Microarray Analysis of T4 Transcription Transcription Termination and Predicted RNA Structures Intrinsic transcription terminators. Rho-dependent transcription terminators. TRANSLATION AND POSTTRANSCRIPTIONAL CONTROL Ribosome-Binding Sites RNA Structure at Ribosome Binding Sites Internal Initiation Sites Translational Coupling Translational Repressor Proteins Codon Usage tRNAs Introns mRNA and tRNA Turnover Proteolysis DNA METABOLISM, REPLICATION, RECOMBINATION, AND REPAIR Enzymes of Nucleotide Metabolism DNA Replication Proteins Initiation of DNA Replication Recombination and Recombination-Dependent DNA Replication DNA Repair MOBILE ENDONUCLEASES, GENE TRANSFER, AND GENE EXCLUSION T4 PARTICLE, INFECTION, AND LYSIS Heads DNA Packaging Baseplate and Tails Infection and Superinfection Exclusion Lysis and Lysis Inhibition RESTRICTION-MODIFICATION SYSTEMS AND PHAGE EXCLUSION PREDICTED INTEGRAL MEMBRANE PROTEINS Integral Membrane Proteins of Known Function Hypothetical Proteins with Predicted Cell Membrane Associations Missing Membrane-Associated Proteins EVOLUTIONARY PERSPECTIVES: T4 PROTEINS AND THE GENOME T4 Protein Structures Orthologous T4 Proteins Paralogous Genes in the T4 Genome A Glimpse at Genome Diversity and Evolution in T4-Type Phages OUTLOOK ACKNOWLEDGMENTS REFERENCES
| SUMMARY |
|---|
|
|
|---|
| T4 GENES TO GENOME |
|---|
|
|
|---|
There was a price to pay for all of the benefits provided by this highly tractable genetic system. Early efforts to clone T4 genes were largely thwarted by the glucosylated hydroxymethyl cytosine (HMC) DNA (which is central to the high expression and replication of the phage genome, the concurrent total inhibition of host transcription, and the eventual degradation of the host DNA). Most of the available restriction endonucleases failed to digest T4 DNA, delaying the gene-by-gene cloning analysis that rapidly advanced in other model organisms. Eventually, multiply mutant T4 strains defective in the nucleases that cleave unmodified DNA, in the enzymes leading to the synthesis of HMC-DNA, and in the protein blocking transcription of cytosine-containing DNA were constructed (1020). These T4dC (or T4C) strains permitted the construction of detailed restriction maps of T4 (137a, 139, 600, 814, 833a, 1214) and rapidly accelerated cloning and sequence analysis of T4 gene clusters. By the early 1990s, much of the genome had been sequenced, but extensive regions remained intractable. The uncloned DNA appeared to largely encode proteins involved in the transition from host to phage metabolism, nucleases, and other proteins toxic to the Escherichia coli cloning host. These regions were sequenced by different members of the T4 community, who closed the gaps by using PCR to carry out direct sequencing without cloning. Regions that have not otherwise been published include the nrdC-tk region (laboratory of E. Kutter), the e-tRNA region (laboratories of V. Mesyanzhinov and E. Kutter), the 34-35 region (laboratory of E. Goldberg), the t-asiA.5 region (laboratory of J. Drake) and the ndd-rIIB region (laboratories of K. Kreuzer and M. Uzan). The complete 168,903-bp sequence of the T4 genome is available as GenBank accession no. AF158101 and as entry NC_000866 at the NCBI Entrez Genome site (http://www.ncbi.nlm.nih.gov/Entrez). 
Among sequenced viruses in the database, only Pseudomonas phage φKZ (727), the African swine fever virus, herpesviruses, chlorella virus, and vaccinia virus have larger genomes.
The T4 genome is a rich arena for evaluating complete genomes in the context of a well-characterized biological system. Here, we demonstrate the use of some of the computational tools currently available for complete genome sequence analysis and discuss the new insights gained from this analysis of the T4 genome and its nearly 300 genes.
| NUCLEOTIDE SKEW IN THE T4 GENOME |
|---|
|
|
|---|
A substantial skew toward G and against C in the coding strand is observed in translated regions. Only four genes have more than 20% C in the coding strand, while about 130 have more than 20% G and 37 have more than 22% G. A and T are more equitably divided between the strands. However, the AT bias is strong in the third position of codons, as expected with high-A+T genomes, and reflection points in the bias (Fig. 2) do correlate with changes in the direction of T4 transcription (499). Whether these biases are coupled to effects of transcription or replication on directional mutation pressure, as suggested previously (499), remains to be demonstrated. Variably used multiple origins of T4 DNA replication (see below) presumably preclude the use of nucleotide skew analysis to identify the origin of replication, as it is often used for microbial chromosomes (352). Overall, AT skew is a strong predictor of T4 coding regions and the transcribed strand, although in a few regions both strands are transcribed and, in at least one region, both are translated.
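The per-gene base fractions and strand skews discussed above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used for Fig. 2: the function names, the window parameters, and the toy sequence are assumptions introduced here, and skew is taken as the common (x - y)/(x + y) ratio of base counts.

```python
from collections import Counter


def base_fractions(seq):
    """Return the fraction of A, C, G, and T in a coding-strand sequence."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: counts[b] / total for b in "ACGT"}


def skew(seq, x, y):
    """Return (x - y) / (x + y) for the counts of bases x and y (0.0 if neither occurs)."""
    seq = seq.upper()
    nx, ny = seq.count(x), seq.count(y)
    return (nx - ny) / (nx + ny) if (nx + ny) else 0.0


def windowed_skew(seq, x, y, window, step):
    """Yield (start, skew) for successive windows along the sequence."""
    for start in range(0, max(len(seq) - window + 1, 1), step):
        yield start, skew(seq[start:start + window], x, y)


# Hypothetical toy ORF (invented for illustration, not a T4 sequence).
toy_orf = "ATGGGTAAAGGTGATTTAGAAGGTTAA"
fractions = base_fractions(toy_orf)
print({b: round(f, 2) for b, f in fractions.items()})
print("GC skew:", round(skew(toy_orf, "G", "C"), 2))
```

Applied to each annotated coding sequence in turn, `base_fractions` would give the kind of tally quoted above (genes with more than 20% G or C in the coding strand), while `windowed_skew` gives the position-dependent profile whose sign changes can be compared with transcription direction.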
The AT-rich T4 genome may also present features advantageous for a virus: a DNA structure different from the B-DNA of its host (809). On a local scale, the structure would approach D-form DNA: a polymer consisting of poly(dA-dT) double strands, overwound with only 8 bp per turn, a wider and shallower major groove, and a deeper and narrower minor groove (126, 127, 636). Close contacts of the glucosyl residues with side groups of neighboring bases could alter the preferred values of roll, slide, and twist angles of base pairs (258). Such forces and structural features can influence the outward appearance of the DNA in a way that may be recognized by proteins. Enzymes that melt DNA as part of their action (such as RNA polymerase and DNA polymerase) might transcribe and replicate AT-rich DNA faster than they would transcribe and replicate DNA with a balanced GC and AT content or might attract RNA polymerase and other host proteins in a competitive manner.
| IDENTIFYING T4 GENES |
|---|
|
|
|---|
Work with T4 makes it clear that precisely identifying protein-coding regions can be complex, even in prokaryotes. (i) Five known T4 genes and several other ORFs have functional internal starts, with good experimental evidence for genes 17 and 49 that the shorter proteins have distinct functional roles (39, 286, 784, 788). In these two cases, separate but related gene names have been assigned (e.g., 17, 17', and 17") to indicate this complex relationship. We expect that other examples of internal translational start sites will be identified.
(ii) Five other genes and ORFs have two closely spaced start codons with similarly strong values for the sequence information content (defined below) at their translation initiation sites (or ribosome binding sites [RBS]). These include alc, vs.4, e.5, tRNA.2, and 57B. Until further evidence is available, we have listed these genes as simply starting from the first of the two possible sites. It will be interesting to determine if both starts are used in any or all of these cases and if there are special functions for two nearly identical proteins. In bacteriophage lambda, for example, two nested proteins, differing in start sites by only two amino acids, have important complementary functions: one makes the pore to permit access by lysozyme to the peptidoglycan layer, and the other delays formation of the pore (91). The regulation of the balance between these two genes is not understood but is crucial in determining the timing of lysis.
(iii) It is clear that there can be genes within genes in different reading frames. These can be read in the same direction, as seen for gene 30.3' (1234). They can also be in the opposite orientation, as seen for genes repEA and repEB, which are associated with initiation from origin E and are located opposite gene 5 (1109).
(iv) Introns that are later spliced out of the transcripts occur in at least three T4 genes: the thymidylate synthase gene (td), the gene encoding a subunit of the aerobic ribonucleotide reductase (nrdB), and the anaerobic ribonucleotide reductase gene (nrdD) (615, 991, 1229).
(v) As first demonstrated in T4 gene 60, an unusual relationship between nucleic acid and protein sequence can also occur through translational bypassing. A 50-base mRNA segment in the coding region is not translated in gene 60 by a mechanism that depends on cis-acting signals in the mRNA, ribosomal protein L9, a pair of GGA codons 47 bases apart, and the structure of the cognate glycyl tRNA (408, 450). This is the only known high-efficiency bypass site; to date, the phenomenon is unique to T4. Bypass with much lower efficiency appears to occur at the junction of genes 56 and 69 (segF) (160, G. Mosig, unpublished data).
Programmed frameshifting, which shifts translation by 1 base into the +1 or -1 reading frame, can expand the coding capacity of a genome (13). To date, no instance of programmed frameshifting has been identified in T4, although many other viral DNA and RNA genomes use this approach to "recode" (322).
T4 shows nearly four times the gene density predicted for herpesviruses and yeast and twice that for E. coli (92, 556, 557). The high gene density reflects both the small size of many T4 genes and the fact that there are very few noncoding regions (about 9 kb, 5.3% of the genome). Furthermore, regulatory regions are compact, occasionally overlapping coding regions. In many cases, the termination codon of one gene overlaps the start codon of the next gene (see "Translation and posttranscriptional control" below). In addition, T4 has several groups of nested genes as mentioned above. Clearly, computational and bioinformatic tools do not yet identify all the genes and complex coding arrangements in a genome perceived by many to be "simple," like that of T4.
Table 3 summarizes the functional assignments of T4 genes, referring to the color codes used in the functional genome map of Fig. 3. Some T4 proteins have multiple activities and are listed in more than one group. For example, T4 RNA ligase A (rnlA or 63) is also a catalyst for attaching tail fibers. Alternatively, a single activity can be viewed as being involved in multiple processes. For example, the nucleases EndoII and EndoIV (encoded by denA and denB) are responsible primarily for initiating degradation of cytosine-containing host DNA. They are included in the "nucleotide precursor" category because one important function of these proteins is the timely provision of nucleotide precursors. They are also included among the host alteration/shutoff genes.
Nonessential genes were typically assigned letter designations, reflecting the phenotype associated with the mutation (Table 2) or the host function that the gene duplicated (nrd, frd, td, etc.). They encode such products as enzymes for nucleotide biosynthesis, recombination, and DNA repair; nucleases to degrade cytosine-containing DNA; proteins responsible for exclusion of superinfecting phage, for lysis inhibition under conditions of high phage/host ratios, and for other membrane changes; and inhibitors of host replication, transcription, and protease activity. Unfortunately, the designation by letters versus numbers does not automatically identify a gene as essential. For example, the products of genes t, motA and asiA are essential under standard conditions, while that of 69 (segF) is not. Mutations in genes 46 and 47 still permit the synthesis of a few phage per cell, but too few are produced to reliably produce plaques under most conditions; a burst size of about 10 is generally required for plaque formation. Primase (gene 61) and topoisomerase (genes 39, 52, and 60) mutants produce plaques at temperatures above 25°C because they can use a recombinational bypass mechanism to prime lagging-strand DNA synthesis (784, 788). In several cases, mutations initially assigned to different genes by spot-test complementation ultimately proved to reside within the same gene; thus, genes 58 and 61 are identical, as are genes 2 and 64 and genes 4, 50, and 65.
Most genes first identified by mutation have now been located in the DNA sequence. However, no genes have yet been identified for any of the reported ribosome-binding proteins or other proteins that might be involved in the shutoff of host translation (reviewed in reference 1166). Mutations ama, stI, stIII, rs, goFB, and goFC have not been assigned to a sequence; the original mutants identifying most of these genes have been lost.
Most of the 127 uncharacterized ORFs lie in regions transcribed counterclockwise from strong early promoters. Only 16 of the uncharacterized ORFs would be expressed late in the T4 infection cycle. These are (i) ORFs under control of a late promoter in the clockwise direction, where almost exclusively late genes are found (5.1, 5.3, 5.4); (ii) ORFs following late promoters (some of which also may still be expressed from upstream early and/or middle promoters) in the counterclockwise direction (rI.1 and rI.-1; 24.2 and 24.3; uvsY.-1 and uvsY.-2; alt.-1 to alt.-3; and 30.9); and (iii) ORFs following middle promoters and without late promoters (denB.1).
Because they are likely to be expressed immediately after infection, some of the 127 uncharacterized T4 ORFs may be involved in the transition from host to phage metabolism or in resistance to plasmid- or prophage-encoded toxic proteins. Many of these genes (shown in white in Fig. 3) are in regions that can be deleted without seriously affecting phage production under usual laboratory conditions. However, at the same time, they have largely been retained in T4-related phages (534, 596, 919; E. Kutter et al., unpublished data about the nrdC-tRNA region). Most of the T4 early promoters are in these widely conserved yet deletable regions, which are densely packed with the predicted ORFs. Many of the hypothetical ORF proteins (at least those over about 9 kDa) have been identified on two-dimensional gels by comparing labeled proteins produced by wild-type and the T4 deletion strains (604). These proteins are often produced in large quantities just after infection. Those that have been tested are generally lethal or very deleterious to the growth of E. coli.
Together, these findings suggest that the host-lethal, immediate-early proteins confer selective advantage for the phage but that they are necessary only under certain environmental conditions, for infecting other hosts, or that there is redundancy in their functions. Some of the proteins are quite large, but most are smaller than 15 kDa. In general, work with T-even phages emphasizes that small hypothetical ORF-encoded proteins should not be overlooked. The smallest characterized T4 protein, Stp, consists of only 29 amino acids; 62 predicted T4 proteins have fewer than 100 amino acids.
Most of the unidentified ORFs show very little homology to non-phage genes in the databases. That many of these ORFs are deleterious to E. coli when cloned reinforces the notion that their products inhibit or redirect important host proteins and that they may be useful in studying cellular proteins in their active, functional state. One example, the Alc protein, specifically terminates the elongation of transcription on cytosine-containing DNA (599, 601). Alc appears to uniquely recognize the rapidly elongating form of the RNA polymerase (RNAP) complex. It would be a valuable tool for studying the dynamic structural changes that occur in the polymerase during transcription; all other current approaches only examine the polymerase paused at particular sites and infer its behavior from the resultant static state.
Some of the host-lethal proteins may also suggest new targets for antibiotics. They should also aid in studies of evolutionary relationships and protein-protein interactions.
Another interesting set of proteins involved in the transition from host to phage gene expression involves three different ADP-ribosyltransferases. These include Alt, which is packaged in the phage particle and carried into the cell with the DNA, ModA, and ModB. The role of these ADP-ribosylation activities in the T4 transcription cycle is detailed below.
To fully understand the takeover of host metabolism by T4-like phages, it will be necessary to identify the ORFs that indeed encode proteins in vivo and to determine their biological functions and the conditions under which they exert their effects. The sequences of some of the small proteins that have been studied are highly conserved among the T-even phages, presumably reflecting their complex interactions with multiple cell components.
| PROMOTERS AND TRANSCRIPTION FUNCTIONS |
|---|
|
|
|---|
subunits, β/β' subunits, and σ70 promoter recognition subunit. Most studies with T4 have been done in cells growing exponentially under high aeration, where the host σ70 is present throughout infection. Under these conditions, the temporal transition through the different classes of promoters is accompanied by covalent modifications of RNAP and the appearance of new protein transcription factors that act in various ways. All of these functions serve to enhance phage promoter recognition and transcription; no DNA-binding transcriptional repressor protein has been identified in the T4 developmental cycle. To date, little is known about T4 infection under stationary-phase or anaerobic conditions (such as the phage would encounter in nature [599a]). Preliminary evidence shows that the patterns of infection under these conditions are often very different and that the status of rpoS clearly makes a difference in the outcome of aerobic infection in stationary-phase cells (E. Kutter, unpublished data). Corbin et al. (187a) have recently shown that T4 infection affects the morphology of E. coli biofilms and that glucose-limited biofilm cells can be a reservoir for phage. Additional study of T4 gene expression under different environmental conditions is warranted.
σ70-dependent bacterial promoters for approximately 2,000 RNAP holoenzymes in the commonly studied, rapidly growing exponential cells; the polymerase number is smaller under more limiting growth conditions. T4 redirects the transcriptional machinery to T4 promoters with high efficiency, as reflected by the appearance of phage-specific proteins soon after infection, the rapid shutoff of host gene expression (reviewed in reference 599), and, ultimately, the virulence of the phage. That T4 early promoters are stronger than E. coli promoters presumably plays a major role, since most promoters can be cloned only on plasmids designed to attenuate their transcriptional activity. Transcription start sites of many of the early promoters have been mapped by primer extension on mRNA from T4-infected cells and/or from promoter-cloning vectors (reviewed in reference 1169).
The 39 characterized Pe sequences (1168, 1169) are noted in Table 1 and have been analyzed using the information content software developed by Schneider and Stephens (966). The sequence logos, maximizing the alignment at the -10 region and, independently, at the -35 region, are shown in Fig. 5A (E. Miller, T. Dean, and T. Schneider, unpublished data). The analyses show that there is high conservation at the -12, -11, and -7 positions similar to that in the E. coli Eσ70 promoters. However, T4 Pe sequences have more extended -10 regions, with sequence conservation extending through the G predominating at -14 to -18. In one group of early promoters, significant conservation extends on both sides of the -10 region [5'-GTGG(TAT/CT/AAT)ACAACT-3'] up to the T at position -1 (1169). The start site of the transcript (coordinate 0 in Fig. 5A) is frequently an A. The Pe -35 region has a 6-bp conserved region from position -36 to -31 (GTTTAC) that differs from the E. coli -35 consensus sequence (TTGACa). Upstream of the -35 region, T4 early promoters display a bias toward A-rich tracts centered around -42 and -52 (Fig. 5A) (1169). Upstream A-tract sequences (position -42) were first observed with T5 promoters (314) and have since been shown to activate certain E. coli promoters, of which the rrn operon promoters are the best studied. By affecting DNA curvature, upstream A tracts (UP elements) directly enhance Eσ70 promoter activity through interactions with the RNAP α subunit (266, 939). Many of the T4 Pe sequences include the most enhancing type of E. coli UP elements, where the two A tracts are separated by a T-rich region (266).
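To make the notion of information content concrete, the following sketch computes per-position information, in bits, for a set of aligned DNA sites. It is not the Schneider and Stephens software: it assumes equiprobable background bases, omits the small-sample correction those authors apply, and uses invented toy sites rather than the T4 Pe alignment. The function name is hypothetical.

```python
import math


def information_content(aligned_sites):
    """Return per-position information (bits) for equal-length aligned DNA sites.

    Each position contributes 2 - H bits, where H is the Shannon entropy of the
    observed base frequencies. Assumes only A/C/G/T, a uniform background, and
    no small-sample correction (a simplification of the published method).
    """
    if not aligned_sites:
        raise ValueError("need at least one aligned site")
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    bits = []
    for i in range(length):
        column = [site[i].upper() for site in aligned_sites]
        entropy = 0.0
        for base in "ACGT":
            p = column.count(base) / len(column)
            if p > 0:
                entropy -= p * math.log2(p)
        bits.append(2.0 - entropy)
    return bits


# Hypothetical toy alignment of -10-like hexamers (invented for illustration).
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
per_position = information_content(sites)
print([round(b, 2) for b in per_position])
print("total bits:", round(sum(per_position), 2))
```

Summing the per-position values over the aligned window gives a total analogous to the Rs values quoted in this review, although totals from this simplified version will be somewhat higher than corrected values for small alignments.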
In addition to these early T4 promoters, there are some promoters that more closely resemble E. coli promoters. P bac (639, 1169) has been identified by mapping transcripts from cells carrying plasmid-borne T4 genes. It directs the synthesis of transcripts that are complementary to gene 3 mRNA. P repE (coordinate 79405 [Table 1]) has been identified in T4-infected cells (1109). It directs the synthesis of RepEA and RepEB proteins and an RNA primer for oriE-initiated replication. This RNA would be complementary to late-gene 5 transcripts but is undetectable by the time when these transcripts are made. Transcripts preceding gene 32 (142) have been detected that also map to σ70-like promoters. While the latter are active on supercoiled plasmids, little to no transcription was observed in T4-infected cells. A similar promoter preceding gene 57A was inferred to be active on plasmids (409). These host-like promoters as a group may be of limited significance, since host transcription in general is turned off and RNAP is modified early during T4 infection.
T4 modifies the host RNAP in several ways after infection. However, most of these modifications are not essential to the infection process. A 70-kDa protein, gpAlt, enters the host with the infecting DNA. Alt is a mono-ADP-ribosyltransferase that targets arginine residues. It efficiently ADP-ribosylates one of the α subunits of RNAP in the carboxy-terminal domain at position Arg265 (323, 324, 435, 459, 937a, 1011) and ADP-ribosylates the three other polymerase subunits to a lesser extent, along with a number of other uncharacterized polypeptides. ADP-ribosylation of RNAP by cloned Alt protein leads to enhanced transcription from cloned T4 early promoters (544). Mutation analyses reveal that T4 early promoters interact strongly with unmodified RNAP and even better, in most cases, with RNAP in which only one of the α subunits is ADP-ribosylated. In particular, base position -33 of the T4 promoter and the A-rich UP element at position -42 contribute to the strong interactions with ADP-ribosylated RNAP of T4-infected cells (1026). Therefore, Alt presumably contributes to the preferential transcription from T4 promoters after infection (1168, 1170).
Shortly after infection, two new ADP-ribosyltransferases are expressed, ModA (23 kDa) and ModB (24 kDa) (780, 1077). ModA, first observed by Skorko et al. (1011), ADP-ribosylates the α subunits of host RNAP but shows no activity toward the β, β', and σ subunits. Like Alt, ModA ADP-ribosylates Arg265 on the α subunit; unlike Alt, it targets both α subunits, not just one. ADP-ribosylation replaces the positive charge of the Arg residue by two negative charges carried by the two phosphate groups and affects DNA-protein as well as protein-protein interactions. This second ADP-ribosylation inhibits transcription from promoters with the UP element; expression of cloned modA is highly lethal to the host. (The action of ModB [205] is summarized below.)
σ70 promoters, up to position -16. Significantly though, T4 Pm sequences have neither the well-characterized Eσ70 -35 region nor the Pe -35 region. Middle promoters are characterized by a specific -30 sequence called the MotA box, which extends between -32 and -27, with GCTT being the most highly conserved. The information content (Rs) calculated for the optimally aligned regions from -60 to +10 of the logo in Fig. 5B is 21.1 bits, with 13.1 bits of the information being associated with the -10 alignment. This is considerably less than the 38-bit Rs value of the Pe promoters, implying that there is less competition with host promoters for RNAP, perhaps because host DNA is already being degraded and ADP-ribosylation of RNAP is completed. Approximately 8 bits of Rs information are required for MotA to recognize the MotA box sequence. T4 middle promoters are all located on the minus strand (Table 1) relative to the GenBank genome entry. Fourteen new middle promoters have been recently described (1095a; R. Nivinskas, personal communication).
T4 gene products AsiA and MotA are required for middle-mode transcription. AsiA is an anti-σ factor protein (see reference 454a for a review of anti-σ proteins) that coactivates RNAP for middle-mode transcription initiation by the formation of AsiA-σ70 heterodimers (12, 180, 1104). This interaction interferes with the recognition of -35 promoter sequences and at the same time stimulates T4 middle-mode transcription (180, 425, 1103, 1104). The AsiA-σ70 interaction is regarded as the pivotal event in the transition between T4 early and middle transcription: in vitro it both inhibits the recognition of most host promoters and early T4 promoters and stimulates T4 middle-mode transcription (180, 425, 848, 849, 1104). However, in vivo, defective asiA mutants do not prolong early transcription (858), suggesting that other proteins (i.e., ModA and ModB) turn off most early T4 promoters. MotA is a DNA-binding transcriptional activator protein that binds to the MotA box sequence (Fig. 5B) through its C-terminal domain, facilitating Pm promoter recognition and transcriptional activation (see the model proposed in reference 987 and in Fig. 4 of that reference). MotA and AsiA together increase the initial recruitment of RNA polymerase to T4 middle promoters and facilitate the clearance of RNAP from the promoter and into the elongation mode (419).
σ55, for RNAP recognition of late promoters (1173). It specifically recognizes the -10 region sequence. Although σ55 is required to selectively initiate transcription at T4 late promoters, it is not sufficient. AsiA does not appear to be a major determinant of middle- versus late-promoter competition (552). Instead, another phage-encoded protein, gp33, acts as a coactivator of late transcription, mediating interactions between σ55 and the sliding clamp encoded by T4 gene 45. The trimeric gp45 protein is a key component in the processivity of the DNA replication complex and is also essential for late transcription (a "mobile enhancer" [405, 1186]). Primer-template junctions and single-stranded DNA (ssDNA) nicks are the most efficient loading sites for gp45, which is loaded by the clamp-loader proteins gp44 and gp62; gp45 slides on the DNA, enhancing the opening of late promoters more than 1,000 bp away from the loading site. Activated late promoters outcompete middle promoters on the same plasmid in vitro, especially at higher ionic strengths. This advantage is enhanced by ADP-ribosylation of RNAP α subunits and by binding of the phage-encoded RpbA protein to the RNAP core (552, 1082, 1173). DsbA protein is thought to also affect transcription from some late promoters (995), although it is not essential (1114).
At least three T4 proteins (Mrh, Srd, and Srh) are implicated in the interactions of different host sigma factors with core RNAP (781). Under heat shock conditions, the host σ32 (RpoH) competes with other sigma factors for host core RNAP (354, 482). The products of the two nonessential genes mrh and srh together modulate the phosphorylation of σ32 using ATP (781; Mosig, unpublished). Presumably, this would be most important for T4 late transcription, since T4 σ55 is one of the weakest known sigma factors. Consistent with this idea, infection with wild-type T4 of one specific host rpoH mutant (but not others) is aborted at the onset of late transcription, unless the T4 mrh gene is deleted (290). Srh protein resembles a segment of σ32 that interacts with RNAP, suggesting that it acts as a decoy. Similarly, T4 Srd protein resembles an RNAP-interacting segment of σ70 and σ38 (RpoS; stationary-phase and oxidative stress sigma factor) and would also decoy RNAP from the host promoters. Expression of srd from a clone is lethal to E. coli.
Features of the predicted intrinsic transcription terminators in the T4 genome are summarized in Table 4, and their genome positions are noted in Table 1. Overall, 34 terminators were located between genes or at the 3' end of an ORF; 24 of these are predicted to be on early transcripts (therefore, their sequence corresponds to the minus strand of the T4 GenBank entry), while 10 are on late transcripts. The predominant tetraloop sequence is UUCG, found in 18 of these terminators, while 3 are GAAA and 3 are GCAA. All are about equally present on early and late transcripts. The remaining 10 transcription terminators have noncanonical 4-nucleotide loop sequences or have 3-, 5- or 6-base loop regions. Their features and locations suggest that they, too, are probably functional.
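The features that define these predicted terminators (a base-paired stem, one of the common tetraloops, and a downstream U-rich tract) can be illustrated with a deliberately simple scan. This is a sketch under stated assumptions, not the prediction programs used for Table 4: the fixed stem length, the tail length, the minimum U count, and the function name are all hypothetical, and no folding energies are computed.

```python
# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
TETRALOOPS = ("UUCG", "GAAA", "GCAA")


def find_hairpin_terminators(rna, stem_len=5, tail_len=8, min_u=5):
    """Return (hairpin_start, loop, tail) for stem-tetraloop-stem hairpins
    that are followed by a U-rich tail. DNA input (T) is converted to RNA (U)."""
    rna = rna.upper().replace("T", "U")
    hits = []
    for loop in TETRALOOPS:
        start = rna.find(loop)
        while start != -1:
            left = rna[start - stem_len:start] if start >= stem_len else ""
            right = rna[start + 4:start + 4 + stem_len]
            tail_start = start + 4 + stem_len
            tail = rna[tail_start:tail_start + tail_len]
            # The 5' arm pairs with the 3' arm read in reverse order.
            paired = (
                len(left) == stem_len
                and len(right) == stem_len
                and all((left[i], right[stem_len - 1 - i]) in PAIRS
                        for i in range(stem_len))
            )
            if paired and tail.count("U") >= min_u:
                hits.append((start - stem_len, loop, tail))
            start = rna.find(loop, start + 1)
    return sorted(hits)


# Hypothetical toy transcript (invented for illustration).
toy = "AAGGCGCUUCGGCGCCUUUUUUUUAA"
print(find_hairpin_terminators(toy))
```

A real search would allow variable stem lengths, bulges, and the 3-, 5-, and 6-base loops mentioned above, and would rank candidates by predicted stability rather than accept or reject them on fixed thresholds.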
Seven regions were identified by the programs described in the preceding section as possible transcription termination sites, although they showed unusual attributes with respect to their location and the 3' U-rich region. Some are located wholly within coding regions (e.g., position 81769).
Overall, the predicted T4 intrinsic terminators generally appear to both define the 3' ends of multicistronic mRNAs and affect the dynamics of transcription complexes advancing on opposing DNA strands.
Rho-dependent transcription terminators. In enteric bacteria, the RNA-binding protein Rho modulates transcription termination at sites that are distinguished from intrinsic terminators by the absence of both the stable RNA hairpin and the 3' U-rich region. Rho utilization sequences (rut) in RNA generally are C-rich, have small amounts of G, and can be as long as 85 nucleotides (929). In addition, rut sites can be 150 to 200 bp 5' of the actual transcription termination site and therefore appear to function as locations for entry of Rho on transcribed RNA. Some of the better studied Rho-dependent termination sites (i.e., lambda tR1 and E. coli tnp) are regulated by antitermination which also involves host Nus proteins, lambda N protein, and the RNA sequence of the boxA and boxB regions (344, 553, 929). Together, these complex features have made computational methods for identifying Rho-dependent termination sites problematic relative to the easily defined intrinsic terminators.
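The compositional signature described above (C-rich, G-poor RNA) can at least be flagged by a sliding-window scan, even though that alone does not identify a functional rut site. The sketch below is illustrative only: the window size, step, and both thresholds are arbitrary assumptions chosen for the toy example, and the function name is hypothetical.

```python
def rut_like_windows(rna, window=60, step=10, min_c=0.30, max_g=0.15):
    """Return (start, C_fraction, G_fraction) for windows that are C-rich and G-poor.

    Thresholds are illustrative assumptions, not values taken from the literature.
    """
    rna = rna.upper()
    hits = []
    for start in range(0, len(rna) - window + 1, step):
        segment = rna[start:start + window]
        c_frac = segment.count("C") / window
        g_frac = segment.count("G") / window
        if c_frac >= min_c and g_frac <= max_g:
            hits.append((start, round(c_frac, 3), round(g_frac, 3)))
    return hits


# Hypothetical toy transcript: a C-rich, G-free stretch flanked by G runs.
toy = "G" * 20 + "CCAUCCAUCU" * 6 + "G" * 20
print(rut_like_windows(toy))
```

Because rut sites can lie 150 to 200 bp upstream of the termination point and lack a fixed consensus, any window flagged this way is only a candidate to be checked against transcript mapping data.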
Rho-dependent transcription termination sites in T4 have not been extensively characterized; little additional work has been done since the review by Stitt and Hinton (1043). One of the better candidate Rho terminators, or a 3' end of the RNA that is indirectly influenced by a rho mutation, lies between genes uvsX and 40 (416). Readthrough transcription from uvsX into 40 (and on through the helicase gene 41) is diminished by the Rho mutant rho026 (1044). In addition, the low level of readthrough transcripts is elevated in goF (comCα) mutants, probably by better protection against RNases (416, 1043). The Rop protein of ColE1-derived plasmids has a stabilizing effect similar to that of goF mutations (1028). As mentioned above, the uvsX-40 site (position 22347) is characterized by a stable tetraloop hairpin that is not followed by the typical U-rich sequence (Table 4). However, the rut-like C-rich region is part of a hairpin, which is not characteristic of other rut sites, and there is not an apparent nearby boxA sequence. Nonetheless, the available evidence points to this region as a likely Rho-dependent termination region. Similar properties are predicted for the putative rIIB-denB.1 terminator at position 167967. These RNA structures may help direct Rho-dependent termination.
Other sites in the T4 genome that have rut- and boxA-like sequences, and that therefore may be affected by Rho, occur at the end of the tRNA cluster (after RNA C at position 70742), in the region between genes repEB and repEA (position 78810), and between the late promoter at position 77490 and gene 5. The last two potential sites are near the oriE origin of DNA replication (1109; A. Harvey, R. Vaiskunaite, and G. Mosig, unpublished data) (see below). Other rut- and boxA-like sequences can be identified in the T4 genome, but their significance, as well as the overall role of Rho-dependent termination in the T4 developmental cycle, requires further study. Mutations in the gene goF (comC) have been repeatedly isolated as suppressors of host mutations that affect T4 transcription termination; the GoF protein, which stabilizes residual long transcripts produced in the rho026 mutant host, does not show overall similarity to other proteins in the genome databases (171, 956, 1043). However, the short acidic region between residues 87 and 111 is similar to sequences in other RNA-binding proteins and ATP-dependent RNA helicases (Miller, unpublished).
TRANSLATION AND POSTTRANSCRIPTIONAL CONTROL
Aligned T4 RBS sequences can be collectively viewed in a sequence logo (966), although the variable spacing between the SD sequence and the AUG initiation codon presents a particular challenge. Figure 6A shows the logo aligned at the AUG. Because of the variable spacing between the SD sequence and the initiation codon, only a minor peak for the SD is observed, in the -8 to -9 region. Alignment of the SD sequence alone, independent of the AUG (Fig. 6B), clearly illustrates the importance of the SD sequence. The Rs (defined above) of T4 RBS sequences, calculated over the optimally aligned region from -15 to +14 (Fig. 6), is 14.3 bits, which is higher than the Rs calculated for E. coli RBS sequences (8.9 bits [994]). However, a refined "flexible" model of the E. coli RBS appears to account more accurately for the variable spacing between the SD sequence and AUG (994). In effect, subtracting the uncertainty of the variable SD-AUG spacing lowers the total Rs; thus, the true Rs for T4 ribosome-binding sites is likely to be slightly lower than the 14.3 bits currently calculated (Miller et al., unpublished). Overall, the strength of the T4 RBS would in part account for the observed redirection of ribosomes from host to phage mRNAs.
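To make the meaning of an Rs value in bits concrete, the sketch below computes per-position information content for a set of aligned sites and sums it over the alignment. It is a simplified illustration under stated assumptions: all four bases are taken as equally likely in the background (2 bits maximum per position), no small-sample correction is applied, and the variable SD-AUG spacing handled by the "flexible" model is not treated. The function name and the toy sites are hypothetical; this is not the calculation used to obtain the 14.3-bit value.

```python
import math
from collections import Counter


def information_content(aligned_sites):
    """Return (per_position_bits, total_bits) for equal-length aligned sites.

    Assumes a uniform background over A, C, G, and U (2 bits maximum per
    position) and applies no small-sample correction.
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    per_position = []
    for column in zip(*aligned_sites):
        counts = Counter(column)
        n = len(column)
        # Shannon entropy (bits) of the base distribution at this position
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        per_position.append(2.0 - entropy)
    return per_position, sum(per_position)


if __name__ == "__main__":
    # Toy aligned sites (not T4 sequences): a variable base followed by AUG.
    sites = ["AAUG", "CAUG", "GAUG", "UAUG"]
    bits, total = information_content(sites)
    print([round(b, 2) for b in bits], round(total, 2))
```

A fully conserved position contributes 2 bits and a position with all four bases at equal frequency contributes none, which is why misaligning a conserved element such as the SD sequence flattens its peak in an AUG-anchored logo.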
The transition from host to phage protein synthesis may also involve changes that T4 reportedly makes in proteins of the translation apparatus, including IF3 alteration, release of S1 from ribosomes, and synthesis of new ribosome-binding proteins (601, 1166). These modifications to the translation initiation apparatus could have major effects on the initiation efficiency of either phage or host mRNAs. Unfortunately, most of the genes responsible for these changes have not been identified. ModB ADP-ribosylates the S1 protein, elongation factor EF-Tu, and the chaperone "trigger factor" (205), and thus these changes may be important for diminished translation of host mRNAs or may have a direct impact on the translation of phage mRNAs.
RNA structures can also have the opposite effect. Several T4 mRNAs fold into intramolecular RNA helices that inhibit ribosome binding and translation (736). Usually this is observed with mRNAs that are transcribed from early promoters and extend downstream into a late gene. The longer early transcript forms an RNA helix that sequesters the late-gene RBS (such as in the mRNAs for genes e, soc, I-TevI, and 49). Late promoters, located immediately upstream of the late-gene RBS, lack the 5' region of the helix and present RBS sequences that are accessible for translation initiation. As mentioned below, for gene 49, the intramolecular helix at the first RBS promotes use of the internal RBS for gp49'.
The rare AUU initiation codon used by gene 26' yields a protein, initiated at codon 114, that is only 95 residues long compared to the full-length gp26, which is 208 residues long (823). The function of gp26' is unknown.
ORF 30.3' is the one example in T4 of a coding region that is translated in the +1 reading frame entirely within another gene (30.3). Translation of the two overlapping ORFs has been confirmed, with the internal RBS of 30.3' resembling other T4 RBS sequences (1234).