Department of Microbiology and Cell Science, University of Florida, Gainesville, Florida 32611 (1); BioScience Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87544 (2); Department of Chemistry, City College of New York, New York, New York 10031 (3)
SUMMARY INTRODUCTION Biochemical Pathway of Tryptophan Biosynthesis Nomenclature. Seven catalytic domains and two α/β-subunit complexes. Relatives of Trp pathway catalytic domains. Identical Trp pathways exist within varied metabolic contexts. Operon Stability trp Operon and Its Regulation Known regulatory mechanisms. Unknown regulatory systems awaiting discovery? Feasibility for Deduction of Evolutionary Histories GENOMIC DISTRIBUTION OF THE TRYPTOPHAN PATHWAY Mapping of trp Gene Patterns to the 16S rRNA Tree Trp Biosynthesis in Its Larger Context of Aromatic Biosynthesis Implications of Missing Genes Unidentified analogue genes. Alternative metabolic relationships. Reductive evolution. Search for an Elusive trpC Gene in Actinomycete Bacteria One actinomycete exception is explained by LGT. Post-LGT events of vertical descent can be tracked in C. diphtheriae. Pattern and profile search. Evaluation of an unknown gene inserted in the trp operon. Possible catalysis of the TrpC reaction by HisA. Evolution of competence for TrpC catalysis by TrpD. Other possibilities. 
GENE FUSIONS Phylogenetic Distribution of trp Gene Fusions Nested Gene Fusions Trp PATHWAY GENE ORGANIZATION IN THE ARCHAEA Trp PATHWAY GENE ORGANIZATION IN THE BACTERIA Whole-Pathway trp Operons Dispersal of trp Operon Genes Gene Scrambling RETENTION OF THE ANCESTRAL OPERON AT SPACED PHYLOGENETIC NODES IN BACTERIA TWO MAJOR EVENTS UNDERLIE THE DYNAMICS OF trp OPERON CHANGE IN BACTERIA Operon Scission Yields Two Half-Pathway Operons Fusion of trpD with trpC Restores a Whole-Pathway Operon LATERAL GENE TRANSFER OF trp OPERONS Lateral Gene Transfer of Whole-Pathway Operons Lateral Gene Transfer of Partial-Pathway trp Operons FINE-TUNED EVOLUTIONARY DEDUCTIONS Single Change in a Common Ancestor versus Multiple Independent Changes in Descendants Distinguishing Derived States from Ancestral States Deducing Ancestral Character States at Phylogenetic Node Positions Value of Flanking-Gene Context EXPANDED METABOLIC CONTEXT Pyrococcus and Its Archaeal Relatives Convergent trp and giant aro operons of Pyrococcus. Dynamics of archaeal gene shuffling. Bacillus/Staphylococcus Clade B. subtilis subgroup. Listeria subgroup. Interconnectivity of the trp, aro, pab, and his operons. Evolutionary information derived from flanking-gene context. Deducing the likely common ancestor of the clade. OVERVIEW PERSPECTIVES Lineage-Specific Evolutionary Trends Individual Divergences Unmasked in the Larger Genomic Context Analysis of the Ancestral State at Phylogenetic Nodes Intellectual Dilemma Addressed Does trp gene reorganization necessarily imply functional deterioration? Are there any clear examples of efficient operon systems that have been disrupted? Elaborate regulation seems to be fairly recent. Regulation extending beyond the Trp pathway. Does Regulation Power Evolutionary Dynamics? 
FUTURE PROSPECTS FOR ELEVATED KNOWLEDGE OF Trp PATHWAY EVOLUTION APPENDIX Analysis of Raw DNA Sequence Data 16S rRNA Tree Construction DNA Composition Fusion Protein and Linker Region Analyses ACKNOWLEDGMENTS REFERENCES
| SUMMARY |
|---|
| INTRODUCTION |
|---|
An ideal operon system for this analysis is the trp operon. We show that the trp operon must have been present in early prokaryote ancestors. In Bacteria but not in Archaea, sufficient genome representation exists to deduce an ancestral whole-pathway trp operon. The regulation of this operon may initially have been quite minimal since the first evolutionary step(s) probably would be to collect the structural genes together. Parsimony principles support a hypothesis developed in this paper of two major evolutionary events in Bacteria, one splitting the ancestral operon in two and the other rejoining it by gene fusion. We assert that a detailed analysis can recognize occasional events of lateral gene transfer (LGT) or paralogy. Both are likely to be associated with Trp pathway genes engaged in specialized metabolic pathways other than primary amino acid biosynthesis. We show that when two sister lineages differ in particular trp operon characteristics, it is possible to deduce which is the derived change and which reflects the state of the ancestral node.
Recently, Gogarten et al. (28) endorsed a "synthesis" that will acknowledge both the traditional tree-like behavior (vertical descent of genes) and web-like, reticulate behavior (horizontal gene transfer) of the evolutionary process. They leave it open whether or not "vertical descent remains the best descriptor of the history of most genes over evolutionary time." Our overall analysis yields a very optimistic viewpoint that the evolution of the trp operon can be deduced as a vertical genealogy, with events of LGT and paralogy enriching the analysis as interesting features rather than undermining or obliterating the vertical trace of evolutionary history.
(TrpAa and TrpEa are α-subunits for anthranilate synthase and tryptophan synthase, respectively; TrpAb and TrpEb are β-subunits for anthranilate synthase and tryptophan synthase, respectively). Capital letters are assigned according to the order of the enzyme reactions (or overall reactions, in the case of the two complexes). C. Yanofsky has expressed to us his preference (probably shared by most experimentalists working specifically with trp systems) for adherence to previous nomenclature schemes to minimize disruption of what is most familiar in the existing literature. Admittedly, the designations generally in use for the Trp branch do not generate as many problems of annotation errors as is the case for the rest of the aromatic pathway, but for consistency with our overall work with the aromatic pathway, we use the new naming system in this paper. Both sets of designations are shown in Table 1.
Seven catalytic domains and two α/β-subunit complexes.
Trp is one of the essential amino acids in mammals. Trp is generally synthesized by free-living prokaryotes, lower eukaryotes, and higher plants. The Trp pathway is one of three amino acid branches diverging from a common flow route that produces chorismate. The apparently universal pathway for Trp biosynthesis, which initiates with chorismate and L-glutamine, is shown in Fig. 1. Seven catalytic domains are deployed to carry out the reactions shown. In a given organism these may be individually expressed, but a wide variety of gene fusions that encode single proteins carrying two or more catalytic domains are known. TrpAa can function as an ammonia-utilizing aminase in the anthranilate synthase reaction. Although the aminase reaction can proceed with ammonia at unphysiologically high pH values, such reactions typically rely upon a glutamine-utilizing glutaminase subunit to deliver the ammonia at the active site (probably within a "tunnel"). Accordingly, TrpAb is a glutaminase homologue that forms a complex with TrpAa, thereby conferring an amidotransferase component to the overall anthranilate synthase reaction in the presence of glutamine. Whether or not the overall anthranilate synthase reaction is carried out in the presence of TrpAb, 2-amino-2-deoxyisochorismate (ADIC) is an enzyme-bound intermediate. Interestingly, some species of Pseudomonas and Streptomyces produce an enzyme called PhzE (59), which carries out the ADIC synthase reaction but not the ADIC lyase reaction (see Fig. 1). ADIC is then converted ultimately to phenazine pigments. PhzE is a fusion of domains homologous to TrpAa and TrpAb (hence our designation TrpAaTrpAb_phz in Table 1). 
TrpAa belongs to a protein superfamily that includes other chorismate-utilizing enzymes: PabAa converts chorismate to 4-amino-4-deoxychorismate (precursor of 4-aminobenzoate), and MenF and EntC are different homologue subgroups that convert chorismate to isochorismate (as precursors of ubiquinones and an iron siderophore, respectively).
Tryptophan synthase also exists as a complex of nonidentical subunits and is one of the best-understood examples of allosteric interaction exerted between subunits (97). Why indole should be sequestered to a tunnel in the α/β complex of tryptophan synthase is not known, but indole is volatile and rather toxic. Yanofsky has speculated that recent findings of a role for indole in quorum sensing and biofilm formation might suggest that indole either produced by tryptophanase or otherwise available in the environment may serve as a metabolite cue that might otherwise be disrupted if biosynthetic indole were not enzyme-bound (see reference 96 and references therein). It has been speculated (92) that some Archaea may not form a tryptophan synthase complex.
Relatives of Trp pathway catalytic domains. The pathway of Trp biosynthesis is the first amino acid pathway for which the atomic structure of every catalytic domain has been determined (58), a significant circumstance because evolutionary analysis can be greatly enhanced through insight gained at the structural level of protein folding. The reference by Yanofsky et al. (97) is highly recommended for a definitive presentation of the detailed literature up to about 2000. Each catalytic domain belongs to a protein superfamily at the structural level of protein folding. Many of the catalytic domains exhibit clear homology, on the criterion of amino acid identity, with proteins that have different substrate specificities and participate in different pathways. From an evolutionary perspective, this raises three questions: to what extent the Trp pathway enzymes have been assembled (via gene duplication and substrate alteration) by recruitment of homologues from other pathways; to what extent the Trp pathway has been the source of genes recruited for function in other pathways; and to what extent a homologous gene with a recent history of function in another pathway has "crossed over" to replace a Trp pathway gene (or vice versa). This aspect is not addressed further in this article except indirectly (e.g., see the later section on the search for an elusive trpC gene).
Identical Trp pathways exist within varied metabolic contexts. The Trp pathway is generally defined as an unbranched pathway that begins with chorismate and produces Trp as a substrate for general protein synthesis. The Trp pathway appears to have evolved only once. These aspects of universality are favorable for the task of deducing the evolutionary history. However, many aspects of biochemical individuality are not usually considered. In some cases, Trp biosynthesis does not compete with Phe and/or Tyr biosynthesis because one or both of these are absent. In other cases, as exemplified by the use of ADIC for phenazine biosynthesis in Pseudomonas and Streptomyces species, chorismate is no longer the last branch point, and if one starts with chorismate as a reference point, then the pathway is branched. The pathway does not necessarily end exclusively with the Trp end product supplying protein synthesis, e.g., in cases where Trp may be a component of an antibiotic (as in Streptomyces), or where it is converted to indoleacetic acid in plant symbionts such as Azospirillum. Eukaryotes (but no prokaryotes so far) deploy Trp as a precursor of niacin. In such cases, the pathway can be considered divergently branched at the end, with Trp being guided to different molecular fates.
The Trp pathway is the most biochemically expensive of the amino acid pathways, requiring the input of erythrose-4-phosphate, ATP, phosphoribosyl pyrophosphate (PRPP), two phosphoenolpyruvate molecules, L-glutamine, and L-serine. Thus, efficient regulation is generally expected, but this expectation no longer applies in an endosymbiont such as Buchnera, which has abandoned Trp regulation. In this case, loss of regulation can be viewed as a positive selective step in order to satisfy the needs of its aphid host. In addition, some prokaryotes sustain different physiological or developmental states where the demands impacting the Trp pathway may be more complicated than just sensing the availability of Trp for protein synthesis. These often involve specialized pathways that coexist with primary Trp biosynthesis. These specialized pathways are encoded in part or entirely by divergent trp gene duplicates whose expression is triggered by a variety of temporal and environmental cues, e.g., to make a given pigment or antibiotic derived in part from the Trp pathway.
These are all interesting but complicating elements that we have tried to keep in mind. This is relevant to the task of sorting out and recognizing paralogues (or xenologues) that may be engaged in specialist pathways other than primary Trp biosynthesis. Appreciation of such complexity may also prove relevant to understanding the nature of split-pathway trp operons in many prokaryotes.
The Itoh et al. study (37) was a broad-scope analysis of many operons that was necessarily limited with respect to in-depth consideration of any individual operon system. It should be noted that for these kinds of studies, operons have been considered simply as a collection of structural genes that are linked. The presence or absence of linked or unlinked regulatory elements has not usually been evaluated, undoubtedly because this is not easily done. In this paper we pursue in great detail the evolution of a single well-known operon system in the large number of prokaryote genomes now available. We found strong support for the hypothesis that the trp operon, minimally defined as the linked assemblage of structural genes for tryptophan (Trp) biosynthesis, is of ancient origin and has indeed followed a dynamic time course of change that includes several identifiable milestone events in Bacteria. Our study leads to the further hypothesis that the instability of early trp operons (and perhaps some modern ones) can be attributed to weak positive selection conferred by relatively undeveloped control mechanisms.
We suggest that since the time that operons evolved a variety of control mechanisms, the characterization of operons as dynamic (rather than unstable) yields better semantics to describe a positive ongoing process of fine-tuning. In modern free-living organisms, the variety of recently evolved trp operon systems which differ from one another and are endowed with intricate control features mediated by one or more unlinked regulatory genes may in fact be highly stable in the contemporary time frame. One caveat, however, is that this frequently will not apply to pathogenic or endosymbiotic relatives, where the rules dictating selective advantage have completely changed.
Known regulatory mechanisms. At the bioinformatic level, the analysis of trp operons in the literature has been largely restricted to the structural genes. Consideration of regulatory features has been understandably limited, mainly because relatively little comparative information is available at the experimental level and also because analysis of alternative stem-loop structures, etc., is not a trivial task. Escherichia coli, Bacillus subtilis, Pseudomonas aeruginosa, and Lactococcus lactis represent clades for which detailed control mechanisms have been described, each of them entirely different. Importantly, each mechanism seems to be narrowly distributed, and therefore we infer that they are of recent origin. Note that in each case, unlinked genes exist that markedly decrease the probability that the total regulated operon system could be transferred by LGT in one event.
Regulation of Trp biosynthesis in E. coli, the most widely known system, is quite sophisticated (23, 94), being subject to the following multiple levels of control: (i) repression control via the Trp repressor (encoded by the unlinked trpR) which binds Trp as a corepressor moiety, (ii) an attenuation mechanism mediated by a Trp-rich leader peptide (encoded by trpL), and (iii) allosteric feedback inhibition of anthranilate synthase by Trp (95). The E. coli mechanisms of overall trp operon regulation are generally shared by the enteric lineage of Bacteria, defined by us as the clade that includes Shewanella putrefaciens as the outlying point of divergence from E. coli.
Bacillus subtilis has a different system of trp operon regulation (72, 80, 95, 96), whereby genes unlinked to the trp operon encode (i) a trp RNA-binding attenuation protein (TRAP) encoded by mtrB as well as (ii) an anti-TRAP gene product encoded by rtpA (80). Trp both feedback inhibits anthranilate synthase and activates TRAP for attenuator function, whereas uncharged tRNATrp induces synthesis of anti-TRAP. TRAP can also block translation of the trp operon through interference with the ribosome-binding site. The clade sharing the TRAP system of regulation includes Bacillus halodurans, Bacillus stearothermophilus, and Oceanobacillus iheyensis in addition to Bacillus subtilis. At this time it is not clear whether the anti-TRAP component is present throughout this clade.
A third finely tuned system of regulation has been documented in Lactococcus lactis (69). In this case uncharged tRNA can bind directly to the leader transcript, stabilizing an antiterminator configuration that promotes expression of the operonic genes. In Lactococcus lactis, unlinked, unknown genes involved in trp operon transcript processing and in transcription initiation have been suggested (69). The presence or absence of the Lactococcus lactis mode of trp operon regulation in close relatives, such as species of Streptococcus, has apparently not yet been investigated.
In Pseudomonas aeruginosa, the fourth well-documented system, the Trp pathway is represented by four operon entities: a free-standing trpAa, the trpAbBD operon, a free-standing trpC, and the trpEbtrpEa operon. The trpAa and trpAbBD operons are regulated by attenuation mechanisms employing leader peptides (67), whereas the trpEbtrpEa operon is controlled by an indoleglycerol phosphate-activated regulatory protein encoded by trpI (6). trpC is not known to be regulated in any way. The P. aeruginosa system is complicated by the presence of paralogues of trpAa and trpAb. These include genes of unknown physiological function (also known as phnA and phnB) expressed in stationary phase (57) as well as two copies of PhzE (trpAatrpAb_phz), a gene that encodes ADIC synthase (Fig. 1), the initial reaction committed to phenazine biosynthesis. It is not entirely clear what physiological conditions exist in P. aeruginosa (and close relatives) that have resulted in its unusual use of indoleglycerol phosphate as a regulatory cue for the selective regulation of the trpEbtrpEa operon, but it is certainly evident that much has been committed to the overall regulation in this system. Close genomic neighbors of P. aeruginosa that possess identical split-pathway trp operons and trpI include Pseudomonas fluorescens, Pseudomonas syringae, and Azotobacter vinelandii.
Unknown regulatory systems awaiting discovery? We do not know the extent to which the total network of regulatory elements governing the single trp operons in the E. coli, B. subtilis, and L. lactis clades or the multiple split-pathway operons of the P. aeruginosa clade might be more elaborate than that of most other organisms. Different lifestyles undoubtedly select mechanisms accommodating varied ranges of control responsiveness. A simple mode of Trp regulation may very well be appropriate in a cyanobacterium but not E. coli. A variety of alternative regulatory systems in other modern lineages probably remain to be elucidated. Transcriptional regulation has been reported in the whole-pathway operons of Methanobacterium thermoautotrophicum (26) and Pyrococcus kodakaraensis (77), but the exact mechanisms are unknown. The split-pathway operons of the clade represented by Rhizobium meliloti (7) and Azospirillum brasilense (21) exhibit an attenuation mechanism involving a Trp-rich leader peptide, upstream of the trpAatrpAb fusion, but no regulation of the remaining two partial-pathway operons is known. Physically separated split-pathway trp operons may be of positive selective value per se for presently unknown reasons, whereby it might be of value to discoordinate the expression of some trp genes from others, or they may simply be the outcome of initially disrupted whole-pathway operons that subsequently recruited a refined control mechanism accommodating the gene separations.
As a first step toward deducing the evolutionary history of overall aromatic biosynthesis, we selected the Trp branch as a challenging but manageable metabolic segment for initial analysis. Trp pathway genes have sometimes been recruited for function in specialized biochemical pathways, and ancient paralogues or xenologues may coexist with the Trp pathway genes that are engaged in primary biosynthesis. We have shown (93) that detailed case-by-case analysis can distinguish ancient trp paralogues (or xenologues) from their homologues engaged in primary Trp biosynthesis. A comparable study in the literature produced a detailed analysis of homologues of ornithine carbamoyltransferase in which the challenges to tracking a vertical path of evolutionary descent that are caused by the complexities of xenology and ancient paralogy were sorted out (73). This study was preceded by an analysis (49) showing that ornithine carbamoyltransferases in turn belong to a larger protein family in which the ornithine and aspartate carbamoyltransferases are very ancient paralogues. The conclusions of such comprehensive studies are consistent with the contentions of Glansdorff (27) and Woese (87) that complications of ancient paralogy, ancient analogy, and lateral gene transfer can be recognized sufficiently well to allow the events of vertical ancestry to be tracked.
Here we present results from an in-depth, manual analysis of Trp pathway genes in over 100 genomes. A limited amount of information is also given to illustrate the very important perspective that the evolutionary relationships of Trp biosynthesis will ultimately be best understood in its larger context as one branch of a highly divergent pathway responsible for the biosynthesis of aromatic amino acids as well as many other important metabolites.
| GENOMIC DISTRIBUTION OF THE TRYPTOPHAN PATHWAY |
|---|
The multipurpose Fig. 2 provides a summary of the presence or absence of Trp pathway genes in the larger context of the presence or absence of genes specifying the common aromatic trunk and the sister phenylalanine and tyrosine branches. The circles in Fig. 2 from left to right represent catalytic domains (specified at the bottom of each panel) corresponding to the seven common-pathway steps (aroA through aroG), chorismate mutase (aroQ or aroH) (which is common to the short Phe and Tyr branches), and the seven catalytic domains of the Trp pathway (Fig. 1 and Table 1).
The key enzyme of Phe biosynthesis is PheA, and the key enzyme of Tyr biosynthesis is TyrA. The Phe and Tyr branches each utilize an aminotransferase step, not shown as a circle because of bioinformatic difficulties associated with deducing the substrate specificity of multiple and ubiquitous broad-specificity aminotransferases (42). Most intermediary metabolites of aromatic biosynthesis are not likely to be available from the environment; only quinate, shikimate, and anthranilate, all abundant in nature (10), are feasible precursors of Trp. Although these metabolites are indeed readily utilized when available, no prokaryotes have yet been found to rely on an exogenous source of quinate, shikimate, or anthranilate as exclusive and obligatory beginning precursors. One interesting special-case exception is Chlamydophila psittaci, an obligate intracellular parasite that utilizes host-derived anthranilate as a required Trp precursor (89).
Alternative metabolic relationships.
In contrast to the apparent universality of the specific Trp branch, alternative enzyme steps appear to exist in nature for the Phe and Tyr branches as well as for the common trunk of aromatic biosynthesis. Some Archaea (Fig. 2A) and two widely spaced members of the Bacteria (Aquifex and Desulfovibrio, Fig. 2A and 2D) lack both AroA and AroB. Transketolase (Trk), required for generation of a substrate for AroA, is also shown in Fig. 2A because most (but not all) organisms that lack AroA and AroB also lack transketolase. (Desulfovibrio vulgaris [Fig. 2D] does have transketolase.) In the last six organisms, dehydroquinate, the substrate of AroC, presumably connects with carbohydrate metabolism in some unknown way that does not involve AroB or any of the known AroA homology groupings AroAIα, AroAIβ, or AroAII (31, 44, 76). Some support for this putative alternative metabolic connection, based on tracer methodology, exists in the literature (79). It is also possible that quinate, either from the environment or arising endogenously in some unknown way, could be the source of dehydroquinate via the action of a quinate dehydrogenase.
Although species of Chlamydophila and Chlamydia are very close phylogenetically, the presence of Trp pathway genes varies from complete absence in C. pneumoniae to almost all present in C. psittaci. It appears that the Trp pathway in C. trachomatis and C. muridarum is in a contemporary process of reductive evolution, and the few remaining genes may be remnants (25, 89). In contrast to these species, an "incomplete" trp operon in C. psittaci appears to play a role in the capture of host kynurenine derived from tryptophan (89). Although C. psittaci does lack trpAa and trpAb, the remaining five trp genes coexist in an operon into which two novel genes have been recruited. These encode kynureninase and PRPP synthase. This creates the ability to generate PRPP (needed for the TrpB step) and to intercept host kynurenine as a source of anthranilate, cycling host-catabolized Trp back to Trp in the intracellular parasite (89). Effectively, a host-pathogen metabolic mosaic has been created, and the variant operon generates a kynurenine-to-Trp flow route instead of the usual chorismate-to-Trp flow route.
As explained above, the absence of trpAa and trpAb in C. psittaci is by design, and the remaining Trp pathway is functional. The likelihood that aroA and aroB, which are absent in some organisms, will prove to reflect either a new metabolic connection or the existence of unknown analogue genes has already been mentioned. In a few cases tyrA or pheA was the only aromatic-pathway gene not found by homology search. The endosymbiont Buchnera (Fig. 2D), which lacks tyrA, may not need to synthesize tyrosine because the host has phenylalanine hydroxylase, which can convert phenylalanine to tyrosine. Aeropyrum pernix (Fig. 2A) and Helicobacter pylori (Fig. 2D), which both lack pheA, may very well possess arogenate dehydratase, an alternative pathway step for prephenate dehydratase (reference 39 and references therein). No gene encoding an arogenate dehydratase has yet been cloned and sequenced.
Reductive evolution. Reductive evolution is descriptive of the process in which pathogens or symbionts decrease genome size by abandoning genes that are needed by their free-living relatives but dispensable because of the availability of resources from a host or symbiont partner. The genus Pyrococcus exhibits marked variation in the capability for aromatic biosynthesis. Pyrococcus horikoshii has experienced total reductive evolution. Only TrpEb remains in P. horikoshii, and the case has been made that this may have some other function, such as serine deaminase activity (92). P. abyssi possesses genes encoding common-pathway and Trp pathway steps but lacks the Phe and Tyr branches. Although chorismate mutase (aroQ) is present, it could have some other substrate specificity (13). Since P. abyssi lacks the competing Phe and Tyr branches, an unusual metabolic circumstance exists in which the representation of tryptophan biosynthesis can be collapsed to that of a linear pathway of 12 overall steps (corresponding to the seven common-pathway steps followed by the five overall steps that are specifically dedicated to Trp biosynthesis). In contrast to the foregoing two differentially auxotrophic species of Pyrococcus, P. furiosus possesses a complete assemblage of aromatic-pathway genes.
Organisms that lack the entire branched system of aromatic amino acid biosynthesis include P. horikoshii (Fig. 2A), Ureaplasma urealyticum and Mycoplasma species (Fig. 2B), Borrelia burgdorferi and Treponema pallidum (Fig. 2C), and Rickettsia prowazekii and Wolbachia spp. (Fig. 2D). These whole-pathway reductive evolutions are generally associated with intracellular parasitism or endosymbiosis, and they imply auxotrophic dependence upon the host not only for all three aromatic amino acids but also for end products of the vitamin-like branches (e.g., folate, vitamin K, and ubiquinones) that derive from chorismate. In the Bacteria, some organisms possess an otherwise intact aromatic pathway but the Trp branch is uniquely absent. Among gram-positive bacteria (Fig. 2B), this includes Enterococcus faecalis and Clostridium difficile, and this pattern is also seen in the gram-negative Haemophilus ducreyi (Fig. 2D).
Interestingly, some organisms lack all three of the terminal aromatic amino acid branches but possess an intact common pathway to chorismate: Streptococcus pyogenes (Fig. 2B), Streptococcus equi (Fig. 2B), chlamydial species (Fig. 2C), Porphyromonas gingivalis (Fig. 2C), and Treponema denticola (Fig. 2C). The implication is that the remaining common pathway still links to one or more of the vitamin-like pathways. In the chlamydiae, we could not detect (by use of homology searching) a single gene encoding any known chorismate-utilizing enzyme. However, this could easily be accounted for by the existence of analogue genes that have not yet been identified. For example, E. coli chorismate lyase, which catalyzes the initial step of ubiquinone biosynthesis, is encoded by a gene (66) that is of very limited distribution. Therefore, elucidation of presently unknown analogue genes encoding chorismate lyase surely must be forthcoming.
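The coarse categories described above (entire aromatic system absent, Trp branch uniquely absent, common pathway retained without any terminal branch) can be expressed as a simple classification of a per-genome gene presence/absence profile. The following is only a minimal sketch under stated assumptions: the function name `classify`, the category labels, and the representation of a genome as a set of gene names are all hypothetical conveniences, chorismate mutase and the aminotransferase steps are ignored, and nothing here reproduces the manual analysis behind Fig. 2.

```python
# Gene names follow the nomenclature used in the text; the grouping into
# Python sets is an illustrative assumption, not the authors' data model.
COMMON = {"aroA", "aroB", "aroC", "aroD", "aroE", "aroF", "aroG"}
PHE = {"pheA"}
TYR = {"tyrA"}
TRP = {"trpAa", "trpAb", "trpB", "trpC", "trpD", "trpEa", "trpEb"}


def classify(genes):
    """Assign a coarse category to one genome's set of detected genes."""
    genes = set(genes)
    has_common = COMMON <= genes      # every common-pathway step was found
    n_trp = len(TRP & genes)          # number of Trp catalytic domains found
    has_phe = PHE <= genes
    has_tyr = TYR <= genes

    if not genes & (COMMON | PHE | TYR | TRP):
        return "entire aromatic system absent"
    if has_common and n_trp == 0 and not has_phe and not has_tyr:
        return "common pathway only"
    if has_common and n_trp == 0 and has_phe and has_tyr:
        return "Trp branch uniquely absent"
    if has_common and n_trp == len(TRP) and has_phe and has_tyr:
        return "complete"
    # Anything else (e.g. missing aroA/aroB, or a few remnant trp genes)
    # needs case-by-case inspection rather than an automatic label.
    return "partial (inspect manually)"
```

A profile that falls into the last category is exactly the kind of case the text treats individually, such as a genome lacking only pheA or one retaining a few trp genes as possible remnants.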
A comprehensive phylogenetic tree for TrpD proteins (data not shown) reveals that all of the TrpD proteins in Fig. 3 exhibit cohesive clustering and an order of branching that is congruent with the corresponding genome positions on the 16S rRNA phylogenetic tree except, of course, for the trpD domain of the trpDtrpC fusion protein in the two coryneform species. Thus, in C. diphtheriae and C. glutamicum, the free-standing trpD outside of the whole-pathway trp operon is more closely related to trpD inside the partial-pathway trp operons of all the other organisms. An inner-membrane protein of unknown function separating trpAa and trpD in all of the mycobacteria, encoded by chyp, also flanks the nonoperonic trpD of the two coryneform species. As expected for the suggested LGT scenario, trees of TrpAa, TrpEa, and TrpEb proteins that are encoded from the partial-pathway operons of mycobacterial species, Streptomyces, and Thermomonospora in Fig. 3 all cluster closely together with the exclusion of the corresponding LGT genes from the coryneform bacteria.
Post-LGT events of vertical descent can be tracked in C. diphtheriae. Since the time that an alien trpAa/trpAb/trpB/trpDtrpC/trpEb/trpEa operon displaced the trp genes present in the common ancestor of coryneform bacteria, leaving behind only chyp and trpD as remnants, subsequent vertical evolutionary events in the C. diphtheriae genome are apparent. Thus, an insertion containing panB and panC occurred recently between trpDtrpC and trpEb in the C. diphtheriae lineage after its divergence from C. glutamicum. In C. glutamicum, closely related panB and panC orthologues (encoding ketopantoate hydroxymethyltransferase and pantothenate synthetase) comprise a characterized operon of D-pantothenate biosynthesis that is located elsewhere in the genome (71). In C. diphtheriae, the translocation of panB and panC into the trp operon is associated with an inversion event between these two genes. Hence, the opposite transcriptional direction of the inserted panC has now isolated trpEb/trpEa from its former operonic transcriptional continuity, presumably forcing it to become a separate transcriptional unit. It is interesting that the otherwise alien operon of C. diphtheriae now contains the native genes panB and panC, transposed from the resident genome. C. diphtheriae has also produced a gene duplicate of the gene encoding the alien TrpEb, which has then become the proximal member of the operon. This paralogue TrpEb is probably deficient in complex formation with TrpEa, because conserved residue K-167 (Salmonella enterica serovar Typhimurium numbering), which forms a salt bridge with residue D-56 of TrpEa, has been changed to S-167 (85). Also, the highly conserved residue 162-G has been changed to a charged residue, 162-E. Thus, after the LGT event, several subsequent vertical events of evolution that occurred in C. diphtheriae but not in C. glutamicum can be tracked.
The following approaches were taken in an attempt to locate the missing trpC genes in the above-mentioned actinomycete organisms.
Pattern and profile search. TrpC is a short and relatively divergent sequence. Known TrpC homologues may have identities as low as 22%. In an initial Blast screening with E. coli TrpC as the query, for example, the Ferroplasma acidarmanus genome did not return any hits and appeared to lack TrpC by this criterion. However, the position of an unknown gene within the trp operon of F. acidarmanus strongly suggested that it is a divergent trpC gene, because it occupies the same relative position as trpC in two closely related Thermoplasma species. Indeed, its identity as trpC was amply confirmed by PSI-Blast at the second iteration (5), as well as by the observed conservation in multiple alignments of critical residues established by structural studies of TrpC from E. coli. In addition, the use of TrpC query sequences from most of the Archaea did return positive Blast hits from the F. acidarmanus genome.
With this background in mind, the genomes of T. fusca, S. coelicolor, and the mycobacteria M. avium, M. tuberculosis, and M. bovis were subjected to a pattern and profile search that included a ProSite-like pattern based upon critical residues reported in the PDB summary, the use of TrpC domains as query sequences that were available from the closest relatives of the group missing TrpC, and the generation of a hidden Markov model based on a multiple sequence alignment of known TrpC sequences. No illuminating results were obtained with this approach.
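As an illustration of the pattern component of such a search, the sketch below translates a ProSite-style pattern into a regular expression and scans a set of protein sequences. It is a minimal example under stated assumptions: the pattern and the sequences are invented for demonstration and are not the TrpC pattern used in this study, the function names are hypothetical, and only the basic ProSite syntax elements (x, [..], {..}, repeat counts, and the < and > anchors) are handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic ProSite-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchor at the N terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchor at the C terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":  # x matches any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def scan(pattern: str, sequences: dict) -> dict:
    """Return, per sequence, a list of (1-based start, matched text)."""
    regex = re.compile(prosite_to_regex(pattern))
    return {
        name: [(m.start() + 1, m.group()) for m in regex.finditer(seq)]
        for name, seq in sequences.items()
    }


# Hypothetical pattern and toy sequences, for illustration only.
toy_pattern = "[LIVM]-x(2)-G-{P}-x(2,4)-K"
toy_sequences = {"orf_1": "MAAILSGAGTTKLV", "orf_2": "MKKPLLPPGG"}
print(prosite_to_regex(toy_pattern))
print(scan(toy_pattern, toy_sequences))
```

A regular expression reports only exact matches to the pattern, which is one reason a search of this kind is combined with profile methods, such as the hidden Markov model mentioned above, when the target is highly divergent.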
Evaluation of an unknown gene inserted in the trp operon. M. tuberculosis has a conserved hypothetical gene (Rv1610) inserted between trpAa and trpD (denoted chyp in Fig. 3). The absence of trpC coupled with the insertion of this unexpected gene within the trp operon invited careful scrutiny. This was, in fact, reminiscent of the previously mentioned situation with the operonic trpC of F. acidarmanus, which initially eluded detection as trpC. However, critical residues expected of TrpC could not be matched to Rv1610 by manual alignment. Furthermore, Rv1610 appears to encode an inner-membrane protein with three transmembrane segments. In addition, if Rv1610 were, in fact, a divergent TrpC, we would expect to find homologues in T. fusca and S. coelicolor. We did not.
Possible catalysis of the TrpC reaction by HisA. TrpC catalyzes an intramolecular oxidoreduction (Amadori rearrangement) that parallels the isomerase reaction catalyzed by HisA. Both reactions involve isomerization of an identical phosphoribosyl moiety. TrpC and HisA each exhibit (βα)8 barrel structures. Jurgens et al. (46) in fact generated hisA mutants that could catalyze the TrpC reaction both in vivo and in vitro. One of these variants retained significant HisA activity. We therefore envisioned the possibility that an ancestor of the TrpC-deficient block of organisms might have duplicated hisA and recruited one copy to TrpC function. However, second copies of hisA were not found. We then further considered the possibility that HisA in these organisms might catalyze both reactions, since that potential had been established in vitro. However, the alignment of HisA sequences did not reveal any obvious variant residues common to the TrpC-deficient block of organisms that might suggest potential for TrpC activity.
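The search for variant residues common to one block of organisms can be expressed as a scan for alignment columns in which every member of a subgroup shares one residue that is absent from all other sequences. The Python sketch below shows one simple way to do this. It is not the procedure used in this study; the toy sequences, sequence names, and function name are all hypothetical.

```python
def subgroup_signature_columns(alignment: dict, subgroup: set) -> list:
    """Return (1-based column, subgroup residue, residues seen elsewhere)
    for columns where the subgroup is invariant and distinct from the rest."""
    inside = [seq for name, seq in alignment.items() if name in subgroup]
    outside = [seq for name, seq in alignment.items() if name not in subgroup]
    if not inside or not outside:
        raise ValueError("both the subgroup and its complement must be non-empty")
    length = len(inside[0])
    if any(len(seq) != length for seq in inside + outside):
        raise ValueError("all aligned sequences must have the same length")

    hits = []
    for col in range(length):
        in_residues = {seq[col] for seq in inside}
        out_residues = {seq[col] for seq in outside}
        invariant = len(in_residues) == 1 and "-" not in in_residues
        if invariant and not (in_residues & out_residues):
            hits.append((col + 1, next(iter(in_residues)),
                         "".join(sorted(out_residues))))
    return hits


# Toy alignment, invented for illustration.
toy_alignment = {
    "a1": "MKTAEGL",
    "a2": "MKSAEGL",
    "b1": "MKTAKGL",
    "b2": "MRTAKGI",
}
print(subgroup_signature_columns(toy_alignment, {"a1", "a2"}))
```

An empty result corresponds to the negative outcome reported above: no column separates the subgroup from the remaining sequences.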
Evolution of competence for TrpC catalysis by TrpD. Altamirano et al. (3) recently reported the evolution of TrpC activity from the α/β barrel scaffold of TrpD following in vitro mutagenesis and recombination. Thus, one might envision an event of trpD gene duplication followed by divergence of one of the paralogues to TrpC function. Although a gene duplicate of trpD was found in S. coelicolor, other organisms of the trpC-"deficient" block do not have a trpD gene duplicate. In consideration of the additional possibility that a modified trpD might encode an enzyme capable of both reactions, a careful comparison of the multiple alignment for trpD sequences failed to reveal a variant subgroup that might be expected of an evolved dual-function trpC/trpD protein. This is perhaps not surprising in view of the recent retraction (4) of the results of Altamirano et al. (3).
Other possibilities. Enzymes possessing triose phosphate isomerase (TIM) (βα)8 barrel-like folds are widespread and accommodate a particularly wide range of functions (15). Within this large grouping, TrpC, TrpD, TrpAa, and Rpe (D-ribulose 5-phosphate 3-epimerase) belong to the ribulose phosphate binding superfamily within the SCOP (structural classification of proteins) database (15, 86). Therefore, both TrpAa and Rpe were also evaluated as possible evolutionary sources of the missing TrpC, with the approaches described for HisA and TrpD. Suggestive evidence was not found.
The isomerase step catalyzed by TrpC is clearly a facile reaction, and although none of the foregoing possibilities produced the answer sought, they illustrate nicely the rationale and the sorts of in silico strategies for gene discovery that can be anticipated in the near future. Until the time that this article was under review, the identity of trpC in the organisms included in Fig. 3 had remained a mystery. However, convincing evidence has been obtained recently that the HisA isomerase in these organisms does in fact catalyze the isomerase reaction in both pathways (9). The gene name priA (phosphoribosyl isomerase A) has been suggested to accommodate its functional role in two pathways. Although this possibility was anticipated as outlined earlier, the natural bifunctional proteins of actinomycete bacteria did not resemble the variant obtained experimentally (46) in terms of amino acid sequence matches. Barona-Gómez and Hodgson (9) suggested that the bifunctional actinomycete isomerases represent an ancient evolutionary state that is in line with the recruitment hypothesis (38). If so, specialization in the gene duplicate that became trpC must have required more divergence than in the gene duplicate that became hisA, because the homology of PriA proteins with HisA is evident but that with TrpC proteins is not.
GENE FUSIONS
The ultimate analysis of the total inventory of fused genes in any given genome should provide an excellent phylogenetic tool for deducing the order of branching. This approach should be greatly enhanced by the rapid increase in the number of sequenced genomes coupled with the enormous advantage of being able to identify gene fusions with bioinformatic methods. However, it was not expected at the time that fusions could occur independently at such frequencies or that LGT should be taken seriously. Therefore, application of the approach of nested gene fusions will require sufficient background work to recognize and discriminate fusion clusters that have independent origins on the vertical tree as well as ones that might have been spread in the horizontal direction by LGT.
Trp PATHWAY GENE ORGANIZATION IN THE ARCHAEA
trpAa
trpAb and
trpEb_1
trpEa.
Usually, the pair of genes encoding the two subunits of tryptophan synthase are adjacent in prokaryotes. In the case of P. aerophilum, trpEa and trpEb_2 have been separated from one another within the operon. This may reflect the inability of trpEb_2 to form a complex with trpEa. In P. aerophil