Microbiology and Molecular Biology Reviews, September 2003, p. 303-342, Vol. 67, No. 3
1092-2172/03/$08.00+0     DOI: 10.1128/MMBR.67.3.303-342.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.

Ancient Origin of the Tryptophan Operon and the Dynamics of Evolutionary Change†

Gary Xie,¹,² Nemat O. Keyhani,¹* Carol A. Bonner,¹ and Roy A. Jensen¹,²,³*

Department of Microbiology and Cell Science, University of Florida, Gainesville, Florida 32611,¹ BioScience Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87544,² Department of Chemistry, City College of New York, New York, New York 10031³

SUMMARY
INTRODUCTION
    Biochemical Pathway of Tryptophan Biosynthesis
        Nomenclature.
        Seven catalytic domains and two α/β-subunit complexes.
        Relatives of Trp pathway catalytic domains.
        Identical Trp pathways exist within varied metabolic contexts.
    Operon Stability
    trp Operon and Its Regulation
        Known regulatory mechanisms.
        Unknown regulatory systems awaiting discovery?
    Feasibility for Deduction of Evolutionary Histories
GENOMIC DISTRIBUTION OF THE TRYPTOPHAN PATHWAY
    Mapping of trp Gene Patterns to the 16S rRNA Tree
    Trp Biosynthesis in Its Larger Context of Aromatic Biosynthesis
    Implications of Missing Genes
        Unidentified analogue genes.
        Alternative metabolic relationships.
        Reductive evolution.
    Search for an Elusive trpC Gene in Actinomycete Bacteria
        One actinomycete exception is explained by LGT.
        Post-LGT events of vertical descent can be tracked in C. diphtheriae.
        Pattern and profile search.
        Evaluation of an unknown gene inserted in the trp operon.
        Possible catalysis of the TrpC reaction by HisA.
        Evolution of competence for TrpC catalysis by TrpD.
        Other possibilities.
GENE FUSIONS
    Phylogenetic Distribution of trp Gene Fusions
    Nested Gene Fusions
Trp PATHWAY GENE ORGANIZATION IN THE ARCHAEA
Trp PATHWAY GENE ORGANIZATION IN THE BACTERIA
    Whole-Pathway trp Operons
    Dispersal of trp Operon Genes
    Gene Scrambling
RETENTION OF THE ANCESTRAL OPERON AT SPACED PHYLOGENETIC NODES IN BACTERIA
TWO MAJOR EVENTS UNDERLIE THE DYNAMICS OF trp OPERON CHANGE IN BACTERIA
    Operon Scission Yields Two Half-Pathway Operons
    Fusion of trpD with trpC Restores a Whole-Pathway Operon
LATERAL GENE TRANSFER OF trp OPERONS
    Lateral Gene Transfer of Whole-Pathway Operons
    Lateral Gene Transfer of Partial-Pathway trp Operons
FINE-TUNED EVOLUTIONARY DEDUCTIONS
    Single Change in a Common Ancestor versus Multiple Independent Changes in Descendants
    Distinguishing Derived States from Ancestral States
    Deducing Ancestral Character States at Phylogenetic Node Positions
    Value of Flanking-Gene Context
EXPANDED METABOLIC CONTEXT
    Pyrococcus and Its Archaeal Relatives
        Convergent trp and giant aro operons of Pyrococcus.
        Dynamics of archaeal gene shuffling.
    Bacillus/Staphylococcus Clade
        B. subtilis subgroup.
        Listeria subgroup.
        Interconnectivity of the trp, aro, pab, and his operons.
        Evolutionary information derived from flanking-gene context.
        Deducing the likely common ancestor of the clade.
OVERVIEW PERSPECTIVES
    Lineage-Specific Evolutionary Trends
    Individual Divergences Unmasked in the Larger Genomic Context
    Analysis of the Ancestral State at Phylogenetic Nodes
    Intellectual Dilemma Addressed
        Does trp gene reorganization necessarily imply functional deterioration?
        Are there any clear examples of efficient operon systems that have been disrupted?
        Elaborate regulation seems to be fairly recent.
        Regulation extending beyond the Trp pathway.
    Does Regulation Power Evolutionary Dynamics?
FUTURE PROSPECTS FOR ELEVATED KNOWLEDGE OF Trp PATHWAY EVOLUTION
APPENDIX
    Analysis of Raw DNA Sequence Data
    16S rRNA Tree Construction
    DNA Composition
    Fusion Protein and Linker Region Analyses
ACKNOWLEDGMENTS
REFERENCES

   SUMMARY
 
The seven conserved enzymatic domains required for tryptophan (Trp) biosynthesis are encoded in seven genetic regions that are organized differently (whole-pathway operons, multiple partial-pathway operons, and dispersed genes) in prokaryotes. A comparative bioinformatics evaluation of the conservation and organization of the genes of Trp biosynthesis in prokaryotic operons should serve as an excellent model for assessing the feasibility of predicting the evolutionary histories of genes and operons associated with other biochemical pathways. These comparisons should provide a better understanding of possible explanations for differences in operon organization in different organisms at a genomics level. These analyses may also permit identification of some of the prevailing forces that dictated specific gene rearrangements during the course of evolution. Operons concerned with Trp biosynthesis in prokaryotes have been in a dynamic state of flux. Analysis of closely related organisms among the Bacteria at various phylogenetic nodes reveals many examples of operon scission, gene dispersal, gene fusion, gene scrambling, and gene loss from which the direction of evolutionary events can be deduced. Two milestone evolutionary events have been mapped to the 16S rRNA tree of Bacteria, one splitting the operon in two, and the other rejoining it by gene fusion. The Archaea, though less resolved due to a lesser genome representation, appear to exhibit more gene scrambling than the Bacteria. The trp operon appears to have been an ancient innovation; it was already present in the common ancestor of Bacteria and Archaea. Although the operon has been subjected, even in recent times, to dynamic changes in gene rearrangement, the ancestral gene order can be deduced with confidence. 
The evolutionary history of the genes of the pathway is discernible in rough outline as a vertical line of descent, with events of lateral gene transfer or paralogy enriching the analysis as interesting features that can be distinguished. As additional genomes are thoroughly analyzed, an increasingly refined resolution of the sequential evolutionary steps is clearly possible. These comparisons suggest that present-day trp operons that possess finely tuned regulatory features are under strong positive selection and are able to resist the disruptive evolutionary events that may be experienced by simpler, poorly regulated operons.


   INTRODUCTION
 
It has become quite apparent from the recent flood of genomic data that dynamic gene reorganization is an ongoing process (albeit of unknown significance) that distinguishes even closely related genomes. Genes that stay together within operons must resist the gene-scrambling process. Operons that embrace a complete complement of pathway-specific structural genes (whole-pathway operons), such as the ones encoding all the enzymes of tryptophan (Trp) biosynthesis or histidine biosynthesis, have a classical status in both biochemistry and molecular genetics that extends far beyond understanding these pathways per se. Such whole-pathway operons are broadly distributed among prokaryotes. However, the pathway genes may be completely scattered in some organisms, and in yet other organisms, the pathway genes may be organized into two or more "split-pathway" operons. This raises intriguing questions about what the evolutionary relationship is between whole-pathway operons, split-pathway operons, and those cases where all pathway genes are unlinked. Is it possible to deduce whether a given whole-pathway operon was an ancient innovation and therefore that operon splitting and/or gene dispersal followed in some lineages? Or are whole-pathway operons relatively recent innovations that are derived from split-pathway operons? Or, since these two scenarios are not mutually exclusive, is it possible that both apply?
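The three kinds of organization contrasted above (whole-pathway operon, split-pathway operons, and fully unlinked genes) can be made concrete with a small illustrative sketch. It is not part of the original analysis. It assumes that the linked gene clusters of a genome have already been identified and are supplied as plain lists of gene names; the function name, the input format, and the "incomplete pathway" label are hypothetical, and gene fusions are ignored.

```python
# The seven catalytic-domain genes, named as in this article.
TRP_GENES = frozenset({"trpAa", "trpAb", "trpB", "trpC", "trpD", "trpEa", "trpEb"})


def classify_trp_organization(clusters):
    """Classify trp gene organization from a list of linked gene clusters.

    `clusters` is a list of lists; each inner list holds the gene names found
    in one linked cluster (a putative operon). Non-trp genes are ignored.
    """
    # Reduce every cluster to its trp genes and drop clusters without any.
    trp_clusters = [set(c) & TRP_GENES for c in clusters]
    trp_clusters = [c for c in trp_clusters if c]

    found = set().union(*trp_clusters)
    if found != TRP_GENES:
        return "incomplete pathway"        # at least one domain gene is missing
    if any(c == TRP_GENES for c in trp_clusters):
        return "whole-pathway operon"      # one cluster carries all seven genes
    if all(len(c) == 1 for c in trp_clusters):
        return "dispersed genes"           # no two trp genes are linked
    return "split-pathway operons"         # two or more partial clusters


# Arrangement described later in this article for Pseudomonas aeruginosa:
# free-standing trpAa, a trpAbBD operon, free-standing trpC, a trpEbtrpEa operon.
pseudomonas_like = [["trpAa"], ["trpAb", "trpB", "trpD"], ["trpC"], ["trpEb", "trpEa"]]
print(classify_trp_organization(pseudomonas_like))
```

Real genomes need more care than this sketch allows for, since fused genes, paralogues, and extraneous insertions all blur the boundaries between the categories.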

An ideal operon system for this analysis is the trp operon. We show that the trp operon must have been present in early prokaryote ancestors. In Bacteria but not in Archaea, sufficient genome representation exists to deduce an ancestral whole-pathway trp operon. The regulation of this operon may initially have been quite minimal since the first evolutionary step(s) probably would be to collect the structural genes together. Parsimony principles support a hypothesis developed in this paper of two major evolutionary events in Bacteria, one splitting the ancestral operon in two and the other rejoining it by gene fusion. We assert that a detailed analysis can recognize occasional events of lateral gene transfer (LGT) or paralogy. Both are likely to be associated with Trp pathway genes engaged in specialized metabolic pathways other than primary amino acid biosynthesis. We show that when two sister lineages differ in particular trp operon characteristics, it is possible to deduce which is the derived change and which reflects the state of the ancestral node.
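The parsimony reasoning used to decide which of two sister lineages carries the derived state can be illustrated with a minimal sketch of Fitch parsimony on a rooted binary tree. This is a generic textbook method shown under simplifying assumptions, not the procedure used in this paper: the tree is a nested tuple, each leaf carries a single discrete character state, and ties are broken alphabetically. The taxon names and states in the example are hypothetical.

```python
def fitch_up(tree, leaf_states, sets):
    """Bottom-up pass: record the set of candidate states for every node."""
    if isinstance(tree, str):              # a leaf is named by a string
        sets[tree] = {leaf_states[tree]}
        return sets[tree]
    left, right = tree                     # an internal node is a 2-tuple
    a = fitch_up(left, leaf_states, sets)
    b = fitch_up(right, leaf_states, sets)
    sets[tree] = (a & b) or (a | b)        # intersection if non-empty, else union
    return sets[tree]


def fitch_down(tree, sets, parent_state, assigned):
    """Top-down pass: keep the parent's state wherever it is a candidate."""
    options = sets[tree]
    # Ambiguous nodes are resolved alphabetically (an arbitrary choice).
    state = parent_state if parent_state in options else sorted(options)[0]
    assigned[tree] = state
    if not isinstance(tree, str):
        for child in tree:
            fitch_down(child, sets, state, assigned)


def reconstruct(tree, leaf_states):
    """Return one most-parsimonious state assignment for every node."""
    sets, assigned = {}, {}
    root_options = fitch_up(tree, leaf_states, sets)
    fitch_down(tree, sets, sorted(root_options)[0], assigned)
    return assigned


# Hypothetical example: two sister lineages differ, and an outgroup decides.
sisters = ("lineage_1", "lineage_2")
tree = (sisters, "outgroup")
states = {"lineage_1": "whole", "lineage_2": "split", "outgroup": "whole"}
result = reconstruct(tree, states)
print(result[sisters])  # inferred state at the node joining the two sisters
```

In this toy case the node joining the two sisters is assigned the state shared with the outgroup, so the differing sister is read as the derived condition. Without an outgroup or other context the two alternatives would be equally parsimonious.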

Recently, Gogarten et al. (28) endorsed a "synthesis" that will acknowledge both the traditional tree-like behavior (vertical descent of genes) and web-like, reticulate behavior (horizontal gene transfer) of the evolutionary process. They leave it open whether or not "vertical descent remains the best descriptor of the history of most genes over evolutionary time." Our overall analysis yields a very optimistic viewpoint that the evolution of the trp operon can be deduced as a vertical genealogy, with events of LGT and paralogy enriching the analysis as interesting features rather than undermining or obliterating the vertical trace of evolutionary history.

Biochemical Pathway of Tryptophan Biosynthesis

Nomenclature. The inconsistencies of nomenclature for designations of genes involved in aromatic biosynthesis have created increasingly awkward problems for comparative analyses, and in order to cope with genomic comparisons, we have implemented a logical system of naming aro genes at a level corresponding to catalytic domains (13, 31, 44, 88, 90, 91). We have extended this nomenclature to the Trp pathway (89, 92) (see Fig. 1 and Table 1). The two overall enzyme reactions that utilize a complex of nonidentical subunits have been denoted with lowercase letters (TrpAa and TrpEa are α-subunits for anthranilate synthase and tryptophan synthase, respectively; TrpAb and TrpEb are β-subunits for anthranilate synthase and tryptophan synthase, respectively). Capital letters are assigned according to the order of the enzyme reactions (or overall reactions, in the case of the two complexes). C. Yanofsky has expressed to us his preference (probably shared by most experimentalists working specifically with trp systems) for adherence to previous nomenclature schemes to minimize disruption of what is most familiar in the existing literature. Admittedly, the designations generally in use for the Trp branch do not generate as many problems of annotation errors as is the case for the rest of the aromatic pathway, but for consistency with our overall work with the aromatic pathway, we use the new naming system in this paper. Both sets of designations are shown in Table 1.
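For readers who move between the two naming schemes, the conversion can be sketched as a simple lookup. The left-hand names follow the reaction-order rule stated above. The right-hand names are the commonly used E. coli-style designations and are an assumption made for this illustration, not a copy of Table 1, which remains the authoritative key. The helper names are hypothetical, and fused genes that encode two domains are not handled.

```python
# New domain name -> (commonly used name, catalytic activity).
# The commonly used names are an assumption based on general usage in the
# literature; consult Table 1 for the authors' own conversion key.
TRP_NOMENCLATURE = {
    "TrpAa": ("TrpE", "anthranilate synthase, alpha (aminase) subunit"),
    "TrpAb": ("TrpG", "anthranilate synthase, beta (glutaminase) subunit"),
    "TrpB": ("TrpD", "anthranilate phosphoribosyltransferase"),
    "TrpC": ("TrpF", "phosphoribosylanthranilate isomerase"),
    "TrpD": ("TrpC", "indoleglycerol phosphate synthase"),
    "TrpEa": ("TrpA", "tryptophan synthase, alpha subunit"),
    "TrpEb": ("TrpB", "tryptophan synthase, beta subunit"),
}

# Reverse lookup, built once from the table above.
_TRADITIONAL_TO_NEW = {old: new for new, (old, _) in TRP_NOMENCLATURE.items()}


def to_traditional(new_name):
    """Translate a name used in this article to the commonly used one."""
    return TRP_NOMENCLATURE[new_name][0]


def to_new(traditional_name):
    """Translate a commonly used name to the one used in this article."""
    return _TRADITIONAL_TO_NEW[traditional_name]


print(to_traditional("TrpC"), to_new("TrpC"))
```

The example call shows why care is needed: the same label, TrpC, refers to different catalytic domains in the two schemes.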



 
FIG. 1. Biochemical pathway of tryptophan biosynthesis. The nomenclature used in this paper for the seven catalytic domains is in boxes. See Table 1 for the alternative designations used in the literature. Anthranilate synthase catalyzes the overall reaction from chorismate to anthranilate via the half-reactions shown, whereby 2-amino-2-deoxyisochorismate (ADIC) is an enzyme-bound intermediate (62). The TrpAa/TrpAb complex functions as an amidotransferase, utilizing glutamine as the source of the o-amino group of anthranilate. TrpAa can catalyze the overall reaction alone in the presence of NH3 (thereby functioning as an aminase). TrpAb alone in some cases may be able to function as a glutaminase. As shown by McDonald et al. (59), Pseudomonas and Streptomyces species form ADIC as the product of a reaction catalyzed by PhzE. PhzE has fused domains that are homologues of TrpAa and TrpAb, which we have denoted TrpAa•TrpAb_phz (93) (see Table 1). In these organisms, ADIC can be considered a branch point that proceeds to Trp on the one hand and to phenazine pigments on the other hand. Tryptophan synthase catalyzes a second overall reaction, converting indoleglycerol phosphate to Trp in a reaction path where indole is always an intermediate. The alpha (TrpEa) and beta (TrpEb) subunits catalyze the reactions shown in which the indole intermediate is processed through a tunnel (85). PR, phosphoribosyl; IGP, indoleglycerol phosphate; G3P, glyceraldehyde 3-phosphate.

 

 
TABLE 1. Key to nomenclature conversions

 
Seven catalytic domains and two α/β-subunit complexes. Trp is one of the essential amino acids required by mammals. Trp is generally synthesized by free-living prokaryotes, lower eukaryotes, and higher plants. The Trp pathway is one of three amino acid branches diverging from a common flow route that produces chorismate. The apparently universal pathway for Trp biosynthesis, which initiates with chorismate and L-glutamine, is shown in Fig. 1. Seven catalytic domains are deployed to carry out the reactions shown. In a given organism these may be individually expressed, but a wide variety of gene fusions that encode single proteins carrying two or more catalytic domains are known.

TrpAa can function as an ammonia-utilizing aminase in the anthranilate synthase reaction. Although the aminase reaction can proceed with ammonia at unphysiologically high pH values, such reactions typically rely upon a glutamine-utilizing glutaminase subunit to deliver the ammonia at the active site (probably within a "tunnel"). Accordingly, TrpAb is a glutaminase homologue that forms a complex with TrpAa, thereby conferring an amidotransferase component to the overall anthranilate synthase reaction in the presence of glutamine. In either case, whether or not the overall anthranilate synthase reaction is carried out in the presence of TrpAb, 2-amino-2-deoxyisochorismate (ADIC) is an enzyme-bound intermediate. Interestingly, some species of Pseudomonas and Streptomyces produce an enzyme called PhzE (59), which carries out the ADIC synthase reaction but not the ADIC lyase reaction (see Fig. 1). ADIC is then converted ultimately to phenazine pigments. PhzE is a fusion of domains homologous to TrpAa and TrpAb (hence our designation TrpAa•TrpAb_phz in Table 1). TrpAa belongs to a protein superfamily that includes other chorismate-utilizing enzymes: PabAa converts chorismate to 4-amino-4-deoxychorismate (precursor of 4-aminobenzoate), and MenF and EntC are different homologue subgroups that convert chorismate to isochorismate (as precursors of menaquinones and an iron siderophore, respectively).

Tryptophan synthase also exists as a complex of nonidentical subunits and is one of the best-understood examples of allosteric interaction exerted between subunits (97). Why indole should be sequestered to a tunnel in the α/β complex of tryptophan synthase is not known, but indole is volatile and rather toxic. Yanofsky has speculated that, given recent findings of a role for indole in quorum sensing and biofilm formation, indole produced by tryptophanase or otherwise available in the environment may serve as a metabolite cue, and that this cue would be disrupted if biosynthetic indole were not enzyme-bound (see reference 96 and references therein). It has been speculated (92) that some Archaea may not form a tryptophan synthase complex.

Relatives of Trp pathway catalytic domains. The pathway of Trp biosynthesis is the first amino acid pathway for which the atomic structure of every catalytic domain has been determined (58), a circumstance of significance because evolutionary analysis can be greatly enhanced through insight gained at the structural level of protein folding. Consultation of the reference by Yanofsky et al. (97) is highly recommended for a definitive presentation of the detailed literature up to about 2000. Each catalytic domain belongs to a protein superfamily at the structural level of protein folding. Many of the catalytic domains exhibit clear homology on the criterion of amino acid identity with proteins that have different substrate specificities and which participate in different pathways. From an evolutionary perspective, this raises three questions: the extent to which the Trp pathway enzymes have been assembled (via gene duplication and substrate alteration) by recruitment of homologues from other pathways, the extent to which the Trp pathway has been the source of genes recruited for function in other pathways, and whether a homologous gene with a recent history of function in another pathway has "crossed over" to replace a Trp pathway gene (or vice versa). This aspect is not addressed further in this article except indirectly (e.g., see the later section on the search for an elusive trpC gene).

Identical Trp pathways exist within varied metabolic contexts. The Trp pathway is generally defined as an unbranched pathway that begins with chorismate and produces Trp as a substrate for general protein synthesis. The Trp pathway appears to have evolved only once. These aspects of universality are favorable for the task of deducing the evolutionary history. However, many aspects of biochemical individuality are not usually considered. In some cases, Trp biosynthesis does not compete with Phe and/or Tyr biosynthesis because one or both of these are absent. In other cases, as exemplified by the use of ADIC for phenazine biosynthesis in Pseudomonas and Streptomyces species, chorismate is no longer the last branch point, and if one starts with chorismate as a reference point, then the pathway is branched. The pathway does not necessarily end exclusively with the Trp end product supplying protein synthesis, e.g., in cases where Trp may be a component of an antibiotic (as in Streptomyces), or where it is converted to indoleacetic acid in plant symbionts such as Azospirillum. Eukaryotes (but no prokaryotes so far) deploy Trp as a precursor of niacin. In such cases, the pathway can be considered divergently branched at the end, with Trp being guided to different molecular fates.

The Trp pathway is the most biochemically expensive of the amino acid pathways, requiring the input of erythrose-4-phosphate, ATP, phosphoribosyl pyrophosphate (PRPP), two phosphoenolpyruvate molecules, L-glutamine, and L-serine. Thus, efficient regulation is generally expected, but this expectation no longer applies in an endosymbiont such as Buchnera, which has abandoned Trp regulation. In this case, loss of regulation can be viewed as a positive selective step that satisfies the needs of its aphid host. In addition, some prokaryotes sustain different physiological or developmental states where the demands impacting the Trp pathway may be more complicated than just sensing the availability of Trp for protein synthesis. These often involve specialized pathways that coexist with primary Trp biosynthesis. These specialized pathways are encoded in part or entirely by divergent trp gene duplicates whose expression is triggered by a variety of temporal and environmental cues, e.g., to make a given pigment or antibiotic derived in part from the Trp pathway.

These are all interesting but complicating elements that we have tried to keep in mind. This is relevant to the task of sorting out and recognizing paralogues (or xenologues) that may be engaged in specialist pathways other than primary Trp biosynthesis. Appreciation of such complexity may also prove relevant to understanding the nature of split-pathway trp operons in many prokaryotes.

Operon Stability

Most molecular biologists who are familiar with the elaborate control features of well-studied operon systems would feel a strong conviction that once evolved, these would resist change (at least disruptive changes). Yet a bioinformatic analysis of the then-available genomes in 1999 (37) produced the conclusion that operon structures, such as the trp operon, are unstable, as inferred from observations of extraneous insertions, gene dispersal, and scrambling of gene order within operons. Characterization of operons as unstable connotes susceptibility to deterioration. If true, this implies that the selective advantages conferred by such operons must be weak.

The Itoh et al. study (37) was a broad-scope analysis of many operons that was necessarily limited with respect to in-depth consideration of any individual operon system. It should be noted that for these kinds of studies, operons have been considered simply as a collection of structural genes that are linked. The presence or absence of linked or unlinked regulatory elements has not usually been evaluated, undoubtedly because this is not easily done. In this paper we pursue in great detail the evolution of a single well-known operon system in the large number of prokaryote genomes now available. We found strong support for the hypothesis that the trp operon, minimally defined as the linked assemblage of structural genes for tryptophan (Trp) biosynthesis, is of ancient origin and has indeed followed a dynamic time course of change that includes several identifiable milestone events in Bacteria. Our study leads to the further hypothesis that the instability of early trp operons (and perhaps some modern ones) can be attributed to weak positive selection conferred by relatively undeveloped control mechanisms.

We suggest that since the time that operons evolved a variety of control mechanisms, the characterization of operons as dynamic (rather than unstable) yields better semantics to describe a positive ongoing process of fine-tuning. In modern free-living organisms, the variety of recently evolved trp operon systems which differ from one another and are endowed with intricate control features mediated by one or more unlinked regulatory genes may in fact be highly stable in the contemporary time frame. One caveat, however, is that this frequently will not apply to pathogenic or endosymbiotic relatives, where the rules dictating selective advantage have completely changed.

trp Operon and Its Regulation

The biochemical pathway of Trp biosynthesis is a classical system of biochemical genetics (95, 97). In Escherichia coli the component genes are organized into a single transcriptional unit to form the trp operon. (This is not strictly correct to the extent that a weak internal promoter exists.) The Trp pathway has become one of the most intensely studied systems in biology, thanks largely to the truly Herculean labors of C. Yanofsky and his many students and colleagues. This system has produced knowledge that extends well beyond the details of the Trp pathway per se, e.g., proof of codon and amino acid colinearity and an early precedent for attenuation mechanisms (95, 96). The individual reactions of Trp biosynthesis are invariant, but experimental work with a variety of organisms reveals substantial diversity with respect to gene fusion, gene organization, and mechanisms of regulation.

Known regulatory mechanisms. At the bioinformatic level, the analysis of trp operons in the literature has been largely restricted to the structural genes. Consideration of regulatory features has been understandably limited, mainly because relatively little comparative information is available at the experimental level and also because analysis of alternative stem-loop structures, etc., is not a trivial task. Escherichia coli, Bacillus subtilis, Pseudomonas aeruginosa, and Lactococcus lactis represent clades for which detailed control mechanisms have been described, each of them entirely different. Importantly, each mechanism seems to be narrowly distributed, and therefore we infer that they are of recent origin. Note that in each case, unlinked genes exist that markedly decrease the probability that the total regulated operon system could be transferred by LGT in one event.

Regulation of Trp biosynthesis in E. coli, the most widely known system, is quite sophisticated (23, 94), being subject to the following multiple levels of control: (i) repression control via the Trp repressor (encoded by the unlinked trpR) which binds Trp as a corepressor moiety, (ii) an attenuation mechanism mediated by a Trp-rich leader peptide (encoded by trpL), and (iii) allosteric feedback inhibition of anthranilate synthase by Trp (95). The E. coli mechanisms of overall trp operon regulation are generally shared by the enteric lineage of Bacteria, defined by us as the clade that includes Shewanella putrefaciens as the outlying point of divergence from E. coli.

Bacillus subtilis has a different system of trp operon regulation (72, 80, 95, 96), whereby genes unlinked to the trp operon encode (i) a trp RNA-binding attenuation protein (TRAP) encoded by mtrB as well as (ii) an anti-TRAP gene product encoded by rtpA (80). Trp both feedback inhibits anthranilate synthase and activates TRAP for attenuator function, whereas uncharged tRNATrp induces synthesis of anti-TRAP. TRAP can also block translation of the trp operon through interference with the ribosome-binding site. The clade sharing the TRAP system of regulation includes Bacillus halodurans, Bacillus stearothermophilus, and Oceanobacillus iheyensis in addition to Bacillus subtilis. At this time it is not clear whether the anti-TRAP component is present throughout this clade.

A third finely tuned system of regulation has been documented in Lactococcus lactis (69). In this case uncharged tRNA can bind directly to the leader transcript, stabilizing an antiterminator configuration that promotes expression of the operonic genes. In Lactococcus lactis, unlinked, unknown genes involved in trp operon transcript processing and in transcription initiation have been suggested (69). The presence or absence of the Lactococcus lactis mode of trp operon regulation in close relatives, such as species of Streptococcus, has apparently not yet been investigated.

In Pseudomonas aeruginosa, the fourth well-documented system, the Trp pathway is represented by four operon entities: a free-standing trpAa, the trpAbBD operon, a free-standing trpC, and the trpEbtrpEa operon. The trpAa and trpAbBD operons are regulated by attenuation mechanisms employing leader peptides (67), whereas the trpEbtrpEa operon is controlled by an indoleglycerol phosphate-activated regulatory protein encoded by trpI (6). trpC is not known to be regulated in any way. The P. aeruginosa system is complicated by the presence of paralogues of trpAa and trpAb. These include genes of unknown physiological function (also known as phnA and phnB) expressed in stationary phase (57) as well as two copies of phzE (trpAa•trpAb_phz), a gene that encodes ADIC synthase (Fig. 1), which catalyzes the initial reaction committed to phenazine biosynthesis. It is not entirely clear what physiological conditions exist in P. aeruginosa (and close relatives) that have resulted in its unusual use of indoleglycerol phosphate as a regulatory cue for the selective regulation of the trpEbtrpEa operon, but it is certainly evident that much has been committed to the overall regulation in this system. Close genomic neighbors of P. aeruginosa that possess identical split-pathway trp operons and trpI include Pseudomonas fluorescens, Pseudomonas syringae, and Azotobacter vinelandii.

Unknown regulatory systems awaiting discovery? We do not know the extent to which the total network of regulatory elements governing the single trp operons in the E. coli, B. subtilis, and L. lactis clades or the multiple split-pathway operons of the P. aeruginosa clade might be more elaborate than that of most other organisms. Different lifestyles undoubtedly select mechanisms accommodating varied ranges of control responsiveness. A simple mode of Trp regulation may very well be appropriate in a cyanobacterium but not E. coli. A variety of alternative regulatory systems in other modern lineages probably remain to be elucidated. Transcriptional regulation has been reported in the whole-pathway operons of Methanobacterium thermoautotrophicum (26) and Pyrococcus kodakaraensis (77), but the exact mechanisms are unknown. The split-pathway operons of the clade represented by Rhizobium meliloti (7) and Azospirillum brasilense (21) exhibit an attenuation mechanism involving a Trp-rich leader peptide, upstream of the trpAatrpAb fusion, but no regulation of the remaining two partial-pathway operons is known. Physically separated split-pathway trp operons may be of positive selective value per se for presently unknown reasons, whereby it might be of value to discoordinate the expression of some trp genes from others, or they may simply be the outcome of initially disrupted whole-pathway operons that subsequently recruited a refined control mechanism accommodating the gene separations.

Feasibility for Deduction of Evolutionary Histories

The current database for prokaryotes, at least for the Bacteria, now has sufficient genome representation to accommodate systematic attempts to deduce the evolutionary history of well-understood biochemical pathways. Such an effort requires the successful recognition and confrontation of complications such as (i) irregular genome expansions in the form of the unpredictable emergence of new paralogues or analogues, (ii) an erratic and differential phylogenetic loss of paralogue genes (often the cause of what has been termed unrecognized paralogy), and loss of analogue genes (could be termed unrecognized analogy), and (iii) lateral (horizontal) gene transfer (LGT). Errors and inconsistencies of database annotation as well as idiosyncrasies of nomenclature can create formidable hurdles for those who are not completely familiar with a given pathway and its scholarly literature. Global computational surveys to date are simply not very informative, and the algorithms employed for automated annotation have too many limitations. For example, a very recent effort at computational identification of operons in microbial genomes (98) chose to highlight the results of trp operon analysis as a prime example of the analysis. However, the results presented are not comprehensive and contain serious mistakes, most likely due to errors in annotation and confusing nomenclature issues that have been perpetuated in the databases.

As a first step toward deducing the evolutionary history of overall aromatic biosynthesis, we selected the Trp branch as a challenging but manageable metabolic segment for initial analysis. Trp pathway genes have sometimes been recruited for function in specialized biochemical pathways, and ancient paralogues or xenologues may coexist with the Trp pathway genes that are engaged in primary biosynthesis. We have shown (93) that detailed case-by-case analysis can distinguish ancient trp paralogues (or xenologues) from their homologues engaged in primary Trp biosynthesis. A comparable study in the literature produced a detailed analysis of homologues of ornithine carbamoyltransferase in which the challenges to tracking a vertical path of evolutionary descent that are caused by the complexities of xenology and ancient paralogy were sorted out (73). This study was preceded by an analysis (49) showing that ornithine carbamoyltransferases in turn belong to a larger protein family in which the ornithine and aspartate carbamoyltransferases are very ancient paralogues. The conclusions of such comprehensive studies are consistent with the contentions of Glansdorff (27) and Woese (87) that complications of ancient paralogy, ancient analogy, and lateral gene transfer can be recognized sufficiently well to allow the events of vertical ancestry to be tracked.

Here we present results from an in-depth, manual analysis of Trp pathway genes in over 100 genomes. A limited amount of information is also given to illustrate the very important perspective that the evolutionary relationships of Trp biosynthesis will ultimately be best understood in its larger context as one branch of a highly divergent pathway responsible for the biosynthesis of aromatic amino acids as well as many other important metabolites.


   GENOMIC DISTRIBUTION OF THE TRYPTOPHAN PATHWAY
 

Mapping of trp Gene Patterns to the 16S rRNA Tree

In most of the figures used in this paper, patterns of operonic gene arrangement in a given organism are mapped to the placement of that organism on a 16S rRNA tree. Table 2 keys each organism examined to the figures that show trp gene patterns in that organism. To what extent do the individual Trp protein trees parallel the 16S rRNA tree? It is well known that, unlike information-rich 16S rRNA, most individual proteins cannot be expected to yield robust phylogenetic trees in which the order of branching is well supported by high bootstrap values, at least not over wide phylogenetic ranges. However, in relatively narrow phylogenetic spans, we have found congruity between Trp protein trees and 16S rRNA trees (except for TrpAb, which is too small).


 
TABLE 2. Cross-reference guide to organisms and figures

 
Figure 2 illustrates (see shaded and circled numbers) eight clades where reasonably good congruity is observed: a Listeria/Bacillus/Staphylococcus/Streptococcus grouping (Fig. 2B), B. subtilis/B. stearothermophilus/B. halodurans (Fig. 2B), actinomycete bacteria (Fig. 2B), cyanobacteria (Fig. 2C), Campylobacter/Helicobacter (Fig. 2D), Proteobacteria between Caulobacter crescentus and Rhodobacter sphaeroides in Fig. 2D, the clade between Thiobacillus ferrooxidans and Pseudomonas syringae (Fig. 2D), and the enteric lineage between Shewanella putrefaciens and Escherichia coli in Fig. 2D. These are all groups for which a sufficient number of closely related genomes have been sequenced. We expect that when genome sequences become available in more sparsely represented areas, e.g., around Chlorobium (Fig. 2C) or Thermotoga (Fig. 2A), additional phylogenetic spans will be congruent. Within a relatively narrow phylogenetic span, protein trees actually have the potential to discriminate branching order better than 16S rRNA trees. Our Trp protein trees (data available upon request), together with a variety of other aromatic-pathway information (see also the section on nested gene fusions), suggest that the Enterococcus/Streptococcus/Lactococcus grouping is outside the Listeria/Bacillus/Staphylococcus clade rather than within it, as shown on the 16S rRNA tree of Fig. 2B.






 
FIG. 2. Distribution of aromatic-pathway catalytic domains among prokaryotes. In each panel, 16S rRNA trees are shown at the left, and the presence (shaded circles) or absence (open circles) of domains is shown at the right. Note that only the presence or absence of genes, not gene order, is indicated. Catalytic domains of the common trunk of aromatic biosynthesis (Aro), the phenylalanine branch (Phe), the tyrosine branch (Tyr), and the Trp branch are labeled across the top right; the specific letter designation for a given domain is shown at the bottom. In the Trp grouping, split circles are used to indicate the presence or absence of TrpAa (top half-circle) and TrpAb (bottom half-circle) or TrpEa (top half-circle) and TrpEb (bottom half-circle). In panel A, the presence or absence of transketolase (Trk) is indicated by the left column of circles. The connecting point of a tree segment in any given panel (A, B, C, and D) with a tree segment(s) in another panel is marked with a broken line. The scale bar corresponds to substitutions per site. Dotted lines in the Streptococcus region (B) and the Buchnera region (D) indicate our suggestion that the 16S rRNA tree shown may not reflect exactly the correct order of branching, and perhaps these organisms branch from a slightly deeper position. See Fig. 8 for the suggested branching order of Buchnera. Circled numbers indicate eight node positions from which Trp protein trees are congruent with the 16S rRNA tree. The common trunk of aromatic biosynthesis is encoded by seven genes whose corresponding gene products are named AroA through AroG. The common-pathway genes are named in exact order of pathway reactions according to the precedent implemented in references 12, 31, 76, 90, and 91. The chorismate mutase block is represented by homologues of either AroQ (usually) or AroH (seldom) (12). PheA refers to prephenate dehydratase, the sequence of the relatively infrequent arogenate dehydratase being currently unknown.
TyrA refers to a homologue family that includes prephenate dehydrogenase, arogenate dehydrogenase, or cyclohexadienyl dehydrogenase (9, 88). See Fig. 1 for details of Trp biosynthesis. The names of organisms retaining the putative ancestral whole-pathway trp operon are shaded orange, those having the two split-pathway operons are shaded magenta, and those having operons rejoined by fusion of trpD and trpC are shaded aqua. These correspond to the major evolutionary events portrayed in Fig. 12 and indicated with the same color-coding scheme. Probable pseudogenes in chlamydiae (C) and Coxiella (21) are indicated with heavy black slash marks. Genes that function in two pathways (trpAb in Bacillus subtilis and trpC in actinomycete bacteria) are marked with magenta bull's-eyes in B. Panel A includes the Archaea and a few of the deeper-branching Bacteria at the bottom. Panel B includes the gram-positive Bacteria. Panel C includes cyanobacteria, chlamydiae, and other organisms on the 16S rRNA tree between the gram-positive organisms in panel B and the organisms in panel D, which contains the gram-negative subdivisions of the Proteobacteria. Wolbachia sp. (panel D) is an endosymbiont of Brugia malayi. A cross-index of all organisms shown in both this figure and the remaining figures is given in Table 2.

 
Assembly of these protein trees is not a trivial task because divergent paralogues or xenologues engaged in specialized metabolic activities, as well as some genes originating by LGT, must be recognized and sorted out. Examples are given in this paper.

Trp Biosynthesis in Its Larger Context of Aromatic Biosynthesis

Trp biosynthesis is usually described as the branch of aromatic biosynthesis (reference 10 is a comprehensive biochemical review) that begins with chorismate and L-glutamine as initial substrates. In view of the fact that chorismate is not generally available from the environment as a stable nutrient, Trp biosynthesis can be considered from an in vivo perspective to initiate further upstream via the enzymatic condensation of erythrose-4-phosphate and phosphoenolpyruvate. This step is common to the biosynthesis of all three aromatic amino acids and is positioned at a point of interface with carbohydrate metabolism.

The multipurpose Fig. 2 provides a summary of the presence or absence of Trp pathway genes in the larger context of the presence or absence of genes specifying the common aromatic trunk and the sister phenylalanine and tyrosine branches. The circles in Fig. 2 from left to right represent catalytic domains (specified at the bottom of each panel) corresponding to the seven common-pathway steps (aroA through aroG), chorismate mutase (aroQ or aroH) (which is common to the short Phe and Tyr branches), and the seven catalytic domains of the Trp pathway (Fig. 1 and Table 1).

The key enzyme of Phe biosynthesis is PheA, and the key enzyme of Tyr biosynthesis is TyrA. The Phe and Tyr branches each utilize an aminotransferase step, not shown as a circle because of bioinformatic difficulties associated with deducing the substrate specificity of multiple and ubiquitous broad-specificity aminotransferases (42). Most intermediary metabolites of aromatic biosynthesis are not likely to be available from the environment; only quinate, shikimate, and anthranilate, all abundant in nature (10), are feasible precursors of Trp. Although these metabolites are indeed readily utilized when available, no prokaryotes have yet been found to rely on an exogenous source of quinate, shikimate, or anthranilate as exclusive and obligatory beginning precursors. One interesting special-case exception is Chlamydophila psittaci, an obligate intracellular parasite that utilizes host-derived anthranilate as a required Trp precursor (89).

Implications of Missing Genes

Unidentified analogue genes. The most obvious explanation for "missing" genes that leave a gap in an otherwise intact pathway is the existence of analogue genes, i.e., functionally equivalent genes that lack homology with the genes used to query the databases. The common pathway of aromatic biosynthesis is a good example of nonhomologous genes producing enzymes that catalyze the same reaction. These include the first step (3-deoxy-D-arabino-heptulosonate-7-phosphate [DAHP] synthase) (31, 44), the third step (dehydroquinase) (13), and the fifth step (shikimate kinase) (13, 18). Chorismate mutase is represented in nature by three analogue genes (13). No analogue genes are presently known for Trp pathway genes except for trpC (9).

Alternative metabolic relationships. In contrast to the apparent universality of the specific Trp branch, alternative enzyme steps appear to exist in nature for the Phe and Tyr branches as well as for the common trunk of aromatic biosynthesis. Some Archaea (Fig. 2A) and two widely spaced members of the Bacteria (Aquifex and Desulfovibrio, Fig. 2A and 2D) lack both AroA and AroB. Transketolase (Trk), required for generation of a substrate for AroA, is also shown in Fig. 2A because most (but not all) organisms that lack AroA and AroB also lack transketolase. (Desulfovibrio vulgaris [Fig. 2D] does have transketolase.) In the last six organisms, dehydroquinate, the substrate of AroC, presumably connects with carbohydrate metabolism in some unknown way that does not involve AroB or any of the known AroA homology groupings AroAIα, AroAIβ, or AroAII (31, 44, 76). Some support for this putative alternative metabolic connection, based on tracer methodology, exists in the literature (79). It is also possible that quinate, either from the environment or arising endogenously in some unknown way, could be the source of dehydroquinate via the action of a quinate dehydrogenase.

Although species of Chlamydophila and Chlamydia are very close phylogenetically, the presence of Trp pathway genes varies from complete absence in C. pneumoniae to almost all present in C. psittaci. It appears that the Trp pathway in C. trachomatis and C. muridarum is in a contemporary process of reductive evolution, and the few remaining genes may be remnants (25, 89). In contrast to these species, an "incomplete" trp operon in C. psittaci appears to play a role in the capture of host kynurenine derived from tryptophan (89). Although C. psittaci does lack trpAa and trpAb, the remaining five trp genes coexist in an operon into which two novel genes have been recruited. These encode kynureninase and PRPP synthase. This creates the ability to generate PRPP (needed for the TrpB step) and to intercept host kynurenine as a source of anthranilate, cycling host-catabolized Trp back to Trp in the intracellular parasite (89). Effectively, a host-pathogen metabolic mosaic has been created, and the variant operon generates a kynurenine-to-Trp flow route instead of the usual chorismate-to-Trp flow route.

As explained above, the absence of trpAa and trpAb in C. psittaci is by design, and the remaining Trp pathway is functional. The likelihood that aroA and aroB, which are absent in some organisms, will prove to reflect either a new metabolic connection or the existence of unknown analogue genes has already been mentioned. In a few cases tyrA or pheA was the only aromatic-pathway gene not found by homology search. The endosymbiont Buchnera (Fig. 2D), which lacks tyrA, may not need to synthesize tyrosine because the host has phenylalanine hydroxylase, which can convert phenylalanine to tyrosine. Aeropyrum pernix (Fig. 2A) and Helicobacter pylori (Fig. 2D), which both lack pheA, may very well possess arogenate dehydratase, an alternative pathway step for prephenate dehydratase (reference 39 and references therein). No gene encoding an arogenate dehydratase has yet been cloned and sequenced.

Reductive evolution. Reductive evolution is descriptive of the process in which pathogens or symbionts decrease genome size by abandoning genes that are needed by their free-living relatives but dispensable because of the availability of resources from a host or symbiont partner. The genus Pyrococcus exhibits marked variation in the capability for aromatic biosynthesis. Pyrococcus horikoshii has experienced total reductive evolution. Only TrpEb remains in P. horikoshii, and the case has been made that this may have some other function, such as serine deaminase activity (92). P. abyssi possesses genes encoding common-pathway and Trp pathway steps but lacks the Phe and Tyr branches. Although chorismate mutase (aroQ) is present, it could have some other substrate specificity (13). Since P. abyssi lacks the competing Phe and Tyr branches, an unusual metabolic circumstance exists in which the representation of tryptophan biosynthesis can be collapsed to that of a linear pathway of 12 overall steps (corresponding to the seven common-pathway steps followed by the five overall steps that are specifically dedicated to Trp biosynthesis). In contrast to the foregoing two differentially auxotrophic species of Pyrococcus, P. furiosus possesses a complete assemblage of aromatic-pathway genes.

Organisms that lack the entire branched system of aromatic amino acid biosynthesis include P. horikoshii (Fig. 2A), Ureaplasma urealyticum and Mycoplasma species (Fig. 2B), Borrelia burgdorferi and Treponema pallidum (Fig. 2C), and Rickettsia prowazekii and Wolbachia spp. (Fig. 2D). These whole-pathway reductive evolutions are generally associated with intracellular parasitism or endosymbiosis, and they imply auxotrophic dependence upon the host not only for all three aromatic amino acids but also for end products of the vitamin-like branches (e.g., folate, vitamin K, and ubiquinones) that derive from chorismate. In the Bacteria, some organisms possess an otherwise intact aromatic pathway but the Trp branch is uniquely absent. Among gram-positive bacteria (Fig. 2B), this includes Enterococcus faecalis and Clostridium difficile, and this pattern is also seen in the gram-negative Haemophilus ducreyi (Fig. 2D).

Interestingly, some organisms lack all three of the terminal aromatic amino acid branches but possess an intact common pathway to chorismate: Streptococcus pyogenes (Fig. 2B), Streptococcus equi (Fig. 2B), chlamydial species (Fig. 2C), Porphyromonas gingivalis (Fig. 2C), and Treponema denticola (Fig. 2C). The implication is that the remaining common pathway still links to one or more of the vitamin-like pathways. In the chlamydiae, we could not detect (by use of homology searching) a single gene encoding any known chorismate-utilizing enzyme. However, this could easily be accounted for by the existence of analogue genes that have not yet been identified. For example, E. coli chorismate lyase, which catalyzes the initial step of ubiquinone biosynthesis, is encoded by a gene (66) that is of very limited distribution. Therefore, elucidation of presently unknown analogue genes encoding chorismate lyase surely must be forthcoming.
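The patterns of missing genes described in this section can be expressed as a simple classification of presence/absence data. The following Python sketch is only illustrative: the category labels, the function name classify, and the grouping of domains are assumptions made here, not part of the analysis reported in this paper, and a domain scored as absent by homology search may still be supplied by an unidentified analogue gene.

```python
# Domain names follow the labels used in Fig. 2; the grouping is an assumption.
COMMON = ["AroA", "AroB", "AroC", "AroD", "AroE", "AroF", "AroG"]
TRP = ["TrpAa", "TrpAb", "TrpB", "TrpC", "TrpD", "TrpEa", "TrpEb"]

def classify(present):
    """Assign a coarse, illustrative category to a set of detected domains."""
    present = set(present)
    # No aromatic-pathway domain detected at all.
    if not present & set(COMMON + TRP + ["PheA", "TyrA"]):
        return "entire system absent"
    common_ok = all(d in present for d in COMMON)
    trp_all = all(d in present for d in TRP)
    trp_none = not any(d in present for d in TRP)
    phe, tyr = "PheA" in present, "TyrA" in present
    if common_ok and phe and tyr and trp_all:
        return "complete"
    if common_ok and phe and tyr and trp_none:
        return "Trp branch uniquely absent"
    if common_ok and not phe and not tyr and trp_none:
        return "common pathway only"
    # Anything else (e.g., a single missing gene) needs case-by-case scrutiny.
    return "partial or other"
```

A genome that retains only a single domain, as described above for TrpEb in P. horikoshii, falls into the last category under this scheme, which is a reminder that such cases call for manual evaluation rather than automated labeling.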

Search for an Elusive trpC Gene in Actinomycete Bacteria

A particularly challenging observation was that, aside from the fragmented presence of the Trp pathway genes already discussed for the chlamydiae, some organisms lacked a single gene of Trp biosynthesis (trpC in all cases). These organisms are restricted to a cohesive cluster of gram-positive actinomycete bacteria (Fig. 2B and Fig. 3) that includes Thermomonospora fusca, Streptomyces coelicolor, Corynebacterium diphtheriae, Corynebacterium glutamicum, and five species of Mycobacterium. Since S. coelicolor can grow on defined minimal medium in the absence of Trp, it must possess an intact Trp pathway. Likewise, Mycobacterium smegmatis is a saprophytic species that can grow in a minimal medium in the absence of Trp. This likewise indicates the presence of a functional pathway even though the presence of trpC in the genome is not apparent by homology searching.



 
FIG. 3. Apparent absence of trpC and an event of LGT in a lineage of actinomycete bacteria. A broader phylogenetic context can be viewed in Fig. 2B and 6A. chyp denotes a conserved hypothetical membrane protein exhibiting about 28% identity in comparison of a given Mycobacterium species with a given Corynebacterium species. Color-coded boxes pointing in the direction of transcription represent genes of Trp biosynthesis. For clarity of presentation, trpAa is shown as Aa, etc. Open boxes with question marks denote hypothetical proteins. Intergenic spacing is shown, with negative values indicating gene overlap. trpD•trpC fusions are represented by short black linker bars. On the left are 16S rRNA-based phylogenetic trees of the genomes having the gene organizations shown on the right. Orthologues that match the mycobacterial trpAa/chyp/D/Eb/Ea operon genes are aligned vertically. Contemporary trp operons in coryneform species that originated in their common ancestor by LGT of trpAa/Ab/B/D•C/Eb/Ea from a source within the enteric lineage are shown within brackets. Except for the two coryneform species, all actinomycetes have a free-standing trpB gene. The Mycobacterium spp. and Streptomyces also have a free-standing trpAb gene. The corresponding TrpB and TrpAb proteins exhibit high identity with one another but not with TrpB and TrpAb of the coryneform species. Thermomonospora has dissociated trpAa from the typical clade operon and fused it with trpAb (as also shown in Fig. 4). The trpAa/Ab/B/D/aroAII operon of S. coelicolor is known to be specifically associated with antibiotic biosynthesis (see text).

 
One actinomycete exception is explained by LGT. Within the actinomycete clade, the two species of Corynebacterium do possess a recognizable trpC (albeit fused to the trpD domain). However, this exception is explained by LGT displacement of not only trpC but also all trp genes in the Corynebacterium genus (except for a now-redundant trpD remnant) by the whole-pathway operon originating from an enteric bacterium (Xie and Jensen, unpublished data). Figure 3 shows that this actinomycete clade characteristically possesses a partial-pathway operon, trpAa/trpD/trpEb/trpEa, with gene insertions expanding the intergenic space between trpAa and trpD. In T. fusca, trpAa has not only dissociated completely from the trpD/Eb/Ea operon but has fused with trpAb. Only trpD and an associated conserved hypothetical gene denoted chyp remain in species of Corynebacterium as remnants of the original actinomycete genes. The remnants are pleasingly fortuitous because they show the Corynebacterium ancestor to be the recipient of LGT rather than the donor.

A comprehensive phylogenetic tree for TrpD proteins (data not shown) reveals that all of the TrpD proteins in Fig. 3 exhibit cohesive clustering and an order of branching that is congruent with the corresponding genome positions on the 16S rRNA phylogenetic tree except, of course, for the TrpD domain of the TrpD•TrpC fusion protein in the two coryneform species. Thus, in C. diphtheriae and C. glutamicum, the free-standing trpD outside of the whole-pathway trp operon is more closely related to trpD inside the partial-pathway trp operons of all the other organisms. An inner-membrane protein of unknown function separating trpAa and trpD in all of the mycobacteria, encoded by chyp, also flanks the nonoperonic trpD of the two coryneform species. As expected for the suggested LGT scenario, trees of TrpAa, TrpEa, and TrpEb proteins that are encoded from the partial-pathway operons of mycobacterial species, Streptomyces, and Thermomonospora in Fig. 3 all cluster closely together with the exclusion of the corresponding LGT genes from the coryneform bacteria.

Post-LGT events of vertical descent can be tracked in C. diphtheriae. Since the time that an alien trpAa/trpAb/trpB/trpD•trpC/trpEb/trpEa operon displaced the trp genes present in the common ancestor of coryneform bacteria, leaving behind only chyp and trpD as remnants, subsequent vertical evolutionary events in the C. diphtheriae genome are apparent. Thus, an insertion containing panB and panC occurred recently between trpD•trpC and trpEb in the C. diphtheriae lineage after its divergence from C. glutamicum. In C. glutamicum, closely related panB and panC orthologues (encoding ketopantoate hydroxymethyltransferase and pantothenate synthetase) comprise a characterized operon of D-pantothenate biosynthesis that is located elsewhere in the genome (71). In C. diphtheriae, the translocation of panB and panC into the trp operon is associated with an inversion event between these two genes. Hence, the opposite transcriptional direction of the inserted panC has now isolated trpEb/trpEa from its former operonic transcriptional continuity, presumably forcing it to become a separate transcriptional unit. It is interesting that the otherwise alien operon of C. diphtheriae now contains the native genes panB and panC, transposed from the resident genome. C. diphtheriae has also produced a gene duplicate of the gene encoding the alien TrpEb, which has then become the proximal member of the operon. This paralogue TrpEb is probably deficient in complex formation with TrpEa, because conserved residue K-167 (Salmonella enterica serovar Typhimurium numbering), which forms a salt bridge with residue D-56 of TrpEa, has been changed to S-167 (85). Also, the highly conserved residue 162-G has been changed to a charged residue, 162-E. Thus, after the LGT event, several subsequent vertical events of evolution that occurred in C. diphtheriae but not in C. glutamicum can be tracked.

The following approaches were taken in an attempt to locate the missing trpC genes in the above-mentioned actinomycete organisms.

Pattern and profile search. TrpC is a short and relatively divergent sequence. Known TrpC homologues may have identities as low as 22%. In an initial Blast screening with E. coli TrpC as the query, for example, the Ferroplasma acidarmanus genome did not return any hits and appeared to lack TrpC by this criterion. However, the position of an unknown gene within the trp operon of F. acidarmanus strongly implicated its presence as a divergent trpC gene because it occupies the same relative position as trpC in two closely related Thermoplasma species. Indeed, identity as trpC (second iteration) was amply confirmed by use of PSI-Blast (5), as well as by the observed conservation in multiple alignments of critical residues established by structural studies of TrpC from E. coli. In addition, the use of TrpC query sequences from most of the Archaea did return positive Blast hits from the F. acidarmanus genome.

With this background in mind, the genomes of T. fusca, S. coelicolor, and the mycobacteria M. avium, M. tuberculosis, and M. bovis were subjected to a pattern and profile search that included a ProSite-like pattern based upon critical residues reported in the PDB summary, the use of TrpC domains as query sequences that were available from the closest relatives of the group missing TrpC, and the generation of a hidden Markov model based on a multiple sequence alignment of known TrpC sequences. No illuminating results were obtained with this approach.
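To illustrate what a ProSite-like pattern search involves, the following minimal Python sketch translates a PROSITE-style pattern into a regular expression and scans a protein sequence with it. It covers only the basic syntax elements (single residues, bracketed classes, braced exclusions, and x with repeat counts). The function names are hypothetical, and the pattern and sequence in the usage note are invented placeholders, not the critical TrpC residues or genome sequences used in this study.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (e.g. 'G-x(2)-[ST]-{P}') into a regex."""
    parts = []
    for element in pattern.strip(".").split("-"):
        # Separate an optional repeat count such as x(2) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            token = core                       # single residue or [ABC] class
        if lo and hi:
            token += "{%s,%s}" % (lo, hi)
        elif lo:
            token += "{%s}" % lo
        parts.append(token)
    return "".join(parts)

def find_pattern(pattern, sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]
```

For example, find_pattern("G-x(2)-[ST]-{P}", "MAGLKSAGPPTP") reports one match, "GLKSA" at position 2. A pattern of this kind is far stricter than a profile or hidden Markov model, which is one reason the approaches were combined in the search described above.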

Evaluation of an unknown gene inserted in the trp operon. M. tuberculosis has a conserved hypothetical gene (Rv1610) inserted between trpAa and trpD (denoted chyp in Fig. 3). The absence of trpC coupled with the insertion of this unexpected gene within the trp operon invited careful scrutiny. This was, in fact, reminiscent of the previously mentioned situation with the operonic trpC of F. acidarmanus, which initially eluded detection as trpC. However, critical residues expected of TrpC could not be matched to Rv1610 by manual alignment. Furthermore, Rv1610 appears to encode an inner-membrane protein with three transmembrane segments. In addition, if Rv1610 were, in fact, a divergent TrpC, we would expect to find homologues in T. fusca and S. coelicolor. We did not.

Possible catalysis of the TrpC reaction by HisA. TrpC catalyzes an intramolecular oxidoreduction (Amadori rearrangement) that parallels the isomerase reaction catalyzed by HisA. Both reactions involve isomerization of an identical phosphoribosyl moiety. TrpC and HisA each exhibit (βα)8 barrel structures. Jurgens et al. (46) in fact generated hisA mutants that could catalyze the TrpC reaction both in vivo and in vitro. One of these variants retained significant HisA activity. We therefore envisioned the possibility that an ancestor of the TrpC-deficient block of organisms might have duplicated hisA and recruited one copy to TrpC function. However, second copies of hisA were not found. We then further considered the possibility that HisA in these organisms might catalyze both reactions, since that potential had been established in vitro. However, the alignment of HisA sequences did not reveal any obvious variant residues common to the TrpC-deficient block of organisms that might suggest potential for TrpC activity.

Evolution of competence for TrpC catalysis by TrpD. Altamirano et al. (3) recently reported the evolution of TrpC activity from the αβ barrel scaffold of TrpD following in vitro mutagenesis and recombination. Thus, one might envision an event of trpD gene duplication followed by divergence of one of the paralogues to TrpC function. Although a gene duplicate of trpD was found in S. coelicolor, other organisms of the trpC-"deficient" block do not have a trpD gene duplicate. In consideration of the additional possibility that a modified trpD might encode an enzyme capable of both reactions, a careful comparison of the multiple alignment for TrpD sequences failed to reveal a variant subgroup that might be expected of an evolved dual-function TrpC/TrpD protein. This is perhaps not surprising in view of the recent retraction (4) of the results of Altamirano et al. (3).

Other possibilities. Enzymes possessing triose phosphate isomerase (TIM) (βα)8 barrel-like folds are widespread and accommodate a particularly wide range of functions (15). Within this large grouping, TrpC, TrpD, TrpAa, and Rpe (D-ribulose 5-phosphate 3-epimerase) belong to the ribulose phosphate binding superfamily within the SCOP (structural classification of proteins) database (15, 86). Therefore, both TrpAa and Rpe were also evaluated as possible evolutionary sources of the missing TrpC, with the approaches described for HisA and TrpD. Suggestive evidence was not found.

The isomerase step catalyzed by TrpC is clearly a facile reaction, and although none of the foregoing possibilities produced the answer sought, they illustrate nicely the rationale and sorts of in silico strategies for gene discovery that can be anticipated in the near future. Until the time that this article was under review, the identity of trpC in the organisms included in Fig. 3 had remained a mystery. However, convincing evidence has been obtained recently that the HisA isomerase in these organisms does in fact catalyze the isomerase reaction in both pathways (9). The gene name priA (phosphoribosyl isomerase A) has been suggested to accommodate its functional role in two pathways. Although this possibility was anticipated as outlined earlier, the natural bifunctional proteins of actinomycete bacteria did not resemble those obtained experimentally (46) in terms of amino acid sequence matches. Barona-Gómez and Hodgson (9) suggested that the bifunctional actinomycete isomerases represent an ancient evolutionary state that is in line with the recruitment hypothesis (38). If so, specialization in the gene duplicate that became trpC must have required more divergence than in the gene duplicate that became hisA, because the homology of PriA proteins with HisA is evident but that with TrpC proteins is not.


   GENE FUSIONS
 

Phylogenetic Distribution of trp Gene Fusions

Each of the trp genes has been involved in various prokaryote fusion events except for trpEa. In some eukaryotes, however, trpEa and trpEb are fused (12, 16). Indeed, in Euglena, all of the trp genes except for trpAa and trpAb are fused together to form a pentafunctional protein (74). A trpD•trpB fusion is known in only a single instance (Archaeoglobus fulgidus), and a trpC•trpEb fusion is also thus far known in a single case (Coxiella burnetii). The remaining fusion types, all in the Bacteria, show an erratic distribution that is phylogenetically incongruous when mapped on the 16S rRNA tree (Fig. 4). Thus, the trpAb•trpB fusion is present not only in a small subcluster of the enteric bacteria, but also in the remote taxa Thermotoga maritima and Campylobacter jejuni. The trpD•trpC fusion, present throughout most of the enteric lineage (gamma proteobacteria), is also present in the widely separated Helicobacter pylori and in species of Corynebacterium. (In this case, we have already mentioned that a single origin followed by LGT events is likely.) Two distinct types of trpAa•trpAb fusions have occurred, one dedicated to primary biosynthesis (denoted trpAa•trpAb) and the other to phenazine pigment synthesis (denoted trpAa•trpAb_phz). The considerable extent of amino acid changes in TrpAa•TrpAb_phz has resulted in a shortened protein which no longer allows the ADIC product to continue through the ADIC lyase reaction to yield anthranilate, as is the case with anthranilate synthase (Fig. 1).



 
FIG. 4. Mapping of the distribution of Trp pathway gene fusions to the 16S rRNA tree. The presence of fusion subtypes is color-coded as indicated in the legend. Although Buchnera aphidicola maps near E. coli on the 16S rRNA tree, as shown, its true point of divergence is probably prior to Yersinia, as portrayed by dotted lines in Fig. 8.

 
A priori, the scattered phylogenetic distribution of these gene fusions could be attributed to (i) LGT, (ii) an initial ancestral fusion (of rare occurrence) followed by numerous events of gene loss in different lineages, or (iii) independent gene fusions (therefore being of relatively frequent occurrence). Table 3 shows that all of the gene fusions exhibit a GC content that is similar to that of the resident genome. Thus, either these fusions did not originate by LGT, the donor genome fortuitously had a similar GC content, or the LGT event occurred sufficiently long ago that amelioration has masked it. Unpublished data (Xie and Jensen) support the occurrence of many of the gene fusions as independent events of evolutionary innovation. Although the trpD•trpC fusions in coryneform bacteria and in Helicobacter pylori originated from the enteric lineage by LGT, the comparison of parametric data, e.g., GC content, does not reflect this, probably due to amelioration.
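The parametric comparison referred to above rests on a simple calculation, sketched here in Python. This is a minimal illustration and not the procedure used to compile Table 3: the function names are hypothetical, ambiguous bases are simply ignored, and no cutoff for a "similar" GC content is implied because none is stated in the text.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T) in a DNA sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0

def gc_difference(gene_seq, genome_seq):
    """GC content of a gene minus that of its resident genome, in percentage points."""
    return 100.0 * (gc_content(gene_seq) - gc_content(genome_seq))
```

A difference near zero is compatible with a native origin, with a donor of similar base composition, or with an ancient transfer that has since ameliorated, so the value by itself cannot exclude LGT.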


 
TABLE 3. Comparison of GC content in gene fusions and cognate genomes

 
Nested Gene Fusions

Jensen and Ahmad (1, 41) proposed that a series of nested gene fusions could be exploited as markers of phylogenetic branch points in prokaryotes. Thus, any organism that belongs to the enteric lineage (shaded green in Fig. 2D) shown in Fig. 4 would be expected to possess the trpD•trpC fusion, provided that the pathway has not been lost. At a more narrow hierarchical level, any organism belonging to the E. coli/S. enterica serovar Typhimurium/Klebsiella pneumoniae clade would be expected to possess the trpAb•trpB fusion as well. Thus, the clade defined by the trpAb•trpB fusion is nested within the more ancient clade defined by the trpD•trpC fusion. The presence of an AroQ•AroA (chorismate mutase•DAHP synthase) fusion in Listeria, Bacillus species, and Staphylococcus but not in Enterococcus, Streptococcus, or Lactococcus is consistent with the suggestion made earlier that, contrary to the 16S rRNA tree, the order of branching is slightly different, so that the latter group diverged at a deeper tree position.

The ultimate analysis of the total inventory of fused genes in any given genome should provide an excellent phylogenetic tool for deducing the order of branching. This approach should be greatly enhanced by the rapid increase in the number of sequenced genomes, coupled with the enormous advantage of being able to identify gene fusions with bioinformatic methods. However, it was not expected at the time of the original proposal that fusions could occur independently at such frequencies or that LGT should be taken seriously. Therefore, application of the nested-gene-fusion approach will require sufficient background work to recognize and discriminate fusion clusters that have independent origins on the vertical tree as well as ones that might have been spread in the horizontal direction by LGT.
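The nesting logic can be sketched in a few lines of Python. The sketch below assumes a fusion inventory per organism; the three named organisms and their two fusions follow the text, whereas the entries prefixed `hypothetical_` are placeholders added only so that the example has an outer clade and an outgroup. Note that a subset relation by itself cannot distinguish vertical inheritance from independent fusion events or LGT, which is the caveat raised above.

```python
from itertools import combinations


def clades_by_fusion(fusions_by_organism):
    """Map each fusion to the set of organisms that carry it."""
    clades = {}
    for organism, fusions in fusions_by_organism.items():
        for fusion in fusions:
            clades.setdefault(fusion, set()).add(organism)
    return clades


def nested_pairs(clades):
    """Return (inner, outer) fusion pairs in which the inner clade is a
    proper subset of the outer clade."""
    pairs = []
    for a, b in combinations(sorted(clades), 2):
        if clades[a] < clades[b]:
            pairs.append((a, b))
        elif clades[b] < clades[a]:
            pairs.append((b, a))
    return pairs


# Fusion inventory: named organisms follow the text; "hypothetical_*"
# entries are illustrative placeholders, not real data.
inventory = {
    "Escherichia coli": {"trpD•trpC", "trpAb•trpB"},
    "Salmonella enterica serovar Typhimurium": {"trpD•trpC", "trpAb•trpB"},
    "Klebsiella pneumoniae": {"trpD•trpC", "trpAb•trpB"},
    "hypothetical_enteric_lineage_member": {"trpD•trpC"},
    "hypothetical_outgroup": set(),
}

nesting = nested_pairs(clades_by_fusion(inventory))
```

With this toy inventory, `nesting` contains the single pair `("trpAb•trpB", "trpD•trpC")`, that is, the trpAb•trpB clade nested within the trpD•trpC clade.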


   Trp PATHWAY GENE ORGANIZATION IN THE ARCHAEA
In general, the Archaea deploy the Trp pathway genes as whole-pathway operons or as partial-pathway operons (Fig. 5). A very limited amount of experimental work provides qualitative evidence for regulation at the transcriptional level (26, 77). Rearrangements of gene order following events of inversion, translocation, and gene loss have been sufficiently dynamic that it is currently not possible to deduce the gene order of the common ancestor without more closely spaced genome representation. The only certainty would appear to be the existence in the archaeal ancestor of the partial gene orders ->trpAa->trpAb and ->trpEb_1->trpEa.
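One simple way to look for conserved partial gene orders of this kind is to intersect the sets of adjacent gene pairs across genomes. The following Python sketch makes several assumptions: each operon is given as an ordered list of gene names read in the direction of transcription, strand and intergenic distance are ignored, and the three gene orders are invented toy data that do not reproduce Fig. 5.

```python
def adjacent_pairs(gene_order):
    """Return the set of ordered (upstream, downstream) neighbor pairs."""
    return set(zip(gene_order, gene_order[1:]))


def shared_adjacencies(gene_orders):
    """Return the ordered adjacent pairs that occur in every genome."""
    orders = list(gene_orders.values())
    if not orders:
        return set()
    shared = adjacent_pairs(orders[0])
    for order in orders[1:]:
        shared &= adjacent_pairs(order)
    return shared


# Toy gene orders (hypothetical; NOT taken from Fig. 5).
toy_orders = {
    "genome_1": ["trpC", "trpD", "trpEb_1", "trpEa", "trpAa", "trpAb", "trpB"],
    "genome_2": ["trpD", "trpAa", "trpAb", "trpB", "trpC", "trpEb_1", "trpEa"],
    "genome_3": ["trpEb_1", "trpEa", "trpB", "trpC", "trpAa", "trpAb", "trpD"],
}

conserved = shared_adjacencies(toy_orders)
```

In this toy example, only the pairs trpAa->trpAb and trpEb_1->trpEa survive the intersection. With real genomes, lineages that have replaced or lost a gene would need to be handled separately before such an intersection is meaningful.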



FIG. 5. Organization of trp operon genes in the Archaea. Each trp gene is color coded differently, including the two subtypes of trpEb (Eb_1 and Eb_2) (92). trp genes that exist in the genome unlinked to any other trp genes are not shown. Archaeoglobus fulgidus has a trpD•trpB gene fusion (see Fig. 4). Intergenic spacing is shown, with negative values indicating gene overlap. Genes that are not specific trp pathway genes are in white boxes. F. acidarmanus possesses a gene encoding the aroAIβ subclass (44) of DAHP synthase. aspC in S. solfataricus is an aromatic aminotransferase of the Iγ aspartate aminotransferase type (42). This gene insertion corresponds to genes that appear to have escaped from the aro operons shown in Fig. 10. The gene order shown for Methanosarcina barkeri is the same as those in Methanosarcina acetivorans and Methanosarcina mazei. The gene order shown for S. solfataricus is the same as that for Sulfolobus tokodaii.

 
In the compact Pyrococcus genus, P. horikoshii has lost the entire pathway. Although the trp operons of P. abyssi and P. furiosus are virtually identical, great variation can be seen for the remainder of aromatic biosynthesis (see Fig. 2A). In the Crenarchaeota grouping (Pyrobaculum, Aeropyrum, and Sulfolobus; lowest clade of Fig. 5), dramatic scrambling of gene order is apparent. This group has replaced trpEb_1 with trpEb_2. trpEb_2 is a distinct subgroup of trpEb that is present mainly in Archaea and that may often (but not always, as indeed exemplified by the Crenarchaeota) have a separate stand-alone function (92).

Usually, the pair of genes encoding the two subunits of tryptophan synthase are adjacent in prokaryotes. In the case of P. aerophilum, trpEa and trpEb_2 have been separated from one another within the operon. This may reflect the inability of trpEb_2 to form a complex with trpEa. In P. aerophil