Microbiology and Molecular Biology Reviews, March 2008, p. 13-53, Vol. 72, No. 1
1092-2172/08/$08.00+0 doi:10.1128/MMBR.00026-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
The Computation Institute, University of Chicago, Chicago, Illinois 60637,1 Mathematics and Computer Science, Argonne National Laboratory, Argonne, Illinois 60439,2 Bioscience Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545,3 Fellowship for Interpretation of Genomes, 15W155 81st Street, Burr Ridge, Illinois 60527,4 Emerson Hall, University of Florida, P.O. Box 14425, Gainesville, Florida 326045
SUMMARY INTRODUCTION TyrA AND L-TYROSINE BIOSYNTHESIS Enzyme Order Alternatives Dictate Substrate Specificity Patterns Strict specificity for prephenate. Broad specificity. Strict specificity for L-arogenate. Patterns of substrate specificity and regulatory interplay in Tyr/Phe branches. Coexisting Pathway to L-Tyrosine in Some Anaerobic Organisms How Common Is Variation of Substrate Specificity? Same-pathway ambiguity. Multipathway ambiguity. The TyrA Supradomain Cohesion Groups Rigorous unit of analysis. Expansion via concatenation: supercohesion groups. TyrA HOMOLOGY ISLANDS: AN ASSEMBLAGE OF COHESION GROUPS Multimember and Orphan Cohesion Groups Xenolog Intruders Intra-Cohesion-Group Intruders Correspondence of Cohesion Groups with Formal Taxon Ranks TWO TyrA SUBHOMOLOGY GROUPS The Master Cohesion Group Alignment Motif Variations Conserved at the Level of Cohesion Group Four Regional Sequence Sections That Differentiate TyrAalpha from TyrAβ COFACTOR DISCRIMINATOR REGION Specificity Motifs Cofactor Specificity Divergence in TyrCG-17 SNAPSHOTS OF TyrA CHARACTER STATES IN A PHYLOGENETIC CONTEXT A Tool To Track Character State Variations Phylogenetic Boundaries Xenolog Intruders Substrate Specificities Gene Fusions Gene Context of tyrA Data That Are Relevant to the Indel Hypothesis ORGANISMS THAT CARRY MULTIPLE HOMOLOGS PapC, a Functionally Specialized Paralog Intra-Cohesion-Group TyrA Paralogs Extra-Cohesion-Group TyrA Paralogs Ortholog/Xenolog Combinations SIGNIFICANCE OF THE TyrAalpha/TyrAβ SCHISM Lateral Gene Transfer between Superkingdoms? Does Membership within TyrAβ Reflect Protein-Protein Interactions? Utility of Cohesion Group Snapshots Are Essential Extradomain Contacts Needed for TyrA Members of TyrAβ? 
Interesting Specificity Issues Expanding the Evolutionary Context across Subsystems CANDIDATE TyrA PROTEINS FOR X-RAY CRYSTAL STUDIES Challenge of Broad-Specificity Reactions Informative Selections from TyrAalpha Subhomology Group Members Informative Selections from TyrAβ Subhomology Group Members Inhibition Properties: Insight into Binding of the 1-Carboxy Moiety? Selections Based upon Other TyrA Features The Snapshot Tool for Facilitating Selection Choices for Comparative Analysis Example 1. Example 2. Experimental Truncation of Fused Domains COMPARISON OF TYROSINE AND TRYPTOPHAN PATHWAY COHESION GROUPS Background Lower Gammaproteobacteria Upper Gammaproteobacteria and Betaproteobacteria Alphaproteobacteria Epsilonproteobacteria Deltaproteobacteria Firmicutes Cyanobacteria Actinomycetes Emerging Perspective TRACKING MILESTONE EVOLUTIONARY EVENTS ACROSS SUBSYSTEMS Gene Fusion Aromatic Biosynthesis in the Subclass Actinobacteridae Aromatic Biosynthesis in the Superphylum Bacteroidetes/Chlorobi OVERVIEW PERSPECTIVE APPENDIX Determination of Cohesion Groups Web Resources at the SEED TyrA subsystem home page. Navigating to and within the Protein Pages. Sortable character state snapshots. Semiautomation of cohesion groups. Web Resources at AroPath ACKNOWLEDGMENTS REFERENCES
TABLE 1. Definitions of terms used
The current TyrA assemblage consists of two subhomology groupings designated TyrAα (40 cohesion groups) and TyrAβ (18 cohesion groups). Evidence in support of the thesis that the TyrAβ subhomology grouping consists of TyrA enzymes that interact with either fused domains or complexed domains of other enzymes is presented. Multiple examples of the logic used to make evolutionary conclusions are given, and examples of tentative evolutionary scenarios that are experimentally testable are also given. Motif variations conserved within a cohesion group are discussed as reflections of probable mechanistic variations of an otherwise widely conserved mechanism. How a rationale can be developed to select key organisms that have ideal phylogenetic placements to advance an overall analysis by filling information gaps with experimental data is demonstrated. Systematic procedures to manage and organize otherwise overwhelming amounts of data are described. Web resources are introduced, which are interactive and freely available. A set of character state snapshots is displayed on a sortable set of cohesion group trees using tools developed at the SEED (http://theseed.uchicago.edu/FIG/Html/tyrASubsystem.html). This includes a viewer link that displays the context of gene organization around tyrA genes within a cohesion group. The approaches herein applied should be easily applicable to other metabolic subsystems.
FIG. 1. Alternative flow routes between prephenate and L-tyrosine. The L-arogenate (AGN) flow route to L-tyrosine (TYR) is initiated when prephenate (PPA) is transaminated to produce L-arogenate. A specific and irreversible arogenate dehydrogenase (TyrAa) then converts L-arogenate to L-tyrosine. The 4-hydroxyphenylpyruvate (HPP) flow route to L-tyrosine is initiated when prephenate is utilized by a specific and irreversible prephenate dehydrogenase (TyrAp). An aromatic aminotransferase then transaminates 4-hydroxyphenylpyruvate to produce L-tyrosine. Broad-specificity dehydrogenases that are capable of using both prephenate and L-arogenate as reaction substrates are known as cyclohexadienyl dehydrogenases (TyrAc). AA, amino acid; KA, keto acid.
The TyrA family of dehydrogenases also exhibits varied specificities for the pyridine nucleotide substrate that can be accepted. Thus, some are specific for NAD+, some are specific for NADP+, and some will utilize either cofactor (again varying through a wide continuum of preference for the cofactor).
In the following assessment of substrate specificities, it should be noted that various technical pitfalls for working with crude extracts and partially purified enzyme preparations have been recognized over the years. Adequate controls are needed to ensure that prephenate is not contaminated with L-arogenate or prephenyllactate (83), that a phosphatase is not converting NADP+ to NAD+ to give a false-positive result for NADP+ reactivity, that an oxidase is not recycling a reduced cofactor product back to the oxidized form to give unduly low (or null) apparent activities, and that apparent prephenate dehydrogenase activity is not in fact due to the production of L-arogenate via prephenate aminotransferase. Functional complementation of a mutant deficient in a known prephenate-specific dehydrogenase is not proof that the heterologous donor gene specifies a prephenate-specific enzyme because prephenate, accumulated at abnormally high concentrations behind the block, can be anomalously transaminated in vivo to L-arogenate. Indeed, a tyrA mutant of Salmonella enterica serovar Typhimurium, widely used as a source of prephenate, is also the main source of L-arogenate for biochemical preparations (8). Some of these phenomena have been responsible for errors in older literature. Saccharomyces cerevisiae is an example of an organism that has sometimes been assumed to possess a prephenate-specific TyrA dehydrogenase, but we are not aware of rigorous enzymological data in support of this.
Strict specificity for prephenate. Prephenate-specific dehydrogenases (TyrAp) are thus far limited to two experimental documentations. One is within a large clade of gram-positive bacteria related to Bacillus subtilis, where the most detailed enzymological characterization remains that described previously by Champney and Jensen (17). Here, the specificity for prephenate is coupled with specificity for NAD+. The other set of experimental data is from Gluconobacter oxydans, Brevundimonas vesicularis, Brevundimonas diminuta, and species of Acetobacter (13; data not shown). This group couples specificity for prephenate with specificity for NADP+. (All of the latter organisms are also distinctive in the possession of two other character states: an arogenate-specific dehydratase for phenylalanine synthesis and a single 3-deoxy-D-arabino-heptulosonate synthase of a distinctive homology type [AroAII] [38], which is sensitive to tryptophan-mediated feedback inhibition.) Unfortunately, genomes of species of Brevundimonas (previously named Pseudomonas) have yet to be sequenced. Caulobacter crescentus is inferred to have a prephenate/NADP+-specific dehydrogenase by virtue of its close relationship with Brevundimonas species within the family Caulobacteraceae as well as the motif similarity in the G-rich cofactor discriminator region (see Fig. 4). Gluconobacter and Acetobacter belong in common to the family Acetobacteraceae. By extrapolation, it is possible that the prephenate/NADP+ specificity combination (and perhaps the other two character states) might persist throughout two orders (Caulobacterales and Rhodospirillales) of the Alphaproteobacteria. However, there is a report (51) of specificity for the arogenate/NAD+ substrate combination in Phenylobacterium immobile, which belongs to the family Caulobacteraceae. The sequence of P. immobile is not yet available, and it will be interesting to see whether this unexpected result might be explained by acquisition via LGT.
FIG. 4. Selected examples of motifs in the discriminator region for cofactor binding. N-terminal TyrA sequence patterns that distinguish specificity for NAD+ (top), specificity for NADP+ (middle), and the ability to accept either cofactor [NAD(P)+] (bottom) are shown. Sequences shown begin with the last G (residue 11) of the GxGxxG motif in the Wierenga fingerprint (73). The variable gap of the Wierenga fingerprint is shown as a gray column. Examples of the smallest gap (one residue) and the largest gap (nine residues) are given. Two different patterns are shown for the NADP+ category, and two patterns are shown for the broad-specificity category. Motifs that center around the all-important residue 36 are shown for each of the five groups.
Broad specificity. An early wide-ranging enzymological survey revealed the ubiquity of dual-specificity cyclohexadienyl dehydrogenases (TyrAc) (13). The implication is that an uncertain mixture of both orders of reaction may be ongoing simultaneously in a single organism. Beyond the many subsequent characterizations of partially purified enzymes cited in the following references, detailed studies of purified cyclohexadienyl dehydrogenases include those cloned from Zymomonas mobilis (86), Erwinia herbicola (75), and Pseudomonas stutzeri (77).
Strict specificity for L-arogenate. L-Arogenate-specific dehydrogenases (TyrAa), also fairly widespread in nature, have been purified and characterized from a cyanobacterium (Synechocystis sp.) (10) and from a higher plant (Arabidopsis thaliana) (64). All photosynthetic bacteria and photosynthetic eukaryotes studied thus far possess L-arogenate-specific, NADP+-specific dehydrogenases. This specificity combination is present in the enzymes from red algae and green algae (9) as well as from Euglena gracilis (14). Coryneform bacteria, other actinomycetes, and Nitrosomonas europaea exemplify bacteria whose possession of L-arogenate-specific dehydrogenases is well documented (see reference 67 and references therein). Although the Nitrosomonas enzyme provides yet another example where specificity for the L-arogenate/NADP+ couple exists, the L-arogenate-specific enzymes from coryneform bacteria will utilize either cofactor, whereas L-arogenate-specific enzymes from most actinomycetes (39, 40) other than coryneform bacteria exhibit NAD+ specificity.
One plausible and interesting selective basis for the enzymatic utilization of L-arogenate and the avoidance of 4-hydroxyphenylpyruvate as an intermediate of L-tyrosine biosynthesis is to prevent cross-pathway complications in cases where 4-hydroxyphenylpyruvate has additional functional roles in metabolism that could lead to futile cycling. For example, the catabolism of L-tyrosine often deploys an initial transamination step that generates 4-hydroxyphenylpyruvate, which could wastefully enter the biosynthetic pathway. An additional example is when 4-hydroxyphenylpyruvate formed from L-tyrosine is utilized as a biosynthetic precursor of plastoquinone and vitamin E, as is uniquely typical of photosynthetic organisms. It is likely no accident that photosynthetic organisms typically utilize L-arogenate as an obligatory intermediate of L-tyrosine biosynthesis, thus avoiding the possibility that 4-hydroxyphenylpyruvate molecules that should be plastoquinone precursors would erroneously enter the L-tyrosine biosynthetic pathway (futile cycling). It is an intriguing example of metabolic plasticity that the latter coupling of biochemical pathways (L-arogenate for L-tyrosine biosynthesis and 4-hydroxyphenylpyruvate for plastoquinone/vitamin E biosynthesis) results in a novel situation where L-arogenate is a precursor of 4-hydroxyphenylpyruvate, with L-tyrosine serving as the intermediate. Thus, in this case, 4-hydroxyphenylpyruvate, rather than being an intermediate of tyrosine biosynthesis, is a following, posttyrosine intermediate of plastoquinone biosynthesis.
Patterns of substrate specificity and regulatory interplay in Tyr/Phe branches. Organisms such as Bacillus subtilis that deploy a specific prephenate dehydratase and a specific prephenate dehydrogenase at the prephenate branchpoint (the classic pathway configuration) have a regulatory domain known as the ACT domain (49) attached to each of the competitively positioned enzymes to accomplish direct feedback inhibitions that are easily visualized. However, a less straightforward (albeit rather common) pattern for the biosynthesis of L-phenylalanine and L-tyrosine in nature is the utilization of L-arogenate for L-tyrosine synthesis but not for L-phenylalanine synthesis. This occurs in cyanobacteria (69), coryneform bacteria (24-26), and other actinomycetes such as Amycolatopsis methanolica (1). In fact, in the absence of early information that L-arogenate could be a precursor of phenylalanine, L-arogenate was initially named "pretyrosine" (69). With this pathway configuration (consult the figure at http://www.aropath.lanl.gov/Visualizations/TyrPath/TyrPath.htm), the tyrosine branch is unsuited for direct allosteric control. This is because at the branchpoint in this pathway configuration, the prephenate aminotransferase reaction is catalyzed by an aromatic aminotransferase, none of which have ever been found to be subject to allosteric control. It seems likely that catalytic interference caused by the structural overlap of the L-tyrosine end product with the substrates that can be accommodated by aromatic aminotransferases would account for this. On the other hand, the phenylalanine branch is well equipped for allosteric control (since prephenate dehydratase [PheA], which competes with prephenate aminotransferase at the prephenate branchpoint, catalyzes an irreversible initial step of substrate commitment). The ACT domains of cyanobacterial and coryneform PheA proteins mediate a novel mechanism of control to balance flux to both end products. 
PheA is subject to opposing influences of allosteric activation by L-tyrosine and allosteric feedback inhibition by L-phenylalanine. Starvation for L-phenylalanine enhances the flow of prephenate to L-phenylalanine due to an unrestrained PheA enzyme that is not only transiently free from feedback inhibition by L-phenylalanine but also activated by endogenous L-tyrosine. On the other hand, starvation for L-tyrosine results in the potent inhibition of PheA by endogenous L-phenylalanine, which relieves prephenate aminotransferase from competition with PheA at the branchpoint, thus enhancing flux toward tyrosine. In this manner, L-tyrosine synthesis is indirectly regulated by an enzyme of L-phenylalanine synthesis. It is intriguing that Pseudomonas aeruginosa exhibits a similar pattern whereby flux to L-phenylalanine is regulated directly and flux to L-tyrosine is regulated indirectly. Here, rather than deploying an arogenate dehydrogenase, a cyclohexadienyl dehydrogenase is used. Since the sole chorismate mutase for aromatic biosynthesis is fused to prephenate dehydratase, prephenate is channeled toward L-phenylalanine preferentially. Potent feedback inhibition of prephenate dehydratase by L-phenylalanine allows the release of prephenate from the complex and its utilization for L-tyrosine biosynthesis. This has been described as a channel-shuttle mechanism of regulation (15).
With the background that TyrA proteins that are specific for prephenate are suitable for highly sensitive allosteric control and therefore likely to possess an allosteric domain such as the ACT domain, one might expect that all TyrA proteins that are fused with an ACT domain would be prephenate specific or at least exhibit an overwhelming preference for prephenate. However, TyrA from Streptomyces has an ACT domain but has been reported to be L-arogenate specific (39, 40). This is surprising because the implied inhibition of arogenate dehydrogenase by L-tyrosine could occur, albeit with less refinement, via direct product inhibition without an ACT domain. Moreover, the selective value of this inhibition, however implemented, is questionable because it would cause the accumulation of L-arogenate, which cannot enter the L-phenylalanine pathway directly, requiring back-transamination to prephenate first. One possible mechanism to explain the role of an ACT domain in keeping phenylalanine and tyrosine synthesis balanced would be for L-phenylalanine to activate arogenate dehydrogenase (via the ACT domain) in addition to inhibiting prephenate dehydratase. Another possibility is that Streptomyces might deploy an arogenate dehydratase instead of the much more ubiquitous prephenate dehydratase, thus placing L-arogenate at the metabolic branchpoint (an alternative pathway pattern). If so, backed-up L-arogenate caused by the inhibition of arogenate dehydratase and arogenate dehydrogenase by L-phenylalanine and L-tyrosine, respectively, may in turn feedback inhibit the initial common-pathway step of aromatic biosynthesis (in a pattern of sequential feedback inhibition similar to that discovered in higher plants) (21). This illustrates how an organized basis for desirable experimental inquiries can be driven by detailed analyses that are grounded in phylogenetic context, a point made recently by Osterman (58).
Same-pathway ambiguity. The TyrA family exemplifies same-pathway ambiguity. In most cases, the chemistry needed to build a given molecule dictates a particular order of steps that must be followed. In the case of L-tyrosine biosynthesis, modification of the side chain (via aromatic aminotransferase) and decarboxylation/aromatization (via dehydrogenase) are not interdependent. Thus, the overall conversion of prephenate to L-tyrosine can be accomplished with either order of steps. This is potentially true for any pathway where enzymatic chemistries performed are independent of one another. It would not be surprising if many such ambiguities exist but have not yet been recognized. For example, within the early common aromatic pathway, dehydroquinate proceeds to shikimate in two steps: dehydration (dehydroquinate dehydratase) and reduction (shikimate dehydrogenase). There is no reason a priori that these two steps could not occur in the opposite order, in which case quinate (rather than dehydroshikimate) would be the unique intermediate. Quinate dehydrogenase is widely known as a catabolic enzyme but potentially could perform as a biosynthetic enzyme in some systems.
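The two step orders for the prephenate-to-tyrosine conversion can be made concrete with a small sketch. The toy model below is only an illustration and not part of the original analysis: it treats a compound as a (ring, side chain) pair, treats the aminotransferase and dehydrogenase steps as independent modifications, and enumerates both orders. The labels "cyclohexadienyl", "aromatic", "pyruvyl", and "alanyl" and the function name `routes` are assumptions chosen for the example.

```python
from itertools import permutations

# Toy model (illustrative labels): a compound is a (ring, side_chain) pair.
NAMES = {
    ("cyclohexadienyl", "pyruvyl"): "prephenate",
    ("cyclohexadienyl", "alanyl"): "L-arogenate",
    ("aromatic", "pyruvyl"): "4-hydroxyphenylpyruvate",
    ("aromatic", "alanyl"): "L-tyrosine",
}

# Each step changes one feature and leaves the other untouched,
# which is why the two steps can occur in either order.
STEPS = {
    "aminotransferase": lambda ring, side: (ring, "alanyl"),
    "dehydrogenase": lambda ring, side: ("aromatic", side),
}


def routes(start=("cyclohexadienyl", "pyruvyl")):
    """Return {step order: list of compound names along that route}."""
    result = {}
    for order in permutations(STEPS):
        state = start
        path = [NAMES[state]]
        for step in order:
            state = STEPS[step](*state)
            path.append(NAMES[state])
        result[order] = path
    return result


if __name__ == "__main__":
    for order, path in routes().items():
        print(" then ".join(order), ":", " -> ".join(path))
```

Both orders end at L-tyrosine and differ only in the intermediate, which corresponds to the two flow routes of Fig. 1.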
Multipathway ambiguity. A fuller modern appreciation of the extent of substrate ambiguity has been greatly accelerated by the contemporary surge in research designed to find and exploit substrate ambiguity for biotechnological objectives. It has become increasingly apparent with modern techniques of metabolite detection that the number of metabolites present in an organism far exceeds the number of genes that would be required if the gene product/enzymes were specific (66). Macchiarulo et al. (50) applied a sophisticated docking algorithm in a computational study that revealed a very high potential for cross-reactivity of endogenous metabolites and enzymes in metabolic reactions. There are two levels of enzymatic promiscuity. In addition to substrate ambiguity (34), it has become clear that surprisingly many enzymes can catalyze seemingly disparate reactions (catalytic promiscuity) that are normally classified as different types of reactions (55). Kurakin (46) made the case that both substrate ambiguity and catalytic promiscuity are in fact expected features in a new paradigm of dynamic and adaptive protein structure. In this paradigm, major and established biochemical pathways operate against a background where many diverse "micrometabolites" are fortuitously generated, a background thought to supply latent evolutionary potential.
Even a minimal sampling of the very recent literature reveals a rapid proliferation of new examples. These include (i) a detailed assessment of the basis for the catalytic promiscuity of E. coli alkaline phosphatase, which can also act as a sulfatase (16); (ii) a new family of lactonases that hydrolyze a variety of lactones, possess low phosphotriesterase activities, and have been shown to be the source of a newly evolved and highly efficient phosphotriesterase (2); (iii) a gentisate dioxygenase that also functions with 1,4-dihydroxy-2-naphthoate and salicylate (31); (iv) an ATP-dependent hexokinase from Sulfolobus tokodaii that can phosphorylate glucose, mannose, glucosamine, and N-acetylglucosamine (54); (v) a higher-plant isopropylmalate synthase that not only condenses acetyl coenzyme A (acetyl-CoA) with 2-ketoisovalerate but will also accept 2-oxo acid substrates of two-carbon to six-carbon lengths (19); (vi) a number of variations in the substrate specificities of glutathione synthesis enzymes in comparison to E. coli, Streptococcus agalactiae, and Clostridium acetobutylicum (42); (vii) an amino acid racemase from Pseudomonas putida with an unusual breadth of specificity for amino acids (43); (viii) ATP-forming acetyl-CoA synthetases that accept acetate, propionate, and some longer straight- and branched-chain acyl substrates (32); (ix) an isochorismate pyruvate lyase from Pseudomonas aeruginosa that also has weak chorismate mutase activity (45); and (x) Sulfolobus species that condense pyruvate and aldehydes with two to four carbon atoms (phosphorylated or not) (74). D-2-Hydroxyacid dehydrogenase from Haloferax mediterranei exhibits interesting parallels to the broad-specificity TyrA variants. This D-stereospecific enzyme has broad specificity for alpha-keto carboxylic acids and dual coenzyme specificity (NADH and NADPH) (20). This is striking because most members of this family are NADH dependent.
A thorough and scholarly recent review on the subject of enzyme promiscuity was written by Khersonsky et al. (41).
It should be noted that the above-described consideration of same-pathway and multipathway ambiguities is not comprehensive with respect to the large topic area of variations that occur in reaction/substrate/cofactor specificity, e.g., phosphorylation at alternative positions of some carbohydrates by the same enzyme and cleavage at alternative positions in the same peptide by a protease.
A supradomain of about 180 amino acids that is central to TyrA proteins has been identified (10, 77). All TyrA sequences used in this analysis have been trimmed to the boundaries of the supradomain and are available for download (http://theseed.uchicago.edu/FIG/tyra_sequence.cgi). Well-characterized TyrA proteins from Neisseria gonorrhoeae (70), Zymomonas mobilis (86), and Synechocystis sp. (10) as well as the engineered TyrA domain from Pseudomonas stutzeri (77) represent phylogenetically well-spaced proteins (cohesion groups 2, 9, 12, and 16) that exemplify the minimal domain length. It has been suggested (77) that the foregoing four sequences, although of different specificities, define a basic catalytic domain. In this model, it was proposed that the specificity for the side chains of the substrates utilized would parallel the specificity for side chains of inhibitors that are postulated to bind directly to the active site. The only difference between the prephenate and L-arogenate substrate molecules is the side chain, which remains unaltered in the coupled overall reactions of oxidative decarboxylation and aromatization (Fig. 1). Thus, for example, N. gonorrhoeae TyrA has an overwhelming preference for prephenate (pyruvyl side chain) and exhibits classical competitive inhibition by the product 4-hydroxyphenylpyruvate (pyruvyl side chain) but is insensitive to inhibition by L-tyrosine (alanyl side chain).
Xie et al. (80) assembled trees for the seven individual tryptophan pathway enzymes from then-available prokaryotes in a comprehensive analysis in which divergent paralogs and xenologs engaged in specialized metabolic activities were sorted out from the genes dedicated to primary biosynthesis. Examination of the distribution of gene fusions and gene organization patterns in a context where these distributions were mapped to the 16S rRNA tree elucidated a variety of lineage-specific evolutionary trends. Landmark evolutionary events of operon splitting and rejoining could be reconstructed by following individual divergences in narrow phylogenetic slices and placing these together in a broader phylogenetic context. With avoidance of errors due to ancient paralogy and LGT, one can deduce the most likely character state(s) that represents a given phylogenetic node. The hierarchical placement of each node is determined by the membership of a cohesion group. The more dynamic the evolutionary pace and therefore the greater the divergence, the more narrow (albeit more informationally enriched) the phylogenetic piece captured and therefore the more shallow the position of the node will be. If nodes at the bottom of the phylogenetic tree are sufficiently well represented to deduce any given character state(s) at those nodes, one can hope to apply parsimony principles to deduce the most likely common ancestor at progressively more ancient nodes, thus moving backwards in evolutionary time. It was shown (80) how contexts of flanking genes at relatively shallow hierarchical levels can illuminate which of two evolutionary states is ancestral and which is derived.
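The idea of working backwards from well-sampled shallow nodes to deeper ones by parsimony can be sketched with the bottom-up pass of Fitch parsimony. This is offered only as a minimal illustration of the principle; the source does not state which parsimony procedure, if any, was implemented in software. The tree shape, the taxon labels, and the character states "fused" and "unfused" are hypothetical.

```python
def fitch_sets(tree, leaf_states):
    """Bottom-up Fitch pass on a binary tree.

    tree: a leaf name (str) or a 2-tuple (left_subtree, right_subtree).
    leaf_states: dict mapping leaf name -> observed character state.
    Returns (candidate ancestral states at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_sets(left, leaf_states)
    right_set, right_changes = fitch_sets(right, leaf_states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change is needed.
        return shared, left_changes + right_changes
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_changes + right_changes + 1


if __name__ == "__main__":
    # Hypothetical four-taxon tree; taxonD is the earliest-diverging lineage.
    tree = ((("taxonA", "taxonB"), "taxonC"), "taxonD")
    states = {"taxonA": "fused", "taxonB": "fused",
              "taxonC": "fused", "taxonD": "unfused"}
    print(fitch_sets(tree[0], states))  # node joining taxonA, taxonB, taxonC
    print(fitch_sets(tree, states))     # root
```

In this toy case the node joining the three "fused" taxa is resolved unambiguously, whereas the root remains ambiguous, which is where additional evidence such as flanking gene context would be needed to decide which state is ancestral.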
Expansion via concatenation: supercohesion groups. The above-cited work was the basis for a follow-up effort in 2004 (78), which showed that a concatenation of the seven tryptophan pathway proteins yielded protein trees made up of individual sections that, while exhibiting an uncertain connectivity with one another, were each congruent with a portion of the 16S rRNA tree. Ten orphan concatenates were also obtained from genomes with no close relatives among the finished genomes. The seven single-protein tryptophan pathway trees were compared to the concatenate tree. They faintly resembled the concatenate tree but with much weaker support (depending upon highly individualistic degrees of conservation and protein length).
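As a rough sketch of what concatenation involves, the snippet below joins per-protein aligned sequences genome by genome in a fixed protein order. It assumes, purely for illustration, that only genomes carrying every protein are retained; the source does not say how incomplete pathways were handled. The function name `concatenate`, the genome identifiers, and the toy sequences are hypothetical.

```python
def concatenate(alignments, order):
    """Concatenate aligned sequences per genome.

    alignments: {protein name: {genome id: aligned sequence}}
    order: list of protein names giving the concatenation order.
    Assumption: genomes lacking any protein in `order` are dropped.
    """
    shared = set.intersection(*(set(alignments[p]) for p in order))
    return {g: "".join(alignments[p][g] for p in order) for g in sorted(shared)}


if __name__ == "__main__":
    # Toy data with two of the pathway proteins only.
    toy = {
        "TrpA": {"genome1": "MK-", "genome2": "MKL"},
        "TrpB": {"genome1": "AV", "genome3": "AI"},
    }
    print(concatenate(toy, ["TrpA", "TrpB"]))
```

The resulting concatenates would then be passed to the same tree-building procedure used for single proteins.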
Since the cohesion group approach is fundamental to the thrust of this review, some clarification of terminology is in order. Proteins whose sequences cluster together with high bootstrap values on a phylogenetic tree comprise a cohesion group. Most or all of these proteins are from organisms that also cluster together on a 16S rRNA tree, and this fraction of the cohesion group defines an evolutionary progression of the encoding gene in a vertical genealogy. Genes encoding one or more members of a cohesion group may have been transferred to phylogenetically distant organisms via LGT, and the protein thus will not fit 16S rRNA expectations. Such cohesion group members are called intruder sequences, and a genome possessing one is mosaic with respect to the encoding gene. Cohesion groups that are assembled by the concatenation of two or more proteins of a metabolic pathway are called supercohesion groups. A protein or concatenated protein that is too divergent to share membership in cohesion groups or supercohesion groups is called an orphan sequence and is the sole occupant of an orphan cohesion group or supercohesion group.
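These definitions can be illustrated with a deliberately simplified sketch that labels the members of a single cohesion group. It assumes, as a crude stand-in for inspection of a 16S rRNA tree, that the taxon shared by most members represents the vertical genealogy and that any member from another taxon is an intruder; the actual determination described in the Appendix is more involved. The function name, sequence identifiers, and taxon labels are hypothetical.

```python
from collections import Counter


def classify_members(group, taxon_of):
    """Label each member of one cohesion group.

    group: list of sequence ids that cluster together on the protein tree.
    taxon_of: dict mapping sequence id -> taxon of the source organism.
    Assumption: the most common taxon stands in for 16S rRNA congruence;
    ties are resolved in favor of the taxon encountered first.
    """
    if len(group) == 1:
        return {group[0]: "orphan"}
    majority, _ = Counter(taxon_of[s] for s in group).most_common(1)[0]
    return {s: "vertical" if taxon_of[s] == majority else "intruder"
            for s in group}


if __name__ == "__main__":
    taxa = {
        "seq_gamma1": "Gammaproteobacteria",
        "seq_gamma2": "Gammaproteobacteria",
        "seq_cyano1": "Cyanobacteria",
    }
    print(classify_members(["seq_gamma1", "seq_gamma2", "seq_cyano1"], taxa))
    print(classify_members(["seq_cyano1"], taxa))
```

A majority rule of this kind would fail for small groups dominated by transferred genes, which is one reason the vertical trace has to be judged against the 16S rRNA tree rather than by counting alone.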
Tryptophan pathway congruency groups within the Bacteria were so named because most or all members of a given group were congruent with 16S rRNA expectations. However, some congruency groups contain "intruder" sequences that, due to LGT, are not congruent with 16S rRNA expectations. To avoid semantic confusion, we herein rename these groups "cohesion groups," since each group is a uniformly cohesive collection of sequences that all originated from a relatively recent ancestor. A given protein member of a cohesion group either is congruent with 16S rRNA expectations and therefore embedded within a vertical genealogy or is an intruder sequence that was translocated to an alien host organism via LGT. LGT of several whole-pathway trp operons and a few partial-pathway trp operons complicated but did not obscure the vertical genealogical trace (78). Indeed, the events of paralogy and xenology could be sorted out because of their demonstrated context within a discernible genealogical trace. The cohesion group approach with the tryptophan pathway subsystem facilitated new and very detailed evolutionary inferences that could be broadly applied to the kingdoms Bacteria and Archaea. In this paper, the cohesion group approach is extended to another branch (TyrA) of aromatic amino acid biosynthesis, with an ultimate objective of extending and integrating the knowledge base to the remainder of this large, multibranched pathway (and indeed with related metabolic subsystems).
) or another (TyrAβ) of two subhomology regions. This tree is used throughout much of this paper as a visually apt way to display various character state features of the cohesion groups. Table 2 provides a list of the organisms containing the sequences chosen to represent the 58 TyrA cohesion groups that are displayed in Fig. 2. The choices were made arbitrarily because any sequence in a cohesion group (even if it is an intruder sequence) is considered to be equally representative of the cohesion group. Table 2 provides the organism abbreviations, the identification numbers in use for sequences at the SEED, and the gi numbers for sequences at the NCBI. The online version of Table 2 (extended table) is hyperlinked to the NCBI taxonomy browser, to the appropriate protein pages at the SEED, and to NCBI gene records. The rightmost column of Table 2 indicates the taxonomic grouping where the cohesion groups are distributed. For example, in TyrCG-1, the TyrA sequence of Erwinia carotovora is representative of multiple orders within the Gammaproteobacteria (but not in all orders of this class). In TyrCG-16, the TyrA sequence of Synechocystis sp. is representative of the entire phylum of Cyanobacteria. The five cohesion groups of the class Betaproteobacteria that are listed are each present at the taxon level of a different family within that class.
FIG. 2. Islands of cohesion groups displayed on a phylogenetic tree. Trimmed supradomain sequences, one representing each cohesion group or orphan and aligned as shown in Fig. 3, were used as input into a tree program as described in the Appendix. The resulting radial tree, visualized using TREEVIEW software (62), displays all of the unconnected cohesion groups. Two distinct subhomology groupings are evident: TyrAα (highlighted blue) and TyrAβ (highlighted yellow). See Table 2 for a succinct identification of each cohesion group. A complete, expanded version of Table 2 is available online (http://theseed.uchicago.edu/FIG/Html/TyrAExtended.html). Bootstrap values at all nodes are less than 58%, and therefore, the order of branching shown is not certain. The arrows indicate nodes that are common to TyrA sequences present in most upper Gammaproteobacteria (left arrowhead) or present in most Betaproteobacteria (right arrowhead). See the Appendix for a URL for a website at which the organisms indicated by the four-letter codes are identified.
TABLE 2. Sources and properties of representative sequences of TyrA cohesion groups
What is the rationale for the conclusion that the Nostoc genes in the above-described example arrived as intruder sequences rather than the opposite scenario, namely, that the genes from the lower Gammaproteobacteria are LGT intruders derived from Nostoc? Nostoc species are in the same taxon family as species of Anabaena, and Anabaena lacks the intruder sequences. Hence, if Nostoc were the LGT donor, the LGT would have occurred at a relatively recent time after its divergence from the genus Anabaena. In order to account for the possession of the LGT-derived gene by all of the lower Gammaproteobacteria, this fairly recent time would have had to overlap with the more ancient time when the common ancestor of lower Gammaproteobacteria existed, i.e., before divergence to various orders and after divergence from the upper Gammaproteobacteria. These times of Nostoc/Anabaena divergence and upper Gammaproteobacteria/lower Gammaproteobacteria divergence clearly do not overlap, as can be qualitatively assessed by inspection of the appropriate nodes of a 16S rRNA tree. At a hierarchical level of superorder for lower Gammaproteobacteria compared with a level of genus for Nostoc, the lower Gammaproteobacteria lineage is qualitatively older than the Nostoc lineage (even allowing for the uneven hierarchical taxon designations that exist). A gene from a younger lineage cannot have been passed to a common ancestor of the older lineage via LGT because that ancestor would have already diverged very substantially. In short, the common ancestor of lower Gammaproteobacteria could not have been an LGT recipient from a Nostoc donor because the more recent Nostoc lineage had not yet separated at the time when the common ancestor of lower Gammaproteobacteria emerged. Accordingly, it would be feasible for Nostoc to be an LGT donor to only some restricted divergent portion of the lower Gammaproteobacteria membership but not to all of it.
TyrCG-13 is striking because it contains all of the current TyrA sequences from two taxonomic classes (Flavobacteria and Epsilonproteobacteria), each belonging to a different phylum. One set must be derived from a relatively ancient intruder sequence that was acquired from a member of the other set via LGT. The rationale for concluding that TyrA sequences in the class Flavobacteria arose as an intruder that arrived via LGT from an Epsilonproteobacteria source is explained later in this paper, where Fig. 9 is discussed.
FIG. 9. Tracking milestone evolutionary events in the group Bacteroidetes/Chlorobi. The dendrogram (not drawn to scale) enumerates character states inferred to be present in the common ancestor of the superphylum at the top. Various evolutionary events affecting genes of the ancestral trp and aro operons are indicated at appropriate lineage positions. At the bottom, the gene organizations of the trp operons and the aro operons present in contemporary classes of the phylum Bacteroidetes are shown.
As a second example, TyrCG-25 contains TyrA sequences from two organisms in different phyla. Petrotoga miotherma is assumed to carry an intruder TyrA sequence derived from within the Dictyoglomus lineage by LGT, and this is based upon the following line of logic. Petrotoga miotherma has a fairly close relative, Thermotoga maritima, whose TyrA sequence is an orphan. Their TyrA sequences would be expected to belong to the same cohesion group because the divergence of TyrA into multiple cohesion groups is usually not seen below the taxon rank of family. Thus, considering the relationship of TyrA sequences from Petrotoga, Thermotoga, and Dictyoglomus, a single LGT event of transfer of TyrA from within the Dictyoglomus lineage to Petrotoga would simultaneously explain why the TyrA sequences from Dictyoglomus and Petrotoga belong to the same cohesion group and why the TyrA sequences from Petrotoga and Thermotoga do not belong to the same cohesion group. Thus, with the information presently available, this single-transfer scenario is the most parsimonious inference. Nevertheless, a conservative approach is taken to still label TyrCG-25 as an "unresolved phylogenetic mixture" until the inference made above can be verified or refuted with the help of more genome sampling.
Along these lines, it is instructive to revisit the phenomenon whereby the trp operon has been inserted into the middle of a six-member aromatic pathway (aro) operon concomitant with the gain of the regulatory gene mtrB, the loss of trpAb from the trp operon, and the subsequent conscription of pabAb to perform the amidotransferase function for both the tryptophan and p-aminobenzoic acid pathways (80). Note that this constitutes a suite of four different, but interwoven, character states. At the time of the previous study, the organisms known to have these character states were limited to Bacillus subtilis, Bacillus halodurans, and "Bacillus stearothermophilus." Taxonomic revision has resulted in the placement of "B. stearothermophilus" into a different genus, Geobacillus (53). An additional Geobacillus genome, G. kaustophilus, as well as some additional Bacillus species are now available. The trp operon insertion and the associated character states can now be updated. They are all present in both of the Geobacillus species and in the following clade of Bacillus species: B. clausii, B. subtilis, B. halodurans, and B. licheniformis. Other Bacillus species (B. cereus, B. anthracis, and B. thuringiensis) lack the trp operon insertion and the three associated character states. Thus, in light of these updates, the simplest scenario is that the trp operon insertion into the aro operon, the loss of trpAb, the broadened functional role of pabAb, and the gain of mtrB regulation occurred initially as dynamic innovations in Geobacillus. Subsequently, the supraoperon was transferred via LGT to a common ancestor of the Bacillus clade and was positioned in the aro operon region by displacement via the recombination of flanking homolog genes. 
The transferred fragment could have been as long as mtrA>mtrB>hepS>menH>hepT>ndk>cheR>aroG>aroB>aroF>trp operon>hisHb>tyrA>aroF>tpr (the supraoperon is shown in boldface type), with recombination perhaps occurring between the mtrA and tpr orthologs (consult Fig. 11 in reference 80 for a view of this conserved gene region). Note that this would have cotransferred the unique trp regulatory gene mtrB, which encodes TRAP (trp RNA binding attenuation protein) (28). The assertion of an intra-cohesion-group LGT that is herein made is amenable to confirmatory follow-up in that protein trees for most or all of the proteins encoded by genes that flank the trp genes should give the same result as that obtained with the TyrA protein tree, namely, that the proteins of one set of Bacillus species are more similar to their counterparts in Geobacillus than to the remaining set of Bacillus species. If so, a significant evolutionary jump (sufficient to define a new trp cohesion group) has occurred in Geobacillus, and the suite of new character states has fairly recently been passed to a common ancestor of a fraction of the Bacillus genus via LGT. Genes flanking the trp operon may not have differed much between the donor and the recipient of LGT. Accordingly, TyrA proteins from all Bacillus species populate the same cohesion group regardless of LGT from Geobacillus or not. Indeed, TyrA proteins from the entire class Bacilli populate a single cohesion group, except for the Symbiobacterium thermophilum orphan. In contrast, the tryptophan subsystem has experienced such dynamic evolutionary changes within Geobacillus that a new trp supercohesion group (based upon the concatenation of Trp proteins) has emerged. This multicharacter set of genes has then exerted quite a profound effect, via LGT, upon a clade of closely related species in a nearby genus.
Since Geobacillus comprises thermophilic species, the above-mentioned proteins in that fraction of Bacillus species that have a Geobacillus origin might tend to have retained the high thermotolerance characteristic of Geobacillus. This is experimentally testable.
In the near future, when small cohesion groups expand to a better size for analysis, it should be possible to obtain fine-tuned protein trees that will allow inferences of credible LGT events within a given cohesion group. The availability of more genomes representing the genera Bacillus and Geobacillus in particular (as well as the class Bacilli in general) should allow this to be accomplished with the trp/aro multigene system.
A striking list of many divergent character state features of aromatic amino acid biosynthesis points to two distinct subdivisions of the class Gammaproteobacteria. We have termed these the lower Gammaproteobacteria and the upper Gammaproteobacteria. With respect to the multiple character states of aromatic amino acid biosynthesis and regulation, all of the formal Gammaproteobacteria taxon orders (except one) partition cleanly into either the lower Gammaproteobacteria or the upper Gammaproteobacteria. Thus, we treat the Gammaproteobacteria as comprising two superorders: (i) the lower Gammaproteobacteria, containing the orders Enterobacteriales, Pasteurellales, and Vibrionales and most families within the Alteromonadales, and (ii) the upper Gammaproteobacteria, containing the orders Chromatiales, Oceanospirillales, Pseudomonadales, and Xanthomonadales and part of the Alteromonadales (67). The latter so far consist only of genera within the family Alteromonadaceae, e.g., Marinobacter and Microbulbifer.
The wide variation in the taxon rank delineated by the organisms whose TyrA sequences belong to a particular cohesion group can be attributed to (i) differing evolutionary dynamics in different lineages and (ii) uneven and erratic taxonomic subdivisions in formal nomenclature schemes (i.e., generously sampled and highly studied groupings become subject to more subdividing than do sparsely represented groupings). In general, it is predictable that TyrA sequences from organisms belonging to the same formal taxon up to the level of family will belong to the same cohesion group and will share similar character state properties.
and TyrAβ subhomology groups. Although this important bifurcation was not previously recognized, in retrospect, the same split was shown previously (see Fig. 3 in reference 67). Figure 3 shows the master cohesion group alignment that was used to generate the tree portrayed in Fig. 2. Based upon comparisons of TyrA sequences from members of the TyrAα subhomology grouping with the TyrA sequences of E. coli and its closest relatives (which are all TyrAβ members), it was previously concluded (prior to the recognition of a distinct TyrAβ grouping) that the TyrA sequence of E. coli and its close relatives is distinguished from the other sequences by insertion/deletion (indel) structuring (10, 71). Indel structuring refers to a general case where a protein domain makes functionally important contacts with another protein domain to which it is fused. In sequence alignments with homolog counterparts that are not fused and functionally independent of the second protein domain, there are regions of amino acid insertion or deletion that may disrupt conserved and functionally important sequence motifs of the unfused protein. It is envisioned that such important regions are compensated for by a region of the fused protein partner, which exercises an appropriate contact (indel contact). Compensatory indel contacts may operate in both directions for fused proteins, as appears to be the case for the mutually dependent activities of TyrA and chorismate mutase, which are fused in E. coli.
FIG. 3. Master alignment of cohesion group representatives. The final manual alignment of 58 cohesion group representatives (see the Appendix) was imported from the BioEdit alignment editor into the Word program to enhance presentation. TyrAα sequences are shown in the top section bounded at the top and bottom by sequences (Synechocystis sp. and Aquifex aeolicus) for which X-ray crystal structures are available. TyrAβ sequences are shown at the bottom. Amino acid residues shown to be important for NADP+ or for NAD+ in Synechocystis sp. and Aquifex aeolicus, respectively (48, 71), are shown in red with white lettering. Residues modeled in Synechocystis sp. and Aquifex aeolicus to be important for L-arogenate or for prephenate binding, respectively (48, 71), are shown in blue with white lettering. Relative residue position numbers are shown across the top. Invariant or near-invariant anchor residues are enclosed within vertical bars and highlighted yellow. Other highly conserved residues are shown in boldface type and highlighted yellow. Near-invariant residues that differ in a cohesion group representative, but which are nevertheless uniformly different throughout the cohesion group, are shown in boldface green type. The gray vertical band encloses residues in a variable loop (one to nine residues). Divergently pointed arrows at residue positions 216 and 217 mark the boundary between the pyridine nucleotide-binding domain and the catalytic domain. Regions that distinguish TyrAα and TyrAβ, as discussed in the text, are marked with numbers within triangles.
cohesion group representatives (top) with all 18 TyrAβ cohesion group representatives (bottom). Our collection of trimmed supradomain sequences (10) was used as input into the alignment program. These trimmed sequences (available for download from dropdown boxes activated by cohesion group mouseovers of Fig. 2 online [http://theseed.uchicago.edu/FIG/Html/tyrACGTree.html]) begin with the residues that define the Wierenga fingerprint (73) in the pyridine nucleotide discriminator region at the N terminus of TyrA proteins. Thus, each sequence has been trimmed to begin five residues upstream of the GxGxxG motif (note that three of the cohesion groups within TyrAβ appear to possess an alternative GxxGxxG motif, utilized elsewhere among some other dehydrogenases; these are TyrCG-4, TyrCG-15, and TyrCG-82). For convenience of presentation, the alignment of Fig. 3 does not show about 30 to 35 residues at the C terminus of the supradomain sequences since no patterns of conservation are evident there (however, the complete trimmed supradomain sequences can be obtained at the SEED as described in the Appendix). The vertical gray zone near the N terminus contains from one to nine residues deemed to be within the variable loop of the Wierenga fingerprint. No gaps were allowed prior to position 41 except in the variable loop.
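The trimming rule described above (start each sequence five residues upstream of the glycine-rich motif) can be sketched in a few lines. This is only an illustration, not the procedure actually used to build the SEED collection: the function name `trim_to_fingerprint`, the regular expression, and the handling of sequences with fewer than five upstream residues are all assumptions made here.

```python
import re

# Assumed patterns: the classic GxGxxG motif and the alternative GxxGxxG
# motif mentioned in the text ("." stands for any residue).
G_RICH_MOTIF = re.compile(r"G.G..G|G..G..G")

def trim_to_fingerprint(seq, upstream=5):
    """Return `seq` starting `upstream` residues before the first G-rich motif.

    Returns None when neither motif is found. If fewer than `upstream`
    residues precede the motif, the sequence is returned from its start
    (an assumption of this sketch).
    """
    match = G_RICH_MOTIF.search(seq.upper())
    if match is None:
        return None
    return seq[max(0, match.start() - upstream):]

# Toy sequences (hypothetical, not real TyrA proteins).
print(trim_to_fingerprint("AAAAAAAAAAGAGAAGCCCC"))
print(trim_to_fingerprint("TTTTTTGAAGAAGTT"))
```

A purely pattern-based search will pick up the first glycine-rich stretch it meets, so any real use would still need the manual inspection of the alignment that the text describes.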
) exhibit variations in this motif. It is striking, considering the moderate overall conservation of TyrA sequences, that these DxxSxK motif deviations alone are currently reliable signatures that distinguish the TyrCG-19, TyrCG-81, TyrCG-98, and TyrCG-85 cohesion groups. TyrCG-98 (containing 14 sequences from fungi) is additionally exceptional at position 155, being the only cohesion group that does possess a proline residue at this highly important position. Such conserved variations undoubtedly correspond to interesting mechanistic variations of an otherwise widely conserved mechanism. As such, these should merit the attention of protein chemists.
Four Regional Sequence Sections That Differentiate TyrAα from TyrAβ
from members of TyrAβ are indicated by numbers enclosed within diamonds (Fig. 3) as follows. First, the G-rich region of TyrAα is quite orderly, usually being GxGLIGGS and never having adjacent or intervening charged residues. In contrast, the same region of TyrAβ typically has intervening or adjacent R or K residues (shown in red). Occasionally, this region of TyrAβ contains negatively charged D or E residues (shown in blue). Positions 17 to 19 of TyrAα are frequently occupied by the motif ALK/R, with the 19K/R being highly conserved. This motif is altogether absent in TyrAβ. Second, the motif surrounding the highly conserved 158G is 157AGxExxGxxxxxxxL171 in TyrAα, whereas in TyrAβ, the motif is 157FGP159. Note the possibility that the latter motif really corresponds to 162xGx164 of TyrAα. In other words, it may be the G residue at position 163 that is conserved throughout the entire homology family rather than the G residue at position 158. While the former region has been shown to be an important active-site region in both of the X-ray crystal studies done in TyrAα organisms (48, 71), this appears to be a region of indel disruption in TyrAβ. Third, the motif 232SHLPH236 is highly conserved in TyrAα, where it has been shown to be an important active-site region in crystallography studies (48, 71). However, only 236H is conserved in TyrAβ. Fourth, the motif 274GxR/KDxS/TR284, present in TyrAα as an important active-site region (48, 71), is disrupted in TyrAβ, where only the equivalent of the invariant 284R is matched (although here it can be R or K). This motif is discussed in later sections of this review, where, for convenience, it is referred to as the RxxxR motif. Finally, in TyrAα sequences, the motif 290PxMWxDI296 consists of putative active-site residues (48, 71), but this region is totally disrupted in TyrAβ sequences.
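As a rough illustration of how the hallmark motifs listed above might be screened for in a raw sequence, the following sketch counts which of three of them occur anywhere in a protein. It is a crude heuristic under stated assumptions: the function name `alpha_motif_hits` is hypothetical, alignment positions are ignored entirely, and the motifs are treated as exact patterns even though the text describes them only as "usually" or "highly" conserved.

```python
import re

# Motifs that the text associates with the first subhomology grouping.
# Matching anywhere in the sequence, without alignment positions, is an
# assumption of this sketch.
ALPHA_MOTIFS = {
    "GxGLIGGS": re.compile(r"G.GLIGGS"),
    "SHLPH": re.compile(r"SHLPH"),
    "PxMWxDI": re.compile(r"P.MW.DI"),
}

def alpha_motif_hits(seq):
    """Return the sorted names of the hallmark motifs found in `seq`."""
    s = seq.upper()
    return sorted(name for name, pattern in ALPHA_MOTIFS.items() if pattern.search(s))

# Toy sequence (hypothetical) carrying all three motifs.
print(alpha_motif_hits("MKIGIIGLGLIGGSLALKAAASHLPHTTTPAMWADIAA"))
```

A sequence with no hits is not thereby shown to belong to the other grouping; the positional context in the master alignment remains the deciding evidence.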
β discriminator region at the N terminus of TyrA. A negatively charged residue (D or E) at position 36 (Fig. 3) is all-important for hydrogen bonding to the diol group of the ribose near the adenine moiety in NAD+-specific enzymes. A negatively charged residue at position 36 absolutely precludes NADP+ utilization. An asparagine residue at position 36 appears to enable the binding of both NAD+ and NADP+ (67, 85). Most of our curated TyrA sequences can be assigned to one of three specificity classes: specific for NAD+, specific for NADP+, or able to utilize either cofactor. Figure 4 shows an alignment of some representative TyrA discriminator regions in order to illustrate the recognizable patterns. The alignment begins with the third G of the GxGxxG motif; this corresponds to residue 11 of the Wierenga fingerprint (73). Residues that occupy a variable loop (positions 22 to 30) are shaded in Fig. 3 and 4, and gaps are allowed only in this region. The classic Wierenga fingerprint allows for a variable loop containing two to five residues, but our alignment studies suggest that the loop can contain from one to nine residues. Thus, TyrA from Gloeobacter violaceus has only a single residue within the variable loop, whereas the TyrA proteins from Gluconobacter oxydans and from Helicobacter hepaticus contain nine residues within the variable loop. As the name would imply, these variable-loop regions are not always highly conserved within a cohesion group. Thus, Leptospira interrogans and Fibrobacter succinogenes, the two members of TyrCG-28, have variable-loop regions that show few matches (Fig. 4). Such differences in match identities and also in loop lengths can be seen in Fig. 4 for the various Betaproteobacteria and upper Gammaproteobacteria that have been selected. In other cases, though, the variable-loop regions seem to be surprisingly consistent, as exemplified by the cyanobacteria and by the Actinobacteridae.
Figure 4 illustrates selected examples of motifs associated with specificities for NAD+ (top), NADP+ (middle), and NAD(P)+ (bottom). Each specificity category is represented within both the TyrAα and TyrAβ subhomology groups, as indicated on the right. NAD+-specific enzymes possess a D (or occasionally an E) at position 36; this aspartate (or glutamate) residue acts to repel the negatively charged NADP+. The majority of cohesion groups (33 of 58) possess a D residue at position 36, while three cohesion groups (the latter all being in TyrAα) possess an E residue at position 36. TyrA from E. coli and all other members of TyrCG-1 display a DW motif corresponding to positions 36 and 37. This alone is sufficient to distinguish sequence members of TyrCG-1 from any other TyrA sequences. The placement of a D residue at position 36 in the alignment is usually unambiguous. However, Arabidopsis thaliana in the second grouping shown in Fig. 4 illustrates a case where, without the benefit of experimental data, the D residue at position 39 could easily have been aligned to position 36 without creating an abnormally short variable loop. However, rigorous experimental data allow the association of this sequence pattern with NADP+ specificity, and it can be seen that the Arabidopsis thaliana sequence in the cofactor discriminator region aligns well with other experimentally known NADP+-specific enzymes, such as those from Nitrosomonas europaea, Acinetobacter sp., and Synechocystis sp.
NADP+-specific enzymes typically deploy one G/S/T/A residue at position 36, and this is followed most commonly by RS (but sometimes by RR or RK). A second pattern of NADP+ specificity (36G/A/S/TxxxRxR42) was recognized from the sequence from Gluconobacter oxydans, which is known experimentally to be NADP+ specific (and prephenate specific). The pattern from Fibrobacter succinogenes and Caulobacter crescentus matches this quite well. Here, the positively charged R residue, normally located at position 37, is shifted three positions downstream, and the R residue at position 42 may be significant as well.
A broad capability to utilize either of the two cofactors is achieved by one of two variations: a 36GxxR39 motif and a 36N motif. The 36GxxR39 motif, as seen in some of the TyrA sequences from the Betaproteobacteria (70), resembles the 36G/S/T/AR37 motif of NADP+-specific enzymes. From an inspection of Fig. 4, one could envision an evolutionary transition from 36G/S/T/AR37 to 36GxxR39 to have occurred by the insertion of VG (or similar residues) after the G at position 36, displacing the important basic R residue to position 39. Of course, the opposite scenario, whereby two residues are deleted from 36GxxR39, is equally plausible. The presence of asparagine at position 36 correlates with the ability to use both cofactors, as established by experimental data from Corynebacterium glutamicum (24, 25). Interestingly, the 36NRS38 variation correlates with an order-of-magnitude preference for NADP+ in coryneform bacteria, whereas the 36GxxRS40 variation correlates with equal preference for NAD+ or NADP+ in Betaproteobacteria such as Ralstonia. Thus, in cases where 36N is not followed by RS (all in organisms not yet examined experimentally), it would be interesting to know if the preference for NADP+ is lessened, perhaps markedly.
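The motif rules described in the preceding paragraphs can be gathered into a small rule-of-thumb classifier. The sketch below is an illustration only: the function name `classify_cofactor`, the 7-residue input window (alignment positions 36 to 42), and the order in which overlapping patterns are tested are assumptions made here, and the result depends on the residue at position 36 having been aligned correctly (compare the Arabidopsis thaliana case noted above).

```python
def classify_cofactor(window):
    """Guess cofactor specificity from alignment positions 36 to 42.

    `window` is a 7-character string; window[0] is the residue at
    position 36 and window[6] the residue at position 42.
    """
    w = window.upper()
    if len(w) != 7:
        raise ValueError("expected the 7 residues at positions 36 to 42")
    p36 = w[0]
    if p36 in "DE":                       # acidic residue at 36
        return "NAD+"
    if p36 == "N":                        # asparagine at 36
        return "NAD(P)+"
    if p36 in "GSTA":
        if w[1] == "R":                   # 36G/S/T/A followed by R at 37
            return "NADP+"
        if p36 == "G" and w[3] == "R":    # 36GxxR39
            return "NAD(P)+"
        if w[4] == "R" and w[6] == "R":   # 36G/A/S/TxxxRxR42
            return "NADP+"
    return "unknown"

# Toy windows (hypothetical residues after the pattern-defining positions).
for toy in ("DWAAAAA", "NRSAAAA", "SRSAAAA", "GVGRSAA", "AAAARAR"):
    print(toy, classify_cofactor(toy))
```

Such a classifier only restates the sequence patterns; as the text stresses, assignments made without experimental data remain provisional.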
Some cohesion groups have a split membership with respect to cofactor specificity (see Fig. 6, panel 10). One of these, TyrCG-17, is discussed in detail below.
FIG. 6. Snapshots of character state features. Eighteen panels are shown as mini-semblances of the bifurcated tree of cohesion groups portrayed in Fig. 2. Various character states of interest are displayed on these trees to facilitate comparisons. The organisms in all three domains of life that host the various TyrA cohesion groups are profiled in panels 1 to 8. The numbers at the branch ends in panels 2 to 8 indicate the total number of sequences within the cohesion group. An appropriate fraction of a given branch is color coded if the cohesion group has a "mixed" membership. Thus, in panel 3, the proximal half of the TyrCG-13 branch is color coded for the nine sequences of the Epsilonproteobacteria. In panel 6, the other (distal) half of the branch is color coded to indicate the nine TyrA sequences from the class Flavobacteria (Bacteroidetes). The locations of cohesion groups containing intruder sequences are identified in panel 9, e.g., the Flavobacteria mentioned above. TyrA character states associated with cofactor and cyclohexadienyl substrate specificities are displayed in accord with the color-coded legends (panels 10 and 11). In panel 10, "?NADP or NAD(P)?" means that whether the enzyme is NADP+ specific or whether it can use either cofactor is unknown, but we know that it cannot be NAD+ specific. The amino acid lengths of trimmed core supradomain TyrA sequences are given at the branch ends of panel 12. TyrA enzymes encoded by tyrA genes fused to other genes are depicted in panel 13. TyrA enzymes encoded by tyrA genes which are isolated from other aromatic pathway genes are shown in panel 15. The color-coded legends for panels 17 and 18 show conserved motifs (Fig. 3), which are disrupted or absent in the indicated cohesion groups (or a fraction thereof). These panels can be accessed at http://theseed.uchicago.edu/FIG/Html/TyrAPanels.html, where they can be expanded and sorted in order to facilitate comparisons. 
The interactive panels are linked to the extended table in order to quickly view the membership of any cohesion group of interest.
FIG. 5. Divergence of cofactor specificity within cohesion group TyrCG-17. TyrA sequences from members of nine families of the order Actinomycetales and one (Bifidobacterium longum) from the family Bifidobacteriaceae within the order Bifidobacteriales were aligned by entering the appropriate trimmed sequences into ClustalX, carrying out manual adjustments with the aid of the BioEdit alignment editor, and entering the final alignment into the Phylip program. The alignment (A) and the tree visualized with TREEVIEW (B) were imported into Word to enhance presentation. The Bifidobacterium longum sequence is shown in the middle of A for comparison with TyrA sequences from the single family (Corynebacterineae) members in the bottom block and with members of the remaining families of the Actinomycetales (top block).
Members of TyrCG-17 are thought to all be L-arogenate specific, and it is perhaps surprising that the narrowing of cofactor specificity for the upper block of sequences in Fig. 5A is associated with changes throughout the entire TyrA sequence rather than just the N-terminal domain. However, existing X-ray crystal studies have pointed to a substantial functional intercalation of the two domains comprising the TyrA supradomain. In another cohesion group, TyrCG-3, the three member sequences share the character state of being broad-specificity cyclohexadienyl dehydrogenases but differ in cofactor specificities, with two members being NAD+ specific and one being NADP+ specific. Here, where the overall amino acid identity is high (56%), the divergence of cofactor specificity has triggered more sequence divergence in the N-terminal domain (46% identity) than in the C-terminal domain (74% identity) (the join point of the N-terminal and C-terminal domains is marked by divergent arrows in Fig. 3).
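The domain-by-domain comparison above (overall identity versus N-terminal and C-terminal identity) can be illustrated with a short calculation on two aligned sequences. This is a minimal sketch: the function names are hypothetical, the default boundary of 216 follows the join point marked in Fig. 3, and counting only columns in which neither sequence has a gap is an assumption, since the text does not state how the percentages were computed.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def identity_by_domain(a, b, boundary=216):
    """Compare two aligned sequences overall and on either side of `boundary`.

    `boundary` is the last alignment column (1-based) of the N-terminal domain.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return {
        "overall": percent_identity(a, b),
        "n_terminal": percent_identity(a[:boundary], b[:boundary]),
        "c_terminal": percent_identity(a[boundary:], b[boundary:]),
    }

# Toy alignment (hypothetical) with the domain boundary after column 5.
print(identity_by_domain("AAAA-CCCC", "AAGAACCTT", boundary=5))
```

Other conventions (for example, dividing by the full alignment length) would give somewhat different percentages, so values computed this way are comparable only to one another.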
and TyrAβ subhomology groups differ from one another. Online at the SEED (http://theseed.uchicago.edu/FIG/Html/TyrAPanels.html), clicking the "compare TyrA panels" option allows a choice of up to three panels for side-by-side comparison. The individual panels are expandable with a built-in magnifier, and links are provided at the top for navigation to the extended table.
subhomology grouping. The allocation of cohesion groups among the major taxa within the superkingdoms of Archaea and Eukaryota is shown in panels 7 and 8, respectively, of Fig. 6. Numbers at the ends of branches (panels 2 to 8) indicate the numbers of sequences contained within a cohesion group. Hence, branches labeled with a "1" indicate an orphan sequence. In panels 5 and 6, the presence of cohesion groups is displayed at the level of phylum, except for the Actinobacteridae (subclass) in panel 6. Panels 2 to 4 show the distribution of cohesion groups present in the various class divisions of the phylum Proteobacteria. The Gammaproteobacteria have been separated into the lower Gammaproteobacteria "superorder" and the upper Gammaproteobacteria "superorder" because extraordinarily dynamic evolutionary jumps in the lower Gammaproteobacteria have created qualitatively significant distinctions. Indeed, features of aromatic biosynthesis in the upper Gammaproteobacteria and the Betaproteobacteria are much more similar to one another than is the case when upper Gammaproteobacteria and lower Gammaproteobacteria are compared (67, 80). It is interesting that a significant change of state of the histidine operon, whereby a gene fusion is embedded in a compact operon, occurs uniquely in exactly the same organisms that we refer to as the lower Gammaproteobacteria (23). The gene organization of the histidine pathway for the upper Gammaproteobacteria differs sharply from that of the lower Gammaproteobacteria. The phylum Proteobacteria exhibits relatively great overall divergences with respect to TyrA sequences such that cohesion groups usually parallel a formal order or a collection of orders. Only TyrA sequences from the Epsilonproteobacteria are represented at the class taxon level as members of a single cohesion group (Fig. 6, panel 3).
The lower Gammaproteobacteria (consisting of the orders Enterobacteriales, Pasteurellales, Alteromonadales, and Vibrionales) possess TyrA sequences that populate TyrCG-1 in the TyrAβ subhomology group. Among the upper Gammaproteobacteria, members of the order Pseudomonadales possess TyrA sequences that fall into TyrCG-2 and TyrCG-3. Members of the order Xanthomonadales possess TyrA sequences that belong to TyrCG-4 in the TyrAβ subhomology grouping. Members of the order Chromatiales possess TyrA sequences that belong to TyrCG-5, except for Nitrosococcus oceani, whose TyrA sequence is an orphan. Members of the order Oceanospirillales possess TyrA sequences that belong to TyrCG-6.
One member of TyrCG-6 (Marinobacter aquaeolei) as well as one orphan of the upper Gammaproteobacteria (Microbulbifer degradans, recently reclassified as Saccharophagus degradans) are classified at the NCBI as belonging to the Alteromonadales. However, TyrA members present in the Alteromonadales are otherwise housed by lower Gammaproteobacteria. M. aquaeolei and M. degradans clearly seem to have multiple properties characteristic of upper Gammaproteobacteria. They lack many evolved characteristics of lower Gammaproteobacteria. For example, a member of the latter superorder (exemplified by species of Shewanella within the Alteromonadales) possesses TyrAβ, an aroHI-tyrA fusion, a tyr operon containing a newly emerged paralog encoding a third regulatory isoenzyme of 3-deoxy-D-arabino-heptulosonate 7-phosphate (DAHP) synthase, a tyrR regulatory gene, and a complete trp operon including a trpD-trpC fusion. These are all newly evolved character states that typify lower Gammaproteobacteria (more detail can be found in Fig. 7 and Table 6 in reference 67). In striking contrast, all upper Gammaproteobacteria (including the above-mentioned Marinobacter aquaeolei and Microbulbifer degradans) possess the TyrAα subhomology type of TyrA, they lack the aroHI-tyrA fusion, they lack a tyr operon containing a gene encoding a regulatory isoenzyme of DAHP synthase, they lack a tyrR repressor gene, tyrA is in fact within a supraoperon containing other aromatic pathway genes, and a "split" trp operon (80) is present. In short, organisms currently contained within the Alteromonadales are a mixture of lower Gammaproteobacteria and upper Gammaproteobacteria that are sharply distinguished by a suite of differing character states.
FIG. 7. Independent tyrA-aroF fusions in proteobacteria. Amino acid sequences of TyrA-AroF fusions from the upper Gammaproteobacteria and the Betaproteobacteria were aligned with TyrA and AroF concatenates from other members of these proteobacterial divisions where these genes are unfused. The alignment was used to obtain the Phylip tree shown. Values of bootstrap support are indicated at nodes. Proteins encoded by tyrA-aroF fusions are enclosed within the orange patterning.
The distributions of the three specificity patterns for the cyclohexadienyl substrate are shown in panel 11 of Fig. 6. At least within the TyrAα subhomology grouping, the broad-specificity cyclohexadienyl dehydrogenases appear to be most common. Relatively few specificities within the TyrAβ grouping are known. Both the cofactor substrate and cyclohexadienyl substrate specificities are listed in the right column of the extended table, to which panels 10 and 11 are linked online after choosing "compare TyrA panels." Specificities that are considered to be certain are displayed in boldface type in the table; specificities thought to be probable but not certain are shown in lightface type.
and TyrAβ subhomology groupings, as displayed in Fig. 6, panel 13. The aroHI homolog of chorismate mutase has fused with tyrA in two cohesion groups within TyrAβ, and it is evident that these must have been independent gene fusions. In another case, the same two genes are fused, but here, aroHI is fused to the C terminus of tyrA (tyrA-aroHI). (Note that aroHI is well known in the literature as aroQ; consult a study by Okvist et al. [56] for an alternative classification of chorismate mutase subtypes). tyrA is fused to aroF in some of the upper Gammaproteobacteria and Betaproteobacteria (see below). The closest relatives of the latter that lack the fusion typically possess adjacent tyrA-aroF genes. To compare the gene organization at the level of cohesion group, one can use the "gene neighborhood" button in the extended table as described below. tyrA is fused to several known regulatory genes called ACT and REG, sometimes in combination with other structural genes such as aroHI and pheA. Other N-terminal or C-terminal extensions of tyrA exist (Fig. 6, panel 14), which could be regulatory domains. Some fusions are present in only a fraction of the cohesion group membership, indicating that the origin of these fusions is recent, i.e., a new fusion event or a recent LGT. Browsing the individual membership of a given cohesion group in the extended table allows one to view the existing fusion identities (in one of the right columns). Gene organization is not highly conserved and can be quite erratic, even within short phylogenetic distances (33). Even operons are surprisingly vulnerable to disruption, as documented in detail with the trp operon (80). However, functionally related genes frequently retain linkage relationships over at least short phylogenetic distances, sometimes with distinct shuffling patterns. The comparative analysis of gene clusters can be extremely informative, yielding valuable functional and evolutionary clues. 
Examples of how this approach can elucidate functional roles for "missing genes" have been reported (30, 59, 61).
Each cohesion group section of the extended table has an arrowhead button after the cohesion group number, which allows navigation to a direct single-view comparison of the gene organizations surrounding tyrA within that cohesion group. These are extracted from all of the individual graphics that appear on the Protein Pages of each sequence at the SEED for which there is a current identification number. This accommodates a very convenient way to view the extent to which the gene organization is consistent within a cohesion group. Phylogenetic groupings at about the level of class often exhibit sufficient conservation of gene synteny that an ancestral gene organization can be deduced. Nevertheless, extensive gene shuffling occurs such that individual lineages will have highly scrambled (or even unrecognizable) versions of the consensus gene organization. The admixture in a given phylogeny of gene organizations conserved over relatively great phylogenetic distances (stability) in combination with dramatic gene shuffling over short phylogenetic distances (instability) is one of the intriguing mysteries of genomics. A detailed example of this was analyzed (67) in the upper Gammaproteobacteria and Betaproteobacteria, where a proposed ancestral supraoperon is gyrA>serC>aroQ-pheA>hisHb>tyrA>aroF>cmk>rpsA>himD. Only Ralstonia metallidurans in the Betaproteobacteria has a "perfect" ancestral supraoperon. Most of the other Betaproteobacteria exhibit very minor supraoperon alterations such as open reading frame insertions and single-gene deletions. Occasionally, more drastic gene shuffling (Chromobacterium violaceum) or partial supraoperon translocation (Nitrosomonas europaea) has occurred. At one extreme (species of Neisseria), the genes of the supraoperon have been completely dispersed. 
An entirely parallel situation is found in the upper Gammaproteobacteria, where most organisms house near-perfect ancestral supraoperons that differ only slightly in having gene insertions, gene deletions, or gene fusions. Pseudomonas aeruginosa, for example, possesses gyrA>serC>aroHI-pheA>hisHb>tyrA-aroF>cmk>rpsA>himD. Multiple fragmentation of the supraoperon has occurred elsewhere, e.g., in species of Xanthomonas and Xylella. It is quite striking that the supraoperon gene arrangement of R. metallidurans (Betaproteobacteria) is more similar to that of P. aeruginosa (upper Gammaproteobacteria) than to the supraoperon compositions of many other Betaproteobacteria. In reciprocal fashion, the P. aeruginosa supraoperon gene arrangement is more similar to that of R. metallidurans than to those of many other upper Gammaproteobacteria.
The data described above illustrate that within a manageable phylogeny (cohesion group), a particular order of dynamic events of gene ordering can be deduced, yielding a likely ancestral gene order. Parallel analyses at nearby phylogenetic nodes with a roughly equivalent hierarchical level can then lead to a systematic deduction of the ancestral synteny that predated those deduced for the sister nodes.
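The comparison of supraoperon arrangements described above can be illustrated with a minimal sketch. It assumes that gene order is written as in the text (">" between neighboring genes, "-" between fused genes), that a fusion is scored as a retained adjacency, and that aroHI and aroQ name the same gene, as noted earlier. The function names and the "scrambled" order are hypothetical and serve only as an illustration; this is not the method used by the authors.

```python
def parse_order(order: str) -> list[str]:
    """Split 'geneA>geneB-geneC' into a flat gene list ('-' marks a fusion)."""
    genes = []
    for unit in order.split(">"):
        genes.extend(unit.split("-"))
    return genes


def adjacencies(genes: list[str]) -> set[tuple[str, str]]:
    """Return the set of directed neighbor pairs in a gene list."""
    return set(zip(genes, genes[1:]))


def shared_adjacency_fraction(reference: str, other: str, aliases=None) -> float:
    """Fraction of the reference's neighbor pairs that are retained in 'other'."""
    aliases = aliases or {}
    ref = [aliases.get(g, g) for g in parse_order(reference)]
    oth = [aliases.get(g, g) for g in parse_order(other)]
    ref_adj = adjacencies(ref)
    return len(ref_adj & adjacencies(oth)) / len(ref_adj)


# Gene orders quoted in the text.
ANCESTRAL = "gyrA>serC>aroQ-pheA>hisHb>tyrA>aroF>cmk>rpsA>himD"
P_AERUGINOSA = "gyrA>serC>aroHI-pheA>hisHb>tyrA-aroF>cmk>rpsA>himD"
# Hypothetical fragmented arrangement, invented for illustration only.
SCRAMBLED = "gyrA>serC>hisHb>tyrA>aroF"

ALIASES = {"aroHI": "aroQ"}  # assumption: treat the two names as one gene

if __name__ == "__main__":
    print(shared_adjacency_fraction(ANCESTRAL, P_AERUGINOSA, ALIASES))
    print(shared_adjacency_fraction(ANCESTRAL, SCRAMBLED, ALIASES))
```

Under these assumptions, the P. aeruginosa arrangement retains every neighbor pair of the proposed ancestral supraoperon, whereas the invented fragment retains only a third of them. A score of this kind ignores strand, intergenic insertions, and translocations, so it is at best a rough first pass.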
members is distinctly greater than those of TyrAβ members. The amino acid lengths shown at the ends of branches in panel 12 specify the length of the representative sequence for the cohesion group. These are quite consistent within the cohesion group. The individual cohesion group supradomain sequences can be downloaded for comparison from the pop-up menu provided with the online version of Fig. 2 (http://theseed.uchicago.edu/FIG/Html/tyrACGTree.html). Gene fusions shown in panel 13 are discussed above. In the context here, fusion events tend to have occurred most frequently within the TyrAβ subhomology group. Panels 16 to 18 are designed to examine motifs that are generally conserved within TyrAα but not within TyrAβ. The RxxxR motif (which occupies positions 276 to 284 in Fig. 3, a numbering that takes into account an inserted three-residue gap for alignment purposes) is present in nearly all cohesion groups of TyrAα but is absent throughout most of TyrAβ. Panels 17 and 18 identify a number of motifs (having amino acid numbers given in Fig. 3) which again are generally conserved within TyrAα but not within TyrAβ. The color coding shows the cohesion groups that lack a given motif. Examples of the application of the snapshot tool are pursued in some detail in later sections of this review.
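A presence/absence call for a motif such as RxxxR can be sketched with a simple pattern search. The sketch below assumes ungapped protein sequences in one-letter code; the two sequences are invented toy strings, not real TyrA sequences, and the reported positions are plain sequence positions rather than the alignment numbering of Fig. 3.

```python
import re

# Lookahead so that overlapping occurrences are also reported.
RXXXR = re.compile(r"(?=(R[A-Z]{3}R))")


def find_rxxxr(seq: str) -> list[int]:
    """Return 1-based start positions of every R-x-x-x-R occurrence."""
    return [m.start() + 1 for m in RXXXR.finditer(seq.upper())]


# Hypothetical toy sequences, for illustration only.
TOY_SEQUENCES = {
    "with_motif": "MKTGGGRALDRLV",
    "without_motif": "MKTGSGKALDQLV",
}

if __name__ == "__main__":
    for name, seq in TOY_SEQUENCES.items():
        hits = find_rxxxr(seq)
        print(name, "present" if hits else "absent", hits)
```

Because R-x-x-x-R is a short pattern that can occur by chance, a hit is meaningful only when it falls at the aligned position discussed in the text; a real analysis would therefore search the aligned region rather than the whole sequence.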
and TyrAβ subhomology groupings. Streptomyces coelicolor possesses two paralogs (TyrAa and PapC) that occupy the same cohesion group (TyrCG-17). The PapC paralog is encoded by a gene within the calcium-dependent antibiotic cluster (65) and possesses an alanine residue at position 154. The TyrA and PapC homologs in S. coelicolor are closely related intra-cohesion-group paralogs, of which PapC presumably arose recently by gene duplication, followed by a novel specialization of substrate specificity. It is interesting that calcium-dependent antibiotic contains a variety of nonprotein amino acids. (Note that this PapC paralog is not shown in the various figures and tables of this review in order to maintain focus upon the functional role of TyrA.) Surprisingly, all remaining PapC paralogs (which have an asparagine residue at position 154) reside in a single cohesion group located in the TyrAβ assemblage (not shown in Fig. 2). These are present in Photorhabdus luminescens, Photorhabdus asymbiotica, Pseudomonas fluorescens, Streptomyces venezuela, Streptomyces pristinaespiralis, and Rhodococcus sp. The two Photorhabdus species also possess TyrA homologs within TyrCG-2, whereas Rhodococcus sp. appears to lack TyrA. TyrA homologs have not been identified in the remaining organisms, but these are unfinished genomes. The latter PapC proteins occupy a single cohesion group and thus probably share a fairly recent common ancestor. However, they are hosted by diverse taxa, so most or all of them might be xenologs. Another possible explanation for the unexpectedly close sequence similarity in diverse taxa is selective pressure for evolutionary convergence. If PapC proteins form a complex with one or more other proteins of the antibiotic synthesis pathway, similar but independently evolved constraints dictating crucial protein-protein interactions may have forced evolutionary convergence. 
This is similar to the convergence proposed to explain the TrpAa-TrpAb_PhzE clustering for proteins engaged in a step of phenazine pigment biosynthesis in species of Pseudomonas and Streptomyces (78). It is also similar to the indel hypothesis invoked in the following section of this review to explain the convergence of cohesion groups in the TyrAβ subhomology grouping.
subhomology group. TyrCG-30, on the other hand, belongs to the TyrAβ subhomology grouping and contains two TyrA sequences in addition to the paralogs from R. rubrum and S. pomeroyi. Maricaulis maris, a finished genome that also belongs to the Alphaproteobacteria, lacks a paralog member in TyrCG-12. Thus, M. maris is so far alone among the Alphaproteobacteria in its complete reliance upon a TyrAβ-specified dehydrogenase for tyrosine biosynthesis. The fourth member of TyrCG-30 is from Myxococcus xanthus (Deltaproteobacteria and an unfinished genome). The latter is provisionally labeled as an intruder sequence, although the alternative scenario, that the M. xanthus sequence is a native sequence from which the TyrAβ intruder sequences present in a few genera of Alphaproteobacteria originated, certainly cannot be ruled out. It is interesting that TyrA from M. xanthus is the only member of TyrCG-30 to have a fused chorismate mutase domain (tyrA-aroHI), distinctive from other chorismate mutase fusions because it is a C-terminal fusion. Regardless of whether M. xanthus was an LGT donor or recipient, the fusion must have occurred after the LGT event.
SIGNIFICANCE OF THE TyrAα/TyrAβ SCHISM
, many cohesion groups are also represented within TyrAβ. How is this to be explained? Firstly, the possibility must be considered that some or all bacterial sequences that belong to the TyrAβ subhomology grouping originated from an archaeal or eukaryal source by LGT. Secondly, the possibility is considered that members of TyrAα act as independent catalysts, whereas members of TyrAβ exhibit constraints that have driven convergence. These constraints reflect dependence upon contacts with a fused or complexed protein. These possibilities are discussed in turn.
subhomology group. However, a scattered number of bacterial sequences also belong to the TyrAβ grouping. Among the Proteobacteria, the latter include all of the lower Gammaproteobacteria (TyrCG-1), TyrCG-4 from the upper Gammaproteobacteria, a small group of TyrA sequences from the Alphaproteobacteria (TyrCG-30) (also containing one intruder sequence carried by a deltaproteobacterium), and TyrCG-15, which is populated by two sequences from the Deltaproteobacteria. No Betaproteobacteria or Epsilonproteobacteria that host proteins belonging to the TyrAβ subhomology grouping are currently known. The phylum Bacteroidetes is represented by TyrCG-24 and TyrCG-23 in the TyrAα and TyrAβ subhomology groups, respectively.
The Alphaproteobacteria exhibit some novel variations. Most of them contribute to a 38-member cohesion group (TyrCG-12), which, along with an orphan sequence (Pelagibacter ubique), belongs to the TyrAα subhomology group. Three Alphaproteobacteria have members that occupy the TyrAβ subhomology group (TyrCG-30). Two of the latter (Rhodospirillum rubrum and Silicibacter pomeroyi) also host paralogs among the above-mentioned group of 38, thus being the only organisms so far known to possess a TyrA member of each subhomology group. The third member of TyrCG-30 (Maricaulis maris) is the only alphaproteobacterium whose sole TyrA sequence belongs to TyrAβ.
Could all of the bacterial sequences that fall into the TyrAβ subhomology group be explained as acquisitions from archaeal or eukaryotic donors via LGT? If so, multiple LGT events would have had to occur independently in different bacterial lineages since those Bacteria whose sequences belong to the TyrAβ subhomology grouping do not cluster together in a common lineage. None of the seven cohesion groups within the TyrAβ subhomology grouping that have bacterial membership contain a sequence of the Archaea or Eukaryota that would implicate an LGT donor. This, of course, is also true for the two bacterial orphan sequences present in the TyrAβ subhomology grouping. Since genomic sampling is still quite minimal in the Archaea, it is possible that the LGT donors are simply unknown. However, the probability of this is lessened considering that a donor has not materialized on nine different occasions.
are postulated to function independently of any protein partners. In a previous paper (67), it was noted that some TyrA sequences, such as that from E. coli, possessed distinctive indel structuring (insertions and deletions) in alignments with what are here called TyrAα
subhomology group members. The above-described types of sequences (herein named TyrAβ) were originally named TyrAc_
(cyclohexadienyl dehydrogenases that have indel structuring). The previous TyrAc_
designation is herein abandoned in favor of the current TyrAβ designation (one which does not imply any substrate specificity). This indel hypothesis is stimulated largely by experimental work with E. coli and some close relatives. Thus, TyrA from E. coli (and all other lower Gammaproteobacteria) is fused at the N terminus with chorismate mutase (AroHI). Chen et al. (18) demonstrated that neither chorismate mutase nor cyclohexadienyl dehydrogenase reactions of E. coli are fully competent when isolated from one another. Sun et al. (71) cited a variety of other documentation to suggest that the two fused domains are functionally dependent. There is the suggestive correlation that lower Gammaproteobacteria have the fusion and belong to TyrAβ, whereas the closely related upper Gammaproteobacteria lack the fusion and belong to TyrAα. Xanthomonas and Xylella species (TyrCG-4) are exceptions among the upper Gammaproteobacteria in that they belong to the TyrAβ subhomology grouping. However, these TyrA species exhibit another fusion pattern: a C-terminal fusion with ACT, a broadly distributed regulatory domain. The intruder TyrA sequences present in species of Nostoc which are derived from the lower Gammaproteobacteria lineage possess an N-terminal extension that appears to be a remnant of the fused chorismate mutase, otherwise found in TyrCG-1. Key catalytic residues needed for chorismate mutase activity have not been conserved. It is interesting to consider that the extension nevertheless persists in order to maintain the domain-domain interactions proposed for TyrAβ enzyme species. This would be worthwhile to test experimentally since one can potentially evaluate what regions are needed to support TyrA activity without complications related to chorismate mutase activity. In addition to fusions with AroHI and the ACT domain, other members of TyrAβ exhibit fusions with a domain called REG (67) or have sequence extensions that may be unknown regulatory domains. Thus, cohesion groups that fall within the TyrAβ subhomology grouping consist of sequences that have experienced a wide variety of different and independent indel events postulated to be associated with functional domain-domain interactions. This variety plus normal phylogenetic divergence explain the separation of cohesion groups within the TyrAβ subhomology grouping. However, at the broadest level, the cohesion group members of TyrAβ have converged because they have in common the indel disruption of highly conserved motifs that are shared by members of TyrAα.
The indel hypothesis does not require that members of the TyrAα subhomology group lack TyrA fusions and that members of the TyrAβ subhomology group possess TyrA fusions, although this is certainly the trend (Fig. 6, panels 13 and 14). In some cases, TyrAα members do carry a fusion. Here, TyrA is presumably not dependent upon the fused domain for function. In support of this, Xie et al. (77) showed that TyrA from Pseudomonas stutzeri was not affected when separated from its AroF fusion partner. It is also suggestive in this context that closely related species of Burkholderia share membership in TyrCG-7 (TyrAα subhomology grouping), even though some of them possess a fusion of tyrA with aroF and some do not. In those cases where TyrAβ members have no fusion or sequence extensions, we suggest that these associate with another protein to form a complex and that such protein-protein contacts are functionally important. Panel 12 of Fig. 6 shows that the length of the core supradomain is typically shorter in members of TyrAβ than in members of TyrAα, an observation that is consistent with indel deletions that might be compensated for by an extradomain protein partner region.
Sequence convergence following the independent fusion of interacting domains in widely separated organisms was demonstrated (78) in a simpler case where only two interacting domains were involved. Xie et al. (78) showed that four different and large TrpAa (anthranilate synthase aminase) cohesion groups were populated by sequences from the Actinobacteridae, Cyanobacteria, upper Gammaproteobacteria/Betaproteobacteria, and Alphaproteobacteria, respectively. Four TrpAb (anthranilate synthase amidotransferase) cohesion groups were populated by sequences from exactly the same organisms. However, several organisms in each of the former taxa possessed TrpAa and TrpAb domains, which were fused to one another and which did not belong to the expected cohesion groups made up of free-standing TrpAa or TrpAb domains. In comparison with the four separated positions of free-standing TrpAa domains on a phylogenetic tree, the fused TrpAa- domains were all clustered together on a divergent branch of the tree. (The hyphen and its placement signify a fusion at the C terminus.) Similarly, in comparison with the positions of free-standing TrpAb domains on a phylogenetic tree, all of the fused -TrpAb domains were clustered together on one divergent branch of the tree. Evidence that TrpAa-TrpAb fusions have occurred independently as many as seven times and that the convergence observed for sequences from diverse taxa is the consequence of rigid constraints imposed for proper protein-protein interactions of these subunits was presented (76).
members are catalytically competent, in contrast to isolated supradomain regions of TyrAβ members, which are predicted to require contacts with extra-TyrA protein domains. In the case of E. coli, the fused chorismate mutase (AroHI) has a reciprocal dependence upon the fused TyrA for normal function. This raises the question of whether fused chorismate mutases and free-standing chorismate mutases of the AroHI homology class would also exhibit a bifurcated divergence similar to the TyrAα/TyrAβ dichotomy. This is certainly worthy of further examination.
Subhomology Group Members

Key variables of interest are TyrA crystals bound with any substrate for which it has catalytic competence. Given that enzymes specific for cyclohexadienyl substrate and pyridine nucleotide cofactor are known to occur in all combinations, this alone generates a qualitative total of nine comparative possibilities. An enzyme such as that from Ralstonia solanacearum (TyrCG-7) has roughly equal capabilities with NAD+ and NADP+ as well as roughly equal capabilities with L-arogenate and prephenate. Hence, there are four protein-substrate combinations that can be analyzed from this single TyrA species, each of which should be informative in comparison with TyrA proteins that can be selected for the various appropriate narrow specificities. Another dimension of complexity is that many broad-specificity TyrA species have order-of-magnitude preferences for one substrate or for one cofactor. These quantitative differences must have discernible parallels at the molecular level that distinguish them from the absolutely specific TyrA proteins or from broad-specificity TyrA proteins that accept alternative substrates about equally well.
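The counts of nine and four given above follow from simple enumeration, as the short sketch below shows. It assumes three qualitative classes for each specificity (two strict classes plus a broad class), in line with the three cyclohexadienyl specificity patterns described in this review; the variable names are illustrative only.

```python
from itertools import product

# Three qualitative classes for each kind of specificity (assumed labels).
CYCLOHEXADIENYL_CLASSES = ["prephenate specific", "broad", "L-arogenate specific"]
COFACTOR_CLASSES = ["NAD+ specific", "broad", "NADP+ specific"]

# Every pairing of a substrate class with a cofactor class.
enzyme_classes = list(product(CYCLOHEXADIENYL_CLASSES, COFACTOR_CLASSES))

# A doubly broad enzyme (as described for Ralstonia solanacearum) can be
# examined with each substrate in combination with each cofactor.
SUBSTRATES = ["L-arogenate", "prephenate"]
COFACTORS = ["NAD+", "NADP+"]
complexes_for_broad_enzyme = list(product(SUBSTRATES, COFACTORS))

if __name__ == "__main__":
    print(len(enzyme_classes))              # 9
    print(len(complexes_for_broad_enzyme))  # 4
    for substrate, cofactor in complexes_for_broad_enzyme:
        print(f"{substrate} + {cofactor}")
```

The enumeration is purely qualitative; it does not capture the order-of-magnitude preferences among broad-specificity enzymes that the paragraph goes on to describe.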
Ideal TyrA candidates for initial crystal studies are those that have been well characterized, are produced from organisms with complete genomes, and have core supradomains that are uncomplicated by fused catalytic or regulatory domains. Examples of such organisms selected from the TyrAα
subhomology grouping are Zymomonas mobilis (broad-specificity cyclohexadienyl dehydrogenase with a preference for L-arogenate) (NAD+ specific), Aquifex aeolicus (cyclohexadienyl dehydrogenase markedly favoring prephenate) (NAD+ specific), Rhodopseudomonas palustris (cyclohexadienyl dehydrogenase with a marked preference for prephenate) (NADP+ specific), Ralstonia eutropha (cyclohexadienyl dehydrogenase) {broad cofactor specificity [NAD(P)+]}, Neisseria gonorrhoeae (cyclohexadienyl dehydrogenase with marked preference for prephenate) (NAD+ specific), Nitrosomonas europaea (L-arogenate specific and NADP+ specific), Corynebacterium glutamicum (L-arogenate specific, with a marked preference for NADP+ over NAD+), Synechocystis sp. (L-arogenate-specific and NADP+ specific), Gluconobacter oxydans (prephenate specific and NADP+ specific), and Clostridium difficile (prephenate specific and NAD+ specific). Although many additional TyrA proteins from organisms whose genomes unfortunately are not yet sequenced have been well characterized, it seems likely that this will be largely ameliorated in the near future, considering the high and increasing rate of genome sequencing.
Although a well-spaced phylogenetic selection of TyrA proteins is generally desirable, in some cases, it might also be worthwhile to select TyrA proteins from a single cohesion group that have variant properties of substrate selectivity. This can be comparable to the approach of selecting specificity mutants for comparison with the wild-type parent in order to carry out structural analysis. For example, the entire cyanobacterial phylum possesses a TyrA member belonging to a single cohesion group (TyrCG-16). An extensive enzymological comparison indicated that most, if not all, cyanobacterial TyrA enzymes can utilize L-arogenate and NADP+ as substrates (29). Although some are absolutely specific for these two substrates, cyanobacteria frequently express broad-specificity enzymes that are capable of utilizing NAD+ (albeit always less well than NADP+). Less commonly, broad specificity for the cyclohexadienyl substrate exists, although L-arogenate is always utilized better than prephenate. (At one extreme, Synechocystis sp. strain PCC7509 uses prephenate 48% as well as L-arogenate at substrate saturation.)
A second example that offers interesting comparative possibilities is the collection of TyrA proteins from the Betaproteobacteria. All members of TyrCG-7, TyrCG-8, and TyrCG-10 and four orphans (Table 2) are broad-specificity cyclohexadienyl dehydrogenases that have the broad cofactor specificity motif pattern 36GxxRS40 (Fig. 4). Members of TyrCG-11 possess narrowed specificity for both substrates, being L-arogenate specific and NADP+ specific. On the other hand, members of TyrCG-9 possess the opposite pattern of narrowed specificities, being NAD+ specific and exhibiting a very marked preference for prephenate as the cyclohexadienyl substrate.
subhomology group, relatively few TyrA enzymes from the TyrAβ subhomology group have been characterized. Of course, TyrAc from E. coli is an obvious choice because of the abundance of experimental work with it, including evidence upon which the indel hypothesis is based (see references 11 and 71 and references therein). TyrAc from E. coli and TyrAc from Aquifex aeolicus should be a good comparative match as selections taken from the TyrAβ and the TyrAα subhomology groups, respectively. Each of these is NAD+ specific, and each is a cyclohexadienyl dehydrogenase that has a marked preference for prephenate as a substrate. Each is sensitive to L-tyrosine inhibition.
Xanthomonas campestris and other members of TyrCG-4 are upper Gammaproteobacteria that possess a TyrA enzyme with a C-terminal ACT domain, with the latter perhaps being responsible for placement in the TyrAβ subhomology grouping. (Note that the presence of an attached ACT domain does not necessarily mean that a so-endowed TyrA species will be in the TyrAβ subhomology grouping since many gram-positive bacteria in the TyrAα subhomology grouping, e.g., all members of TyrCG-18, have an ACT domain.) In contrast to the members of TyrCG-4, all other upper Gammaproteobacteria (TyrCG-2, TyrCG-3, TyrCG-5, TyrCG-6, and five orphans) lack an attached ACT domain and belong to the TyrAα subhomology grouping. X. campestris TyrA has been characterized as being NAD+ specific and broadly specific for cyclohexadienyl substrate. The best match for this substrate profile among the upper Gammaproteobacteria in the TyrAα subhomology grouping would be TyrAc produced by any of three orphans: Acidithiobacillus ferrooxidans, Methylococcus capsulatus, or Nitrosococcus oceani. The TyrA protein from Coxiella burnetii might also be worth considering for comparison. Like the X. campestris protein, it belongs to the TyrAβ subhomology grouping, but it lacks an ACT domain. This TyrA species is NAD+ specific, but its cyclohexadienyl specificity is uncertain. Also, we cannot be sure that this TyrA enzyme is a native upper Gammaproteobacteria protein since it resides in TyrCG-26, which is an unresolved phylogenetic mixture.
Finally, TyrA proteins from higher plants (TyrCG-95) are well characterized as being L-arogenate-specific and NADP+-specific enzymes. Since the Synechocystis sp. strain PCC6803 enzyme (TyrAα subhomology group) has the same specificity profile as TyrA from organisms such as Arabidopsis thaliana (TyrAβ subhomology group), X-ray crystal comparative studies should be illuminating.
Comparison of reasonably close sets of TyrA proteins that differ in being resistant or sensitive to product inhibition could give insight into residue contacts that are important for binding of the ring carboxylate. For example, a reasonable choice for comparison might be two TyrA members of the Betaproteobacteria. TyrAc enzymes from Acidovorax facilis (TyrCG-10) and Burkholderia cepacia (TyrCG-7) are very similar in having broad specificities for the two cyclohexadienyl substrates and broad specificities for cofactor. The alternative substrates and alternative cofactors are accepted about equally well. However, the A. facilis enzyme is totally refractive to product inhibition, whereas the B. cepacia enzyme is sensitive to product inhibition. Sun et al. (71) pointed out that a glycine-rich region, 273-GGG-275, immediately preceding the 277-RxxxR-284 motif of Aquifex TyrA, seems to play a critical role in positioning 278-D' into the active site within interacting distance of the ring carboxylate of prephenate (numbering as given in Fig. 3). 274-GG-275 of the glycine-rich region appears to be particularly conserved. It may be significant that TyrA enzymes from organisms (Pseudomonas aeruginosa, Burkholderia cepacia, Ralstonia solanacearum, and Ralstonia eutropha) that have been shown to be sensitive to product inhibition all possess 274-GG-275. In contrast, TyrA enzymes from Zymomonas mobilis (GS), Acidovorax sp. (PG), Nitrosomonas europaea (SS), and Rubrivivax gelatinosus (PG) are not inhibited by the reaction product and lack the GG signature.
TyrAc from Aquifex aeolicus, one of the two TyrA proteins for which X-ray crystal studies exist (71), has a marked preference for prephenate and is NAD+ specific. Since it is quite sensitive to tyrosine inhibition (11), one would expect even greater sensitivity to inhibition by 4-hydroxyphenylpyruvate, but this was not tested. This TyrA sequence is currently an orphan sequence, so comparisons with relatively close orthologs are not yet possible. The second subject of an X-ray crystal study is Synechocystis sp. (48). This L-arogenate-specific, NADP+-specific enzyme was reported to be insensitive to inhibition by L-tyrosine. Unfortunately, this is at odds with a report by Bonner et al. (10), who detailed good sensitivity of TyrAa from the same strain to competitive inhibition by L-tyrosine. Enzymes that become selectively desensitized to inhibition while maintaining catalytic competence are known, but these usually are enzymes that have a distinct allosteric domain (or subunit). Legrand et al. (48) suggested that the difference might be due to "mutations" in four amino acids very near the C terminus. However, this apparent difference in sequence was due to an inadvertent transposition of a glutamine residue in the preparation of Fig. 7 in the paper by Bonner et al. (10). A substantial amount of comparative enzymology (including determinations of sensitivity to inhibition by L-tyrosine) was done with TyrA species of various named Synechocystis species (29). Organismal differences in substrate specificity and sensitivity to inhibition by L-tyrosine that would fit the results of either research group were observed. Unfortunately, Synechocystis sp. strain PCC6803 was not included in the latter study. Hence, whether the TyrAa enzyme from Synechocystis sp. strain PCC6803 is sensitive or refractive to product inhibition by L-tyrosine must await further experimental clarification.
Still other features deemed to have significance could be used as selection criteria for organisms chosen as a source of TyrA protein. These features would not necessarily be independent of some of the above-described considerations. For example, the motif RxxxR has been discussed above as a character state that has been suggested in the X-ray crystal study described by Sun et al. (71) to be important in the mechanism employed by the TyrA protein of Aquifex aeolicus. The idea has been presented that in proteins belonging to the TyrAβ subhomology family (10, 71), this motif has been disrupted by extra-TyrA contacts extended from an attached or complexed domain. This is consistent with the near-total conservation of this motif throughout the TyrAα subhomology grouping and with its near-total absence in proteins belonging to the TyrAβ subhomology grouping. Thus, this motif seems intimately relevant to the second perspective described above. Scrolling through the extended table online shows that exceptions in the TyrAα subhomology grouping whereby the motif is disrupted include one member of TyrCG-5, some members of TyrCG-16, two members of TyrCG-11, the Flavobacteria component of TyrCG-13, half the members of TyrCG-16, most members of TyrCG-24, and one of the two members of TyrCG-31. Comparison of a motif-present member with a motif-absent member in the latter cohesion groups might be of particular value because the motif difference seen in each pair exists in a background of close phylogeny.
The X-ray crystal study of TyrA from Aquifex aeolicus (71) indicated that the RxxxR motif comprises part of an ionic network, which was proposed to support a gated mechanism for the access of substrate to the active site. However, the X-ray crystal study of TyrA from Synechocystis sp. (48) asserted that this patch of basic residues does not seem to play a critical role in the binding of substrate. Synechocystis sp. belongs to TyrCG-16, a cohesion group that contains a total of 16 cyanobacteria. Although the subject of the X-ray crystal study has the motif, it is absent in 10 members of TyrCG-16. This suggests the possibility that the presence of the motif in some cyanobacteria may be only coincidental, and it may not have the functional significance that generally applies in the TyrAα subhomology grouping. It was also noted (71) that the rightward R residue of the motif (R284 in Fig. 3) forms an ion pair with E160. In this context, it may be significant that the latter residue is completely conserved (sometimes substituted with a D) in the TyrAα subhomology grouping with only two exceptions, one of them being TyrA from Synechocystis sp.
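The presence/absence bookkeeping for the RxxxR motif can be illustrated with a short sketch. It is a hypothetical helper, not the procedure used to build the extended table: the function names are assumptions, and a plain pattern scan reports every R-x-x-x-R occurrence, whereas the motif of interest occupies one specific aligned position (its rightward residue is R284 in Fig. 3). The second function therefore checks a caller-supplied alignment column and assumes that the five-column window contains no gap characters.

```python
import re

# Lookahead so that overlapping occurrences (e.g. RAAARAAAR) are all reported.
RXXXR = re.compile(r"(?=(R[A-Z]{3}R))")


def find_rxxxr(seq):
    """Return 0-based start positions of every R-x-x-x-R occurrence in seq."""
    return [m.start() for m in RXXXR.finditer(seq.upper())]


def has_motif_at(aligned_seq, start):
    """Check whether aligned columns start and start+4 are both arginine.

    Assumption: `start` is the 0-based alignment column of the leftward R,
    and the window start..start+4 is ungapped in this sequence.
    """
    window = aligned_seq[start:start + 5]
    return len(window) == 5 and window[0] == "R" and window[4] == "R"
```

For a made-up fragment such as `"MKRAAARGG"`, `find_rxxxr` reports position 2, and `has_motif_at("MKRAAARGG", 2)` is true; replacing the second arginine with lysine makes both checks fail.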
Example 1. Suppose that one chooses to think about the Gammaproteobacteria (a taxon at the level of class) in terms of how it has diverged into cohesion groups, where these cohesion groups belong in terms of the two primary subhomology groupings, and what the distribution pattern is for the RxxxR motif. If panels 2 and 16 of Fig. 6 are displayed side by side using the tool at the SEED platform (http://theseed.uchicago.edu/FIG/Html/TyrAPanels.html), 10 cohesion groups of Gammaproteobacteria are visualized in panel 2, and these can be compared to the presence or absence of the RxxxR motif in panel 16. The lower Gammaproteobacteria populate a single cohesion group (TyrCG-1) within the TyrAβ subhomology grouping. TyrA sequences from the upper Gammaproteobacteria separate into 10 cohesion groups, four of which are orphans. TyrA from Coxiella burnetii is a member of an unresolved phylogenetic mixture (TyrCG-26) and, together with members of TyrCG-4, is the only sequence from the upper Gammaproteobacteria that belongs to the TyrAβ subhomology grouping. The C. burnetii sequence as well as all members of TyrCG-1 and TyrCG-4 lack the RxxxR motif, as is typical of the TyrAβ subhomology grouping. As expected of the TyrAα subhomology grouping, all of the remaining members of the upper Gammaproteobacteria possess the RxxxR motif, except for one member of TyrCG-5 (produced by Thermochromatium tepidum). Hence, it would be attractive to have definitive X-ray crystal results with the TyrA enzymes from (i) T. tepidum (exceptional in lacking the RxxxR motif), (ii) a member of the same cohesion group (where the motif is present), (iii) one of the other upper Gammaproteobacteria cohesion groups of the TyrAα subhomology grouping (motif is present), (iv) Xanthomonas campestris (a member of the upper Gammaproteobacteria belonging to the TyrAβ subhomology grouping and lacking the motif), and (v) E. coli (lower Gammaproteobacteria, TyrAβ subhomology grouping, and the motif is absent). These five choices offer potential for a wealth of comparative information that will reveal structural ties to functional properties. Evidence supporting refined evolutionary conclusions can also be anticipated. For example, one can speculate that the separation of the lower Gammaproteobacteria from the upper Gammaproteobacteria correlated with the attachment of a chorismate mutase domain to TyrA in the former group. Also, more recently, in the Xanthomonas/Xylella lineage, the attachment of an ACT domain to TyrA occurred after its divergence from the other upper Gammaproteobacteria. These two independent fusion events presumably account for membership of the latter TyrAs in the TyrAβ subhomology grouping. The uncoupling of motif presence with membership within the TyrAα subhomology group in a single organism (T. tepidum) whose close phylogenetic relatives in TyrCG-5 have maintained the normal coupling should be instructive.
Example 2. Suppose panel 3 of Fig. 6 is viewed in parallel with panel 16. Panel 3 highlights cohesion groups and orphans that are represented by TyrA proteins from the Alphaproteobacteria, the Deltaproteobacteria, and the Epsilonproteobacteria. By using the snapshot tool and using the links to navigate to the extended table where necessary, one can develop a rationale for TyrA selections from these three classes of Proteobacteria that might be the most informative with respect to the significance of the RxxxR motif.
The Alphaproteobacteria mostly populate TyrCG-12 in the TyrAα subhomology grouping, where they consistently possess the RxxxR motif. One can see just by considering TyrCG-12 alone that the significance of this motif has some broader meaning than a relationship to substrate/cofactor specificity in view of the widely different specificities previously described for organisms such as Zymomonas mobilis, Rhodopseudomonas palustris, and Gluconobacter oxydans, all members of TyrCG-12. In spite of its overall sequence divergence from most other Alphaproteobacteria, the TyrAα orphan Pelagibacter ubique also possesses the RxxxR motif. The three members of TyrCG-30 are the only Alphaproteobacteria present in the TyrAβ subhomology group, and all of them lack the motif. Thus, at the taxon level of class, TyrA proteins from the Alphaproteobacteria have diverged to form one orphan, one small cohesion group, and one large cohesion group. Only the small cohesion group belongs to the TyrAβ subhomology grouping, and this correlates perfectly with the lack of the RxxxR motif. Both Rhodospirillum rubrum and Silicibacter pomeroyi possess paralog members of TyrCG-12 and TyrCG-30, so a comparison of one of these two paralog pairs should also be rewarding.
TyrA proteins from the Deltaproteobacteria populate three cohesion groups. Most of them are in TyrCG-14, which occupies the TyrAα subhomology grouping, and these have the RxxxR motif and are the only Deltaproteobacteria that are NAD+ specific. Members of TyrCG-15 and an orphan from Syntrophobacter fumaroxidans belong to the TyrAβ subhomology grouping and lack the motif, as expected. Selection of one TyrA from each of the two subhomology groupings yields a pair where the TyrAβ subhomology grouping member possesses a core supradomain length that is shortened (Fig. 6, panel 12) and yet where there are no extra core extensions (panel 14) or domain fusions (panel 13). Biochemical work may show that this TyrA protein partners with another protein that makes contacts in the complex that is needed for maximal activity. The TyrA enzymes of Epsilonproteobacteria all belong to a single cohesion group, TyrCG-13. This cohesion group also contains all members of the class Flavobacteria. It was concluded above that genes encoding the TyrA enzymes from Flavobacteria arrived from a donor in the Epsilonproteobacteria lineage via LGT. TyrCG-13 is in the TyrAα subhomology grouping. It is interesting that all TyrA members from the Epsilonproteobacteria possess the RxxxR motif, whereas all those from the Flavobacteria lack the motif. The latter members do not seem to have a shortened supracore domain, so it may be that in this case, the absence of the motif does not indicate a domain-domain interaction. If so, alternative amino acid contacts may substitute here for the otherwise highly conserved RxxxR motif.
grouping is not dependent upon an attached domain even if such fusions are present. It is predicted that the removal of such attached domains will not directly affect the catalytic reaction. This has in fact been shown for the tyrAc-aroF fusion of Pseudomonas stutzeri, where removal of the C-terminal AroF catalytic domain had no effect upon the remaining TyrA domain (77). In addition, TyrCG-7 contains 11 members, only three of which possess a tyrA-aroF fusion. This recent fusion has not distanced the TyrA domain of the small clade of Burkholderia species that contain it from the unfused TyrA domains of the sister Burkholderia species and species of Ralstonia that occupy the cohesion group. If new indel contacts had developed in the newly evolved TyrA-AroF protein to create interdependent domains, one would expect these TyrA domains to have diverged away from the unfused TyrA domains in TyrCG-7.
TyrA proteins frequently possess a C-terminal ACT domain, as exemplified by the well-studied Bacillus subtilis enzyme (17), which belongs to the TyrAα subhomology grouping. It would be quite interesting to examine this enzyme following the removal of the ACT domain, which is an allosteric domain. This amino acid binding domain presumably accounts for the sensitivity of B. subtilis TyrAp to inhibition by L-tyrosine, L-phenylalanine, L-tryptophan, and D-tyrosine. Removal of the ACT domain should abolish these amino acid sensitivities, leaving only the sensitivity to inhibition by 4-hydroxyphenylpyruvate intact. This expectation is enhanced by the fact that exactly these properties were obtained with the selection of a D-tyrosine-resistant, tyrosine-excreting mutant in 1970 (17). Similar opportunities for examining the effects of removing a C-terminal ACT domain exist in other cohesion groups belonging to the TyrAα subhomology grouping, e.g., TyrCG-20, TyrCG-21, and TyrCG-22.
In contrast with the above-described expectations for TyrA proteins belonging to the TyrAα subhomology grouping, experimental truncations that remove attached catalytic or regulatory domains of TyrA proteins belonging to the TyrAβ subhomology grouping are expected to impact TyrA catalysis directly. This has already been demonstrated following removal of the N-terminal chorismate mutase domain from E. coli aroHI-tyrAc (18), and X-ray crystal results that demonstrate the domain-domain contacts projected by Bonvin et al. (11) would be most welcome. Xanthomonas campestris and other members of TyrCG-4 possess a C-terminal ACT domain, just like B. subtilis and other members of TyrCG-18. Since the former and latter represent the TyrAβ and TyrAα subhomology groupings, respectively, the differences in how this allosteric domain interacts should be fascinating. Another attached regulatory domain of potential interest is the C-terminal REG domain present in members of TyrCG-80 (Euryarchaea_1).
Tyrosine pathway cohesion groups and tryptophan pathway supercohesion groups cannot be expected to correspond with one another perfectly for the following reasons. First, intruder sequences that become established in a given organism for one pathway will not generally be present for another pathway. Second, the sequence length and degree of conservation of the protein(s) upon which cohesion groups are based will dictate different relative resolving powers. Because the Trp enzyme concatenate trees are more robust than the single-enzyme TyrA trees, it is expected that some Trp supercohesion groups would correspond to multiple TyrA cohesion groups. Finally, aside from the differential resolving powers of the particular proteins used to make trees, dynamic evolutionary changes that sometimes occur in a short time frame (evolutionary jumps) drive accelerated divergence that leads to separated cohesion groups or supercohesion groups. Thus, for example, TrpSCG-6 contains concatenates from Bacillus subtilis, B. stearothermophilus, and B. halodurans that are clearly separated from concatenates from other Bacillus species and from certain sister firmicute species (Lactococcus/Listeria/Staphylococcus/Streptococcus) that populate TrpSCG-7 (80). Dynamic and recent evolutionary events in the smaller clade that have driven rapid divergence are the insertion of the trp operon into a six-gene aro operon; the loss of a gene encoding a histidine pathway aminotransferase from the histidine operon, forcing an aromatic aminotransferase in the aro operon to take on a dual function; and the loss of trpAb from the trp operon, forcing pabAb to assume a dual function. In contrast, TyrCG-18 is a large cohesion group that contains TyrA members from all of the organisms corresponding to TrpSCG-6 and TrpSCG-7. Thus, on the one hand, the B. subtilis/B. halodurans/B. stearothermophilus trio has experienced an evolutionary jump that led to a dramatic divergence with respect to the tryptophan pathway (see "Intra-Cohesion-Group Intruders" above for a proposed scenario for this evolutionary jump). On the other hand, only a shallow, graded divergence occurred for TyrA throughout this large clade of firmicutes, with the result that TyrA from the B. subtilis/B. halodurans/B. stearothermophilus trio occupies a common cohesion group with TyrA proteins from Bacillus, Listeria, Staphylococcus, Streptococcus, and Lactococcus.
In previous studies of the tryptophan pathway (78, 80), a substantial fraction of the genomes and the corresponding taxonomic representation were absent compared to the much greater abundance of genomes available for the TyrA cohesion group study. Thus, in the following sections, discussion is limited to those TyrA cohesion groups existing in organisms where Trp supercohesion groups were also studied.
TrpSCG-1 contains whole-operon intruders that reside in contemporary Helicobacter pylori and in coryneform bacteria. TyrCG-1 contains intruder sequences that reside in species of Nostoc (a lineage within cyanobacteria). The trp operon LGT events resulted in a total displacement of the native trp genes, but the functional role of performing L-tryptophan biosynthesis remained exactly the same. In contrast, the tyrA intruders in Nostoc did not displace the native orthologs and are thought to exercise another functional role in secondary metabolism (68). Each of the three LGT events was relatively recent, since the intruder sequences in H. pylori are absent from other Epsilonproteobacteria, those present in coryneform bacteria are absent from other actinomycete bacteria, and those present in Nostoc are absent from other cyanobacteria.
Whereas TrpSCG-2 contains two cases of partial-pathway operon LGT, no intruders have so far been found to be present in any of the 20 TyrA cohesion groups that populate the upper Gammaproteobacteria and Betaproteobacteria (although, as mentioned above, the TyrA protein from C. burnetii in TyrCG-26 could possibly be a xenolog intruder).
subhomology grouping). The other, from Desulfovibrio desulfuricans, occupies TyrCG-15 along with one other sequence (TyrAβ subhomology grouping). Three additional Deltaproteobacteria contain TyrA sequences that do not belong to the former two cohesion groups. The TyrA sequence from Syntrophobacter fumaroxidans is an orphan (TyrAβ subhomology grouping); TyrA from Anaeromyxobacter dehalogenans belongs to TyrCG-27 (TyrAα subhomology grouping), which is an unresolved phylogenetic mixture; and TyrA from Myxococcus xanthus is a xenolog intruder of TyrCG-30 (TyrAβ subhomology grouping). Identification in painstaking detail of qualitatively different character states of genes and their encoded products, their evolutionary progression in the vertical genealogy, and evolutionary acquisitions made via LGT can feasibly be accomplished for relatively small metabolic segments, such as the individual terminal branches of aromatic biosynthesis. Once coverage is completed for the entire pathway, including the minor vitamin-like branches, it should be apparent that evolutionary conclusions arrived at separately via steps that are essentially atomistic can be combined to describe evolutionary progressions at the whole-pathway level that reveal a larger gestalt of interlocking relationships. The next section illustrates examples of this approach.
Did the tyrA-aroF fusion occur on a single occasion in the upper Gammaproteobacteria? An inspection of Fig. 2 indicates that the position of TyrCG-5 is inconsistent with a single common ancestor that acquired the tyrA-aroF fusion. If the order of branching shown was correct, the fusion either occurred twice (once in the common ancestor of Microbulbifer degradans, TyrCG-6, and TyrCG-3 and once in the common ancestor of the members of TyrCG-2) or occurred once in the common ancestor of all the organisms hosting the fusion but was subsequently lost in the common ancestor of TyrCG-5.
Since the cohesion groups are defined such that there is little confidence in the order of branching, a tree that was based upon an alignment of all the TyrA-AroF fusion sequences with concatenated TyrA and AroF sequences from upper Gammaproteobacteria and Betaproteobacteria that lack the fusion was assembled (Fig. 7). This creates a much more reliable protein tree since AroF is a much longer and much more conserved protein than TyrA. The results shown in Fig. 7 are indeed consistent with an order of branching in upper Gammaproteobacteria such that a single tyrA-aroF fusion occurred in the common ancestor of Microbulbifer degradans and in the organisms hosting the members of TyrCG-2, TyrCG-3, and TyrCG-6. Hence, it seems clear that a recent tyrA-aroF fusion occurred in the ancestor of a clade of the upper Gammaproteobacteria and that an even more recent, second fusion occurred in the ancestor of a very small clade of the Betaproteobacteria. As suggested previously (35), this use of gene fusions has great potential for ordering phylogenetic progressions of related organisms. This section illustrates how analysis of the relationships between AroF (the sixth enzyme of the erythrose-4-phosphate to chorismate portion of aromatic biosynthesis) and TyrA provides a small glimpse of the potential of different subsystems to merge, thereby expanding to an ever-wider and more insightful view.
FIG. 8. Tracking milestone evolutionary events in the Actinobacteridae. The dendrogram for the subclass Actinobacteridae of the Bacteria (not drawn to scale) includes the family Bifidobacteriaceae of the order Bifidobacteriales (top) and the various families belonging to the order Actinomycetales. Character states asserted to exist in the common ancestor are indicated by orange encircled letters. More recent evolutionary events are shown as yellow encircled letters.
In this superphylum, all TyrA proteins are NAD+ specific and of unknown specificity for the cyclohexadienyl substrate (NADTyrAx). Both the TyrAα and TyrAβ subhomology groups are represented in this superphylum, and it is suggested (Fig. 9) that a single transition from TyrAα to TyrAβ occurred in a common ancestor of the classes Bacteroidetes and Sphingobacteria. After divergence of the two phyla, aroAIβ (encodes DAHP synthase) became fused with aroHI (encodes chorismate mutase) in the phylum Bacteroidetes (aroAIβ-aroHI). Members of the class Bacilli also have this fusion, albeit in the opposite orientation (aroHI-aroAIβ). The latter fusion has long been known to be the basis for a pattern of allosteric regulation (sequential feedback inhibition), whereby the substrate (chorismate) and product (prephenate) of chorismate mutase double up as feedback inhibitors of DAHP synthase (37). The putative ancestral operon (pheA>aspC>tyrA>aroAIβ>aroHI) joins all of the enzymes that divert chorismate to L-phenylalanine and L-tyrosine. AroHI can supply prephenate to both amino acid branches, and AspC undoubtedly is an aromatic aminotransferase that is capable of catalyzing very similar transaminase reactions in both amino acid branches. The fusion shown in Fig. 9 (aroAIβ-aroHI) can reasonably be considered to be a regulatory innovation. Based upon the rationale asserted by Xie et al. (80), an ancestral classical trp operon similar to that which still persists in the class Sphingobacteria is presumed to exist. Members of the class Flavobacteria have retained the original aro operon as well as the original trp operon (albeit with an open reading frame insertion between trpAa and trpAb). In the class Bacteroidetes, both the trp and aro operons have been slightly scrambled, with trpEb being translocated to the beginning of the operon and with an exchanged positioning of tyrA and aroAIβ-aroHI. The trp operon remains intact in the Sphingobacteria, but the aro operon has been disrupted. In the phylum Chlorobi, both the trp and aro operons have been dispersed.
The TyrCG-13 cohesion group is populated by TyrA sequences from not only the class Flavobacteria but also the class Epsilonproteobacteria. Hence, a fairly ancient event of LGT is implicated. As an isolated observation, it is difficult to know which of these two classes is likely to be the host of the intruder sequences and which is likely to be the donor. If these classes diverged within their phyla at different times, the most recently emerged class could not have been the LGT donor to a common ancestor of the more ancient class. The phylogenetic tree of organisms reported by Olsen et al. (57) shows the class Epsilonproteobacteria diverging from its sister classes of Proteobacteria at an earlier time than the class Flavobacteria diverged from sister classes of the phylum Bacteroidetes. It thus appears that the class Flavobacteria did not yet exist at the time of the common ancestor of Epsilonproteobacteria, and hence, no Flavobacteria could have been the donor. On the other hand, a member of Epsilonproteobacteria could have been an LGT donor of tyrA to a common ancestor of the Flavobacteria. If so, it appears that the resident tyrA gene was replaced by homologous recombination without disrupting the aro operon, of which tyrA is a member; i.e., the context of gene organization surrounding tyrA in the Flavobacteria fits into the larger context of the superphylum (Fig. 9). Thus, the aro operons of the class Flavobacteria (TyrCG-13) and the class Bacteroidetes (TyrCG-23) share the distinctive fusion gene and a nearly identical gene order.
The extent to which current annotations are correct depends upon generations of previous experimental work and is hugely assisted by a fraction of genes that are highly conserved and evolve in the face of many limitations and constraints due to their elegant and complex mechanisms. Within the aromatic pathway, an example would be 5-enolpyruvylshikimate-3-phosphate synthase, a highly specific enzyme that utilizes a complex catalytic mechanism. Such complexity facilitates reliable annotations. On the other hand, enzymes having the plasticity to catalyze broad-specificity reactions can be represented by entirely different homology groups or by distinctly different subhomology groups that can make functional predictions elusive. A multitude of proteins (exemplified by such enzymes as kinases, phosphatases, and dehydrogenases) that illustrate the many and varied challenges for correct calls of functional role exist. The TyrA protein family of dehydrogenases benefits from a treasure trove background of wide-ranging comparative enzymology. The current analysis, together with previous work, has been a labor-intensive effort. Comparable efforts are not easily fitted to goals of high-throughput annotations for thousands of sequences in many hundreds of organisms, hence the dilemma of rapid results achieved with a lesser quality of annotation accuracy than one would like. "Difficult" gene products require a labor-intensive effort as a useful step in order to generate and preserve the information needed to allow the rich array of bioinformatic tools available to succeed in increasing the quality of high-throughput annotation efforts.
In the initial tree of 347 trimmed sequences, nodes were collapsed at bootstrap values of 68%. An arbitrarily chosen member of the collapsed groups was selected as a representative sequence of that node position. The resulting 64 sequences were used to obtain a second Phylip tree, which yielded 60 sequences with the collapse of a few more nodes when a bootstrap value of 68% was applied as a cutoff. An additional repetition of this process resulted in a final tally of 58 cohesion groups. The ultimate collapsed tree (Fig. 2) exhibited nodes with bootstrap values below 58%.
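One round of the collapse step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run with Phylip: the `Node` class, the function names, and the toy tree in the usage note are hypothetical, "collapsed at 68%" is read here as "every maximal clade with bootstrap support of at least 68% becomes one group," and the arbitrary representative is simply taken to be the first member of each group. Rebuilding the tree from the representatives between rounds is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    support: float  # bootstrap support (%) of the clade below this node
    children: List[Union["Node", str]] = field(default_factory=list)


def leaves(item):
    """Return all leaf names beneath a node (a bare string is a leaf)."""
    if isinstance(item, str):
        return [item]
    names = []
    for child in item.children:
        names.extend(leaves(child))
    return names


def collapse(item, cutoff=68.0):
    """Split a tree into groups.

    Each maximal clade whose support meets the cutoff becomes one group;
    leaves outside any such clade become single-member (orphan) groups.
    """
    if isinstance(item, str):
        return [[item]]
    if item.support >= cutoff:
        return [leaves(item)]
    groups = []
    for child in item.children:
        groups.extend(collapse(child, cutoff))
    return groups


def representatives(groups):
    """Pick one member per group (here: the first, an arbitrary choice)."""
    return [group[0] for group in groups]
```

With a toy tree such as `Node(0.0, [Node(90.0, ["A", "B"]), Node(50.0, [Node(70.0, ["C", "D"]), "E"])])`, the sketch yields the groups `[A, B]`, `[C, D]`, and the orphan `[E]`, with `A`, `C`, and `E` as representatives for the next round.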
Navigating to and within the Protein Pages. The version of Fig. 2 installed at http://theseed.uchicago.edu/FIG/Html/tyrACGTree.html is the portal to hyperlinked cohesion group tables (the short Table 2 version or a comprehensive "extended table") that in turn are linked to the Protein Pages at the SEED. Each of the latter prominently displays a clickable graphic showing the location of a given tyrA gene within an array of flanking genes, and many links are provided to allow navigation to a variety of detailed bioinformatic information. Several tools are available. For example, one can ask for a comparison of the displayed gene organization with similar gene organizations present in other organisms. Mousing over any given cohesion group of Fig. 2 also delivers a drop-down menu that gives access to the relevant group of trimmed TyrA sequences.
One innovation in the extended table is a "gene neighborhood" button within each cohesion group section, which delivers a comparison of gene organization flanking tyrA within the cohesion group.
Sortable character state snapshots. The individual panels of Fig. 6 can be viewed at http://theseed.uchicago.edu/FIG/Html/TyrAPanels.html. Choosing "compare TyrA panels" activates an option to compare up to three side-by-side panels. For example, one might want to choose and display the Proteobacteria (Fig. 6, panel 2) side by side with the view of cofactor specificities (panel 7) or with instances of gene fusion (panel 9). These individual, sortable panels identify the cohesion group numbers for all of the cohesion groups that are color coded. One can then view the complete membership of any cohesion groups of interest by linking to the extended table via links provided at the top of the screen. A JavaScript magnifying tool is provided when mousing over a given panel with the cursor. Depression of the up or down key on the keyboard increases or decreases the zoom ratio, respectively. Depression of the right or left key increases or decreases the window size, respectively.
Semiautomation of cohesion groups. An important accomplishment would be to lock in and build upon the manual effort represented by this project with continuing semiautomatic follow-up. The technology to support the creation, curation, and advanced development of subsystems at the SEED was described previously (60). Tools to preserve the trimmed sequence alignment, accurately add newly available sequences, and update the tree and cohesion group assemblages are being implemented.
A universal four-letter system for coding organisms to the species level with unambiguous acronyms has been developed (the first letter of the genus in capital letters followed by the first three letters of the species in lowercase type). When necessary to disambiguate a four-letter acronym, a number is attached. For example, Escherichia coli is designated Ecol, whereas Enterococcus columbae is designated Ecol-1. If the species has not been determined, the first four letters of the genus are used (all in caps). To find a given four-letter acronym associated with an organism, a list of organisms currently in the system can be browsed by clicking the link under "organisms" entitled "browse organism acronyms" at the AroPath home page. Each organism is hyperlinked to the NCBI taxonomy browser. Each species entry can be expanded to show all of the component strains and their corresponding absolute acronyms (see below).
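The naming rule described above can be sketched in a few lines. This is not the AroPath implementation: the function names are hypothetical, and the sketch assumes that disambiguating numbers are handed out in the order in which organisms are registered, which the text does not specify.

```python
def species_acronym(genus, species=None):
    """Four-letter code: genus initial (capital) + first three species letters.

    If the species is undetermined, the first four letters of the genus
    are used, all in capitals.
    """
    if not species:
        return genus[:4].upper()
    return genus[0].upper() + species[:3].lower()


def assign_acronyms(organisms):
    """Assign acronyms to (genus, species) pairs, numbering any collisions.

    Assumption: the first organism registered keeps the bare acronym and
    later collisions receive -1, -2, ... in registration order.
    """
    seen = {}
    assigned = {}
    for genus, species in organisms:
        base = species_acronym(genus, species)
        count = seen.get(base, 0)
        seen[base] = count + 1
        assigned[(genus, species)] = base if count == 0 else f"{base}-{count}"
    return assigned
```

Registering Escherichia coli before Enterococcus columbae reproduces the designations given in the text (Ecol and Ecol-1, respectively).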
In addition, a tool to generate an acronym that is unique at the level of a specific strain, designated an absolute acronym, is provided. A given strain or list of strains can be uploaded to AroPath by clicking the link under "organisms" entitled "get absolute acronym." This will enable the return of an absolute acronym that is a unique identifier at the strain level. Any strain for which an absolute acronym has not been previously requested will automatically be assigned a unique designation, which will be held permanently in the database.
Finally, a useful tool is provided to amend personal sequence files to be used for obtaining multiple sequence alignments and phylogenetic trees such that key acronym information for both organism and protein are displayed in the sequence names. FASTA sequence files can be uploaded to AroPath by clicking the link under "organisms" entitled "convert sequence files," and a converted output will be returned. For example, a sequence name returned that begins ">Ecol_J_F_AroA_b," when used as input in a tree-building program, will appear in that form as an informative label. It will indicate that the sequence is from Escherichia coli (Ecol) strain CFT073 (_J), that the sequence is from a finished genome (_F) rather than an unfinished genome (_U), and that the sequence is one of multiple AroA paralogs (AroA_b). If a hypothetical organism possessed a single gene product, two paralogs, or three paralogs, the corresponding designations would be AroA; AroA_a and AroA_b; and AroA_a, AroA_b, and AroA_c.
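A converted sequence name of this form can be taken apart programmatically, for example to group tree labels by organism or genome status. The sketch below is an illustration only: the `SeqLabel` type and both function names are hypothetical, and it assumes that the fields are separated by underscores and that no field itself contains an underscore.

```python
from typing import NamedTuple, Optional


class SeqLabel(NamedTuple):
    organism: str           # four-letter organism acronym, e.g. "Ecol"
    strain_code: str        # strain identifier, e.g. "J"
    finished: bool          # True for _F (finished), False for _U (unfinished)
    protein: str            # protein name, e.g. "AroA"
    paralog: Optional[str]  # paralog letter, or None for a single gene product


def parse_label(header):
    """Parse a name such as '>Ecol_J_F_AroA_b' into its components."""
    name = header.lstrip(">").split()[0]
    parts = name.split("_")
    if len(parts) not in (4, 5):
        raise ValueError(f"unexpected label format: {header!r}")
    organism, strain_code, status, protein = parts[:4]
    if status not in ("F", "U"):
        raise ValueError(f"genome status must be F or U, got {status!r}")
    paralog = parts[4] if len(parts) == 5 else None
    return SeqLabel(organism, strain_code, status == "F", protein, paralog)


def paralog_names(protein, count):
    """Designations for `count` paralogs: AroA; AroA_a, AroA_b; and so on."""
    if count == 1:
        return [protein]
    return [f"{protein}_{chr(ord('a') + i)}" for i in range(count)]
```

Parsing `">Ecol_J_F_AroA_b"` gives organism `Ecol`, strain code `J`, a finished genome, protein `AroA`, and paralog `b`, matching the reading of that label given in the text.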