
Microbiology and Molecular Biology Reviews, March 2000, p. 202-236, Vol. 64, No. 1
1092-2172/00/$04.00+0
Copyright © 2000, American Society for Microbiology. All rights reserved.

Aminoacyl-tRNA Synthetases, the Genetic Code, and the Evolutionary Process

Carl R. Woese (1), Gary J. Olsen (1), Michael Ibba (2), and Dieter Söll (3,*)

(1) Department of Microbiology, University of Illinois, Urbana, Illinois 61801; (2) Center for Biomolecular Recognition, Department of Medical Biochemistry and Genetics, Laboratory B, The Panum Institute, DK-2200 Copenhagen N, Denmark; and (3) Department of Molecular Biophysics and Biochemistry and Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8114

SUMMARY
INTRODUCTION
BIOCHEMICAL SKETCH OF AMINOACYL-TRNA SYNTHESIS
EVOLUTIONARY OVERVIEW OF AMINOACYL-TRNA SYNTHETASES
ORDER IN THE GENETIC CODE
AMINOACYL-TRNA SYNTHETASE EVOLUTION: RECURRING GENE TRANSFER
EVOLUTIONARY PROFILES OF THE INDIVIDUAL AMINOACYL-TRNA SYNTHETASES
    Synthetases for the NUN-Encoded Amino Acids
        Phe; UUY; class II; tetramer of α- and β-subunits.
        Leu; UUR and CUN; class I; monomer.
        Ile; AUH; class I; monomer.
        Met; AUG; class I; homodimer.
        Val; GUN; class I; monomer.
    Synthetases for the NCN-Encoded Amino Acids
        Ser; UCN and AGY; class II; homodimer.
        Pro; CCN; class II; homodimer.
        Thr; ACN; class II; homodimer.
        Ala; GCN; class II; homotetramer.
    Synthetases for the NAN-Encoded Amino Acids
        Tyr; UAY; class I; homodimer.
        His; CAY; class II; homodimer.
        Gln; CAR; class I; monomer.
        Asn; AAY; class II; homodimer.
        Lys; AAR; class I (monomer) and class II (homodimer).
        Asp; GAY; class II; homodimer.
        Glu; GAR; class I; monomer.
    Synthetases for the NGN-Encoded Amino Acids
        Cys; UGY; class I; monomer.
        Trp; UGG; class I; homodimer.
        Arg; CGN and AGR; class I; homodimer.
        Gly; GGN; class II; heterotetramer or homodimer.
    Recruitment of Aminoacyl-tRNA Synthetases into Other Roles
    Summary of Aminoacyl-tRNA Synthetase Evolutionary Profiles
PHYLOGENETIC PICTURE EMERGING FROM THE AMINOACYL-TRNA SYNTHETASES
    Spirochetes
    Cytophaga-Chlorobium Kingdom
    Deinococcus-Thermus Division
    Proteobacteria (Purple Bacteria)
    Gram-Positive Bacteria
    Rickettsias, Mycoplasmas, and Certain Other Bacteria
    Archaea
EVOLUTIONARY SYNTHESIS
    Evolutionary Significance of the Canonical Pattern
    Some General Evolutionary Matters
CONCLUSION
ACKNOWLEDGMENTS
REFERENCES


SUMMARY

The aminoacyl-tRNA synthetases (AARSs) and their relationship to the genetic code are examined from the evolutionary perspective. Despite a loose correlation between codon assignments and AARS evolutionary relationships, the code is far too highly structured to have been ordered merely through the evolutionary wanderings of these enzymes. Nevertheless, the AARSs are very informative about the evolutionary process. Examination of the phylogenetic trees for each of the AARSs reveals the following. (i) Their evolutionary relationships mostly conform to established organismal phylogeny: a strong distinction exists between bacterial- and archaeal-type AARSs. (ii) Although the evolutionary profiles of the individual AARSs might be expected to be similar in general respects, they are not. It is argued that these differences in profiles reflect the stages in the evolutionary process when the taxonomic distributions of the individual AARSs became fixed, not the nature of the individual enzymes. (iii) Horizontal transfer of AARS genes between Bacteria and Archaea is asymmetric: transfer of archaeal AARSs to the Bacteria is more prevalent than the reverse, which is seen only for the "gemini group." (iv) The most far-ranging transfers of AARS genes have tended to occur in the distant evolutionary past, before or during formation of the primary organismal domains. These findings are also used to refine the theory that at the evolutionary stage represented by the root of the universal phylogenetic tree, cells were far more primitive than their modern counterparts and thus exchanged genetic material in far less restricted ways, in effect evolving in a communal sense.


INTRODUCTION

The aminoacyl-tRNA synthetases (AARSs) have long fascinated biologists. They are the linchpin of translation, the link between the worlds of protein and nucleic acid. Their structures and functions, which have both practical and basic significance, are deserving of and have received much attention. However, it is not only the structure-function aspect of these enzymes that has captured the biologist's imagination; it is also the possibility that they could tell us the secrets of the genetic code. To understand these enzymes in standard molecular terms is to add one more piece, a most important one, to the puzzle of what the cell is, how it works. But to understand them in evolutionary terms is to ask what the cell is in a deeper sense, how it evolved, how life came to be---the biologist's ultimate question. Reading the history written into the AARSs was not possible previously for the simple reason that doing so requires molecular sequences from a large number of these molecules, and the necessary body of data was lacking. The progress of genomics in the late 1990s is now providing the needed data, and a picture of AARS evolution is beginning to emerge (5, 7, 15-18, 45, 75). In the present review we examine the still murky image of synthetase evolution from a slightly different perspective and bring forth more of its rich detail and evolutionary depth.


BIOCHEMICAL SKETCH OF AMINOACYL-TRNA SYNTHESIS

In a departure from the long-accepted view (48) that every cell harbors 20 aminoacyl-tRNA synthetases responsible for the synthesis of the set of 20 canonical aminoacyl-tRNA families, it is now clearly established that there are at least two ways of forming aminoacyl-tRNA (12, 33). The direct acylation of tRNA by aminoacyl-tRNA synthetases is well understood; the ATP-dependent reaction (Fig. 1) is carried out by enzymes which, in general, are exceedingly specific in selecting their substrates, i.e., amino acid and tRNA. They fall into two classes of 10 based on the topology of their ATP binding domain; class I proteins contain a Rossmann fold (characterized by the HIGH and KMSKS motifs), while class II enzymes possess an unrelated β-sheet arrangement and are characterized by three degenerate sequence motifs (3, 10, 14, 20). Examples of most of the aminoacyl-tRNA synthetases have been structurally characterized, and it is expected that in the near future the crystal structure of at least one enzyme from all these families will be known (42, 43). There is also an indirect pathway of aminoacyl-tRNA synthesis, tRNA-dependent amino acid modification (Fig. 1). This pathway relies on the acylation of tRNA with a "precursor" amino acid by a nondiscriminating AARS (33). Currently our knowledge of the discriminating versus nondiscriminating AARSs is not advanced enough to deduce this property from their amino acid sequences alone. This "precursor" amino acid is then converted, while bound to tRNA, to the correct amino acid (matching the tRNA specificity) by a second, nonsynthetase enzyme, which recognizes only such a mischarged aminoacyl-tRNA species. Our current knowledge about the number and nature of these enzymes is still far from complete, but it is clear that in many organisms this is the essential and only way to form Asn-tRNA and Gln-tRNA (12, 13, 63).


FIG. 1.   Mechanisms of aminoacyl-tRNA formation. Both pathways, direct acylation and tRNA-dependent amino acid modification, are depicted for glutaminyl-tRNA formation. For example, E. coli uses glutaminyl-tRNA synthetase while B. subtilis employs Glu-tRNA(Gln) amidotransferase for this purpose.


EVOLUTIONARY OVERVIEW OF AMINOACYL-TRNA SYNTHETASES

The assumption that translationally produced protein was a part of the very first translation mechanism raises the chicken-and-egg paradox. However, there can be little doubt that once translation did exist, proteins that facilitated tRNA charging would be among the first proteins to evolve, the selective advantage of their specificity being great. Thus, the evolutionary history of the current aminoacyl-tRNA synthetases must go deep into translation's past, to the emergence of the modern genetic code. The central role played by the AARSs in translation would suggest that their histories and that of the genetic code are somehow intertwined. This then raises the question of whether the AARSs in their evolution have contributed to the code's present structure; put another way, are the codon assignments simply reflections of AARS evolutionary wanderings? It is important that conjectures of this sort be examined in detail---and in a genomic era this can be done.

In an evolutionary sense, the most striking thing about the synthetases is the existence of the two distinct classes (3, 10, 14, 20). Common characteristic domain structures and sequence homologies define each class, but the two have nothing in common except the biochemistry of the reactions they catalyze (22): between the two classes, proteins show no structural resemblance, have almost no common motifs (see reference 49 for a possible exception), encounter the tRNA from different angles, and acylate the amino acid to different hydroxyl groups of the terminal ribose of the tRNA (57). This has been widely assumed to suggest that the tRNA-charging function evolved at least twice. In the origin of these two classes of tRNA-charging enzymes lies a clue to one of biology's deepest mysteries (45). Perhaps the two reflect a dichotomous origin of translation itself, in some sort of fusion between two different primitive processes, each associated with its own set of amino acids. Perhaps the two classes are the surviving traces of an ancient evolutionary battle between emerging tRNA-charging mechanisms as biology evolved beyond the RNA world. In any case, the existence of unrelated tRNA-charging systems must be considered a most telling evolutionary relic (45).

The aminoacyl-tRNA synthetases are distributed between the classes according to specific rules. Each class encompasses 10 of the amino acids, and all examples of a given amino acid's synthetase are of the same class, the so-called "class rule." Within a class, all synthetases associated with a given amino acid are specifically related to one another to the exclusion of the AARSs associated with any other amino acids, the "monophyly rule." A third, class-independent generalization is that for each organism, all tRNAs assigned to a given amino acid (so-called isoacceptors [50]) can be charged by a single synthetase, a rule that holds even for amino acids such as serine with two distinct sets of codons, UCN and AGY (reviewed in reference 39). Except possibly for the last, these rules have exceptions. The class rule and hence the monophyly rule are violated by lysine; in some organisms its synthetase is class I, while in others it is class II (32, 34, 35). Four more exceptions to the monophyly rule but not the class rule exist, involving glycine, serine, glutamic acid, and aspartic acid. For glycine and serine, each amino acid is associated with two synthetases (for both amino acids they are class II [37, 44]). However, the two enzymes in each case are not specifically related to one another, as the monophyly rule demands. This is most obvious for glycine, where the overall structures of the two enzymes are completely dissimilar, with one being a homodimer and the other being a heterotetramer. For the glutamyl- and aspartyl-tRNA synthetases, the violation of the monophyly rule is of a different nature. In each of these cases, all the synthetases associated with a particular amino acid constitute a related group. However, the synthetase for the amidated form of the amino acid (i.e., glutamine or asparagine) arises from within the same group, which then renders the parent grouping paraphyletic (4, 38, 46, 52, 53). 
Mention should also be made here of the charging system for cysteine, which breaks the class and monophyly rules in another way. In at least two organisms, the methanogens Methanococcus jannaschii and Methanobacterium thermoautotrophicum, neither a class I nor a class II cysteinyl-tRNA synthetase can be found in the genome, and the exact mechanism of Cys-tRNA formation (direct or indirect) has long remained a mystery. These exceptions to the class and monophyly rules do not rob the rules of their potential evolutionary significance. Erosion of the historical trace is the hallmark of evolution. The exceptions merely restrict what kinds of explanations can be given the rules.
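
The codon shorthand used in the outline and in the rules above (UCN, AGY, AUH, and so on) follows the standard degenerate-base symbols: Y for a pyrimidine, R for a purine, H for any base but G, and N for any base. As a minimal sketch, the outline's assignments can be transcribed into a small table and checked mechanically. The names `DEGENERATE`, `ASSIGNMENTS`, `expand`, `codons_for`, `class_members`, and `class_rule_exceptions` are illustrative choices, not part of the original text.

```python
from itertools import product

# Standard degenerate-base symbols used in the codon shorthand.
DEGENERATE = {
    "U": "U", "C": "C", "A": "A", "G": "G",
    "Y": "UC",    # pyrimidine
    "R": "AG",    # purine
    "H": "UCA",   # any base except G
    "N": "UCAG",  # any base
}

def expand(pattern):
    """Return the set of concrete codons matched by a pattern such as 'AGY'."""
    return {"".join(c) for c in product(*(DEGENERATE[s] for s in pattern))}

# Amino acid -> (codon patterns, synthetase classes), transcribed from the outline.
ASSIGNMENTS = {
    "Phe": (["UUY"], {"II"}),
    "Leu": (["UUR", "CUN"], {"I"}),
    "Ile": (["AUH"], {"I"}),
    "Met": (["AUG"], {"I"}),
    "Val": (["GUN"], {"I"}),
    "Ser": (["UCN", "AGY"], {"II"}),
    "Pro": (["CCN"], {"II"}),
    "Thr": (["ACN"], {"II"}),
    "Ala": (["GCN"], {"II"}),
    "Tyr": (["UAY"], {"I"}),
    "His": (["CAY"], {"II"}),
    "Gln": (["CAR"], {"I"}),
    "Asn": (["AAY"], {"II"}),
    "Lys": (["AAR"], {"I", "II"}),  # the one class-rule exception
    "Asp": (["GAY"], {"II"}),
    "Glu": (["GAR"], {"I"}),
    "Cys": (["UGY"], {"I"}),
    "Trp": (["UGG"], {"I"}),
    "Arg": (["CGN", "AGR"], {"I"}),
    "Gly": (["GGN"], {"II"}),
}

def codons_for(aa):
    """All concrete codons assigned to one amino acid."""
    patterns, _ = ASSIGNMENTS[aa]
    return set().union(*(expand(p) for p in patterns))

def class_members(cls):
    """Amino acids with at least one synthetase of the given class."""
    return sorted(aa for aa, (_, classes) in ASSIGNMENTS.items() if cls in classes)

def class_rule_exceptions():
    """Amino acids whose synthetases occur in more than one class."""
    return sorted(aa for aa, (_, classes) in ASSIGNMENTS.items() if len(classes) > 1)
```

In this table lysine is the only amino acid listed under both classes; counted under class II alone, each class covers 10 amino acids, and the patterns together account for the 61 sense codons with no overlaps.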

The common general structure and sequence motifs shared by all members of a given synthetase class demand common ancestry and Darwinian descent. The later stages of this descent are captured in the sequence similarities among existing synthetases; importantly, their branching patterns recall structural similarities among the amino acids and patterns in the genetic code. To give examples: the ValRS and IleRS (class I) are impressively similar in sequence; this is not simply a matter of a sequence motif here and there (7). These sequences in turn are somewhat less similar to those of the LeuRS; the MetRSs then join the group at a still lower level of similarity (45). All four corresponding amino acids are nonpolar and aliphatic, and their codons all conform to the general composition NUN. Similarly, the class II enzymes for serine, threonine, proline, histidine, and glycine group phylogenetically and structurally (43). (Only one of the two unrelated forms of the GlyRS shows this specific relatedness, however [17].) The amino acids serine and threonine are obviously related structurally, and both are capable of forming an internal hydrogen-bonded five-membered ring structure that mimics the ring structure of the imino acid proline. However, histidine and glycine appear structurally unrelated to these three (and to one another). In their codons, the first three amino acids are also related; all conform to the general composition NCN, but the codons for histidine (CAY) and glycine (GGN) are not related to the others except, of course, to the CCY codons of proline in the former case.

The third major AARS grouping involves the class II synthetases for lysine, aspartic acid, and asparagine (14, 17, 52), all of which are closely related in structure (4). The amino acids aspartic acid and asparagine are obviously related, but lysine stands apart. In their codons, the three exhibit an overlapping kind of relatedness, with the Asn codons (AAY) being close to both their Asp (GAY) and Lys (AAR) counterparts whereas the last two sets are not closely related.

The close evolutionary relationship between the (class I) synthetases for glutamic acid and glutamine (mentioned above; see also reference 46) is mirrored in the obvious structural relationship between the corresponding amino acids and the relationship between their codons (GAR-Glu versus CAR-Gln). Finally, a pronounced similarity exists between the synthetases for the two aromatic amino acids tyrosine and tryptophan (6, 19), but their codons (UAY and UGG, respectively) are not closely related.

Although the existence of correlations between the genetic code and the evolutionary patterns of the AARSs is clear, their significance is not. Does the fact that the valine, isoleucine, leucine, and methionine enzymes came from a common ancestor mean that this ancestor itself could not distinguish among these amino acids or that the ancestor was able to specifically charge four separate aminoacyl-tRNAs? That seems absurd in the context of modern translation (45). A more acceptable explanation would seem to be that the AARS relationships reflect evolutionary replacement of one tRNA-charging enzyme or acylation system by another. Indeed, what we see here may be only the latest in a series of such evolutionary replacements, a series that traces far back into the code's past and an evolutionary process that still goes on today in a less radical form, involving replacements within the confines of a given amino acid type (see below).


ORDER IN THE GENETIC CODE

The significance of AARS evolution vis-à-vis that of the genetic code cannot be properly assessed without some appreciation of the nature and extent of the code's order. Within the last decade, significant strides have been made in this area. The so-called synonym order in the code, i.e., the degeneracy in codon assignment, which manifests itself almost exclusively in the third codon position, has never been in doubt, except as regards what caused it in the first place. However, such is not the case for the ordering that pertains to related amino acids. Although most biologists accept the existence of such an order, they have disagreed about its exact form, its extent, and its cause. Some have argued that the related amino acid order evolved to ameliorate the phenotypic consequences of mutations, an evolutionary scenario that would produce both synonym and related amino acid orderings (59). An alternative but conceptually related explanation is that the assignments have somehow been adjusted to minimize the consequences of errors in a primitive translation mechanism that was highly inaccurate (66). Seemingly, both error minimization models would lead to a very similar type of order in the code. However, a computer simulation study (26) showed that the assumptions of the first model are unlikely to lead to a synonym order in the code that is almost entirely confined to a single codon position---a type of order that is consistent with, if not predicted by, the second model (which also suggests the third codon position should be the degenerate one [66]). It has also been proposed that the form of the code was predetermined, at least in part, by specific interactions between amino acids and nucleic acids (reference 76 and references therein).

Perhaps the main difficulty in comprehending the code's related amino acid ordering is that amino acid relatedness is context dependent; amino acids that appear similar in one context can be unrelated in another. The amino acid replacement spectra of proteins prove the point: the replacement pattern can differ from position to position in a protein sequence for any amino acid. However, nobody knows what property or properties of the amino acids the code actually reflects.

One important advance in this area was the definition of an amino acid property called the polar requirement, which is a number derived from the paper chromatographic mobility of an amino acid in pyridine-water mixtures of various ratios (71). Simply plotting these numbers on a codon table (Table 1) reveals the existence of a remarkable degree of order, much of which would be unexpected on the basis of amino acid properties as normally understood. For example, codons of the form NUN define a set of five amino acids, all of which have very similar polar requirements. Likewise, the set of amino acids defined by the NCN codons all have nearly the same unique polar requirement. The codon couplets CAY-CAR, AAY-AAR, and GAY-GAR each define a pair of amino acids (histidine-glutamine, asparagine-lysine, and aspartic acid-glutamic acid, respectively) that has a unique polar requirement. Only for the last of these (aspartic and glutamic acids), however, would the two amino acids be judged highly similar by more conventional criteria. Perhaps the most remarkable thing about polar requirement is that although it is only a unidimensional characterization of the amino acids, it still seems to capture the essence of the way in which amino acids, all of which are capable of reacting in varied ways with their surroundings, are related in the context of the genetic code. Also of note is the fact that the context in which polar requirement is defined, i.e., the interaction of amino acids with heterocyclic aromatic compounds in an aqueous environment, is more suggestive of a similarity in the way amino acids might interact with nucleic acids than of any similarity in the way they would behave in a proteinaceous environment (70).

                              
TABLE 1.   Conventional table of codons showing the polar requirement for each amino acid

More recently, computer simulation studies have been used to try to assess the merit of polar requirement as an indicator of the code's related amino acid order as compared to other amino acid properties, how well ordered the code actually is, and the nature of the code's order. An appealingly straightforward approach to the problem was explored by Hurst and his colleagues (23, 28). In summary, they compared the natural code to a series of synthetic codes generated by randomly reassigning the 20 amino acids to the set of synonym codon categories that are defined by the natural code. Each code is then measured for how conservative it is with regard to a given amino acid property under "mutation"; i.e., each codon in a given code is compared to all other codons that are 1 base change removed from it, the numerical difference in that property between the amino acids corresponding to the original and the "mutated" codon is measured, and the squared differences are summed over the code as a whole or over each of the three codon positions individually. For all amino acid properties tested except one, the natural code was not notably superior to the random codes. That exception, polar requirement, revealed a natural code superior to all but 0.01% of the random codes (28). A subsequent, more refined simulation of this sort, which took transition-transversion ratios into account, showed the natural code was "one in a million" (23). There can be no doubt that when viewed in terms of amino acid polar requirements, the genetic code is a highly structured array. It would also seem that it has somehow been optimized to reduce the consequences of translational errors. However, the evolutionary dynamic that shaped the code remains a mystery.
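
The comparison described above can be sketched in a few lines. This is only an illustration of the idea under stated assumptions, not the procedure of the cited studies: stop codons are kept fixed and skipped in the sum, all single-base changes are weighted equally (no transition-transversion bias), and far fewer random codes are drawn. The polar requirement values in `POLAR_REQUIREMENT` are the figures commonly quoted in the literature and are an assumption here, since Table 1 is not reproduced in this text; they should be checked against the table before any serious use. All function names are hypothetical.

```python
import random
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order; '*' marks a stop.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
NATURAL = dict(zip(CODONS, AA_STRING))

# Assumed polar requirement values (one-letter amino acid codes).
POLAR_REQUIREMENT = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8,
    "Q": 8.6, "E": 12.5, "G": 7.9, "H": 8.4, "I": 4.9,
    "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0, "P": 6.6,
    "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}

def code_cost(code, prop):
    """Sum of squared property differences over all single-base changes."""
    total = 0.0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                aa2 = code[mutant]
                if aa2 != "*":
                    total += (prop[aa] - prop[aa2]) ** 2
    return total

def random_code(rng):
    """Reassign the 20 amino acids among the natural code's synonym blocks."""
    aas = sorted(set(AA_STRING) - {"*"})
    shuffled = aas[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(aas, shuffled))
    mapping["*"] = "*"  # stop codons stay where they are
    return {codon: mapping[aa] for codon, aa in NATURAL.items()}

def fraction_better(n_codes=1000, seed=0):
    """Fraction of random codes that are more conservative than the natural code."""
    rng = random.Random(seed)
    natural = code_cost(NATURAL, POLAR_REQUIREMENT)
    better = sum(
        code_cost(random_code(rng), POLAR_REQUIREMENT) < natural
        for _ in range(n_codes)
    )
    return better / n_codes
```

Calling `fraction_better(n_codes=10000)` gives a rough estimate of how often a shuffled code beats the natural one. Because stop codons are held fixed, the number of codon pairs entering the sum is the same for every code, so comparing sums is equivalent to comparing means.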

While it must be admitted that the evolutionary relationships among the AARSs bear some resemblance to the related amino acid order of the code, it seems unlikely that they are responsible for that order (45): the evolutionary wanderings of these enzymes alone simply could not produce a code so highly ordered, in both degree and kind, as we now know the genetic code to be. These enzymes could at best be the agents through which other constraints acted to shape the code. However, even in such a capacity they would not be alone: the tRNAs offer a simple and facile alternative mechanism for changing codon assignments (65). It would seem, therefore, that the evolutionary patterns among the aminoacyl-tRNA synthetases do not imply a role for these enzymes in structuring the genetic code (45). The resemblance between their evolutionary patterns and the patterns seen in the code is a loose convergence, forced by the fact that both evolutions independently reflect somewhat similar properties of the amino acids. The evolutionary patterns in the AARSs do seem to represent evolutionary replacements that occurred against the background of an already established, or otherwise fashioned, code (45).


AMINOACYL-TRNA SYNTHETASE EVOLUTION: RECURRING GENE TRANSFER

If the AARSs do not reveal the code's evolution, what do their evolutionary relationships tell us? The answer is clear. Aminoacyl-tRNA synthetase evolution is a superb indicator of the evolutionary dynamic in general.

It should be noted that the AARSs are unique among components of the translation system in their evolutionary behavior. Starting with the rRNAs and continuing through the ribosomal proteins and the translation initiation and elongation factors runs one dominant evolutionary theme---molecules tend to show the same evolutionary history; i.e., their molecular phylogenies are consistent with the accepted overall organismal phylogeny. At the highest level, they tend to yield what we herein call the canonical phylogenetic pattern, which is basically a division of all life into the three primary groupings Bacteria, Archaea, and eukaryotes, with the closest relationship being between the Archaea and eukaryotes (73) (see below). The evolutionary picture painted by the synthetases, however, is a world apart from this canonical pattern. Not only do the phylogenies fail to yield the canonical pattern in a number of cases, but also they typically violate the accepted taxonomic structure within the organismal domains. Furthermore, the molecular phylogenies inferred from the synthetases of different amino acid types tend not to agree with one another---but this is the telling point.

Why should the synthetases show such atypical and disparate evolutionary pictures? The answer again is clear. The AARSs are in essence modular components of the cell; they function in isolation from the rest of the translation apparatus and from the rest of the cell, except for their individual contacts in each case with a small subset of the tRNAs (58). Because of this and because of their universality, the AARSs can function in a wide spectrum of cellular environments, often without disadvantage to the host. In other words, the AARSs are ideal candidates for widespread horizontal gene transfer, and the evidence certainly indicates this, since quite a few examples are known in which two different AARSs for the same amino acid coexist in the same organism. Versions of a given enzyme characteristic of the Archaea can be seen scattered among the bacterial taxa (see below). Versions characteristic of the eukaryotes have been seen in the Bacteria or in the Archaea. Within the Bacteria alone, the different bacterial subtypes of a given enzyme intermix among and within the taxa. There is no set pattern to all this; there is merely evidence consistent with frequent, widespread, indiscriminate horizontal gene transfer.

As suggested above, it is tempting to view the evolution of aminoacyl-tRNA synthesis as a study in horizontal gene transfer from top to bottom: at the deepest level, horizontal replacements involving the ancestors of the two synthetase classes, then replacements that gave rise to the phylogenetic structure within each class, and, finally, the replacements involving the different (modern) synthetases that use the same amino acid.


EVOLUTIONARY PROFILES OF THE INDIVIDUAL AMINOACYL-TRNA SYNTHETASES

We now examine in some detail the evolutionary profiles for each of the 20 aminoacyl-tRNA synthetases, with the principal objective of determining the extent to which each conforms to the canonical phylogenetic pattern (defined below) and asking what, if anything, the exceptions to canonical pattern tell us about these enzymes and about stages in the evolution of the cell.

The organisms mentioned in the figures and tables are listed in Table 2.

                              
TABLE 2.   Organisms listed in figures and tables

The analysis presented is a synthesis of four approaches: (i) conventional phylogenetic trees (see Fig. 2 caption); (ii) visual inspection of alignments to reveal qualitative differences not apparent from the other analyses; (iii) dipeptide similarity matrices (see Table 3 footnotes); and (iv) signature analysis. Signatures are defined in terms of positions in the alignment wherein at least 80% of the members of a given group show a constant composition but one that is found elsewhere in the alignment no more than once within some larger phylogenetic taxonomic context. For example, spirochete signatures would usually be relative to all other bacterial groups but not relative to the more distantly related archaeal and eukaryotic versions of the enzyme.
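
The signature definition lends itself to a short sketch. The function below is one possible reading of it, assuming an alignment stored as equal-length strings keyed by sequence name, and treating a gap character as ineligible for a signature (an assumption; the text does not say how gaps are handled). The function name, its parameters, and the toy sequences are hypothetical.

```python
from collections import Counter

def signature_positions(alignment, group, context,
                        min_fraction=0.8, max_outside=1):
    """Find alignment columns that act as signatures for `group`.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths)
    group:     names of the sequences whose signature is sought
    context:   names forming the larger taxonomic context (group included)
    Returns a list of (column index, residue) pairs.
    """
    group = set(group)
    outside = [name for name in context if name not in group]
    length = len(next(iter(alignment.values())))
    hits = []
    for i in range(length):
        counts = Counter(alignment[name][i] for name in group)
        residue, n_in = counts.most_common(1)[0]
        # Require a real residue shared by at least 80% of the group.
        if residue == "-" or n_in < min_fraction * len(group):
            continue
        # The same residue may appear at most once elsewhere in the context.
        n_out = sum(alignment[name][i] == residue for name in outside)
        if n_out <= max_outside:
            hits.append((i, residue))
    return hits

# Toy alignment (invented sequences, for illustration only).
toy = {
    "spiro1": "MKAL", "spiro2": "MKAL", "spiro3": "MRAL",
    "bact1": "MEGL", "bact2": "MEGV", "bact3": "MKGL",
}
toy_hits = signature_positions(toy, ["spiro1", "spiro2", "spiro3"], list(toy))
```

In the toy data only the third column qualifies: the residue A is constant in the group and absent from the other sequences, whereas M and L are too common outside the group and K falls short of the 80% threshold within it.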

Because the canonical evolutionary pattern is central to our thesis and because the tRNA-charging enzymes exhibit different partial forms of that pattern, it is necessary to begin by explaining clearly what we mean by the phrase. In its essence (which we will call the basal canonical pattern), the canonical pattern is defined by the relationship between the bacterial and archaeal versions of a given molecule. For the basal canonical pattern to hold, regardless of how many subtypes of a given protein exist, it must be possible to distinguish strongly between characteristic bacterial and archaeal versions of the molecule. This distinction should be a pronounced quantitative one (on the level of sequence similarities) and/or a qualitative one (evident in terms of gross areas in a sequence alignment wherein homology between the two is only weakly evident or nonexistent). In other words, for these two organismal domains, the interdomain differences between the characteristic archaeal and bacterial proteins must far outweigh any intradomain differences: the two must appear to differ in genre. For the full canonical pattern to hold, there must also then exist a characteristic eukaryotic version(s) of the molecule that is distinguishable from both the archaeal and the bacterial versions but which is clearly of the archaeal genre. Tables 3 and 4 are representative dipeptide similarity matrices for two aminoacyl-tRNA synthetases typical of those showing canonical pattern (PheRS and TyrRS), while Tables 5 and 6 are matrices for enzymes (SerRS and CysRS) that do not show canonical pattern.
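
As a rough numeric stand-in for the quantitative half of this criterion, one can ask whether every bacterial-archaeal pair is less similar than every within-domain pair by some margin. The sketch below assumes a pairwise similarity score between 0 and 1 (the dipeptide similarity of Tables 3 to 6 is defined in footnotes not reproduced here, so any comparable measure is assumed), and the `margin` value is arbitrary. It does not capture the qualitative judgments the text also relies on. All names and numbers are hypothetical.

```python
from itertools import combinations

def shows_basal_canonical_pattern(sim, domain, margin=0.1):
    """Crude test: intradomain similarity exceeds interdomain similarity.

    sim:    dict mapping frozenset({taxon_a, taxon_b}) -> similarity score
    domain: dict mapping taxon -> "Bacteria" or "Archaea"
    margin: assumed minimum gap between the weakest intradomain score
            and the strongest interdomain score
    """
    bacteria = [t for t, d in domain.items() if d == "Bacteria"]
    archaea = [t for t, d in domain.items() if d == "Archaea"]

    def score(a, b):
        return sim[frozenset((a, b))]

    intra = [score(a, b)
             for taxa in (bacteria, archaea)
             for a, b in combinations(taxa, 2)]
    inter = [score(a, b) for a in bacteria for b in archaea]
    return min(intra) - max(inter) >= margin

# Invented scores for four placeholder taxa.
toy_domain = {"B1": "Bacteria", "B2": "Bacteria",
              "A1": "Archaea", "A2": "Archaea"}
toy_sim = {
    frozenset(("B1", "B2")): 0.62, frozenset(("A1", "A2")): 0.58,
    frozenset(("B1", "A1")): 0.31, frozenset(("B1", "A2")): 0.29,
    frozenset(("B2", "A1")): 0.30, frozenset(("B2", "A2")): 0.33,
}
```

A single bacterial sequence of the archaeal genre, such as the spirochete PheRS discussed below, would make this all-pairs test fail, so in practice such known exceptions would have to be set aside before applying it.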

                              
TABLE 3.   Dipeptide similarity matrix for a representative sampling of taxa from the PheRS sequence alignment


                              
TABLE 4.   Dipeptide similarity for a representative sampling of taxa from the TyrRS sequence alignment


                              
TABLE 5.   Dipeptide similarity for a representative sampling of taxa from the SerRS sequence alignment


                              
TABLE 6.   Dipeptide similarity for a representative sampling of taxa from the CysRS sequence alignment

The aminoacyl-tRNA synthetases are considered individually below in an order defined by their corresponding codons. We have not included most of the mitochondrial data in the analysis, because doing so would add nothing to our conclusions and would needlessly complicate an already complex picture (27, 29).

Synthetases for the NUN-Encoded Amino Acids

Phe; UUY; class II; tetramer of α- and β-subunits. PheRS is the only class II synthetase in the NUN codon group, and it has no close relatives within that class. Not surprisingly, both the α- and β-subunits present the same evolutionary picture; their sequences are combined to produce Fig. 2. PheRS shows the classical full canonical pattern, the only exception being the spirochete PheRSs, which are of the archaeal, not the bacterial, genre and which seem to be specifically related to the Pyrococcus PheRS within that grouping, as sequence signature analysis suggests and Fig. 2 confirms.


FIG. 2.   Phylogenetic tree of PheRS sequences. Aligned protein sequences were evaluated for the 1,000 most parsimonious trees (61), using amino acid replacement costs based on the BLOSUM 45 matrix (31, 41). Of these trees, we retained the topology with the maximum likelihood of giving rise to the data under the JTT model in protml version 2.2 of the MOLPHY package (1). Additional optimization was performed by removing sequences, one or more at a time, from the tree and using maximum likelihood to select the best from among the 100 to 500 most parsimonious alternative placements and to assign lengths to the branches. The tree has been rooted between the Bacteria and the Archaea plus Eukarya. The sequence identifiers correspond to an organism defined in Table 2, followed by the one-letter amino acid code (in this case F). Sometimes suffixes 1 and 2 are used; they refer to the fact that the given organism contains more than one specific AARS. Bacteria are shown in red, Archaea in blue, and eukaryotes in yellow.

For both the α- and β-subunits of PheRS, significant length differences distinguish the bacterial subunits from their archaeal counterparts. The bacterial α-subunit is about 120 amino acids shorter than the archaeal/eukaryotic α-subunit at its N terminus, and the first 90 amino acids of the bacterial sequence show little or no similarity to the archaeal/eukaryotic counterpart. However, for the β-subunit, the bacterial version is the longer, by approximately 250 amino acids. At both termini the bacterial version of the β-subunit extends beyond the archaeal/eukaryotic version by about 100 amino acids; in the N-terminal ~50 amino acids, the archaeal version of the β-subunit shows no recognizable similarity to its bacterial counterpart. In addition, large sequence gaps distinguish the two genres in the interior of the β-subunit.

Leu; UUR and CUN; class I; monomer. LeuRS conforms to the full canonical pattern as well, in this case without exception. A striking lack of similarity in various regions of the molecule distinguishes the bacterial and archaeal genres of LeuRS, and a number of sizable insertion and deletion differences distinguish the two genres throughout the alignment. A nearly total lack of sequence similarity between the two is seen in the C-terminal (KMSK) section of the molecule.

Within the Bacteria, however, the accepted phylogenetic relationships are not all preserved: at least two distinct bacterial subtypes of the molecule exist and have obviously migrated horizontally. The best-defined bacterial subtype (by all methods of analysis) is that common to the majority of gram-positive bacteria (and relatives), the spirochetes, chlamydias, and the Cytophaga-Chlorobium grouping (represented by Chlorobium tepidum and Porphyromonas gingivalis). However, this grouping fails to include Clostridium acetobutylicum, a gram-positive species whose LeuRS groups with that of Deinococcus in Fig. 3 (a relationship supported by sequence signature). On the other hand, the proteobacteria (Escherichia coli and relatives) do form a grouping quite consistent with their established phylogeny (Fig. 3).


FIG. 3.   Phylogenetic tree of LeuRS sequences. The tree was rooted using ValRS, IleRS, and MetRS sequences. Other details are as in Fig. 2.

Ile; AUH; class I; monomer. IleRS also shows the full canonical pattern. As with LeuRS, this fact is obvious upon visual inspection of the alignment, especially its C-terminal section, wherein the bacterial and archaeal genres exhibit very little sequence similarity and show major alignment gaps relative to one another. However, as all methods of analysis clearly show, a sizable minority of bacterial taxa possess an IleRS of the archaeal rather than the bacterial genre (Fig. 4). All of these bacterial examples are specifically related to their eukaryotic counterparts, with the closest relationship being between the eukaryotes and a bacterial subgroup comprising the spirochetes, chlamydias, Mycobacterium, and Rickettsia. Note in Fig. 4 the specific relationship between the IleRS of Mycobacterium and that of Rickettsia, which is strongly suggested by sequence signature as well. Also note the relationship between the C. acetobutylicum IleRS and the plasmid-borne IleRS found in mupirocin-resistant strains of Staphylococcus aureus; this relationship is also supported by sequence signature.


FIG. 4.   Phylogenetic tree of IleRS sequences. The tree has been rooted using ValRS, LeuRS, and MetRS sequences. Other details are as in Fig. 2.

Met; AUG; class I; homodimer. Methionine presents one of the more complex evolutionary profiles among the aminoacyl-tRNA synthetases. The enzyme marginally shows the canonical picture: the majority of bacterial examples (the group represented by Helicobacter in Fig. 5) define a bacterial genre, while the archaea, eukaryotes, and a number of bacterial MetRSs constitute the archaeal genre (Fig. 5). However, there is another bacterial grouping, confined to the β and γ proteobacteria, which is of the archaeal genre (Fig. 5). The difference between the bacterial and archaeal genres of MetRS is not as extreme as that seen for the other members of the NUN codon group. However, one large alignment gap (~25 amino acids) separates the bacterial genre from all others; the latter appear to contain a metal binding region at this point, the consensus sequence of which is CP . C . . . . . a . gD . C . . C . . . . . . . . . . L (where lowercase signifies its presence in only four of the five groupings involved). A strong signature distinguishes the bacterial genre from the others, and its distinctiveness is also evident in a dipeptide similarity matrix.


FIG. 5.   Phylogenetic tree of MetRS sequences. The tree has been rooted using LeuRS, IleRS, and ValRS sequences. Other details are as in Fig. 2.

Relationships within the archaeal genre are themselves complex. The closest relatives of the eukaryotic MetRSs are those of the spirochetes (convincingly demonstrated by sequence signature). The archaea appear paraphyletic; the crenarchaeal examples do not group with their euryarchaeal counterparts to the exclusion of all the bacterial examples (Fig. 5). MetRSs of the bacterial genre (the group represented by Helicobacter) present a mixed phylogenetic picture. The low-G+C gram-positive Bacteria (Bacillus and relatives) cluster well. (Although Fig. 5 does not indicate it, by signature analysis the mycoplasmas do seem to be a part of this grouping.) However, the high-G+C gram-positive representative, Mycobacterium MetRS, falls elsewhere within the tree, and again shows a clear specific relationship to the MetRS of Rickettsia; this is also supported by sequence signature. (Note that the rickettsial MetRS is of a different genre from the MetRSs of the other α proteobacterial representatives.)

The C-terminal domain of MetRS, about 150 amino acids in length, can take one of three forms: (i) it can be covalently linked to the rest of the molecule, as in most bacteria and most archaea; (ii) it can be completely missing, as in a number of bacteria, e.g., cyanobacteria and mycoplasmas; or (iii) it can be present but not covalently linked to the rest of the molecule, as in all eukaryotes (except Caenorhabditis elegans, where it is covalently linked), in Aquifex, and in the Crenarchaeota. In eukaryotes, this separate protein, known as Arc1p, occurs as a part of some higher-order complexes involving eukaryotic synthetases, wherein it is involved in amino acid recognition (54). The C termini of these proteins extend about 60 amino acids beyond the normal C-terminus of MetRS. It is interesting that the spirochete MetRSs (in which the C terminus is a covalently linked part of the molecule) also extend beyond the normal C terminus of MetRS, and in this extension, they show homology to sequences in the Arc1p family, providing further support for a specific relationship between the spirochete and eukaryote enzymes (Fig. 5).

Arc1p-like domains can be seen in a few other aminoacyl-tRNA synthetases as well. Approximately 100 residues of the N terminus of the β-subunit of bacterial PheRS are homologous to a portion of Arc1p. Mammalian TyrRS (only) has appended to its C terminus a more extensive homolog, which is impressively similar to the MetRS extensions just discussed. Also, it has been demonstrated in the mammalian TyrRS case that the extension functions not as Arc1p (i.e., in amino acid recognition) but as a cytokine (64).

Val; GUN; class I; monomer. The valine-charging enzyme conforms only to the basal canonical pattern; the eukaryotic ValRSs are not archaeal in nature but obviously bacterial (Fig. 6). Also, within the bacterial group a 37-amino-acid insertion in the alignment, found only in the eukaryotes and the α, β, and γ proteobacteria, suggests a specific relationship among them. The distinction between the archaeal and bacterial genres of ValRS is again a strong one and is manifested most strongly in the C-terminal (KMSK) portion of the molecule. The rickettsial ValRS, alone among the bacterial examples, is of the archaeal genre, seemingly specifically related therein to the ValRS of the crenarchaeon Pyrobaculum aerophilum; this relationship is supported by sequence signature.


FIG. 6.   Phylogenetic tree of ValRS sequences. The tree was rooted using IleRS, MetRS, and LeuRS sequences. Other details are as in Fig. 2.

Synthetases for the NCN-Encoded Amino Acids

Serine, threonine, and proline have related structures, codons, and aminoacyl-tRNA synthetases; in this last respect, the group also encompasses histidine and glycine. (However, as mentioned above, only one of the two unrelated GlyRS forms shows the relationship.)

Ser; UCN and AGY; class II; homodimer. The seryl-tRNA synthetase is of particular interest for two reasons: (i) it clearly fails to conform to the canonical pattern, and (ii) there are two distinct serine-charging enzymes, a very rare form that has been found so far only in M. thermoautotrophicum and the two Methanococcus species examined and a major form that has been found in all other organisms. Although both the major and minor forms of SerRS belong to the above-mentioned Ser-Thr-Pro supercluster, it is unclear whether the two are specifically related to one another therein. (The minor form is not included in Fig. 7.)


FIG. 7.   Phylogenetic tree of SerRS sequences. The tree has been rooted using ThrRS sequences. The unusual SerRSs of the methanogens, which may not be related to other SerRSs, have not been used in constructing the figure. Other details are as in Fig. 2.

Although one can see an archaeal and a eukaryotic grouping in Fig. 7 and the two are specifically related, the true canonical pattern is not exhibited. For example, no alignment gaps separate the archaeal from the bacterial type, and intergroup dipeptide similarities are not strikingly lower than intragroup similarities in general (Table 5). There is considerable evidence suggestive of SerRS horizontal gene transfers. The halobacterial SerRS, for example (62), is not related to other archaeal examples but almost certainly is bacterial in origin, apparently stemming from the group that comprises Porphyromonas and Chlorobium. Two unrelated eukaryotic SerRS groups exist, one of them seemingly related to the main group of archaeal SerRSs and the other (which comprises the Drosophila, plant, and second yeast SerRSs) specifically related to the spirochete SerRSs. Since these latter eukaryotic SerRSs (all clearly related by signature sequence) seem to be mitochondrial, their relationship to spirochetes rather than proteobacteria becomes of interest.

Pro; CCN; class II; homodimer. ProRS exhibits the full canonical pattern but again with exceptions. The bacterial genre is distinguished from the archaeal by having an insertion of about 180 amino acid residues not seen in the latter at approximate (E. coli) position 190, while the latter extends at the C terminus of the molecule for about 70 residues beyond the former. Dipeptide similarities between the two genres are remarkably low.

The ProRSs of a few bacterial taxa, i.e., the mycoplasmas, Deinococcus, Chlorobium, Porphyromonas, and Borrelia (but not Treponema), are of the archaeal genre, and the eukaryotic enzymes (with the exception of that from Giardia) are included in this phylogenetic grouping; sequence signature analysis shows a sister relationship therein to the genera Borrelia, Chlorobium, and Porphyromonas, which the Fig. 8 tree confirms.


FIG. 8.   Phylogenetic tree of ProRS sequences. The tree was rooted using ThrRS, HisRS, and GlyRS sequences. Other details are as in Fig. 2.

Thr; ACN; class II; homodimer. Like its valyl counterpart, ThrRS exhibits only the basal canonical pattern, with the eukaryotic versions of the enzyme being bacterial rather than archaeal in nature. The bacterial and archaeal genres are readily distinguished by sizable additions/deletions in the N-terminal ~250 amino acids of the alignment, and evidence of similarity between the two in this portion of the molecule is minimal (Fig. 9). Two of the three available crenarchaeal ThrRSs add a further complication to the picture (see below).


FIG. 9.   Phylogenetic tree of ThrRS sequences. The tree has been rooted using ProRS, GlyRS, and HisRS sequences. Other details are as in Fig. 2.

The bacterial ThrRSs break into subtypes, subtly distinguished from one another yet very evident from the fact that they violate established organismal phylogenies and the fact that B. subtilis possesses two ThrRSs, one of each subtype. Among these taxonomic violations are (i) a grouping of Thermotoga with C. acetobutylicum and one of the two B. subtilis ThrRSs; (ii) a cluster comprising Borrelia, Aquifex, Mycobacterium, and (probably) Helicobacter; and (iii) the clustering of Treponema pallidum with the proteobacteria.

Among the archaea, a close specific relationship is seen between the Pyrococcus and Archaeoglobus ThrRSs, as well as between those of the two methanogens. However, the most striking feature of the archaeal enzymes is that two crenarchaeal examples, Sulfolobus and Aeropyrum, are highly atypical: in a region where the alignment spans about 330 amino acid residues, beginning at approximate position 150 (M. jannaschii numbering), these two contain no more than 150 residues, and those residues exhibit no detectable homology to any other sequences in the ThrRS alignment. However, in both cases, a second, unlinked ThrRS-related gene exists that basically covers the region in question (plus a bit more), shows homology therein to other sequences in the alignment, and has by far the highest similarity to the bacterial, not the archaeal, versions. Note, however, that this strange chimeric type of ThrRS is not found in a third crenarchaeon, Pyrobaculum. (The second peptide of the Sulfolobus and Aeropyrum ThrRSs has not been used in the calculations upon which Fig. 9 is based.)

Ala; GCN; class II; homotetramer. Although a class II enzyme, AlaRS is not a member of the supercluster that contains the other NCN-associated synthetases. The archaeal and bacterial forms of the enzyme are clearly distinguished by dipeptide similarities, sequence signature, and a few small but significant insertions and deletions in the alignment; the N terminus of the archaeal form also begins some 50 amino acids before the bacterial one does (Fig. 10).

Although the canonical pattern holds for the AlaRS, it is only the basal canonical pattern, since the eukaryotic AlaRSs (except for that of Giardia) cluster with the bacterial AlaRSs; and within that grouping they appear to be specifically related to the Chlorobium-Porphyromonas cluster (Fig. 10), a relationship that is supported by sequence signature. The Giardia AlaRS, however, is of the archaeal genre. This is confirmed by a strong sequence signature, which is also consistent with Giardia's position in Fig. 10 as an outgroup to the archaeal clade. The spirochete AlaRSs, although clearly of the bacterial genre, are highly derived. They both show two characteristic large deletions, one interior and the other C-terminal.


FIG. 10.   Phylogenetic tree of AlaRS sequences. The tree has been rooted between the bacterial sequences and the archaeal (plus Giardia) sequences. Other details are as in Fig. 2.

Synthetases for the NAN-Encoded Amino Acids

Tyr; UAY; class I; homodimer. The TyrRS makes a strong canonical distinction (Fig. 11 and Table 4). In the C-terminal (KMSK) section of the molecule there is very little similarity between the TyrRSs of the bacterial and archaeal genres, and a number of insertion-deletion differences distinguish the two throughout the molecule as well.


FIG. 11.   Phylogenetic tree of TyrRS sequences. The tree has been rooted using TrpRS sequences. Other details are as in Fig. 2.

Two distinct subtypes of bacterial TyrRS can be seen, and these distinguish members of various taxa from one another. Among the enteric-vibrio subgroup of the γ proteobacteria, E. coli, Salmonella, and Yersinia exhibit the first type while Haemophilus, Actinobacillus, and Vibrio exhibit the second. Among the β proteobacteria, Neisseria exhibits the first type while Bordetella and Thiobacillus exhibit the second. Porphyromonas and its relative Chlorobium are phylogenetically split in this way too. B. subtilis and C. acetobutylicum each contain TyrRSs of both subtypes.

Within the archaeal genre, the eukaryotic and archaeal TyrRSs are intermixed. The euryarchaeal enzymes (except for those of the pyrococci) cluster specifically with the animal and fungal TyrRSs, while the three crenarchaeal TyrRSs (and those of the pyrococci) group with the two plant examples (Arabidopsis and tobacco). Sequence signatures strongly support this entire phylogenetic arrangement.

His; CAY; class II; homodimer. HisRS also shows the full canonical pattern. However, as signature analysis indicates and Fig. 12 confirms, a small group of bacterial taxa (spirochetes, Helicobacter, C. acetobutylicum, Caulobacter, and Porphyromonas) have HisRSs of the archaeal genre. This bacterial grouping in turn encompasses the eukaryotic HisRSs, which show a specific relationship to Porphyromonas HisRS therein, a relationship supported by sequence signature.


FIG. 12.   Phylogenetic tree of HisRS sequences. The tree has been rooted using ThrRS, ProRS, and GlyRS sequences. Other details are as in Fig. 2.

Gln; CAR; class I; monomer. It has been convincingly demonstrated that GlnRS stems specifically from the eukaryotic lineage of GluRSs (53). Not only is this evident at the sequence level, but also it has been demonstrated in terms of the overall structure of the molecule (46). In its N-terminal (HIGH) region, the GlnRS sequence is decidedly more similar to eukaryotic than to archaeal GluRSs (and least similar of all to bacterial GluRS). In the C-terminal (KMSK) region, the similarities of GlnRS to the eukaryotic and archaeal versions of GluRS are roughly comparable but sequence similarity to the bacterial GluRS is effectively nonexistent (Fig. 13).


FIG. 13.   Phylogenetic tree of GlnRS sequences. The tree has been rooted using GluRS sequences. Other details are as in Fig. 2.

A GlnRS of a single type seems to occur in all eukaryotes; this generalization is based not only on animals, plants, and fungi but also upon the slime mold Dictyostelium, Trichomonas, and Nosema, with the last two representing deeply branching eukaryotic lineages (8). However, GlnRS is absent from the Archaea, and among bacteria its distribution is very sparse; it is found only in the β and γ subdivisions of the Proteobacteria, the Deinococcus-Thermus division, and Porphyromonas. In other words, among bacteria known not to contain GlnRS are representatives of the α and ε subdivisions of the Proteobacteria, the gram-positive bacteria, the cyanobacteria, the spirochetes, the chlamydias, and the genera Aquifex and Thermotoga. The only specific phylogenetic relationship apparent among the bacterial versions of the GlnRS is the proteobacterial grouping, but the proteobacterial representatives are not strongly distinguished from the other bacterial GlnRSs. Indeed, one of the β proteobacteria, Bordetella pertussis, has a GlnRS that appears specifically related to that found in Porphyromonas, a relationship reinforced by a sequence signature.

Asn; AAY; class II; homodimer. Although they represent different synthetase classes (II and I, respectively), in their evolutions the AsnRS and GlnRS families have much in common (Fig. 14). Both arose from within the cluster of the synthetases for their corresponding diacid. For glutamine, it was from the eukaryotic lineage per se that the enzyme arose, while for asparagine, the origin is localized only to the archaeal genre of AspRS in general. In both instances it is in the C-terminal portion of the molecule that the origin of the synthetase for the amidated amino acid is most strikingly seen. As is the case for GluRSs (see below), BLAST sequence similarity searches based upon the C-terminal 40% of the archaeal and eukaryotic AspRSs have much higher scores with one another and with the AsnRSs than with their bacterial counterparts. The root of the AsnRS tree itself separates the eukaryotic AsnRSs from their counterparts (see Fig. 16). The root of a combined phylogenetic tree for the AsnRS and AspRS enzymes (rooted by LysRS) occurs between the bacterial AspRSs and the grouping of archaeal and eukaryotic AspRSs with the AsnRSs.


FIG. 14.   Phylogenetic tree of AsnRS sequences. The tree has been rooted using AspRS sequences. Saccharomyces cerevisiae N1 is the cytoplasmic enzyme, while S. cerevisiae N2 is the mitochondrial enzyme. Other details are as in Fig. 2.

The spotty distribution among the organismal taxa characteristic of GlnRS is seen for AsnRS as well, but to a lesser degree. AsnRS appears to be present in all eukaryotes but occurs in only two archaea, Pyrococcus and Pyrobaculum. AsnRS is more widely distributed among the bacteria but still is definitely absent in a number of taxa, i.e., Aquifex, Thermotoga, some proteobacteria (Neisseria, Pseudomonas, and Helicobacter), Mycobacterium, and Chlamydia.

Given their relatively late origins (see above), it is not surprising that the asparagine- and glutamine-charging enzymes show no real evidence of the canonical pattern. There are, for example, no significant areas of deletion-insertion in the alignment that would distinguish an archaeal from a bacterial genre. Dipeptide analysis does not show the pronounced differences between inter- and intra-group similarities, as the canonical pattern requires. Furthermore, no strong signature distinguishes an archaeal from a bacterial version of the enzyme.

The bacterial AsnRSs show two very distinctive subtypes, which are marginally specifically related at best. The first subtype is phylogenetically the more widespread, covering all characterized proteobacterial examples, Porphyromonas, the spirochetes, and the mycoplasmas. The second covers the Bacillus-Lactobacillus area of the gram-positive tree (although not the mycoplasmas) plus the Deinococcus-Thermus division. This second subtype, however, shows more similarity to the AspRSs than does the first subtype, suggesting that the second subtype has retained more ancestral character than have other AsnRSs.

The two known archaeal AsnRSs are specifically related to one another. The eukaryotic AsnRSs, however, fall into two unrelated groupings: the animal and fungal AsnRSs constitute one (distinct from all other AsnRS groups), while the yeast mitochondrial, plant, and Plasmodium enzymes distribute within the first bacterial subtype (and may all be mitochondrial) (Fig. 14).

Lys; AAR; class I (monomer) and class II (homodimer). LysRS represents the only known violation of the class rule: a class II LysRS is found in eukaryotes, most bacteria, and a few archaea (i.e., Sulfolobus and Pyrobaculum) (Fig. 15). However, a class I LysRS is found in the euryarchaeotes, two other members of the Crenarchaeota (Cenarchaeum and Aeropyrum), and a scattering of bacteria (34). The class II LysRSs clearly had a common ancestor with the AspRSs and AsnRSs in the deep past, but the class I enzyme stands essentially alone phylogenetically within its class.


FIG. 15.   Phylogenetic trees of LysRS sequences. (A) The class II synthetases were rooted using AspRS sequences. (B) The class I (archaeal type) sequences were rooted between the crenarchaeal (plus Pyrococcus) and other euryarchaeal sequences. Other details are as in Fig. 2.

The bacterial class II LysRSs are all of a kind and contain the grouping of the two above-mentioned crenarchaeal examples. No specific relationship exists between these archaeal examples and their eukaryotic counterparts, and the canonical pattern is not apparent. The phylogenetic grouping of the class I LysRSs shows the bacterial and archaeal examples to be intermixed. The crenarchaeon Cenarchaeum groups specifically with the known examples of the α proteobacteria, except for Rhizobium meliloti, whose LysRS is class II; however, Aeropyrum, also a crenarchaeon, is not a member of this group. The other known bacterial examples, the spirochete and Streptomyces LysRSs, as a group show a specific relationship to the Pyrococcus enzyme, while the remaining (eury)archaeal examples appear in an outgroup relationship to all those just discussed (Fig. 15).

Asp; GAY; class II; homodimer. The AspRSs strongly exhibit the full canonical pattern: a single bacterial type exists, which differs dramatically from the AspRSs of the archaeal genre. In the interior of the AspRS sequence alignment, a stretch of about 220 amino acids in the bacterial genre (starting at ca. position 250 in the E. coli sequence) shows almost no similarity to the corresponding (~100-amino-acid) section in the archaeal genre. Sequence similarity resumes thereafter at ca. bacterial position 470 and continues to the C terminus of the molecule, slightly more than 100 amino acids distant (Fig. 16). Because the AsnRS has arisen from within the grouping of the AspRSs (see above), the latter must be considered paraphyletic, which breaks the monophyly rule.
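The paraphyly argument can be made concrete with a toy example. The sketch below encodes a deliberately simplified, hypothetical rooted topology (not the tree of Fig. 16) as nested tuples, in which AsnRS groups with the archaeal and eukaryotic AspRSs, and then tests whether a given set of leaves forms a clade. The leaf names and the function `is_monophyletic` are illustrative assumptions.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some subtree contains exactly the leaves in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted topology: bacterial AspRS versus everything else,
# with AsnRS joined to the archaeal/eukaryotic AspRS grouping.
toy_tree = (
    "AspRS_bacterial",
    (("AspRS_archaeal", "AspRS_eukaryotic"), "AsnRS"),
)

asp = {"AspRS_bacterial", "AspRS_archaeal", "AspRS_eukaryotic"}
print(is_monophyletic(toy_tree, asp))              # AspRSs alone
print(is_monophyletic(toy_tree, asp | {"AsnRS"}))  # AspRSs plus AsnRS
```

In this toy tree the AspRS leaves alone do not form a clade, whereas the AspRS leaves together with AsnRS do, which is the sense in which the AspRSs are described as paraphyletic.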


FIG. 16.   Phylogenetic tree of AspRS sequences. The origin of the AsnRS sequences is also shown (designated by the N rather than the D suffixes). The tree was rooted using LysRS sequences. Other details are as in Fig. 2.

The arrangement of the AspRSs in all three major groups (bacterial, archaeal, and eukaryotic) does not violate established taxonomy except in minor ways in the bacteria. For example, the Cytophaga-Chlorobium clade (represented by Porphyromonas and Chlorobium) is split by the AspRSs: the Chlorobium version shows a remarkably close relationship to the three examples (Rhodobacter, Caulobacter, and Rickettsia) of the α proteobacteria, while the Porphyromonas enzyme shows no clear specific relationships to any other bacterial AspRS.

Glu; GAR; class I; monomer. Again, the full canonical pattern is strongly evident; the difference between the bacterial genre and its archaeal counterpart is striking. Not only are the bacterial examples about 100 amino acids shorter than the archaeal sequences at the N terminus, but also in the C-terminal (KMSK) section of the molecule the difference between them is extreme: the two show no resemblance, in either sequence or overall structure (46). (BLAST searches based on the archaeal and eukaryotic examples of this region readily detect one another and also the comparable region of all GlnRSs but never detect their bacterial counterparts.) Because the GlnRS has arisen from within the GluRS cluster, the latter breaks the monophyly rule. The bacteria show at least two subtypes of GluRS, which are specifically related to one another (to the exclusion of the GluRSs of the archaeal genre), and a number of bacterial species contain two GluRSs as well (24), all of which makes for a somewhat confusing phylogenetic picture (Fig. 17). It is worth noting that a rather clear grouping emerges that includes the spirochetes, the Cytophaga-Chlorobium group, the Deinococcus-Thermus division, the chlamydias, and two proteobacteria, i.e., Pseudomonas (γ division) and Rhizobium (α division).


FIG. 17.   Phylogenetic tree of GluRS sequences. The tree was rooted between the Bacteria and the Archaea plus Eukarya. The M. thermoautotrophicum ΔH sequence was used. Other details are as in Fig. 2.

Synthetases for the NGN-Encoded Amino Acids

Cys; UGY; class I; monomer. The mechanism of Cys-tRNA formation in M. jannaschii and M. thermoautotrophicum has until now been a mystery. Nothing identifiable as a CysRS was seen in their (complete) genomes. However, a normal functioning CysRS has been identified in Methanococcus maripaludis, a close relative of M. jannaschii (30, 40). Did a third, unrecognized synthetase class exist in these cases, or could the cysteine tRNA be charged indirectly, as in the case of selenocysteinyl-tRNA (11, 33)? The possibility that the highly aberrant SerRS found in M. jannaschii and M. thermoautotrophicum is somehow related to the lack of recognizable CysRS in these organisms was considered (37), the rationale being that such a SerRS might form Ser-tRNACys, which would be a key intermediate in Cys-tRNA formation by a tRNA-mediated amino acid transformation pathway (33). However, in vitro data did not support this view (37). Instead, biochemical and genetic approaches have now revealed that in M. jannaschii and M. thermoautotrophicum, ProRS is able to specifically synthesize both Cys-tRNACys and Pro-tRNAPro (60). This unprecedented dual functionality in an AARS is not reflected in any distinguishing features of these ProRSs at the sequence level. Interpretation of the evolutionary significance of this unexpected versatility among AARSs must now await more detailed biochemical description of its phylogenetic distribution.

As can be inferred from Table 6, the CysRSs do not exhibit the canonical pattern. There is also considerable evidence of interdomain horizontal gene transfer, particularly involving the archaeal CysRSs. In Fig. 18, four of the archaeal CysRSs do cluster. However, the M. maripaludis enzyme (see above) is disturbingly similar to that from Pyrococcus, with the pair showing 65% sequence identity (40). Three other archaeal CysRSs, from Methanosarcina, Archaeoglobus, and Cenarchaeum, group among the bacterial examples of the enzyme but show no phylogenetic relationship to one another therein. By contrast, the relationships among the bacterial CysRSs in Fig. 18 are not particularly out of kilter with established bacterial taxonomy, which might suggest that the horizontal gene transfers have been mainly from the Bacteria to the Archaea.


FIG. 18.   Phylogenetic tree of CysRS sequences. The tree was rooted between the largest group of archaeal sequences and all others. Other details are as in Fig. 2.

Trp; UGG; class I; homodimer. TrpRS is an obvious relative of TyrRS (19), although, as mentioned above, their corresponding codons are not related. The tryptophan enzyme conforms to the full canonical pattern, which can be inferred from Fig. 19, dipeptide similarity matrices, and striking sequence signatures. The TrpRSs of the archaeal genre show a substantial N-terminal extension relative to those of the bacterial genre. Within the bacterial genre, a number of subtypes can be recognized, and two organisms possess two TrpRSs, each of a different bacterial subtype (Fig. 19). By signature analysis, five bacterial subtypes can be ident