Microbiology and Molecular Biology Reviews, March 2000, p. 202-236, Vol. 64, No. 1
Department of Microbiology, University of Illinois, Urbana, Illinois 61801¹;
Center for Biomolecular Recognition, Department of Medical Biochemistry and Genetics, Laboratory B, The Panum Institute, DK-2200 Copenhagen N, Denmark²; and
Department of Molecular Biophysics and Biochemistry and Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8114³
1092-2172/00/$04.00+0
Copyright © 2000, American Society for Microbiology. All rights reserved.
Aminoacyl-tRNA Synthetases, the Genetic Code, and
the Evolutionary Process
SUMMARY
INTRODUCTION
BIOCHEMICAL SKETCH OF AMINOACYL-tRNA SYNTHESIS
EVOLUTIONARY OVERVIEW OF AMINOACYL-tRNA SYNTHETASES
ORDER IN THE GENETIC CODE
AMINOACYL-tRNA SYNTHETASE EVOLUTION: RECURRING GENE TRANSFER
EVOLUTIONARY PROFILES OF THE INDIVIDUAL AMINOACYL-tRNA SYNTHETASES
Synthetases for the NUN-Encoded Amino Acids
Phe; UUY; class II; tetramer of α- and β-subunits.
Leu; UUR and CUN; class I; monomer.
Ile; AUH; class I; monomer.
Met; AUG; class I; homodimer.
Val; GUN; class I; monomer.
Synthetases for the NCN-Encoded Amino Acids
Ser; UCN and AGY; class II; homodimer.
Pro; CCN; class II; homodimer.
Thr; ACN; class II; homodimer.
Ala; GCN; class II; homotetramer.
Synthetases for the NAN-Encoded Amino Acids
Tyr; UAY; class I; homodimer.
His; CAY; class II; homodimer.
Gln; CAR; class I; monomer.
Asn; AAY; class II; homodimer.
Lys; AAR; class I (monomer) and class II (homodimer).
Asp; GAY; class II; homodimer.
Glu; GAR; class I; monomer.
Synthetases for the NGN-Encoded Amino Acids
Cys; UGY; class I; monomer.
Trp; UGG; class I; homodimer.
Arg; CGN and AGR; class I; homodimer.
Gly; GGN; class II; heterotetramer or homodimer.
Recruitment of Aminoacyl-tRNA Synthetases into Other Roles
Summary of Aminoacyl-tRNA Synthetase Evolutionary Profiles
PHYLOGENETIC PICTURE EMERGING FROM THE AMINOACYL-tRNA SYNTHETASES
Spirochetes
Cytophaga-Chlorobium Kingdom
Deinococcus-Thermus Division
Proteobacteria (Purple Bacteria)
Gram-Positive Bacteria
Rickettsias, Mycoplasmas, and Certain Other Bacteria
Archaea
EVOLUTIONARY SYNTHESIS
Evolutionary Significance of the Canonical Pattern
Some General Evolutionary Matters
CONCLUSION
ACKNOWLEDGMENTS
REFERENCES
SUMMARY
The aminoacyl-tRNA synthetases (AARSs) and their relationship to the genetic code are examined from the evolutionary perspective. Despite a loose correlation between codon assignments and AARS evolutionary relationships, the code is far too highly structured to have been ordered merely through the evolutionary wanderings of these enzymes. Nevertheless, the AARSs are very informative about the evolutionary process. Examination of the phylogenetic trees for each of the AARSs reveals the following. (i) Their evolutionary relationships mostly conform to established organismal phylogeny: a strong distinction exists between bacterial- and archaeal-type AARSs. (ii) Although the evolutionary profiles of the individual AARSs might be expected to be similar in general respects, they are not. It is argued that these differences in profiles reflect the stages in the evolutionary process when the taxonomic distributions of the individual AARSs became fixed, not the nature of the individual enzymes. (iii) Horizontal transfer of AARS genes between Bacteria and Archaea is asymmetric: transfer of archaeal AARSs to the Bacteria is more prevalent than the reverse, which is seen only for the "gemini group." (iv) The most far-ranging transfers of AARS genes have tended to occur in the distant evolutionary past, before or during formation of the primary organismal domains. These findings are also used to refine the theory that at the evolutionary stage represented by the root of the universal phylogenetic tree, cells were far more primitive than their modern counterparts and thus exchanged genetic material in far less restricted ways, in effect evolving in a communal sense.
INTRODUCTION
The aminoacyl-tRNA synthetases (AARSs) have long fascinated biologists. They are the linchpin of translation, the link between the worlds of protein and nucleic acid. Their structures and functions, which are of both practical and basic significance, deserve and have received much attention. However, it is not only the structure-function aspect of these enzymes that has captured the biologist's imagination; it is also the possibility that they could tell us the secrets of the genetic code. To understand these enzymes in standard molecular terms is to add one more piece, a most important one, to the puzzle of what the cell is and how it works. But to understand them in evolutionary terms is to ask what the cell is in a deeper sense: how it evolved, and how life came to be, the biologist's ultimate question. Reading the history written into the AARSs was not possible previously for the simple reason that doing so requires molecular sequences from a large number of these molecules, and the necessary body of data was lacking. The progress of genomics in the late 1990s is now providing the needed data, and a picture of AARS evolution is beginning to emerge (5, 7, 15-18, 45, 75). In the present review we examine the still murky image of synthetase evolution from a slightly different perspective and bring forth more of its rich detail and evolutionary depth.
BIOCHEMICAL SKETCH OF AMINOACYL-tRNA SYNTHESIS
In a departure from the long-accepted view (48) that every cell harbors 20 aminoacyl-tRNA synthetases responsible for the synthesis of the set of 20 canonical aminoacyl-tRNA families, it is now clearly established that there are at least two ways of forming aminoacyl-tRNA (12, 33). The direct acylation of tRNA by aminoacyl-tRNA synthetases is well understood; this ATP-dependent reaction (Fig. 1) is carried out by enzymes which, in general, are exceedingly specific in selecting their substrates, i.e., amino acid and tRNA. The synthetases fall into two classes of 10, distinguished by the topology of their ATP-binding domain: class I proteins contain a Rossmann fold (characterized by the HIGH and KMSK motifs), while class II enzymes possess an unrelated β-sheet arrangement and are characterized by three degenerate sequence motifs (3, 10, 14, 20). Examples of most of the aminoacyl-tRNA synthetases have been structurally characterized, and it is expected that in the near future the crystal structure of at least one enzyme from each of these families will be known (42, 43). There is also an indirect pathway of aminoacyl-tRNA synthesis, tRNA-dependent amino acid modification (Fig. 1). This pathway relies on the acylation of tRNA with a "precursor" amino acid by a nondiscriminating AARS (33), i.e., one that charges a noncognate tRNA as well as its cognate one. The "precursor" amino acid is then converted, while bound to tRNA, to the correct amino acid (matching the tRNA specificity) by a second, nonsynthetase enzyme, which recognizes only such a mischarged aminoacyl-tRNA species. For example, a nondiscriminating GluRS attaches glutamate to tRNAGln, and an amidotransferase then converts the resulting Glu-tRNAGln to Gln-tRNAGln. Our knowledge of these pathways is still far from complete: it is not yet possible to deduce from amino acid sequence alone whether a given AARS is discriminating or nondiscriminating, and the number and nature of the modifying enzymes remain to be fully established. It is clear, however, that in many organisms this is the essential and only way to form Asn-tRNA and Gln-tRNA (12, 13, 63).
EVOLUTIONARY OVERVIEW OF AMINOACYL-tRNA SYNTHETASES
The assumption that translationally produced protein was a part of
the very first translation mechanism raises the chicken-and-egg paradox. However, there can be little doubt that once translation did
exist, proteins that facilitated tRNA charging would be among the first
proteins to evolve, the selective advantage of their specificity being
great. Thus, the evolutionary history of the current aminoacyl-tRNA
synthetases must go deep into translation's past, to the emergence of
the modern genetic code. The central role played by the AARSs in
translation would suggest that their histories and that of the genetic
code are somehow intertwined. This then raises the question of whether
the AARSs in their evolution have contributed to the code's present
structure; put another way, are the codon assignments simply
reflections of AARS evolutionary wanderings? It is important that conjectures of this sort be examined in detail, and in the genomic era this can be done.
In an evolutionary sense, the most striking thing about the synthetases is the existence of the two distinct classes (3, 10, 14, 20). Common characteristic domain structures and sequence homologies define each class, but the two have nothing in common except the biochemistry of the reactions they catalyze (22): between the two classes, proteins show no structural resemblance, have almost no common motifs (see reference 49 for a possible exception), encounter the tRNA from different angles, and acylate the amino acid to different hydroxyl groups of the terminal ribose of the tRNA (57). This has been widely assumed to suggest that the tRNA-charging function evolved at least twice. In the origin of these two classes of tRNA-charging enzymes lies a clue to one of biology's deepest mysteries (45). Perhaps the two reflect a dichotomous origin of translation itself, in some sort of fusion between two different primitive processes, each associated with its own set of amino acids. Perhaps the two classes are the surviving traces of an ancient evolutionary battle between emerging tRNA-charging mechanisms as biology evolved beyond the RNA world. In any case, the existence of unrelated tRNA-charging systems must be considered a most telling evolutionary relic (45).
The aminoacyl-tRNA synthetases are distributed between the classes according to specific rules. Each class encompasses 10 of the amino acids, and all examples of a given amino acid's synthetase are of the same class, the so-called "class rule." Within a class, all synthetases associated with a given amino acid are specifically related to one another to the exclusion of the AARSs associated with any other amino acids, the "monophyly rule." A third, class-independent generalization is that for each organism, all tRNAs assigned to a given amino acid (so-called isoacceptors [50]) can be charged by a single synthetase, a rule that holds even for amino acids such as serine with two distinct sets of codons, UCN and AGY (reviewed in reference 39). Except possibly for the last, these rules have exceptions. The class rule and hence the monophyly rule are violated by lysine; in some organisms its synthetase is class I, while in others it is class II (32, 34, 35). Four more exceptions to the monophyly rule but not the class rule exist, involving glycine, serine, glutamic acid, and aspartic acid. For glycine and serine, each amino acid is associated with two synthetases (for both amino acids they are class II [37, 44]). However, the two enzymes in each case are not specifically related to one another, as the monophyly rule demands. This is most obvious for glycine, where the overall structures of the two enzymes are completely dissimilar, with one being a homodimer and the other being a heterotetramer. For the glutamyl- and aspartyl-tRNA synthetases, the violation of the monophyly rule is of a different nature. In each of these cases, all the synthetases associated with a particular amino acid constitute a related group. However, the synthetase for the amidated form of the amino acid (i.e., glutamine or asparagine) arises from within the same group, which then renders the parent grouping paraphyletic (4, 38, 46, 52, 53). 
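The class rule can be stated as a simple check over a table of class assignments. The Python sketch below is illustrative only: it encodes, for each amino acid, the synthetase class(es) listed in the outline of this review, and the names `AARS_CLASSES`, `class_rule_violations`, and `members` are hypothetical. It does not test the monophyly rule, which would require phylogenetic data rather than class labels.

```python
# Synthetase class(es) per amino acid, as listed in the outline of this review.
AARS_CLASSES = {
    "Phe": {"II"}, "Leu": {"I"}, "Ile": {"I"}, "Met": {"I"}, "Val": {"I"},
    "Ser": {"II"}, "Pro": {"II"}, "Thr": {"II"}, "Ala": {"II"},
    "Tyr": {"I"}, "His": {"II"}, "Gln": {"I"}, "Asn": {"II"},
    "Lys": {"I", "II"}, "Asp": {"II"}, "Glu": {"I"},
    "Cys": {"I"}, "Trp": {"I"}, "Arg": {"I"}, "Gly": {"II"},
}


def class_rule_violations(classes_by_aa):
    """Return amino acids whose synthetases occur in more than one class."""
    return sorted(aa for aa, classes in classes_by_aa.items() if len(classes) > 1)


def members(classes_by_aa, cls):
    """Return amino acids having at least one synthetase of class `cls`."""
    return sorted(aa for aa, classes in classes_by_aa.items() if cls in classes)


print(class_rule_violations(AARS_CLASSES))  # ['Lys']
print(len(members(AARS_CLASSES, "I")), len(members(AARS_CLASSES, "II")))  # 11 10
```

Because lysine is counted in both classes here, the totals come out as 11 and 10; the "10 per class" partition corresponds to counting lysine only once, in class II.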
Mention should also be made here of the charging system for cysteine, which breaks the class and monophyly rules in another way. In at least two organisms, the methanogens Methanococcus jannaschii and Methanobacterium thermoautotrophicum, neither a class I nor a class II cysteinyl-tRNA synthetase can be found in the genome, and the exact mechanism of Cys-tRNA formation (direct or indirect) has long remained a mystery. These exceptions to the class and monophyly rules do not rob the rules of their potential evolutionary significance. Erosion of the historical trace is the hallmark of evolution. The exceptions merely restrict what kinds of explanations can be given for the rules.
The common general structure and sequence motifs shared by all members of a given synthetase class demand common ancestry and Darwinian descent. The later stages of this descent are captured in the sequence similarities among existing synthetases; importantly, their branching patterns recall structural similarities among the amino acids and patterns in the genetic code. To give examples: the ValRSs and IleRSs (class I) are impressively similar in sequence; this is not simply a matter of a sequence motif here and there (7). These sequences in turn are somewhat less similar to those of the LeuRSs; the MetRSs then join the group at a still lower level of similarity (45). All four corresponding amino acids are nonpolar and aliphatic, and their codons all conform to the general composition NUN. Similarly, the class II enzymes for serine, threonine, proline, histidine, and glycine group phylogenetically and structurally (43). (Only one of the two unrelated forms of the GlyRS shows this specific relatedness, however [17].) The amino acids serine and threonine are obviously related structurally, and both are capable of forming an internal hydrogen-bonded five-membered ring structure that mimics the ring structure of the imino acid proline. However, histidine and glycine appear structurally unrelated to these three (and to one another). In their codons, the first three amino acids are also related; all conform to the general composition NCN, but the codons for histidine (CAY) and glycine (GGN) are not related to the others except, of course, to the CCY codons of proline in the former case.
The third major AARS grouping involves the class II synthetases for lysine, aspartic acid, and asparagine (14, 17, 52), all of which are closely related in structure (4). The amino acids aspartic acid and asparagine are obviously related, but lysine stands apart. In their codons, the three exhibit an overlapping kind of relatedness, with the Asn codons (AAY) being close to both their Asp (GAY) and Lys (AAR) counterparts whereas the last two sets are not closely related.
The close evolutionary relationship between the (class I) synthetases for glutamic acid and glutamine (mentioned above; see also reference 46) is mirrored in the obvious structural relationship between the corresponding amino acids and the relationship between their codons (GAR-Glu versus CAR-Gln). Finally, a pronounced similarity exists between the synthetases for the two aromatic amino acids tyrosine and tryptophan (6, 19), but their codons (UAY and UGG, respectively) are not closely related.
Although the existence of correlations between the genetic code and the evolutionary patterns of the AARSs is clear, their significance is not. Does the fact that the valine, isoleucine, leucine, and methionine enzymes came from a common ancestor mean that this ancestor itself could not distinguish among these amino acids or that the ancestor was able to specifically charge four separate aminoacyl-tRNAs? That seems absurd in the context of modern translation (45). A more acceptable explanation would seem to be that the AARS relationships reflect evolutionary replacement of one tRNA-charging enzyme or acylation system by another. Indeed, what we see here may be only the latest in a series of such evolutionary replacements, a series that traces far back into the code's past and an evolutionary process that still goes on today in a less radical form, involving replacements within the confines of a given amino acid type (see below).
ORDER IN THE GENETIC CODE
The significance of AARS evolution vis-à-vis that of the
genetic code cannot be properly assessed without some appreciation of
the nature and extent of the code's order. Within the last decade,
significant strides have been made in this area. The so-called synonym
order in the code, i.e., the degeneracy in codon assignment, which
manifests itself almost exclusively in the third codon position, has
never been in doubt, except as regards what caused it in the first
place. However, such is not the case for the ordering that pertains to
related amino acids. Although most biologists accept the existence of
such an order, they have disagreed about its exact form, its extent,
and its cause. Some have argued that the related amino acid order
evolved to ameliorate the phenotypic consequences of mutations, an
evolutionary scenario that would produce both synonym and related amino
acid orderings (59). An alternative but conceptually related
explanation is that the assignments have somehow been adjusted to
minimize the consequences of errors in a primitive translation
mechanism that was highly inaccurate (66). Seemingly, both
error minimization models would lead to a very similar type of order in
the code. However, a computer simulation study (26) showed
that the assumptions of the first model are unlikely to lead to a
synonym order in the code that is almost entirely confined to a single
codon position, a type of order that is consistent with, if not
predicted by, the second model (which also suggests the third codon
position should be the degenerate one [66]). It has
also been proposed that the form of the code was predetermined, at
least in part, by specific interactions between amino acids and nucleic
acids (reference 76 and references therein).
Perhaps the main difficulty in comprehending the code's related amino acid ordering is that amino acid relatedness is context dependent; amino acids that appear similar in one context can be unrelated in another. The amino acid replacement spectra of proteins prove the point: the replacement pattern can differ from position to position in a protein sequence for any amino acid. However, nobody knows what property or properties of the amino acids the code actually reflects.
One important advance in this area was the definition of an amino acid
property called the polar requirement, which is a number derived from
the paper chromatographic mobility of an amino acid in pyridine-water
mixtures of various ratios (71). Simply plotting these
numbers on a codon table (Table 1)
reveals the existence of a remarkable degree of order, much of which
would be unexpected on the basis of amino acid properties as normally
understood. For example, codons of the form NUN define a set of five
amino acids, all of which have very similar polar requirements.
Likewise, the set of amino acids defined by the NCN codons all have
nearly the same unique polar requirement. The codon couplets CAY-CAR, AAY-AAR, and GAY-GAR each define a pair of amino acids
(histidine-glutamine, asparagine-lysine, and aspartic acid-glutamic
acid, respectively) that has a unique polar requirement. Only for the
last of these (aspartic and glutamic acids), however, would the two
amino acids be judged highly similar by more conventional criteria.
Perhaps the most remarkable thing about polar requirement is that
although it is only a unidimensional characterization of the amino
acids, it still seems to capture the essence of the way in which amino acids, all of which are capable of reacting in varied ways with their
surroundings, are related in the context of the genetic code. Also of
note is the fact that the context in which polar requirement is
defined, i.e., the interaction of amino acids with heterocyclic
aromatic compounds in an aqueous environment, is more suggestive of a
similarity in the way amino acids might interact with nucleic acids
than of any similarity in the way they would behave in a proteinaceous
environment (70).
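The order described above can be checked with a few lines of Python. This sketch assumes the polar requirement values as they are commonly tabulated from Woese's chromatographic measurements; they are not reproduced in this section and should be checked against Table 1 before being relied on. It groups the amino acids of the standard code by the middle codon base and reports the spread of polar requirement in each column, then does the same for the three NAN couplets mentioned above.

```python
import itertools

BASES = "TCAG"  # DNA alphabet; T stands for U
# Standard genetic code, codons enumerated in TCAG order at each position ('*' = stop).
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AA)}

# Polar requirement values as commonly tabulated (assumed; verify against Table 1).
PR = {"F": 5.0, "L": 4.9, "I": 4.9, "M": 5.3, "V": 5.6, "S": 7.5, "P": 6.6,
      "T": 6.6, "A": 7.0, "Y": 5.4, "H": 8.4, "Q": 8.6, "N": 10.0, "K": 10.1,
      "D": 13.0, "E": 12.5, "C": 4.8, "W": 5.2, "R": 9.1, "G": 7.9}


def pr_range_by_middle_base(code, pr):
    """Max minus min polar requirement among sense codons sharing a middle base."""
    ranges = {}
    for mid in BASES:
        values = [pr[aa] for codon, aa in code.items() if codon[1] == mid and aa != "*"]
        ranges[mid] = round(max(values) - min(values), 2)
    return ranges


def couplet_spread(code, pr, prefix):
    """Spread of polar requirement over the four codons beginning with `prefix`."""
    values = [pr[code[prefix + b]] for b in BASES if code[prefix + b] != "*"]
    return round(max(values) - min(values), 2)


print(pr_range_by_middle_base(CODE, PR))  # {'T': 0.7, 'C': 0.9, 'A': 7.6, 'G': 4.3}
for prefix in ("CA", "AA", "GA"):
    print(prefix, couplet_spread(CODE, PR, prefix))  # CA 0.2, AA 0.1, GA 0.5
```

With these values the NUN and NCN columns are each confined to a band of less than one unit, while the NAN column spans several units overall yet splits into couplets whose members differ by half a unit or less.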
More recently, computer simulation studies have been used to assess how polar requirement compares with other amino acid properties as an indicator of the code's related amino acid order, how well ordered the code actually is, and what the nature of that order is. An appealingly straightforward approach to the problem was explored by Hurst and his colleagues (23, 28). In brief, they compared the natural code to a series of synthetic codes generated by randomly reassigning the 20 amino acids to the set of synonym codon categories that are defined by the natural code. Each code was then measured for how conservative it is with regard to a given amino acid property under "mutation"; i.e., each codon in a given code was compared to all other codons that are 1 base change removed from it, the numerical difference in that property between the amino acids corresponding to the original and the "mutated" codon was measured, and the squared differences were summed over the code as a whole or over each of the three codon positions individually. For all amino acid properties tested except one, the natural code was not notably superior to the random codes. That exception, polar requirement, revealed a natural code superior to all but 0.01% of the random codes (28). A subsequent, more refined simulation of this sort, which took transition-transversion ratios into account, showed the natural code to be "one in a million" (23). There can be no doubt that when viewed in terms of amino acid polar requirements, the genetic code is a highly structured array. It would also seem that it has somehow been optimized to reduce the consequences of translational errors. However, the evolutionary dynamic that shaped the code remains a mystery.
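The randomization test just described can be sketched in Python. This is a simplified, hypothetical reimplementation, not Hurst and colleagues' own code: it assumes the commonly tabulated polar requirement values (to be checked against Table 1), keeps the three stop codons fixed, shuffles the 20 amino acids among the synonym codon blocks of the standard code, and scores each code by the summed squared change in polar requirement over all single-base substitutions between sense codons. It applies no transition-transversion weighting, so it corresponds to the first study (28) rather than the refined one (23).

```python
import itertools
import random

BASES = "TCAG"  # DNA alphabet; T stands for U
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AA)}

# Polar requirement values as commonly tabulated (assumed; verify against Table 1).
PR = {"F": 5.0, "L": 4.9, "I": 4.9, "M": 5.3, "V": 5.6, "S": 7.5, "P": 6.6,
      "T": 6.6, "A": 7.0, "Y": 5.4, "H": 8.4, "Q": 8.6, "N": 10.0, "K": 10.1,
      "D": 13.0, "E": 12.5, "C": 4.8, "W": 5.2, "R": 9.1, "G": 7.9}


def code_cost(code, prop):
    """Sum of squared property changes over all single-base changes between sense codons."""
    total = 0.0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbor = code[codon[:pos] + base + codon[pos + 1:]]
                if neighbor != "*":
                    total += (prop[aa] - prop[neighbor]) ** 2
    return total


def random_code(code, rng):
    """Permute amino acids among the natural synonym blocks; stop codons stay fixed."""
    amino_acids = sorted(set(code.values()) - {"*"})
    shuffled = amino_acids[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(amino_acids, shuffled))
    return {codon: (aa if aa == "*" else relabel[aa]) for codon, aa in code.items()}


def fraction_at_least_as_good(code, prop, trials=10_000, seed=1):
    """Fraction of random codes whose cost is no higher than that of `code`."""
    rng = random.Random(seed)
    natural = code_cost(code, prop)
    hits = sum(code_cost(random_code(code, rng), prop) <= natural for _ in range(trials))
    return hits / trials


print(fraction_at_least_as_good(CODE, PR, trials=2_000))
```

Because whole synonym blocks are relabeled, serine's two separate blocks (UCN and AGY) always receive the same amino acid, which matches the "synonym codon categories" of the natural code. With the natural code the printed fraction is expected to be very small; the exact figure depends on the number of trials and the random seed.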
While it must be admitted that the evolutionary relationships among the AARSs bear some resemblance to the related amino acid order of the code, it seems unlikely that they are responsible for that order (45): the evolutionary wanderings of these enzymes alone simply could not produce a code so highly ordered, in both degree and kind, as we now know the genetic code to be. These enzymes could at best be the agents through which other constraints acted to shape the code. However, even in such a capacity they would not be alone: the tRNAs offer a simple and facile alternative mechanism for changing codon assignments (65). It would seem, therefore, that the evolutionary patterns among the aminoacyl-tRNA synthetases do not imply a role for these enzymes in structuring the genetic code (45). The resemblance between their evolutionary patterns and the patterns seen in the code is a loose convergence, forced by the fact that both evolutions independently reflect somewhat similar properties of the amino acids. The evolutionary patterns in the AARSs do seem to represent evolutionary replacements that occurred against the background of an already established, or otherwise fashioned, code (45).
AMINOACYL-tRNA SYNTHETASE EVOLUTION: RECURRING GENE TRANSFER
If the AARSs do not reveal the code's evolution, what do their evolutionary relationships tell us? The answer is clear. Aminoacyl-tRNA synthetase evolution is a superb indicator of the evolutionary dynamic in general.
It should be noted that the AARSs are unique among components of the translation system in their evolutionary behavior. One dominant evolutionary theme runs through the rRNAs, the ribosomal proteins, and the translation initiation and elongation factors: these molecules tend to show the same evolutionary history; i.e., their molecular phylogenies are consistent with the accepted overall organismal phylogeny. At the highest level, they tend to yield what we herein call the canonical phylogenetic pattern, which is basically a division of all life into three primary groupings (Bacteria, Archaea, and eukaryotes), with the closest relationship being between the Archaea and eukaryotes (73) (see below). The evolutionary picture painted by the synthetases, however, is a world apart from this canonical pattern. Not only do their phylogenies fail to yield the canonical pattern in a number of cases, but they also typically violate the accepted taxonomic structure within the organismal domains. Furthermore, the molecular phylogenies inferred from the synthetases of different amino acid types tend not to agree with one another. But this is the telling point.
Why should the synthetases show such atypical and disparate evolutionary pictures? The answer again is clear. The AARSs are in essence modular components of the cell; they function in isolation from the rest of the translation apparatus and from the rest of the cell, except for their individual contacts in each case with a small subset of the tRNAs (58). Because of this and because of their universality, the AARSs can function in a wide spectrum of cellular environments, often without disadvantage to the host. In other words, the AARSs are ideal candidates for widespread horizontal gene transfer, and the evidence certainly indicates this, since quite a few examples are known in which two different AARSs for the same amino acid coexist in the same organism. Versions of a given enzyme characteristic of the Archaea can be seen scattered among the bacterial taxa (see below). Versions characteristic of the eukaryotes have been seen in the Bacteria or in the Archaea. Within the Bacteria alone, the different bacterial subtypes of a given enzyme intermix among and within the taxa. There is no set pattern to all this; there is merely evidence consistent with frequent, widespread, indiscriminate horizontal gene transfer.
As suggested above, it is tempting to view the evolution of aminoacyl-tRNA synthesis as a study in horizontal gene transfer from top to bottom: at the deepest level, horizontal replacements involving the ancestors of the two synthetase classes, then replacements that gave rise to the phylogenetic structure within each class, and, finally, the replacements involving the different (modern) synthetases that use the same amino acid.
EVOLUTIONARY PROFILES OF THE INDIVIDUAL AMINOACYL-tRNA SYNTHETASES
We now examine in some detail the evolutionary profiles for each of the 20 aminoacyl-tRNA synthetases, with the principal objective of determining the extent to which each conforms to the canonical phylogenetic pattern (defined below) and asking what, if anything, the exceptions to canonical pattern tell us about these enzymes and about stages in the evolution of the cell.
The organisms mentioned in the figures and tables are listed in Table 2.
The analysis presented is a synthesis of four approaches: (i) conventional phylogenetic trees (see Fig. 2 caption); (ii) visual inspection of alignments to reveal qualitative differences not apparent from the other analyses; (iii) dipeptide similarity matrices (see Table 3 footnotes); and (iv) signature analysis. Signatures are defined as positions in the alignment wherein at least 80% of the members of a given group show a constant composition, but one that is found elsewhere in the alignment no more than once within some larger phylogenetic context. For example, spirochete signatures would usually be relative to all other bacterial groups but not relative to the more distantly related archaeal and eukaryotic versions of the enzyme.
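The signature criterion amounts to a column-by-column scan of an alignment. The Python sketch below is a hypothetical illustration, not the software used for this review: it reads "found elsewhere no more than once" as "present in at most one background sequence at that position", and the function name, parameters, and toy sequences are all invented for the example.

```python
from collections import Counter


def signature_positions(group, background, min_fraction=0.8, max_background=1):
    """Return (position, residue) pairs that act as signatures of `group`.

    group, background: lists of equal-length aligned sequences ('-' = gap).
    A position qualifies when its most common residue in `group` covers at least
    `min_fraction` of the group and occurs in at most `max_background`
    background sequences (the larger phylogenetic context) at that position.
    """
    signatures = []
    for pos in range(len(group[0])):
        residue, count = Counter(seq[pos] for seq in group).most_common(1)[0]
        if count / len(group) < min_fraction:
            continue  # not constant enough within the focal group
        if sum(seq[pos] == residue for seq in background) <= max_background:
            signatures.append((pos, residue))
    return signatures


# Toy alignment, invented for illustration only.
group = ["MKVLA", "MKVLS", "MRVLA", "MKVLA"]
background = ["MEILA", "MDIFA", "MKIFA", "MEIYA"]
print(signature_positions(group, background))  # [(2, 'V'), (3, 'L')]
```

Position 0 fails because methionine is common to everyone, and positions 1 and 4 fail because only 75% of the group agree; lowering `min_fraction` to 0.75 would also admit position 1, whose lysine appears in just one background sequence.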
Because the canonical evolutionary pattern is central to our thesis and
because the tRNA-charging enzymes exhibit different partial forms of
that pattern, it is necessary to begin by explaining clearly what we
mean by the phrase. In its essence (which we will call the basal
canonical pattern), the canonical pattern is defined by the
relationship between the bacterial and archaeal versions of a given
molecule. For the basal canonical pattern to hold, regardless of how
many subtypes of a given protein exist, it must be possible to
distinguish strongly between characteristic bacterial and archaeal
versions of the molecule. This distinction should be a pronounced
quantitative one (on the level of sequence similarities) and/or a
qualitative one (evident in terms of gross areas in a sequence
alignment wherein homology between the two is only weakly evident or
nonexistent). In other words, for these two organismal domains, the
interdomain differences between the characteristic archaeal and
bacterial proteins must far outweigh any intradomain differences: the
two must appear to differ in genre. For the full canonical pattern to
hold, there must also then exist a characteristic eukaryotic version(s)
of the molecule that is distinguishable from both the archaeal and the
bacterial versions but which is clearly of the archaeal genre. Tables
3 and 4 are
representative dipeptide similarity matrices for two aminoacyl-tRNA
synthetases typical of those showing canonical pattern (PheRS and
TyrRS), while Tables 5 and
6 are matrices for enzymes (SerRS and
CysRS) that do not show canonical pattern.
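The basal canonical criterion (interdomain differences must far outweigh intradomain ones) can be phrased as a test on a pairwise similarity matrix such as the dipeptide matrices in Tables 3 to 6. The Python sketch below is hypothetical: the sequence names and scores are invented, and the `margin` parameter stands in for the qualitative judgment "far outweigh", which the text does not quantify. It covers only the bacterial-archaeal comparison, not the eukaryotic part of the full canonical pattern.

```python
from itertools import combinations


def basal_canonical(similarity, domain, margin=10.0):
    """Check whether every intradomain score exceeds every interdomain score by `margin`.

    similarity: dict mapping frozenset({a, b}) -> similarity score (higher = closer).
    domain: dict mapping sequence name -> "Bacteria" or "Archaea".
    """
    intra, inter = [], []
    for a, b in combinations(sorted(domain), 2):
        score = similarity[frozenset((a, b))]
        (intra if domain[a] == domain[b] else inter).append(score)
    if not intra or not inter:
        raise ValueError("need both within-domain and between-domain pairs")
    return min(intra) - max(inter) >= margin


# Invented scores for two bacterial and two archaeal sequences.
domain = {"bac1": "Bacteria", "bac2": "Bacteria", "arc1": "Archaea", "arc2": "Archaea"}
scores = {("bac1", "bac2"): 60, ("arc1", "arc2"): 55,
          ("bac1", "arc1"): 20, ("bac1", "arc2"): 22,
          ("bac2", "arc1"): 18, ("bac2", "arc2"): 21}
similarity = {frozenset(pair): s for pair, s in scores.items()}
print(basal_canonical(similarity, domain))  # True, since 55 - 22 >= 10
```

A single bacterial sequence that is as close to the archaeal versions as they are to one another (the situation of the spirochete PheRSs described below) is enough to make the check fail for that set of sequences.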
The aminoacyl-tRNA synthetases are considered individually below in an order defined by their corresponding codons. We have not included most of the mitochondrial data in the analysis, because doing so would add nothing to our conclusions and would needlessly complicate an already complex picture (27, 29).
Synthetases for the NUN-Encoded Amino Acids
Phe; UUY; class II; tetramer of
- and
-subunits.
PheRS is the only class II synthetase in the NUN codon group, and it has no close relatives within that class. Not surprisingly, both the α- and β-subunits present the same evolutionary picture; their sequences are combined to produce Fig. 2.
PheRS shows the classical full canonical pattern, the only exception
being the spirochete PheRSs, which are of the archaeal, not the
bacterial genre, and which seem to be specifically related to the
Pyrococcus PheRS within that grouping, as sequence signature
analysis suggests and Fig. 2 confirms.
For both the α- and β-subunits of PheRS, significant length differences distinguish the bacterial subunits from their archaeal counterparts. The bacterial α-subunit is about 120 amino acids shorter than the archaeal/eukaryotic α-subunit at its N terminus, and the first 90 amino acids of the bacterial sequence show little or no similarity to the archaeal/eukaryotic counterpart. For the β-subunit, however, the bacterial version is the longer, by approximately 250 amino acids. At both termini the bacterial version of the β-subunit extends beyond the archaeal/eukaryotic version by about 100 amino acids; in the N-terminal ~50 amino acids, the archaeal version of the β-subunit shows no recognizable similarity to its bacterial counterpart. In addition, large sequence gaps distinguish the two genres in the interior of the β-subunit.
Leu; UUR and CUN; class I; monomer. LeuRS conforms to the full canonical pattern as well, in this case without exception. A striking lack of similarity in various regions of the molecule distinguishes the bacterial and archaeal genres of LeuRS, and a number of sizable insertion and deletion differences distinguish the two genres throughout the alignment. A nearly total lack of sequence similarity between the two is seen in the C-terminal (KMSK) section of the molecule.
Within the Bacteria, however, the accepted phylogenetic relationships are not all preserved: at least two distinct bacterial subtypes of the molecule exist and have obviously migrated
horizontally. The best-defined bacterial subtype (by all methods of
analysis) is that common to the majority of gram-positive bacteria (and relatives), the spirochetes, chlamydias, and the
Cytophaga-Chlorobium grouping (represented by
Chlorobium tepidum and Porphyromonas gingivalis).
However, this grouping fails to include Clostridium acetobutylicum, a gram-positive species whose LeuRS groups with that of Deinococcus in Fig. 3
(a relationship supported by sequence signature). On the other hand,
the proteobacteria (Escherichia coli and relatives) do form
a grouping quite consistent with their established phylogeny (Fig. 3).
Ile; AUH; class I; monomer.
IleRS also shows the full
canonical pattern. As with LeuRS, this fact is obvious upon visual
inspection of the alignment, especially its C-terminal section, wherein
the bacterial and archaeal genres exhibit very little sequence
similarity and show major alignment gaps relative to one another.
However, as all methods of analysis clearly show, a sizable minority of
bacterial taxa possess an IleRS of the archaeal rather than the
bacterial genre (Fig. 4). All of these
bacterial examples are specifically related to their eukaryotic
counterparts, with the closest relationship being between the
eukaryotes and a bacterial subgroup comprising the spirochetes,
chlamydias, Mycobacterium, and Rickettsia. Note in Fig. 4 the specific relationship between the IleRS of
Mycobacterium and that of Rickettsia, which is
strongly suggested by sequence signature as well. Also note the
relationship between the C. acetobutylicum IleRS and the
plasmid-borne IleRS found in mupirocin-resistant strains of
Staphylococcus aureus; this relationship is also supported by sequence signature.
Met; AUG; class I; homodimer.
MetRS presents one of the more complex evolutionary profiles among the aminoacyl-tRNA synthetases. The enzyme only marginally shows the canonical pattern: the majority of bacterial examples (the group represented by Helicobacter in Fig. 5) define a bacterial genre, while the archaea, the eukaryotes, and a number of bacterial MetRSs constitute the archaeal genre (Fig. 5). However, there is another bacterial grouping, confined to the
and
proteobacteria, which is of the archaeal genre (Fig. 5). The difference
between the bacterial and archaeal genres of MetRS is not as extreme as that seen for the other members of the NUN codon group. However, one large alignment gap (~25 amino acids) separates the bacterial genre from all the others, which appear to contain a metal-binding region at this point, with the consensus sequence
CP . C . . . . . a . gD . C . . C . . . . . . . . . . L
(lowercase indicates a residue present in only four of the five groupings involved). A strong signature distinguishes the bacterial genre from the others, and its distinctiveness is also evident in a dipeptide similarity matrix.
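The consensus notation above can be made concrete with a minimal Python sketch that turns it into a regular expression and scans a protein sequence for it. This is an illustration, not the authors' procedure. The helper names `consensus_to_regex` and `find_motif` are hypothetical, and the demo sequence is synthetic. The handling of lowercase positions is an assumption of this sketch: by default they are treated as wildcards, and with `strict=True` they are required.

```python
import re
from typing import List

# Consensus for the putative metal-binding region, as printed in the text.
# Spaces are for readability only; '.' marks an unconserved position.
CONSENSUS = "CP . C . . . . . a . gD . C . . C . . . . . . . . . . L"


def consensus_to_regex(consensus: str, strict: bool = False) -> re.Pattern:
    """Build a regex from the consensus notation (hypothetical helper).

    Uppercase letters are required residues and '.' matches any residue.
    Lowercase letters mark residues present in only four of the five
    groupings; this sketch ASSUMES they become wildcards by default and
    required residues when strict=True.
    """
    parts = []
    for symbol in consensus.replace(" ", ""):
        if symbol == ".":
            parts.append(".")
        elif symbol.islower():
            parts.append(symbol.upper() if strict else ".")
        else:
            parts.append(symbol)
    return re.compile("".join(parts))


def find_motif(sequence: str, strict: bool = False) -> List[int]:
    """Return 0-based start positions of every motif match in a protein sequence."""
    pattern = consensus_to_regex(CONSENSUS, strict)
    # Wrapping the pattern in a lookahead also reports overlapping matches.
    return [m.start() for m in re.finditer(f"(?={pattern.pattern})", sequence)]


# A made-up sequence (not a real MetRS): two filler residues, then one motif instance.
demo = "MK" + "CPAC" + "AAAAA" + "A" + "A" + "GD" + "A" + "C" + "AA" + "C" + "A" * 10 + "L" + "GG"
print(find_motif(demo))  # [2]
```

The permissive default would still catch a member of the one grouping that lacks the lowercase residues. The strict mode matches only sequences carrying all five groupings' conserved positions.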
proteobacterial representatives.)
The C-terminal domain of MetRS, about 150 amino acids in length, can
take one of three forms: (i) it can be covalently linked to the rest of
the molecule, as in most bacteria and most archaea; (ii) it can be
completely missing, as in a number of bacteria, e.g., cyanobacteria and
mycoplasmas; or (iii) it can be present but not covalently linked to
the rest of the molecule, as in all eukaryotes (except
Caenorhabditis elegans, where it is covalently linked), in
Aquifex, and in the Crenarchaeota. In eukaryotes, this
separate protein, known as Arc1p, occurs as part of higher-order complexes involving eukaryotic synthetases, in which it is involved in tRNA recognition (54). The C termini of these proteins extend about 60 amino acids beyond the normal C terminus of MetRS. Interestingly, the spirochete MetRSs (in which the C-terminal domain is a covalently linked part of the molecule) also extend beyond the normal C terminus of MetRS, and in this extension they show homology to sequences in the Arc1p family, providing further support for a specific relationship between the spirochete and eukaryotic enzymes (Fig. 5).
Arc1p-like domains can be seen in a few other aminoacyl-tRNA synthetases as well. Approximately 100 residues at the N terminus of the β-subunit of bacterial PheRS are homologous to a portion of Arc1p. Mammalian TyrRS (only) has appended to its C terminus a more extensive homolog, which is strikingly similar to the MetRS extensions just discussed. In the mammalian TyrRS case, it has also been demonstrated that the extension functions not as Arc1p does (i.e., in tRNA recognition) but as a cytokine (64).
Val; GUN; class I; monomer.
The valine-charging enzyme
conforms only to the basal canonical pattern; the eukaryotic ValRSs are
not archaeal in nature but obviously bacterial (Fig.
6). Also, within the bacterial group a
37-amino-acid insertion in the alignment found only in the eukaryotes and
,
, and
proteobacteria suggests a specific relationship among them. The distinction between the archaeal and bacterial genres
of ValRS is again a strong one and is manifested most strongly in the
C-terminal (KMSK) portion of the molecule. The rickettsial ValRS, alone
among the bacterial examples, is of the archaeal genre, seemingly
specifically related therein to the ValRS of the crenarchaeon
Pyrobaculum aerophilum; this relationship is supported by
sequence signature.
Synthetases for the NCN-Encoded Amino Acids
Serine, threonine, and proline have related structures, codons, and aminoacyl-tRNA synthetases; in this last respect, the group also encompasses histidine and glycine. (However, as mentioned above, only one of the two unrelated GlyRS forms shows the relationship.)
Ser; UCN and AGY; class II; homodimer.
The seryl-tRNA
synthetase is of particular interest for two reasons: (i) it
clearly fails to conform to the canonical pattern, and (ii) there are two distinct serine-charging enzymes: a very rare form, found so far only in M. thermoautotrophicum and the two Methanococcus species examined, and a major form found in all other organisms. Although both the major and minor
forms of SerRS belong to the above-mentioned Ser-Thr-Pro supercluster,
it is unclear whether the two are specifically related to one another
therein. (The minor form is not included in Fig. 7.)
all clearly related by signature sequence
seem to be
mitochondrial, their relationship to spirochetes rather than
proteobacteria becomes of interest.
Pro; CCN; class II; homodimer. ProRS exhibits the full canonical pattern, but again with exceptions. The bacterial genre is distinguished from the archaeal by an insertion of about 180 amino acid residues, at approximately position 190 (E. coli numbering), that the archaeal genre lacks, while the archaeal genre extends about 70 residues beyond the bacterial one at the C terminus of the molecule. Dipeptide similarities between the two genres are remarkably low.
The ProRSs of a few bacterial taxa, namely the mycoplasmas, Deinococcus, Chlorobium, Porphyromonas, and Borrelia (but not Treponema), are of the archaeal genre, and the eukaryotic enzymes (with the exception of that from Giardia) also fall within this phylogenetic grouping. Within it, sequence signature analysis shows the eukaryotic enzymes to be a sister group to those of the genera Borrelia, Chlorobium, and Porphyromonas, a relationship that the Fig. 8 tree confirms.
Thr; ACN; class II; homodimer.
Like its valyl
counterpart, ThrRS exhibits only the basal canonical pattern, with the
eukaryotic versions of the enzyme being bacterial rather than archaeal
in nature. The bacterial and archaeal genres are readily distinguished by sizable insertions and deletions in roughly the first 250 amino acids of the alignment, and there is minimal evidence of similarity between the two in this portion of the molecule (Fig. 9). Two of the three available crenarchaeal ThrRSs complicate the picture further (see below).
Ala; GCN; class II; homotetramer. Although a class II enzyme, AlaRS is not a member of the supercluster that contains the other NCN-associated synthetases. The archaeal and bacterial forms of the enzyme are clearly distinguished by dipeptide similarities, sequence signature, and a few small but significant insertions and deletions in the alignment; the N terminus of the archaeal form also begins some 50 amino acids before the bacterial one does (Fig. 10).
The canonical pattern holds for AlaRS, but only in its basal form, since the eukaryotic AlaRSs (except for that of Giardia) cluster with the bacterial AlaRSs. Within that grouping they appear to be specifically related to the Chlorobium-Porphyromonas cluster (Fig. 10), a relationship supported by sequence signature. The Giardia AlaRS, however, is of the archaeal genre, as a strong sequence signature confirms; this is consistent with Giardia's position in Fig. 10 as an outgroup to the archaeal clade. The spirochete AlaRSs, although clearly of the bacterial genre, are highly derived: both show two characteristic large deletions, one interior and the other C-terminal.
Synthetases for the NAN-Encoded Amino Acids
Tyr; UAY; class I; homodimer.
The TyrRS makes a strong
canonical distinction (Fig. 11 and
Table 4). In the C-terminal (KMSK) section of the molecule there is
very little similarity between the TyrRSs of the bacterial and archaeal
genres, and a number of insertion-deletion differences distinguish the
two throughout the molecule as well.
proteobacteria, E. coli, Salmonella, and Yersinia exhibit the first type
while Haemophilus, Actinobacillus, and
Vibrio exhibit the second. Among the β proteobacteria, Neisseria exhibits the first type while
Bordetella and Thiobacillus exhibit the second.
Porphyromonas and its relative Chlorobium are
phylogenetically split in this way too. B. subtilis and
C. acetobutylicum each contain TyrRSs of both subtypes.
Within the archaeal genre, the eukaryotic and archaeal TyrRSs are
intermixed. The euryarchaeal enzymes (except for those of the
pyrococci) cluster specifically with the animal and fungal TyrRSs,
while the three crenarchaeal TyrRSs (and those of the pyrococci) group
with the two plant examples (Arabidopsis and tobacco).
Sequence signatures strongly support this entire phylogenetic arrangement.
His; CAY; class II; homodimer.
HisRS also shows the
full canonical pattern. However, as signature analysis indicates and Fig. 12 confirms, a small group of bacterial taxa (the spirochetes, Helicobacter, C. acetobutylicum, Caulobacter, and Porphyromonas) have HisRSs of the archaeal genre. This bacterial grouping in turn encompasses the eukaryotic HisRSs, which show a specific relationship therein to the Porphyromonas HisRS, a relationship supported by sequence signature.
Gln; CAR; class I; monomer.
It has been convincingly
demonstrated that GlnRS stems specifically from the eukaryotic lineage
of GluRSs (53). This is evident not only at the sequence level but also in the overall structure of the molecule (46). In its N-terminal (HIGH)
region, the GlnRS sequence is decidedly more similar to eukaryotic than
to archaeal GluRSs (and least similar of all to bacterial GluRS). In
the C-terminal (KMSK) region, the similarities of GlnRS to the
eukaryotic and archaeal versions of GluRS are roughly comparable but
sequence similarity to the bacterial GluRS is effectively nonexistent
(Fig. 13).
and
subdivisions of the
Proteobacteria, the Deinococcus-Thermus division,
and Porphyromonas. In other words, among bacteria known not
to contain GlnRS are representatives of the
and
subdivisions of
the Proteobacteria, the gram-positive bacteria, the
cyanobacteria, the spirochetes, the chlamydias, and the genera
Aquifex and Thermotoga. The only specific
phylogenetic relationship apparent among the bacterial versions of the
GlnRS is the proteobacterial grouping, but the proteobacterial
representatives are not strongly distinguished from the other bacterial
GlnRSs. Indeed, one of the β proteobacteria, Bordetella
pertussis, has a GlnRS that appears specifically related to that
found in Porphyromonas, a relationship reinforced by a sequence signature.
Asn; AAY; class II; homodimer.
Although the AsnRS and GlnRS families represent different synthetase classes (II and I, respectively), their evolutions have much in common (Fig. 14). Both arose from within the cluster of synthetases for the corresponding dicarboxylic amino acid (aspartate or glutamate). For glutamine, the enzyme arose from the eukaryotic lineage of GluRS per se, whereas for asparagine the origin can be localized only to the archaeal genre of AspRS in general. In both instances, the origin of the synthetase for the amidated amino acid is seen most strikingly in the C-terminal portion of the molecule. As is the case for GluRSs (see below), BLAST sequence similarity searches based upon the C-terminal 40% of the archaeal and eukaryotic AspRSs give much higher scores with one another and with the AsnRSs than with the bacterial AspRSs. The root of the AsnRS tree itself separates the eukaryotic AsnRSs from their counterparts (see Fig. 16). The root of a combined phylogenetic tree for the AsnRS and AspRS enzymes (rooted by LysRS) falls between the bacterial AspRSs and the grouping of archaeal and eukaryotic AspRSs with the AsnRSs.
Lys; AAR; class I (monomer) and class II (homodimer).
LysRS represents the only known violation of the class rule. A class II LysRS is found in eukaryotes, most bacteria, and a few archaea (namely, the crenarchaeotes Sulfolobus and Pyrobaculum) (Fig. 15). A class I LysRS, by contrast, is found in the euryarchaeotes, two other members of the Crenarchaeota (Cenarchaeum and Aeropyrum), and a scattering of bacteria (34). The class II LysRSs clearly shared a common ancestor with the AspRSs and AsnRSs in the deep past, but the class I enzyme stands essentially alone phylogenetically within its class.
proteobacteria, except for Rhizobium meliloti, whose LysRS is class II; however, Aeropyrum, also a crenarchaeon,
is not a member of this group. The other known bacterial examples, the
spirochete and Streptomyces LysRSs, as a group show a
specific relationship to the Pyrococcus enzyme, while the
remaining (eury)archaeal examples appear in an outgroup relationship to
all those just discussed (Fig. 15).
Asp; GAY; class II; homodimer.
The AspRSs strongly
exhibit the full canonical pattern: a single bacterial type exists,
which differs dramatically from the AspRSs of the archaeal genre. In
the interior of the AspRS sequence alignment, a stretch of about 220 amino acids in the bacterial genre (starting at ca. position 250 in the
E. coli sequence) shows almost no similarity to the
corresponding (~100-amino-acid) section in the archaeal genre.
Sequence similarity resumes thereafter at ca. bacterial position 470 and continues to the C terminus of the molecule, slightly more than 100 amino acids distant (Fig. 16). Because
the AsnRS has arisen from within the grouping of the AspRSs (see
above), the latter must be considered paraphyletic, which breaks the
monophyly rule.
proteobacteria, while the Porphyromonas enzyme shows no
clear specific relationships to any other bacterial AspRS.
Glu; GAR; class I; monomer.
Again, the full canonical
pattern is strongly evident; the difference between the bacterial genre
and its archaeal counterpart is striking. Not only are the bacterial
examples about 100 amino acids shorter than the archaeal sequences at
the N terminus, but also in the C-terminal (KMSK) section of the
molecule the difference between them is extreme: the two show no
resemblance, in either sequence or overall structure (46).
(BLAST searches based on the archaeal and eukaryotic examples of this
region readily detect one another and also the comparable region of all
GlnRSs but never detect their bacterial counterparts.) Because the
GlnRS has arisen from within the GluRS cluster, the latter breaks the
monophyly rule. The bacteria show at least two subtypes of GluRS, which are specifically related to one another (to the exclusion of the GluRSs
of the archaeal genre), and a number of bacterial species contain two
GluRSs as well (24), all of which makes for a somewhat confusing phylogenetic picture (Fig. 17). It is worth noting that a
rather clear grouping emerges that includes the spirochetes, the
Cytophaga-Chlorobium group, the
Deinococcus-Thermus division, the chlamydias, and two proteobacteria, namely Pseudomonas (γ division) and Rhizobium (α division).
Synthetases for the NGN-Encoded Amino Acids
Cys; UGY; class I; monomer. The mechanism of Cys-tRNA formation in M. jannaschii and M. thermoautotrophicum has until now been a mystery: nothing identifiable as a CysRS was seen in their (complete) genomes. However, a normally functioning CysRS has been identified in Methanococcus maripaludis, a close relative of M. jannaschii (30, 40). Did a third, unrecognized synthetase class exist in these cases, or could the cysteine tRNA be charged indirectly, as in the case of selenocysteinyl-tRNA (11, 33)? The possibility was considered that the highly aberrant SerRS found in M. jannaschii and M. thermoautotrophicum is somehow related to the lack of a recognizable CysRS in these organisms (37). The rationale was that such a SerRS might form Ser-tRNACys, a key intermediate in Cys-tRNA formation by a tRNA-mediated amino acid transformation pathway (33). However, in vitro data did not support this view (37). Instead, biochemical and genetic approaches have now revealed that in M. jannaschii and M. thermoautotrophicum, ProRS is able to specifically synthesize both Cys-tRNACys and Pro-tRNAPro (60). This unprecedented dual functionality in an AARS is not reflected in any distinguishing features of these ProRSs at the sequence level. Interpreting the evolutionary significance of this unexpected versatility among AARSs must now await a more detailed biochemical description of its phylogenetic distribution.
As can be inferred from Table 6, the CysRSs do not exhibit the canonical pattern. There is also considerable evidence of interdomain horizontal gene transfer, particularly involving the archaeal CysRSs. In Fig. 18, four of the archaeal CysRSs do cluster, but the M. maripaludis enzyme (see above) is disturbingly similar to that from Pyrococcus, the pair showing 65% sequence identity (40). Three other archaeal CysRSs, from Methanosarcina, Archaeoglobus, and Cenarchaeum, group among the bacterial examples of the enzyme but show no phylogenetic relationship to one another therein. By contrast, the relationships among the bacterial CysRSs in Fig. 18 are broadly consistent with established bacterial taxonomy, which may suggest that the horizontal gene transfers have been mainly from the Bacteria to the Archaea.
Trp; UGG; class I; homodimer. TrpRS is an obvious relative of TyrRS (19), although, as mentioned above, their corresponding codons are not related. The tryptophan enzyme conforms to the full canonical pattern, which can be inferred from Fig. 19, dipeptide similarity matrices, and striking sequence signatures. The TrpRSs of the archaeal genre show a substantial N-terminal extension relative to those of the bacterial genre. Within the bacterial genre, a number of subtypes can be recognized, and two organisms possess two TrpRSs, each of a different bacterial subtype (Fig. 19). By signature analysis, five bacterial subtypes can be ident