P. Siguier and M. Chandler
Laboratoire de Microbiologie et Génétique Moléculaires (UMR5100 CNRS), Campus Université Paul Sabatier, 118 Route de Narbonne, F-31062 Toulouse Cedex, France
SUMMARY
INTRODUCTION
NOMENCLATURE
IS DISTRIBUTION IN ARCHAEA COMPARED TO BACTERIA AND EUKARYA
TRANSPOSITION IN THE ARCHAEA: HISTORICAL PERSPECTIVE
  Spontaneous Mutation in the Extreme Halophiles
    ISH1. ISH2. ISH3/ISH27/ISH51. ISH8/ISH26. ISH11. ISH23/ISH50. ISH24. ISH25. ISH28.
  Transposition in Sulfolobus
  Transposition in Other Archaea
REGULATION OF TRANSPOSITION
  Lost in Transcription: ncRNAs in S. solfataricus
  Lost in Translation: Translational Readthrough in Methanosarcina?
IS FAMILIES AND THE NATURE OF THE CATALYTIC SITE
  The DDE Enzymes
  The Serine Enzymes
  The Relaxase Enzymes
IS FAMILIES IN THE ARCHAEAL GENOMES
  IS1
  IS3
  IS4
    ISH8 subgroup. IS1634 subgroup. ISH3 subgroup. IS701 subgroup.
  IS5
    IS903 subgroup. IS5 subgroup. IS1031 subgroup. IS427 subgroup. The halophilic subgroup ISH1. The Sulfolobus subgroup. IS5 orphans.
  IS6
  IS21
  IS30
  IS110
    IS110 subgroup. IS1111 subgroup.
  IS256
  IS481
  IS630
  IS982
  ISL3
  Non-DDE Transposons: the IS91 Group
  Non-DDE Transposons: the IS200/IS605/IS607 Group
    IS200 subgroup. IS605-related elements. IS607-related elements. Single orfB elements. Phylogenetic distribution.
EMERGING GROUPS, ORPHANS, WAIFS, AND STRAYS
  ISA1214-Related Elements
  ISL3-Related Elements
    ISM1 group. IS1595 group.
  IS66-Related Elements: the New Subgroup ISBst12
  IS1182
  ISH6
  ISC1217
MITEs, MICs, AND SOLO IRs
  MITEs
    IS1. IS4. IS5. IS6. IS200/IS605. IS630. ISM1. ISC1217. Nonclassified MITEs.
  Solo IRs
COMPOUND TRANSPOSONS, BITS, AND PIECES
  Compound Transposons
  Uncharacterizable IS-Like Sequences
  Concatenated ISs
GENOME COMPARISONS: IS DISTRIBUTION, ABUNDANCE, AND GEOGRAPHICAL VARIATIONS
  Intergenome Distribution and Abundance
  Intragenome Distribution
  Large Genomic Rearrangements
  Geographical Variations
EVOLUTIONARY HISTORY OF ISs IN ARCHAEA: A POSSIBLE SCENARIO
CONCLUSIONS
ADDENDUM IN PROOF
ACKNOWLEDGMENTS
REFERENCES
| SUMMARY |
|---|
| INTRODUCTION |
|---|
Like those of the other two domains of life, the Bacteria and Eukarya, members of the prokaryotic Archaea can carry a large number and variety of transposable elements within their genomes. These are principally insertion sequences (ISs) and miniature inverted-repeat transposable elements (MITEs) (8), although at least one active composite transposon has been documented (92) and other similar structures have been identified (see "Compound transposons, bits, and pieces," below). ISs are short specific segments of DNA up to 2 kbp long. They carry one or two open reading frames (ORFs) encoding the enzyme that catalyzes their movement, the transposase (Tpase), generally (but not always) flanked by short terminal inverted repeats (IRs). IS insertion often results in the duplication of a short target sequence that flanks the insertion (direct repeat [DR]) (12). MITEs are nonautonomous ISs deleted for part or all of the Tpase ORF but retaining both ends, while composite transposons are structures in which a DNA segment is flanked by two copies of a given IS.
Little is known about the transposition behavior of the majority of these mobile genetic elements in archaea. This is certainly due to the limited number of genetic systems available for their analysis and to the extreme conditions (temperature, pressure, pH, and salinity) required for the growth of those archaea so far analyzed. Data from the available sequenced genomes suggest that, as among bacteria, the distribution of ISs is somewhat "haphazard," with certain species exhibiting very few or no IS copies while others carry many (see "Genome comparisons: IS distribution, abundance, and geographical variations," below). It is clear that the variety of archaeal ISs approximates that of bacteria rather than the limited types recognized at present in eukaryotes (8). However, apart from a survey compiled several years ago (8) before the availability of a significant number of archaeal genome sequences, no systematic and coherent comparison of archaeal and bacterial ISs is available. Since the transposition characteristics of a variety of bacterial ISs are known (14), such a comparison would provide a useful starting point for exploring transposition activity in archaea and the impact of mobile genetic elements on archaeal genome structure.
| NOMENCLATURE |
|---|
In the present review, we provide an updated survey of archaeal IS elements and include an analysis of their distribution and of their relationship to bacterial and eukaryotic ISs. Except for certain IS names already published (principally those of the halophiles and Sulfolobales), we adhere to the system of nomenclature used at present for ISs of Bacteria, namely, the first letter of the genus, in uppercase, and the first two letters of the species name, in lowercase (12; also see www-is.biotoul.fr). This is similar to the nomenclature system used for restriction enzymes. It renders more transparent the phylogenetic relationships between highly related ISs that differ simply in overall length. These designations have been included as the principal name in the ISfinder database (www-is.biotoul.fr). Any names previously used are also included in the database as synonyms to facilitate retrieval. We assign IS names only for those where we can identify the IS ends. In all other cases, we assume that the copies are only partial, and only the identification number of the corresponding transposase ORF is given.
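The naming rule described above can be illustrated with a short sketch. The function name `is_name` is hypothetical, and the "IS" prefix and trailing serial number are inferred from names used in this review (for example, ISMac16) rather than stated as part of the rule; historical names such as ISH1 or ISC1058 do not follow it.

```python
def is_name(genus: str, species: str, number: int) -> str:
    """Build an IS name from the genus initial (uppercase), the first two
    letters of the species name (lowercase), and a serial number.

    The 'IS' prefix and the numeric suffix are assumptions inferred from
    names such as ISMac16; they are not spelled out in the rule itself.
    """
    genus, species = genus.strip(), species.strip()
    if not genus or len(species) < 2:
        raise ValueError("need a genus and a species name of at least two letters")
    return f"IS{genus[0].upper()}{species[:2].lower()}{number}"


print(is_name("Methanosarcina", "acetivorans", 16))  # ISMac16
print(is_name("Sulfolobus", "tokodaii", 9))          # ISSto9
```

Because only the genus initial is used, different genera can share a prefix: ISMbu (Methanococcoides burtonii) and ISMba (Methanosarcina barkeri) both begin with ISM.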
At the time of writing, the public databases included the entire sequences of 28 archaeal genomes (23 euryarchaeotes and 5 crenarchaeotes). For operational simplicity, to avoid inundating the ISfinder database with specific names, we have adopted the use of "isoforms," as first suggested by Ohtsubo et al. (57). We (arbitrarily) define isoforms as being sequences that are 98% similar at the protein level and/or more than 95% similar at the DNA level. Moreover, we also point out those previously published ISs that were given different names according to length but that are effectively identical to, or are isoforms of, other ISs. We have not yet systematically addressed the extensive accumulating data from environmental sequencing projects, although certain ISs have been identified and included in ISfinder.
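The isoform criterion above can be written as a simple predicate. This is only a sketch: the function and constant names are hypothetical, and treating "98% similar" as "at least 98%" and "and/or" as a logical OR are assumptions about how the thresholds are applied.

```python
from typing import Optional

PROTEIN_THRESHOLD = 98.0  # percent similarity at the protein level (assumed inclusive)
DNA_THRESHOLD = 95.0      # percent similarity at the DNA level ("more than 95%")


def is_isoform(protein_pct: Optional[float] = None,
               dna_pct: Optional[float] = None) -> bool:
    """Return True if either available similarity value meets its threshold.

    A value of None means that the comparison was not made at that level.
    """
    protein_ok = protein_pct is not None and protein_pct >= PROTEIN_THRESHOLD
    dna_ok = dna_pct is not None and dna_pct > DNA_THRESHOLD
    return protein_ok or dna_ok


# Two elements sharing 88% DNA identity would not be isoforms on DNA evidence alone.
print(is_isoform(dna_pct=88.0))                    # False
print(is_isoform(protein_pct=99.0, dna_pct=90.0))  # True
```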
IS elements were identified by manual reiterative BLAST analysis using an E value cutoff of 10⁻³. Tpase alignments were performed with CLUSTALX and refined by eye. To infer phylogenetic relationships, we performed preliminary analyses to assess the different subgroups of large families by neighbor joining using MUST.3.0 (68). TribeMCL (23) was also applied to confirm the clustering of all ISs into the various families and subgroups. Sequences belonging to different subgroups of a single family were then treated separately by maximum likelihood, using PROML (Phylip, version 3.6 [26]) with the Jones-Taylor-Thornton amino acid substitution matrix.
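The searches described here were carried out manually, but the E value filter can be sketched for readers who script similar searches. The example assumes a cutoff of 10⁻³ (E ≤ 0.001) and hits in the standard 12-column tabular layout of BLAST+ (`-outfmt 6`), in which the E value is the 11th column. The function name and the sample hit lines are hypothetical.

```python
def filter_hits(tabular_lines, max_evalue=1e-3):
    """Keep tabular BLAST hits whose E value is at or below the cutoff.

    Each line is expected in the 12-column layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    Returns a list of (query, subject, evalue) tuples.
    """
    kept = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])  # column 11 holds the E value
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Hypothetical hits: only the first falls below the cutoff.
hits = [
    "tpase_query\tsubject_A\t45.0\t200\t110\t0\t1\t200\t1\t200\t2e-30\t120",
    "tpase_query\tsubject_B\t28.0\t60\t43\t0\t5\t64\t9\t68\t0.5\t30",
]
print(filter_hits(hits))
```

Hits that pass the filter would then seed the next round of searching, which is what makes the procedure reiterative.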
| IS DISTRIBUTION IN ARCHAEA COMPARED TO BACTERIA AND EUKARYA |
|---|
The distribution of ISs in archaeal genomes is very "patchy" (Fig. 1). Four phyla, comprising the Halobacteriales, Sulfolobales, Methanosarcinales, and Thermoplasmatales, monopolize more than 90% of archaeal ISs (Table 1). No ISs were identified in the Nanoarchaeota, the Desulfurococcales, the Methanomicrobiales, the Thermoproteales, or the Methanobacteriales, and only one or two families in the Methanococcales or the Methanopyrales. However, these lineages are represented by only one or two completely sequenced genomes, and this limited information may introduce some bias, as was initially the case for bacterial Mycoplasma species (www-is.biotoul.fr).
It is worth noting that archaeal ISs resemble bacterial ISs rather than those identified in eukaryotes. No elements with significant similarity to the nine currently recognized eukaryote DNA transposon superfamilies could be identified. These include notably the mariner/Tc (distantly related to the IS630 family) and the P (from Drosophila) families, which are structurally close to bacterial ISs; elements such as the CACTA or the hAT (e.g., hobo, Ac, and Tam) families (mainly recovered in plants and insects), Merlin (related to IS1016), Mutator (distantly related to IS256 family members), PIF/Harbinger (distant relatives of some IS5 family members), piggyBac, and Transib (12, 70). Nor was any similarity found to the helitrons (40), a family related to bacterial IS91 and identified in plants, fungi, and diverse animals (14). Extensive BLAST searches seeded with such sequences revealed no detectable homologies in the archaeal genomes. This is perhaps surprising in view of the fact that Archaea have important similarities to Eukarya, notably enzymes involved in DNA replication (47). Since it seems unlikely that eukaryal "ISs" were originally present in these genomes and were subsequently specifically deleted, this implies that any lateral transfer of transposable elements occurred between Bacteria and Archaea but not between Archaea and Eukarya.
In the light of the important differences between bacterial and archaeal replication systems, it is interesting to note the presence of members of the IS1, IS3, and IS256 families within archaeal genomes. Bacterial members of these families are thought to transpose by a mechanism involving a replication step to eject a circular IS transposition copy from the donor site, which then serves as a transposition intermediate (78). In the case of the IS3 family member IS911, this process has been shown to depend on the DnaG primase (22). Interestingly, each archaeal genome usually contains two types of primase: a dimeric eukaryotic-like primase (44) and a DnaG-like enzyme that shares the Toprim domain with bacterial DnaG (2).
However, recent biochemical analyses have demonstrated that the DnaG-like primase in Archaea may be involved in RNA processing and degradation rather than in DNA metabolism (25). The presence of these ISs in Archaea therefore implies that the replication step may be carried out by the host (Eukarya-like) replication system.
| TRANSPOSITION IN THE ARCHAEA: HISTORICAL PERSPECTIVE |
|---|
The exceptional genome plasticity revealed by these studies was further reinforced by experiments establishing that strains of both H. salinarium and the related Halobacterium volcanii generally carry a large number of repeated elements. These were divided into several families by Southern hybridization. The elements appeared to be highly mobile, were associated with chromosome rearrangements, and were found both clustered and dispersed over the genome (79).
A collection of repeated sequences resembling bacterial ISs was subsequently assembled in H. salinarium with either gas vacuole or plasmid-carried purple membrane genes used as targets. Several of these have been isolated more than once and have received different names. Importantly, since the majority of these ISs were isolated as novel insertions, they represent active copies.
ISH1. The 1,118-bp ISH1 was isolated as an insertion into the bacteriorhodopsin (bop) gene. Its sequence revealed imperfect terminal inverted repeats of 9 bp and flanking 8-bp direct target repeats. These features are characteristic signatures of IS elements in Bacteria. The element was named ISH1 (84). The single ORF predicts a protein of 270 amino acids (aa) with a clear DDE catalytic motif (see "IS families and the nature of the catalytic site," below), relating the Tpase to those of the majority of transposable elements presently identified. Further examination (12) placed ISH1 in the rather disperse IS5 family (see "IS families in the archaeal genomes," below). Many isolates of ISH1 appeared to have inserted into the same site (5'-AGTTATTG-3') of the bop gene but could do so in both orientations. This indicates relatively high target site specificity. Southern blot analysis revealed multiple ISH1 copies, ranging from one to more than five, in different halobacterial strains (84).
Moreover, analysis of one insertion mutant revealed a single additional ISH1-specific restriction fragment compared to its wild-type parent. This increase in copy number led to the supposition that ISH1 transposes by a replicative mechanism (84).
Evidence from Northern blots also showed that ISH1 was actively transcribed in these strains with a rough correlation between RNA band intensity and IS copy number. However, in view of the numerous regulatory mechanisms adopted by ISs to limit their activity (53), this does not necessarily mean that the Tpase is produced at comparable levels.
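The two structural signatures noted for ISH1, terminal inverted repeats and flanking direct target repeats, can be checked computationally. The following is a minimal sketch with hypothetical function names; it scores only perfect repeats, whereas the ISH1 IRs are imperfect, so a practical tool would need to tolerate mismatches. The toy element and flanking sequences are invented, except for the 8-bp ISH1 target site 5'-AGTTATTG-3' quoted above.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, and N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def terminal_ir_length(element: str, max_len: int = 50) -> int:
    """Length of the perfect terminal inverted repeat of an element.

    The left end is compared, base by base, with the reverse complement
    of the right end until the first mismatch.
    """
    element = element.upper()
    rc = revcomp(element)
    limit = min(max_len, len(element) // 2)
    n = 0
    while n < limit and element[n] == rc[n]:
        n += 1
    return n


def direct_repeat(left_flank: str, right_flank: str, max_len: int = 20) -> str:
    """Longest sequence ending the left flank that also starts the right flank."""
    left, right = left_flank.upper(), right_flank.upper()
    for k in range(min(max_len, len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ""


# Invented element with 7-bp perfect IRs (GGTAGCA ... TGCTACC).
toy_element = "GGTAGCA" + "TTTTTCCCCC" + "TGCTACC"
print(terminal_ir_length(toy_element))                 # 7

# Invented flanks duplicating the ISH1 target site from the bop gene.
print(direct_repeat("CCGGAGTTATTG", "AGTTATTGCATC"))   # AGTTATTG
```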
ISH2. Examination of additional bop mutants revealed several other repeated sequences distinguishable by size. The most frequently observed was ISH2, only 521 bp long and carrying 19-bp terminal inverted repeats flanked by target duplications of 10 or 20 bp (17) and occasionally 11 bp (64). Although three potential ORFs were detected (ORF I, 80 codons; ORF II, 64; ORF III, 59), we have been unable to identify a typical Tpase catalytic motif (see "IS families and the nature of the catalytic site," below). The majority of insertion mutations in the bop gene were caused by the elements ISH1 and ISH2. Unlike ISH1, ISH2 showed multiple insertion sites in the gene (17).
ISH2 was present in multiple copies in various H. salinarium strains, and, more recently, four additional copies were identified in the Halobacterium plasmid pNRC100 (54). The IS is clearly capable of transposition but is probably not an autonomous transposon. However, ISH2 shares nearly perfect terminal homology (but no internal homology) with an apparently complete IS, ISH26 (ISH8; see below). ISH2 transposition may therefore be driven in trans by the ISH26 Tpase.
ISH3/ISH27/ISH51. Remarkably, 20% of H. salinarium PHH4 colonies were found to carry IS insertions into a resident pHH4 plasmid (16, 63). Among these, ISH27 was isolated as a major source of mutation. This group of ISs belongs to the IS4 family. They are 1,398 bp (ISH27-1) or 1,389 bp (ISH27-2 and ISH27-3) long and generate 5-bp target repeats (63) rather than the 3-bp repeats proposed for the identical ISH3 (16). They also include terminal IRs of 16 bp. Two ISH27-1-specific transcripts were observed in the pHH4 plasmid-carrying strain. One of these exhibited a size expected for a full ISH27 transcript (~1,200 nucleotides [nt]), while the other was significantly shorter (~650 nt). This could reflect regulation at the transcriptional or posttranscriptional level.
ISH27 is the generic name for three related ISs. Although closely related, these are not isoforms by our definition. At the nucleotide level, ISH27-1 is more similar to H. volcanii ISH51-1, ISH51-2, and ISH51-3 (88% DNA identity) than to ISH27-2 and ISH27-3 (80% identity). There are more than 20 copies of ISH51 in the H. volcanii genome (36). ISH27 was also observed to have undergone an amplification following storage of the host strain over a period of several years at 4°C (63). Further studies to determine the factors involved in this process would be interesting.
ISH8/ISH26. ISH8/ISH26 was isolated as an insertion mutation of the gvp operon (gas vesicle proteins, Vac) (31). ISH8, also a member of the IS4 family, is 1,402 bp long, carries 18-bp IRs, and generates 10-bp DRs. Its DNA sequence is 94% identical to that of ISH26. Copies of ISH8 were also found in the H. salinarium plasmid pNRC100.
A 70-kbp AT-rich island of H. salinarium was identified and proven to carry copies of ISH1, ISH2, and an IS-like sequence, ISH26, together with copies of an additional 10 repeated sequences, most of which were not characterized (62).
ISH26 was also isolated as an insertional inactivation of the bop gene. There are four ISH26 copies on pHH1 and four copies on the chromosome of H. salinarium PHH1 (65). ISH26 was described as harboring two overlapping ORFs. Although the first ORF has significant similarity with the putative Tpases of other IS4 family members (for example, 26% identity to IS231W over a 143-aa overlap), the second ORF has only very limited similarity, in the region of the conserved E residue (see "IS families and the nature of the catalytic site," below). Detailed analyses suggest, however, that the introduction of several frameshifts would significantly increase this similarity. The first ORF is very closely related to the N-terminal end of the Tpase of ISH8. Like ISH27, ISH26 copies constitute a group of related, but not identical, elements (63).
ISH11. ISH11, from H. salinarium, was observed as an insertion into plasmid pGRB1. It is 1,068 bp long, with 15-bp terminal IRs, and was flanked by 7-bp direct target repeats (43). It exhibits a single long ORF of 334 aa. ISH11 has been tentatively grouped within the IS427 cluster of the IS5 family. Two copies are present in pNRC100 of Halobacterium sp. strain NRC-1.
ISH23/ISH50. ISH23/ISH50 is one of the least-frequent causes of insertion mutations in the bop gene (64). There are two ISH23 copies in H. salinarium NRC817.
ISH23 is flanked by 29-bp imperfect IRs and by a 9-bp direct target repeat. It is very similar (but not identical) to ISH50, an IS isolated as an insertion into the Halobacterium plasmid pNRC (93). ISH50 is 996 bp long, with terminal IRs of 23/29 bp and 8-bp flanking direct target repeats. It encodes a potential 273-aa Tpase and belongs to a newly defined family containing both archaeal and bacterial members (L. Gagnevin and P. Siguier, unpublished data) (see "Emerging groups, orphans, waifs, and strays," below). The first and last 200 bp of ISH23 were found to be identical to those of ISH50 and, although ISH23 and ISH50 differ by at least two restriction sites and appear to generate either 9- or 8-bp target duplications, they are assumed to be isoforms of the same IS (65).
ISH24. Another infrequent insertion into the bop gene, ISH24, is 3,000 bp long, including two terminal IRs of 14 bp, and is flanked by 7-bp direct target repeats. The sequence of this element became available subsequent to the sequencing of the megaplasmid pNRC100 of H. salinarum. It was renamed ISH7 (54). ISH7 encodes two large ORFs. The second displays some weak and local similarities with the C-terminal parts of IS4 element Tpases. No clear DDE motif in ISH24 could be detected from this partial alignment.
ISH25. The short 588-nt sequence of ISH25 is sometimes associated with ISH27 insertion, but it appears unlikely to be a simple IS, as no putative ORF can be found.
ISH28. ISH28 was also isolated from a bop mutant (62). Its nucleotide sequence was revised (91). It is 938 bp long, with 16-bp terminal IRs, and carries an ORF of 828 bp. It is flanked by 8-bp direct target repeats. The putative Tpase protein is 49% similar to that of ISH1, a member of the IS5 family.
ISH28 has also been engineered to generate composite transposons, which are efficient tools for mutagenesis of Haloarcula hispanica and other halophilic organisms (92). This element showed little target sequence specificity but was biased toward target regions with a lower G+C content. Of 20 insertions characterized, 18 generated DRs of 8 bp, while the remaining 2 had DRs of 9 bp.
Collectively, these results clearly demonstrate the major role played by transposable elements in shaping the halophilic genome.
Like halobacterial species, S. solfataricus also exhibits a relatively high spontaneous mutation rate (52). These studies used 5-fluoroorotate resistance as a screen for uracil auxotrophs (pyrE and pyrF). Mutations were obtained at frequencies of between 10⁻⁴ and 10⁻⁵, significantly lower than in the halobacteria but at least 10-fold higher than for other members of the Sulfolobus genus. PCR analysis of several auxotrophic mutants revealed that all carried insertions ranging from 1 to 1.4 kbp. Similar auxotrophs of the related Sulfolobus acidocaldarius failed to show such insertions. Seven S. solfataricus mutants were analyzed in more detail and proved to carry insertions. These were named according to their individual lengths, in base pairs: ISC1058 (three examples), ISC1359 (two examples), and ISC1439 (one example). One example, of 1,147 bp, was closely related to, and presumably a deletion derivative of, a 1,217-bp element previously isolated as an insertion of ISC1217 (13-bp IRs, 6-bp DRs) into a β-galactosidase gene (80). All four ISs show similarities to members of the IS4 or IS5 family: their putative Tpases include both the D · N · G/A-Y/F and Y · R · E · K motifs characteristic of these DDE families (see "IS families and the nature of the catalytic site," below).
Additional active ISs have since been isolated (6), also with 5-fluoroorotate resistance used as a screen. Several different, newly isolated, Sulfolobus strains from Siberia and the western United States were analyzed. As judged by the 99% nucleotide identities in the pyrB, pyrF, or pyrE gene, these appeared to be conspecific strains. Seven distinct ISs were isolated following PCR amplification across the mutated gene. Again, these were named for their lengths, in nucleotides.
In order of size they include ISC735, a member of the IS6 family with a single ORF, 18-bp IRs, and 8-bp DRs; ISC796, a member of the IS1 family with only a single reading frame, 21-bp IRs, and 8-bp DRs; ISC1057 and ISC1058b, related to ISC1058 and members of the IS5 family, with 88 to 93% shared nucleic acid identities, 20-bp IRs interrupted ("hyphenated") by a hexanucleotide, and 8-bp DRs; ISC1205, related to ISC1217, with 17- to 20-bp IRs and 4- to 7-bp DRs; ISC1290, a member of the IS5 family, with 34-bp IRs and 5-bp DRs; and ISC1926, a member of the IS200/IS605 group, with the corresponding two characteristic ORFs. ISC1926 is an isoform of ISC1913 in the sequenced genome of S. solfataricus. In addition to these entire ISs, the authors also detected an insertion of a short 128-bp fragment with terminal inverted repeats similar to those of ISC1058. This sequence corresponds to a typical MITE (see "MITEs, MICs, and solo IRs," below).
ISM1 was identified in a cloning study of the Methanobrevibacter smithii purE and proC genes (32). This has a typical IS structure, is distantly related to the ISL3 family, and is present in about 10 copies in M. smithii.
No data concerning transposition or the effects of transposable elements are available for other archaeal phyla, including important groups carrying numerous ISs such as the Methanosarcinales and Thermoplasmatales.
| REGULATION OF TRANSPOSITION |
|---|
Further studies are essential to determine the exact role of these ncRNAs in regulation of Tpase expression. As pointed out by Tang et al. (87), regulation at the posttranscriptional level would be an efficient strategy for S. solfataricus, since mRNAs in this organism have unusually long half-lives (4).
| IS FAMILIES AND THE NATURE OF THE CATALYTIC SITE |
|---|
helix as the conserved E but two turns farther toward the C terminus. Several groups of additional conserved amino acids, designated N1, N2, N3, and C1, encompass the D (N2), D (N3), and E (C1) regions in the IS4 family (74). These have been expanded to the motifs DDT, DREAD, and YREK, respectively (73).
DDE enzymes ensure cleavage of the terminal phosphodiester bonds at the 3' end of the transposon strand, which will be finally transferred into the target DNA site (transferred strand). Transposons and ISs using such enzymes generally carry imperfect IRs at their ends, including one or several Tpase binding sites. The ends of ISs (terminal IRs) that have adopted this transposition chemistry are generally the simplest. They can often be divided into two domains: a Tpase binding domain (an internal sequence of 10 to 15 bp) and a catalytic domain (the terminal 2 to 4 bp required for cleavage and strand transfer). DDE enzymes generally generate a characteristic direct duplication of target DNA flanking the insertion. This type of IR structure is conserved in the archaeal ISs but is generally more complicated in the Eukarya.
150-aa) Tpases. They use a single tyrosine (Y) residue as a nucleophile in DNA cleavage and generate covalent Y-DNA substrate intermediates. The structures of two enzymes, the bacterial IS608 and an isoform of ISSto1 (from S. tokodaii [ISfinder]), have been solved (46, 77). They exhibit a structural topology close to that of the Rep and Relaxase proteins. Transposons using this type of Tpase do not carry terminal IRs and do not generate the small flanking direct target repeats generally produced by transposons with DDE Tpases. Instead, these Tpases bind to extensive subterminal secondary structural motifs and cleave at a fixed but distant position (88). They also use a defined tetra- or pentanucleotide as a target sequence and require this sequence for further transposition.
| IS FAMILIES IN THE ARCHAEAL GENOMES |
|---|
All four Sulfolobus elements carry only a single long reading frame (although one ISSto9 copy appears to be degenerate, with an 8-bp deletion generating two ORFs). Although there is no ORF equivalent to insA, an upstream equivalent to InsA may be produced in these single ORF elements. This could occur, for example, by proteolysis of the larger Tpase or by frameshifting to create the smaller protein, as in Escherichia coli for dnaX (5).
ISC1173a and ISSto7 are significantly longer (1,173 and 1,174 bp) than other family members, with IRs of approximately 50 bp, over twice the length of other members of the family. Moreover, the Tpase is larger than that of ISC796, ISSto9, and other members of the family (~340 aa compared to ~240 aa) due to an 80-aa N-terminal extension and a 40-aa C-terminal extension (Fig. 3B, top). Both ISC796 and ISSto9 are 796 bp long, with IRs of 21 bp (Fig. 3C, top). DNA alignments show that the long and short ISs and the MITEs are clearly derived from a common ancestor, but their exact relationship is at present unclear.
Four additional IS1 family members, organized as a canonical eubacterial IS1 (Fig. 3A, top), are present in the Methanosarcinales: ISMac16 (Methanosarcina acetivorans), ISMma7 (M. mazei, M. barkeri, and Methanococcoides burtonii), ISMba2 (M. barkeri), and ISMbu3 (Methanococcoides burtonii). ISMac16, ISMma7, and ISMba2 are 740 bp long, with 24-bp IRs and 8- or 9-bp DRs. ISMbu3 (741 bp; 8-bp DRs) has IRs of only 15 bp. In contrast to the Sulfolobus IS1 members, these all carry the expected two ORFs. They are closely related elements, with 84 to 89% identity with respect to ISMac16. Inspection of their nucleic acid sequence reveals an appropriately placed stretch of eight A residues and raises the possibility that the Tpase is produced by transcriptional rather than translational frameshifting (3; O. Fayet, personal communication).
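A stretch of A residues of the kind mentioned above is easy to locate by scanning the sequence. The sketch below uses a hypothetical helper, `a_runs`, that reports every run of at least eight consecutive A residues on the given strand. Whether a run is "appropriately placed" relative to the two ORFs still has to be judged against the annotation, which this code does not attempt.

```python
import re


def a_runs(dna: str, min_len: int = 8):
    """Return (start, length) for each run of at least min_len consecutive
    A residues; start is a 0-based position on the given strand."""
    pattern = re.compile(r"A{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(dna.upper())]


# Invented sequence containing one run of eight A residues.
toy_sequence = "GCT" + "A" * 8 + "GGC"
print(a_runs(toy_sequence))  # [(3, 8)]
```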
The Tpases of these elements are related to that of ISMae3 of the cyanobacterium Microcystis aeruginosa (Fig. 3; 89) and less closely to diverse IS1 elements of the γ-Proteobacteria, including IS1X and IS1S from E. coli and ISVvu1 from Vibrio vulnificus. The DDE catalytic motif and surrounding amino acid residues are also typical of this family. Finally, the terminal 23 to 30 bp are very similar to the IRs of the γ-proteobacterial and cyanobacterial IS1 elements and terminate with a highly conserved 5'-GGNNNTG (CANNNCC-3'). Where identified, the site of insertion is A+T rich.
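The conserved termini quoted above, 5'-GGNNNTG at the left end and CANNNCC-3' at the right end, are reverse complements of one another and can be expressed as a pair of patterns. This is a minimal sketch: the function name is hypothetical, N is taken to mean any of A, C, G, or T, and the toy sequences are invented.

```python
import re

LEFT_END = re.compile(r"^GG[ACGT]{3}TG")   # 5'-GGNNNTG at the left terminus
RIGHT_END = re.compile(r"CA[ACGT]{3}CC$")  # CANNNCC-3' at the right terminus


def has_is1_like_termini(element: str) -> bool:
    """True if the element starts with GGNNNTG and ends with CANNNCC."""
    seq = element.upper()
    return bool(LEFT_END.search(seq)) and bool(RIGHT_END.search(seq))


print(has_is1_like_termini("GGTAATG" + "ACGTACGT" + "CATTACC"))  # True
print(has_is1_like_termini("GGTAATG" + "ACGTACGT" + "CATTACG"))  # False
```

A match indicates only that the extreme ends fit the consensus; it says nothing about the remaining 23 to 30 bp of each IR.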
A single, distantly related degenerate element has been identified in Thermoplasma volcanium (TVN0865/67 and TVN0691/92). BLAST searches revealed a relationship with diverse bacterial IS3 elements such as ISAca1 of Acinetobacter calcoaceticus, ISSod2 of Shewanella oneidensis, and ISPg5 of Porphyromonas gingivalis. Multiple alignments of these reading frames suggested that TVN0865 and TVN0691 are truncated copies of the OrfA frame and that TVN0867 and TVG0898533 represent truncated versions of the OrfB frame lacking the first catalytic aspartic acid (D). The spacing between the second catalytic aspartic acid (D) and glutamic acid (E) is conserved (35 aa), and an arginine (R) is present 7 aa after the glutamic acid (E). No IRs or DRs could be found for these two archaeal elements. T. volcanium therefore apparently carries only partial copies of IS3 elements.
ISH26 was described as harboring two overlapping ORFs. Although the first has significant similarity to the putative Tpases of other IS4 family members (26% identity with IS231W over a 143-aa overlap), the second has only very limited similarity (in the region of the conserved E residue). Detailed analyses indicate, however, that several frameshifts could significantly increase this similarity. The first ORF is very closely related to the N-terminal end of the Tpase of ISH8. A reevaluation of the ISH26 DNA sequence is needed to clarify this issue.
It is interesting to note that all five copies of ISH5 are interrupted by ISH11 at an identical position. This suggests that the entire interrupted IS is capable of autonomous transposition.
IS1634 subgroup. The IS1634 subgroup includes both bacterial and archaeal members. All archaeal members except ISFac6, from the incompletely sequenced F. acidarmanus, and ISTvo4, from T. volcanium, are restricted to methanogens. These include ISMac5, ISMac6, ISMac10, ISMac12, and ISMac23 from M. acetivorans; ISMba11, ISMba12, and ISMba13 from M. barkeri; ISMma3, ISMma4, and ISMma20 from M. mazei; ISMma18 from M. mazei, M. acetivorans, and M. barkeri; ISMhu4, ISMhu5, ISMhu7, and ISMhu8 from M. hungatei; and ISMth2 from M. thermophila. ISMba11 and ISMba12 also give rise to MITE derivatives (Table 3). An additional IS, ISArch8, has been identified in an uncultured environmental archaeon.
The IRs appear to be similar and begin with 5'CA or 5'CC. Short DRs generally of 5 or 6 bp are also present, but no similarities can be distinguished. Their presence, largely restricted to Methanosarcinales, could indicate horizontal acquisition of these elements from bacterial species by a common Methanosarcinales ancestor.
ISH3 subgroup. The Archaea-specific subgroup ISH3 forms a separate cluster in Tribe analysis and can be further subdivided into two phylogenetic subgroups with BLAST. It includes ISH27 (an isoform of ISH40) from H. salinarium; ISH51 from Haloferax volcanii; ISH20 from Haloarcula marismortui; ISH3 from the Halobacterium sp. chromosome, pNRC100, and pNRC200; ISFac1 in the unfinished genome of Ferroplasma acidarmanus; ISC1200, ISC1225, ISC1359, and ISC1439A and ISC1439B (76% identity with ISC1439A) from S. solfataricus; ISSto8 and ISSto14 from S. tokodaii; ISMma1 from M. mazei; ISMba14 from M. barkeri and M. burtonii; and ISMbu7 and ISMbu8 from M. burtonii. ISMba14 was reconstructed in silico because it is interrupted by ISMba11. The ISH3 subgroup shares a conserved terminal 5'-CAG-3' trinucleotide.
IS701 subgroup. At present the IS701 cluster, which has emerged as a group separate from the IS4 family, contains a single example from the Archaea, ISMba8 (M. barkeri).
γ-Proteobacteria (IS903D and IS102 of E. coli, ISAs4 from Aeromonas sp., and ISVa1 from Vibrio species). The IRs of this subgroup are very homogeneous despite the fact that the very terminal "catalytic" base pairs are different from the 5'-GGC-3' consensus of the bacterial elements. They all carry a motif, TGTTG, common to the bacterial ISs between nt 6 and 10. All exhibit DRs with a length of 9 bp, as expected for this group, but no similarities between them are evident. Related partial copies are present in H. marismortui chromosome II (rrnB0094), M. mazei (MM1429), and M. barkeri (Mbar_A1398/99, Mbar_A2202).
IS5 subgroup. The IS5 subgroup (Fig. 5) includes ISMbu1 (M. burtonii), ISMac22 (M. acetivorans), and ISArch6 (from an uncultured archaeon). Three complete copies of ISMbu1 carry an in-phase insertion of 52 bp, which introduces a termination codon. Four complete copies also carry an additional tandem left end of 97 bp. A possible MITE derivative of ISMac22 was also identified. A fragment of an IS related to IS1194 can also be found in T. volcanium (TVN1409, TVN1410) and another in T. acidophilum (ID: Ta0379). ISMbu1 is related to IS1246 (Pseudomonas species) and ISSsp126 (Sphingomonas sp.). The IRs of this subgroup are heterogeneous. ISMbu1 has long DRs (14 bp), with no similarities to bacterial DRs.
IS1031 subgroup. Only a single example of this group, ISMac15 (M. acetivorans), has been identified.
IS427 subgroup. Four archaeal ISs have been identified in this subgroup: ISMac11, ISMma12 (M. mazei), ISMba5, and ISMba19 (M. barkeri). ISMac11- and ISMba5-related MITEs have also been identified.
The halophilic subgroup ISH1. The halophilic subgroup ISH1 includes ISH1 and two isoelements, ISH9 and ISH28, together with ISH19, ISHma8, ISHma9, ISHma10, ISHma11, and ISNph4. Where present, DRs are between 7 and 10 bp. A single ISH9 MITE derivative was also identified.
The Sulfolobus subgroup. Several elements in the genome of S. solfataricus (ISC1212, ISC1234, and ISC1290) are annotated as IS5 family members (8). These, together with ISSto3 from S. tokodaii, show only very weak similarities to other IS5 elements and also vary significantly among themselves. Moreover, the spacing of the DDE catalytic motifs does not align with that of other IS5 family members. MITE derivatives of ISSto3 have been identified.
IS5 orphans. Several elements that display only weak similarities with the other IS5 elements are also present in both archaeal methanogens and halophiles. We have identified ISMba15 (M. barkeri), ISMhu10 (M. hungatei), and ISMbu10 (M. burtonii). ISMbu10-related MITEs and numerous solo IRs were also identified. Solo IRs are also found in M. acetivorans, M. mazei, and M. barkeri. Two related ISs are also present in the halophiles: ISH11 (Halobacterium sp. plasmids pNRC100 and pNRC200) and ISHma6 (H. marismortui pNG500 and N. pharaonis chromosome II and pL131).
Five different members were identified in the Sulfolobales: ISC735, ISC774, ISSto2, ISSte1, and ISSis1. ISC735 is indicated as a single copy in Sulfolobus sp. (AY671942). There are also three degenerate copies (with rearrangements and deletions within the IS) in S. solfataricus. S. solfataricus also carries full and partial (mostly solo IRs) copies of ISC774, while S. acidocaldarius carries only two IRs. ISSto2 is present in four complete copies, three of which carry different mutations in one IR and at least 13 partial copies. ISSte1 is present in a single copy in Sulfolobus tengchongensis plasmid pTC. Finally, ISSis1 is present in a single copy in Sulfolobus islandicus plasmid pARN4.
Methanocaldococcus jannaschii carries ISMja1 (ISE703) in two complete and one partial copy in the genome and three partial copies in the large extrachromosomal element. In addition, eight small elements of 358 to 360 bp resembling MITEs were identified (see "MITEs, MICs, and solo IRs," below).
Only a single partial copy of an IS6 family member could be identified in the Methanosarcina genus (M. barkeri Mbar_A0568).
The hyperthermophilic P. furiosus carries another three closely related elements, ISPfu1, ISPfu2, and ISPfu5, while P. abyssi carries a partial iso-ISPfu1 copy. Isoforms of these ISs are present in P. woesei and in a wide range of Pyrococcus strains.
Finally, two partial copies of an IS6-like element are present in the genome of Archaeoglobus fulgidus (AF0138, AF0895).
These archaeal elements form a monophyletic group related to bacterial ISs from Firmicutes: IS240 (Bacillus sp.), IS431 (Staphylococcus aureus), IS1297 (Leuconostoc mesenteroides), ISS1W (Lactococcus lactis), and ISEnfa1 (Enterococcus faecalis). Most carry DRs of