Microbiology and Molecular Biology Reviews, March 2007, p. 121-157, Vol. 71, No. 1
1092-2172/07/$08.00+0 doi:10.1128/MMBR.00031-06
Copyright © 2007, American Society for Microbiology. All Rights Reserved.
P. Siguier and M. Chandler
Laboratoire de Microbiologie et Génétique Moléculaires (UMR5100 CNRS), Campus Université Paul Sabatier, 118 Route de Narbonne, F-31062 Toulouse Cedex, France
SUMMARY INTRODUCTION NOMENCLATURE IS DISTRIBUTION IN ARCHAEA COMPARED TO BACTERIA AND EUKARYA TRANSPOSITION IN THE ARCHAEA: HISTORICAL PERSPECTIVE Spontaneous Mutation in the Extreme Halophiles ISH1. ISH2. ISH3/ISH27/ISH51. ISH8/ISH26. ISH11. ISH23/ISH50. ISH24. ISH25. ISH28. Transposition in Sulfolobus Transposition in Other Archaea REGULATION OF TRANSPOSITION Lost in Transcription: ncRNAs in S. solfataricus Lost in Translation: Translational Readthrough in Methanosarcina? IS FAMILIES AND THE NATURE OF THE CATALYTIC SITE The DDE Enzymes The Serine Enzymes The Relaxase Enzymes IS FAMILIES IN THE ARCHAEAL GENOMES IS1 IS3 IS4 ISH8 subgroup. IS1634 subgroup. ISH3 subgroup. IS701 subgroup. IS5 IS903 subgroup. IS5 subgroup. IS1031 subgroup. IS427 subgroup. The halophilic subgroup ISH1. The Sulfolobus subgroup. IS5 orphans. IS6 IS21 IS30 IS110 IS110 subgroup. IS1111 subgroup. IS256 IS481 IS630 IS982 ISL3 Non-DDE Transposons: the IS91 Group Non-DDE Transposons: the IS200/IS605/IS607 Group IS200 subgroup. IS605-related elements. IS607-related elements. Single orfB elements. Phylogenetic distribution. EMERGING GROUPS, ORPHANS, WAIFS, AND STRAYS ISA1214-Related Elements ISL3-Related Elements ISM1 group. IS1595 group. IS66-Related Elements: the New Subgroup ISBst12 IS1182 ISH6 ISC1217 MITEs, MICs, AND SOLO IRs MITEs IS1. IS4. IS5. IS6. IS200/IS605. IS630. ISM1. ISC1217. Nonclassified MITEs. Solo IRs COMPOUND TRANSPOSONS, BITS, AND PIECES Compound Transposons Uncharacterizable IS-Like Sequences Concatenated ISs GENOME COMPARISONS: IS DISTRIBUTION, ABUNDANCE, AND GEOGRAPHICAL VARIATIONS Intergenome Distribution and Abundance Intragenome Distribution Large Genomic Rearrangements Geographical Variations EVOLUTIONARY HISTORY OF ISs IN ARCHAEA: A POSSIBLE SCENARIO CONCLUSIONS ADDENDUM IN PROOF ACKNOWLEDGMENTS REFERENCES
Like those of the other two domains of life, the Bacteria and Eukarya, members of the prokaryotic Archaea can carry a large number and variety of transposable elements within their genomes. These are principally insertion sequences (ISs) and miniature inverted-repeat transposable elements (MITEs) (8), although at least one active composite transposon has been documented (92) and other similar structures have been identified (see "Compound transposons, bits, and pieces," below). ISs are short specific segments of DNA up to 2 kbp long. They carry one or two open reading frames (ORFs) encoding the enzyme that catalyzes their movement, the transposase (Tpase), generally (but not always) flanked by short terminal inverted repeats (IRs). IS insertion often results in the duplication of a short target sequence that flanks the insertion (direct repeat [DR]) (12). MITEs are nonautonomous ISs deleted for part or all of the Tpase ORF but retaining both ends, while composite transposons are structures in which a DNA segment is flanked by two copies of a given IS.
Little is known about the transposition behavior of the majority of these mobile genetic elements in archaea. This is certainly due to the limitation of genetic systems available for their analysis and to the extreme conditions (temperature, pressure, pH, and salinity) required for the growth of those archaea so far analyzed. Data from the available sequenced genomes suggest that, as among bacteria, the distribution of ISs is somewhat "haphazard," with certain species exhibiting very few or no IS copies while others carry many (see "Genome comparisons: IS distribution, abundance, and geographical variations," below). It is clear that the variety of archaeal ISs approximates that of bacteria rather than the limited types recognized at present in eukaryotes (8). However, apart from a survey compiled several years ago (8), before the availability of a significant number of archaeal genome sequences, no systematic and coherent comparison of archaeal and bacterial ISs is available. Since the transposition characteristics of a variety of bacterial ISs are known (14), such a comparison would provide a useful starting point for exploring transposition activity in archaea and the impact of mobile genetic elements on archaeal genome structure.
In the present review, we provide an updated survey of archaeal IS elements and include an analysis of their distribution and of their relationship to bacterial and eukaryotic ISs. Except for certain IS names already published (principally those of the halophiles and Sulfolobales), we adhere to the system of nomenclature used at present for ISs of Bacteria, namely, the first letter of the genus, in uppercase, and the first two letters of the species name, in lowercase (12; also see www-is.biotoul.fr). This is similar to the nomenclature system used for restriction enzymes. It renders more transparent the phylogenetic relationships between highly related ISs that differ simply in overall length. These designations have been included as the principal name in the ISfinder database (www-is.biotoul.fr). Any names previously used are also included in the database as synonyms to facilitate retrieval. We assign IS names only for those where we can identify the IS ends. In all other cases, we assume that the copies are only partial, and only the identification number of the corresponding transposase ORF is given.
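The naming rule described above can be sketched in a few lines of Python. This is only an illustration of the convention (first letter of the genus in uppercase, first two letters of the species name in lowercase); the function names and the `serial` parameter are hypothetical, and how serial numbers are allocated is outside the scope of the sketch.

```python
def is_name_prefix(genus: str, species: str) -> str:
    """Return the IS name stem: 'IS' + genus initial (uppercase)
    + first two letters of the species name (lowercase)."""
    genus, species = genus.strip(), species.strip()
    if not genus or len(species) < 2:
        raise ValueError("need a genus and a species name of at least two letters")
    return "IS" + genus[0].upper() + species[:2].lower()


def is_name(genus: str, species: str, serial: int) -> str:
    """Append a serial number to the stem.
    'serial' is a hypothetical parameter supplied by the caller."""
    return f"{is_name_prefix(genus, species)}{serial}"


# Names that appear in this review, rebuilt from genus and species.
print(is_name("Methanosarcina", "acetivorans", 16))  # ISMac16
print(is_name("Sulfolobus", "tokodaii", 9))          # ISSto9
print(is_name("Methanococcoides", "burtonii", 3))    # ISMbu3
```

Older published names such as ISH1 or ISC1058 predate this convention and would not be produced by the sketch.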
At the time of writing, the public databases included the entire sequences of 28 archaeal genomes (23 euryarchaeotes and 5 crenarchaeotes). For operational simplicity, to avoid inundating the ISfinder database with specific names, we have adopted the use of "isoforms," as first suggested by Ohtsubo et al. (57). We (arbitrarily) define isoforms as being sequences that are 98% similar at the protein level and/or more than 95% similar at the DNA level. Moreover, we also point out those previously published ISs that were given different names according to length but that are effectively identical to, or are isoforms of, other ISs. We have not yet systematically addressed the extensive accumulating data from environmental sequencing projects, although certain ISs have been identified and included in ISfinder.
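As a rough illustration of the isoform definition given above, the following Python sketch applies the two thresholds to percent-similarity values. The function and constant names are hypothetical, and two details are assumptions: the protein threshold is treated as inclusive (98% or more), and "and/or" is read as "either criterion is sufficient."

```python
from typing import Optional

PROTEIN_THRESHOLD = 98.0  # percent similarity at the protein level (assumed inclusive)
DNA_THRESHOLD = 95.0      # percent similarity at the DNA level (strictly more than)


def are_isoforms(protein_pct: Optional[float] = None,
                 dna_pct: Optional[float] = None) -> bool:
    """Return True if either similarity value meets the isoform threshold.
    A value of None means that measurement is not available."""
    protein_ok = protein_pct is not None and protein_pct >= PROTEIN_THRESHOLD
    dna_ok = dna_pct is not None and dna_pct > DNA_THRESHOLD
    return protein_ok or dna_ok


# DNA identities quoted later in this review for the ISH27/ISH51 group.
print(are_isoforms(dna_pct=88.0))  # False
print(are_isoforms(dna_pct=80.0))  # False
# Made-up values, for illustration only.
print(are_isoforms(protein_pct=99.0, dna_pct=90.0))  # True
```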
IS elements were identified by manual reiterative BLAST analysis using an E value cutoff of 10⁻³. Tpase alignments were performed with CLUSTALX and refined by eye. To infer phylogenetic relationships, we performed preliminary analyses to assess the different subgroups of large families by neighbor joining using MUST.3.0 (68). TribeMCL (23) was also applied to confirm the clustering of all ISs into the various families and subgroups. Sequences belonging to different subgroups of a single family were then treated separately by maximum likelihood, using PROML (Phylip, version 3.6 [26]) with the Jones-Taylor-Thornton amino acid substitution matrix.
FIG. 1. Comparison of IS families in archaea. The figure shows the distribution of IS families among the different archaeal phyla. The tree is from NCBI (http://www.ncbi.nlm.nih.gov/sutils/genom_tree.cgi). The color code for IS families is included within the figure beneath the phylogenetic tree. Stars represent emerging groups or families.
TABLE 1. IS content of archaeal genomes
The distribution of ISs in archaeal genomes is very "patchy" (Fig. 1). Four phyla, comprising the Halobacteriales, Sulfolobales, Methanosarcinales, and Thermoplasmatales, monopolize more than 90% of archaeal ISs (Table 1). No ISs were identified in the Nanoarchaeota, the Desulfurococcales, the Methanomicrobiales, the Thermoproteales, or the Methanobacteriales, and only one or two families in the Methanococcales or the Methanopyrales. However, these lineages are represented by only one or two completely sequenced genomes, and this limited information may introduce some bias, as was initially the case for bacterial Mycoplasma species (www-is.biotoul.fr).
It is worth noting that archaeal ISs resemble bacterial ISs rather than those identified in eukaryotes. No elements with significant similarity to the nine currently recognized eukaryote DNA transposon superfamilies could be identified. These include notably the mariner/Tc (distantly related to the IS630 family) and the P (from Drosophila) families, which are structurally close to bacterial ISs; elements such as the CACTA or the hAT (e.g., hobo, Ac, and Tam) families (mainly recovered in plants and insects), Merlin (related to IS1016), Mutator (distantly related to IS256 family members), PIF/Harbinger (distant relatives of some IS5 family members), piggyBac, and Transib (12, 70); or to the helitrons (40), a family related to bacterial IS91 and identified in plants, fungi, and diverse animals (14). Extensive BLAST searches seeded with such sequences revealed no detectable homologies in the archaeal genomes. This is perhaps surprising in view of the fact that Archaea have important similarities to Eukarya, notably enzymes involved in DNA replication (47). Since it seems unlikely that eukaryal "ISs" were originally present in these genomes and were subsequently specifically deleted, this implies that any lateral transfer of transposable elements occurred between Bacteria and Archaea but not between Archaea and Eukarya.
In the light of the important differences between bacterial and archaeal replication systems, it is interesting to note the presence of members of the IS1, IS3, and IS256 families within archaeal genomes. Bacterial members of these families are thought to transpose by a mechanism involving a replication step to eject a circular IS transposition copy from the donor site, which then serves as a transposition intermediate (78). In the case of the IS3 family member IS911, this process has been shown to depend on the DnaG primase (22). Interestingly, each archaeal genome usually contains two types of primase: a dimeric eukaryotic-like primase (44) and a DnaG-like enzyme that shares the Toprim domain with bacterial DnaG (2).
However, recent biochemical analyses have demonstrated that the DnaG-like primase in Archaea may be involved in RNA processing and degradation rather than in DNA metabolism (25). The presence of these ISs in Archaea therefore implies that the replication step may be carried out by the host (Eukarya-like) replication system.
The exceptional genome plasticity revealed by these studies was further reinforced by experiments establishing that strains of both H. salinarium and the related Halobacterium volcanii generally carry a large number of repeated elements. These were divided into several families by Southern hybridization. The elements appeared to be highly mobile, were associated with chromosome rearrangements, and were found both clustered and dispersed over the genome (79).
A collection of repeated sequences resembling bacterial ISs was subsequently assembled in H. salinarium with either gas vacuole or plasmid-carried purple membrane genes used as targets. Several of these have been isolated more than once and have received different names. Importantly, since the majority of these ISs were isolated as novel insertions, they therefore represent active copies.
ISH1. The 1,118-bp ISH1 was isolated as an insertion into the bacteriorhodopsin (bop) gene. Its sequence revealed imperfect terminal inverted repeats of 9 bp and flanking 8-bp direct target repeats. These features are characteristic signatures of IS elements in Bacteria. The element was named ISH1 (84). The single ORF predicts a protein of 270 amino acids (aa) with a clear DDE catalytic motif (see "IS families and the nature of the catalytic site," below), relating the Tpase to those of the majority of transposable elements presently identified. Further examination (12) placed ISH1 in the rather disperse IS5 family (see "IS families in the archaeal genomes," below). Many isolates of ISH1 appeared to have inserted into the same site (5'-AGTTATTG-3') of the bop gene but could do so in both orientations. This indicates relatively high target site specificity. Southern blot analysis revealed multiple ISH1 copies, ranging from one to more than five, in different halobacterial strains (84).
Moreover, analysis of one insertion mutant revealed a single additional ISH1-specific restriction fragment compared to its wild-type parent. This increase in copy number led to the supposition that ISH1 transposes by a replicative mechanism (84).
Evidence from Northern blots also showed that ISH1 was actively transcribed in these strains, with a rough correlation between RNA band intensity and IS copy number. However, in view of the numerous regulatory mechanisms adopted by ISs to limit their activity (53), this does not necessarily mean that the Tpase is produced at comparable levels.
ISH2. Examination of additional bop mutants revealed several other repeated sequences distinguishable by size. The most frequently observed was ISH2, only 521 bp long and carrying 19-bp terminal inverted repeats flanked by target duplications of 10 or 20 bp (17) and occasionally 11 bp (64). Although three potential ORFs were detected (ORF I, 80 codons; ORF II, 64; ORF III, 59), we have been unable to identify a typical Tpase catalytic motif (see "IS families and the nature of the catalytic site," below). The majority of insertion mutations in the bop gene were caused by the elements ISH1 and ISH2. Unlike ISH1, ISH2 showed multiple insertion sites in the gene (17).
ISH2 was present in multiple copies in various H. salinarium strains, and, more recently, four additional copies were identified in the Halobacterium plasmid pNRC100 (54). The IS is clearly capable of transposition but is probably not an autonomous transposon. However, ISH2 shares nearly perfect terminal homology (but no internal homology) with an apparently complete IS, ISH26 (ISH8; see below). ISH2 transposition may therefore be driven in trans by the ISH26 Tpase.
ISH3/ISH27/ISH51. Remarkably, 20% of H. salinarium PHH4 colonies were found to carry IS insertions into a resident pHH4 plasmid (16, 63). Among these, ISH27 was isolated as a major source of mutation. This group of ISs belongs to the IS4 family. They are 1,398 bp (ISH27-1) or 1,389 bp (ISH27-2 and ISH27-3) long and generate 5-bp target repeats (63) rather than the 3-bp repeats proposed for the identical ISH3 (16). They also include terminal IRs of 16 bp. Two ISH27-1-specific transcripts were observed in the pHH4 plasmid-carrying strain. One of these exhibited a size expected for a full ISH27 transcript (approximately 1,200 nucleotides [nt]), while the other was significantly shorter (approximately 650 nt). This could reflect regulation at the transcriptional or posttranscriptional level.
ISH27 is the generic name for three related ISs. Although closely related, these are not isoforms by our definition. At the nucleotide level, ISH27-1 is more similar to H. volcanii ISH51-1, ISH51-2, and ISH51-3 (88% DNA identity) than to ISH27-2 and ISH27-3 (80% identity). There are more than 20 copies of ISH51 in the H. volcanii genome (36). ISH27 was also observed to have undergone an amplification following storage of the host strain over a period of several years at 4°C (63). Further studies to determine the factors involved in this process would be interesting.
ISH8/ISH26. ISH8/ISH26 was isolated as an insertion mutation of the gvp operon (gas vesicle proteins, Vac) (31). ISH8, also a member of the IS4 family, is 1,402 bp long, carries 18-bp IRs, and generates 10-bp DRs. Its DNA sequence is 94% identical to that of ISH26. Copies of ISH8 were also found in the H. salinarium plasmid pNRC100.
A 70-kbp AT-rich island of H. salinarium was identified and proven to carry copies of ISH1, ISH2, and an IS-like sequence, ISH26, together with copies of an additional 10 repeated sequences, most of which were not characterized (62).
ISH26 was also isolated as an insertional inactivation of the bop gene. There are four ISH26 copies on pHH1 and four copies on the chromosome of H. salinarium PHH1 (65). ISH26 was described as harboring two overlapping ORFs. Although the first ORF has significant similarity with the putative Tpases of other IS4 family members (for example, 26% identity to IS231W over a 143-aa overlap), the second ORF has only very limited similarity, in the region of the conserved E residue (see "IS families and the nature of the catalytic site," below). Detailed analyses suggest, however, that the introduction of several frameshifts would significantly increase this similarity. The first ORF is very closely related to the N-terminal end of the Tpase of ISH8. Like ISH27, ISH26 copies constitute a group of related, but not identical, elements (63).
ISH11. ISH11, from H. salinarium, was observed as an insertion into plasmid pGRB1. It is 1,068 bp long, with 15-bp terminal IRs, and was flanked by 7-bp direct target repeats (43). It exhibits a single long ORF of 334 aa. ISH11 has been tentatively grouped within the IS427 cluster of the IS5 family. Two copies are present in pNRC100 of Halobacterium sp. strain NRC-1.
ISH23/ISH50. ISH23/ISH50 is one of the least-frequent causes of insertion mutations in the bop gene (64). There are two ISH23 copies in H. salinarium NRC817.
ISH23 is flanked by 29-bp imperfect IRs and by a 9-bp direct target repeat. It is very similar (but not identical) to ISH50, an IS isolated as an insertion into the Halobacterium plasmid pNRC (93). ISH50 is 996 bp long, with terminal IRs of 23/29 bp and 8-bp flanking direct target repeats. It encodes a potential 273-aa Tpase and belongs to a newly defined family containing both archaeal and bacterial members (L. Gagnevin and P. Siguier, unpublished data) (see "Emerging groups, orphans, waifs, and strays," below). The first and last 200 bp of ISH23 were found to be identical to those of ISH50 and, although ISH23 and ISH50 differ by at least two restriction sites and appear to generate either 9- or 8-bp target duplications, they are assumed to be isoforms of the same IS (65).
ISH24. Another infrequent insertion into the bop gene, ISH24, is 3,000 bp long, including two terminal IRs of 14 bp, and is flanked by 7-bp direct target repeats. The sequence of this element became available subsequent to the sequencing of the megaplasmid pNRC100 of H. salinarum. It was renamed ISH7 (54). ISH7 encodes two large ORFs. The second displays some weak and local similarities with the C-terminal parts of IS4 element Tpases. No clear DDE motif in ISH24 could be detected from this partial alignment.
ISH25. The short 588-nt sequence of ISH25 is sometimes associated with ISH27 insertion, but it appears unlikely to be a simple IS, as no putative ORF can be found.
ISH28. ISH28 was also isolated from a bop mutant (62). Its nucleotide sequence was revised (91). It is 938 bp long, with 16-bp terminal IRs, and carries an ORF of 828 bp. It is flanked by 8-bp direct target repeats. The putative Tpase protein is 49% similar to that of ISH1, a member of the IS5 family.
ISH28 has also been engineered to generate composite transposons, which are efficient tools for mutagenesis of Haloarcula hispanica and other halophilic organisms (92). This element showed little target sequence specificity but was biased toward target regions with a lower G+C content. Of 20 insertions characterized, 18 generated DRs of 8 bp, while the remaining 2 had DRs of 9 bp.
Collectively, these results clearly demonstrate the major role played by transposable elements in shaping the halophilic genome.
Like halobacterial species, S. solfataricus also exhibits a relatively high spontaneous mutation rate (52). These studies used 5-fluoroorotate resistance as a screen for uracil auxotrophs (pyrE and pyrF). Mutations were obtained at frequencies of between 10⁻⁴ and 10⁻⁵, significantly lower than in the halobacteria but at least 10-fold higher than for other members of the Sulfolobus genus. PCR analysis of several auxotrophic mutants revealed that all carried insertions ranging from 1 to 1.4 kbp. Similar auxotrophs of the related Sulfolobus acidocaldarius failed to show such insertions. Seven S. solfataricus mutants were analyzed in more detail and proved to carry insertions. These were named according to their individual lengths, in base pairs: ISC1058 (three examples), ISC1359 (two examples), and ISC1439 (one example). One example, of 1,147 bp, was closely related to, and presumably a deletion derivative of, a 1,217-bp element previously isolated as an insertion of ISC1217 (13-bp IRs, 6-bp DRs) into a β-galactosidase gene (80). All four ISs show similarities to members of the IS4 or IS5 family: their putative Tpases include both the D · N · G/A-Y/F and Y · R · E · K motifs characteristic of these DDE families (see "IS families and the nature of the catalytic site," below).
Additional active ISs have since been isolated (6), also with 5-fluoroorotate resistance used as a screen. Several different, newly isolated Sulfolobus strains from Siberia and the western United States were analyzed. As judged by the 99% nucleotide identities in the pyrB, pyrF, or pyrE gene, these appeared to be conspecific strains. Seven distinct ISs were isolated following PCR amplification across the mutated gene. Again, these were named for their lengths, in nucleotides.
In order of size they include ISC735, a member of the IS6 family with a single ORF, 18-bp IRs, and 8-bp DRs; ISC796, a member of the IS1 family with only a single reading frame, 21-bp IRs, and 8-bp DRs; ISC1057 and ISC1058b, related to ISC1058 and members of the IS5 family, with 88 to 93% shared nucleic acid identities, 20-bp IRs interrupted ("hyphenated") by a hexanucleotide, and 8-bp DRs; ISC1205, related to ISC1217, with 17- to 20-bp IRs and 4- to 7-bp DRs; ISC1290, a member of the IS5 family, with 34-bp IRs and 5-bp DRs; and ISC1926, a member of the IS200/IS605 group, with the corresponding two characteristic ORFs. ISC1926 is an isoform of ISC1913 in the sequenced genome of S. solfataricus. In addition to these entire ISs, the authors also detected an insertion of a short 128-bp fragment with terminal inverted repeats similar to those of ISC1058. This sequence corresponds to a typical MITE (see "MITEs, MICs, and solo IRs," below).
ISM1 was identified in a cloning study of the Methanobrevibacter smithii purE and proC genes (32). This has a typical IS structure, is distantly related to the ISL3 family, and is present in about 10 copies in M. smithii.
No data concerning transposition or the effects of transposable elements are available for other archaeal phyla, including important groups carrying numerous ISs such as the Methanosarcinales and Thermoplasmatales.
FIG. 2. Noncoding RNA. The ISs are drawn to scale. The black arrows represent the length of the Tpase ORF. The open boxes represent the noncoding regions of the ISs. The noncoding RNA names from reference 72 are shown, together with their beginnings and ends (in bases from the first base of the Tpase coding sequence). The directions of transcription are shown.
Further studies are essential to determine the exact role of these ncRNAs in regulation of Tpase expression. As pointed out by Tang et al. (87), regulation at the posttranscriptional level would be an efficient strategy for S. solfataricus, since mRNAs in this organism have unusually long half-lives (4).
helix as the conserved E but two turns farther toward the C terminus. Several groups of additional conserved amino acids, designated N1, N2, N3, and C1, encompass the D (N2), D (N3), and E (C1) regions in the IS4 family (74). These have been expanded to the motifs DDT, DREAD, and YREK, respectively (73).
DDE enzymes ensure cleavage of the terminal phosphodiester bonds at the 3' end of the transposon strand, which will be finally transferred into the target DNA site (transferred strand). Transposons and ISs using such enzymes generally carry imperfect IRs at their ends, including one or several Tpase binding sites. The ends of ISs (terminal IRs) that have adopted this transposition chemistry are generally the simplest. They can often be divided into two domains: a Tpase binding domain, an internal sequence of 10 to 15 bp, and a catalytic domain composed of the terminal 2 to 4 bp required for cleavage and strand transfer. DDE enzymes generally generate a characteristic direct duplication of target DNA flanking the insertion. This type of IR structure is conserved in the archaeal ISs but is generally more complicated in the Eukarya.
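The direct target duplication described above can be illustrated with a small string model in Python. This is only a bookkeeping sketch of the end product (one copy of the target site on each side of the inserted element), not a model of the cleavage and strand-transfer chemistry. The function name, the flanking bases, and the placeholder element sequence are all hypothetical; the 8-bp site is the ISH1 target in the bop gene quoted earlier.

```python
def insert_with_duplication(target: str, element: str,
                            site_start: int, dup_len: int) -> str:
    """Insert 'element' into 'target' so that the dup_len bases starting at
    site_start end up on both sides of the element (the direct repeats)."""
    site_end = site_start + dup_len
    if site_start < 0 or dup_len < 0 or site_end > len(target):
        raise ValueError("target site lies outside the target sequence")
    # Left part keeps the site; right part starts again at the site.
    return target[:site_end] + element + target[site_start:]


site = "AGTTATTG"                 # ISH1 target site reported in the bop gene
target = "CCGG" + site + "TTAA"   # made-up flanking bases
element = "gtcaTTTTtgac"          # made-up placeholder for the IS

product = insert_with_duplication(target, element, site_start=4, dup_len=8)
print(product)  # CCGGAGTTATTGgtcaTTTTtgacAGTTATTGTTAA
```

The product is longer than the target by the length of the element plus the length of the duplicated site, which is how a flanking DR of fixed length is recognized in sequence data.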
150-aa) Tpases. They use a single tyrosine (Y) residue as a nucleophile in DNA cleavage and generate covalent Y-DNA substrate intermediates. The structures of two enzymes, the bacterial IS608 and an isoform of ISSto1 (from S. tokodaii [ISfinder]), have been solved (46, 77). They exhibit a structural topology close to that of the Rep and Relaxase proteins. Transposons using this type of Tpase do not carry terminal IRs and do not generate the small flanking direct target repeats generally produced by transposons with DDE Tpases. Instead, these Tpases bind to extensive subterminal secondary structural motifs and cleave at a fixed but distant position (88). They also use a defined tetra- or pentanucleotide as a target sequence and require this sequence for further transposition.
TABLE 2. ISs identified in archaeal genomes
TABLE 3. Archaeal MITEs
FIG. 3. IS1 members. Shown is the phylogeny of the IS1 family and comparison of a representative set of terminal IRs. The top panel shows the general organization of members of this family. Red boxes indicate the terminal IRs. Yellow (or white) boxes within the larger IS box indicate ORFs (see the text). (A) Organization of the "classical" bacterial IS1. pIRL indicates the promoter, which drives Tpase synthesis. This class includes those from the archaeal methanogens. (B) The longer of the two Sulfolobus groups carries more-extensive IRs and N- and C-terminal extensions (white boxes) to the Tpase compared to the classical IS1 and the shorter Sulfolobus class. (C) Shorter Sulfolobus class. IRs are approximately the length of those found in the classical IS1 organization. The various Archaea have been color coded as follows for clarity: Sulfolobales, red; methanogens, blue. Bacteria are indicated in black.
All four Sulfolobus elements carry only a single long reading frame (although one ISSto9 copy appears to be degenerate, with an 8-bp deletion generating two ORFs). Although there is no ORF equivalent to insA, an upstream equivalent to InsA may be produced in these single ORF elements. This could occur, for example, by proteolysis of the larger Tpase or by frameshifting to create the smaller protein, as in Escherichia coli for dnaX (5).
ISC1173a and ISSto7 are significantly longer (1,173 and 1,174 bp) than other family members, with IRs of approximately 50 bp, over twice the length of other members of the family. Moreover, the Tpase is larger than that of ISC796, ISSto9, and other members of the family (approximately 340 aa compared to approximately 240 aa) due to an 80-aa N-terminal extension and a 40-aa C-terminal extension (Fig. 3B, top). Both ISC796 and ISSto9 are 796 bp long, with IRs of 21 bp (Fig. 3C, top). DNA alignments show that the long and short ISs and the MITEs are clearly derived from a common ancestor, but their exact relationship is at present unclear.
Four additional IS1 family members, organized as a canonical eubacterial IS1 (Fig. 3A, top), are present in the Methanosarcinales: ISMac16 (Methanosarcina acetivorans); ISMma7 (M. mazei, M. barkeri, and Methanococcoides burtonii); ISMba2 (M. barkeri); and ISMbu3 (Methanococcoides burtonii). ISMac16, ISMma7, and ISMba2 are 740 bp long, with 24-bp IRs and 8- or 9-bp DRs. ISMbu3 (741 bp; 8-bp DRs) has IRs of only 15 bp. In contrast to the Sulfolobus IS1 members, these all carry the expected two ORFs. They are closely related elements, with 84 to 89% identity with respect to ISMac16. Inspection of their nucleic acid sequence reveals an appropriately placed stretch of eight A residues and raises the possibility that the Tpase is produced by transcriptional rather than translational frameshifting (3; O. Fayet, personal communication).
The Tpases of these elements are related to that of ISMae3 of the cyanobacterium Microcystis aeruginosa (Fig. 3; 89) and less closely to diverse IS1 elements of the γ-Proteobacteria, including IS1X and IS1S from E. coli and ISVvu1 from Vibrio vulnificus. The DDE catalytic motif and surrounding amino acid residues are also typical of this family. Finally, the terminal 23 to 30 bp are very similar to the IRs of the γ-proteobacterial and cyanobacterial IS1 elements and terminate with a highly conserved 5'-GGNNNTG (CANNNCC-3'). Where identified, the site of insertion is A+T rich.
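The two forms of the terminal consensus quoted above, 5'-GGNNNTG at one end and CANNNCC-3' at the other, are reverse complements of each other, as expected for inverted repeats. A minimal Python sketch makes the relationship explicit; the function names are hypothetical, only the ambiguity code N is handled, and the two 7-base test sequences are made up for illustration.

```python
# Complement table for A, C, G, T and the ambiguity code N (either case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_consensus(seq: str, consensus: str) -> bool:
    """True if seq matches consensus position by position; N matches any base."""
    if len(seq) != len(consensus):
        return False
    return all(c == "N" or c == s
               for s, c in zip(seq.upper(), consensus.upper()))


left_consensus = "GGNNNTG"
right_consensus = reverse_complement(left_consensus)
print(right_consensus)                                  # CANNNCC

# Made-up terminal sequences checked against each consensus.
print(matches_consensus("GGTGATG", left_consensus))     # True
print(matches_consensus("CATCACC", right_consensus))    # True
```

Checking the left end against the consensus and the right end against its reverse complement is one simple way to screen candidate IR pairs before a full alignment.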
A single, distantly related degenerate element has been identified in Thermoplasma volcanium (TVN0865/67 and TVN0691/92). BLAST searches revealed a relationship with diverse bacterial IS3 elements such as ISAca1 of Acinetobacter calcoaceticus, ISSod2 of Shewanella oneidensis, and ISPg5 of Porphyromonas gingivalis. Multiple alignments of these reading frames suggested that TVN0865 and TVN0691 are truncated copies of the OrfA frame and that TVN0867 and TVG0898533 represent truncated versions of the OrfB frame lacking the first catalytic aspartic acid (D). The spacing between the second catalytic aspartic acid (D) and glutamic acid (E) is conserved (35 aa), and an arginine (R) is present 7 aa after the glutamic acid (E). No IRs or DRs could be found for these two archaeal elements. T. volcanium therefore apparently carries only partial copies of IS3 elements.
FIG. 4. IS4 members. Shown is the phylogeny of the different subgroups of the IS4 family and comparison of a representative set of terminal IRs. The various Archaea have been color coded as follows for clarity: Sulfolobales, red; Thermoplasmatales, magenta; halophiles, green; methanogens, blue. Bacteria are indicated in black.
ISH26 was described as harboring two overlapping ORFs. Although the first has significant similarity to the putative Tpases of other IS4 family members (26% identity with IS231W over a 143-aa overlap), the second has only very limited similarity (in the region of the conserved E residue). Detailed analyses indicate, however, that several frameshifts could significantly increase this similarity. The first ORF is very closely related to the N-terminal end of the Tpase of ISH8. A reevaluation of the ISH26 DNA sequence is needed to clarify this issue.
It is interesting to note that all five copies of ISH5 are interrupted by ISH11 at an identical position. This suggests that the entire interrupted IS is capable of autonomous transposition.
IS1634 subgroup. The IS1634 subgroup includes both bacterial and archaeal members. All archaeal members except ISFac6, from the incompletely sequenced F. acidarmanus, and ISTvo4, from T. volcanium, are restricted to methanogens. These include ISMac5, ISMac6, ISMac10, ISMac12, and ISMac23 from M. acetivorans; ISMba11, ISMba12, and ISMba13 from M. barkeri; ISMma3, ISMma4, and ISMma20 from M. mazei; ISMma18 from M. mazei, M. acetivorans, and M. barkeri; ISMhu4, ISMhu5, ISMhu7, and ISMhu8 from M. hungatei; and ISMth2 from M. thermophila. ISMba11 and ISMba12 also give rise to MITE derivatives (Table 3). An additional IS, ISArch8, has been identified in an uncultured environmental archaeon.
The IRs appear to be similar and begin with 5'CA or 5'CC. Short DRs generally of 5 or 6 bp are also present, but no similarities can be distinguished. Their presence, largely restricted to Methanosarcinales, could indicate horizontal acquisition of these elements from bacterial species by a common Methanosarcinales ancestor.
ISH3 subgroup. The Archaea-specific subgroup ISH3 forms a separate cluster in Tribe analysis and can be further subdivided into two phylogenetic subgroups with BLAST. It includes ISH27 (an isoform of ISH40) from H. salinarium; ISH51 from Haloferax volcanii; ISH20 from Haloarcula marismortui; ISH3 from the Halobacterium sp. chromosome, pNRC100, and pNRC200; ISFac1 in the unfinished genome of Ferroplasma acidarmanus; ISC1200, ISC1225, ISC1359, and ISC1439A and ISC1439B (76% identity with ISC1439A) from S. solfataricus; ISSto8 and ISSto14 from S. tokodaii; ISMma1 from M. mazei; ISMba14 from M. barkeri and M. burtonii; and ISMbu7 and ISMbu8 from M. burtonii. ISMba14 was reconstructed in silico because it is interrupted by ISMba11. The ISH3 subgroup shares a conserved terminal 5'-CAG-3' trinucleotide.
IS701 subgroup. At present the IS701 cluster, which has emerged as a group separate from the IS4 family, contains a single example from the Archaea, ISMba8 (M. barkeri).
FIG. 5. IS5 members. Shown is the phylogeny of the different subgroups of the IS5 family and comparison of a representative set of terminal IRs. The various Archaea have been color coded as follows for clarity: Sulfolobales, red; Thermoplasmatales, magenta; halophiles, green; methanogens, blue. Bacteria are indicated in black.
-Proteobacteria (IS903D and IS102 of E. coli, ISAs4 from Aeromonas sp., and ISVa1 from Vibrio species). The IRs of this subgroup are very homogeneous despite the fact that the very terminal "catalytic" base pairs are different from the 5'-GGC-3' consensus of the bacterial elements. They all carry a motif, TGTTG, common to the bacterial ISs between nt 6 and 10. All exhibit DRs with a length of 9 bp, as expected for this group, but no similarities between them are evident. Related partial copies are present in H. marismortui chromosome II (rrnB0094), M. mazei (MM1429), and M. barkeri (Mbar_A1398/99, Mbar_A2202).
IS5 subgroup. The IS5 subgroup (Fig. 5) includes ISMbu1 (M. burtonii), ISMac22 (M. acetivorans), and ISArch6 (from an uncultured archaeon). Three complete copies of ISMbu1 carry an in-phase insertion of 52 bp, which introduces a termination codon. Four complete copies also carry an additional tandem left end of 97 bp. A possible MITE derivative of ISMac22 was also identified. A fragment of an IS related to IS1194 can also be found in T. volcanium (TVN1409, TVN1410) and another in T. acidophilum (ID: Ta0379). ISMbu1 is related to IS1246 (Pseudomonas species) and ISSsp126 (Sphingomonas sp.). The IRs of this subgroup are heterogeneous. ISMbu1 has long DRs (14 bp), with no similarities to bacterial DRs.
IS1031 subgroup. Only a single example of this group, ISMac15 (M. acetivorans), has been identified.
IS427 subgroup. Four archaeal ISs have been identified in this subgroup: ISMac11, ISMma12 (M. mazei), ISMba5, and ISMba19 (M. barkeri). ISMac11- and ISMba5-related MITEs have also been identified.
The halophilic subgroup ISH1. The halophilic subgroup ISH1 includes ISH1 and two isoelements, ISH9 and ISH28, together with ISH19, ISHma8, ISHma9, ISHma10, ISHma11, and ISNph4. Where present, DRs are between 7 and 10 bp. A single ISH9 MITE derivative was also identified.
The Sulfolobus subgroup. Several elements in the genome of S. solfataricus (ISC1212, ISC1234, and ISC1290) are annotated as IS5 family members (8). These, together with ISSto3 from S. tokodaii, show only very weak similarities to other IS5 elements and also vary significantly among themselves. Moreover, the spacing of the DDE catalytic motifs does not align with that of other IS5 family members. MITE derivatives of ISSto3 have been identified.
IS5 orphans. Several elements that display only weak similarities with the other IS5 elements are also present in both archaeal methanogens and halophiles. We have identified ISMba15 (M. barkeri), ISMhu10 (M. hungatei), and ISMbu10 (M. burtonii). ISMbu10-related MITEs and numerous solo IRs were also identified. Solo IRs are also found in M. acetivorans, M. mazei, and M. barkeri. Two related ISs are also present in the halophiles: ISH11 (Halobacterium sp. plasmids pNRC100 and pNRC200) and ISHma6 (H. marismortui pNG500 and N. pharaonis chromosome II and pL131).
FIG. 6. IS6 members. Shown is the phylogeny of the IS6 family and comparison of a representative set of terminal IRs. The various Archaea have been color coded as follows for clarity: Sulfolobales, red; halophiles, green; "other," orange. Bacteria are indicated in black.
Five different members were identified in the Sulfolobales: ISC735, ISC774, ISSto2, ISSte1, and ISSis1. ISC735 is indicated as a single copy in Sulfolobus sp. (AY671942). There are also three degenerate copies (with rearrangements and deletions within the IS) in S. solfataricus. S. solfataricus also carries full and partial (mostly solo IRs) copies of ISC774, while S. acidocaldarius carries only two IRs. ISSto2 is present in four complete copies, three of which carry different mutations in one IR, and in at least 13 partial copies. ISSte1 is present in a single copy in Sulfolobus tengchongensis plasmid pTC. Finally, ISSis1 is present in a single copy in Sulfolobus islandicus plasmid pARN4.
Methanocaldococcus jannaschii carries ISMja1 (ISE703) in two complete and one partial copy in the genome and three partial copies in the large extrachromosomal element. In addition, eight small elements of 358 to 360 bp resembling MITEs were identified (see "MITEs, MICs, and solo IRs," below).
Only a single partial copy of an IS6 family member could be identified in the Methanosarcina genus (M. barkeri Mbar_A0568).
The hyperthermophilic P. furiosus carries another three closely related elements, ISPfu1, ISPfu2, and ISPfu5, while P. abyssi carries a partial iso-ISPfu1 copy. Isoforms of these ISs are present in P. woesei and in a wide range of Pyrococcus strains.
Finally, two partial copies of an IS6-like element are present in the genome of Archaeoglobus fulgidus (AF0138, AF0895).
These archaeal elements form a monophyletic group related to bacterial ISs from Firmicutes: IS240 (Bacillus sp.), IS431 (Staphylococcus aureus), IS1297 (Leuconostoc mesenteroides), ISS1W (Lactococcus lactis), and ISEnfa1 (Enterococcus faecalis). Most carry DRs of 8 bp, but no clear sequence similarities can be observed in the DRs or surrounding sequences either between different ISs or copies of the same IS. The IRs of the archaeal IS6 members are quite variable compared to those of the bacterial members and might be divided into two subgroups. They generally terminate with 5'-GT or 5'-GA, as opposed to the 5'-GG found in Bacteria (Fig. 6). The bacterial and archaeal IRs clearly fall into different groups. The large phylogenetic distribution of IS6 family members in the Archaea and the monophyly of the IS6 archaeal group (in agreement with the IR resemblances) suggest that these elements were ancestrally present in archaea rather than being recently acquired by lateral gene transfer from bacteria.
FIG. 7. IS21 members. Shown is the phylogeny of the IS21 family and comparison of a representative set of terminal IRs. The top panel shows the organization of a typical IS21 family member. Red boxes, terminal IRs; yellow boxes, ORFs (IstA is the Tpase, and IstB is a regulatory protein). The overlap between these ORFs is indicated as a possible control region where translational coupling might be required for expression of istB. The terminal repeated sequences L1, L2, R1, and R2 are indicated by small arrows. The Archaea have been color coded as follows for clarity: methanogens, blue. Bacteria are indicated in black.
ISC1041 (S. solfataricus MT-4) is the unique archaeal member of this family. It encodes a single ORF with 93% identity in DNA sequence with ISAba125 from Acinetobacter baumannii and is also closely related to ISPst1 and IS1394 of Pseudomonas species. Unlike other IS30 elements, ISC1041 carries only short and very imperfect IRs (18 bp). As the genome sequence of S. solfataricus MT-4 has not been determined, very little genomic information is available for this element. However, the absence of IS30 elements in the sequenced species of Sulfolobus is compatible with the idea that ISC1041 had been laterally and recently acquired in strain MT-4.
FIG. 8. IS110/IS1111 members. Shown is the phylogeny of the IS110/IS1111 family. The various Archaea have been color coded as follows for clarity: Sulfolobales, red; Thermoplasmatales, magenta; halophiles, green; methanogens, blue. Bacteria are indicated in black.
One example, ISFac9, was also identified in Ferroplasma acidarmanus.
These elements form a monophyletic group that is distantly related to diverse bacterial IS110 elements from proteobacteria, such as ISPpu10 (Pseudomonas putida), ISNgo3 (Neisseria gonorrhoeae), and ISIMb1 (Moraxella sp.), and also to IS elements of Thermus thermophilus, such as IS1000A and IS1000B.
IS1111 subgroup. Several members of the IS1111 subgroup are also found in the methanogens: ISMma5 (M. mazei), ISMac14 (M. acetivorans), and ISMba7 and ISMba20 (both from M. barkeri). These closely related elements display some significant similarities with other bacterial IS1111 elements, mainly in the C-terminal region of the Tpase. On the other hand, ISMma5 and ISMac14 display very short and typical DRs of 3 bp, together with typical IS1111 internal IRs. Another element, ISH18, was identified in pNG500 of H. marismortui and in partial copy in M. hungatei (Mhun_0755).
FIG. 9. IS256 members. Shown is the phylogeny of the IS256 family and comparison of a representative set of terminal IRs. The various Archaea have been color coded as follows for clarity: Sulfolobales, red; Thermoplasmatales, magenta; methanogens, blue. Bacteria are indicated in black.
Three different family members have also been identified in the Methanosarcinales. These are ISMma16 (M. mazei and partial isocopies in M. acetivorans), ISMba9 (one copy with several termination codons and phase changes and partial copies in M. barkeri), and ISMbu6 (M. burtonii).
The Thermoplasmatales also carry three distinct members: ISFac7 and ISFac8, from Ferroplasma acidarmanus, and ISTac2, a single copy carrying several termination codons in Thermoplasma acidophilum. Short and closely related partial copies can also be found in the genome of T. volcanium (TVN0870, TVN1468).
These are all phylogenetically linked to IS256 elements of the Firmicutes (Fig. 9): IS905A (Lactococcus lactis), IS1310 (Enterococcus sp.), IS1191 (Streptococcus thermophilus), ISLdl12 (Lactobacillus delbrueckii), and IS1201 (Lactobacillus helveticus). With the exception of ISFac8, the archaeal IS256 elements form a monophyletic group in the phylogeny, suggesting little or no transfer of these between archaea and bacteria.
All are characterized by relatively highly conserved IRs beginning with 5'-GG or 5'-GA and carrying blocks of conserved sequence throughout. Most archaeal members appear to generate an 8- or 9-bp DR, although no sequence similarities between these DRs are apparent.
FIG. 10. IS481 members. Shown is the phylogeny of the IS481 family and comparison of a representative set of terminal IRs. The various Archaea have been color coded as follows for clarity: Thermoplasmatales, magenta; methanogens, blue; "other," orange. Bacteria are indicated in black.
FIG. 11. IS630 members. Shown is the phylogeny of the IS630 family and comparison of a representative set of terminal IRs. The various Archaea have been color coded as follows for clarity: Sulfolobales, red; halophiles, green; methanogens, blue; "other," orange. Bacteria are indicated in black.
In the case of the Euryarchaeota, ISHma2 is present in one complete copy on chromosome I of H. marismortui. A closely related element, ISH16, is found as two complete copies and two partial copies in pNG500 of H. marismortui and one partial copy in Halobacterium sp. It encodes a single polypeptide, has IRs of 25 bp, and generates a typical TA dinucleotide DR. A more distant but partial element (rrnAC1575) is also present on chromosome I of H. marismortui.
We have identified 11 distinguishable IS630 family members in the Methanosarcina group: ISMac13, ISMac17, ISMac18, ISMma6, ISMma8, ISMma9, ISMma10, ISMma17, ISMba3, ISMba10, and ISMth1. All except ISMac13 and ISMac17 carry two ORFs. ISMma6 (M. mazei) is related to the Sulfolobus and halophilic elements. Several of these elements are mutated and presumably nonfunctional. The single complete ISMac13 (M. acetivorans) copy is interrupted by insertion of a 1,695-bp sequence present six times elsewhere in the genome (see "Compound transposons, bits, and pieces," below). ISMac13-related MITEs can also be identified (see "MITEs, MICs, and solo IRs," below). In addition to a single complete copy in M. mazei, six solo 22-bp IRs of ISMma8 are present in M. barkeri. One of the two ISMma17 copies is interrupted by insertion of ISMma11. All three copies of ISMac17 include a stop codon (TAG) at the same location in the Tpase ORF, while that of the single ISMba3 copy is degenerate and distributed over three ORFs (but see "Lost in translation," above).
The presence of two ORFs in ISMac18, ISMma6, ISMma8, ISMma9, ISMma10, ISMma17, ISMba10, and ISMth1 raises the possibility that expression involves translational or transcriptional frameshifting. Indeed, ISMth1 (Methanosaeta thermophila) carries an extended stretch of 21 A's, representing a potential frameshifting site.
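Homopolymer tracts of the kind noted for ISMth1 are easy to locate computationally. The sketch below is only an illustration: the function name and the minimum run length of 8 are arbitrary assumptions, and the toy sequence is invented and is not taken from ISMth1. A long run of A's marks a candidate site only; it does not by itself show that frameshifting occurs.

```python
import re

def find_homopolymer_runs(dna, base="A", min_len=8):
    """Return (start, length) for each run of `base` at least `min_len` long.

    `min_len` is an arbitrary illustrative cutoff, not a published threshold.
    Positions are 0-based.
    """
    pattern = re.compile("%s{%d,}" % (re.escape(base), min_len))
    return [(m.start(), m.end() - m.start())
            for m in pattern.finditer(dna.upper())]

# Invented toy sequence containing a 21-nt A tract.
toy = "GCGT" + "A" * 21 + "CCG"
print(find_homopolymer_runs(toy))
```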
Finally, Archaeoglobus fulgidus carries two IS630 family members, ISA1083-1 and ISA1083-2. We were unable to identify terminal IRs for these elements.
We also note that TVN1411 of T. volcanium and PTO1017, PTO0855, and PTO1049 of Picrophilus torridus encode partial Tpases with some similarities to that of IS630. Several examples from uncultured archaea have also been identified (Table 2 and www-is.biotoul.fr).
Phylogenetic analyses (Fig. 11) show that there are two subgroups and that the archaeal and bacterial elements are interspersed in both, suggesting multiple transfer events between these two domains.
-Proteobacteria and Bacteroidetes species. The two from Bacteroidetes form a distinct subgroup. Members are characterized by lengths of between 950 and 1,200 bp, with similar terminal IRs of between 18 and 35 bp, generally beginning with 5'-ACCC; DRs of 7 bp or 9 bp; and a single ORF capable of specifying a protein of between 271 and 313 aa, with a possible DDE motif lacking the conserved downstream K/R residue.
FIG. 12. IS982 members. Shown is a comparison of a representative set of terminal IRs of the IS982 family. The archaeal IS is indicated in orange. Bacterial ISs are shown in black.
X174 gpA protein than to the plasmid Rep protein family. Similar types of transposon have been identified in eukaryotic genomes (40). A single IS91 family element can be found in an archaeal genome, ISMbu9 in M. burtonii.
FIG. 13. IS200/IS605/IS607 members. The top panel shows the organization of different members of this group. The direction of gene expression is indicated by the arrows. Yellow, tyrosine Tpase TnpA1; blue arrowed boxes, serine Tpase TnpA2; orange arrowed boxes, TnpB frame of unknown function. The left and right ends of the transposons containing TnpA1 are shown in magenta and blue, respectively. These are not IRs but include potential secondary structures. (A) Phylogeny based on orfB of the IS200/IS605/IS607 family. (B) Phylogeny based on orfA of the IS605 family (orfA1). (C) Phylogeny based on orfA of the IS607 family (orfA2). IS608 elements are underlined, single orfB elements are indicated between brackets, and the asterisk indicates the mosaic construction of the elements of this family (see the text). The various Archaea have been color coded as follows for clarity: Sulfolobales, red; Thermoplasmatales, magenta; halophiles, green; methanogens, blue; "other," orange. Bacteria are indicated in black.
IS200 subgroup. As in the Bacteria, isolated copies of tnpA1 could be observed but, except for a partial copy in N. pharaonis (NP4630A), are limited to the methanogens (ISMma21, ISMba16, and ISMba18), and all have corresponding MITE derivatives in their host genomes. We have yet to search for and identify equivalent isolated copies of tnpA2.
IS605-related elements. Members of the IS605 subgroup carry a copy of tnpA1 together with tnpB. Three different members were identified in the Sulfolobales, ISC1476 (S. solfataricus), ISSto1 (present in S. tokodaii and as partial copies in S. acidocaldarius), and ISSis2 (S. islandicus plasmid pHVE14). A recent publication reporting the structure of a TnpA1 protein from S. solfataricus implied that the corresponding gene was an isolated copy (46), but this element is associated with an overlapping tnpB annotated as ISC1476 (8). MITE derivatives of ISSto1, including both left and right ends with respective potential secondary structures, are also present in S. tokodaii.
The halobacteria carry four distinct ISs: ISH12 and ISH1-8, "iso" copies of the same elements (Halobacterium sp. chromosome and plasmids pNRC100 and pNRC200 and H. marismortui plasmid pNG300); ISH22 (Halobacterium sp. and N. pharaonis plasmid pL131); ISHma7 (chromosome I of H. marismortui); and ISHma12 (H. marismortui plasmid pNG500).
Four different members were also identified in the methanogens: ISMac7 (M. acetivorans), ISMma19 and ISMma22 (M. mazei), and ISMba17 (M. barkeri chromosome and plasmid 1). MITE derivatives of ISMma19 are also present in the M. mazei genome.
Two ISs are present in the Thermoplasmatales, ISTac1 (T. acidophilum) and ISTvo5 (T. volcanium), and additional partial copies are found in T. volcanium (TVN0750), T. kodakarensis (TK0931/32), M. barkeri (Mbar_A2836), and N. pharaonis (NP3908A/10A and NP4810A/12A).
IS607-related elements. Several archaeal elements encoding two overlapping ORFs can be assigned to the IS607 subgroup on the basis of the nature of orfA. The Sulfolobus genomes carry six such elements: ISC1904 and ISC1913, from S. solfataricus; ISC1926 from Sulfolobus sp. strain L00 11; and ISSto11, ISSto12, and ISSto13, from S. tokodaii. ISC1926 is closely related to ISC1913. An additional IS, IS1921, was also identified some time ago in the sulfolobale Acidianus ambivalens (A. ambivalens = Desulfurolobus ambivalens). MITE derivatives of ISSto12 and ISSto13 are also present in the S. tokodaii genome.
The pyrococcal group contains a single complete element present in the genomes of T. kodakarensis (ISTko1) and P. furiosus (ISPfu4). Partial ISs are also present in the T. kodakarensis genome (TK1841/1842), P. furiosus (PF1985/86), and P. abyssi (PAB2076/2077).
T. volcanium carries ISTvo1, whose TnpA2 shows only weak similarities with bacterial IS607 TnpA2, but TnpB shows extensive similarities with IS605 sequences.
Finally, a partial IS is found in M. jannaschii (MG0012m/14) and in S. acidocaldarius (Saci_2022/23).
Single orfB elements. It is not yet clear even in the Bacteria that isolated copies of orfB, lacking the Tpase encoded by orfA1 or orfA2, are active in transposition or whether their transposition can be activated in trans by the Tpase of related IS200/IS605 elements in the same genome.
Most archaeal genomes carry isolated copies of orfB. These include S. solfataricus (ISC1316); S. tokodaii, S. acidocaldarius, and S. islandicus plasmids pEF9 and pHVE14; H. marismortui chromosome I and plasmids pNG300 and pNG500; Halobacterium sp. chromosome and pNRC200; N. pharaonis chromosome and pL131; M. kandleri; M. jannaschii; M. mazei; M. burtonii; M. barkeri chromosome and plasmid 1; M. stadtmanae; T. acidophilum; T. volcanium; P. furiosus; P. horikoshii; P. abyssi; and T. kodakarensis. At least some of these isolated orfB copies are flanked by short DNA regions including the expected secondary structural features necessary for mobilization. In addition, several copies of individual ends, not associated with either orfA or orfB, can be detected.
In Sulfolobus, the 12 complete but isolated TnpB genes are flanked by regions exhibiting potential secondary structures similar to those observed for other members of the family carrying both TnpA and TnpB. These have been called ISC1316. The left end, defined as the end upstream from the TnpB translation initiation signals, exhibits a short AT-rich sequence consistent with target sequences identified for other members of the family. The right end terminates with TCAC (compared to TCAA found with IS608). These extremities differ in sequence from the complete ISs (with both TnpA and TnpB) in the same genome, suggesting that they are probably not mobilizable by the TnpA genes encoded by these complete IS copies. A single copy of a left end is also present in this genome.
Phylogenetic distribution. To determine the relationships within this IS group, we analyzed the phylogeny of orfB, which is universally present in archaeal IS605/IS200 elements (Fig. 13A), together with a separate phylogeny for each of the two orfA types (Fig. 13B and C).
While orfB is universally present in archaeal IS605 elements, the distribution of the two nonhomologous orfA frames appears complex. Phylogenetic analysis of orfA1 (IS605 [Fig. 13B]) and orfA2 (IS607 [Fig. 13C]) and of the combined orfB frames from both groups was used to assess the relationship between different members of this complex group. In the orfB phylogeny (Fig. 13A), single orf elements of both IS605 and IS607 subgroups are interspersed and many bacterial sequences are intermingled with the archaeal sequences, suggesting a complex evolutionary history for orfB. The phylogeny of orfA1 shows that archaeal and bacterial IS605 elements are intermixed (Fig. 13B), suggesting several events of gene transfer between archaea and bacteria. On the other hand, orfA2 of archaeal IS607 is monophyletic (Fig. 13C), favoring a hypothesis of vertical transmission of these genes. Taken together, these observations could indicate the existence of diverse recombination events between divergent IS copies, leading to a mosaic construction of such elements. For example, replacement of an orfA of IS607 by an orfA of IS605, and vice versa, by nonhomologous recombination or with microhomologies would appear to have occurred frequently. As several highly similar orfB genes can be found alone or with orfA, it is likely that loss and gain of an orfA gene has occurred.
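The distinction drawn above between a monophyletic group and interspersed sequences can be made concrete with a small check on a rooted tree. This is a sketch under stated assumptions: trees are represented as nested tuples of leaf names, the function names and the toy trees are hypothetical, and the tree is assumed to be correctly rooted (on an unrooted tree the answer depends on where the root is placed).

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree node."""
    if isinstance(tree, str):
        return {tree}
    out = set()
    for child in tree:
        out |= leaves(child)
    return out

def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)

# Invented toy trees; leaf names are placeholders, not real elements.
separate = (("arch1", "arch2"), ("bact1", ("bact2", "bact3")))
mixed = (("arch1", "bact1"), ("arch2", "bact2"))
print(is_monophyletic(separate, ["arch1", "arch2"]))
print(is_monophyletic(mixed, ["arch1", "arch2"]))
```

In the first toy tree the archaeal leaves form a single clade, the pattern described for orfA2; in the second they are interspersed with bacterial leaves, the pattern described for orfA1 and orfB.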
320 aa) and a short ORF (47 to 74 aa) of unknown function. These are arranged in a nonoverlapping divergent configuration, with the small ORF located upstream of the Tpase. In addition, the small ORF can be found in at least four isolated copies: one in T. volcanium (TVN1041) and M. jannaschii (MJ0362) and two in Methanococcus maripaludis (MMP0468 and MMP0751). The first element discovered in this family was ISA1214, from A. fulgidus (six complete copies: five are identical and the other is more distantly related [see reference 8 and www-is.biotoul.fr]). It is closely related to ISFac3 of F. acidarmanus and to an element in T. volcanium, ISTvo2. An element was also identified in the genome of S. solfataricus and named ISC1043 (12). Reannotation has shown that the Tpase of the single complete copy of this element carries many termination codons. Seven partial copies are also found in this genome. Alignment of the Tpase of these five elements did not clearly identify a catalytic DDE motif, and it is possible that these Tpases use a different chemistry for transposition.
ISM1 group. Members of the ISM1 group are slightly longer (by 100 to 200 bp) than bacterial ISL3 family members. This group includes ISM1 itself, from Methanobrevibacter smithii; ISMst1 (Methanosphaera stadtmanae); ISMbu2 (M. burtonii); ISMma11 (M. mazei and related MITEs); ISMac19 (M. acetivorans; full and partial copies and one copy split into three segments by independent insertions of ISMac5 and ISMac9); and ISMba4 (M. barkeri; one complete copy containing many in-phase stop codons [but see "Lost in translation," above], partial copies, and related MITEs). ISM1 has long IRs of 34 bp beginning with 5'-G and also generates DRs of 8 or 9 bp, as do the canonical ISL3 elements.
IS1595 group. The second, and more distant, group (IS1595) is represented by six archaeal ISs. Four are complete while two are present as partial copies. Two closely related elements are present in Halobacterium: ISH4 (one complete chromosomal copy and one in pNRC100, 1,004 bp, IR of 23/29 bp) and ISH50 (one complete copy in H. salinarium). ISHma4 is present in one copy in H. marismortui chromosome I, one complete copy in pNG400, and a partial copy in chromosome II. ISNph2 is present as one copy in the N. pharaonis chromosome and one in pL131.
These elements are approximately 1,000 bp long, encode a unique polypeptide bordered by IRs of 25 to 29 bp, and are flanked by DRs of 8 bp with no clear similarities. The terminal IRs do not appear to be related to those of the bacterial ISL3 family. These two elements have some similarities with a very short ORF present in a single copy in the genomes of P. horikoshii (PH1854) and P. abyssi (PAB2064). No IRs and DRs could be found associated with these two Pyrococcus elements, and they could therefore represent degenerate truncated copies of ISs belonging to this family. This family contains many bacterial relatives, notably IS1595 from Xanthomonas oryzae (L. Gagnevin and P. Siguier, unpublished data).
These elements include ISH10 and ISH10B in Halobacterium sp.; ISMac8 (M. acetivorans); ISMbu5 (Methanococcoides burtonii); ISMhu3 (M. hungatei); a second but partial copy (Mhun1220) in M. hungatei; ISMma13, ISMma14, and ISMma15 (M. mazei); ISArch7 (from an uncultured archaeon); and a partial copy from T. volcanium (TVN0684). All ISMma13, ISMma14, and ISMma15 copies include a TAG termination codon within the Tpase gene (20) (see "Lost in translation," above). ISMhu3 and ISArch7 also carry the small ORF found in the ISA1214 family (see "ISA1214-related elements," above). In these cases it is in the same orientation upstream of, and partially overlapping, the Tpase gene. These constitute a new IS66 group whose members are significantly shorter and which carry a single Tpase ORF.
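In-frame termination codons such as the TAG codons noted above for ISMma13, ISMma14, and ISMma15 can be located by reading an ORF codon by codon. The following is a minimal sketch: the function name is hypothetical, the toy ORF is invented, and the three standard stop codons are assumed. Whether such a codon actually terminates translation in these hosts is a separate question (see "Lost in translation," above).

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})

def in_frame_stops(orf, stops=STOP_CODONS):
    """Return (codon_index, codon) for each stop codon before the last codon.

    The ORF is read in frame from its first base; a trailing partial codon
    is ignored and the final complete codon is treated as the natural stop.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    return [(i, c) for i, c in enumerate(codons[:-1]) if c in stops]

# Invented toy ORF: ATG GCT TAG GGA TAA
print(in_frame_stops("ATGGCTTAGGGATAA"))
```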
We have identified eight distinct archaeal members, at present restricted to the methanogens: ISMac1, ISMac2, and ISMac20 (M. acetivorans); ISMma2 (M. mazei); ISMhu1 and ISMhu2 (M. hungatei); and ISArch1 and ISArch2 (uncultured archaeon). All except ISMac20 generate a DR of 4 to 5 bp, while ISMac20 inserts into a palindromic target sequence.
MITEs are generally thought to derive from ISs that specify DDE Tpases and are composed of flanking terminal IRs but no interstitial Tpase gene. They range in size from less than 100 bp to more than 300 bp and do not carry other ORFs. MICs (mobile insertion cassettes) are similar to MITEs but carry passenger genes unrelated to transposition (19). MITEs and MICs are considered to be nonautonomous transposable elements, mobilizable in trans by Tpases of full-length genomic copies of the parent transposon.
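The defining structural feature described above, terminal IRs flanking a short segment with no Tpase gene, can be checked directly on a candidate sequence. The sketch below measures only the longest perfect terminal IR; the function names and the 50-bp search limit are assumptions, the toy element is invented, and real IRs are often imperfect, so a practical search would need to tolerate mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def terminal_ir_length(element, max_len=50):
    """Length of the longest perfect terminal inverted repeat.

    The left end is compared base by base with the reverse complement of
    the right end. `max_len` is an arbitrary illustrative search limit.
    """
    element = element.upper()
    rc = reverse_complement(element)
    limit = min(max_len, len(element) // 2)
    n = 0
    while n < limit and element[n] == rc[n]:
        n += 1
    return n

# Invented toy element: 10-bp IRs flanking a 5-bp spacer.
toy = "CAGGTTACGA" + "GGGGG" + "TCGTAACCTG"
print(terminal_ir_length(toy))
```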
In eukaryotes, many MITEs are related to the Tc/mariner family elements, as judged both by the similarity between their IRs and by their target site duplication. The Tc/mariner family is distantly related to the bacterial IS630 family, and MITEs derived from IS630 were the first bacterial examples to be described (10, 55). Other eukaryotic MITEs are related to other DNA transposons, such as PIF/Harbinger (itself related to bacterial IS5 elements), and probably to members of the hAT, CACTA, and Mutator elements (27). MITEs showing similarities to one of several of the IS families have now been identified in bacteria and in several archaeal genomes, including A. pernix, S. solfataricus and S. tokodaii, M. jannaschii, M. mazei, M. acetivorans, and M. barkeri (8, 9, 71; also this work). However, their transposition activities have yet to be analyzed in any detail. Transposition activity is implied from observations showing that they may be present in some but not all copies of a given multicopy IS, and sometimes a given element has been identified in several different ISs. Moreover, the insertions within an IS have permitted determination of the length of target repeat generated by comparing empty and occupied insertion sites (71). A single transposition event involving an S. solfataricus MITE has recently been observed (6).
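The comparison of empty and occupied insertion sites mentioned above can be sketched as follows. This is an illustration under stated assumptions and not the method of reference 71: the function names are hypothetical, both sequences are assumed to cover exactly the same flanking region, and the toy sequences are invented. Because the target sequence is present on both sides of the insert, it is counted once in the shared prefix and once in the shared suffix, and the excess over the empty-site length gives the DR length.

```python
def common_prefix_len(a, b):
    """Number of leading positions at which a and b agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def target_duplication_length(empty_site, occupied_site):
    """Estimate DR length from an empty and an occupied insertion site."""
    p = common_prefix_len(empty_site, occupied_site)
    s = common_prefix_len(empty_site[::-1], occupied_site[::-1])
    return max(0, p + s - len(empty_site))

# Invented toy sites: 8-bp flanks around a 5-bp target (TTAGC).
left, target, right = "ACGTACGT", "TTAGC", "GGATCCAA"
insert = "CAGTTTGGGCCCAAACTG"
empty = left + target + right
occupied = left + target + insert + target + right
print(target_duplication_length(empty, occupied))
```

A chance match between the ends of the inserted element and the adjacent flank would inflate this estimate by a base or two, so the result is best treated as an upper bound to be confirmed across several independent insertions.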
The most numerous archaeal MITEs (Table 3) are observed in the Sulfolobales and Methanosarcinales. A systematic study of the Sulfolobales revealed relatively high numbers in both S. solfataricus and S. tokodaii but not in S. acidocaldarius (8, 9, 71). They were divided into several classes (called SM1 to SM6). SM5 and SM6 appeared to be limited to S. tokodaii, and SM4 appeared to be limited to S. solfataricus. For the Methanosarcinales, significant numbers are found in M. acetivorans and M. barkeri but M. mazei contains many fewer full copies.
The MITE families are described below according to their probable "parental" IS family. Where not stated, they were identified by simple BLAST analysis using the supposed parental IS as a seed sequence. Other groups were not identifiable in this way either because no entire parental IS copy was present in the database or because the IRs are degenerate. In these cases the sequences were initially identified as repeated sequences within the genome (8). The relationship of these with the parental IS is therefore more tenuous. We have not attempted to identify others of this type.
IS1. Two larger elements (315 and 317 bp) with IR ends identical to those of ISSto7 were identified in S. tokodaii. In addition, two groups of MITEs, SM4 and SM7 in S. solfataricus, are related to IS1. SM4 (147 to 168 bp) has 27-bp IRs, similar in length to IS1 IRs, which are 49% identical to those of ISC1173 and are flanked by 8-bp target repeats (8). The second group, SM7 (330 bp) has only 13-bp IRs, significantly shorter than those of ISC1173, with >90% identity but with no obvious DRs (M. F. White, P. Redder, and R. A. Garrett, personal communication). A third group, SM6 (127 bp), related to ISSto9 (ISC794) is present in S. tokodaii with a high degree of identity (>95%).
IS4. The methanogens carry MITEs related to two ISs of the IS4 family: ISMba11 and ISMba12. Those related to ISMba11 are 241 bp in length and are found as complete and partial copies (with the left or right IR deleted) in M. barkeri, M. mazei, and M. acetivorans. Those related to ISMba12, found in M. barkeri, are 278 bp in length.
Among the Halobacteriales, we have identified IS4-related MITEs only in H. marismortui. These are related to ISH20 and are present in single copies of 277 and 287 bp.
IS5. Five groups of MITEs related to ISMac11 are found in M. acetivorans. These all have similar lengths (130 to 131 bp), but their sequences are clearly different and form five distinguishable clusters. A limited number of MITEs related to ISMba5 in M. barkeri (two copies of 131 bp and one of 152 bp) and to ISMbu10 in M. burtonii (185 bp) have also been found.
The S. solfataricus genome (71) carries 40 copies of SM3. These are between 127 and 139 bp long and are divided into two subgroups (SM3A and SM3B) based on sequence identities (75 to 97% within each group and only 60% between the groups). These appear to be distantly related to ISC1058 and include a DR of 9 bp, although they are not revealed by a simple BLAST analysis. The S. tokodaii genome also carries four copies of a MITE of 329 to 332 bp related to ISSto3.
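Subgrouping by pairwise sequence identity, as described above for SM3A and SM3B, can be illustrated with a small sketch. Several assumptions apply: the function names are hypothetical, the sequences are taken to be already aligned to equal length, single-linkage grouping is an assumed clustering rule (the text does not state one), the 75% cutoff simply reuses the lower bound of the within-group range quoted above, and the toy sequences are invented.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(1 for x, y in pairs if x == y) / len(pairs)

def group_by_identity(seqs, threshold=75.0):
    """Single-linkage grouping of named sequences at an identity cutoff."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(seqs[a], seqs[b]) >= threshold:
                parent[find(b)] = find(a)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Invented 20-nt toy sequences: 90% identity within pairs, 60% or less between.
toy = {
    "toyA1": "ACGTACGTACGTACGTACGT",
    "toyA2": "ACGTACGTACGTACGTACAA",
    "toyB1": "TGCATGCAACGTACGTACGT",
    "toyB2": "TGCATGCAGGGTACGTACGT",
}
print(group_by_identity(toy))
```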
Halobacterium sp. also carries MITE derivatives of 180 bp related to ISH9.
IS6. Eight MITE-like sequences with lengths between 358 and 360 bp were identified in M. jannaschii. These represent internal deletions of ISMja1 (703 bp) between positions 263 and 604. IS6-related MITEs, SM5, have also been described in the S. tokodaii genome with a high level of conservation (92 to 100%). They show limited identity to ISC774, which is too low to be detected by BLAST analysis (8).
IS200/IS605. Note that IS200/IS605-related MITEs are the only MITEs yet identified that derive from non-DDE-type ISs. All archaeal ISs carrying only orfA1 have MITE derivatives. These are related to ISMma21 (248 bp; M. mazei), ISMba16 (three types: 272 bp, 248 bp with 87% identity to ISMma21, and 195 bp in M. barkeri), and ISMba18 (four types: 209 bp in M. barkeri, M. acetivorans, and M. mazei; 55 bp in M. barkeri and M. acetivorans; 174 bp in M. barkeri; and 178 bp in M. barkeri).
Some MITEs derived from IS605-related ISs are also observed in S. tokodaii, related to ISSto1 (356 to 357 bp, equivalent to SMA [8]), and in M. mazei, related to ISMma19 (206 bp).
IS607-related elements also give rise to MITEs. These are derived from ISSto12 (274 bp) and ISSto13 (266 to 271 bp), both in S. tokodaii.
IS630. Fifteen complete copies of a 172-bp MITE related to ISMac13 were identified in the M. acetivorans genome. In addition, a group of MITEs in S. solfataricus and S. tokodaii, SM1, whose members are >95% conserved and between 79 and 80 bp long, are distantly related to ISC1048.
ISM1. Several MITEs belonging to the ISM1 subgroup and related to ISMma11 (199 bp) and ISMba4 (193 to 302 bp) are at present limited to the methanogens.
ISC1217. An additional class of MITEs found in both S. solfataricus and S. tokodaii is related to ISC1217 and ISSto10. Members of this group, SM2, are between 183 and 186 bp and present in 25 and 36 copies, respectively, with >95% conservation. A single longer MITE of 295 bp, related to ISSto10, is also present in S. tokodaii.
Nonclassified MITEs. Multiple short sequences resembling MITEs (called MJREs) have been found in M. jannaschii (85). These were divided into three groups based on sequence similarities. There are a total of 141 copies: MJRE1 has 59 copies and is 108 ± 7 bp long with IRs of about 17 bp, MJRE2 has 69 copies and is 96 ± 3 bp long with IRs of about 16 bp, and MJRE3 has 13 copies and is 101 ± 4 bp long with IRs of 15 bp. No particular regional bias was observed in the M. jannaschii genome. These sequences were not observed in other Archaea, and no corresponding complete IS elements with similar ends were found.
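The MITEs described above are characterized largely by their terminal inverted repeats (IRs). As an illustration only, the following minimal Python sketch shows one way the length of a perfect terminal IR could be measured for a candidate element. The function names, the `max_len` cutoff, and the toy sequence are assumptions made for this example; they are not taken from the studies cited here, which may also have tolerated mismatches within IRs.

```python
# Illustrative sketch: measure a perfect terminal inverted repeat (IR).
# All names and the toy sequence are hypothetical, not from the cited work.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_ir_length(seq: str, max_len: int = 50) -> int:
    """Length of the longest perfect terminal IR of `seq`.

    The first n bases form a terminal IR with the last n bases when they
    equal the reverse complement of those last n bases. Comparing `seq`
    with its own reverse complement, position by position from the left,
    tests exactly that. Mismatches are not tolerated (an assumption).
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    limit = min(max_len, len(seq) // 2)  # the two IRs may not overlap
    n = 0
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n


# Toy element: an 8-bp left end, a spacer, and the matching right end.
toy_element = "GATTACAG" + "AAAAAAAA" + "CTGTAATC"
ir_len = terminal_ir_length(toy_element)
```

In this toy case the two 8-bp ends are perfect reverse complements of each other, so the function reports an 8-bp IR; real elements would usually require a mismatch-tolerant comparison.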
Another potential compound transposon was identified in P. furiosus. This is composed of a 16-kb region carrying an actively transcribed maltose and trehalose ABC transport system flanked by two insertion sequences from the IS6 family (now called ISPfu1) (21). It is absent from both P. abyssi and P. horikoshii. Interestingly, this region was also identified in Thermococcus litoralis, where it was flanked by very short (12- and 18-bp) sequences that might be IS remnants.
Again, as in the case of Bacteria, the number of IS copies can vary greatly between chromosomes of different species. They are particularly numerous in Methanosarcinales (2.6 to 4.4%), some halophiles (3.5% for chromosome II of H. marismortui), and some Sulfolobus sp. (S. solfataricus). It is striking to observe the large variation in IS numbers in closely related genomes. This has been previously underlined for Sulfolobus (many ISs are present in S. solfataricus but no entire copies at all in S. acidocaldarius [8]). A survey of the ISs in S. solfataricus also gives the overall impression that they have undergone a high rate of mutation by nested insertions and deletions. Large differences in IS load are also seen in Pyrococcus (from 0.16% to 1.65%) and in Thermoplasma (from 0.48% to 1.72%).
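The percentages quoted above express IS load as a fraction of a chromosome. A minimal Python sketch of such a calculation follows, assuming that load is defined as the summed length of all annotated IS copies divided by the chromosome length; that definition, the function name, and the numbers used are assumptions for illustration and are not data from this review.

```python
# Illustrative sketch: IS load as a percentage of chromosome length.
# The definition and all numbers below are hypothetical assumptions.

from typing import Iterable


def is_load_percent(is_lengths_bp: Iterable[int], chromosome_length_bp: int) -> float:
    """Percentage of a chromosome occupied by IS copies.

    Assumes non-overlapping IS annotations, so that lengths can simply
    be summed (nested insertions would need interval merging instead).
    """
    if chromosome_length_bp <= 0:
        raise ValueError("chromosome length must be positive")
    total_is_bp = sum(is_lengths_bp)
    return 100.0 * total_is_bp / chromosome_length_bp


# Hypothetical chromosome of 200,000 bp carrying three IS copies.
example_load = is_load_percent([1000, 1500, 500], 200_000)
```

Because nested insertions are common in some genomes, a real analysis would probably merge overlapping coordinates before summing.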
IS-rich regions may result from horizontal acquisition of blocks of DNA or reflect IS exclusion from other regions, differential IS extinction (90), or IS insertion specificity. For example, in Bacteria, Tn7 shows regional specificity in both of its parallel transposition pathways (61), and IS903 shows distinct regional preferences that are dependent on the nucleoid protein H-NS (86). Thus, regional specificity is likely to reflect a complex interaction of different influences.
Genome analysis of P. furiosus, P. abyssi, and P. horikoshii showed that P. furiosus carries a set of 29 complete ISs and 9 truncated copies. It has been reported that P. abyssi and P. horikoshii lack complete IS elements (21); however, we have now been able to identify at least three different elements belonging to the relatively newly defined IS607 and IS1595 families. Additional studies with Pyrococcus (33) have shown that the major differences between the P. furiosus and P. woesei genomes appear to be due to gene clusters present in P. furiosus and lacking in P. woesei. These clusters include one (PF1737 to PF1751) involved in maltose/trehalose metabolism and flanked by ISPfu1. The authors suggest that the MalI cluster in P. furiosus is a composite transposon that undergoes replicative transposition.
Analysis of diverse Sulfolobus species from Siberia and the western United States, with 5-fluoroorotate resistance used as a screen, led to the identification of seven additional ISs distinct from those identified in the sequenced Sulfolobus genomes (6). This indicates the existence of large regional disparity in IS content and suggests, moreover, that a large fraction of the true diversity of ISs in Archaea remains to be discovered.
Another study analyzed IS6 elements in a collection of 36 isolates of Pyrococcus from the Pacific Ocean and the Mediterranean Sea. It revealed that IS6 derivatives are present in almost all of the analyzed strains. These either are isoforms or are very closely related to IS-pfu1 (= ISPfu2 in ISfinder standard nomenclature), IS-pfu2 (= ISPfu1), and IS-pfu3 (= ISPfu5) (33). The authors suggest that such ISs could play an important role in genetic drift, leading to geographic diversification of hyperthermophilic archaea (24).
FIG. 14. Possible evolutionary scenario of ISs in archaea. A phylogeny of the Archaea is represented, and for each IS family we indicate the most parsimonious scenario of IS gain by mapping acquisition of elements at each node. The distribution by IS families is also indicated for each taxon; complete and partial elements are indicated in black and in gray, respectively. The various Archaea have been color-coded as follows for clarity: Sulfolobales, red; Thermoplasmatales, magenta; halophiles, green; methanogens, blue; "other," orange.
Horizontal transfer between Bacteria and Archaea has almost certainly occurred. For example, the incongruencies in the phylogenies of orfA and orfB of the IS605 family (Fig. 13) suggest a mosaic evolution of these elements (multiple events of replacement of one gene by another). In addition, several bacterial ISs belonging to the IS6 and IS605 families are more similar to archaeal ISs than to other bacterial ISs. For example, in the IS605 family, the TnpB proteins of ISBce3 (Bacillus cereus) and ISTma1 (Thermotoga maritima) are related to archaeal elements. This observation suggests that archaeal genomes may occasionally be a source of new ISs in bacterial genomes (and vice versa).
Due to the "erratic" phylogenetic distribution of the remaining families, it is likely that the large majority of IS families arose after the last common ancestor of the Archaea. Many ISs have a sporadic distribution in archaea, with only one or few members. These elements are more likely to have been recently acquired by archaea and include IS3 and ISL3 in Thermoplasma; IS256 in Thermoplasmatales; IS91, IS1, ISL3, IS256, and IS21 in Methanosarcinales; and IS982 in Pyrococcus. For the mesophilic archaea, such as the Methanosarcinales, this is consistent with the presence of numerous Bacteria-related genes in their genomes (20). At present, there are too few sequenced genomes of hyperthermophilic bacteria available to draw such inferences for the hyperthermophilic archaea.
An example of a possible massive invasion of ISs in the Archaea comes from the genomes of Sulfolobus species. The ancestors of Sulfolobus could have been invaded by ISs from the IS1, IS4, IS5, IS30, IS110, IS256, and IS630 families (Fig. 14). S. solfataricus has remained largely colonized (82), whereas S. tokodaii may have lost several of these elements (IS5, IS256, and IS630), as it retains a small IS subset, and S. acidocaldarius has lost all complete elements and retains only short and inactive copies. This is reminiscent of the IS distribution in bacteria, where IS expansion is often observed and for which it has been suggested that periodic extinction of transposable elements occurs (90).
Taken together, these results suggest that a small set of ISs existed in the last common ancestor of the Archaea and that subsequent lateral acquisition of new bacterial IS families has enriched archaeal phyla. It is interesting to note that Eukarya-type transposons were not detected either in Archaea or in Bacteria. This may indicate that lateral gene transfers of ISs between prokaryotes and eukaryotes did not occur after the emergence of the three domains. However, Eukarya contains numerous mobile genetic elements that transpose by using DDE enzymes or tyrosine recombinases. It is tempting to speculate that these archaeal, bacterial, and eukaryal enzymes shared common ancestors but that eukaryotic enzymes have diverged to such an extent that homology is now undetectable from primary sequence alone. This would imply that the last universal common ancestor had DNA transposons that transposed by using DDE or tyrosine enzymes. Alternatively, eukaryotic DNA transposons may have originated from mitochondria or chloroplasts and subsequently invaded the eukaryotic nucleus. Such a scenario has also been proposed for the origin of spliceosomal introns (35) in Eukarya by capture of mitochondrial group II introns (11, 51). If Eukarya are the result of the fusion of an archaeon and a bacterium (49, 51, 75), eukaryotic transposons could derive from either bacterial or archaeal ISs initially present in the two partners. An alternative hypothesis is that chemically there are a limited number of ways in which DNA can undergo the cleavages and strand transfers required for transposition and that at least some of the similarities observed result from convergent evolution. We note, however, that eukaryotic viruses infecting algae and amoebae surprisingly carry Bacteria- and Archaea-type ISs belonging to the IS4 and IS605/IS607/IS608 families (27a). These ISs have most likely been acquired by these viruses along with other prokaryotic genes.
Similar types of virus, providing a genetic interface between prokaryotes and eukaryotes, may have been responsible for introducing prokaryote-type transposons during the early steps of eukaryotic evolution. Such transposons may then have diverged so extensively during the evolutionary course of the Eukarya that only the catalytic sites remain conserved. Finally, we should also mention that the archaeal Acidianus two-tailed virus carries four ISs belonging to the IS605/IS607/IS608 family (69). This observation further suggests that viruses would be an efficient system for laterally transmitting ISs between species.
Intramural funding was provided by the Centre National de la Recherche Scientifique (CNRS) (France), and extramural funding was provided by European contract LSHM-CT-2005-019023. J.F. was supported by the CNRS and by the European contract.
J. Filée and P. Siguier contributed equally to this work.