Microbiology and Molecular Biology Reviews, March 2007, p. 121-157, Vol. 71, No. 1
1092-2172/07/$08.00+0     doi:10.1128/MMBR.00031-06
Copyright © 2007, American Society for Microbiology. All Rights Reserved.

Insertion Sequence Diversity in Archaea

J. Filée,† P. Siguier,† and M. Chandler*

Laboratoire de Microbiologie et Génétique Moléculaires (UMR5100 CNRS), Campus Université Paul Sabatier, 118 Route de Narbonne, F-31062 Toulouse Cedex, France

SUMMARY
INTRODUCTION
NOMENCLATURE
IS DISTRIBUTION IN ARCHAEA COMPARED TO BACTERIA AND EUKARYA
TRANSPOSITION IN THE ARCHAEA: HISTORICAL PERSPECTIVE
    Spontaneous Mutation in the Extreme Halophiles
        ISH1.
        ISH2.
        ISH3/ISH27/ISH51.
        ISH8/ISH26.
        ISH11.
        ISH23/ISH50.
        ISH24.
        ISH25.
        ISH28.
    Transposition in Sulfolobus
    Transposition in Other Archaea
REGULATION OF TRANSPOSITION
    Lost in Transcription: ncRNAs in S. solfataricus
    Lost in Translation: Translational Readthrough in Methanosarcina?
IS FAMILIES AND THE NATURE OF THE CATALYTIC SITE
    The DDE Enzymes
    The Serine Enzymes
    The Relaxase Enzymes
IS FAMILIES IN THE ARCHAEAL GENOMES
    IS1
    IS3
    IS4
        ISH8 subgroup.
        IS1634 subgroup.
        ISH3 subgroup.
        IS701 subgroup.
    IS5
        IS903 subgroup.
        IS5 subgroup.
        IS1031 subgroup.
        IS427 subgroup.
        The halophilic subgroup ISH1.
        The Sulfolobus subgroup.
        IS5 orphans.
    IS6
    IS21
    IS30
    IS110
        IS110 subgroup.
        IS1111 subgroup.
    IS256
    IS481
    IS630
    IS982
    ISL3
    Non-DDE Transposons: the IS91 Group
    Non-DDE Transposons: the IS200/IS605/IS607 Group
        IS200 subgroup.
        IS605-related elements.
        IS607-related elements.
        Single orfB elements.
        Phylogenetic distribution.
EMERGING GROUPS, ORPHANS, WAIFS, AND STRAYS
    ISA1214-Related Elements
    ISL3-Related Elements
        ISM1 group.
        IS1595 group.
    IS66-Related Elements: the New Subgroup ISBst12
    IS1182
    ISH6
    ISC1217
MITEs, MICs, AND SOLO IRs
    MITEs
        IS1.
        IS4.
        IS5.
        IS6.
        IS200/IS605.
        IS630.
        ISM1.
        ISC1217.
        Nonclassified MITEs.
    Solo IRs
COMPOUND TRANSPOSONS, BITS, AND PIECES
    Compound Transposons
    Uncharacterizable IS-Like Sequences
    Concatenated ISs
GENOME COMPARISONS: IS DISTRIBUTION, ABUNDANCE, AND GEOGRAPHICAL VARIATIONS
    Intergenome Distribution and Abundance
    Intragenome Distribution
    Large Genomic Rearrangements
    Geographical Variations
EVOLUTIONARY HISTORY OF ISs IN ARCHAEA: A POSSIBLE SCENARIO
CONCLUSIONS
ADDENDUM IN PROOF
ACKNOWLEDGMENTS
REFERENCES

   SUMMARY
 
Insertion sequences (ISs) can constitute an important component of prokaryotic (bacterial and archaeal) genomes. Over 1,500 individual ISs are included at present in the ISfinder database (www-is.biotoul.fr), and these represent only a small portion of those in the available prokaryotic genome sequences and those that are being discovered in ongoing sequencing projects. In spite of this diversity, the transposition mechanisms of only a few of these ubiquitous mobile genetic elements are known, and these are all restricted to those present in bacteria. This review presents an overview of ISs within the archaeal kingdom. We first provide a general historical summary of the known properties and behaviors of archaeal ISs. We then consider how transposition might be regulated in some cases by small antisense RNAs and by termination codon readthrough. This is followed by an extensive analysis of the IS content in the sequenced archaeal genomes present in the public databases as of June 2006, which provides an overview of their distribution among the major archaeal classes and species. We show that the diversity of archaeal ISs is very great and comparable to that of bacteria. We compare archaeal ISs to known bacterial ISs and find that most are clearly members of families first described for bacteria. Several cases of lateral gene transfer between bacteria and archaea are clearly documented, notably for methanogenic archaea. However, several archaeal ISs do not have bacterial equivalents but can be grouped into Archaea-specific groups or families. In addition to ISs, we identify and list nonautonomous IS-derived elements, such as miniature inverted-repeat transposable elements. Finally, we present a possible scenario for the evolutionary history of ISs in the Archaea.


   INTRODUCTION
 
Archaea, members of the third domain of life, are prokaryotic organisms that can be divided into two major groups: the Crenarchaeota and the Euryarchaeota. This division, based on small-subunit rRNA phylogeny, is also strongly supported by comparative genomics. A number of genes present in euryarchaeal genomes are missing altogether in crenarchaeota and vice versa (28). Recent studies have suggested the existence of a third phylum: the Nanoarchaeota (37). However, it has also been suggested that Nanoarchaeota may be representatives of a quickly evolving euryarchaeal lineage (7). Many new groups of as-yet-uncultured archaea have been detected by PCR amplification of 16S rRNA from environmental samples. These include seawater, sediments, tidal flats and lakes, and the human gut and buccal cavity (76). Archaea should therefore no longer be considered simply as extremophiles (18).

Like those of the other two domains of life, the Bacteria and Eukarya, members of the prokaryotic Archaea can carry a large number and variety of transposable elements within their genomes. These are principally insertion sequences (ISs) and miniature inverted-repeat transposable elements (MITEs) (8), although at least one active composite transposon has been documented (92) and other similar structures have been identified (see "Compound transposons, bits, and pieces," below). ISs are short specific segments of DNA up to 2 kbp long. They carry one or two open reading frames (ORFs) encoding the enzyme that catalyzes their movement, the transposase (Tpase), generally (but not always) flanked by short terminal inverted repeats (IRs). IS insertion often results in the duplication of a short target sequence that flanks the insertion (direct repeat [DR]) (12). MITEs are nonautonomous ISs deleted for part or all of the Tpase ORF but retaining both ends, while composite transposons are structures in which a DNA segment is flanked by two copies of a given IS.
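The structural signature described above (an IS flanked by a duplicated target sequence) can be illustrated with a minimal Python sketch. It assumes a simple string model of DNA; the function names and the sequences used with them are hypothetical and serve only to show how an insertion gives rise to flanking direct repeats.

```python
def insert_with_target_duplication(genome: str, pos: int, element: str, dr_len: int) -> str:
    """Insert `element` at `pos`, duplicating the dr_len-bp target so that it flanks the element.

    This is a string-level illustration only; it does not model any
    particular transposition mechanism.
    """
    if dr_len < 0 or pos < 0 or pos + dr_len > len(genome):
        raise ValueError("target site must lie within the genome")
    target = genome[pos:pos + dr_len]
    # One copy of the target ends up on each side of the inserted element.
    return genome[:pos] + target + element + target + genome[pos + dr_len:]


def flanking_direct_repeat(genome: str, element: str, dr_len: int) -> bool:
    """Return True if the first copy of `element` is flanked by identical dr_len-bp sequences."""
    start = genome.find(element)
    if start < dr_len:  # also covers start == -1 (element absent)
        return False
    end = start + len(element)
    left = genome[start - dr_len:start]
    right = genome[end:end + dr_len]
    return len(right) == dr_len and left == right
```

For example, inserting a made-up element into a made-up genome with an 8-bp target leaves the element flanked by two copies of that target, which `flanking_direct_repeat` then reports as a direct repeat.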

Little is known about the transposition behavior of the majority of these mobile genetic elements in archaea. This is certainly due to the limitation of genetic systems available for their analysis and to the extreme conditions (temperature, pressure, pH, and salinity) required for the growth of those archaea so far analyzed. Data from the available sequenced genomes suggest that, as among bacteria, the distribution of ISs is somewhat "haphazard," with certain species exhibiting very few or no IS copies while others carry many (see "Genome comparisons: IS distribution, abundance, and geographical variations," below). It is clear that the variety of archaeal ISs approximates that of bacteria rather than the limited types recognized at present in eukaryotes (8). However, apart from a survey compiled several years ago (8), before a significant number of archaeal genome sequences became available, no systematic and coherent comparison of archaeal and bacterial ISs is available. Since the transposition characteristics of a variety of bacterial ISs are known (14), such a comparison would provide a useful starting point for exploring transposition activity in archaea and the impact of mobile genetic elements on archaeal genome structure.


   NOMENCLATURE
 
One major task that must be confronted initially is that of nomenclature. Apart from ISs originally identified in the extreme halophiles, named ISH, the more recently identified archaeal ISs have been distinguished by their appearance in the major archaeal divisions, the Crenarchaeota (ISC) and Euryarchaeota (ISE). For these individual ISs, the distinction ISC or ISE is followed by a number corresponding to the length, in base pairs (8). This, of course, obscures the relationship between IS derivatives that differ in length by deletion or insertion of one or a few base pairs and also inflates the number of apparently different ISs.

In the present review, we provide an updated survey of archaeal IS elements and include an analysis of their distribution and of their relationship to bacterial and eukaryotic ISs. Except for certain IS names already published (principally those of the halophiles and Sulfolobales), we adhere to the system of nomenclature used at present for ISs of Bacteria, namely, the first letter of the genus, in uppercase, and the first two letters of the species name, in lowercase (12; also see www-is.biotoul.fr). This is similar to the nomenclature system used for restriction enzymes. It renders more transparent the phylogenetic relationships between highly related ISs that differ simply in overall length. These designations have been included as the principal name in the ISfinder database (www-is.biotoul.fr). Any names previously used are also included in the database as synonyms to facilitate retrieval. We assign IS names only for those where we can identify the IS ends. In all other cases, we assume that the copies are only partial, and only the identification number of the corresponding transposase ORF is given.
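The naming rule described above can be sketched in Python as follows. The "IS" prefix and the trailing serial number are assumptions inferred from the names used in this review; the function names are hypothetical and the example organism is illustrative only.

```python
def is_name_prefix(genus: str, species: str) -> str:
    """Build the organism part of an IS name: the first letter of the genus
    in uppercase, followed by the first two letters of the species name in lowercase."""
    genus, species = genus.strip(), species.strip()
    if not genus or len(species) < 2:
        raise ValueError("need a genus and a species name of at least two letters")
    # "IS" prefix is an assumption based on the names cited in the text.
    return "IS" + genus[0].upper() + species[:2].lower()


def is_name(genus: str, species: str, number: int) -> str:
    """Append a serial number (assumed convention) to the organism prefix."""
    if number < 1:
        raise ValueError("serial number must be positive")
    return f"{is_name_prefix(genus, species)}{number}"
```

Under these assumptions, `is_name("Methanosarcina", "acetivorans", 1)` yields `ISMac1`. Because the name no longer encodes length, derivatives that differ by a few base pairs need not receive unrelated names.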

At the time of writing, the public databases included the entire sequences of 28 archaeal genomes (23 euryarchaeotes and 5 crenarchaeotes). For operational simplicity, to avoid inundating the ISfinder database with specific names, we have adopted the use of "isoforms," as first suggested by Ohtsubo et al. (57). We (arbitrarily) define isoforms as being sequences that are 98% similar at the protein level and/or more than 95% similar at the DNA level. Moreover, we also point out those previously published ISs that were given different names according to length but that are effectively identical to, or are isoforms of, other ISs. We have not yet systematically addressed the extensive accumulating data from environmental sequencing projects, although certain ISs have been identified and included in ISfinder.
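The working definition of isoforms given above can be expressed as a small Python predicate. This is a sketch under stated assumptions: the 98% protein threshold is treated as inclusive and the 95% DNA threshold as exclusive ("more than"), and either criterion alone is taken to be sufficient ("and/or"). The function and constant names are hypothetical.

```python
from typing import Optional

PROTEIN_THRESHOLD = 98.0  # percent similarity at the protein level (assumed inclusive)
DNA_THRESHOLD = 95.0      # percent similarity at the DNA level (strictly greater than)


def are_isoforms(protein_similarity: Optional[float] = None,
                 dna_similarity: Optional[float] = None) -> bool:
    """Return True if either available similarity value meets its threshold.

    A value of None means that the comparison was not made at that level.
    """
    protein_ok = protein_similarity is not None and protein_similarity >= PROTEIN_THRESHOLD
    dna_ok = dna_similarity is not None and dna_similarity > DNA_THRESHOLD
    return protein_ok or dna_ok
```

With these thresholds, two elements sharing 88% or 94% of their DNA sequence (and with no protein-level value supplied) would not be treated as isoforms, whereas a pair with 99% similar Tpases would.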

IS elements were identified by manual reiterative BLAST analysis using an E value cutoff of 10⁻³. Tpase alignments were performed with CLUSTALX and refined by eye. To infer phylogenetic relationships, we performed preliminary analyses to assess the different subgroups of large families by neighbor joining using MUST.3.0 (68). TribeMCL (23) was also applied to confirm the clustering of all ISs into the various families and subgroups. Sequences belonging to different subgroups of a single family were then treated separately by maximum likelihood, using PROML (Phylip, version 3.6 [26]) with the Jones-Taylor-Thornton amino acid substitution matrix.
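As an illustration of the E value filtering step only, the following Python sketch keeps the hits at or below a cutoff of 1e-3. It assumes that hits are supplied in the standard 12-column tab-separated BLAST tabular layout, in which the E value is the 11th column. It is not the procedure actually used for this survey, which was manual and reiterative, and the function name is hypothetical.

```python
from typing import List, Tuple

E_VALUE_CUTOFF = 1e-3  # cutoff quoted in the text; treated here as inclusive (assumption)


def filter_hits(tabular_text: str, cutoff: float = E_VALUE_CUTOFF) -> List[Tuple[str, str, float]]:
    """Return (query, subject, evalue) for each hit whose E value is <= cutoff.

    Expects tab-separated lines with at least 12 columns; comment lines
    starting with '#' and malformed lines are skipped.
    """
    hits = []
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue
        evalue = float(fields[10])  # 11th column holds the E value
        if evalue <= cutoff:
            hits.append((fields[0], fields[1], evalue))
    return hits
```

In a reiterative search, the subjects retained at each round would presumably be used to seed the next round until no new hits appear; that loop is not shown here.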


   IS DISTRIBUTION IN ARCHAEA COMPARED TO BACTERIA AND EUKARYA
 
An overview of the results of database searches is presented in Fig. 1 and Table 1. IS elements are classified into families according to genetic organization, the relationship between their Tpases, and the sequences of their ends (12). The division into superfamilies, families, groups, and subgroups is relatively subjective and will change with time. A family can be defined as a closely related group with strong conservation of the catalytic site (identical spacing between the key residues and the presence of additional conserved residues within the catalytic domain; see "IS families and the nature of the catalytic site," below), conservation of organization and expression signals (e.g., frameshifting), and a clear relationship between the IRs over their entire length. Examples of such large and closely knit families include IS3, IS21, IS30, IS481, and IS630. Not all IS groups are so coherent. Two such diverse groupings have been identified in prokaryotes: the IS4 and IS5 superfamilies (12). These are growing considerably, and the relationships within these superfamilies continue to evolve as additional members are identified. IS630 has also been included in a less well defined grouping with eukaryotic elements such as mariner and Tc. This has been referred to as a superfamily.


FIG. 1. Comparison of IS families in archaea. The figure shows the distribution of IS families among the different archaeal phyla. The tree is from NCBI (http://www.ncbi.nlm.nih.gov/sutils/genom_tree.cgi). The color code for IS families is included within the figure beneath the phylogenetic tree. Stars represent emerging groups or families.

 

TABLE 1. IS content of archaeal genomes

 
Figure 1 shows the distribution of different IS families within the Archaea. The most striking feature here is that most of the archaeal ISs fall into families found in the Bacteria (present in the ISfinder database). Three Archaea-specific groups, ISA1214, ISC1217, and ISH6, have emerged in these studies. On the other hand, archaeal genomes lack elements from the IS1380 family and, moreover, several widespread bacterial IS families such as IS3, IS1182, IS21, IS91, IS30, and IS982 have few archaeal members. However, since the sequences of only 28 archaeal genomes were available, compared to more than 325 bacterial genomes, it is possible that the numbers of archaeal ISs from known families are underestimated. Conversely, we cannot rule out the existence of additional Archaea-specific ISs presenting limited or no obvious similarities with those from Bacteria (see "Emerging groups, orphans, waifs, and strays," below).

The distribution of ISs in archaeal genomes is very "patchy" (Fig. 1). Four orders, the Halobacteriales, Sulfolobales, Methanosarcinales, and Thermoplasmatales, account for more than 90% of archaeal ISs (Table 1). No ISs were identified in the Nanoarchaeota, the Desulfurococcales, the Methanomicrobiales, the Thermoproteales, or the Methanobacteriales, and only one or two families were identified in the Methanococcales or the Methanopyrales. However, these lineages are represented by only one or two completely sequenced genomes, and this limited information may introduce some bias, as was initially the case for bacterial Mycoplasma species (www-is.biotoul.fr).

It is worth noting that archaeal ISs resemble bacterial ISs rather than those identified in eukaryotes. No elements with significant similarity to the nine currently recognized eukaryote DNA transposon superfamilies could be identified. These include notably the mariner/Tc (distantly related to the IS630 family) and the P (from Drosophila) families, which are structurally close to bacterial ISs; the CACTA and hAT (e.g., hobo, Ac, and Tam) families (mainly recovered in plants and insects); and Merlin (related to IS1016), Mutator (distantly related to IS256 family members), PIF/Harbinger (distant relatives of some IS5 family members), piggyBac, and Transib (12, 70). Nor could any similarity be detected to the helitrons (40), a family related to bacterial IS91 and identified in plants, fungi, and diverse animals (14). Extensive BLAST searches seeded with such sequences revealed no detectable homologies in the archaeal genomes. This is perhaps surprising in view of the fact that Archaea have important similarities to Eukarya, notably enzymes involved in DNA replication (47). Since it seems unlikely that eukaryal "ISs" were originally present in these genomes and were subsequently specifically deleted, this implies that any lateral transfer of transposable elements occurred between Bacteria and Archaea but not between Archaea and Eukarya.

In the light of the important differences between bacterial and archaeal replication systems, it is interesting to note the presence of members of the IS1, IS3, and IS256 families within archaeal genomes. Bacterial members of these families are thought to transpose by a mechanism involving a replication step to eject a circular IS transposition copy from the donor site, which then serves as a transposition intermediate (78). In the case of the IS3 family member IS911, this process has been shown to depend on the DnaG primase (22). Interestingly, each archaeal genome usually contains two types of primase: a dimeric eukaryotic-like primase (44) and a DnaG-like enzyme that shares the Toprim domain with bacterial DnaG (2).

However, recent biochemical analyses have demonstrated that the DnaG-like primase in Archaea may be involved in RNA processing and degradation rather than in DNA metabolism (25). The presence of these ISs in Archaea therefore implies that the replication step may be carried out by the host (Eukarya-like) replication system.


   TRANSPOSITION IN THE ARCHAEA: HISTORICAL PERSPECTIVE
 

Spontaneous Mutation in the Extreme Halophiles

One of the earliest descriptions of IS element activity in archaea stemmed from the observation of an unusually high spontaneous mutation rate in Halobacterium salinarium (previously called H. halobium). Depending on the phenotypic marker observed (gas vacuole or bacterio-opsin genes), this was found to range between 10⁻² and 10⁻⁴ in an aerobically grown culture which had undergone approximately 20 generations of growth (67). In the case of the gas vacuole genes, mutation was generally associated with the insertion of additional DNA at one of two specific places. Reversion of the mutation was often accompanied by loss of the inserted DNA, a characteristic of IS mutagenesis in the Bacteria. These "pregenomic" studies were facilitated by the fact that the H. salinarium genome could be physically separated into two fractions according to AT/GC content, and that the relatively AT-rich fraction carried the genes of interest, often as part of plasmids (66). Much of this and further work was done with wild strains of H. salinarium carrying various plasmids or megaplasmids such as pHH1, pHH2, pGRB1, or pNRC100 (54, 66).

The exceptional genome plasticity revealed by these studies was further reinforced by experiments establishing that strains of both H. salinarium and the related Halobacterium volcanii generally carry a large number of repeated elements. These were divided into several families by Southern hybridization. The elements appeared to be highly mobile, were associated with chromosome rearrangements, and were found both clustered and dispersed over the genome (79).

A collection of repeated sequences resembling bacterial ISs was subsequently assembled in H. salinarium with either gas vacuole or plasmid-carried purple membrane genes used as targets. Several of these have been isolated more than once and have received different names. Importantly, since the majority of these ISs were isolated as novel insertions, they therefore represent active copies.

ISH1. The 1,118-bp ISH1 was isolated as an insertion into the bacteriorhodopsin (bop) gene. Its sequence revealed imperfect terminal inverted repeats of 9 bp and flanking 8-bp direct target repeats. These features are characteristic signatures of IS elements in Bacteria. The element was named ISH1 (84). The single ORF predicts a protein of 270 amino acids (aa) with a clear DDE catalytic motif (see "IS families and the nature of the catalytic site," below), relating the Tpase to those of the majority of transposable elements presently identified. Further examination (12) placed ISH1 in the rather disperse IS5 family (see "IS families in the archaeal genomes," below). Many isolates of ISH1 appeared to have inserted into the same site (5'-AGTTATTG-3') of the bop gene but could do so in both orientations. This indicates relatively high target site specificity. Southern blot analysis revealed multiple ISH1 copies, ranging from one to more than five, in different halobacterial strains (84).

Moreover, analysis of one insertion mutant revealed a single additional ISH1-specific restriction fragment compared to its wild-type parent. This increase in copy number led to the supposition that ISH1 transposes by a replicative mechanism (84).

Evidence from Northern blots also showed that ISH1 was actively transcribed in these strains, with a rough correlation between RNA band intensity and IS copy number. However, in view of the numerous regulatory mechanisms adopted by ISs to limit their activity (53), this does not necessarily mean that the Tpase is produced at correspondingly high levels.
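Terminal inverted repeats such as those reported for ISH1 can be scored with a short Python sketch that compares the left end of an element with the reverse complement of its right end. The function names are hypothetical, the scoring is a plain position-by-position match count without gaps, and any sequences passed to it here would be made up for illustration, since the ISH1 sequence itself is not reproduced in this text.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_DNA_COMPLEMENT)[::-1]


def terminal_ir_matches(element: str, ir_len: int) -> int:
    """Count positions at which the first ir_len bases match the reverse
    complement of the last ir_len bases (ungapped comparison)."""
    if ir_len < 1 or 2 * ir_len > len(element):
        raise ValueError("ir_len must be positive and the two ends must not overlap")
    left = element[:ir_len].upper()
    right_rc = reverse_complement(element[-ir_len:])
    return sum(a == b for a, b in zip(left, right_rc))
```

A perfect 9-bp inverted repeat scores 9 of 9, while an imperfect one scores lower, which gives a simple way to express "imperfect" IRs as a match count such as 8/9.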

ISH2. Examination of additional bop mutants revealed several other repeated sequences distinguishable by size. The most frequently observed was ISH2, only 521 bp long and carrying 19-bp terminal inverted repeats flanked by target duplications of 10 or 20 bp (17) and occasionally 11 bp (64). Although three potential ORFs were detected (ORF I, 80 codons; ORF II, 64; ORF III, 59), we have been unable to identify a typical Tpase catalytic motif (see "IS families and the nature of the catalytic site," below). The majority of insertion mutations in the bop gene were caused by the elements ISH1 and ISH2. Unlike ISH1, ISH2 showed multiple insertion sites in the gene (17).

ISH2 was present in multiple copies in various H. salinarium strains, and, more recently, four additional copies were identified in the Halobacterium plasmid pNRC100 (54). The IS is clearly capable of transposition but is probably not an autonomous transposon. However, ISH2 shares nearly perfect terminal homology (but no internal homology) with an apparently complete IS, ISH26 (ISH8; see below). ISH2 transposition may therefore be driven in trans by the ISH26 Tpase.

ISH3/ISH27/ISH51. Remarkably, 20% of H. salinarium PHH4 colonies were found to carry IS insertions into a resident pHH4 plasmid (16, 63). Among these, ISH27 was isolated as a major source of mutation. This group of ISs belongs to the IS4 family. They are 1,398 bp (ISH27-1) or 1,389 bp (ISH27-2 and ISH27-3) long and generate 5-bp target repeats (63) rather than the 3-bp repeats proposed for the identical ISH3 (16). They also include terminal IRs of 16 bp. Two ISH27-1-specific transcripts were observed in the pHH4 plasmid-carrying strain. One of these exhibited a size expected for a full ISH27 transcript (~1,200 nucleotides [nt]), while the other was significantly shorter (~650 nt). This could reflect regulation at the transcriptional or posttranscriptional level.

ISH27 is the generic name for three related ISs. Although closely related, these are not isoforms by our definition. At the nucleotide level, ISH27-1 is more similar to H. volcanii ISH51-1, ISH51-2, and ISH51-3 (88% DNA identity) than to ISH27-2 and ISH27-3 (80% identity). There are more than 20 copies of ISH51 in the H. volcanii genome (36). ISH27 was also observed to have undergone an amplification following storage of the host strain over a period of several years at 4°C (63). Further studies to determine the factors involved in this process would be interesting.

ISH8/ISH26. ISH8/ISH26 was isolated as an insertion mutation of the gvp operon (gas vesicle proteins, Vac) (31). ISH8, also a member of the IS4 family, is 1,402 bp long, carries 18-bp IRs, and generates 10-bp DRs. Its DNA sequence is 94% identical to that of ISH26. Copies of ISH8 were also found in the H. salinarium plasmid pNRC100.

A 70-kbp AT-rich island of H. salinarium was identified and proven to carry copies of ISH1, ISH2, and an IS-like sequence, ISH26, together with copies of an additional 10 repeated sequences, most of which were not characterized (62).

ISH26 was also isolated as an insertional inactivation of the bop gene. There are four ISH26 copies on pHH1 and four copies on the chromosome of H. salinarium PHH1 (65). ISH26 was described as harboring two overlapping ORFs. Although the first ORF has significant similarity with the putative Tpases of other IS4 family members (for example, 26% identity to IS231W over a 143-aa overlap), the second ORF has only very limited similarity, in the region of the conserved E residue (see "IS families and the nature of the catalytic site," below). Detailed analyses suggest, however, that the introduction of several frameshifts would significantly increase this similarity. The first ORF is very closely related to the N-terminal end of the Tpase of ISH8. Like ISH27, ISH26 copies constitute a group of related, but not identical, elements (63).

ISH11. ISH11, from H. salinarium, was observed as an insertion into plasmid pGRB1. It is 1,068 bp long, with 15-bp terminal IRs, and was flanked by 7-bp direct target repeats (43). It exhibits a single long ORF of 334 aa. ISH11 has been tentatively grouped within the IS427 cluster of the IS5 family. Two copies are present in pNRC100 of Halobacterium sp. strain NRC-1.

ISH23/ISH50. ISH23/ISH50 is one of the least-frequent causes of insertion mutations in the bop gene (64). There are two ISH23 copies in H. salinarium NRC817.

ISH23 is flanked by 29-bp imperfect IRs and by a 9-bp direct target repeat. It is very similar (but not identical) to ISH50, an IS isolated as an insertion into the Halobacterium plasmid pNRC (93). ISH50 is 996 bp long, with terminal IRs of 23/29 bp and 8-bp flanking direct target repeats. It encodes a potential 273-aa Tpase and belongs to a newly defined family containing both archaeal and bacterial members (L. Gagnevin and P. Siguier, unpublished data) (see "Emerging groups, orphans, waifs, and strays," below). The first and last 200 bp of ISH23 were found to be identical to those of ISH50 and, although ISH23 and ISH50 differ by at least two restriction sites and appear to generate either 9- or 8-bp target duplications, they are assumed to be isoforms of the same IS (65).

ISH24. Another infrequent insertion into the bop gene, ISH24, is 3,000 bp long, including two terminal IRs of 14 bp, and is flanked by 7-bp direct target repeats. The sequence of this element became available subsequent to the sequencing of the megaplasmid pNRC100 of H. salinarium. It was renamed ISH7 (54). ISH7 encodes two large ORFs. The second displays some weak and local similarities with the C-terminal parts of IS4 element Tpases. No clear DDE motif in ISH24 could be detected from this partial alignment.

ISH25. The short 588-nt sequence of ISH25 is sometimes associated with ISH27 insertion, but it appears unlikely to be a simple IS, as no putative ORF can be found.

ISH28. ISH28 was also isolated from a bop mutant (62). Its nucleotide sequence was revised (91). It is 938 bp long, with 16-bp terminal IRs, and carries an ORF of 828 bp. It is flanked by 8-bp direct target repeats. The putative Tpase protein is 49% similar to that of ISH1, a member of the IS5 family.

ISH28 has also been engineered to generate composite transposons, which are efficient tools for mutagenesis of Haloarcula hispanica and other halophilic organisms (92). This element showed little target sequence specificity but was biased toward target regions with a lower G+C content. Of 20 insertions characterized, 18 generated DRs of 8 bp, while the remaining 2 had DRs of 9 bp.

Collectively, these results clearly demonstrate the major role played by transposable elements in shaping the halophilic genome.

Transposition in Sulfolobus

Although most of the earliest exploratory studies in archaeal transposition were carried out with halobacteria due to the high level of transposon-mediated genome rearrangements in this model system, other archaea have received some attention. The 2.99-Mb Sulfolobus solfataricus genome is estimated to contain nearly 350 intact mobile elements (82). An early report (1) described the serendipitous isolation of an S. solfataricus IS, ISC1041 (named according to its length), which was related to the bacterial IS30 family of elements.

Like halobacterial species, S. solfataricus also exhibits a relatively high spontaneous mutation rate (52). These studies used 5-fluoroorotate resistance as a screen for uracil auxotrophs (pyrE and pyrF). Mutations were obtained at frequencies of between 10⁻⁴ and 10⁻⁵, significantly lower than in the halobacteria but at least 10-fold higher than for other members of the Sulfolobus genus. PCR analysis of several auxotrophic mutants revealed that all carried insertions ranging from 1 to 1.4 kbp. Similar auxotrophs of the related Sulfolobus acidocaldarius failed to show such insertions. Seven S. solfataricus mutants were analyzed in more detail and proved to carry insertions. These were named according to their individual lengths, in base pairs: ISC1058 (three examples), ISC1359 (two examples), and ISC1439 (one example). One example, of 1,147 bp, was closely related to, and presumably a deletion derivative of, ISC1217, a 1,217-bp element (13-bp IRs, 6-bp DRs) previously isolated as an insertion into a β-galactosidase gene (80). All four ISs show similarities to members of the IS4 or IS5 family: their putative Tpases include both the D · N · G/A-Y/F and Y · R · E · K motifs characteristic of these DDE families (see "IS families and the nature of the catalytic site," below).

Additional active ISs have since been isolated (6), also with 5-fluoroorotate resistance used as a screen. Several different, newly isolated Sulfolobus strains from Siberia and the western United States were analyzed. As judged by the 99% nucleotide identities in the pyrB, pyrF, or pyrE gene, these appeared to be conspecific strains. Seven distinct ISs were isolated following PCR amplification across the mutated gene. Again, these were named for their lengths, in nucleotides.

In order of size they include ISC735, a member of the IS6 family with a single ORF, 18-bp IRs, and 8-bp DRs; ISC796, a member of the IS1 family with only a single reading frame, 21-bp IRs, and 8-bp DRs; ISC1057 and ISC1058b, related to ISC1058 and members of the IS5 family, with 88 to 93% shared nucleic acid identities, 20-bp IRs interrupted ("hyphenated") by a hexanucleotide, and 8-bp DRs; ISC1205, related to ISC1217, with 17- to 20-bp IRs and 4- to 7-bp DRs; ISC1290, a member of the IS5 family, with 34-bp IRs and 5-bp DRs; and ISC1926, a member of the IS200/IS605 group, with the corresponding two characteristic ORFs. ISC1926 is an isoform of ISC1913 in the sequenced genome of S. solfataricus. In addition to these entire ISs, the authors also detected an insertion of a short 128-bp fragment with terminal inverted repeats similar to those of ISC1058. This sequence corresponds to a typical MITE (see "MITEs, MICs, and solo IRs," below).

Transposition in Other Archaea

IS6-mediated gene rearrangements have been observed in the pyrococci (45, 95). These involve deletions (24), chromosome rearrangements (21, 45), and insertional inactivation (e.g., by insertion of ISpfu3 into napA in P. woesei [39]).

ISM1 was identified in a cloning study of the Methanobrevibacter smithii purE and proC genes (32). This has a typical IS structure, is distantly related to the ISL3 family, and is present in about 10 copies in M. smithii.

No data concerning transposition or the effects of transposable elements are available for other archaeal phyla, including important groups carrying numerous ISs such as the Methanosarcinales and Thermoplasmatales.


   REGULATION OF TRANSPOSITION
Although regulation has not been addressed experimentally in any detail in the Archaea, in principle, many of the systems which regulate transposition activity in the Bacteria (53) might be expected to operate in the Archaea. These would include control at the level of gene expression (transcription initiation, translation initiation and elongation, translational or transcriptional frameshifting, and mRNA stability) and activity (Tpase stability, intervention of host proteins). Some studies have suggested that certain archaeal elements may be regulated by small, noncoding RNAs (ncRNAs) or by translational readthrough.

Lost in Transcription: ncRNAs in S. solfataricus

Interestingly, in a recent study designed to identify small ncRNAs (87), 8 of the 57 ncRNAs identified proved to be complementary to mRNAs encoding various Tpases. The ISs concerned include ISC1173, ISC1217, ISC1225, ISC1234, ISC1359, and ISC1439 (Fig. 2). In the case of the most abundant S. solfataricus IS, ISC1234, one ncRNA would overlap the Tpase initiation codon. This is reminiscent of a regulatory mechanism observed for the bacterial IS10 (81, 83), where a short RNA, RNAout, is transcribed from a promoter, Pout, located close to the left end of the element and is complementary to the Tpase mRNA. The complementarity between the mRNA and RNAout regulates Tpase expression by sequestering translation initiation signals. Two other ncRNAs were found to be complementary to sequences internal to the Tpase gene. The function of these internal ncRNAs is not yet clear. They could mask internal expression signals, interfere with the expression of full-length Tpase, or influence mRNA stability. Similar ncRNAs, complementary to the mRNA translation initiation signals, were also identified for ISC1439 and ISC1173, while internally complementary ncRNAs were identified for ISC1225 and ISC1217.


FIG. 2. Noncoding RNA. The ISs are drawn to scale. The black arrows represent the length of the Tpase ORF. The open boxes represent the noncoding regions of the ISs. The noncoding RNA names from reference 72 are shown, together with their beginnings and ends (in bases from the first base of the Tpase coding sequence). The directions of transcription are shown.

 
In the case of ISC1217, the ncRNA complementary to an internal Tpase sequence proved to be a mixed population of molecules of identical size but carrying small nucleotide substitutions. Interestingly, ISC1217 exists in several isoforms, some of which include nucleotide changes in this region. The ncRNA population was composed of examples carrying each of the isoform sequences. Finally, an ncRNA complementary to the upstream, nontranslated region of ISC1359 was identified.

Further studies are essential to determine the exact role of these ncRNAs in regulation of Tpase expression. As pointed out by Tang et al. (87), regulation at the posttranscriptional level would be an efficient strategy for S. solfataricus, since mRNAs in this organism have unusually long half-lives (4).

Lost in Translation: Translational Readthrough in Methanosarcina?

Methylamine methyltransferases are important in the production of methane by archaeal methanogens. Paul et al. (60) identified an in-frame amber codon (TAG) in the trimethylamine methyltransferase genes of both M. barkeri and M. thermophila. However, at least in the case of M. barkeri, abundant quantities of the full-length protein could be obtained and it appeared that the TAG codon was read as Lys. Moreover, all three copies of a dimethylamine methyltransferase gene were also shown to carry in-frame TAG codons. In addition, analysis of the M. mazei genome has identified seven methyltransferase genes of this type and a relatively large number of in-phase TAG termination codons within other genes. The additional genes include 18 that encode Tpases. M. barkeri encodes 58 tRNA genes, an unusually high number. This complement includes a putative amber suppressor tRNA (20). It is therefore possible that amber suppression leads to translational readthrough that regulates transposition activity in these cases.
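The kind of scan that reveals in-frame TAG codons can be sketched in a few lines of Python. This is only an illustration, not the method used in the studies cited: the function name and the toy sequences are hypothetical, and the sketch assumes a coding sequence supplied in frame from its first base, with the final codon treated as the normal terminator.

```python
def in_frame_amber_codons(cds):
    """Return 0-based codon indices of internal, in-frame TAG codons.

    Assumes `cds` starts at the first base of the reading frame.
    The final codon is skipped because it is the expected terminator.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon == "TAG"]


# Toy sequence (hypothetical): ATG GCA TAG GGT TAG TAA
print(in_frame_amber_codons("ATGGCATAGGGTTAGTAA"))
```

A TAG that occurs out of frame (for example in ATG CTA GGC TAA) is not reported, which is the distinction that matters when asking whether suppression would be needed to produce the full-length protein.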


   IS FAMILIES AND THE NATURE OF THE CATALYTIC SITE
As stated above, Tpases can be classified according to the nature of their catalytic site. This defines the chemistry used in the transposition reactions. At present, five types have been identified. These are the DDE Tpases, the major recognized group; Y and S Tpases, related to the tyrosine (Y) and serine (S) site-specific recombinases; and Y2 enzymes, which share many characteristics of the rolling circle replicases (for reviews, see references 13 and 15). A fifth type, resembling the DNA relaxases associated with bacterial conjugation, has been identified more recently (77, 88). Only members of the DDE, serine, and relaxase classes of transposon have as yet been identified in Archaea.

The DDE Enzymes

Arguably the major transposon class encodes Tpases called DDE Tpases. The amino acids Asp (D)-Asp (D)-Glu (E) coordinate divalent metal ions necessary for DNA cleavage and joining involved in transposon movement. Additional conserved residues can also be observed. In particular, a basic K or R is often present at a distance of seven residues on the C-terminal side of the characteristic E. This places it on the same side of an α-helix as the conserved E but two turns farther toward the C terminus.
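The spacing argument can be checked with a little arithmetic. The sketch below assumes an ideal α-helix with 3.6 residues per turn (100° of rotation per residue); the function names and the toy peptide are hypothetical and serve only to illustrate the E-to-K/R relationship described above.

```python
def basic_residue_after_glu(protein, e_index, offset=7):
    """True if a K or R lies `offset` residues C-terminal to the E at `e_index`."""
    protein = protein.upper()
    if protein[e_index] != "E":
        raise ValueError("e_index does not point at a glutamate")
    target = e_index + offset
    return target < len(protein) and protein[target] in "KR"


def helical_offset_degrees(n_residues, degrees_per_residue=100.0):
    """Angular position around the helix axis after n residues (ideal helix)."""
    return (n_residues * degrees_per_residue) % 360.0


# Toy peptide (hypothetical): E at index 2, K at index 9
print(basic_residue_after_glu("AAEAAAAAAKAA", 2))
print(helical_offset_degrees(7))
```

Seven residues correspond to 700° of rotation, that is, just under two full turns, leaving the basic residue within about 20° of the face occupied by the conserved E.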

Several groups of additional conserved amino acids, designated N1, N2, N3, and C1, encompass the D (N2), D (N3), and E (C1) regions in the IS4 family (74). These have been expanded to the motifs DDT, DREAD, and YREK, respectively (73).

DDE enzymes ensure cleavage of the terminal phosphodiester bonds at the 3' end of the transposon strand, which will be finally transferred into the target DNA site (transferred strand). Transposons and ISs using such enzymes generally carry imperfect IRs at their ends, including one or several Tpase binding sites. The ends of ISs (terminal IRs) that have adopted this transposition chemistry are generally the simplest. They can often be divided into two domains: a Tpase binding domain (an internal sequence of 10 to 15 bp) and a catalytic domain (the terminal 2 to 4 bp, required for cleavage and strand transfer). DDE enzymes generally generate a characteristic direct duplication of target DNA flanking the insertion. This type of IR structure is conserved in the archaeal ISs but is generally more complicated in the Eukarya.
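The two sequence signatures described here, terminal IRs and flanking DRs, lend themselves to a simple computational check. The following Python sketch is a deliberately simplified illustration: it scores only perfect repeats (real IRs are usually imperfect), and all function names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_ir_length(element):
    """Length of the perfect terminal inverted repeat of `element`.

    The left end is compared, base by base, with the reverse complement
    of the right end; counting stops at the first mismatch.
    """
    left = element.upper()
    right = reverse_complement(element)
    n = 0
    for a, b in zip(left, right):
        if a != b or n >= len(left) // 2:
            break
        n += 1
    return n


def direct_repeat_length(left_flank, right_flank, max_len=15):
    """Longest k such that the last k bases of the left flank equal
    the first k bases of the right flank (a candidate target duplication)."""
    left_flank, right_flank = left_flank.upper(), right_flank.upper()
    for k in range(min(max_len, len(left_flank), len(right_flank)), 0, -1):
        if left_flank[-k:] == right_flank[:k]:
            return k
    return 0


# Toy element (hypothetical): 6-bp IRs flanking a 6-bp core
print(terminal_ir_length("GGAAGT" + "CCCAAA" + "ACTTCC"))
# Toy flanks (hypothetical): 6-bp duplication GGATCA on either side
print(direct_repeat_length("TTACGGATCA", "GGATCATTTG"))
```

Tolerating mismatches, as would be needed for the imperfect IRs typical of DDE elements, would require an alignment-based comparison rather than this exact-match loop.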

The Serine Enzymes

The serine enzymes are related to the site-specific recombinases involved in the resolution of cointegrate molecules, the final step in transposition of the Tn3 class of bacterial transposons. The Tpase of these elements generates a cointegrate or replicon fusion in which fused donor and target replicons are separated by directly repeated copies of the transposon at each junction. The serine recombinase intervenes by catalyzing site-specific recombination between the two transposon copies, separating the replicons and thereby completing transposition. Serine recombinases are so named because they use a serine residue as the nucleophile in DNA strand cleavage and generate covalent enzyme-DNA intermediates. In the single case analyzed, IS607 from Helicobacter pylori, the S Tpase generates a circular transposon intermediate (N. Grindley, personal communication), which presumably then undergoes integration into a target molecule.

The Relaxase Enzymes

The relaxase enzymes represent a newly recognized class of generally small (~150-aa) Tpases. They use a single tyrosine (Y) residue as a nucleophile in DNA cleavage and generate covalent Y-DNA substrate intermediates. The structures of two enzymes, the bacterial IS608 and an isoform of ISSto1 (from S. tokodaii [ISfinder]), have been solved (46, 77). They exhibit a structural topology close to that of the Rep and Relaxase proteins. Transposons using this type of Tpase do not carry terminal IRs and do not generate the small flanking direct target repeats generally produced by transposons with DDE Tpases. Instead, these Tpases bind to extensive subterminal secondary structural motifs and cleave at a fixed but distant position (88). They also use a defined tetra- or pentanucleotide as a target sequence and require this sequence for further transposition.


   IS FAMILIES IN THE ARCHAEAL GENOMES
We have analyzed both the fully sequenced archaeal genomes and all partial sequences deposited in the public databases as of June 2006, with a few subsequent additions. The results are summarized in Fig. 1 and in Tables 1, 2 and 3. The archaeal genomes analyzed are listed in Table 1 together with the IS content. Table 2 lists the individual ISs in family groups and indicates their copy number, the presence of complete and partial copies, and the presence of MITEs. Table 3 lists the different types of MITE observed. Below, we present a more detailed description of the distribution and characteristics of each family. Where appropriate, we have included a tree for each IS family which relates the archaeal and eubacterial members. We have color-coded the origins of the ISs from Bacteria, Sulfolobales, Thermoplasmatales, halophiles, methanogens, and others throughout the figures and in Table 1. We have also included, where appropriate, diagrams of the organization of ISs of given families. We have not included this type of diagram for families whose members are simple and for which both bacterial and archaeal members are very similar (e.g., IS481) or for families whose members are extremely heterogeneous (e.g., IS4 and IS5 families) and which are at present undergoing extensive reanalysis. In addition, due to the limited number of members of some families in the Archaea, we have not included an individual figure for these families.


TABLE 2. ISs identified in archaeal genomes

 

TABLE 3. Archaeal MITEs

 
IS1

The IS1 family (Fig. 3) was thought to be restricted to the Enterobacteriaceae, but examples were subsequently found in several cyanobacteria. Bacterial IS1 family members are short (700 to 800 bp), bordered by highly conserved 15- to 24-bp IRs, and they generate 8- or 9-bp DRs on insertion. They generally carry two reading frames, insA and insB' (Fig. 3A, top), although bacterial derivatives that carry only a single long frame (ISAba3 from Acinetobacter baumannii and possibly ISPa14 from Pseudomonas aeruginosa) have now been identified. However, these have yet to be demonstrated as active. The Tpase termination codon is often located within the distal IR. Expression of the IS1 family Tpases generally occurs by a programmed –1 translational frameshift between the two consecutive ORFs. This fuses the product of the upstream frame (insA) with that of the downstream frame (insB') to generate the Tpase as a fusion protein, InsAB', which includes a catalytic DDE motif. InsAB' also exhibits a zinc finger and a helix-turn-helix motif known to be important for Tpase binding (56, 89). InsA acts as a repressor, which binds to the IRs and regulates IS1 expression from the promoter partly included in the left end (IRL).


FIG. 3. IS1 members. Shown is the phylogeny of the IS1 family and comparison of a representative set of terminal IRs. The top panel shows the general organization of members of this family. Red boxes indicate the terminal IRs. Yellow (or white) boxes within the larger IS box indicate ORFs (see the text). (A) Organization of the "classical" bacterial IS1. pIRL indicates the promoter, which drives Tpase synthesis. This class includes those from the archaeal methanogens. (B) The longer of the two Sulfolobus groups carries more-extensive IRs and N- and C-terminal extensions (white boxes) to the Tpase compared to the classical IS1 and the shorter Sulfolobus class. (C) Shorter Sulfolobus class. IRs are approximately the length of those found in the classical IS1 organization. The various Archaea have been color coded as follows for clarity: Sulfolobales, red; methanogens, blue. Bacteria are indicated in black.

 
Four IS1 members have been identified in the genomes of different Sulfolobus species. ISC1173a (S. solfataricus) and ISSto7 (S. tokodaii) (Fig. 3B, top) are closely related, as are ISC796 (Sulfolobus sp.) and ISSto9 (S. tokodaii) (Fig. 3C, top). Under our operational nomenclature, neither ISC1173a and ISSto7 nor ISSto9 and ISC796 are isoforms. Nevertheless, the two pairs are phylogenetically closely related (91% and 84% amino acid identity, respectively). S. tokodaii carries both full-length copies and solo IRs of ISSto7, together with two complete small ISSto7-derived MITE-like elements (see "MITEs, MICs, and solo IRs," below) with sizes of 315 and 317 bp. ISC796 is present as a single copy in Sulfolobus sp. and as several fragmented copies in S. solfataricus. There are both complete and partial copies of ISSto9 in S. tokodaii, as well as solo IRs.

All four Sulfolobus elements carry only a single long reading frame (although one ISSto9 copy appears to be degenerate, with an 8-bp deletion generating two ORFs). Although there is no ORF equivalent to insA, an upstream equivalent to InsA may be produced in these single ORF elements. This could occur, for example, by proteolysis of the larger Tpase or by frameshifting to create the smaller protein, as in Escherichia coli for dnaX (5).

ISC1173a and ISSto7 are significantly longer (1,173 and 1,174 bp) than other family members, with IRs of approximately 50 bp, over twice the length of other members of the family. Moreover, the Tpase is larger than that of ISC796, ISSto9, and other members of the family (~340 aa compared to ~240 aa) due to an 80-aa N-terminal extension and a 40-aa C-terminal extension (Fig. 3B, top). Both ISC796 and ISSto9 are 796 bp long, with IRs of 21 bp (Fig. 3C, top). DNA alignments show that the long and short ISs and the MITEs are clearly derived from a common ancestor, but their exact relationship is at present unclear.

Four additional IS1 family members, organized as a canonical eubacterial IS1 (Fig. 3A, top), are present in the Methanosarcinales: ISMac16 (Methanosarcina acetivorans), ISMma7 (M. mazei, M. barkeri, and Methanococcoides burtonii), ISMba2 (M. barkeri), and ISMbu3 (Methanococcoides burtonii). ISMac16, ISMma7, and ISMba2 are 740 bp long, with 24-bp IRs and 8- or 9-bp DRs. ISMbu3 (741 bp; 8-bp DRs) has IRs of only 15 bp. In contrast to the Sulfolobus IS1 members, these all carry the expected two ORFs. They are closely related elements, with 84 to 89% identity with respect to ISMac16. Inspection of their nucleic acid sequence reveals an appropriately placed stretch of eight A residues and raises the possibility that the Tpase is produced by transcriptional rather than translational frameshifting (3; O. Fayet, personal communication).
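Runs of A residues of this kind are easy to locate by a simple pattern search. The Python sketch below is only illustrative: the function name, the default threshold of eight, and the toy sequence are assumptions, and finding a run says nothing by itself about whether frameshifting actually occurs there.

```python
import re


def find_homopolymer_runs(seq, base="A", min_len=8):
    """Return (start, length) for each run of `base` at least `min_len` long.

    Positions are 0-based and refer to the strand supplied.
    """
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}")
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


# Toy sequence (hypothetical): an A8 run starting at position 3
print(find_homopolymer_runs("CCG" + "A" * 8 + "GT"))
```

In practice the position of such a run would then be compared with the overlap between the two reading frames to judge whether it is "appropriately placed."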

The Tpases of these elements are related to that of ISMae3 of the cyanobacterium Microcystis aeruginosa (Fig. 3; 89) and less closely to diverse IS1 elements of the γ-Proteobacteria, including IS1X and IS1S from E. coli and ISVvu1 from Vibrio vulnificus. The DDE catalytic motif and surrounding amino acid residues are also typical of this family. Finally, the terminal 23 to 30 bp are very similar to the IRs of the γ-proteobacterial and cyanobacterial IS1 elements and terminate with a highly conserved 5'-GGNNNTG (CANNNCC-3'). Where identified, the site of insertion is A+T rich.
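The conserved termini can be expressed as a pair of patterns, with N standing for any base. The following Python sketch tests whether a sequence, read 5' to 3' on one strand, begins with GGNNNTG and ends with CANNNCC; the function name and test sequences are hypothetical, and the check is far cruder than an alignment of the full 23- to 30-bp ends.

```python
import re

# N is expanded to any of the four bases; both patterns are read on the same strand.
_LEFT_END = re.compile(r"GG[ACGT]{3}TG")
_RIGHT_END = re.compile(r"CA[ACGT]{3}CC$")


def has_is1_like_termini(element):
    """True if `element` starts with 5'-GGNNNTG and ends with CANNNCC-3'."""
    seq = element.upper()
    return bool(_LEFT_END.match(seq)) and bool(_RIGHT_END.search(seq))


# Toy element (hypothetical): matching ends around an arbitrary core
print(has_is1_like_termini("GGTAATG" + "ACGTACGT" + "CATTACC"))
```

Because CANNNCC is the reverse complement of GGNNNTG, an element passing this test carries the same terminal heptanucleotide at both ends in inverted orientation.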

IS3

The large IS3 family is widely distributed among Bacteria and forms an extremely coherent and highly related family characterized by lengths of between 1,200 and 1,550 bp; related terminal IRs of 20 to 40 bp terminating with 5'-TG... CA-3'; DRs of between 3 and 5 bp; two consecutive, partially overlapping reading frames, orfA and orfB, from which two proteins are expressed; and a strongly conserved DDE motif closely related to that of retroviral integrases. The product of the upstream frame, OrfA, acts as a regulatory protein, while the Tpase, OrfAB, is generated by programmed translational frameshifting as in IS1 (for a review, see reference 78).

A single, distantly related degenerate element has been identified in Thermoplasma volcanium (TVN0865/67 and TVN0691/92). Blast searches revealed a relationship with diverse bacterial IS3 elements such as ISAca1 of Acinetobacter calcoaceticus, ISSod2 of Shewanella oneidensis, and ISPg5 of Porphyromonas gingivalis. Multiple alignments of these reading frames suggested that TVN0865 and TVN0691 are truncated copies of the OrfA frame and that TVN0867 and TVG0898533 represent truncated versions of the OrfB frame lacking the first catalytic aspartic acid (D). The spacing between the second catalytic aspartic acid (D) and glutamic acid (E) is conserved (35 aa), and an arginine (R) is present 7 aa after the glutamic acid (E). No IRs or DRs could be found for these two archaeal elements. T. volcanium therefore apparently carries only partial copies of IS3 elements.

IS4

The IS4 superfamily (Fig. 4) (see Addendum in Proof) forms a vast, widespread, and extremely heterogeneous group of ISs in numerous prokaryote lineages. Previously it had been divided into five groups: IS231, IS4Sa, IS10, IS50, and IS1549 (12). However, as a result of an increasing number of ISs, much of this grouping is no longer appropriate and a reassessment is at present being undertaken. At present, Tribe analysis generates seven clusters. Three of these can be included in an IS4 superfamily. The four remaining clusters appear to define new emerging families (D. de Palmenaer and J. Mahillon, personal communication). Archaeal ISs are found in three distinct clusters. The ISH8 subgroup, included in the IS4 superfamily, is limited to the Archaea. The second group belongs to the emerging IS1634 family, while the third group, ISH3, which is also limited to the Archaea, forms a separate cluster.


FIG. 4. IS4 members. Shown is the phylogeny of the different subgroups of the IS4 family and comparison of a representative set of terminal IRs. The various Archaea have been color coded as follows for clarity: Sulfolobales, red; Thermoplasmatales, magenta; halophiles, green; methanogens, blue. Bacteria are indicated in black.

 
ISH8 subgroup. The ISH8 subgroup includes ISH26-1 and ISH26 from H. salinarium; ISH5, ISH8, and ISH8A to ISH8E from Halobacterium sp. and plasmids pNRC100 and pNRC200; ISHma1 from H. marismortui chromosomes I and II and plasmids pNG400 and pNG500; ISMba1 from M. barkeri; ISMba6 from M. barkeri and M. acetivorans; and ISMhu6 and ISMhu9 from M. hungatei. In addition, solo IRs of ISMba6 are found in M. acetivorans, M. barkeri, M. mazei, and M. thermophila. The ISH8 subgroup includes a 5'-CAT-3' triad at the ends of the IR.

ISH26 was described as harboring two overlapping ORFs. Although the first has significant similarity to the putative Tpases of other IS4 family members (26% identity with IS231W over a 143-aa overlap), the second has only very limited similarity (in the region of the conserved E residue). Detailed analyses indicate, however, that several frameshifts could significantly increase this similarity. The first ORF is very closely related to the N-terminal end of the Tpase of ISH8. A reevaluation of the ISH26 DNA sequence is needed to clarify this issue.

It is interesting to note that all five copies of ISH5 are interrupted by ISH11 at an identical position. This suggests that the entire interrupted IS is capable of autonomous transposition.

IS1634 subgroup. The IS1634 subgroup includes both bacterial and archaeal members. All archaeal members except ISFac6, from the incompletely sequenced F. acidarmanus, and ISTvo4, from T. volcanium, are restricted to methanogens. These include ISMac5, ISMac6, ISMac10, ISMac12, and ISMac23 from M. acetivorans; ISMba11, ISMba12, and ISMba13 from M. barkeri; ISMma3, ISMma4, and ISMma20 from M. mazei; ISMma18 from M. mazei, M. acetivorans, and M. barkeri; ISMhu4, ISMhu5, ISMhu7, and ISMhu8 from M. hungatei; and ISMth2 from M. thermophila. ISMba11 and ISMba12 also give rise to MITE derivatives (Table 3). An additional IS, ISArch8, has been identified in an uncultured environmental archaeon.

The IRs appear to be similar and begin with 5'-CA or 5'-CC. Short DRs, generally of 5 or 6 bp, are also present, but no similarities between them can be distinguished. The presence of these elements, largely restricted to Methanosarcinales, could indicate horizontal acquisition from bacterial species by a common Methanosarcinales ancestor.

ISH3 subgroup. The Archaea-specific subgroup ISH3 forms a separate cluster in Tribe analysis and can be further subdivided into two phylogenetic subgroups with BLAST. It includes ISH27 (an isoform of ISH40) from H. salinarium; ISH51 from Haloferax volcanii; ISH20 from Haloarcula marismortui; ISH3 from the Halobacterium sp. chromosome, pNRC100, and pNRC200; ISFac1 in the unfinished genome of Ferroplasma acidarmanus; ISC1200, ISC1225, ISC1359, and ISC1439A and ISC1439B (76% identity with ISC1439A) from S. solfataricus; ISSto8 and ISSto14 from S. tokodaii; ISMma1 from M. mazei; ISMba14 from M. barkeri and M. burtonii; and ISMbu7 and ISMbu8 from M. burtonii. ISMba14 was reconstructed in silico because it is interrupted by ISMba11. The ISH3 subgroup shares a conserved terminal 5'-CAG-3' trinucleotide.

IS701 subgroup. At present the IS701 cluster, which has emerged as a group separate from the IS4 family, contains a single example from the Archaea, ISMba8 (M. barkeri).

IS5

The IS5 superfamily (Fig. 5) is also a relatively heterogeneous group which had been divided into six or seven subgroups (12). It includes sequences from a large variety of Archaea. As is the case for the IS4 family, the IS5 family grouping is no longer appropriate and a reassessment is at present being undertaken. Archaeal IS5 elements are present in four of the bacterial groups (IS903, IS5, IS1031, and IS427). There are also two Archaea-specific groups (ISH1 and a Sulfolobus-specific group) and five IS5-related ISs that do not fall into any of these groups.


FIG. 5. IS5 members. Shown is the phylogeny of the different subgroups of the IS5 family and comparison of a representative set of terminal IRs. The various Archaea have been color coded as follows for clarity: Sulfolobales, red; Thermoplasmatales, magenta; halophiles, green; methanogens, blue. Bacteria are indicated in black.

 
IS903 subgroup. The IS903 subgroup includes two archaeal elements (Fig. 5): ISC1058 from S. solfataricus and ISFac2 in the unfinished genome of F. acidarmanus. Two short and partial copies of an IS903-related element are also found in the genome of T. volcanium (TVN0139, TVN0587). These are closely related to ISs from the γ-Proteobacteria (IS903D and IS102 of E. coli, ISAs4 from Aeromonas sp., and ISVa1 from Vibrio species). The IRs of this subgroup are very homogeneous despite the fact that the very terminal "catalytic" base pairs are different from the 5'-GGC-3' consensus of the bacterial elements. They all carry a motif, TGTTG, common to the bacterial ISs between nt 6 and 10. All exhibit DRs with a length of 9 bp, as expected for this group, but no similarities between them are evident. Related partial copies are present in H. marismortui chromosome II (rrnB0094), M. mazei (MM1429), and M. barkeri (Mbar_A1398/99, Mbar_A2202).

IS5 subgroup. The IS5 subgroup (Fig. 5) includes ISMbu1 (M. burtonii), ISMac22 (M. acetivorans), and ISArch6 (from an uncultured archaeon). Three complete copies of ISMbu1 carry an in-phase insertion of 52 bp, which introduces a termination codon. Four complete copies also carry an additional tandem left end of 97 bp. A possible MITE derivative of ISMac22 was also identified. A fragment of an IS related to IS1194 can also be found in T. volcanium (TVN1409, TVN1410) and another in T. acidophilum (ID: Ta0379). ISMbu1 is related to IS1246 (Pseudomonas species) and ISSsp126 (Sphingomonas sp.). The IRs of this subgroup are heterogeneous. ISMbu1 has long DRs (14 bp), with no similarities to bacterial DRs.

IS1031 subgroup. Only a single example of this group, ISMac15 (M. acetivorans), has been identified.

IS427 subgroup. Four archaeal ISs have been identified in this subgroup: ISMac11, ISMma12 (M. mazei), ISMba5, and ISMba19 (M. barkeri). ISMac11- and ISMba5-related MITEs have also been identified.

The halophilic subgroup ISH1. The halophilic subgroup ISH1 includes ISH1 and two isoelements, ISH9 and ISH28, together with ISH19, ISHma8, ISHma9, ISHma10, ISHma11, and ISNph4. Where present, DRs are between 7 and 10 bp. A single ISH9 MITE derivative was also identified.

The Sulfolobus subgroup. Several elements in the genome of S. solfataricus (ISC1212, ISC1234, and ISC1290) are annotated as IS5 family members (8). These, together with ISSto3 from S. tokodaii, show only very weak similarities to other IS5 elements and also vary significantly among themselves. Moreover, the spacing of the DDE catalytic motifs does not align with that of other IS5 family members. MITE derivatives of ISSto3 have been identified.

IS5 orphans. Several elements that display only weak similarities with the other IS5 elements are also present in both archaeal methanogens and halophiles. We have identified ISMba15 (M. barkeri), ISMhu10 (M. hungatei), and ISMbu10 (M. burtonii). ISMbu10-related MITEs and numerous solo IRs were also identified. Solo IRs are also found in M. acetivorans, M. mazei, and M. barkeri. Two related ISs are also present in the halophiles: ISH11 (Halobacterium sp. plasmids pNRC100 and pNRC200) and ISHma6 (H. marismortui pNG500 and N. pharaonis chromosome II and pL131).

IS6

All bacterial members of the IS6 family (Fig. 6) carry short, related (15- to 20-bp) terminal IRs and generally create 8-bp DRs. No marked target selectivity has been observed. The putative Tpases are very closely related, with identity levels ranging from 40 to 94%. A single ORF is transcribed from a promoter at the left end and stretches across almost the entire IS. There is a strongly conserved DDE motif. Transposition of these elements is presumably accompanied by replication, since IS6 family members appear to give rise exclusively to replicon fusions (cointegrates) in which the donor and target replicons are separated by two directly repeated IS copies. Following cointegration, a resolution step would be required to separate donor and target replicons, transferring a copy of the transposon to the target replicon. In contrast to members of the Tn3 family, which encode a specific enzyme, a site-specific recombinase, recombination between the directly repeated ISs necessary for this separation occurs by homologous recombination and requires a recombination-proficient host (12). IS6 family elements are abundant in Archaea and cover almost all of the traditionally recognized archaeal lineages (methanogens, halophiles, thermoacidophiles, and hyperthermophiles) (Fig. 1 and 6; Table 1). Fourteen IS6 members could be identified. Phylogenetically, these can be divided into three groups present in the halophiles, the Sulfolobales, and the Pyrococcales/Methanosarcinales.


FIG. 6. IS6 members. Shown is the phylogeny of the IS6 family and comparison of a representative set of terminal IRs. The various Archaea have been color coded as follows for clarity: Sulfolobales, red; halophiles, green; "other," orange. Bacteria are indicated in black.

 
Three closely related elements were found in the halophiles: ISH14, ISH15, and ISH29. ISH14 is 75% identical to ISH15 and is present as a single copy in H. marismortui. ISH29 is present as a single copy in Halobacterium sp. plasmid pNRC200. In addition, an ISH29-related structure composed of 15 bp and 35 bp of one end flanking a 15-kb DNA segment in direct repeat is present in two identical copies in pNRC100 and in pNRC200. These are in an inverted orientation on both plasmids. ISH15 is found in the plasmid pNG500 of H. marismortui and in Halobacterium sp. An additional sequence less related to these, ISH17, was found in H. marismortui plasmids pNG500 and pNG700 and chromosome II. One partial copy is also present in Halobacterium sp. and in the plasmid pNRC200. A single copy of another member, ISNph1, was found in Natronomonas pharaonis.

Five different members were identified in the Sulfolobales: ISC735, ISC774, ISSto2, ISSte1, and ISSis1. ISC735 is indicated as a single copy in Sulfolobus sp. (AY671942). There are also three degenerate copies (with rearrangements and deletions within the IS) in S. solfataricus. S. solfataricus also carries full and partial (mostly solo IRs) copies of ISC774, while S. acidocaldarius carries only two IRs. ISSto2 is present in four complete copies, three of which carry different mutations in one IR, and in at least 13 partial copies. ISSte1 is present in a single copy in Sulfolobus tengchongensis plasmid pTC. Finally, ISSis1 is present in a single copy in Sulfolobus islandicus plasmid pARN4.

Methanocaldococcus jannaschii carries ISMja1 (ISE703) in two complete and one partial copy in the genome and three partial copies in the large extrachromosomal element. In addition, eight small elements of 358 to 360 bp resembling MITEs were identified (see "MITEs, MICs, and solo IRs," below).

Only a single partial copy of an IS6 family member could be identified in the Methanosarcina genus (M. barkeri Mbar_A0568).

The hyperthermophilic P. furiosus carries another three closely related elements, ISPfu1, ISPfu2, and ISPfu5, while P. abyssi carries a partial iso-ISPfu1 copy. Isoforms of these ISs are present in P. woesei and in a wide range of Pyrococcus strains.

Finally, two partial copies of an IS6-like element are present in the genome of Archaeoglobus fulgidus (AF0138, AF0895).

These archaeal elements form a monophyletic group related to bacterial ISs from Firmicutes: IS240 (Bacillus sp.), IS431 (Staphylococcus aureus), IS1297 (Leuconostoc mesenteroides), ISS1W (Lactococcus lactis), and ISEnfa1 (Enterococcus faecalis). Most carry DRs of