Microbiology and Molecular Biology Reviews, June 2006, p. 472-509, Vol. 70, No. 2
1092-2172/06/$08.00+0     doi:10.1128/MMBR.00046-05
Copyright © 2006, American Society for Microbiology. All Rights Reserved.

Cyanobacterial Two-Component Proteins: Structure, Diversity, Distribution, and Evolution†

Mark K. Ashby1,‡ and Jean Houmard2*

Department of Basic Medical Sciences, Biochemistry Section, University of the West Indies, Mona Campus, Kingston 7, Jamaica,1 Ecole Normale Supérieure, CNRS UMR 8541, Génétique Moléculaire, 46 rue d'Ulm, 75230 Paris Cedex 05, France2

SUMMARY
INTRODUCTION
BIOINFORMATIC GENOME ANALYSIS
CYANOBACTERIAL TWO-COMPONENT ORF REPERTOIRE: STRUCTURE AND FUNCTION
    Structural Domains Found in Cyanobacterial Two-Component Proteins
    Histidine Kinases
        Incomplete HKs.
        HKI.
        HKII.
        HKIII.
        HKIV.
        HKV.
    Response Regulators
        RRI.
        RRII.
        (i) OmpR-type subclass (T_reg output domain).
        (ii) NarL subclass (LuxR output domain).
        (iii) AraC subclass (AraC output domain).
        RRIII.
        RRIV.
    Hybrid Kinases
        HYI.
        HYII.
        HYIII.
        HYIV.
        HYV.
        HYVI.
        HYVII.
    Other Two-Component Related ORFs
ORTHOLOGOUS GROUPS
    HKs and RRs Common to All Genomes
    HY Subclasses
DISTRIBUTION OF TWO-COMPONENT ORFs
LOCALIZATION AND PHYSICAL ORGANIZATION OF TWO-COMPONENT GENES
EVOLUTION AND PHYLOGENY
    Strain Phylogeny
    Gene Origin
    Domain Shuffling, Fusion, and Gene Loss
CONCLUDING REMARKS
ACKNOWLEDGMENTS
REFERENCES

   SUMMARY
 
A survey of the already characterized and potential two-component protein sequences that exist in the nine complete and seven partially annotated cyanobacterial genome sequences available (as of May 2005) showed that the cyanobacteria possess a much larger repertoire of such proteins than most other bacteria. By analysis of the domain structure of the 1,171 potential histidine kinases, response regulators, and hybrid kinases, many different arrangements of about thirty different modules could be distinguished. The number of two-component proteins is related in part to genome size but also to the variety of physiological properties and ecophysiologies of the different strains. Groups of orthologues were defined, only a few of which have representatives with known physiological functions. Based on comparisons with the proposed phylogenetic relationships between the strains, the orthology groups show that (i) a few genes, some of them clustered on the genome, have been conserved by all species, suggesting their very ancient origin and an essential role for the corresponding proteins, and (ii) duplications, fusions, gene losses, insertions, and deletions, as well as domain shuffling, occurred during evolution, leading to the extant repertoire. These mechanisms are put in perspective with the different genetic properties that cyanobacteria have to achieve genome plasticity. This review is designed to serve as a basis for orienting further research aimed at defining the most ancient regulatory mechanisms and understanding how evolution worked to select and keep the most appropriate systems for cyanobacteria to develop in the quite different environments that they have successfully colonized.


   INTRODUCTION
 
The cyanobacteria constitute a very large and morphologically diverse group of oxygen-evolving photosynthetic prokaryotes. They can be found in most terrestrial, freshwater, and marine habitats (28). Like most bacteria, cyanobacteria use two-component regulatory system proteins to regulate cell behavior and gene expression in response to changes in the external environment (3, 24, 54, 56, 80, 132, 136, 140). Two-component systems typically consist of two types of proteins, histidine kinases (HK) and response regulators (RR), which may sometimes be carried by a single polypeptide to form the hybrid kinases (HY). They are characterized by the presence of specific signatures: the HisKA (dimerization and phosphoacceptor) and HATPase (histidine kinase ATPase) domains, which together make a histidine kinase, and an aspartate-containing receiver domain for the response regulators. The so-called hybrid sensors have all three domains. Upon detection of a stimulus, the HisKA and HATPase domains function to autophosphorylate a histidine residue. The phosphate group is then transferred to an aspartate residue in the receiver domain of the cognate response regulator or hybrid sensor. As a result, a change in the activity of the protein that carries the receiver domain occurs, such that it modifies some aspect of cell behavior (such as taxis) or gene expression, or the phosphate group is further transferred in so-called phosphorelays (32, 54, 68, 132). The deduced sequences of two-component protein genes have been found to contain a number of other "sensory" domains, but precise functions for a large number of them still await definition (40). Surveys performed on complete annotated genomes of prokaryotes revealed that the differences in the total number of genes and the complexity of the ecophysiology of the bacterium and of its environment have an effect on the number of response regulators (11, 38). The total number of all signal transduction proteins increases for most bacteria as the square of the genome size (38).
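The domain signatures described above can be turned into a simple classification rule. The following is only a minimal sketch, not the procedure used in this survey: the domain labels (`HisKA`, `HATPase`, `Receiver`) and the function name are illustrative assumptions, and real annotations would come from a domain database search.

```python
from typing import Iterable

# Illustrative domain labels (assumed here, not taken from any tool's output).
KINASE_CORE = {"HisKA", "HATPase"}  # together these make a histidine kinase
RECEIVER = "Receiver"               # aspartate-containing receiver domain


def classify_two_component(domains: Iterable[str]) -> str:
    """Return 'HK', 'RR', 'HY', or 'other' from a protein's domain labels."""
    found = set(domains)
    has_kinase = KINASE_CORE <= found   # both HisKA and HATPase are present
    has_receiver = RECEIVER in found
    if has_kinase and has_receiver:
        return "HY"   # hybrid kinase: all three signature domains
    if has_kinase:
        return "HK"
    if has_receiver:
        return "RR"
    return "other"    # e.g., an incomplete kinase with only one core domain
```

Additional sensory domains such as GAF or PAS do not change the outcome of this rule; they only matter for the finer subclasses discussed later.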

Genome sequences of the Cyanobacteria have revealed that they likely make extensive use of a variety of two-component proteins to regulate responses to the environment (34, 89, 102, 118). In May 2005, 8 completely annotated cyanobacterial sequences were available in Cyanobase (http://www.kazusa.or.jp/cyano/), and a total of 16 sequences, not all completely annotated, were available from the U.S. Department of Energy Joint Genome Institute (the Integrated Microbial Genomes [IMG] system [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi]). The possibility of performing an extensive comparative analysis of the repertoire of two-component genes in each organism was thus opened up. Available genomes come from unicellular and filamentous freshwater or terrestrial strains (including one thermophile) and from marine environments. Five of the 16 strains are capable of nitrogen fixation. The strains belong to three of the five subsections defined within the BX phylum (Cyanobacteria) of bacterial taxonomy (28). There is no representative of subsection II (species that reproduce by the formation of baeocytes, i.e., subsequent multiple fission of a cell that yields motile baeocytes) or of subsection V (branching filamentous heterocystous cyanobacteria [a heterocyst is a differentiated cell specialized in nitrogen fixation]). For subsection III (i.e., filamentous nonheterocystous cyanobacteria that divide in only one plane), the single sequenced genome is not really representative of the group because it can reduce molecular dinitrogen; it is the only strain known to be able to do so. In any case, it is worth noting that no single strain can truly be representative, because the subsection is polyphyletic and must now be considered an artificial grouping.

This survey presents a detailed analysis of the cyanobacterial two-component system repertoire. The species names of the 16 cyanobacterial strains and their morphologies, main features, and habitats, as well as the acronyms used at the beginning of gene names, are shown in Fig. 1 and in Table S1s in the supplemental material. The organization of the sensor, receiver, transmitter, and response domains is discussed in terms of the significance for the function of each family of two-component proteins and how the repertoire of such proteins found in each species of cyanobacteria relates to its requirement for regulation of its internal cellular activity. The 1,171 proteins found have been classified according to structural domain organization and orthology relationships. Whenever known, the function of the two-component proteins is mentioned and its occurrence within an orthology group is discussed in relation to the physiological properties and ecological niche of the strains that share it. Phylogenetic studies were performed to estimate the relative contributions of gene fusion, duplication, insertion, deletion, and shuffling during evolution. Finally, a generic gene name is proposed for each orthology group, even if at present the groups are composed of a single representative and the corresponding proteins do not yet have an assigned function, to aid in identification and future research. Corresponding names were attributed to the putative proteins: Chk for the histidine kinases, Crr for the response regulators, and Chy for the hybrid kinases (Table 1; see also Table S3s in the supplemental material). If within a group both HK and HY genes coexist, they have been named with chk and chy acronyms, respectively, with the same number attached, e.g., chk15 and chy15. To avoid, as much as possible, confusion within the literature, since numbers had already been assigned to almost all of the Synechocystis sp. strain PCC 6803 two-component proteins (described as HikX and RreY), we have kept the same numbers and used them for naming of the orthologues; numbering was continued from there for the new groups. Other sensing systems and regulators (S/T kinases, AC/GC, and one-component proteins) also contribute to the rather sophisticated regulatory pathways evolved by cyanobacteria, but they have not been considered in this review except when they are fused to two-component system protein domains.


 
FIG. 1. Phylogenetic tree of the cyanobacterial strains whose genome sequences are available. The tree is adapted from previously published trees based on 16S rRNA sequences (55, 75, 99, 114, 116, 118; J. Elhai, personal communication). Names of the marine strains are in blue. Strains able to fix dinitrogen are boxed in red, and a yellow-green motif inside a box indicates that diazotrophy is linked to heterocyst differentiation.

 

 
TABLE 1. Two-component proteins from the 16 complete cyanobacterial genomes available as of May 2005, listed according to orthology relationshipsa

 

   BIOINFORMATIC GENOME ANALYSIS
 
The cyanobacterial genome sequences were accessed (as of May 2005) at Cyanobase (http://www.kazusa.or.jp/cyano/) and IMG (http://img.jgi.doe.gov/cgi-bin/pub/main.cgi). Synechococcus sp. strain PCC 6301 is very closely related to PCC 7942 in terms of genome size and physical map (44, 116). Since its sequence is not yet fully released, it was not considered in our comparisons, but most of what is said for Synechococcus elongatus PCC 7942 applies to PCC 6301 as well. The identities of potential two-component genes were derived from the published assignments (http://www.kazusa.or.jp/cyano/ and http://img.jgi.doe.gov/cgi-bin/pub/main.cgi [89, 102]) and analyses performed at Interpro (http://www.ebi.ac.uk/InterProScan/) and supplemented with PBLAST searches of each genome with a battery of cyanobacterial and Escherichia coli K-12 two-component domains (domains used include receivers from CheY and OmpR, Lux_R, HisKA/HATPase, and Hpt [see Table S2s in the supplemental material]). A few protein sequences have been kept even if they presently no longer carry the key residues required to form a canonical HK or RR domain, when they clearly are part of an orthology group gathering sequences from almost all of the 16 cyanobacterial genomes (see below). Though some sequence and/or assembly errors may have occurred, they do not seem to be numerous (95), and they permit the in silico survey presented here. In the course of this study, we noticed a few probable sequencing errors; these are mentioned in the legend to Fig. 2. For clarity, each gene name begins with a four-character acronym, referenced to the organism (Fig. 1), followed by the locus tag that comes from Cyanobase (e.g., 6803slr0396) and the locus tag or gene object identifier (GOI) (gene_OID) from the IMG site (NpunNpR0564, Tery_403232890, or Avar_400182180, etc.). All sequences can thus be easily downloaded, with both locus tags and GOIs remaining available at IMG for searches even after completion of the genome annotations.
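The gene-naming convention just described (a four-character organism acronym followed by a locus tag or GOI) lends itself to mechanical parsing. The sketch below is only an illustration under stated assumptions: the function and type names are hypothetical, and stripping the underscore that separates the acronym from a GOI is a choice made here, not something prescribed by the text.

```python
from typing import NamedTuple


class GeneName(NamedTuple):
    organism: str    # four-character acronym referenced to the organism (Fig. 1)
    identifier: str  # Cyanobase locus tag, or IMG locus tag / gene_OID


def parse_gene_name(name: str) -> GeneName:
    """Split a gene name into its organism acronym and its identifier."""
    organism, rest = name[:4], name[4:]
    # Assumption: an underscore after the acronym is only a separator.
    identifier = rest.lstrip("_")
    if len(organism) < 4 or not identifier:
        raise ValueError(f"not a recognizable gene name: {name!r}")
    return GeneName(organism, identifier)
```

For example, `parse_gene_name("6803slr0396")` yields the acronym `6803` with the locus tag `slr0396`, and `parse_gene_name("Tery_403232890")` yields `Tery` with the GOI `403232890`.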


 
FIG. 2. Cyanobacterial two-component ORF repertoire. (A) HKs. (B) RRs. (C) HYs. (D) Others. Subscript numbers following parentheses are the numbers of similar domains that may be found in the proteins listed in a given subclass. Boldface type indicates that orthologues exist in the 16 genomes, and the corresponding phylogenetic trees are shown in Fig. 5. *, proteins with different domain structures belong to this group of orthologues; (Figure 2), protein that has no bacterial orthologue; (+), putative sequencing error (see below). Putative proteins that have no cyanobacterial orthologue are shown in blue and are identified by their locus tag or GOI. Putative sequencing errors: 7120all1640-1639 (7A instead of 8A would create a frameshift, giving a protein with 95% identity to Avar_400225220); the A of the 7421glr4211 stop codon can serve in the ATG initiation codon for 7421glr4212; Cwat_400841320 could form an HY with Cwat_400841330; and the HATP-RR Cwat_400841330 could form an HYI with Cwat_400841320.

 
Domains were initially assigned by Pfam batch analysis at http://www.sanger.ac.uk/Software/Pfam/ (18). Domains were recorded for each two-component gene only if they were scored as "Pfam's trusted match thresholds." Domain assignments were checked and modified by using the more extensive domain assignments at InterPro (http://www.ebi.ac.uk/interpro/). The data have been manually checked to avoid false-negative and false-positive hits, which may arise from automated analyses. The results from the 16 different cyanobacterial species were used to classify the putative two-component proteins by domain organization with a nomenclature adapted from Ohmori et al. (102), with each group subdivided by the organization of the identified signaling domains. The cartoon-style diagrams presented in Fig. 2 were constructed from these data, with the sizes of the domains roughly in proportion. The putative open reading frames (ORFs) that exhibit similar domain structures have been gathered into groups and listed within each according to their orthology relationship (Table 1; see also Table S3s in the supplemental material). Parentheses followed by figures indicate the number of similar domains that may be found in the proteins listed in a given subclass. Orthology information, based on the bidirectional best hits from BLAST searches of each organism against other organism polypeptides, is accessible at the IMG site. It was ascertained by performing CLUSTALW alignments for each subclass and by making phylogenetic trees with the PHYLO_WIN program (41, 138). The domain structures of some orthologues may vary slightly as a result of gene fusion, shuffling, and/or insertions-deletions. Paralogues are the homologues present within a given organism. These definitions are not fully accurate but can be considered a useful approximation inasmuch as we cannot always be sure of whether the polypeptides arose from a single gene present in the last common ancestor (orthologues) or from gene duplication within a genome (paralogues) (128).
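The bidirectional-best-hit criterion mentioned above can be illustrated with a short sketch. This is not the IMG implementation: it assumes that the best hit of each protein in the other genome has already been extracted from BLAST results into two dictionaries, and the function name and the toy identifiers in the usage note are made up for illustration.

```python
from typing import Dict, List, Tuple


def bidirectional_best_hits(
    best_a_to_b: Dict[str, str],
    best_b_to_a: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs in which each protein is the other's best hit."""
    pairs = []
    for gene_a, gene_b in best_a_to_b.items():
        # Keep the pair only if the reverse search points back to gene_a.
        if best_b_to_a.get(gene_b) == gene_a:
            pairs.append((gene_a, gene_b))
    return sorted(pairs)
```

With toy data such as `{"a1": "b1", "a2": "b1"}` for one direction and `{"b1": "a1"}` for the other, only `("a1", "b1")` is retained; `a2` is left without a putative orthologue, which is the kind of case that the alignments and phylogenetic trees described above would then be used to examine.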


   CYANOBACTERIAL TWO-COMPONENT ORF REPERTOIRE: STRUCTURE AND FUNCTION
 
For each organism, a preliminary list of all histidine kinases, hybrid kinases, and response regulators was constructed from the functional annotations (http://genome.jgi-psf.org/mic_home.html [89, 102]). Table 1, Fig. 2, and Table S3s (in the supplemental material) present all of the putative two-component proteins that could be retrieved. The number of two-component systems (Fig. 3) agrees well with published data (11, 34, 38, 93, 94, 102, 106, 118). The small differences result, in particular, from the integration of previously unrecognized, "atypical" HisKA domains, HisKA_2 (HWE) and HisKA_3 in a few proteins, and of ORFs detected on later sequenced plasmid genomes.


 
FIG. 3. Cyanobacterial genes encoding two-component system proteins (see Fig. 1 and 2 for acronyms and abbreviations).

 
Structural Domains Found in Cyanobacterial Two-Component Proteins

Many of the putative polypeptides that were found, whether kinases or response regulators, are multidomain proteins. A list of all of the domains and their acronyms that have been identified in cyanobacterial two-component proteins is given in Table S2s, in the supplemental material, together with short definitions of their functions. The most common ones are GAF, PAS, and PAC for the kinases in particular and DNA-binding domains (Treg and NarL or LuxR) for the response regulators. Some are found only rarely: CBS, CheB, CheR, CheW, CHASE, cNMP, FHA, GuC, MASE, MHYT, Pkinase, PP2C, Trk, and UPF. With the exception of GerR and LytTR, all of the different types that compose the bacterial domain repertoire can be found in cyanobacterial proteins (141).

A HisKA domain is about 60 amino acids (aa) long and constitutes the dimerization and phosphoacceptor domain of the HKs. HisKA_2 (HWE) and HisKA_3 are alternative dimerization and phosphoacceptor domains. Hkd is the homodimer interface of the signal transducing histidine kinase family, which often overlaps the HisKA domain. To form a histidine kinase (HIS_KIN), a HATPase_c domain (HATP), which is usually adjacent to and downstream of HisKA or homologues, is required. Such domains are found in many ATP-binding proteins and are necessary for kinase activity.

A canonical basic response regulator domain (RR or CheY) can be schematically described as two aspartate (D) residues and a lysyl (K) residue appropriately spaced within an ~120-aa sequence. They usually are N terminal to output domains, some of which, the transcriptional regulators, have the property of binding to specific DNA sequences. Depending on sequence similarities, response regulator proteins are often classified in subfamilies named from the best-studied CheY, OmpR, NarL, or LuxR proteins, for example.

As the acronym indicates, GAF domains (for "cGMP phosphodiesterase, adenylyl cyclases, and the bacterial transcription factor FhlA," about 130 aa long) have been linked to small-molecule binding, in particular the cyclic nucleotides cyclic AMP (cAMP) and cGMP, which are common second messengers in signal transduction (5, 9, 23, 53). They are found in different proteins related to the cNMPs, cyclases, and phosphodiesterases, as well as to light-signaling phytochromes (9, 85). The GAF family is among the largest of all classes of signaling domains. GAF (Phyt_2) is a member of this large family. PHYT is a light wavelength sensor domain to which a linear tetrapyrrole is bound through a thioether linkage via a Cys residue (113). It permits the reversible photochemical conversion of a protein between two forms.

PAS and PAC domains are often found to be associated. PAS derives from the names of three proteins that the domain occurs in: Per, period circadian protein; Arnt, Ah receptor nuclear translocator protein; Sim, single-minded protein. The acronym PAC is derived from "PAS-associated, C-terminal," such sequences contributing to the PAS domain fold. The division between the PAS and PAC domains is caused by major differences in sequences in the region connecting these two motifs. A subset of PAS domains, the best-characterized members of this family, binds cofactors such as heme and flavin adenine dinucleotide. Sensing of light, oxygen, or redox potential requires cofactors, while signals such as voltage, xenobiotics, and nitrogen availability do not (16, 21, 43, 110). PAC domains can be found without an associated PAS domain. GAF and PAS domains exhibit striking similarity in their structures, and proteins carrying such domains are clearly linked in their evolution (53). The common theme among both classes of proteins with such domains is the binding, either covalent or not, of a remarkably diverse set of small regulatory molecules that often remain unidentified (5). The two domains are presumed to be functionally similar.

Histidine Kinases

Incomplete HKs. For 18 putative proteins, which are about 450 aa long, only a HisKA domain could be recognized. Most of them form an orthology group with one sequence from each genome, except for that of Gloeobacter violaceus. The orthologous protein from this strain, Glr1586, is a complete HK, having both a HisKA and a HATPase domain (see below). One of these proteins, Slr1285 (Hik34, Chk34), has just been shown to be involved in salt sensing and hyperosmotic stress response in Synechocystis sp. strain PCC 6803 and to pair with the response regulator Slr1783 (Rre1, Crr1) (105, 126). The genes reported to be under the control of the Hik34-Rre1 pair following hyperosmotic or salt stress are rather general stress response genes; this may explain why the authors could not identify the sensor partner upstream of Hik34 in the regulatory pathway. In agreement with their previous data, the same group recently showed that Hik34 is required for thermotolerance (probably by regulating the expression of some heat shock genes) and that the purified protein could autophosphorylate in vitro (134). There do not appear to be any proteins orthologous to this Chk34 group in any of the other 110 completed bacterial genome sequences.

Thirteen putative proteins, originating from only five species (all diazotrophs except Thermosynechococcus elongatus) only have a HATP domain. Half of them are from Nostoc punctiforme, with NpF3113 (128 residues) being 100% identical at the amino acid level to NpF2204 and the adjacent NpF2205 (75 residues) being 100% identical to NpF3114. These may represent recent gene duplications. The fifth N. punctiforme representative, NpunpNPBR204, is carried by a plasmid and has orthologues in Anabaena sp. strain PCC 7120 and Anabaena variabilis, all being about 250 residues. Three of these putative proteins have additional N-terminal GAF domains and form a group of orthologues. Three additional proteins, listed as HYVII, consist of either a HisKA or a HATPase N-terminal domain linked to a response regulator (RR) downstream. Finally, a class composed of 26 proteins formed by different combinations of five basic domains, of which one is HATPase-c, appears in Fig. 2 as HYVI and HKV+CheW for the Trichodesmium erythraeum representative. It is discussed in more detail below. Whether such polypeptides act by forming complexes with specific HisKA proteins is a hypothesis that must be tested.

HKI. HK class I kinases (HKIs), having only HisKA and HATP domains, can be considered basic proteins, i.e., serving as building blocks for the more sophisticated domain arrangements that also exist in cyanobacteria. However, it is likely that many of these ORFs have signaling domains that have not yet been identified. This possibility is highlighted by a number of HKIs that have one or more transmembrane (TM) domains that could flank putative signaling domains (Table 1; also see Table S3s in the supplemental material). None were found in the marine unicellular non-N2-fixing strains, with the exception of Pro1543 in Prochlorococcus sp. strain SS120. Depending on the strain, they may represent one-fifth to one-twentieth of the whole histidine kinase repertoire, being more abundant in the unicellular species. There are 36 groups of orthologues; six of them are proteins found only in the three filamentous heterocystous strains, and three are made up of Anabaena sp. strain PCC 7120 and A. variabilis proteins only. In nine instances, an HKI may have orthologues which have a more complex structure (highlighted with asterisks in Table 1 and Fig. 2). As an example, Anabaena sp. strain PCC 7120 All2956 has an HYI orthologue in C. watsonii (Cwat_400862090) and an HKIV (cNMP-HK) (Gll1662) orthologue in G. violaceus. Chk7s constitute another example, as in most of the orthologues a PAS domain is also present (see the discussion of SphS, below). Six HKIs do not have any bacterial orthologue, 7120all7605 and NpunpNPBF140 being plasmid encoded. It is interesting that a G. violaceus protein (Gll0380) has only a single orthologue among all of the sequenced bacterial genomes, and it is found in Archaeoglobus fulgidus, a hyperthermophilic marine sulfate reducer isolated from a hydrothermal environment. Phylogenetic analyses place G. violaceus close to the root of the cyanobacterial lineage, and the Archaeoglobales are the only archaebacteria that can grow by sulfate reduction, a property restricted to relatively few groups of eubacteria.

The largest group of orthologues (Chk27) contains a representative from every species except the marine unicellular non-N2-fixing strains. Synechocystis sp. strain PCC 6803 ManS, a protein involved in manganese homeostasis (100), is one of them. On the other hand, the Anabaena sp. strain PCC 7120 HepX (Alr0117, Chk52) has been reported to be involved in heterocyst development (heterocyst envelope polysaccharide [97]). It has orthologues not only in the other two heterocystous strains but also in the other two N2-fixing strains, as well as, surprisingly, the unicellular freshwater S. elongatus 7942.

Synechocystis sp. strain PCC 6803 Sll0798 (Chk30; termed RppB or NrsS) has been shown to control the Ni2+-dependent induction of the nrsBACD operon and to be involved in Ni2+ sensing (76). Such a member of the bacterial binding protein-dependent transport systems would also be present in A. variabilis and Anabaena sp. strain PCC 7120. On the other hand, the inactivation of both sll0790 (hik31, chk31) and slr6041 (chk46), two HKI paralogs sharing 97.5% identity, leads to the conclusion that the gene products are involved in the regulatory mechanisms that allow Synechocystis sp. strain PCC 6803 to adapt from photoautotrophic to photomixotrophic growth (62). This HKI would be required for the expression of icfG (encoding glucokinase) and the modulation of the glucose-6-phosphate dehydrogenase, thus having a dual role.

HKII. HK class II (HKII) groups the putative proteins that have HK linked only to one or more GAF and/or PAS or PAC domains. These domains are encountered in quite large numbers in bacteria and euryarchaeota (40), with PAS domains being more common than GAF, except in Synechocystis sp. strain PCC 6803. Compared to most other bacteria, the large number of GAF domains correlates with and underlines the role of light in the regulation of gene expression and metabolic activities for photosynthetic organisms (40). None of the seven marine unicellular non-N2-fixing strains have any HKII+GAF, and only four of them have HKII+(PAS)1-3. Within this group there are 68 ORFs with only GAF sensor domains, 50 with PAS/PAC domains, and 28 with both. The analysis by Narikawa et al. (95) gives 17 PAS-containing ORFs in Synechocystis sp. strain PCC 6803, 61 in Anabaena sp. strain PCC 7120, and 84 in N. punctiforme (compared to 9 for E. coli and 10 for B. subtilis).
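The distinction drawn so far between incomplete HKs, HKIs, and the HKII subclasses can be expressed as a small decision rule. The sketch below is a simplification under stated assumptions: the domain labels and returned strings are illustrative, transmembrane segments are ignored, a protein with only one of the two core domains is called incomplete whatever else it carries, and any other accessory domain is simply reported as belonging to another class, since the remaining classes are defined elsewhere.

```python
from typing import Iterable

CORE = {"HisKA", "HATPase"}   # both are needed for a complete HK
PAS_LIKE = {"PAS", "PAC"}     # treated together, as in the HKII definition
IGNORED = {"TM"}              # assumption: TM segments do not affect the class


def hk_subclass(domains: Iterable[str]) -> str:
    """Assign a histidine kinase to a class from its domain labels (sketch)."""
    found = set(domains) - IGNORED
    core = found & CORE
    if not core:
        return "not an HK"
    if core != CORE:
        return "incomplete HK"    # only HisKA or only HATPase recognized
    extra = found - CORE
    if not extra:
        return "HKI"              # HisKA and HATPase only
    if extra <= {"GAF"} | PAS_LIKE:
        has_gaf = "GAF" in extra
        has_pas = bool(extra & PAS_LIKE)
        if has_gaf and has_pas:
            return "HKII+PAS+GAF"
        return "HKII+GAF" if has_gaf else "HKII+PAS"
    return "other HK class"       # further accessory domains, not handled here
```

Because the rule works on sets of labels, repeated domains such as (PAS)1-3 collapse to a single entry; counting copies would require keeping the ordered domain list instead.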

Only one orthology group, Chk2 (HKII+GAF), has one protein from each of the 16 species, but for G. violaceus and the marine Synechococcus and Prochlorococcus spp. the proteins are shorter and lack a detectable GAF domain; they are thus classified as HKI. Functional data have been reported for one of its members, Synechocystis sp. strain PCC 6803 Slr1147 (Hik2), which would interact with the response regulator Rre1, as does Slr1285 (Hik34, which has no detectable HATPase_c domain; see above). In this strain it would regulate the expression of sigB and four other genes in response to hyperosmotic stress (105, 126).

Within HKII, the subclass HKII-phytochrome is one in which proteins of well-known function occur. Synechocystis sp. strain PCC 6803 Slr0473 (Cph1, Chk35), for example, has been characterized as a photoreceptor (35, 36, 149). Light-induced conformational change of the chromophore in Cph1 results in inhibition of the histidine kinase activity (35). Two paralogues, aphA and aphB, exist in Anabaena sp. strain PCC 7120 as well as in the two other heterocystous strains (101). Besides these four strains, only C. watsonii has a Cph1-like protein, as well as a paralogue that does not group with AphB. Other marine species do not have any. It is worth noting that orthologues are not as widely distributed as could have been expected from the study conducted on the chromophore-binding (PHYT) domain of these proteins (49). Most of the cyanobacteria examined there were indeed shown to share a rather well-conserved chromophore binding sequence. The other HKs with multiple GAF domains are essentially from the filamentous heterocystous strains. Following the observation that red light decreased whereas far-red light increased cellular cAMP content in Anabaena sp. strain PCC 7120, Ohmori and coworkers disrupted 10 ORFs having putative chromophore-binding GAF domains. The all2699 (chk65, aphC) mutant failed to respond to far-red light. They concluded that the far-red light signal could be received by AphC and then transferred to the N-terminal RR domain of the CyaC adenylyl cyclase, stimulating its catalytic activity. The increased cAMP concentration would then drive the subsequent signal transduction cascade (104).

About half of the HKII+(PAS)1-3 subclass corresponds to one orthology group (Chk7). This group is constituted of proteins from 12 species that do not all possess a PAS domain. Those which do not possess a PAS domain have an N-terminal TM domain instead, and the T. elongatus orthologue (tll0925) has both. One member, 7942_403099950, has been identified as SphS, a sensor whose cognate response regulator is SphR (Crr29), by complementation of an E. coli phoR creC mutant for the expression of alkaline phosphatase (1). The two genes are adjacent, with the RR gene upstream of the HK gene. The S. elongatus 7942 mutant that lacks these genes is defective in the ability to produce alkaline phosphatase and some inducible proteins in response to phosphate limitation. This was one of the very first cyanobacterial two-component systems to be characterized. The Synechocystis sp. strain PCC 6803 Hik7 and Rre29 orthologues have since been shown to be the dominant sensory system that controls gene expression in response to phosphate limitation (51, 136). Murata and coworkers (136) suggested that a two-component system homologous to SphS-SphR is likely conserved in all cyanobacterial species. However, no direct orthologue could be detected in T. erythraeum, Synechococcus sp. strain 9902, or P. marinus SS120 and MIT9313. T. erythraeum (Chk78) has a HKII+(PAS)1-3 that is orthologous to those of Anabaena sp. strain PCC 7120 and N. punctiforme paralogues of Chk7.

Putative histidine kinases with either PAS or PAS/PAC domains that occur either in single or multiple copies were essentially found in the filamentous heterocystous strains. There is one in T. erythraeum and two in Synechocystis sp. strain PCC 6803. Ten of the 24 putative proteins do not have any orthologues, and one of them (NpunpNPAR133) is plasmid encoded.

Another HKII subclass is made up of proteins with both PAS and/or PAC and GAF domains. Such proteins are totally absent in marine strains, except C. watsonii. The Ssl1473-75 acronym (Chk32) is used in Table 1 and Fig. 2 because it corresponds to the Synechocystis sp. strain PCC 6803 wild-type sequence, which is interrupted by an insertion (IS) element in the "Kazusa" strain that was sequenced (103). This fusion protein is about 40% identical to the Fremyella diplosiphon (Tolypothrix PCC 7601) RcaE protein (GAF-PAS-PAC-HK), which has been shown to be a photoreceptor involved in complementary chromatic adaptation (137). From the microarray data obtained with a Synechocystis sp. strain PCC 6803 chk16 mutant, the Chk16 protein could be directly involved in sensing NaCl concentration (80). Under hyperosmotic conditions, it would be part of a phosphorelay cascade involving Synechocystis sp. strain PCC 6803 Chk41 (Hik41) and Crr17 (Rre17) (105, 126). Interestingly, the Synechocystis sp. strain PCC 6803 and C. watsonii Chk16s, which possess an N-terminal MASE1 domain (the function of which is currently unknown [96]) in front of a GAF (Phyt_2) domain, have orthologues in Anabaena sp. strain PCC 7120 and A. variabilis, with only a PAS domain. The N. punctiforme Chk16 orthologue has a GAF (Phyt) domain between the PAS/PAC and HK domains. Notably, for each of the three heterocystous strains, the closest paralogues of Chk16 (i) have rather similar structures, (ii) are orthologues (Chk74), and (iii) are located immediately downstream of the chk16 genes. Gene duplication thus probably occurred in an ancestor common to these three strains before their divergence, and the two genes, chk74 and chk16, have subsequently evolved differently.

HKIII. Kinases of HK class III (HKIII) possess, in addition to the HisKA and HATPase, a HAMP (or "linker") domain. The latter is typically found downstream from the last TM segment of a protein, and it has been shown that two symmetrical HAMP domains dimerize and cooperate to transfer the signal across the membrane via a linker to the histidine kinase (155). The presence of a HAMP domain suggests that the corresponding putative ORFs likely function as a dimer. In many cases, it is linked to transmitting signals across a membrane from periplasmic ligand-binding domains (6, 7, 10). The HAMP domain localizes upstream from HisKA. One protein has a PAC (Chk159, 7421gll0814), 16 have a PAS (Chk33), and 6 have an additional Cache (a signaling domain common to calcium-channel subunits and chemotaxis receptors [4]) upstream from the HAMP, of which one has a GAF (Chk161, NpunNpF6040) in between the two domains and two PAS/PAC domains (Chk155 and Chk177). Cache is a signaling domain that is found in animal calcium channel subunits and a certain class of prokaryotic chemotaxis receptors. It is thought to form an extracellular or periplasmic ligand sensor (4). All of these proteins originate from filamentous N2-fixing strains, and four were found in the endosymbiosis-forming species N. punctiforme.

Synechocystis sp. strain PCC 6803 Chk10 (Hik10) has been reported to be involved in the response to hyperosmotic stress, forming a pair with the response regulator Crr3 (Rre3) (105). No function has been described for its orthologues. Another HKIII (Chk33), which possesses a PAS domain, would also be involved in this stress response. Remarkably, orthologues of this Hik33 protein exist in all of the 16 genomes, and they are the only examples of cyanobacterial proteins with such architecture. Other bacterial orthologues (without any function yet defined) are at present restricted to the Firmicutes (gram-positive bacteria). This protein (termed DspA or Hik33 in Synechocystis sp. strain PCC 6803 and NblS in S. elongatus) has been reported to sense many environmental cues: cold, osmotic changes, high light, and nutrient limitations (56, 80, 87, 90). Since it is present even in the strains that have only a small number of two-component systems, it likely plays a key role in cyanobacteria by integrating cellular metabolism with environmental parameters. It also has homologues in the plastid genomes of the red algae Porphyra purpurea, Gracilaria tenuistipitata, and Cyanidium caldarium and was termed Ycf26. The cyanobacterial sequences, as well as those from G. tenuistipitata and P. purpurea, have a unique putative periplasmic signaling domain that has not been detected in any other protein (90).

HKIV. The HK class IV (HKIV) polypeptides have an N-terminal S/T kinase domain and a C-terminal histidine kinase domain, with GAF domains in between. They are restricted to species belonging to the Nostocales family, i.e., filamentous heterocystous N2-fixing strains (11 to 13 each), with the exception of T. erythraeum, which has one. These proteins are quite interesting, as they are able to directly couple Ser/Thr kinase activities and transduction pathways involving two-component systems. One of them, HstK (Alr2258, Chk99) from Anabaena sp. strain PCC 7120, has been characterized; its expression depends on the type of nitrogen source that is available (109). Anabaena sp. strain PCC 7120 Alr0709 (Chk162) and Alr0710 (Chk107) are very large proteins (1,799 and 1,796 aa, respectively) which have the same modular organization and are adjacent on the chromosome; they align all along their length, with only one gap (10 aa) in the middle. They are the closest paralogues, with 63% identity and 74% similarity. The same physical organization exists for Avar_400222710 (Chk165) and Avar_400222720 (Chk101), and the two proteins are 61% identical. Only Chk101, however, has an orthologue in Anabaena sp. strain PCC 7120, which is neither Chk165 nor Chk107. Gene duplications thus probably occurred rather recently, i.e., after their divergence. Four HKIVs have a second GAF domain, and one protein from Anabaena sp. strain PCC 7120, one from A. variabilis, and two from N. punctiforme have PAS and/or PAC domains in between the GAF and the HK. All of them are about 2,000 residues long or more. The physiological functions of these proteins should be looked at closely to determine the role of each kinase and whether they act independently or synergistically, or if these proteins are nodes receiving signals from two different transduction pathways to achieve a single output function.
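
As an illustration of how identity figures such as those quoted above are commonly obtained, the following minimal Python sketch computes percent identity from two already-aligned sequences. The function name, the gap symbol, and the choice of denominator (columns in which neither sequence has a gap) are assumptions made for this example; the text does not state which convention was used for the reported values, and percent similarity, which requires a substitution matrix, is not computed here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns containing a gap in either sequence are excluded
    from the denominator. Other conventions exist and give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0   # columns where both sequences have a residue
    identical = 0  # compared columns where the residues match
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy alignment (hypothetical sequences, not taken from any genome).
print(percent_identity("MKT-LL", "MKSALL"))
```

With the toy alignment shown, four of the five gap-free columns match.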

HKV. In the last class, HK class V (HKV), there are 37 multidomain proteins, corresponding to the combination of different types of domains linked to a histidine kinase. One group of orthologous proteins (Chk8) has a representative in all genomes but G. violaceus. The S. elongatus 7942 (SasA) and Synechocystis sp. strain PCC 6803 orthologues have been characterized. They are clock-associated histidine kinases, necessary for the robustness of the circadian rhythm of gene expression, and have been implicated in clock output (57, 61) as well as in heterotrophic carbohydrate metabolism when cells are grown in light-dark cycles (127). The protein has been crystallized from Synechocystis sp. strain PCC 6803, and its structure has been determined to 1.9-Å resolution. It forms an open tetramer (52). Its cognate response regulator, tentatively named SasR, awaits identification. Another group (Chk178, Chy178) contains a protein from G. violaceus that associates with CheB, CheR, PAS, and HK domains, the Anabaena sp. strain PCC 7120 and A. variabilis orthologues being hybrid kinases (HYII) with an additional C-terminal RR. Within this subclass, which contains proteins with cNMP-binding domains, the Chk110 group gathers orthologues originating from quite distant strains: G. violaceus, presumed to be at the root of the cyanobacterial lineage, and N. punctiforme, which is the cyanobacterium with the largest genome (among the characterized ones) and the most complex ecophysiology.

Response Regulators

RRI. The RR class I (RRI)-CheY class groups proteins with an RR domain within a polypeptide less than 200 aa long; 121 cyanobacterial ORFs that do not have any additional recognizable domains have been found. There are 30 groups containing orthologous genes, of which only 10 have more than three ORFs. Among the unicellular marine strains, Prochlorococcus sp. strain MIT9313 and Synechococcus sp. strains 9605 and 9902 are the only species to have such a protein, the three being orthologues (Crr48).

A few of these proteins have known functions. PilH (Crr7, Rre7, taxAY3) is required for motility in Synechocystis sp. strain PCC 6803 (151) and is also found in T. elongatus and the five N2-fixing species. Another RRI-CheY, PisH or PixH (Crr35, Rre35), is required for positive phototactic movement (152). Orthologues exist only in the three filamentous heterocystous strains. Rcp1 (Crr27, Rre27) is the cognate response regulator for the phytochrome Cph1 (Chk35, Hik35 [150]). Orthologues are found only in the strains that possess such an HKII phytochrome-like protein (Chk35), and they are always adjacent to and downstream from the corresponding gene. Anabaena sp. strain PCC 7120 DevR (Alr0442, Crr42) forms, with HepK (All4496, Chk86), the first two-component system identified that regulates the biosynthesis of a polysaccharide as part of a patterned differentiation process (154). Orthologues can be found not only in the other N2-fixing strains but also in S. elongatus 7942 and Synechocystis sp. strain PCC 6803. In the latter, the Crr42 orthologue is annotated as DivK, a cell division response regulator, but on grounds that have not been explained; it is 66% identical and 79% similar to DevR. All of the Crr42 orthologues are adjacent to and divergently transcribed from genes which also are orthologues and potentially encode subunit A of DNA gyrase/topoisomerase IV. Since heterocysts do not divide, it may be that the phenotype observed for the devR mutant results from global regulation involving chromosome structure.

About 80% of the small RRI-CheY domains are less than 150 aa long. The absence of any identifiable output domain raises the question of their mode of action. Each of these probably interacts with not more than one partner besides its cognate kinase. A phosphorylated (P-RR) and a nonphosphorylated (RR) form would be in equilibrium, probably differing by their conformation. Under specific conditions, the cognate kinase will provide a phosphate (P) to form P-RR that could then establish specific interactions with a partner of which it regulates the activity, either positively or negatively. In E. coli after autophosphorylation of the CheA histidine kinase, the phosphoryl group is transferred to the CheY, an RR which then interacts with flagellar motor proteins (22, 145). Rhizobium meliloti, which does not possess CheZ, has two cognate CheYs (~120 aa long) that interact with CheA: phospho-CheY2 (CheY2-P) is the chief regulator of flagellar rotation, its action being modulated by CheY1, which functions as a phosphatase of CheY2-P and becomes a sink for phosphate (129). A similar process may occur in Rhodobacter sphaeroides, which has two classic and two atypical CheA proteins and eight associated response regulators (six CheY proteins and two CheB proteins [111, 112]), as well as in cyanobacteria, which also do not have any CheZ homologues but possess a large number of "CheY"-like proteins. It will be of interest to determine whether the expression levels of the cyanobacterial genes and/or protein levels change upon alterations in the environment, as well as to look for a specific intracellular location of the gene products, if any.

The same basic RR-CheY domain also occurs in ORFs of more than 200 residues, usually about 400 aa long, with no characteristic associated domains. They have been classified as RRI PatA, because one such protein from Anabaena sp. strain PCC 7120, All0521 (Crr65), was the first of this group to have been characterized. Its name comes from the phenotype of the corresponding mutant, which is impaired in the pattern formation of the heterocysts (73). Another protein belonging to that class has been studied, Sll0038 (Rre36, Crr36), which is part of the pathway for perception and transduction of low temperature signals and might specifically regulate the expression of the desB gene in Synechocystis sp. strain PCC 6803 (135). Crr36 orthologues exist in the three filamentous heterocystous strains.

Another subclass, RRI-other (RRVI in Ohmori's nomenclature [102]), also contains a single RR domain in a polypeptide more than 200 aa long, with no other (as yet) identifiable domain but low overall sequence identity with PatA-type ORFs. This subclass mostly consists of a group of orthologues, Crr23 (Rre23, previously named Ycf55). Orthologues exist in all strains but G. violaceus; the marine unicellular non-N2-fixing strains, as well as T. elongatus and S. elongatus, however, do not presently have a canonical RR domain. They no longer exhibit in their N-terminal sequences the critical D and K residues which make recognizable RRs. They have nevertheless been kept in Fig. 2 and 3 because PBLAST searches performed with this domain, although less conserved than the C-terminal part, still pick the RR domains of the other orthologues. No function has yet been assigned to this probably very ancient and well conserved protein, present only in photosynthetic organisms.

RRII. RR class II (RRII) proteins contain the more "classical" RRs in that they correspond to the structure of the first described response regulators, all being two-component DNA-binding response regulators. They have an N-terminal RR domain fused to an output DNA-binding domain, either a T_reg (for OmpR type [81]), HTH_LuxR (or Ger_E for LuxR/NarL type), or AraC. Thus, they probably function as transcriptional regulators. Examples of these RRs are found in all species of cyanobacteria, with the number of OmpR types (4 to 19, depending on the species) outnumbering (141 versus 89) the NarL types (1 to 16). Almost all of the RR repertoire found in unicellular non-N2-fixing strains belongs to this class, the rest (at most two proteins) being RRIs.

(i) OmpR-type subclass (T_reg output domain). Within the OmpR-type subclass there are three groups of 16 orthologues and one in NarL. Two of them, Synechocystis sp. strain PCC 6803 RpaA (Crr31, Rre31) and RpaB (Crr26, Rre26, Ycf27), have been linked to long-term regulation of energy distribution by phycobilisomes (12). RpaA would also be a partner of Hik33 (also termed DspA or Chk33), and Ycf26 orthologues are present in all strains (see above). Synechocystis sp. strain PCC 6803 Sll0649 (Crr3 or Rre3), which has five orthologues, would pair with Hik10 (Slr0533 or Chk10), which also has orthologues in the same five strains (see Table S4s in the supplemental material). These two pairs are involved in the response of Synechocystis sp. strain PCC 6803 to hyperosmotic stress (105, 126). Interestingly, the Chk10 HKIII is adjacent to and downstream of Crr3 in all strains but Synechocystis sp. strain PCC 6803. In contrast, the Crr31 response regulators and Chk33 kinases are never adjacent in any of the species. For the third group of 16 proteins (Crr37), none of the RRs is adjacent to a histidine kinase and all of the corresponding genes except G. violaceus glr2274 are monocistronic transcriptional units, the adjacent genes being divergently transcribed on both sides. Expression of the Anabaena sp. strain PCC 7120 representative (all4312) is directly controlled by the global nitrogen regulator NtcA, suggesting that Crr37 might be related to cellular responses to nitrogen deprivation. The fourth one is Crr1 (Ycf29), which also has orthologues in algal plastid genomes (see below).
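
The gene-neighbourhood statements made above (adjacent, upstream or downstream, divergently transcribed) can be expressed as a simple test on gene coordinates and strands. The Python sketch below is only an illustration under stated assumptions: the `Gene` record, the `relation` function, and the gene names and coordinates in the usage example are all hypothetical, "adjacent" is taken to mean that no other annotated gene lies between the two, and no intergenic distance threshold is applied.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"


def relation(genes: List[Gene], name_a: str, name_b: str) -> str:
    """Describe how two genes are arranged relative to each other."""
    ordered = sorted(genes, key=lambda g: g.start)
    index = {g.name: i for i, g in enumerate(ordered)}
    i_a, i_b = index[name_a], index[name_b]
    if abs(i_a - i_b) != 1:
        return "not adjacent"
    left, right = ordered[min(i_a, i_b)], ordered[max(i_a, i_b)]
    if left.strand == "-" and right.strand == "+":
        return "adjacent, divergent"
    if left.strand == "+" and right.strand == "-":
        return "adjacent, convergent"
    # Same strand: the upstream gene is the left one on "+",
    # and the right one on "-".
    upstream = left if left.strand == "+" else right
    return f"adjacent, {upstream.name} upstream"


# Hypothetical locus: an RR gene followed by an HK gene on the same
# strand, then a divergently oriented pair.
locus = [
    Gene("rrX", 100, 800, "+"),
    Gene("hkX", 850, 2200, "+"),
    Gene("orfY", 2300, 3000, "-"),
    Gene("orfZ", 3100, 4000, "+"),
]
print(relation(locus, "rrX", "hkX"))
print(relation(locus, "orfY", "orfZ"))
print(relation(locus, "rrX", "orfZ"))
```

Whether two adjacent same-strand genes are actually cotranscribed cannot be decided from coordinates alone; the sketch only captures the positional relationships used in the text.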

SphR/PhoB (Crr29) is the partner of the histidine kinase SphS (Chk7), which regulates the pho regulon in the signaling pathway of phosphate limitation (see above) (1, 136). Orthologues are distributed as for SphS, but they do not form an operon with the Chk7 proteins. Another group (Crr28) is made up of 12 sequences, no representative existing in T. elongatus and the Prochlorococcus spp. except MIT9313. No function is known for any of these, the only information being that in Synechocystis sp. strain PCC 6803, a Kdp kinase (Slr1731, Ctc1) might transfer a phosphate to 6803sll0396 (Crr28).

ManR (Crr16, Rre16) regulates manganese homeostasis in Synechocystis sp. strain PCC 6803 together with the HKI ManS (Chk27, Hik27) (100, 148). ManR orthologues exist in all of the strains that possess ManS, but they are never adjacent to their putative cognate kinases. NblR (7942_403113030, Crr73) has been described as an NblS partner that regulates expression of NblA, a protein required for the degradation of phycobilisomes under stress conditions in S. elongatus, but its precise cognate kinase awaits identification (125). Crr73 orthologues with more than 60% identity are found only in the N2-fixing species and in T. elongatus. Another group consists of seven sequences, Crr71, that originate from each of the unicellular marine non-N2-fixing strains plus S. elongatus 7942. RppA (Sll0797, NrsR, Crr33) is the RppB (Sll0798, NrsS, Chk30) partner and is located upstream from it on the Synechocystis sp. strain PCC 6803 genome. This pair was first found to be involved with redox control of photosynthesis and pigment-related genes (71) and more recently in nickel sensing (76). No orthologue was found, though Chk30 proteins seem to also exist in Anabaena sp. strain PCC 7120 and A. variabilis. For these two strains, however, no RR is adjacent.

(ii) NarL subclass (LuxR output domain). Relatively few of the NarL-type RRs (14) have assigned functions. Ycf29 (Crr1) is the only one found in all 16 sequences (Fig. 2 and 3). As mentioned above, the Slr1783 protein (Rre1) would be the partner of Hik2 and Hik34 in the response of Synechocystis sp. strain PCC 6803 to hyperosmotic stress (105, 126). In this strain, crr1 may be an essential gene, as no group has reported segregated interposon mutants (V. Zinchenko, CyanoMutants, at http://www.kazusa.or.jp/cyano/; N. Burnett, personal communication). Copies of this gene are also found on the plastid genomes of the red algae Guillardia theta, Porphyra purpurea, Cyanophora paradoxa, Cyanidioschyzon merolae, Gracilaria tenuistipitata, and Cyanidium caldarium.

In Anabaena sp. strain PCC 7120, the RRII-NarL OrrA (Alr3768, Crr81) has been found to be involved with the response to osmotic stress (124). It is not an orthologue of either of the two proteins, Crr3 and Crr31, identified for similar stress responses in Synechocystis sp. strain PCC 6803, but it has orthologues in the other two filamentous heterocystous species.

(iii) AraC subclass (AraC output domain). The last RRII group has an RR domain fused to HTH-AraC domains which, as a pair, form the DNA-binding domain of the AraC family of response regulators (139). In general, AraC transcriptional regulators are classified as having any receiver domain fused to the HTH_AraC domains. Only nine cyanobacterial sequences were found to have an RR fused to AraC. As usually observed for the sequences belonging to this family, the HTH motif is situated toward the C terminus. The three-dimensional structure of such a protein, E. coli MarA, has been solved. It showed that the two HTH_AraC subdomains are separated by 27 Å, which causes the cognate DNA to bend. There is a single such gene in Synechocystis sp. strain PCC 6803 and A. variabilis, two in Anabaena sp. strain PCC 7120, five in N. punctiforme (one plasmid encoded), and only one group (Crr90) with orthologues in the three filamentous heterocystous strains.

RRIII. Some cyanobacterial response regulators have two or even three RR domains, together with T_reg and Hpt (for "histidine phosphotransfer"), and in one group GGDEF domains. The ORFs that have one RR upstream and two downstream of the T_reg-Hpt domains are from the heterocystous N2 fixers, with one from G. violaceus (Crr93). They presumably function as conditional transcriptional regulators via phosphotransfer relays. Hpt domains are known to interact with more than one RR domain and are thus particularly well suited for cross talk. The recently demonstrated coordination of synthesis and proteolysis of RpoS in E. coli by the two-component phosphotransfer network that involves ArcB, ArcA, and RssB is a good example (86). RcaC from F. diplosiphon has a domain organization similar to that of Crr93. This protein has been described as involved in complementary chromatic adaptation (30). Both the N-terminal RR and Hpt domains were found to be important for the light-regulated control of phycocyanin gene expression, whereas the C-terminal RR only had a minor role (72).

RRIV. The vast majority of the proteins in RR class IV (RRIV) do not have any DNA-binding domains, but a number have output domains with putative catalytic activities. More than 40% of these polypeptides possess a GGDEF domain, also named DUF1. This domain was first recognized in Caulobacter crescentus PleD, a response regulator controlling cell differentiation, before being found in proteins involved in cellulose biosynthesis, cell adhesion, or aggregation (119). It is highly "promiscuous," as it is found associated as a module with a multitude of different domains. It has recently been demonstrated that PleD possesses catalytic guanylate cyclase activity (107). Expression of recombinant GGDEF domains from ORFs found in six very different bacteria (including the Synechocystis sp. strain PCC 6803 Slr1143, a GAF-GGDEF protein) demonstrated that (i) they all possess diguanylate cyclase activity and (ii) for Borrelia burgdorferi Rrp1 (a RR-GGDEF protein), phosphorylation of the RR is required for activity of the GGDEF domain (120). Thus, the GGDEF domains would represent the output of complex bacterial signal transduction networks, which convert different signals into the production of a secondary messenger, cyclic diguanylic acid (c-di-GMP). The cyclase activity correlates well with the correspondence between GGDEF and the catalytic domain of adenylate cyclases (40, 108). GGDEF domains can be found associated with an EAL domain (also known as DUF2), which is a good candidate for a diguanylate phosphodiesterase function (40). The corresponding proteins would then have opposing cyclase and hydrolase activities (107). Cyclic diguanylate-specific phosphodiesterase activity has recently been demonstrated from an overexpressed E. coli ORF containing an EAL domain (122). Some cyanobacterial RRs exhibit this kind of association, possibly with additional PAS and/or PAC domains, but most of them have only one of these two domains. It is worth mentioning, however, that although Synechocystis sp. strain PCC 6803 has one such protein (Crr41), the strain possesses very little, if any, c-di-GMP, at least under standard conditions (J. Houmard, unpublished data). Synechocystis sp. strain PCC 6803 Crr4 has both a GAF (Phyt_2) and GGDEF domain fused to an RR. RRs with a GGDEF domain are found in all cyanobacterial species except the open-ocean non-N2 fixers. In two instances, multiple N-terminal RRs are associated with a GGDEF domain, one (7942_403091170) also having a DNA-binding T_reg domain.

There are six examples of RRs with an HD (for "phosphohydrolase activity") output domain. The latter is found in enzymes such as cyclic nucleotide phosphodiesterase, 2'-nucleotidase, and phosphatase (8, 147). A knockout of the Synechocystis sp. strain PCC 6803 slr2100 gene (Crr20) indeed results in changes in the intracellular cyclic nucleotide (cGMP) concentrations and in an increased sensitivity of the cells to UV-B radiation (24). This protein is thus involved in cGMP homeostasis and light signaling. The other five RR-HD proteins form an orthology group. A Synechocystis sp. strain PCC 6803 crr18 (sll1624) null mutant has also been constructed and did not exhibit a phenotype similar to the crr20 mutant (24). T. erythraeum is the only diazotroph which does not possess such a protein, but it is also the only one to have an RR-GuC, which thus probably has purine nucleotide cyclase activity (discussed below).

The protein phosphatase 2C-like domain (PP2C, also referred to as SpoIIE) is found in PP2C and adenylate cyclase and in SpoIIE, which is known for its role in sporulation in Bacillus subtilis (17). Some of these proteins may have a role in cell division or differentiation. A PP2C domain is found as a C-terminal fusion to an RR in all filamentous species and T. elongatus but not in C. watsonii. This distribution closely resembles that observed for the HKIVs, which have S/T kinase domains. For one orthology group (Crr100), there is an additional GAF domain associated. Finally, there are examples of N-terminal RRs fused to GAF, PAS, cNMP, CheC, CheW, CheB, Pyr_red, or IF2 domains. Many of these ORFs are found in only one species and may result from recent fusions of domains.
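
The response regulator classes described in this section can be summarized as a set of rules on domain content and polypeptide length. The Python sketch below is a simplified reading of those rules, not the procedure actually used to build the tables: the function name and the representation of a protein as an N-to-C list of domain labels are assumptions made for the example, RRI-PatA and RRI-other are not separated (the text distinguishes them by sequence identity, which is not modeled), and any ORF containing HisKA or HATPase is left unclassified because it belongs to the kinase or hybrid kinase classes.

```python
from typing import List, Optional

# Output DNA-binding domains that define the RRII subclasses.
RRII_OUTPUT = {
    "T_reg": "RRII-OmpR",
    "HTH_LuxR": "RRII-NarL",
    "Ger_E": "RRII-NarL",  # alternative name given in the text
    "AraC": "RRII-AraC",
}


def classify_rr(domains: List[str], length_aa: int) -> Optional[str]:
    """Assign a response regulator class from domain labels (N to C)."""
    n_rr = domains.count("RR")
    others = [d for d in domains if d != "RR"]
    if n_rr == 0 or "HisKA" in others or "HATPase" in others:
        return None  # not a (non-hybrid) response regulator
    # RRIII: two or three RR domains together with T_reg and Hpt.
    if n_rr >= 2 and "T_reg" in others and "Hpt" in others:
        return "RRIII"
    if n_rr == 1 and not others:
        # Single RR, no other recognizable domain: split on length.
        return "RRI-CheY" if length_aa < 200 else "RRI-PatA/other"
    # RRII: N-terminal RR fused to one DNA-binding output domain.
    if (n_rr == 1 and len(others) == 1
            and domains[0] == "RR" and others[0] in RRII_OUTPUT):
        return RRII_OUTPUT[others[0]]
    # RRIV: RR with other output domains (GGDEF, EAL, HD, PP2C, ...).
    return "RRIV" if others else None


print(classify_rr(["RR"], 120))
print(classify_rr(["RR", "T_reg"], 240))
print(classify_rr(["RR", "T_reg", "Hpt", "RR", "RR"], 900))
print(classify_rr(["RR", "GGDEF"], 350))
```

The lengths passed in the usage lines are arbitrary illustrative values.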

Hybrid Kinases

The important feature to notice for this group of complex multidomain proteins is their complete absence from the open-ocean non-N2-fixing species. The nomenclature of each subclass is based firstly on the position of the RR domain relative to the HisKA-HATPase domains. HY class I (HYI) groups have ORFs with a single RR N terminal to the HK, and HYII groups are those with a single RR C terminal to the HK. HYIIIs correspond to those ORFs with either two or three RRs C terminal to a single HisKA-HATPase. An HYIV-type protein has a single HK with at least one RR on each side. HYVs have two HKs with one RR in between; HYVI groups are the ORFs that have HATP (a domain found in several ATP-binding proteins), CheW, and RR domains with additional Hpt and/or Hkd domains; and HYVIIs are incomplete hybrids with either HisKA or HATPase domains that may be linked to additional modules.
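
The positional rules of this nomenclature can be written down compactly. The following Python sketch is one possible encoding, given only as an illustration: the function names and the list-of-labels representation (domains in N-to-C order) are assumptions, a HisKA immediately followed by HATPase is treated as one complete HK unit, the HYVI test is reduced to the co-occurrence of HATP, CheW, and RR, and architectures not covered by the definitions above are returned as unclassified.

```python
from typing import List, Optional


def collapse_hk(domains: List[str]) -> List[str]:
    """Replace each HisKA-HATPase pair by a single 'HK' label."""
    out, i = [], 0
    while i < len(domains):
        if (domains[i] == "HisKA" and i + 1 < len(domains)
                and domains[i + 1] == "HATPase"):
            out.append("HK")
            i += 2
        else:
            out.append(domains[i])
            i += 1
    return out


def classify_hybrid(domains: List[str]) -> Optional[str]:
    """Assign a hybrid kinase class (HYI to HYVII) from domain order."""
    d = collapse_hk(domains)
    rr = [i for i, x in enumerate(d) if x == "RR"]
    hk = [i for i, x in enumerate(d) if x == "HK"]
    if not rr:
        return None  # no receiver domain: not a hybrid
    if "CheW" in d and "HATP" in d:
        return "HYVI"
    if not hk:
        # Incomplete hybrid: HisKA or HATPase present, but not both.
        return "HYVII" if ("HisKA" in d or "HATPase" in d) else None
    if len(hk) == 2 and len(rr) == 1 and hk[0] < rr[0] < hk[1]:
        return "HYV"
    if len(hk) == 1:
        n_side = sum(1 for i in rr if i < hk[0])  # RRs N-terminal to HK
        c_side = sum(1 for i in rr if i > hk[0])  # RRs C-terminal to HK
        if n_side >= 1 and c_side >= 1:
            return "HYIV"
        if n_side == 1 and c_side == 0:
            return "HYI"
        if n_side == 0 and c_side == 1:
            return "HYII"
        if n_side == 0 and c_side in (2, 3):
            return "HYIII"
    return None


print(classify_hybrid(["RR", "PAS", "HisKA", "HATPase"]))
print(classify_hybrid(["GAF", "HisKA", "HATPase", "RR"]))
print(classify_hybrid(["PAS", "HisKA", "HATPase", "RR", "RR"]))
print(classify_hybrid(["RR", "HisKA", "HATPase", "RR"]))
```

Additional sensing domains (PAS, PAC, GAF, and so on) are ignored by the position tests, which matches the statement that the subclass is based first on where the RR lies relative to the HisKA-HATPase unit.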

HYI. HYI-type proteins are totally absent from S. elongatus and T. elongatus. About half of the "orthology" groups consist of a single protein. Only one HYI has a known physiological function, 6803sll1229 (Hik41, Chy41). It has been found to respond to salt (NaCl) stress, together with Synechocystis sp. strain PCC 6803 Hik16 (Chk16) (80). Synechocystis sp. strain PCC 6803 has five "simple" HYIs, of which three (Chy38, Chy40, and Chy41) have in some strains another hybrid kinase immediately upstream and of which one (Chy23) has an HK-RR pair (AphA-Rcp1). Thus, they may belong to multiphosphorelay systems, although the colocalization of the genes involved in phosphorelays is not a prerequisite. Indeed, although Chy41 would be part of such a relay for the response of Synechocystis sp. strain PCC 6803 to hyperosmotic stress, its partners Chk16 and Crr17 are not encoded by adjacent genes (105, 126). Similarly, the Anabaena sp. strain PCC 7120 genes for AphC and CyaC, between which phosphotransfer has been evidenced (see below), are not closely localized. Some HYIs have a variable number of PAS and PAC domains in between the RR and the HK plus, for a few of them, one or two GAFs. No function has yet been assigned to any of these ORFs.

For some HYIs, an HWE (HisKA_2) histidine kinase domain substitutes for the HisKA. Members of this family differ from most other HKs by lacking a recognizable F box and the presence of uniquely conserved residues: a His in the N box and the sequence WE in the G1 (64). Though found in many different species, such proteins are not as widely distributed as HisKA. They are particularly abundant in the Rhizobiaceae family. HWE domains were previously not detected in cyanobacteria, but the present analysis shows that each of the heterocystous species has one. Anabaena sp. strain PCC 7120 and A. variabilis each have a very large HWE kinase (Chy58, ~1,700 aa long), which also has GAF and PAS-PAC domains. One N. punctiforme HYI (Chy109, NpF1799) has an HKA_3 kinase domain, another HisKA alternative.

HYII. Only 11 of the 97 HYIIs do not have additional domains. For the others, various associations involving 14 different structures exist, a large number of these ORFs having PAS, PAC, and/or GAF domains. About one-fifth of the HYII "orthology" groups have HK orthologues with similar structural organization but without the C-terminal RR. For example, 7120all1716 and Avar_400180300 (Chy178) are orthologues of HKIII-CheR/B 7421gll1854. 7120all0978, Avar_400180780, and NpunNpF5679 (Chy179) are orthologues of the fairly similar G. violaceus HKV-HTH_4 (Chk179, Gll3736).

S. elongatus HYII-GAF (Chy24) corresponds to CikA, a bacteriophytochrome that resets the circadian clock (123). No orthologue exists in G. violaceus or in the genomes of the marine non-N2 fixers, and the structure differs between the strains. For T. elongatus and the three filamentous heterocystous strains, it is HKII-GAF(PHYT_2) (Chk24) without any RR. A detailed characterization of S. elongatus 7942 CikA showed that (i) it can covalently bind bilin chromophores in vitro, even though it lacks the expected ligand residues (it may not serve, however, as a photoreceptor itself); (ii) deletion of the GAF domain or the N-terminal region adjacent to GAF dramatically reduced autophosphorylation of the HK domain, whereas elimination of the receiver domain increased activity by 10-fold; and (iii) the RR domain, which lacks the conserved aspartyl residue that serves as a phosphoryl acceptor in response regulators, would not work as a bona fide receiver domain in a phosphorelay but could interact with an unknown protein partner to modulate the autokinase activity of CikA (92). In CikA, both the GAF and RR noncanonical modules would act as protein-protein interaction domains that induce conformational changes in another domain to modulate its activity.

There is one subclass that contains only four sequences, all from T. erythraeum. All of these ORFs have a C-terminal GuC domain and thus likely possess a purine nucleotide cyclase activity. Though the presence of multiple nucleotide cyclases (AC/GC) has already been reported for cyanobacteria (see, for example, references 67 and 99), the different proteins were usually made of different domain arrangements. T. erythraeum has by far the highest number of such enzymes (13, compared to 5 or 6 for the heterocystous strains). Among the four HY-GuC proteins, three have the requirements for being adenylyl cyclases (Chy145, plus Chy129 and Chy130, which are adjacent on the chromosome), the fourth one (Chy131) having those for a guanylyl cyclase (99).

HYIII. Thirty-one ORFs differ from the previous hybrid kinases by having at least two C-terminal RRs in tandem. N. punctiforme NpR2263 is the only one that does not possess any additional domains, almost all having either PAS and/or PAC or GAFs. One member of this subclass, Anabaena sp. strain PCC 7120 Alr2279 (Chy133), has an additional N-terminal HNOBA domain (not identified by Pfam). The HNOBA domain could potentially contain a PAS-like fold. A homologous domain is also found in the first 200 aa of the N. punctiforme NpR4835 (Chk50 [58]). The two other Chk50s do not have it. HNOBA domains functionally interact with HNOB (for "heme NO binding") domains located on a second protein. The HNOB domain is predicted to function as a heme-dependent sensor for gaseous ligands (NO, CO, or possibly O2). Proteins carrying such domains (7120alr2278 and its orthologue NpunNpR4836) are encoded by the upstream genes in the two cyanobacterial examples. As stated by Iyer et al. (58), the co-occurrence of the HNOB and HNOBA domains in either the same protein or proteins encoded by the same operon suggests a strong functional interaction between them. The potential role, if any, of NO in cyanobacteria deserves further study.

About one-third (13/31) of the "orthology" groups have only one representative, and another third have orthologues but with a different domain structure. Synechocystis sp. strain PCC 6803 Chy21 is of particular interest. It is, at present, the only cyanobacterial two-component protein that has an MHYT domain, a newly identified conserved protein domain with a likely signaling function (39). A model of the membrane topology of the MHYT domain indicates that its conserved residues could coordinate one or two copper ions, suggesting a role in sensing oxygen, CO, or NO. This protein is just upstream from and cotranscribed with the Chk40 HYI, which is followed by the RRIV-HD Crr20, a protein involved in cGMP homeostasis and UV-B response (see above) (24). This cluster, to which Chy22 (HYII-GAFPAC-Hpt) is very close, could thus form a large multiphosphorelay system sensing changes in the environmental parameters and involving cGMP as a second messenger (98). Cyclic nucleotide concentrations have already been shown to vary in some cyanobacteria upon oxic-anoxic transitions, for example (reviewed in reference 26).

HYIV. The HYIV hybrid kinases have RRs on both sides of the kinase domain. All but one of the orthology groups have additional sensing domains: PAS (plus PAC for most of them) and/or GAF. One group, Chy90, shows sequences with a histidine kinase and four RRs and up to seven different types of associated domains. G. violaceus Chy90 is about 1,000 residues shorter than the three others and is annotated as a (PAS/PAC)2-HK-RR HYII. However, a 611-aa-long RR [Glr4211, RR-Treg-(RR)2] is encoded immediately upstream, the A of its stop codon also being used as the first base of the ATG of Glr4212 (Chy90). A careful analysis of the sequence would be required to ascertain that there was no sequencing frameshift error. The four proteins have an adjacent RR downstream, which is also orthologous. The cluster organization would thus have been conserved through evolution from G. violaceus to the filamentous heterocystous strains.
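The stop/start overlap described for Glr4211 and Glr4212 can be illustrated with a minimal sketch. The sequence used here is synthetic (it is not the G. violaceus sequence), and the function name find_stop_start_overlap is hypothetical. The sketch only shows that, when the final A of a stop codon is also the A of the next ATG, the downstream ORF lies in a different reading frame from the upstream one, so that a small insertion or deletion error near the junction could split or merge the two annotations.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def find_stop_start_overlap(seq, frame=0):
    """Scan `seq` codon by codon in the given frame up to the first stop codon.

    Returns (stop_index, atg_index) if that stop codon ends in A and the same
    A is the first base of an ATG; returns None otherwise.
    """
    seq = seq.upper()
    # Stop early enough that seq[i + 2:i + 5] is always a full triplet.
    for i in range(frame, len(seq) - 4, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            if codon.endswith("A") and seq[i + 2:i + 5] == "ATG":
                return i, i + 2
            return None  # first in-frame stop reached without an overlap
    return None


# Synthetic example: ATG AAA CCC TGA, with the final A of TGA starting ATG GGT ...
toy = "ATGAAACCCTGATGGGTTTT"
hit = find_stop_start_overlap(toy)
if hit is not None:
    stop_index, atg_index = hit
    # The upstream ORF is read in frame 0; the downstream ATG sits in frame 2.
    print(stop_index, atg_index, atg_index % 3)
```

In this toy case the two ORFs are offset by two bases, which is the situation that would need to be checked against the raw sequence data before deciding whether the shorter G. violaceus annotation is genuine.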

Synechocystis sp. strain PCC 6803 Chy19 (Hik19) has been found to be an essential gene (based on no complete segregation of the mutation) involved in the transduction of low-temperature signals (135). It might function downstream from Chk33 (DspA), transducing the low-temperature signal by phosphorylating Crr36 (PixG), which in turn controls desB gene expression.

The HYIV+GAF-GuC Chy89 proteins, known as CyaC, are orthologues present in every filamentous strain, whether an N2 fixer or not, as CyaC is also present in Spirulina platensis (66). An orthologue has also been found in Tolypothrix sp. strain PCC 7601, also known as Calothrix sp. strain PCC 7601 or Fremyella diplosiphon (L. Jia and J. Houmard, unpublished data). Kasahara and Ohmori (65), studying CyaC from S. platensis, demonstrated that the HK domain autophosphorylates and transfers the phosphate to the adjacent C-terminal RR domain, whereas the N-terminal RR domain, separated from the HK domain by two GAF domains, was not phosphorylated by it. Replacement of the conserved aspartate residue by alanine in the N-terminal RR did not affect the activation of cyclase activity in vitro. S. platensis CyaC has been crystallized, and the mechanism of bicarbonate activation has been studied (130). CyaC is one of the six AC/GC purine nucleotide cyclases found in Anabaena sp. strain PCC 7120 (101). Because a cyaC mutant has a very low cAMP level, CyaC has been proposed to be responsible for the maintenance of the steady-state level of cellular cAMP. On the other hand, it has been demonstrated that in Anabaena sp. strain PCC 7120 the phytochrome-like AphC (Chk65) mediates the increase in cAMP concentration induced by far-red light. Okamoto et al. (104) have proposed a model in which far-red light illumination provokes the autophosphorylation of AphC, followed by a phosphotransfer to the N-terminal RR domain of CyaC (Chy89). The HK domain of CyaC would then autophosphorylate, the phosphate would be transferred to the downstream RR, and the catalytic domain would in turn be activated. The cAMP produced could then, through binding to CRP-like proteins, regulate different adaptation processes. This is one of the very first examples of a signal transduction mechanism involving a two-component system phosphorelay described for cyanobacteria.

HYV. A few hybrid kinases possess two complete HK and one or two RR domains. None has any known or putative function yet. They also have either PAS or PAC domains. N. punctiforme Chy139 (NpF2346) has a UPF/RHH_2-type N-terminal domain, described for a few 80-aa-long hypothetical proteins, members of the MetJ/Arc repressor superfamily clan of unknown function.

HYVI. Another group of 25 ORFs all have an additional CheW domain, between the HATP and the RR, as well as an Hkd and/or Hpt domain upstream of those. They would thus be involved in chemotaxis signaling mechanisms. The Synechocystis sp. strain PCC 6803 Chy18 (PixL, TaxAY1, Hik18) and Chy39 (TaxAY2, Hik39) proteins have been shown by analysis of the phenotype of the corresponding mutants to regulate phototaxis (20). Chy43 could represent the C-terminal part of a CheA-like protein which is required for motility, transformation competency, and the assembly of thick pili, the N-terminal Hpt domain of this CheA-like protein being separately encoded by pilN (Hik36 and Ctc36, an orphan Hpt protein). Though sharing a very similar organization and gene repertoire with the tax1 cluster (sll0038 to sll0043), the tax2 cluster (sll1291 to sll1296) would not be involved in motility (19). It is worth noting that each of the three Synechocystis sp. strain PCC 6803 TaxAY proteins (Chy18, Chy39, and Chy43) would be connected to two different RRs that are adjacent on the chromosome, an RRI-CheY and an RRI-PatA: Chy18 working with Crr36 (Sll0038) and Crr35 (Sll0039), Chy39 with Crr12 (Sll1291) and Crr11 (Sll1292), and Chy43 with Crr6 (Slr1041) and Crr7 (Slr1042).

HYVII. The last class, HYVII, groups putative hybrid proteins with RRs but which only have either a HisKA or an HATPase domain. Synechocystis sp. strain PCC 6803 Chy180 (Rre22) consists of an N-terminal HisKA domain with an RR domain and a PP2C-like (PP2C_SIG) domain downstream. Although the gene name ppcE was assigned to Chy180, no description of its function could be retrieved. Such an acronym exists for probable peptidases. C. watsonii has two paralogous HATP-RRs, one of which could be a complete HYI if the putative sequencing error does exist. None has any known function.
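The class definitions given above for HYIII to HYVII can be read as rules on the N- to C-terminal order of domains. The following sketch is one possible encoding of those rules and not the procedure used for this review. The function name classify_hybrid, the domain labels, the treatment of "HisKA immediately followed by HATPase" as a complete kinase core, and the order in which the rules are tested are all assumptions made for illustration. Classes HYI and HYII, which are defined earlier in the article, are not distinguished.

```python
def classify_hybrid(domains):
    """Assign a hybrid-kinase class from a list of domain labels (N to C terminus).

    Assumed labels: "HisKA", "HATPase", "RR", "CheW"; any other label
    (e.g. "PAS", "GAF", "GuC") is treated as an accessory domain.
    """
    if "RR" not in domains:
        return "not a hybrid kinase"

    # Assumption: a complete HK core is a HisKA directly followed by an HATPase.
    hk_starts = [i for i in range(len(domains) - 1)
                 if domains[i] == "HisKA" and domains[i + 1] == "HATPase"]

    if "CheW" in domains:
        return "HYVI"            # CheW-containing, chemotaxis-like
    if len(hk_starts) >= 2:
        return "HYV"             # two complete HK cores
    if not hk_starts:
        if "HisKA" in domains or "HATPase" in domains:
            return "HYVII"       # RR plus only one half of the kinase core
        return "not a hybrid kinase"

    hk = hk_starts[0]
    upstream, downstream = domains[:hk], domains[hk + 2:]
    if "RR" in upstream and "RR" in downstream:
        return "HYIV"            # RRs on both sides of the kinase
    tandem = any(a == "RR" and b == "RR"
                 for a, b in zip(downstream, downstream[1:]))
    if tandem:
        return "HYIII"           # at least two C-terminal RRs in tandem
    return "HYI or HYII"         # classes defined earlier in the article


# CyaC-like arrangement described in the text: RR, two GAFs, HK, RR, cyclase.
print(classify_hybrid(["RR", "GAF", "GAF", "HisKA", "HATPase", "RR", "GuC"]))
```

Testing CheW first is a deliberate choice in this sketch: a CheA-like protein may carry an Hpt domain instead of HisKA, and would otherwise be mistaken for an incomplete (HYVII) kinase.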

Other Two-Component Related ORFs

Only five examples of an orphan Hpt domain were detected. The Hpt domain, more frequently found in histidine kinases and hybrid sensors, is known to act as a phosphoacceptor and phosphodonor in phosphorelays from one RR domain to another (132). As mentioned above, Hpt domains, because they interact with more than one RR domain, are especially well suited for cross talk (86). The orphan Hpt YPD1 from Saccharomyces cerevisiae is known to greatly increase the half-life of its phosphorylated cognate response regulator (59). This could apply to the function of Ctc36 (PilN) in Synechocystis sp. strain PCC 6803 phototaxis (see above). For two Anabaena sp. strain PCC 7120 ORFs, mentioned by Ohmori et al. (102), Alr4086 and All8565, we could not find any identifiable Hpt domain by InterProScan, SMART, or PBLAST (searches performed at NCBI) (2).

KdpD proteins form a different family of histidine kinases. In E. coli, the KdpD domain senses turgor pressure and Usp forms the output domain (144, 153). KdpD phosphorylates and interacts with its cognate (RRII-OmpR type) RR, KdpE (46). There are three examples in A. variabilis, two of which are encoded by plasmid B, and single examples in the other heterocystous strains, as well as in Synechocystis sp. strain PCC 6803, S. elongatus, and G. violaceus. The G. violaceus copy is interesting, as it has a tandem duplication of the two domains. Ballal et al. (15) have demonstrated an interaction between the N-terminal TM domains of Anabaena sp. strain L-31 KdpD and E. coli KdpD, which alters the phosphatase activity. KdpD (Ctc1) is also mentioned, probably on the basis of two-hybrid experiments, as the cognate phosphodonor for the RRII-OmpR Crr28 (Rre28) (http://www.genome.ad.jp/dbget-bin/show_pathway?syn02020+slr1731).


   ORTHOLOGOUS GROUPS
 
Many examples of highly versatile permutations and combinations of a number of conserved modules have been found. Fusions to various domains increase the versatility of a protein family and allow its recruitment into various cellular regulatory pathways. It has been reported that multidomain proteins have significantly less functional conservation than single-domain ones, except when they share the exact same combination of domain folds. In addition, for two proteins containing the same combination of two structural superfamilies the probability of them sharing the same function increases to 80%, and even up to >90% in the case of complete coverage along the full length of both proteins (47). Sensory domains, which are highly represented in multidomain proteins, have also been shown to evolve faster than other domains (receiver, transmitter, and output domains in the case of two-component proteins [146]). Finally, domain insertion may occur without affecting the function of a protein.

"Orthology" groups were thus defined that are based on the bidirectional best hits from BLAST searches of each organism against each other organism, completed by phylogenetic analyses. A tentative gene nomenclature is also proposed in Table 1 and in Table S3s in the supplemental material. Direct comparisons between ORFs will be restricted to the orthologues, the time of divergence being assumed to be the same, i.e., that of speciation. Sixteen proteins, plus six groups of two or more, do not have any orthologue in any other sequenced bacterial genomes. In contrast, for the cyanobacterial Chy83 proteins (which are 1,000 to 2,000 residues long), orthologous proteins of more than 1,000 amino acids can be found in some 22 bacterial genomes with BLAST E values of 0.0.

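The bidirectional best-hit criterion used to define the "orthology" groups can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for this review: the function names, the gene identifiers (a1, b1, and so on), and the bit scores are all hypothetical, each BLAST hit is reduced to a (query, subject, score) triple, and the complementary phylogenetic analyses mentioned above are not represented.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) triples from a
    BLAST search of one genome against another.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits between genome A and genome B (scores are made up).
a_vs_b = [("a1", "b1", 950.0), ("a1", "b2", 300.0),
          ("a2", "b2", 410.0), ("a3", "b1", 120.0)]
b_vs_a = [("b1", "a1", 940.0), ("b2", "a2", 400.0),
          ("b2", "a1", 310.0), ("b3", "a3", 90.0)]

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy data set, a3 finds b1 as its best hit, but b1 prefers a1, so a3 is not paired. Multidomain proteins that share only one conserved module are a typical source of such one-way hits, which is one reason for completing the criterion with phylogenetic analyses.
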
HKs and RRs Common to All Genomes

All of the sequenced genomes would encode a Chk33 protein (nblS, dspA, or ycf26). As mentioned above, this multidomain protein probably acts as a "hub" connecting various environmental signals to their specific signal transduction pathways. A Chk2 protein would also be present in all cyanobacteria, though with a different modular organization. It is either a GAF-HK or, for the unicellular marine nondiazotrophic strains and G. violaceus, a "basic" HK without GAF. The Chk34 orthologues, which might be involved in salt stress (see above), have only a HisKA domain in all species except G. violaceus, in which the representative is a complete histidine kinase. Orthologues of Chk34 do not exist in any of the other fully sequenced bacterial genomes. Interestingly, this Chk34 protein was also shown to be essential for thermotolerance in Synechocystis sp. strain PCC 6803, possibly by negatively regulating the expression of certain heat shock genes (134). It is thus likely to be an important protein for cyanobacteria. S. elongatus 7942 Chk8 (SasA) has orthologues in all genomes but G. violaceus, although no cognate response regulator has yet been identified. This protein is implicated in the output of the circadian clock (57, 61). It will be of interest to check whether or not G. violaceus exhibits a circadian rhythm. Four classes of RRs are made of orthologues from all genomes: Crr1 (Ycf29), Crr26 (RpaB, Ycf27), Crr31 (RpaA), and Crr37. These proteins must perform key roles in regulating important processes in cyanobacteria, particularly as they are retained in the "streamlined genomes" of Prochlorococcus species. Crr23 (Ycf55) occurs in all species but G. violaceus.
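The distinction made above between groups present in all genomes and groups missing from a single genome amounts to a set intersection over a presence table. The sketch below is a toy illustration only: the table is restricted to three of the genomes and to the groups named in this paragraph, with presence and absence filled in as stated in the text, and it does not reproduce the full data of Table 1 or the supplemental material. The function names are hypothetical.

```python
# Groups named in the paragraph; G. violaceus lacks Chk8 (SasA) and Crr23 (Ycf55).
ALL_NAMED = {"Chk33", "Chk2", "Chk34", "Chk8",
             "Crr1", "Crr26", "Crr31", "Crr37", "Crr23"}

presence = {
    "Synechocystis sp. PCC 6803": set(ALL_NAMED),
    "S. elongatus 7942": set(ALL_NAMED),
    "G. violaceus": ALL_NAMED - {"Chk8", "Crr23"},
}


def core_groups(table):
    """Groups found in every genome of the table."""
    return set.intersection(*table.values())


def missing_groups(table):
    """For each genome, the groups present elsewhere in the table but absent from it."""
    union = set.union(*table.values())
    return {genome: union - groups for genome, groups in table.items()}


print(sorted(core_groups(presence)))
print(sorted(missing_groups(presence)["G. violaceus"]))
```

A table of this kind records presence only; differences in domain organization between orthologues, such as those noted for Chk2 and Chk34, would need a separate annotation.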

As the acronyms indicate, Ycf26, Ycf27, Ycf29, and Ycf55 orthologues are also found in the plastid genomes of red algae and/or diatoms, suggesting that the corresponding genes are very ancient and should have already existed in the cyanobacterial ancestor that gave rise to the plastids (13, 45). Most two-component genes are not essential to the growth of cyanobacteria under standard laboratory conditions and can be inactivated. Fully segregated Synechocystis sp. strain PCC 6803 knockout mutants could not, however, be easily obtained for chk33 (ycf26), crr1 (ycf29), crr23 (ycf55), crr26 (ycf27), and crr37, further supporting the key roles of the corresponding gene products (12, 90, 135, 142; N. Burnett, unpublished data). A similar result was obtained for crr37 in Anabaena sp. strain PCC 7120 (91).

As mentioned above, in Synechocystis sp. strain PCC 6803, RpaA (Crr31) would be a partner of Chk33 (DspA/NblS) in the response of the cells to hyperosmotic stress (105, 126). The pair will thus have been highly conserved throughout evolution. RpaA (Crr31) and RpaB (Crr26) are the closest paralogues, and both of them are about 41% identical and 61% similar to B. subtilis YycF, an RR which m