
1 Department of Basic Medical Sciences, Biochemistry Section, University of the West Indies, Mona Campus, Kingston 7, Jamaica
2 Ecole Normale Supérieure, CNRS UMR 8541, Génétique Moléculaire, 46 rue d'Ulm, 75230 Paris Cedex 05, France
- SUMMARY
- INTRODUCTION
- BIOINFORMATIC GENOME ANALYSIS
- CYANOBACTERIAL TWO-COMPONENT ORF REPERTOIRE: STRUCTURE AND FUNCTION
  - Structural Domains Found in Cyanobacterial Two-Component Proteins
  - Histidine Kinases
    - Incomplete HKs
    - HKI
    - HKII
    - HKIII
    - HKIV
    - HKV
  - Response Regulators
    - RRI
    - RRII
      - (i) OmpR-type subclass (T_reg output domain)
      - (ii) NarL subclass (LuxR output domain)
      - (iii) AraC subclass (AraC output domain)
    - RRIII
    - RRIV
  - Hybrid Kinases
    - HYI
    - HYII
    - HYIII
    - HYIV
    - HYV
    - HYVI
    - HYVII
  - Other Two-Component Related ORFs
- ORTHOLOGOUS GROUPS
  - HKs and RRs Common to All Genomes
  - HY Subclasses
- DISTRIBUTION OF TWO-COMPONENT ORFs
- LOCALIZATION AND PHYSICAL ORGANIZATION OF TWO-COMPONENT GENES
- EVOLUTION AND PHYLOGENY
  - Strain Phylogeny
  - Gene Origin
  - Domain Shuffling, Fusion, and Gene Loss
- CONCLUDING REMARKS
- ACKNOWLEDGMENTS
- REFERENCES
SUMMARY
INTRODUCTION
Genome sequences of the Cyanobacteria have revealed that these organisms likely make extensive use of a variety of two-component proteins to regulate their responses to the environment (34, 89, 102, 118). In May 2005, 8 completely annotated cyanobacterial sequences were available in Cyanobase (http://www.kazusa.or.jp/cyano/), and a total of 16 sequences, not all completely annotated, were available from the U.S. Department of Energy Joint Genome Institute (the Integrated Microbial Genomes [IMG] system [http://img.jgi.doe.gov/cgi-bin/pub/main.cgi]). This made it possible to perform an extensive comparative analysis of the repertoire of two-component genes in each organism. The available genomes come from unicellular and filamentous freshwater or terrestrial strains (including one thermophile) and from marine strains. Five of the 16 strains are capable of nitrogen fixation. The strains belong to three of the five subsections defined within the BX phylum (Cyanobacteria) of bacterial taxonomy (28). There is no representative of subsection II (species that reproduce by the formation of baeocytes, i.e., by successive multiple fissions of a cell that yield motile baeocytes) or of subsection V (branching filamentous heterocystous cyanobacteria; a heterocyst is a differentiated cell specialized in nitrogen fixation). For subsection III (filamentous nonheterocystous cyanobacteria that divide in only one plane), the single sequenced genome is not really representative of the group, because this strain can reduce molecular dinitrogen and it is the only strain known to be able to do so. In any case, no single strain can truly be representative, because the subsection is polyphyletic and must now be considered an artificial grouping.
This survey presents a detailed analysis of the cyanobacterial two-component system repertoire. The species names of the 16 cyanobacterial strains and their morphologies, main features, and habitats, as well as the acronyms used at the beginning of gene names, are shown in Fig. 1 and in Table S1s in the supplemental material. The organization of the sensor, receiver, transmitter, and response domains is discussed in terms of its significance for the function of each family of two-component proteins, and the repertoire of such proteins found in each cyanobacterial species is related to that species' need to regulate its internal cellular activity. The 1,171 proteins found have been classified according to structural domain organization and orthology relationships. Whenever known, the function of the two-component proteins is mentioned, and its occurrence within an orthology group is discussed in relation to the physiological properties and ecological niche of the strains that share it. Phylogenetic studies were performed to estimate the relative contributions of gene fusion, duplication, insertion, deletion, and shuffling during evolution. Finally, to aid identification and future research, a generic gene name is proposed for each orthology group, even when a group is at present composed of a single representative and the corresponding proteins do not yet have an assigned function. Corresponding names were attributed to the putative proteins: Chk for the histidine kinases, Crr for the response regulators, and Chy for the hybrid kinases (Table 1; see also Table S3s in the supplemental material). If both HK and HY genes coexist within a group, they have been named with the chk and chy acronyms, respectively, followed by the same number, e.g., chk15 and chy15. Numbers had already been assigned to almost all of the Synechocystis sp. strain PCC 6803 two-component proteins (described as HikX and RreY), so to avoid confusion in the literature as much as possible, we have kept the same numbers and used them for naming the orthologues; numbering was continued from there for the new groups. Other sensing systems and regulators (S/T kinases, AC/GC, and one-component proteins) also contribute to the rather sophisticated regulatory pathways evolved by cyanobacteria, but they have not been considered in this review except when they are fused to two-component system protein domains.
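The naming rules above can be summarized as a small procedure. What follows is a minimal sketch only. It assumes that each orthology group is described by the set of protein types it contains ("HK", "HY", or "RR") and, when available, by the number already carried by its Synechocystis sp. strain PCC 6803 member (HikX or RreY). The function names and the starting numbers for new groups (48 and 46 below) are hypothetical and are not taken from Table 1.

```python
from itertools import count
from typing import Iterable, List, Optional, Set, Tuple

# (protein type, generic prefix); the tuple order fixes the output order.
PREFIXES = (("HK", "Chk"), ("HY", "Chy"), ("RR", "Crr"))


def names_for_group(types: Set[str], number: int) -> List[str]:
    """Generic protein names for one orthology group.

    A group holding both HKs and HYs gets Chk and Chy names that share
    the same number (e.g. Chk15 and Chy15).
    """
    return [f"{prefix}{number}" for kind, prefix in PREFIXES if kind in types]


def assign_names(
    groups: Iterable[Tuple[Set[str], Optional[int]]],
    next_kinase_number: int,
    next_regulator_number: int,
) -> List[List[str]]:
    """Name orthology groups in order.

    Each group is (types, syn6803_number). An existing Synechocystis
    Hik/Rre number is reused; otherwise numbering continues from the
    hypothetical starting values, kept separately for kinase-type groups
    and for response regulator groups.
    """
    kinase_numbers = count(next_kinase_number)
    regulator_numbers = count(next_regulator_number)
    named = []
    for types, syn_number in groups:
        if syn_number is not None:
            number = syn_number
        elif types == {"RR"}:
            number = next(regulator_numbers)
        else:
            number = next(kinase_numbers)
        named.append(names_for_group(types, number))
    return named


# Hypothetical groups: one containing Hik33, a new mixed HK/HY group,
# one containing Rre29, and a new RR group.
example = assign_names(
    [({"HK"}, 33), ({"HK", "HY"}, None), ({"RR"}, 29), ({"RR"}, None)],
    next_kinase_number=48,
    next_regulator_number=46,
)
print(example)  # [['Chk33'], ['Chk48', 'Chy48'], ['Crr29'], ['Crr46']]
```

Gene names follow the same scheme in lowercase (chk15, chy15).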
BIOINFORMATIC GENOME ANALYSIS
CYANOBACTERIAL TWO-COMPONENT ORF REPERTOIRE: STRUCTURE AND FUNCTION
A HisKA domain is about 60 amino acids (aa) long and constitutes the dimerization and phosphoacceptor domain of the HKs. HisKA_2 (HWE) and HisKA_3 are alternative dimerization and phosphoacceptor domains. Hkd is the homodimer interface of the signal-transducing histidine kinase family and often overlaps the HisKA domain. A complete histidine kinase (HIS_KIN) also requires a HATPase_c domain (HATP), which is usually adjacent to and downstream of HisKA or its homologues. HATPase_c domains are found in many ATP-binding proteins and are necessary for kinase activity.
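The requirement just described, a phosphoacceptor domain together with a downstream HATPase_c, can be expressed as a simple check on a domain list. This is a minimal sketch under stated assumptions. The domain architecture is assumed to be already known as an N- to C-terminal list of the domain names used in the text, and "RR" is used here as a label for a receiver domain. The text says HATPase_c is *usually* downstream, and the sketch treats that as a strict rule, which is a simplification. Detecting the domains in the first place is not shown.

```python
from typing import Sequence

# Dimerization/phosphoacceptor domains named in the text.
PHOSPHOACCEPTORS = {"HisKA", "HisKA_2", "HisKA_3"}
CATALYTIC = "HATPase_c"


def kinase_core(domains: Sequence[str]) -> str:
    """Classify the kinase core of a protein.

    `domains` lists domain names from N to C terminus. In this sketch a
    complete histidine kinase needs a phosphoacceptor domain located
    upstream of a HATPase_c domain.
    """
    acceptors = [i for i, d in enumerate(domains) if d in PHOSPHOACCEPTORS]
    catalytic = [i for i, d in enumerate(domains) if d == CATALYTIC]
    # Some acceptor precedes some HATPase_c  <=>  min(acceptors) < max(catalytic).
    if acceptors and catalytic and min(acceptors) < max(catalytic):
        return "complete"
    if catalytic:
        # HATPase_c present but no phosphoacceptor upstream of it.
        return "incomplete (HATPase_c)"
    if acceptors:
        return "incomplete (phosphoacceptor only)"
    return "none"


print(kinase_core(["HisKA", "HATPase_c"]))  # complete
print(kinase_core(["GAF", "HATPase_c"]))    # incomplete (HATPase_c)
print(kinase_core(["HisKA", "RR"]))         # incomplete (phosphoacceptor only)
```

The second and third calls mirror the incomplete HKs discussed in this section: HATP-only proteins (some with N-terminal GAF domains), and HYVII proteins that pair a single kinase-type domain with an RR.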
A canonical basic response regulator domain (RR or CheY) can be schematically described as two aspartate (D) residues and a lysyl (K) residue appropriately spaced within a sequence of about 120 aa. RR domains are usually N-terminal to output domains, some of which, the transcriptional regulators, bind specific DNA sequences. On the basis of sequence similarities, response regulator proteins are often classified into subfamilies named after the best-studied proteins, for example, CheY, OmpR, NarL, or LuxR.
As the acronym indicates, GAF domains (for "cGMP phosphodiesterase, adenylyl cyclases, and the bacterial transcription factor FhlA"; about 130 aa long) have been linked to small-molecule binding, in particular of the cyclic nucleotides cyclic AMP (cAMP) and cGMP, which are common second messengers in signal transduction (5, 9, 23, 53). They are found in various proteins related to cNMPs (cyclases and phosphodiesterases) as well as in light-signaling phytochromes (9, 85). The GAF family is among the largest of all classes of signaling domains, and GAF (Phyt_2) is a member of this large family. PHYT is a light wavelength sensor domain to which a linear tetrapyrrole is bound through a thioether linkage to a Cys residue (113). It allows the reversible photochemical conversion of a protein between two forms.
PAS and PAC domains are often found together. PAS derives from the names of three proteins in which the domain occurs: Per (period circadian protein), Arnt (Ah receptor nuclear translocator protein), and Sim (single-minded protein). The acronym PAC is derived from "PAS-associated, C-terminal"; such sequences contribute to the PAS domain fold. PAS and PAC are distinguished as separate domains because of major sequence differences in the region connecting the two motifs. A subset of PAS domains, the best-characterized members of this family, binds cofactors such as heme and flavin adenine dinucleotide. Sensing of light, oxygen, or redox potential requires cofactors, whereas sensing of signals such as voltage, xenobiotics, and nitrogen availability does not (16, 21, 43, 110). PAC domains can be found without an associated PAS domain. GAF and PAS domains exhibit striking structural similarity, and proteins carrying them are clearly linked in their evolution (53). The common theme among both classes of proteins is the binding, covalent or not, of a remarkably diverse set of small regulatory molecules that often remain unidentified (5). The two domains are presumed to be functionally similar.
Thirteen putative proteins, originating from only five species (all diazotrophs except Thermosynechococcus elongatus), have only a HATP domain. About half of them are from Nostoc punctiforme: NpF3113 (128 residues) is 100% identical at the amino acid level to NpF2204, and the adjacent NpF2205 (75 residues) is 100% identical to NpF3114. These may represent recent gene duplications. The fifth N. punctiforme representative, NpunpNPBR204, is plasmid encoded and has orthologues in Anabaena sp. strain PCC 7120 and Anabaena variabilis, all of about 250 residues. Three of these putative proteins have additional N-terminal GAF domains and form a group of orthologues. Three additional proteins, listed as HYVII, consist of either a HisKA or a HATPase N-terminal domain linked to a downstream response regulator (RR) domain. Finally, a class of 26 proteins formed by different combinations of five basic domains, one of which is HATPase_c, appears in Fig. 2 as HYVI and, for the Trichodesmium erythraeum representative, as HKV+CheW. It is discussed in more detail below. Whether such polypeptides act by forming complexes with specific HisKA proteins remains to be tested.
HKI. HK class I kinases (HKIs), which have only HisKA and HATP domains, can be considered basic proteins, i.e., building blocks for the more sophisticated domain arrangements that also exist in cyanobacteria. However, many of these ORFs likely have signaling domains that have not yet been identified. This possibility is highlighted by a number of HKIs that have one or more transmembrane (TM) domains that could flank putative signaling domains (Table 1; also see Table S3s in the supplemental material). None were found in the marine unicellular non-N2-fixing strains, with the exception of Pro1543 in Prochlorococcus sp. strain SS120. Depending on the strain, HKIs represent one-fifth to one-twentieth of the whole histidine kinase repertoire, and they are more abundant in the unicellular species. There are 36 groups of orthologues; six of them consist of proteins found only in the three filamentous heterocystous strains, and three consist only of Anabaena sp. strain PCC 7120 and A. variabilis proteins. In nine instances, an HKI has orthologues with a more complex structure (highlighted with asterisks in Table 1 and Fig. 2). For example, Anabaena sp. strain PCC 7120 All2956 has an HYI orthologue in C. watsonii (Cwat_400862090) and an HKIV (cNMP-HK) orthologue (Gll1662) in G. violaceus. Chk7 is another example, since most of its orthologues also contain a PAS domain (see the discussion of SphS, below). Six HKIs do not have any bacterial orthologue; two of them, 7120all7605 and NpunpNPBF140, are plasmid encoded. Interestingly, among all sequenced genomes, a G. violaceus protein (Gll0380) has a single orthologue, found in Archaeoglobus fulgidus, a hyperthermophilic marine sulfate reducer isolated from a hydrothermal environment. Phylogenetic analyses place G. violaceus close to the root of the cyanobacterial lineage, and the Archaeoglobales are the only archaebacteria that can grow by sulfate reduction, a property restricted to relatively few groups of eubacteria.
The largest group of orthologues (Chk27) contains a representative from every species except the marine unicellular non-N2-fixing strains. One of them is Synechocystis sp. strain PCC 6803 ManS, a protein involved in manganese homeostasis (100). In contrast, Anabaena sp. strain PCC 7120 HepX (Alr0117, Chk52) has been reported to be involved in heterocyst development, in connection with the heterocyst envelope polysaccharide (97). It has orthologues not only in the other two heterocystous strains but also in the other two N2-fixing strains and, surprisingly, in the unicellular freshwater strain S. elongatus 7942.
Synechocystis sp. strain PCC 6803 Sll0798 (Chk30; termed RppB or NrsS) has been shown to control the Ni2+-dependent induction of the nrsBACD operon and to be involved in Ni2+ sensing (76). This member of the bacterial binding protein-dependent transport systems also appears to be present in A. variabilis and Anabaena sp. strain PCC 7120. In addition, inactivation of both sll0790 (hik31, chk31) and slr6041 (chk46), two HKI paralogues sharing 97.5% identity, indicated that their products are involved in the regulatory mechanisms that allow Synechocystis sp. strain PCC 6803 to adapt from photoautotrophic to photomixotrophic growth (62). This HKI appears to be required both for the expression of icfG (encoding glucokinase) and for the modulation of glucose-6-phosphate dehydrogenase, and thus to have a dual role.
HKII. HK class II (HKII) comprises the putative proteins in which the HK is linked only to one or more GAF and/or PAS or PAC domains. These domains occur in quite large numbers in bacteria and euryarchaeota (40), with PAS domains being more common than GAF domains, except in Synechocystis sp. strain PCC 6803. The large number of GAF domains, compared with most other bacteria, correlates with and underlines the role of light in the regulation of gene expression and metabolic activities in photosynthetic organisms (40). None of the seven marine unicellular non-N2-fixing strains has any HKII+GAF, and only four of them have HKII+(PAS)1-3. Within this class, there are 68 ORFs with only GAF sensor domains, 50 with PAS/PAC domains, and 28 with both. The analysis by Narikawa et al. (95) identified 17 PAS-containing ORFs in Synechocystis sp. strain PCC 6803, 61 in Anabaena sp. strain PCC 7120, and 84 in N. punctiforme (compared to 9 for E. coli and 10 for B. subtilis).
Only one orthology group, Chk2 (HKII+GAF), has one protein from each of the 16 species, but for G. violaceus and the marine Synechococcus and Prochlorococcus spp. the proteins are shorter and lack a detectable GAF domain; they are thus classified as HKI. Functional data have been reported for one of its members, Synechocystis sp. strain PCC 6803 Slr1147 (Hik2), which is thought to interact with the response regulator Rre1, as does Slr1285 (Hik34, which has no detectable HATPase_c domain; see above). In this strain, it appears to regulate the expression of sigB and four other genes in response to hyperosmotic stress (105, 126).
Within HKII, the HKII-phytochrome subclass contains proteins of well-established function. Synechocystis sp. strain PCC 6803 Slr0473 (Cph1, Chk35), for example, has been characterized as a photoreceptor (35, 36, 149). Light-induced conformational change of the chromophore in Cph1 results in inhibition of the histidine kinase activity (35). Two paralogues, aphA and aphB, exist in Anabaena sp. strain PCC 7120 as well as in the two other heterocystous strains (101). Besides these four strains, only C. watsonii has a Cph1-like protein, as well as a paralogue that does not group with AphB. The other marine species do not have any. Notably, orthologues are not as widely distributed as might have been expected from the study of the chromophore-binding (PHYT) domain of these proteins (49), which showed that most of the cyanobacteria examined share a rather well-conserved chromophore-binding sequence. The other HKs with multiple GAF domains come essentially from the filamentous heterocystous strains. Following the observation that red light decreased whereas far-red light increased cellular cAMP content in Anabaena sp. strain PCC 7120, Ohmori and coworkers disrupted 10 ORFs having putative chromophore-binding GAF domains. The all2699 (chk65, aphC) mutant failed to respond to far-red light. They concluded that the far-red light signal could be received by AphC and then transferred to the N-terminal RR domain of the CyaC adenylyl cyclase, stimulating its catalytic activity. The increased cAMP concentration would then drive the subsequent signal transduction cascade (104).
About half of the HKII+(PAS)1-3 subclass corresponds to one orthology group (Chk7). This group consists of proteins from 12 species, not all of which possess a PAS domain. Those that lack a PAS domain have an N-terminal TM domain instead, and the T. elongatus orthologue (tll0925) has both. One member, 7942_403099950, was identified as SphS, a sensor whose cognate response regulator is SphR (Crr29), by complementation of an E. coli phoR creC mutant for the expression of alkaline phosphatase (1). The two genes are adjacent, with the RR gene upstream of the HK gene. The S. elongatus 7942 mutant that lacks these genes is defective in the production of alkaline phosphatase and of some inducible proteins in response to phosphate limitation. This was one of the very first cyanobacterial two-component systems to be characterized. The Synechocystis sp. strain PCC 6803 Hik7 and Rre29 orthologues have since been shown to be the dominant sensory system controlling gene expression in response to phosphate limitation (51, 136). Murata and coworkers (136) suggested that a two-component system homologous to SphS-SphR is likely conserved in all cyanobacterial species. However, no direct orthologue could be detected in T. erythraeum, Synechococcus sp. strain 9902, or P. marinus SS120 and MIT9313. T. erythraeum does have an HKII+(PAS)1-3 (Chk78) that is orthologous to the Anabaena sp. strain PCC 7120 and N. punctiforme paralogues of Chk7.
Putative histidine kinases with either PAS or PAS/PAC domains, occurring in single or multiple copies, were found mainly in the filamentous heterocystous strains; there is one in T. erythraeum and two in Synechocystis sp. strain PCC 6803. Ten of the 24 putative proteins do not have any orthologues, and one of them (NpunpNPAR133) is plasmid encoded.
Another HKII subclass is made up of proteins with both PAS and/or PAC and GAF domains. Such proteins are absent from marine strains, except C. watsonii. The designation Ssl1473-75 (Chk32) is used in Table 1 and Fig. 2 because it corresponds to the Synechocystis sp. strain PCC 6803 wild-type sequence, which is interrupted by an insertion sequence (IS) element in the sequenced "Kazusa" strain (103). This fusion protein is about 40% identical to the Fremyella diplosiphon (Tolypothrix PCC 7601) RcaE protein (GAF-PAS-PAC-HK), which has been shown to be a photoreceptor involved in complementary chromatic adaptation (137). Microarray data obtained with a Synechocystis sp. strain PCC 6803 chk16 mutant suggest that the Chk16 protein could be directly involved in sensing NaCl concentration (80). Under hyperosmotic conditions, it may be part of a phosphorelay cascade involving Synechocystis sp. strain PCC 6803 Chk41 (Hik41) and Crr17 (Rre17) (105, 126). Interestingly, the Synechocystis sp. strain PCC 6803 and C. watsonii Chk16s, which possess an N-terminal MASE1 domain (of currently unknown function [96]) upstream of a GAF (Phyt_2) domain, have orthologues in Anabaena sp. strain PCC 7120 and A. variabilis that have only a PAS domain. The N. punctiforme Chk16 orthologue has a GAF (Phyt) domain between the PAS/PAC and HK domains. Notably, in each of the three heterocystous strains, the closest paralogues of Chk16 (i) have rather similar structures, (ii) are orthologues (Chk74), and (iii) are located immediately downstream of the chk16 genes. Gene duplication thus probably occurred in an ancestor common to these three strains before their divergence, and the two genes, chk74 and chk16, subsequently evolved differently.
HKIII. Kinases of HK class III (HKIII) possess a HAMP (or "linker") domain in addition to the HisKA and HATPase domains. The HAMP domain is typically found downstream of the last TM segment of a protein, and it has been shown that two symmetrical HAMP domains dimerize and cooperate to transfer the signal across the membrane via a linker to the histidine kinase (155). The presence of a HAMP domain thus suggests that the corresponding putative proteins function as dimers. In many cases, HAMP is involved in transmitting signals across a membrane from periplasmic ligand-binding domains (6, 7, 10). The HAMP domain is located upstream of HisKA. One protein has a PAC domain (Chk159, 7421gll0814), 16 have a PAS domain (Chk33), and 6 have a Cache domain upstream of the HAMP; of these 6, one has a GAF domain (Chk161, NpunNpF6040) between the two domains and two have PAS/PAC domains (Chk155 and Chk177). Cache is a signaling domain found in animal calcium channel subunits and in a certain class of prokaryotic chemotaxis receptors; it is thought to form an extracellular or periplasmic ligand sensor (4). All six Cache-containing proteins originate from filamentous N2-fixing strains, and four were found in the endosymbiosis-forming species N. punctiforme.
Synechocystis sp. strain PCC 6803 Chk10 (Hik10) has been reported to be involved in the response to hyperosmotic stress, forming a pair with the response regulator Crr3 (Rre3) (105). No function has been described for its orthologues. Another HKIII (Chk33), which possesses a PAS domain, is also thought to be involved in this stress response. Remarkably, orthologues of this Hik33 protein exist in all 16 genomes, and they are the only cyanobacterial proteins with this architecture. Other bacterial orthologues (with no function yet defined) are at present restricted to the Firmicutes (gram-positive bacteria). This protein (termed DspA or Hik33 in Synechocystis sp. strain PCC 6803 and NblS in S. elongatus) has been reported to sense many environmental cues: cold, osmotic changes, high light, and nutrient limitation (56, 80, 87, 90). Since it is present even in the strains that have only a small number of two-component systems, it likely plays a key role in cyanobacteria by integrating cellular metabolism with environmental parameters. It also has homologues, termed Ycf26, in the plastid genomes of the red algae Porphyra purpurea, Gracilaria tenuistipitata, and Cyanidium caldarium. The cyanobacterial sequences, as well as those from G. tenuistipitata and P. purpurea, have a unique putative periplasmic signaling domain that has not been detected in any other protein (90).
HKIV. The HK class IV (HKIV) polypeptides have an N-terminal S/T kinase domain and a C-terminal histidine kinase domain, with GAF domains in between. They are restricted to species belonging to the order Nostocales, i.e., filamentous heterocystous N2-fixing strains (11 to 13 each), with the exception of T. erythraeum, which has one. These proteins are of particular interest because they could directly couple Ser/Thr kinase activities with transduction pathways involving two-component systems. One of them, HstK (Alr2258, Chk99) from Anabaena sp. strain PCC 7120, has been characterized; its expression depends on the type of nitrogen source available (109). Anabaena sp. strain PCC 7120 Alr0709 (Chk162) and Alr0710 (Chk107) are very large proteins (1,799 and 1,796 aa, respectively) that have the same modular organization and are adjacent on the chromosome; they align along their entire length, with only one gap (10 aa) in the middle. They are the closest paralogues, with 63% identity and 74% similarity. The same physical organization exists for Avar_400222710 (Chk165) and Avar_400222720 (Chk101), and these two proteins are 61% identical. Only Chk101, however, has an orthologue in Anabaena sp. strain PCC 7120, which is neither Chk165 nor Chk107. Gene duplications thus probably occurred rather recently, i.e., after the divergence of the two species. Four HKIVs have a second GAF domain, and one protein from Anabaena sp. strain PCC 7120, one from A. variabilis, and two from N. punctiforme have PAS and/or PAC domains between the GAF and HK domains. They are all about 2,000 residues or more. The physiological functions of these proteins should be examined closely to determine the role of each kinase and whether they act independently or synergistically, or whether these proteins are nodes receiving signals from two different transduction pathways to achieve a single output function.
HKV. The last class, HK class V (HKV), contains 37 multidomain proteins that combine different types of domains with a histidine kinase. One group of orthologous proteins (Chk8) has a representative in every genome except that of G. violaceus. The S. elongatus 7942 (SasA) and Synechocystis sp. strain PCC 6803 orthologues have been characterized. They are clock-associated histidine kinases, necessary for the robustness of the circadian rhythm of gene expression, and have been implicated in clock output (57, 61) as well as in heterotrophic carbohydrate metabolism when cells are grown in light-dark cycles (127). The protein from Synechocystis sp. strain PCC 6803 has been crystallized, and its structure has been determined at 1.9-Å resolution; it forms an open tetramer (52). Its cognate response regulator, tentatively named SasR, awaits identification. Another group (Chk178, Chy178) contains a G. violaceus protein that combines CheB, CheR, PAS, and HK domains, the Anabaena sp. strain PCC 7120 and A. variabilis orthologues being hybrid kinases (HYII) with an additional C-terminal RR. Within this subclass, which contains proteins with cNMP-binding domains, the Chk110 group gathers orthologues from quite distant strains: G. violaceus, presumed to be at the root of the cyanobacterial lineage, and N. punctiforme, which has the largest genome among the characterized cyanobacteria and the most complex ecophysiology.
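The five HK classes described above are defined by which domains accompany the HisKA/HATPase_c core, so they can be sketched as a rule-based classifier. This is a minimal illustration, not the procedure used to build Table 1 or Fig. 2. Its assumptions are as follows. Domain names are given N- to C-terminally using the labels of the text. "TM" marks a transmembrane segment. "STK" is a hypothetical label for the S/T kinase domain. "RR" marks a receiver domain, whose presence makes the protein a hybrid kinase (HY) rather than an HK.

```python
from typing import Sequence, Set

PHOSPHOACCEPTORS = {"HisKA", "HisKA_2", "HisKA_3"}
CORE = PHOSPHOACCEPTORS | {"HATPase_c"}
IGNORED = {"TM"}  # transmembrane segments do not change the class here

HKII_SENSORS = {"GAF", "PHYT", "PAS", "PAC"}
HKIII_ALLOWED = {"HAMP", "PAS", "PAC", "Cache", "GAF"}
HKIV_ALLOWED = {"STK", "GAF", "PAS", "PAC"}  # "STK": hypothetical S/T kinase label


def hk_class(domains: Sequence[str]) -> str:
    """Assign an N- to C-terminal domain list to HKI-HKV (simplified rules)."""
    present: Set[str] = set(domains)
    if "HATPase_c" not in present or not present & PHOSPHOACCEPTORS:
        return "incomplete"
    if "RR" in present:
        return "HY"  # a fused receiver domain makes it a hybrid kinase
    extra = present - CORE - IGNORED
    if not extra:
        return "HKI"  # HisKA + HATP only (TM segments allowed)
    if extra <= HKII_SENSORS:
        return "HKII"  # only GAF and/or PAS/PAC-type sensors
    if "HAMP" in extra and extra <= HKIII_ALLOWED:
        return "HKIII"  # HAMP linker, optionally with PAS/PAC/Cache/GAF
    non_tm = [d for d in domains if d not in IGNORED]
    if non_tm[0] == "STK" and "GAF" in extra and extra <= HKIV_ALLOWED:
        return "HKIV"  # N-terminal S/T kinase, GAF in between
    return "HKV"  # any other multidomain combination


print(hk_class(["TM", "TM", "HisKA", "HATPase_c"]))              # HKI
print(hk_class(["STK", "GAF", "GAF", "HisKA", "HATPase_c"]))     # HKIV
print(hk_class(["CheB", "CheR", "PAS", "HisKA", "HATPase_c"]))   # HKV
```

Rules this coarse will misplace some proteins. For instance, the MASE1-containing Chk16 proteins, which the text groups with HKII, fall into HKV here, so the sketch should be read as a summary of the class definitions rather than a replacement for the manual classification.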
A few of these proteins have known functions. PilH (Crr7, Rre7, taxAY3) is required for motility in Synechocystis sp. strain PCC 6803 (151), and orthologues are also found in T. elongatus and the five N2-fixing species. Another RRI-CheY, PisH or PixH (Crr35, Rre35), is required for positive phototactic movement (152); orthologues exist only in the three filamentous heterocystous strains. Rcp1 (Crr27, Rre27) is the cognate response regulator of the phytochrome Cph1 (Chk35, Hik35 [150]). Orthologues are found only in the strains that possess such an HKII phytochrome-like protein (Chk35), and they are always adjacent to and downstream of the corresponding gene. Anabaena sp. strain PCC 7120 DevR (Alr0442, Crr42) and HepK (All4496, Chk86) form the first two-component system identified that regulates the biosynthesis of a polysaccharide as part of a patterned differentiation process (154). Orthologues can be found not only in the other N2-fixing strains but also in S. elongatus 7942 and Synechocystis sp. strain PCC 6803. In the latter, the Crr42 orthologue is annotated as DivK, a cell division response regulator, but on grounds that have not been explained; it is 66% identical and 79% similar to DevR. All of the Crr42 orthologues are adjacent to, and divergently transcribed from, genes that are also orthologues and potentially encode subunit A of DNA gyrase/topoisomerase IV. Since heterocysts do not divide, the phenotype observed for the devR mutant may result from global regulation involving chromosome structure.
About 80% of the small RRI-CheY domains are less than 150 aa long. The absence of any identifiable output domain raises the question of their mode of action. Each of these probably interacts with no more than one partner besides its cognate kinase. A phosphorylated (P-RR) and a nonphosphorylated (RR) form would be in equilibrium, probably differing in conformation. Under specific conditions, the cognate kinase provides a phosphate (P) to form P-RR, which could then establish specific interactions with a partner whose activity it regulates, either positively or negatively. In E. coli, after autophosphorylation of the CheA histidine kinase, the phosphoryl group is transferred to CheY, an RR that then interacts with flagellar motor proteins (22, 145). Rhizobium meliloti, which does not possess CheZ, has two cognate CheYs (about 120 aa long) that interact with CheA: phospho-CheY2 (CheY2-P) is the chief regulator of flagellar rotation, its action being modulated by CheY1, which functions as a phosphatase of CheY2-P and becomes a sink for phosphate (129). A similar process may occur in Rhodobacter sphaeroides, which has two classic and two atypical CheA proteins and eight associated response regulators (six CheY proteins and two CheB proteins [111, 112]), as well as in cyanobacteria, which also lack CheZ homologues but possess a large number of "CheY"-like proteins. It will be of interest to determine whether the expression levels of the cyanobacterial genes and/or the protein levels change upon alterations in the environment, and whether the gene products have a specific intracellular location.
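The equilibrium between RR and P-RR, and the effect of a phosphate sink such as CheY1, can be illustrated with a toy two-state model. This is an assumption-laden sketch, not a model taken from the cited studies. It assumes first-order kinetics, with phosphorylation by the cognate kinase at rate k_kin, intrinsic dephosphorylation at rate k_dephos, and a sink treated simply as an extra first-order drain k_sink on P-RR, which simplifies the actual R. meliloti mechanism. All rate constants are hypothetical.

```python
def steady_state_p_rr(k_kin: float, k_dephos: float, k_sink: float = 0.0) -> float:
    """Steady-state fraction of the regulator in the P-RR form.

    Toy model: dP/dt = k_kin * (1 - P) - (k_dephos + k_sink) * P,
    where P is the phosphorylated fraction. Setting dP/dt = 0 gives
    P* = k_kin / (k_kin + k_dephos + k_sink).
    """
    return k_kin / (k_kin + k_dephos + k_sink)


def simulate_p_rr(
    k_kin: float,
    k_dephos: float,
    k_sink: float = 0.0,
    p0: float = 0.0,
    dt: float = 0.01,
    t_end: float = 10.0,
) -> float:
    """Integrate the same equation with explicit Euler steps from P(0) = p0."""
    p = p0
    for _ in range(int(round(t_end / dt))):
        p += dt * (k_kin * (1.0 - p) - (k_dephos + k_sink) * p)
    return p


# Hypothetical rate constants (arbitrary units), for illustration only.
print(steady_state_p_rr(1.0, 1.0))              # 0.5
print(steady_state_p_rr(1.0, 1.0, k_sink=2.0))  # 0.25
```

In this picture, adding a sink lowers the steady-state level of P-RR without changing the kinase input. That is the qualitative effect attributed to CheY1 above, and it is one way a second CheY-like protein could tune the output of the first.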
The same basic RR-CheY domain also occurs in ORFs of more than 200 residues, usually about 400 aa long, with no characteristic associated domains. These have been classified as RRI PatA, because one such protein from Anabaena sp. strain PCC 7120, All0521 (Crr65), was the first of this group to be characterized. Its name comes from the phenotype of the corresponding mutant, which is impaired in the pattern formation of heterocysts (73). Another member of this class, Sll0038 (Rre36, Crr36), has been studied; it is part of the pathway for perception and transduction of low-temperature signals and might specifically regulate the expression of the desB gene in Synechocystis sp. strain PCC 6803 (135). Crr36 orthologues exist in the three filamentous heterocystous strains.
Another subclass, RRI-other (RRVI in Ohmori's nomenclature [102]), also contains a single RR domain in a polypeptide of more than 200 aa, with no other (as yet) identifiable domain and only low overall sequence identity with PatA-type ORFs. This subclass consists mostly of one group of orthologues, Crr23 (Rre23, previously named Ycf55). Orthologues exist in all strains except G. violaceus; however, those from the marine unicellular non-N2-fixing strains, T. elongatus, and S. elongatus no longer have a canonical RR domain: their N-terminal sequences lack the critical D and K residues that make RRs recognizable. They have nevertheless been kept in Fig. 2 and 3 because PBLAST searches performed with this N-terminal region, although it is less conserved than the C-terminal part, still retrieve the RR domains of the other orthologues. No function has yet been assigned to this probably very ancient and well-conserved protein, which is present only in photosynthetic organisms.
RRII. RR class II (RRII) proteins are the more "classical" RRs, in that they correspond to the structure of the first described response regulators; all are two-component DNA-binding response regulators. They have an N-terminal RR domain fused to an output DNA-binding domain, either a T_reg (OmpR type [81]), an HTH_LuxR (or Ger_E; LuxR/NarL type), or an AraC domain. They therefore probably function as transcriptional regulators. Examples of these RRs are found in all cyanobacterial species, with OmpR-type RRs (4 to 19 per species; 141 in total) outnumbering NarL-type RRs (1 to 16 per species; 89 in total). Almost all of the RR repertoire found in unicellular non-N2-fixing strains belongs to this class, the rest (at most two proteins) being RRIs.
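The RR classes introduced so far can be summarized as rules on domain content and protein length. This is a minimal sketch under stated assumptions. Domains are given N- to C-terminally using the labels of the text, with "RR" for the receiver domain. The 200-residue cutoff between small RRI-CheY proteins and the longer RRI PatA/RRI-other proteins is inferred from the ">200 residues" wording above and is an assumption. Separating PatA from RRI-other requires sequence similarity, which this sketch does not attempt. Classes RRIII and RRIV are not covered.

```python
from typing import Sequence

# DNA-binding output domains named in the text and the RRII subclass each defines.
OUTPUT_SUBCLASS = {
    "T_reg": "RRII OmpR type",
    "HTH_LuxR": "RRII NarL type",
    "AraC": "RRII AraC type",
}


def rr_class(domains: Sequence[str], length_aa: int) -> str:
    """Rough RR class from an N- to C-terminal domain list and protein length."""
    if "RR" not in domains:
        return "not an RR"
    if list(domains) == ["RR"]:
        # No identifiable output domain: RRI. The (assumed) 200-aa cutoff
        # separates small CheY-like proteins from the longer ones.
        return "RRI-CheY" if length_aa <= 200 else "RRI PatA or RRI-other"
    if len(domains) == 2 and domains[0] == "RR" and domains[1] in OUTPUT_SUBCLASS:
        # N-terminal RR fused to a DNA-binding output domain: RRII.
        return OUTPUT_SUBCLASS[domains[1]]
    return "other class (not covered by this sketch)"


print(rr_class(["RR"], 120))           # RRI-CheY
print(rr_class(["RR", "T_reg"], 230))  # RRII OmpR type
```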
(i) OmpR-type subclass (T_reg output domain). Within the OmpR-type subclass, there are three groups of orthologues with a member in each of the 16 genomes; the NarL subclass has one such group. Two of these OmpR-type groups, Synechocystis sp. strain PCC 6803 RpaA (Crr31, Rre31) and RpaB (Crr26, Rre26, Ycf27), have been linked to long-term regulation of energy distribution by phycobilisomes (12). RpaA is also thought to be a partner of Hik33 (also termed DspA or Chk33), and Ycf26 orthologues are present in all strains (see above). Synechocystis sp. strain PCC 6803 Sll0649 (Crr3 or Rre3), which has five orthologues, is thought to pair with Hik10 (Slr0533 or Chk10), which also has orthologues in the same five strains (see Table S4s in the supplemental material). These two pairs are involved in the response of Synechocystis sp. strain PCC 6803 to hyperosmotic stress (105, 126). Interestingly, the Chk10 HKIII is adjacent to and downstream of Crr3 in all strains except Synechocystis sp. strain PCC 6803. In contrast, the Crr31 response regulators and Chk33 kinases are never adjacent in any of the species. For the third group of 16 proteins (Crr37), none of the RRs is adjacent to a histidine kinase, and all of the corresponding genes except G. violaceus glr2274 are monocistronic transcriptional units, the adjacent genes on both sides being divergently transcribed. Expression of the Anabaena sp. strain PCC 7120 representative (all4312) is directly controlled by the global nitrogen regulator NtcA, suggesting that Crr37 might be involved in cellular responses to nitrogen deprivation. The fourth group, in the NarL subclass, is Crr1 (Ycf29), which also has orthologues in algal plastid genomes (see below).
SphR/PhoB (Crr29) is the partner of the histidine kinase SphS (Chk7), which regulates the pho regulon in the phosphate limitation signaling pathway (see above) (1, 136). Its orthologues have the same distribution as SphS, but their genes do not form an operon with those for the Chk7 proteins. Another group (Crr28) comprises 12 sequences; there is no representative in T. elongatus or in the Prochlorococcus spp. except MIT9313. No function is known for any of these; the only information available is that, in Synechocystis sp. strain PCC 6803, a Kdp kinase (Slr1731, Ctc1) might transfer a phosphoryl group to 6803sll0396 (Crr28).
ManR (Crr16, Rre16) regulates manganese homeostasis in Synechocystis sp. strain PCC 6803 together with the HKI ManS (Chk27, Hik27) (100, 148). ManR orthologues exist in all of the strains that possess ManS, but their genes are never adjacent to those of their putative cognate kinases. NblR (7942_403113030, Crr73) has been described as an NblS partner that regulates expression of NblA, a protein required for the degradation of phycobilisomes under stress conditions in S. elongatus, but its precise cognate kinase awaits identification (125). Crr73 orthologues with more than 60% identity are found only in the N2-fixing species and in T. elongatus. Another group, Crr71, consists of seven sequences, one from each of the unicellular marine non-N2-fixing strains plus S. elongatus 7942. RppA (Sll0797, NrsR, Crr33) is the partner of RppB (Sll0798, NrsS, Chk30) and is encoded immediately upstream of it in the Synechocystis sp. strain PCC 6803 genome. This pair was first implicated in redox control of photosynthesis and pigment-related genes (71) and, more recently, in nickel sensing (76). No RppA orthologue was found, although Chk30 proteins also appear to exist in Anabaena sp. strain PCC 7120 and A. variabilis; in these two strains, however, no RR gene is adjacent.
(ii) NarL subclass (LuxR output domain). Relatively few NarL-type RRs (14) have assigned functions. Ycf29 (Crr1) is the only one found in all 16 genomes (Fig. 2 and 3). As mentioned above, the Slr1783 protein (Rre1) would be the partner of Hik2 and Hik34 in the response of Synechocystis sp. strain PCC 6803 to hyperosmotic stress (105, 126). In this strain, crr1 may be an essential gene, as no group has reported fully segregated interposon mutants (V. Zinchenko, CyanoMutants, at http://www.kazusa.or.jp/cyano/; N. Burnett, personal communication). Copies of this gene are also found in the plastid genomes of the red algae Porphyra purpurea, Cyanidioschyzon merolae, Gracilaria tenuistipitata, and Cyanidium caldarium, of the cryptophyte Guillardia theta, and of the glaucophyte Cyanophora paradoxa.
In Anabaena sp. strain PCC 7120, the RRII-NarL OrrA (Alr3768, Crr81) has been found to be involved in the response to osmotic stress (124). It is not an orthologue of either of the two proteins, Crr3 and Crr31, identified in similar stress responses in Synechocystis sp. strain PCC 6803, but it has orthologues in the other two filamentous heterocystous species.
(iii) AraC subclass (AraC output domain). The last RRII group has an RR domain fused to HTH_AraC domains, which, as a pair, form the DNA-binding domain of the AraC family of transcriptional regulators (139). In general, AraC-family transcriptional regulators are classified as having any receiver domain fused to the HTH_AraC domains. Only nine cyanobacterial sequences were found to have an RR fused to AraC. As is usual in this family, the HTH motif is situated toward the C terminus. The three-dimensional structure of an AraC-family protein, E. coli MarA, has been solved; it showed that the two HTH_AraC subdomains are separated by 27 Å, which causes the cognate DNA to bend. There is a single such gene in Synechocystis sp. strain PCC 6803 and A. variabilis, two in Anabaena sp. strain PCC 7120, and five in N. punctiforme (one plasmid encoded); only one orthology group (Crr90) has members in all three filamentous heterocystous strains.
RRIII. Some cyanobacterial response regulators have two or even three RR domains, together with T_reg and Hpt (for "histidine phosphotransfer") domains and, in one group, GGDEF domains. The ORFs that have one RR upstream and two downstream of the T_reg-Hpt domains are from the heterocystous N2 fixers, plus one from G. violaceus (Crr93). They presumably function as conditional transcriptional regulators via phosphotransfer relays. Hpt domains are known to interact with more than one RR domain and are thus particularly well suited for cross talk. The recently demonstrated coordination of RpoS synthesis and proteolysis in E. coli by the two-component phosphotransfer network involving ArcB, ArcA, and RssB is a good example (86). RcaC from F. diplosiphon, which has been described as involved in complementary chromatic adaptation (30), has a domain organization similar to that of Crr93. Both its N-terminal RR and Hpt domains were found to be important for the light-regulated control of phycocyanin gene expression, whereas the C-terminal RR played only a minor role (72).
RRIV. The vast majority of the proteins in RR class IV (RRIV) do not have any DNA-binding domain, but a number have output domains with putative catalytic activities. More than 40% of these polypeptides possess a GGDEF domain, also named DUF1. This domain was first recognized in Caulobacter crescentus PleD, a response regulator controlling cell differentiation, before being found in proteins involved in cellulose biosynthesis, cell adhesion, or aggregation (119). It is highly "promiscuous," being found as a module associated with a multitude of different domains. PleD has recently been shown to possess catalytic diguanylate cyclase activity (107). Expression of recombinant GGDEF domains from ORFs found in six very different bacteria (including Synechocystis sp. strain PCC 6803 Slr1143, a GAF-GGDEF protein) demonstrated that (i) they all possess diguanylate cyclase activity and (ii) for Borrelia burgdorferi Rrp1 (an RR-GGDEF protein), phosphorylation of the RR is required for activity of the GGDEF domain (120). GGDEF domains thus appear to represent the output of complex bacterial signal transduction networks that convert different signals into the production of a second messenger, cyclic diguanylic acid (c-di-GMP). The cyclase activity is consistent with the correspondence between the GGDEF domain and the catalytic domain of adenylate cyclases (40, 108). GGDEF domains can be found associated with an EAL domain (also known as DUF2), which is a good candidate for a diguanylate phosphodiesterase function (40). The corresponding proteins would then have opposing cyclase and hydrolase activities (107). Cyclic diguanylate-specific phosphodiesterase activity has recently been demonstrated for an overexpressed E. coli ORF containing an EAL domain (122). Some cyanobacterial RRs exhibit this kind of association, sometimes with additional PAS and/or PAC domains, but most of them have only one of these two domains.
It is worth mentioning, however, that although Synechocystis sp. strain PCC 6803 has one such protein (Crr41), the strain contains very little, if any, c-di-GMP, at least under standard conditions (J. Houmard, unpublished data). Synechocystis sp. strain PCC 6803 Crr4 has both a GAF (Phyt_2) and a GGDEF domain fused to an RR. RRs with a GGDEF domain are found in all cyanobacterial species except the open-ocean non-N2 fixers. In two instances, multiple N-terminal RRs are associated with a GGDEF domain, one of these proteins (7942_403091170) also having a DNA-binding T_reg domain.
There are six examples of RRs with an HD (for "phosphohydrolase activity") output domain. This domain is found in enzymes such as cyclic nucleotide phosphodiesterases, 2'-nucleotidases, and phosphatases (8, 147). A knockout of the Synechocystis sp. strain PCC 6803 slr2100 gene (Crr20) results in changes in the intracellular cyclic nucleotide (cGMP) concentration and in an increased sensitivity of the cells to UV-B radiation (24). This protein is thus involved in cGMP homeostasis and light signaling. The other five RR-HD proteins form an orthology group. A Synechocystis sp. strain PCC 6803 crr18 (sll1624) null mutant has also been constructed; it did not exhibit a phenotype similar to that of the crr20 mutant (24). T. erythraeum is the only diazotroph that does not possess such a protein, but it is also the only one to have an RR-GuC, which probably has purine nucleotide cyclase activity (discussed below).
The protein phosphatase 2C-like domain (PP2C, also referred to as SpoIIE) is found in PP2C, in adenylate cyclase, and in SpoIIE, a protein known for its role in sporulation in Bacillus subtilis (17). Some of these proteins may have a role in cell division or differentiation. A PP2C domain is found as a C-terminal fusion to an RR in all filamentous species and in T. elongatus but not in C. watsonii. This distribution closely resembles that observed for the HKIVs, which have S/T kinase domains. In one orthology group (Crr100), an additional GAF domain is present. Finally, there are examples of N-terminal RRs fused to GAF, PAS, cNMP, CheC, CheW, CheB, Pyr_red, or IF2 domains. Many of these ORFs are found in only one species and may result from recent domain fusions.
HYI. HYI-type proteins are totally absent from S. elongatus and T. elongatus. About half of the "orthology" groups consist of a single protein. Only one HYI has a known physiological function: 6803sll1229 (Hik41, Chy41), which has been found to respond to salt (NaCl) stress together with Synechocystis sp. strain PCC 6803 Hik16 (Chk16) (80). Synechocystis sp. strain PCC 6803 has five "simple" HYIs; three of them (Chy38, Chy40, and Chy41) have, in some strains, another hybrid kinase encoded immediately upstream, and one (Chy23) is encoded next to an HK-RR pair (AphA-Rcp1). They may therefore belong to multiphosphorelay systems, although colocalization of the genes involved in a phosphorelay is not a prerequisite. Indeed, although Chy41 would be part of such a relay in the response of Synechocystis sp. strain PCC 6803 to hyperosmotic stress, its partners Chk16 and Crr17 are not encoded by adjacent genes (105, 126). Similarly, the Anabaena sp. strain PCC 7120 genes for AphC and CyaC, between which phosphotransfer has been reported (see below), are not closely linked. Some HYIs have a variable number of PAS and PAC domains between the RR and the HK plus, for a few of them, one or two GAFs. No function has yet been assigned to any of these ORFs.
For some HYIs, an HWE (HisKA_2) histidine kinase domain substitutes for the HisKA. Members of this family differ from most other HKs in lacking a recognizable F box and in having uniquely conserved residues: a His in the N box and the sequence WE in the G1 box (64). Though found in many different species, such proteins are not as widely distributed as HisKA-containing kinases; they are particularly abundant in the family Rhizobiaceae. HWE domains had not previously been detected in cyanobacteria, but the present analysis shows that each of the heterocystous species has one. Anabaena sp. strain PCC 7120 and A. variabilis each have a very large HWE kinase (Chy58, 1,700 aa long), which also has GAF and PAS-PAC domains. One N. punctiforme HYI (Chy109, NpF1799) has a HisKA_3 kinase domain, another HisKA alternative.
HYII. Only 11 of the 97 HYIIs have no additional domains. The others show 14 different domain associations, a large number of these ORFs having PAS, PAC, and/or GAF domains. About one-fifth of the HYII "orthology" groups have HK orthologues with a similar structural organization but without the C-terminal RR. For example, 7120all1716 and Avar_400180300 (Chy178) are orthologues of the HKIII-CheR/B 7421gll1854, and 7120all0978, Avar_400180780, and NpunNpF5679 (Chy179) are orthologues of the fairly similar G. violaceus HKV-HTH_4 (Chk179, Gll3736).
S. elongatus HYII-GAF (Chy24) corresponds to CikA, a bacteriophytochrome that resets the circadian clock (123). No orthologue exists in G. violaceus or in the genomes of the marine non-N2 fixers, and the domain structure differs between strains: in T. elongatus and the three filamentous heterocystous strains, it is an HKII-GAF(PHYT_2) (Chk24) without any RR. A detailed characterization of S. elongatus 7942 CikA showed that (i) it can covalently bind bilin chromophores in vitro, even though it lacks the expected ligand residues (it may not, however, serve as a photoreceptor itself); (ii) deletion of the GAF domain or of the N-terminal region adjacent to it dramatically reduced autophosphorylation of the HK domain, whereas elimination of the receiver domain increased this activity 10-fold; and (iii) the RR domain, which lacks the conserved aspartyl residue that serves as the phosphoryl acceptor in response regulators, would not work as a bona fide receiver domain in a phosphorelay but could interact with an unknown protein partner to modulate the autokinase activity of CikA (92). In CikA, both the GAF and the noncanonical RR modules would thus act as protein-protein interaction domains that induce conformational changes in another domain to modulate its activity.
There is one subclass that contains only four sequences, all from T. erythraeum. All of these ORFs have a C-terminal GuC domain and thus likely possess purine nucleotide cyclase activity. Although the presence of multiple nucleotide cyclases (AC/GC) has already been reported for cyanobacteria (see, for example, references 67 and 99), the different proteins usually had different domain arrangements. T. erythraeum has by far the highest number of such enzymes (13, compared to 5 or 6 for the heterocystous strains). Among the four HY-GuC proteins, three have the features required for adenylyl cyclase activity (Chy145, plus Chy129 and Chy130, which are adjacent on the chromosome), whereas the fourth (Chy131) has those of a guanylyl cyclase (99).
HYIII. Thirty-one ORFs differ from the previous hybrid kinases by having at least two C-terminal RRs in tandem. N. punctiforme NpR2263 is the only one that does not possess any additional domain, almost all of the others having PAS and/or PAC domains or GAFs. One member of this subclass, Anabaena sp. strain PCC 7120 Alr2279 (Chy133), has an additional N-terminal HNOBA domain (not identified by Pfam). The HNOBA domain could potentially contain a PAS-like fold. A homologous domain is also found in the first 200 aa of N. punctiforme NpR4835 (Chk50 [58]); the two other Chk50s do not have it. HNOBA domains functionally interact with HNOB (for "heme-NO-binding") domains located on a second protein. The HNOB domain is predicted to function as a heme-dependent sensor for gaseous ligands (NO, CO, or possibly O2). Proteins carrying such domains (7120alr2278 and its orthologue NpunNpR4836) are encoded by the upstream genes in the two cyanobacterial examples. As stated by Iyer et al. (58), the co-occurrence of the HNOB and HNOBA domains either in the same protein or in proteins encoded by the same operon suggests a strong functional interaction between them. The potential role, if any, of NO in cyanobacteria deserves further study.
About one-third (13/31) of the "orthology" groups have only one representative, and another third have orthologues but with a different domain structure. Synechocystis sp. strain PCC 6803 Chy21 is of particular interest: it is, at present, the only cyanobacterial two-component protein that has an MHYT domain, a newly identified conserved protein domain with a likely signaling function (39). A model of the membrane topology of the MHYT domain indicates that its conserved residues could coordinate one or two copper ions, suggesting a role in sensing oxygen, CO, or NO. The gene for Chy21 lies just upstream of, and is cotranscribed with, the gene for the Chy40 HYI, which is followed by the gene for the RRIV-HD Crr20, a protein involved in cGMP homeostasis and UV-B response (see above) (24). This cluster, which lies very close to the gene for Chy22 (HYII-GAF-PAC-Hpt), could thus form a large multiphosphorelay system that senses changes in environmental parameters and involves cGMP as a second messenger (98). For example, cyclic nucleotide concentrations have already been shown to vary in some cyanobacteria upon oxic-anoxic transitions (reviewed in reference 26).
HYIV. The HYIV hybrid kinases have RRs on both sides of the kinase domain. All but one of the orthology groups have additional sensing domains: PAS (plus PAC for most of them) and/or GAF. One group, Chy90, comprises sequences with a histidine kinase, four RRs, and up to seven different types of associated domains. G. violaceus Chy90 is about 1,000 residues shorter than the three others and is annotated as a (PAS/PAC)2-HK-RR HYII. However, a gene for a 611-aa-long RR [Glr4211, RR-T_reg-(RR)2] lies immediately upstream, the A of its stop codon also serving as the first base of the ATG start codon of Glr4212 (Chy90). A careful analysis of the sequence would be required to rule out a sequencing frameshift error. In all four strains, the Chy90 gene is followed by an adjacent RR gene, and these RRs are also orthologous. The cluster organization would thus have been conserved through evolution from G. violaceus to the filamentous heterocystous strains.
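The overlapping arrangement just described, in which the final A of the Glr4211 stop codon is also the first base of the Glr4212 ATG, is easy to search for in a nucleotide sequence. The Python sketch below is illustrative only: it runs on a made-up toy sequence rather than the G. violaceus data, assumes the stop codon is TAA or TGA (the only stop codons ending in A), scans the forward strand only, and uses a hypothetical function name. Because the ATG begins two bases after the start of the stop codon, the downstream ORF is read in a different frame from the upstream one, which is consistent with the concern that a single-base frameshift error could split one reading frame into two ORFs.

```python
# Stop codons whose last base is A; only these can share that A with an ATG.
STOPS_ENDING_IN_A = {"TAA", "TGA"}


def find_stop_start_overlaps(seq: str) -> list[tuple[int, int]]:
    """Return 0-based (stop_index, start_index) pairs wherever the final A of a
    TAA/TGA stop codon is also the first base of an ATG (hypothetical helper).

    Only the forward strand is scanned and reading frames are not checked:
    each hit is a candidate arrangement to inspect, not a gene call.
    """
    seq = seq.upper()
    overlaps = []
    for i in range(len(seq) - 4):
        if seq[i:i + 3] in STOPS_ENDING_IN_A and seq[i + 2:i + 5] == "ATG":
            overlaps.append((i, i + 2))
    return overlaps


if __name__ == "__main__":
    # Toy sequence (not real data): ORF1 = ATG AAA CCC TGA, ORF2 = ATG GGT TAA.
    toy = "ATGAAACCCTGATGGGTTAA"
    for stop, start in find_stop_start_overlaps(toy):
        # The ATG starts 2 nt after the stop codon, so the two ORFs lie in
        # different frames (positions modulo 3 differ).
        print(stop, start, stop % 3, start % 3)  # 9 11 0 2
```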
Synechocystis sp. strain PCC 6803 Chy19 (Hik19) is encoded by an apparently essential gene (mutations could not be fully segregated) and is involved in the transduction of low-temperature signals (135). It might function downstream of Chk33 (DspA), transducing the low-temperature signal by phosphorylating Crr36 (PixG), which in turn controls desB gene expression.
The HYIV+GAF-GuC Chy89 proteins, known as CyaC, are present in every filamentous strain, whether N2 fixing or not; an orthologue is also present in Spirulina platensis (66). Another orthologue has been found in Tolypothrix sp. strain PCC 7601, also known as Calothrix sp. strain PCC 7601 or Fremyella diplosiphon (L. Jia and J. Houmard, unpublished data). Studying CyaC from S. platensis, Kasahara and Ohmori (65) demonstrated that the HK domain autophosphorylates and transfers the phosphoryl group to the adjacent C-terminal RR domain, whereas the N-terminal RR domain, separated from the HK domain by two GAF domains, is not phosphorylated by it. Replacement of the conserved aspartate residue by alanine in the N-terminal RR did not affect activation of the cyclase in vitro. S. platensis CyaC has been crystallized, and the mechanism of its activation by bicarbonate has been studied (130). CyaC is one of the six AC/GC purine nucleotide cyclases found in Anabaena sp. strain PCC 7120 (101). Because a cyaC mutant has a very low cAMP level, CyaC has been proposed to be responsible for maintaining the steady-state level of cellular cAMP. On the other hand, it has been demonstrated that in Anabaena sp. strain PCC 7120 the phytochrome-like AphC (Chk65) mediates the increase in cAMP concentration induced by far-red light. Okamoto et al. (104) have proposed a model in which far-red illumination provokes the autophosphorylation of AphC, followed by phosphotransfer to the N-terminal RR domain of CyaC (Chy89). The HK domain of CyaC would then autophosphorylate, the phosphoryl group would be transferred to the downstream RR, and the catalytic domain would in turn be activated. The cAMP produced could then, through binding to CRP-like proteins, regulate different adaptation processes. This is one of the very first two-component phosphorelay signal transduction mechanisms described for cyanobacteria.
HYV. A few hybrid kinases possess two complete HK domains and one or two RR domains. None has any known or putative function yet. They also have either PAS or PAC domains. N. punctiforme Chy139 (NpF2346) has an N-terminal UPF/RHH_2-type domain, which has been described in a few 80-aa-long hypothetical proteins of unknown function belonging to the MetJ/Arc repressor superfamily clan.
HYVI. Another group of 25 ORFs all have an additional CheW domain between the HATP and the RR, as well as an Hkd and/or Hpt domain upstream of these. They would thus be involved in chemotaxis signaling mechanisms. Analysis of mutant phenotypes has shown that Synechocystis sp. strain PCC 6803 Chy18 (PixL, TaxAY1, Hik18) and Chy39 (TaxAY2, Hik39) regulate phototaxis (20). Chy43 could represent the C-terminal part of a CheA-like protein required for motility, transformation competency, and the assembly of thick pili, the N-terminal Hpt domain of this CheA-like protein being separately encoded by pilN (Hik36 and Ctc36, an orphan Hpt protein). Although it shares a very similar organization and gene repertoire with the tax1 cluster (sll0038 to sll0043), the tax2 cluster (sll1291 to sll1296) would not be involved in motility (19). Notably, each of the three Synechocystis sp. strain PCC 6803 TaxAY proteins (Chy18, Chy39, and Chy43) would be connected to two different RRs encoded by adjacent genes, an RRI-CheY and an RRI-PatA: Chy18 working with Crr36 (Sll0038) and Crr35 (Sll0039), Chy39 with Crr12 (Sll1291) and Crr11 (Sll1292), and Chy43 with Crr6 (Slr1041) and Crr7 (Slr1042).
HYVII. The last class, HYVII, groups putative hybrid proteins that have RRs but only one of the two kinase domains, either a HisKA or an HATPase domain. Synechocystis sp. strain PCC 6803 Chy180 (Rre22) consists of an N-terminal HisKA domain followed by an RR domain and a PP2C-like (PP2C_SIG) domain. Although the gene name ppcE was assigned to Chy180, no description of its function could be retrieved; the same acronym is used for probable peptidases. C. watsonii has two paralogous HATP-RRs, one of which could be a complete HYI if a putative sequencing error is confirmed. None has any known function.
KdpD proteins form a different family of histidine kinases. In E. coli, the KdpD domain senses turgor pressure and Usp forms the output domain (144, 153). KdpD interacts with and phosphorylates its cognate RR, KdpE, an RRII of the OmpR type (46). There are three examples in A. variabilis, two of which are encoded by plasmid B, and single examples in the other heterocystous strains, as well as in Synechocystis sp. strain PCC 6803, S. elongatus, and G. violaceus. The G. violaceus copy is interesting, as it has a tandem duplication of the two domains. Ballal et al. (15) have demonstrated an interaction between the N-terminal TM domains of the Anabaena sp. strain L-31 protein and E. coli KdpD that alters the phosphatase activity. KdpD (Ctc1) is also mentioned, probably on the basis of two-hybrid experiments, as the cognate phosphodonor for the RRII-OmpR Crr28 (Rre28) (http://www.genome.ad.jp/dbget-bin/show_pathway?syn02020+slr1731).
| ORTHOLOGOUS GROUPS |
|---|
"Orthology" groups were thus defined on the basis of bidirectional best hits from BLAST searches of each organism against every other organism, completed by phylogenetic analyses. A tentative gene nomenclature is also proposed in Table 1 and in Table S3s in the supplemental material. Direct comparisons between ORFs will be restricted to orthologues, whose time of divergence is assumed to be the same, i.e., that of speciation. Sixteen proteins, plus six groups of two or more, do not have any orthologue in any other sequenced bacterial genome. In contrast, for the cyanobacterial Chy83 proteins (which are 1,000 to 2,000 residues long), orthologous proteins of more than 1,000 amino acids can be found in some 22 bacterial genomes with BLAST E values of 0.0.
As their names (ycf, for hypothetical chloroplast open reading frame) indicate, Ycf26, Ycf27, Ycf29, and Ycf55 orthologues are also found in the plastid genomes of red algae and/or diatoms, suggesting that the corresponding genes are very ancient and already existed in the cyanobacterial ancestor that gave rise to plastids (13, 45). Most two-component genes are not essential for the growth of cyanobacteria under standard laboratory conditions and can be inactivated. Fully segregated Synechocystis sp. strain PCC 6803 knockout mutants could not, however, be easily obtained for chk33 (ycf26), crr1 (ycf29), crr23 (ycf55), crr26 (ycf27), and crr37, further supporting key roles for the corresponding gene products (12, 90, 135, 142; N. Burnett, unpublished data). A similar result was obtained for crr37 in Anabaena sp. strain PCC 7120 (91).
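The bidirectional-best-hit criterion used to define the "orthology" groups of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline actually used for this analysis: hits are taken as (query, subject, bit score) tuples, such as could be parsed from tabular BLAST output; the function names are hypothetical; groups are simply the connected components of the reciprocal-best-hit graph; and the phylogenetic analyses that completed the real groups are not modeled. ORFs without any reciprocal best hit come out as single-member groups, like the single-protein groups mentioned for several classes above.

```python
from collections import defaultdict
from itertools import combinations

# Assumed hit format: (query_id, subject_id, bit_score); higher score = better.
Hits = list[tuple[str, str, float]]


def best_hits(hits: Hits) -> dict[str, str]:
    """Map each query ORF to its single highest-scoring subject ORF."""
    best: dict[str, tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> set[tuple[str, str]]:
    """Return (a, b) pairs in which each ORF is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


def orthology_groups(searches: dict[tuple[str, str], Hits]) -> list[list[str]]:
    """Group ORFs as connected components of the reciprocal-best-hit graph.

    searches[(x, y)] holds the hits of genome x's ORFs against genome y.
    Connected-component grouping is a simplifying assumption (hypothetical).
    """
    parent: dict[str, str] = {}

    def find(orf: str) -> str:
        parent.setdefault(orf, orf)
        while parent[orf] != orf:
            parent[orf] = parent[parent[orf]]  # path halving
            orf = parent[orf]
        return orf

    # Register every query ORF so unmatched ORFs still form their own group.
    for hits in searches.values():
        for query, _, _ in hits:
            find(query)

    genomes = sorted({genome for pair in searches for genome in pair})
    for gx, gy in combinations(genomes, 2):
        pairs = reciprocal_best_hits(searches.get((gx, gy), []),
                                     searches.get((gy, gx), []))
        for a, b in pairs:
            parent[find(a)] = find(b)  # merge the two components

    groups: dict[str, list[str]] = defaultdict(list)
    for orf in parent:
        groups[find(orf)].append(orf)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Toy hits for three hypothetical genomes A, B, and C (not real ORFs).
    toy = {
        ("A", "B"): [("a1", "b1", 500.0), ("a1", "b2", 200.0), ("a2", "b2", 300.0)],
        ("B", "A"): [("b1", "a1", 480.0), ("b2", "a2", 290.0), ("b2", "a1", 100.0)],
        ("A", "C"): [("a1", "c1", 400.0)],
        ("C", "A"): [("c1", "a1", 390.0)],
    }
    print(orthology_groups(toy))  # [['a1', 'b1', 'c1'], ['a2', 'b2']]
```

Single-linkage grouping of this kind can merge distinct families through one spurious reciprocal hit; phylogenetic checks such as those used to complete the groups described here help guard against that.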
As mentioned above, in Synechocystis sp. strain PCC 6803, RpaA (Crr31) would be a partner of Chk33 (DspA/NblS) in the response of the cells to hyperosmotic stress (105, 126). The pair has thus been highly conserved throughout evolution. RpaA (Crr31) and RpaB (Crr26) are the closest paralogues, and both are about 41% identical and 61% similar to B. subtilis YycF, an RR which m