Microbiol Mol Biol Rev, March 1998, p. 1-34, Vol. 62, No. 1
1092-2172/98/$04.00+0
Copyright © 1998, American Society for Microbiology. All rights reserved.
Department of Biology, University of California at San Diego, La Jolla, California 92093-0116
SUMMARY
INTRODUCTION
COMPUTER METHODS
SEVENTEEN MFS FAMILIES
ESTABLISHMENT OF HOMOLOGY FOR MFS PROTEINS
PHYLOGENETIC TREE FOR THE MFS
FAMILY 1: SUGAR PORTER (SP) FAMILY
FAMILY 2: DRUG:H+ ANTIPORTER (14-SPANNER) (DHA14) DRUG EFFLUX FAMILY
FAMILY 3: DRUG:H+ ANTIPORTER (12-SPANNER) (DHA12) DRUG EFFLUX FAMILY
FAMILY 4: ORGANOPHOSPHATE:INORGANIC PHOSPHATE ANTIPORTER (OPA) FAMILY
FAMILY 5: OLIGOSACCHARIDE:H+ SYMPORTER (OHS) FAMILY
FAMILY 6: METABOLITE:H+ SYMPORTER (MHS) FAMILY
FAMILY 7: FUCOSE-GALACTOSE-GLUCOSE:H+ SYMPORTER (FGHS) FAMILY
FAMILY 8: NITRATE-NITRITE PORTER (NNP) FAMILY
FAMILY 9: PHOSPHATE:H+ SYMPORTER (PHS) FAMILY
FAMILY 10: NUCLEOSIDE:H+ SYMPORTER (NHS) FAMILY
FAMILY 11: OXALATE:FORMATE ANTIPORTER (OFA) FAMILY
FAMILY 12: SIALATE:H+ SYMPORTER (SHS) FAMILY
FAMILY 13: MONOCARBOXYLATE PORTER (MCP) FAMILY
FAMILY 14: ANION:CATION SYMPORTER (ACS) FAMILY
FAMILY 15: AROMATIC ACID:H+ SYMPORTER (AAHS) FAMILY
FAMILY 16: UNKNOWN MAJOR FACILITATOR (UMF) FAMILY
FAMILY 17: CYANATE PERMEASE (CP) FAMILY
POSSIBLE INCLUSION OF THE PROTON-DEPENDENT OLIGOPEPTIDE TRANSPORTER (POT) FAMILY IN THE MFS
MFS FAMILY-SPECIFIC SIGNATURE SEQUENCES
MFS-SPECIFIC SEQUENCE MOTIF
EVIDENCE FOR REPEAT SEQUENCES IN MFS PROTEINS
CLASSIFICATION OF MFS PERMEASES
CONCLUSIONS AND PERSPECTIVES
ACKNOWLEDGMENTS
NOTE ADDED IN PROOF
REFERENCES
SUMMARY
The major facilitator superfamily (MFS) is one of the two largest families of membrane transporters found on Earth. It is present ubiquitously in bacteria, archaea, and eukarya and includes members that can function by solute uniport, solute/cation symport, solute/cation antiport and/or solute/solute antiport with inwardly and/or outwardly directed polarity. All MFS protein sequences in the public databases as of January 1997 were identified on the basis of sequence similarity and shown to be homologous. Phylogenetic analyses revealed the occurrence of 17 distinct families within the MFS, each of which generally transports a single class of compounds. Compounds transported by MFS permeases include simple sugars, oligosaccharides, inositols, drugs, amino acids, nucleosides, organophosphate esters, Krebs cycle metabolites, and a large variety of organic and inorganic anions and cations. Protein members of some MFS families are found exclusively in bacteria or in eukaryotes, but others are found in bacteria, archaea, and eukaryotes. All permeases of the MFS possess either 12 or 14 putative or established transmembrane α-helical spanners, and evidence is presented substantiating the proposal that an internal tandem gene duplication event gave rise to a primordial MFS protein prior to divergence of the family members. All 17 families are shown to exhibit the common feature of a well-conserved motif present between transmembrane spanners 2 and 3. The analyses reported serve to characterize one of the largest and most diverse families of transport proteins found in living organisms.
INTRODUCTION
"If you do not expect to, you will not discover the unexpected."
Heraclitus
Transport systems allow the uptake of essential nutrients and ions, excretion of end products of metabolism and deleterious substances, and communication between cells and the environment (53). They also provide essential constituents of energy-generating and energy-consuming systems (54). Primary active transporters drive solute accumulation or extrusion by using ATP hydrolysis, photon absorption, electron flow, substrate decarboxylation, or methyl transfer (17). If charged molecules are unidirectionally pumped as a consequence of the consumption of a primary cellular energy source, electrochemical potentials result (54). The consequential chemiosmotic energy generated can then be used to drive the active transport of additional solutes via secondary carriers which merely facilitate the transport of one or more molecular species across the membrane (48, 49).
Recent genome-sequencing data and a wealth of biochemical and molecular genetic investigations have revealed the occurrence of dozens of families of primary and secondary transporters (63). Two such families have been found to occur ubiquitously in all classifications of living organisms. These are the ATP-binding cassette (ABC) superfamily (15, 21, 37, 44) and the major facilitator superfamily (MFS), also called the uniporter-symporter-antiporter family (7, 28, 30, 35, 51). While ABC family permeases are in general multicomponent primary active transporters, capable of transporting both small molecules and macromolecules in response to ATP hydrolysis (59), the MFS transporters are single-polypeptide secondary carriers capable only of transporting small solutes in response to chemiosmotic ion gradients. Although well over 100 families of transporters have now been recognized and classified (73), the ABC superfamily and MFS account for nearly half of the solute transporters encoded within the genomes of microorganisms (63). They are also prevalent in higher organisms. The importance of these two families of transport systems to living organisms can therefore not be overestimated.
The MFS was originally believed to function primarily in the uptake of sugars (36, 46). Subsequent studies revealed that drug efflux systems and transporters of Krebs cycle metabolites belong to this family (30, 62). The family was then expanded to include organophosphate:phosphate exchangers and oligosaccharide:H+ symport permeases (51). Reizer et al. (67) noted that a mammalian phosphate:Na+ symporter is a distant member of this family; Paulsen et al. (60) subdivided the MFS drug efflux pumps into two phylogenetically distinct families with differing topologies; and Goffeau et al. (26) identified a novel MFS family that consists exclusively of functionally uncharacterized proteins from Saccharomyces cerevisiae revealed by genome sequencing. Recently, Williams and Shaw (86) noted that a family of bacterial aromatic acid permeases belongs to the MFS. These observations suggested that the MFS is far more widespread in nature and far more diverse in function than had been thought previously.
Although isolated reports have allowed recognition of an increasing degree of diversity within the MFS, there has been no recent systematic attempt to identify the sequenced proteins that make up the MFS and to classify these proteins into phylogenetic families. We have therefore undertaken this task in the hopes of allowing (i) recognition of the significance of this family to cell physiology; (ii) extrapolation of biochemical, molecular genetic, and biophysical information obtained from the study of a few such systems to all members of the family; (iii) unification of mechanistic models, to the greatest extent possible, so as to be applicable to a maximal number of transporters; (iv) introduction of a rational system of MFS protein classification; and (v) comprehension of the pathways taken in the development of structural and functional diversity resulting from the evolutionary process used.
In this report, we present analyses that allow us to generalize some previous observations regarding the MFS and to note additional characteristics of this immense superfamily. Thus, based exclusively on degrees of sequence similarity, we have constructed phylogenetic trees which allow us to divide all the recognized members of the MFS into 17 families. The members of each family all proved to be more closely related in sequence to each other than they were to any of the other MFS proteins. This fact presumably reflects the evolutionary histories of these proteins (71, 72), and, remarkably, we find that phylogenetic family correlates with function. Thus, each of the families recognizes and transports a distinct class of structurally related compounds. These observations have allowed us to derive a rational classification system for the MFS based on both phylogeny and function. This classification system has proven applicable to virtually all permeases found in nature (73).
In 1990, Rubin et al. (70) presented evidence that strongly argued in favor of an earlier suggestion (see reference 36), that MFS permeases arose by a tandem intragenic duplication event. In this report, we provide additional statistical evidence in favor of this possibility. This event generated the 12-transmembrane-spanner (TMS) protein topology from a primordial 6-TMS unit. Surprisingly, all currently recognized MFS permeases retain the two six-TMS units within a single polypeptide chain, although in 3 of the 17 MFS families, an additional two TMSs are found (60). Moreover, the well-conserved MFS-specific motif between TMS2 and TMS3 and the related but less well conserved motif between TMS8 and TMS9 (36) prove to be a characteristic of virtually all of the more than 300 MFS proteins identified. The functional significance of this repeated motif has been examined by Jessen-Marshall et al. (39) and by Yamaguchi et al. (87-89).
Many additional observations allowed the identification of highly specific characteristics of individual MFS families as well as general characteristics of the MFS as a whole. We hope that the computational analyses reported will provide a guide for molecular biologists, biochemists, and biophysicists interested in structural, functional, and evolutionary aspects of MFS permeases.
COMPUTER METHODS
The FASTA (64) and BLAST (2) programs were used to screen the peptide and translated nucleotide databases. The statistical significance of sequence similarities between putative members of the various families of the MFS was established by using the RDF2 (64) and GAP (16) programs with at least 200 random shuffles. Binary comparison scores are expressed in standard deviations (SD) (14). A value of 9 SD for a protein segment larger than 60 residues is deemed sufficient to establish homology (18, 71). This criterion was used to establish homology between MFS families (see Table 2).
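The shuffle-based comparison score described above can be illustrated with a minimal sketch. It is not the RDF2 or GAP implementation: the toy match/mismatch/gap scoring and the function names `local_alignment_score` and `comparison_score_sd` are assumptions made for illustration, and a substitution matrix would normally be used for proteins. The sketch aligns two sequences, repeats the alignment against shuffled copies of one of them (200 by default, matching the minimum stated above), and expresses the real score in SD above the shuffled mean.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (toy scoring, linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def comparison_score_sd(a, b, shuffles=200, seed=0):
    """Score of a vs b, in SD above the mean score of a vs shuffled b."""
    rng = random.Random(seed)
    real = local_alignment_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(shuffles):
        # Shuffling keeps the composition of b but destroys its order.
        rng.shuffle(residues)
        shuffled_scores.append(local_alignment_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    return (real - mean) / sd if sd > 0 else float("inf")
```

Under the criterion stated above, a pair would be accepted as homologous when the compared segment exceeds 60 residues and `comparison_score_sd` returns at least 9.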
Multiple-sequence alignments were constructed with the PREALIGN and TREE programs of Feng and Doolittle (22) and the PILEUP program (16). Phylogenetic analyses were routinely performed with the TREE program (22) but were checked with other programs. The different programs generally gave very similar, and often identical, branching orders, and the branch lengths were also strikingly similar. Branch length is approximately proportional to the degree of sequence divergence, which, to a first approximation, is assumed to be proportional to the phylogenetic distance (but see the section Conclusions and Perspectives, below). It is important to emphasize that branch lengths and even branch positions represent approximations to the evolutionary process, allowing facile visualization of the relationships between sequences within families. They reflect relative degrees of sequence divergence and can be considered to represent the evolutionary process only to a first approximation (71, 72).
Average hydropathy, average amphipathicity and average similarity analyses were conducted for all protein families analyzed. They were based on the complete multiple-sequence alignments generated with the TREE program. Only representative, well-conserved portions of these multiple-sequence alignments are presented. The hydropathy analyses were conducted with the assumptions and algorithm described by Kyte and Doolittle (45) with a sliding window of 20 residues. Similarly, a sliding window of 20 residues was used to generate the average similarity and average amphipathicity plots (45a). These latter analyses are not presented but are described in the text (see Table 2).
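As an illustration of the average hydropathy calculation, the following sketch applies the Kyte and Doolittle hydropathy values to a multiple alignment with a sliding window of 20 positions. The function names and the treatment of gaps (gap characters are ignored when a column is averaged) are assumptions for this example and are not taken from the programs used in this study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_means(alignment):
    """Mean hydropathy of each alignment column, ignoring gaps (assumption)."""
    means = []
    for column in zip(*alignment):
        scores = [KD[r] for r in column if r in KD]
        means.append(sum(scores) / len(scores) if scores else 0.0)
    return means


def sliding_mean(values, window=20):
    """Average of `values` over every full window of the given width."""
    if len(values) < window:
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        # Slide by one position: add the new value, drop the oldest one.
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def average_hydropathy(alignment, window=20):
    """Windowed average hydropathy profile for aligned sequences."""
    return sliding_mean(column_means(alignment), window)
```

Peaks in the resulting profile correspond to hydrophobic stretches that are candidates for TMSs. A single sequence can be analyzed by passing a one-element list.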
Charge bias analysis of membrane protein topology was performed with the program TOP PRED (83). Signature sequences were defined by the method of Bairoch et al. (6). The programs MEME and MAST (5) were used to help identify conserved motifs within the protein families of the MFS. Most of the methods used in this study have been applied to numerous transport proteins and have been evaluated (see references 71 and 72 for recent reviews).
SEVENTEEN MFS FAMILIES
Table 1 lists and summarizes the properties of transport protein families found within the current MFS. We have classified current members of the MFS into 17 (possibly 18) distinct families. This number of MFS families represents more than a threefold expansion over that published previously (51). The table provides the family number; the name of the family; the abbreviation of the family to be used in this study; the number of currently recognized sequenced members in each family; the range of organisms in which members of the family are found; the size range of the proteins (in numbers of amino acyl residues) for fully sequenced members; the number of putative TMSs in each protein (believed to be uniform for members of a given family); the energy-coupling mechanisms, if any, used by members of the family; the polarities of transport catalyzed by family members; the substrates known to be transported by various members of the family; and a representative and well-characterized member of the family.
The largest family (family 1) is the sugar porter (SP) family, with 133 identified members. These proteins are derived from all of the major groups of living organisms: bacteria, archaea, eukaryotic protists, fungi (mostly yeasts), animals, and plants. These proteins have 12 established or putative TMSs. They can function by uniport, solute:solute antiport, and/or solute:cation symport, depending on the system and/or conditions. Uniporters exhibit no polarity but can usually catalyze both uniport and antiport depending on whether a substrate is present on the trans side of the membrane. The polarity of solute:solute antiporters is indicated in Table 1 by "both." Symporters function with inwardly directed polarity in the presence of a membrane potential (negative inside), but many of these proteins have also been shown to catalyze antiport when a substrate is present on the trans side of the membrane. Substrates transported by SP family members include hexoses, pentoses, disaccharides, quinate, inositols, and organic cations. Most but not all members of the SP family thus catalyze sugar transport.
Family 1 permeases exhibit a size range of 404 to 818 residues. The smaller permeases possess very short hydrophilic N and C termini and short loops connecting the 12 TMSs. As is true of many MFS families, the bacterial sugar porters are usually smaller than the eukaryotic proteins. The larger sizes of the eukaryotic proteins are due to large hydrophilic N and/or C termini or, less frequently, to increased sizes of specific inter-TMS loops. The hydrophilic regions of the eukaryotic proteins may play roles in regulation or in cytoskeletal attachment, and they are frequently subject to phosphorylation by ATP-dependent protein kinases. A representative well-characterized example of the SP family is the arabinose:H+ symport permease (AraE) of Escherichia coli (47).
Families 2 and 3 consist of drug efflux systems which possess 14 and 12 TMSs, respectively (74). Since these permeases uniformly catalyze drug:H+ antiport, they are referred to as the DHA14 and DHA12 families, respectively. A total of 30 and 46 sequenced members are currently recognized in these two families. Because these permeases have recently been the subject of an extensive review which presented multiple alignments and phylogenetic trees (60), they will not be described or analyzed here. Members of both families are found in bacteria and eukaryotes, and DHA12 family members have also been identified in archaea.
Families 4, 5, and 6, the organophosphate:inorganic phosphate antiporters (OPA), the oligosaccharide:H+ symporters (OHS), and the metabolite:H+ symporters (MHS), respectively, were recognized to be families within the MFS in 1993 (30, 51). Since these permeases are restricted to bacteria, it is not surprising that they are all relatively small (400 to 500 residues). All three of these families have become substantially larger and more diverse in function since 1993, due to the sequencing and functional identification of new members.
All the remaining families listed in Table 1 (families 7 to 18) were not recognized in 1993 and are therefore new MFS families. Family 7 (the fucose-galactose-glucose:H+ symporters [FGHS]) is a small family with four distantly related members. As with most members of the SP family, these proteins are specific for sugars. They all probably function by proton symport. They are relatively small (404 to 438 residues), as expected since they are derived exclusively from bacteria.
The nitrate-nitrite porter (NNP) family (family 8) has members in bacteria, yeasts, and plants. Not surprisingly, these proteins exhibit a larger size range (395 to 547 residues) than was observed for FGHS family members. These proteins catalyze either nitrate uptake or nitrite efflux. The energy-coupling mechanisms are not well defined.
Family 9, the phosphate:H+ symporter (PHS) family, has sequenced representatives only in yeast and plants. The 11 proteins of the PHS family are fairly uniform in size, but they are substantially larger than most bacterial MFS proteins (518 to 587 residues). The characterized members are uniform in function.
Family 10, the nucleoside:H+ symporter (NHS) family, has only two bacterial members, and they are of the same size (418 residues each). They are both from E. coli and differ in specificity.
Family 11, the oxalate/formate antiporter (OFA) family, is a small but diverse family. Only five members have been sequenced, but these proteins are found in the bacterial, archaeal, and eukaryotic kingdoms. Surprisingly, they are of fairly uniform size (373 to 470 residues). The very small size of one of these proteins (see below) raises the possibility that its sequence is incomplete.
Family 12, the sialate:H+ symporter (SHS) family, like the NHS family, is very small (with only three members), and, again like the NHS family, the members are all derived from gram-negative bacteria. Their sizes are consistent with those generally observed for bacterial MFS proteins (407 to 496 residues). These proteins differ from most MFS proteins in possessing 14 putative TMSs.
Family 13, with 13 members derived exclusively from yeasts and animals, is the monocarboxylate porter (MCP) family. These permeases transport pyruvate, lactate, and/or mevalonate with inwardly directed polarity. They all presumably function by proton symport. Their reported sizes range from 450 to 808 residues.
Family 14, the anion:cation symporter (ACS) family, is a relatively large family with 40 sequenced members. The proteins are derived from bacteria, yeasts, and animals, and they exhibit an intermediate range of sizes (411 to 596 residues). They accumulate their substrates in symport with either Na+ or H+, depending on the system. They may transport either inorganic anions (e.g., phosphate) or organic anions (e.g., glucarate, hexuronate, tartrate, allantoate, or 4-hydroxyphenylacetate). Of the functionally characterized porters, the inorganic anion porters of the ACS family cotransport Na+ while the organic anion porters cotransport H+.
Family 15, the aromatic acid:H+ symporter (AAHS) family, consists of seven sequenced proteins, all from bacteria. As expected, these porters show fairly uniform sizes (418 to 460 residues), all on the low end of the scale. They transport a variety of aromatic acids as well as cis,cis-muconate, as indicated in Table 1. Interestingly, one member of the family has been implicated in chemotaxis, allowing the bacteria to swim up concentration gradients of its substrates (34). This is the only documented case where an MFS protein apparently serves as a chemoreceptor. One of the AAHS proteins (BenK Aca) transports benzoate (11). Two additional (putative) benzoate:H+ symporters (BenE) have been sequenced. They are both derived from gram-negative bacteria. One is the functionally characterized BenE protein of Acinetobacter calcoaceticus, and the other is a closely related protein from E. coli (55). These two proteins both contain a single region that exhibits limited sequence similarity to family 15 porters, as might be expected on the basis of the specificity of the A. calcoaceticus protein. However, they are very divergent in sequence from the latter proteins and cannot be shown to be homologous to any member of the MFS. They are therefore included in a separate family designated the benzoate:H+ symporter (BenE; TC #2.46) family (72a).
Six members of a novel family, family 16, the unknown major facilitator (UMF) family, have recently been identified (26). Although it has been proposed that these carriers are drug efflux pumps, no member of this family has been functionally characterized, and consequently the designation UMF has tentatively been assigned to this family. All six currently recognized members of the family are from Saccharomyces cerevisiae, and no close homologs are found in other organisms. These proteins exhibit the less common putative 14-TMS topology observed for only two other MFS families. The proteins of the UMF family exhibit almost no size variation (range, 606 to 637 residues).
Family 17, the cyanate permease (CP) family, includes only three proteins, all from bacteria. They are small proteins (393 to 402 residues with 12 TMSs). The substrate of one of these proteins (CynX of E. coli) is believed to be cyanate (NCO−). The other two members, from E. coli and Bacillus subtilis, are strikingly divergent in sequence but not in size, as noted above.
The proton-dependent oligopeptide transporter (POT) family has been described previously (62, 78). We have observed sequence similarities of these proteins to members of the SP and DHA14 families (see below). Although this similarity is insufficient to establish homology, the similarities in sequence, mechanism, and topology between proteins of the POT family and those of several MFS families strongly suggest that the POT family is a distant constituent of the MFS.
ESTABLISHMENT OF HOMOLOGY FOR MFS PROTEINS
Proteins within any one family of the MFS exhibit fairly extensive sequence similarities, as revealed by the portions of the multiple alignments shown in Fig. 3 to 17. Intrafamily comparison scores are always in excess of 15 SD, thus easily establishing that the members of any one family are homologous. However, sequence similarity for any two proteins derived from different MFS families is much less extensive. We therefore conducted interfamily binary comparisons to establish homology for all MFS families (18, 71). Homology for families 1 to 6 has been established previously (51). The results of the present comparisons are presented in Table 2 and Figure 1.
As noted above, families 1 to 6 and family 16 have already been shown to be constituent families of the MFS (26, 51, 60). The data presented in Table 2 establish that families 7 to 17 (described above) are all constituents of the MFS. In the case of FucP of family 7 (FGHS), 21% identity (8 SD) was observed with Gtr5 of family 1 in a region of 125 residues that exhibits no gaps in the binary alignment. Gtr5 is an established member of family 1 of the MFS (see Table 3). Of the comparison scores recorded in Table 2, this is the only score below 10 SD, and most of the other alignments span all or most of the two proteins compared. Family 7 is therefore the only family included in Table 2 that is not fully established as an MFS constituent. Other considerations provide additional support for the conclusion that the FGHS family is in fact a member of the MFS (see below).
NasA of family 8 exhibits a comparison score of 13 SD with 21% identity to YidT of family 14 for a 150-residue segment exhibiting no gaps, thus linking these two families, and GudT of family 14 exhibits 22% identity and 12 SD to Bmrl of family 3 for the full lengths of the two proteins (eight gaps in the complete binary alignment). Bmrl is an established member of the MFS (60). Thus, families 8 and 14 are members of the MFS as determined by these comparisons. Similarly, the OFA (family 11) member YhjX exhibits 18% identity and 11 SD with five gaps for the full binary alignment with respect to Ykwl of family 13. Family 13 member Motl exhibits 10 SD (17% identity; 5 gaps for the full-length binary alignment) with respect to NanT of family 12, and NanT exhibits 10 SD (25% identity with 11 gaps in the complete binary alignment) with respect to CitA of family 6, a protein shown previously to be a member of the MFS (51). Thus, on the basis of these comparisons and the superfamily principle (18, 71), families 11, 13, and 12 are within the MFS. Using similar logic, the results summarized in Table 2 establish that all 17 families under consideration (with the improbable exception of family 7) are homologous.
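The chain of reasoning behind the superfamily principle can be sketched as a reachability search: a family is placed in the MFS if it is connected, through one or more comparison scores at or above the 9 SD criterion, to a family already established as an MFS constituent. The sketch below uses only the scores quoted in the text of this section (the remaining entries of Table 2 are not reproduced), treats each family as a single node on the grounds that intrafamily scores always exceed 15 SD, and uses hypothetical names (`LINKS`, `ESTABLISHED`, `families_linked_to_mfs`).

```python
from collections import deque

THRESHOLD_SD = 9  # homology criterion given under Computer Methods

# (family, family, comparison score in SD), as quoted in the text.
LINKS = [
    (7, 1, 8),     # FucP vs Gtr5
    (8, 14, 13),   # NasA vs YidT
    (14, 3, 12),   # GudT vs Bmrl
    (11, 13, 11),  # YhjX vs Ykwl
    (13, 12, 10),  # Motl vs NanT
    (12, 6, 10),   # NanT vs CitA
]

# Families shown previously to be MFS constituents (26, 51, 60).
ESTABLISHED = {1, 2, 3, 4, 5, 6, 16}


def families_linked_to_mfs(links, established, threshold=THRESHOLD_SD):
    """Return every family reachable from an established family."""
    graph = {}
    for a, b, sd in links:
        if sd >= threshold:  # only scores meeting the criterion count
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    reached = set(established)
    queue = deque(established)
    while queue:
        family = queue.popleft()
        for neighbor in graph.get(family, ()):
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append(neighbor)
    return reached
```

With these inputs, families 8, 11, 12, 13, and 14 are reached through chains of significant scores, whereas family 7 is not, because its only quoted link (8 SD) falls below the criterion.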
Short regions of the binary alignments upon which the comparison scores recorded in Table 2 were based are shown in Fig. 1. These alignments exhibit between 17 and 25% identity with greater than 50% similarity in each case. Most of the regions shown are derived from the N-terminal halves of these proteins. These regions are generally the best-conserved portions of the MFS proteins, as pointed out previously for families 1 to 7 (51, 71).
PHYLOGENETIC TREE FOR THE MFS
A phylogenetic tree for the MFS, which includes representative proteins from most of the families, is shown in Fig. 2. Several features are worthy of note. First, most of the families branch from points near the center of the tree. Second, the DHA14 and DHA12 families (families 2 and 3, respectively) branch off from each other after the initial divergence from the center of the tree, suggesting that they are more closely related to each other than to other MFS families. This is in agreement with their similar specificities. Third, the UMF family (family 16) does not branch from a point near the branch for the two DHA families (families 2 and 3), and thus there is no phylogenetic evidence for the suggestion that they transport drugs, even though the proteins of the UMF and DHA14 families both have 14 putative TMSs (26). Fourth, the MCP and OFA families (families 13 and 11, respectively) branch from each other at a point that is somewhat distant from the center of the tree. A late branching point, suggestive of late divergence, is consistent with the fact that both families transport carboxylates. Fifth, the MHS and SHS families (families 6 and 12, respectively) branch from each other relatively far from the center of the tree, suggesting that they are close familial relatives, having diverged from each other late in the evolutionary process. Most, and perhaps all, of the members of these two families transport anionic compounds. The PHS and SP families (families 9 and 1, respectively) also stem from the primary branch from which the MHS and SHS families stem. However, these families branch off close to the center of the tree. Consequently, close phylogenetic relationships for these families are not suggested. Finally, the OPA, NHS, and OHS families (families 4, 10, and 5, respectively) are found branching from the same trunk. All of the proteins of the NHS and OHS families, and some of the members of the OPA family, transport glycosides.
FAMILY 1: SUGAR PORTER (SP) FAMILY
The SP family was described many years ago, and its description has been repeatedly updated (7, 7a, 28, 30, 35, 36, 46, 58). The present SP family consists of 133 sequenced members derived from bacteria, archaea, and eukarya. The family includes members that are very diverse in sequence and function. As revealed by the information in Table 3, these proteins function under normal physiological conditions either by uniport or by H+ symport. However, many of these and other MFS permeases can catalyze solute:solute antiport when substrates are present on both sides of the membrane (42). The symporters all function in energized cells with inwardly directed polarity. Substrates of SP family members include galactose, arabinose, xylose, and glucose in bacteria; galactose, quinate, myoinositol, lactose, maltose, and α-glucosides in yeasts and fungi; hexoses in trypanosomes and plants; and sugars as well as organic cations and neurotransmitters in animals. Most but by no means all members of the SP family are therefore specific for sugars.
Figure 3A presents a portion of the multiple-sequence alignment of 20 representative members of the SP family. The proteins included are from bacteria, yeasts, fungi, trypanosomes, plants, and animals. The 47-residue segment shown exhibits no gaps in the multiple alignment, and there are two fully conserved residues. Almost half of the residue positions within this segment exhibit a predominant residue that appears in the consensus sequence. This fact reflects a high degree of conservation. At least 50% of the proteins included in the alignment exhibit the same residue at each of these positions, by definition. While the fully conserved glycine (G) is likely to be of structural importance, the fully conserved glutamate (E) may play a catalytic role. Several of the well-conserved residues (e.g., R's at alignment positions 2 and 29 and G's at alignment positions 6 and 10) are conserved in all but one or two of the proteins depicted. The high degree of conservation of these residues clearly suggests that they play important structural or functional roles.
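The definition of the consensus sequence used above (a residue shared by at least 50% of the aligned proteins at a given position) can be made concrete with a short sketch. The function name `consensus`, the use of "." for positions without a predominant residue, and the use of "-" for gaps are assumptions for illustration. Column indices are 0-based in the code, whereas alignment positions in the text are 1-based.

```python
from collections import Counter


def consensus(alignment, threshold=0.5):
    """Return (consensus string, indices of fully conserved columns)."""
    n = len(alignment)
    cons = []
    fully_conserved = []
    for i, column in enumerate(zip(*alignment)):
        # Count residues only; gap characters never enter the consensus.
        counts = Counter(r for r in column if r != "-")
        if not counts:
            cons.append(".")
            continue
        residue, count = counts.most_common(1)[0]
        cons.append(residue if count / n >= threshold else ".")
        if count == n:
            fully_conserved.append(i)
    return "".join(cons), fully_conserved
```

Applied to the 47-residue segment of Fig. 3A, a function of this kind would be expected to report a predominant residue at almost half of the positions and the two fully conserved residues (G and E) noted above.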
The phylogenetic tree for the SP family members whose sequences are represented in Fig. 3A is shown in Fig. 3B. On the left-hand side of the tree are all of the sugar porters as well as the quinate:H+ symporter (Qa-Y Ncr). Uniporters and proton symporters are often closely related (e.g., XylE Eco and Glf Zmo). More divergent members of the sugar porter cluster are Tht2A Tbr, Ma6T Sce, and Lac12 Kla of protists and yeasts. It is noteworthy that the bacterial, plant, and animal proteins cluster loosely together but most of the yeast and fungal proteins cluster separately.
Two yeast proteins and one protozoan protein branch from points near the base of the tree. However, the most distant members (right-hand side of the tree) include the synaptic vesicle transporter Sv2 Rno and the organic cation transporter Oct-1 Rno, both from the rat. The great phylogenetic distances observed between these proteins and the sugar porters correlate roughly with the divergent substrate specificity of these transporters relative to other members of the SP family.
FAMILY 2: DRUG:H+ ANTIPORTER (14-SPANNER) (DHA14) DRUG EFFLUX FAMILY
The DHA14 drug efflux family has been described recently (60). Thirty members of the family were identified in that study. All functionally characterized members of the DHA14 family have been found to catalyze drug efflux. Of the 30 members, 7 are multidrug resistance pumps from gram-negative and gram-positive bacteria as well as yeasts, 12 are putative drug-specific pumps from gram-positive bacteria, and 11 are hypothetical or uncharacterized proteins from gram-negative bacteria, yeasts, and fungi. A multiple alignment and a dendrogram for the family were presented in that study (60). The multidrug resistance pumps, drug-specific permeases, and uncharacterized proteins did not group together on the DHA14 family dendrogram but instead proved to be scattered in an apparently random fashion relative to each other. This fact suggests that drug-specific and multidrug efflux pumps arose repeatedly by narrowing and broadening of their specificities and that the functionally uncharacterized members of the family are also probably involved in drug efflux (74). Because of the extensive treatment of this family by Paulsen et al. (60), these proteins will not be considered further here.
FAMILY 3: DRUG:H+ ANTIPORTER (12-SPANNER) (DHA12) DRUG EFFLUX FAMILY
The DHA12 drug efflux family, also described by Paulsen et al. (60), consists of 46 proteins. Of these, 9 have been shown to be multidrug resistance pumps, 15 are probably drug-specific efflux pumps, and 22 are hypothetical or uncharacterized proteins. Like the DHA14 family, functionally characterized members of the DHA12 family exhibit specificities only for drugs, although the range of drugs transported is remarkable (60). Interestingly, the range of organisms in which DHA12 family members are found is wider than that for the DHA14 family. Thus, the DHA12 MDR pumps are found in animals as well as in yeasts and a variety of gram-negative and gram-positive bacteria. The proven and putative drug-specific efflux pumps are also found in a wide range of gram-negative and gram-positive bacteria, yeasts, and animals. Uncharacterized members of this family include an even wider range of organisms, including humans and archaea (reference 60 and unpublished results).
The dendrogram for the DHA12 family (60) resembles that for the DHA14 family in that multidrug resistance and drug-specific pumps are interspersed. Thus, in both families, phylogeny does not appear to provide an indication of drug specificity.
FAMILY 4: ORGANOPHOSPHATE:INORGANIC PHOSPHATE ANTIPORTER (OPA) FAMILY
Table 4 lists the members of the OPA family of the MFS. The seven functionally characterized members are derived from gram-negative and gram-positive bacteria and function in the transport of sugar phosphates, glycerol phosphate, or phosphoglycerates and phosphoenolpyruvate. The UhpC proteins are believed to function in the regulation of hexose phosphate transporter synthesis. UhpC presumably serves as a receptor for glucose-6-phosphate, the inducer, in controlling transcription of the uhpT operon (38). It is a rare example of an MFS member which does not serve a primary transport function. However, it is not known whether it has the capacity to transport its ligand, glucose-6-phosphate.
The predominant mechanism of transport catalyzed by permeases of the OPA family under normal physiological conditions appears to be antiport of an organophosphate ester for inorganic phosphate (49, 50). These permeases may also be capable of catalyzing substrate:H+ symport (19). The best-characterized members of the family are UhpT and GlpT, both of E. coli, for which detailed topological models have been presented (29, 90, 91). The OPA family includes several proteins from the worm Caenorhabditis elegans. Thus, this family includes members derived from eukaryotes as well as prokaryotes.
A well-conserved segment of the complete multiple alignment for the OPA family is presented in Fig. 4A. The 52-residue multiple-sequence alignment reveals only three single-residue gaps and shows four fully conserved residues (Q, R, W, and G). Of the 52 alignment positions, 19 are conserved in a majority of the proteins and hence appear in the consensus sequence. Structural (G, P), hydrophobic (F, V), semipolar (W), and strongly polar (N, Q, E, R and H) residues occur in the consensus sequence, with the last group being overrepresented.
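The notions of "fully conserved" positions and a majority-rule consensus can be made concrete with a short sketch. It assumes that a consensus residue is one present in more than half of the aligned sequences, which is how the text describes it; the function name, the "." placeholder, and the toy alignment are illustrative assumptions and are not the sequences of Fig. 4A.

```python
from collections import Counter


def summarize_alignment(aligned, gap="-"):
    """Return (fully_conserved, consensus) for equal-length aligned sequences.

    fully_conserved: list of (column index, residue) for columns in which
        every sequence carries the same non-gap residue.
    consensus: string holding, per column, the residue found in more than
        half of the sequences, or "." where no residue reaches a majority.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    fully_conserved = []
    consensus = []
    for col in range(width):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(seq[col] for seq in aligned if seq[col] != gap)
        residue, n = counts.most_common(1)[0] if counts else (gap, 0)
        if n == len(aligned):
            fully_conserved.append((col, residue))
        consensus.append(residue if n > len(aligned) / 2 else ".")
    return fully_conserved, "".join(consensus)


# Hypothetical toy alignment (not real OPA family sequences).
toy = [
    "GQRW-A",
    "GQKWLA",
    "GERWLS",
    "GQRWLT",
]
conserved, consensus = summarize_alignment(toy)
```

For the toy input, columns 0 and 3 are fully conserved and the last column contributes no consensus residue because no amino acid occurs in more than half of the sequences.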
The phylogenetic tree for the OPA family shows all bacterial proteins clustering together, as do the uncharacterized C. elegans proteins. Surprisingly, the UhpT transport proteins are as distant from the UhpC receptor proteins as these proteins are from the phosphoglycerate transporter (PgtP) or the glycerol phosphate transporters (GlpT). It is therefore of interest that UhpC is apparently specific for glucose-6-phosphate and 2-deoxyglucose-6-phosphate whereas UhpT recognizes a wide spectrum of sugar phosphates (3, 38). The large separation observed for the bacterial and animal proteins suggests that a primordial gene encoding one of the latter proteins was transferred to eukaryotes by vertical transmission from their prokaryotic progenitors and that gene duplication events in the developing eukaryote gave rise to the three paralogs found in C. elegans. Similarly, the configuration of the tree suggests that the gene duplication and divergence events that gave rise to the functionally dissimilar members of the bacterium-specific subfamilies occurred after the divergence of eukaryotes from prokaryotes.
FAMILY 5: OLIGOSACCHARIDE:H+ SYMPORTER (OHS) FAMILY
The current OHS family consists of six proteins, three of which are β-galactoside permeases from closely related bacteria (Table 5). The lactose permease of E. coli is not only the best-characterized member of this family but also probably the most extensively studied permease in the MFS (41, 82). An experimentally verified 12-TMS topological model for LacY has been published (10), and extensive data provide evidence for the nature of the substrate binding sites within the transmembrane region of the permease (8, 12, 23, 27, 43, 57). A detailed mechanistic model that incorporates information obtained using many different experimental approaches has recently been proposed (40).
The other members of the OHS are specific for (i) the trisaccharide, raffinose; (ii) the α,β-nonreducing glucoside-fructoside, sucrose; and (iii) the α-galactoside, melibiose. All of these putative proton symporters are from gram-negative bacteria. The sequence of the melibiose permease of Enterobacter cloacae was deposited in the database after the completion of our phylogenetic analysis of the OHS family and is therefore not represented in Fig. 5. However, this protein proved to resemble RafB Eco (75% identity) more closely than it resembles one of the lactose permeases (50% identity) or the sucrose permease (<40% identity).
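Percent identity values such as those quoted above are computed from a pairwise alignment. The sketch below assumes identity is counted over aligned columns in which neither sequence has a gap; the review does not state which denominator was used, so this convention, the function name, and the toy sequences are all assumptions.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns in which either sequence has a gap are skipped (an assumed
    convention; other denominators are also in common use).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not real permease sequences.
seq_a = "MKTLLAG-VF"
seq_b = "MRTLIAGSVF"
identity = percent_identity(seq_a, seq_b)  # 7 matches over 9 compared columns
```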
Figure 5A presents an alignment of a well-conserved 33-residue portion of the OHS family proteins. Over one-third of the residues shown in this gap-free alignment are fully conserved, and a large majority of the residues appear in the consensus sequence.
The phylogenetic tree (Fig. 5B) reveals that the raffinose permease clusters tightly with the lactose permeases but that the sucrose permease is much more distant. Raffinose is a trisaccharide which incorporates the structural elements of sucrose and melibiose in a single molecule. The degree to which these different permeases overlap in specificity is not known, but the broad specificity of the lactose permease of E. coli is noteworthy (56, 82).
FAMILY 6: METABOLITE:H+ SYMPORTER (MHS) FAMILY
The MHS family includes 16 currently sequenced members of widely differing specificities (Table 6). Those of known transport function recognize (i) citrate, (ii) α-ketoglutarate, (iii) proline and betaine, (iv) 4-methyl-O-phthalate, and (v) dicarboxylates. The α-ketoglutarate:H+ symport permease of E. coli (KgtP) is probably the best-characterized member of this family (75). An experimentally documented 12-TMS topological model has been proposed for this permease (76).
Metabolites transported by members of the MHS family have little in common, except that they all possess at least one carboxyl group. Several protein members of the MHS family are specific for Krebs cycle intermediates. All are from bacteria, and all characterized members of the MHS family function by proton symport.
A 51-residue segment of the multiple-sequence alignment of the MHS family, including all functionally characterized members, is shown in Fig. 6A. There are no gaps in the aligned sequences, and 11 residues are fully conserved. The majority of the fully conserved residues are probably of structural significance. These residues include four G's, two P's and an A. Two fully conserved residues (M and I) are hydrophobic, and two (D and R) are hydrophilic charged residues. Over half of the positions appear in the consensus sequence, illustrating the high degree of conservation observed for the members of this family.
The phylogenetic tree for the MHS family reveals clustering according to substrate specificity. Thus, the α-ketoglutarate permease of E. coli and the dicarboxylate permease of Pseudomonas putida cluster together, the osmoprotectant (proline/betaine) permeases of E. coli and Erwinia chrysanthemi cluster tightly together and are undoubtedly orthologs, and all three citrate permeases cluster tightly together. These three proteins are also undoubtedly orthologs. The MopB protein of Burkholderia cepacia, specific for 4-methyl-O-phthalate, is on a branch by itself. Orf3 Shy clusters loosely with dicarboxylate permeases and therefore may exhibit specificity for such a compound. The phylogenetic tree does not provide clues to the functions of the remaining two unidentified proteins. However, these two proteins (YhjE Eco and HI0418 Hin) appear to be located at a phylogenetic distance from each other consistent with their being orthologs. Several MHS homologs of unknown function listed in Table 6 are not represented in Fig. 6 because the sequences were deposited in the databases after completion of the phylogenetic studies reported.
FAMILY 7: FUCOSE-GALACTOSE-GLUCOSE:H+ SYMPORTER (FGHS) FAMILY
The first sequenced member of the FGHS family to be characterized was the FucP fucose permease of E. coli (31), and a 12-TMS topological model for this permease has been presented (32). Subsequently, a galactose/glucose permease of Brucella abortus (20) and a glucose/mannose permease of Bacillus subtilis (61) were characterized and shown to be members of the FGHS family (Table 7). The Bacillus protein, like the E. coli FucP protein, is believed to be a sugar:proton symporter (61). Only four proteins are currently in the FGHS family.
In spite of the small size of the FGHS family, its members, all of which are derived from bacteria, exhibit a surprising degree of sequence divergence. Figure 7A shows the best-conserved portion of the complete multiple-sequence alignment of these sequences. Only 4 positions in the gap-free 32-position alignment shown are fully conserved, and a minority of the residue positions appear in the consensus sequence. The phylogenetic tree reveals that the Haemophilus influenzae protein clusters closely with FucP of E. coli, that the galactose/glucose permease of B. abortus is more distant, and that the hexose permease from B. subtilis is most divergent. These relative distances correlate with substrate specificity to the extent known, since fucose is of the galacto configuration.
FAMILY 8: NITRATE-NITRITE PORTER (NNP) FAMILY
Thirteen proteins make up the current NNP family (Table 8). These proteins are derived from a variety of gram-negative and gram-positive bacteria as well as various eukaryotes including yeasts (Ynt1 Hpo), fungi (CrnA Eni), algae (Nar3 Cre), and higher plants (Bch1 Hvu and Bch2 Hvu). Irrespective of the organism, the nitrate permeases of the NNP family take up their substrate while the nitrite permeases apparently extrude theirs. Well-characterized members of the family are the NarK nitrite extrusion system involved in anaerobic nitrate-dependent respiration in E. coli (68) and the CrnA nitrate uptake permease of Aspergillus nidulans (81).
A portion of the NNP family protein multiple alignment is shown in Fig. 8A. Although this region is the best-conserved region in the complete multiple-sequence alignment, only three residues, all G's, are fully conserved. There are few gaps, and several residues are largely conserved (e.g., three additional G's, an F, and an N are conserved in all but one, two, or three of the proteins, respectively).
The phylogenetic tree for the NNP family (Fig. 8B) reveals that all of the eukaryotic proteins cluster together (on the left) as do the prokaryotic proteins (on the right). Further, within the eukaryotic cluster, the fungal proteins comprise one cluster while the plant proteins comprise another. Within the prokaryotic cluster, the two (putative) nitrite extrusion systems of E. coli (NarK and NarU) cluster tightly together. These paralogs probably arose by a recent gene duplication event. All other prokaryotic full-length proteins represented are from gram-positive bacteria. The NasA and NarK proteins of B. subtilis cluster loosely together, even though they are believed to catalyze nitrate uptake and nitrite efflux, respectively. The shorter phylogenetic distance of the M. tuberculosis protein (CY04C12) from NarK Bsu suggests that these two proteins may have the same or similar functions. Further, the other M. tuberculosis protein, CY3G12, resembles the two E. coli permeases, NarK and NarU (Fig. 8B). By contrast, the MG294 protein of M. genitalium is distant from all other members of the NNP family. Proteins from the mycoplasmas often exhibit greater distances from other gram-positive bacterial homologs than do other homologs from the latter bacteria (unpublished observations). The tree thus does not provide a clue to the function of this protein.
FAMILY 9: PHOSPHATE:H+ SYMPORTER (PHS) FAMILY
The current PHS family is unusual in that it includes members from yeasts, fungi, and plants but none from bacteria, animals, or other eukaryotes (Table 9). As a family within the MFS, it is presumed to be of ancient origin, and therefore one would expect members of the family to be found in bacteria (see Conclusions and Perspectives, below). Two well-characterized members of the PHS family are the Pho84 inorganic phosphate transporter of S. cerevisiae (9) and the GvPT phosphate transporter of Glomus versiforme (33).
The 11 members of the PHS family are fairly uniform in size (518 to 587 residues) and exhibit a striking degree of sequence similarity (Fig. 9A). Only eight proteins are included in Fig. 9 because of the near identity of the remaining three sequences to at least one protein that was included. Thus, in the 39-residue segment of the multiple-sequence alignment presented for the eight divergent proteins, 11 positions exhibit full conservation. Four of these residues are glycines, one is a proline, and one is alanine, all of which probably play structural roles. The remaining fully conserved residues (N, E, R, S, and K) are polar and therefore may function in substrate or proton binding or in catalysis of transport.
The phylogenetic tree for eight members of the PHS family is shown in Fig. 9B. Three of the four plant proteins cluster tightly together, and the short branch lengths separating them indicate that these proteins differ only slightly in sequence. This fact is also revealed by the partial multiple-sequence alignment shown in Fig. 9A. The yeast and fungal proteins cluster loosely together with the fourth plant protein. The occurrence of distant homologs in both the plant and fungal kingdoms suggests that both kingdoms possess isoforms that diverged from each other before plants diverged from fungi.
FAMILY 10: NUCLEOSIDE:H+ SYMPORTER (NHS) FAMILY
The NHS family currently has only two sequenced members, and both proteins are from E. coli (Table 10). One is a general nucleoside:proton symporter, NupG, and the other is a xanthosine permease. NupG has been examined structurally and is the better characterized of the two proteins (85).
As shown in Fig. 10, these two proteins are very similar, exhibiting close to 50% identity in the region shown. They evidently arose by a gene duplication event that occurred relatively recently in evolutionary time.
FAMILY 11: OXALATE:FORMATE ANTIPORTER (OFA) FAMILY
Five sequenced proteins comprise the OFA family, but only one of these proteins has been functionally characterized (Table 11). This protein, the oxalate:formate antiporter from Oxalobacter formigenes, provides the basis for naming the OFA family (1, 4). The protein has been purified, reconstituted in an artificial membrane system, and studied structurally (24, 69). As is apparent from Table 11, members of the OFA family are widely distributed in nature, being present in the bacterial, archaeal, and eukaryotic kingdoms.
One would expect that a family that derives its members from all three kingdoms of life would exhibit little sequence similarity. However, as shown in Fig. 11A, 6 residues in the 40-residue segment shown are fully conserved and a majority of residues are present in the consensus sequence. The phylogenetic tree reveals that all five members of the OFA family are nearly equally distantly related to each other. This may be due in part to the fact that most of the organisms from which these proteins are derived are distantly related. However, one cannot draw conclusions about whether these proteins are orthologs of the same function. The answer to this dilemma will require direct experimentation.
FAMILY 12: SIALATE:H+ SYMPORTER (SHS) FAMILY
Only three currently sequenced proteins comprise the SHS family, and only one of these, the NanT sialic acid permease of E. coli, is functionally characterized (Table 12) (52). E. coli possesses two SHS family paralogs, and Haemophilus influenzae possesses one homolog. Sequence comparisons (Fig. 12) and phylogenetic analyses (data not shown) reveal that the two E. coli paralogs are more closely related to each other than either of these proteins is related to the protein from Haemophilus influenzae, even though H. influenzae is closely related to E. coli. The function of this last protein can therefore not be surmised.
FAMILY 13: MONOCARBOXYLATE PORTER (MCP) FAMILY
Thirteen proteins comprise the MCP family, and all are from eukaryotes (Table 13). Most of these proteins are derived from various animal sources including three from C. elegans. However, S. cerevisiae possesses four paralogs. Only mammalian members of the MCP family have been functionally characterized. These permeases appear to be energized by proton symport (80). Monocarboxylates transported by these permeases include lactate, pyruvate, and mevalonate (25). Topological studies leading to a 12-TMS model have been reported (65).
The portion of the complete multiple-sequence alignment shown in Fig. 13A reveals that within the 53-residue segment presented, only 4 residues are fully conserved, and all are probably of structural significance (two G's, one F, and one A). In fact, except for the KRR motif and two serines, all of the residues that appear in the consensus sequence are structural or hydrophobic.
The phylogenetic tree for the MCP family is shown in Fig. 13B. All of the functionally characterized mammalian monocarboxylate transporters cluster tightly together, and even the outlying proteins from higher animals (RemP Gga and XpcT Hsa) are loosely associated with this cluster. The three C. elegans paralogs comprise a second diverse cluster. The four S. cerevisiae paralogs branch distantly from the animal proteins, and the yeast paralogs comprise two distinct clusters. Because of the extensive sequence divergence of the functionally uncharacterized proteins, we anticipate that they will prove to exhibit different transport functions.
FAMILY 14: ANION:CATION SYMPORTER (ACS) FAMILY
The ACS family is a large family with 40 currently sequenced members (Table 14). One of the members of this family was previously recognized to be a member of the MFS (67). This protein is the rabbit inorganic phosphate:Na+ cotransporter (84). Several mammalian proteins of this specificity have now been characterized. All of the recognized substrates of the ACS family permeases are either organic or inorganic anions. Among the organic anions transported are glucarate, hexuronates, phthalate, allantoate, and probably tartrate (13, 66).
Proteins of the ACS family are widely distributed in nature. They are found in both gram-negative and gram-positive bacteria and in both the animal and fungal eukaryotic kingdoms. Oddly, no plant member has been sequenced, and none of the sequenced ACS proteins is from an archaeon.
Several organisms possess multiple ACS family paralogs. Thus, B. subtilis and Rattus norvegicus each have at least 2, E. coli has 5, S. cerevisiae has 7, and C. elegans has at least 15. Considering that the C. elegans genome was only about half sequenced when these analyses were conducted, one can anticipate that this one organism will prove to have nearly 30 paralogs within this one family of the MFS!
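The extrapolation in the preceding paragraph is proportional scaling. A minimal sketch, assuming that paralogs are spread evenly across the genome (an assumption made here for illustration; the function name is hypothetical):

```python
def extrapolate_paralogs(observed, fraction_sequenced):
    """Scale an observed paralog count up to a whole-genome estimate.

    Assumes paralogs are distributed uniformly over the genome, so the
    count grows in proportion to the fraction sequenced.
    """
    if not 0 < fraction_sequenced <= 1:
        raise ValueError("fraction_sequenced must be in (0, 1]")
    return observed / fraction_sequenced


# 15 ACS paralogs observed with about half of the C. elegans genome sequenced.
estimate = extrapolate_paralogs(15, 0.5)
```

With 15 observed paralogs and a sequenced fraction of 0.5, the estimate is 30, in line with the figure of nearly 30 given in the text.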
A portion of the complete multiple-sequence alignment for the proteins of the ACS family is shown in Fig. 14A. No residue is fully conserved at any one position. However, the aligned sequences are essentially gap free except for an incompletely sequenced region of one protein from C. elegans and another C. elegans protein which exhibits a single-residue insertion not found in the other proteins. Particularly worthy of note are the following residues found in the consensus sequence: the G is conserved in all but one protein, the W is conserved in all of the top 31 proteins, and the ER motif is conserved in many of the proteins. It is clear that the proteins have been correctly aligned in spite of very significant sequence divergence.
The phylogenetic tree for the ACS family is shown in Fig. 14B. Distinct clustering of the many proteins represented is apparent. Most striking is the fact that all bacterial proteins comprise one cluster, all the yeast proteins comprise a second, and the animal proteins comprise two additional diverse clusters. Proteins of known and similar specificities cluster tightly together (e.g., the mammalian inorganic phosphate:Na+ cotransporters or the glucarate transporters of gram-negative and gram-positive bacteria). Proteins specific for different but structurally related substrates (e.g., phthalate and 4-hydroxyphenylacetate, or glucarate and hexuronate) are found within the same cluster but distantly, while those of very dissimilar substrate specificities do not cluster at all. This tendency of permeases of similar specificities to "flock" together has been noted before (71, 72) and provides a basis for assigning tentative functions to several of the uncharacterized members of the ACS family.
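The idea that permeases of similar specificity "flock" together can be turned into a simple rule of thumb: give an uncharacterized protein the function of its nearest characterized homolog, provided the two are close enough. The sketch below illustrates that rule only. The protein names, distances, and cutoff are hypothetical, and the review does not describe a formal procedure of this kind.

```python
def tentative_function(query, distances, known, max_distance):
    """Suggest a function for `query` from its nearest characterized homolog.

    distances:    dict mapping frozenset({name1, name2}) -> distance.
    known:        dict mapping characterized protein name -> function.
    max_distance: cutoff above which no suggestion is made.
    Returns (function, neighbor name, distance) or None.
    """
    best_name = None
    best_dist = None
    for name in known:
        d = distances[frozenset((query, name))]
        if best_dist is None or d < best_dist:
            best_name, best_dist = name, d
    if best_dist is None or best_dist > max_distance:
        return None
    return known[best_name], best_name, best_dist


# Hypothetical characterized proteins and pairwise distances.
known = {"P1": "glucarate", "P2": "glucarate", "P3": "phosphate"}
distances = {
    frozenset(("U1", "P1")): 0.20,
    frozenset(("U1", "P2")): 0.25,
    frozenset(("U1", "P3")): 0.90,
    frozenset(("U2", "P1")): 0.80,
    frozenset(("U2", "P2")): 0.85,
    frozenset(("U2", "P3")): 0.95,
}
guess_u1 = tentative_function("U1", distances, known, max_distance=0.5)
guess_u2 = tentative_function("U2", distances, known, max_distance=0.5)
```

Here U1 sits close to the glucarate transporters and receives a tentative assignment, whereas U2 is distant from every characterized protein and receives none. This mirrors the caution expressed elsewhere in the text for proteins on isolated branches.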
FAMILY 15: AROMATIC ACID:H+ SYMPORTER (AAHS) FAMILY