Department of Plant Pathology, University of California, Riverside, California 92521,1 Department of Chemistry, California State Polytechnic University, Pomona, California 91768,2 Department of Plant Pathology and Microbiology, Faculty of Agricultural, Food and Environmental Quality Sciences, The Hebrew University of Jerusalem, Rehovot 76100, Israel,3 Institute of Molecular Biology, University of Oregon, Eugene, Oregon 97403,4 Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095,5 Institute of Cell and Molecular Biology, University of Edinburgh, Edinburgh EH9 3JH,6 Department of Molecular Microbiology & Genetics, Institute of Microbiology & Genetics, Georg-August-University, D-37077 Goettingen,7 Institute of Biochemistry, Heinrich Heine University, 40225 Dusseldorf,13 Department of Biology,8 Department of Biochemistry and Biophysics,23 Department of Plant Pathology and Microbiology, Texas A&M University, College Station, Texas 77843,28 Department of Biochemistry and Molecular Biology, Wright State University, Dayton, Ohio 45435,9 Department of Plant Biology, University of Minnesota, St. 
Paul, Minnesota 55108,10 School of Biological Sciences, University of Missouri-Kansas City, Kansas City, Missouri 64110,11 Bio-Rad Laboratories, Inc., Hercules, California 94547,12 ,14 Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada,15 ,16 Department of Cell and Developmental Biology,17 Department of Cellular and Structural Biology, University of Colorado Health Sciences Center, Denver, Colorado 80262,29 Whitehead Institute Center for Genome Research, Cambridge, Massachusetts 02141,18 Department of Genetics,19 Department of Biochemistry, Dartmouth Medical School, Hanover, New Hampshire 03755,20 School of Biological Sciences, Flinders University, Adelaide 5001, Australia,21 Laboratory of Genetics, Department of Regulation Biology, Saitama University, Saitama City, Saitama 338-8570, Japan,22 Department of Environmental and Biomolecular Systems, School of Science and Engineering, Oregon Health and Science University, Beaverton, Oregon 97006,24 Department of Biochemistry, Ohio State University, Columbus, Ohio 43210,25 The Institute for Genomic Research, Rockville, Maryland 20878,26 Department of Molecular Biology and Biochemistry, University of California, Irvine, California 92697,27
SUMMARY INTRODUCTION Neurospora crassa: a Model Filamentous Fungus Basic Features of the Neurospora Genome Sequencing and assembly. Integration of the sequence with the genetic map. Nuclear and mitochondrial genes. Codon bias. Comparative multigene family and domain analysis. CHROMATIN ASSEMBLY AND GENE REGULATION Centromere Organization and Kinetochore Complexes Organization of centromeres. Kinetochore complexes and motors that move chromosomes. Chromosomes move through checkpoints. Chromatin Structure and Gene Regulation Nucleosome assembly and histone modification. (i) Nucleosome assembly and nucleosome spacing. (a) Core histones. (b) Core histone variants. (c) Linker histones. (d) Histone fold motifs and HMG proteins. (ii) Histone modifications. (a) HATs. (b) HDACs. (c) HMTs. (d) Histone kinases. (e) Histone ubiquitylases. (f) Histone ADP-ribosylases. Chromatin assembly and remodeling. (i) CAFs. (ii) CRFs. Transcription Factors Zn(II)2Cys6 fungal binuclear cluster family. C2H2 zinc fingers. GATA factors. bHLH transcription factors. B-ZIP transcription factors. Miscellaneous factors. Translation Factors GENOME DEFENSE, DNA REPAIR, AND RECOMBINATION Genome Defense Mechanisms Heterochromatin silencing and DNA methylation. RIP. RNA-dependent silencing. DNA Repair Photoreactivation. Excision. (i) NER. (ii) BER. Recombination repair. Postreplication repair. Checkpoint control. Meiotic Recombination Before the DSB. DSB generation. Removal of Spo11 protein from DNA. Resection of ends. Strand invasion. Synapsis and SC formation. Regulation of crossover frequency. Mismatch repair. Resolution of recombination intermediates. Nonhomologous end joining. METABOLIC PROCESSES AND TRANSPORT Extracellular Digestion Glycosyl hydrolases. Proteases. Nucleases and phosphatases. Lipases. Transporters Glycolysis, Fermentation, and Gluconeogenesis Glycolysis and the pentose phosphate cycle. (i) Hexose phosphorylation. (ii) EM glycolysis. 
(iii) HM and ED glycolysis and the pentose phosphate cycle. Alcoholic fermentation. Gluconeogenesis. Mitochondrion and Energy Metabolism Sulfur Metabolism Sulfur acquisition and processing. Generation of sulfide and cysteine. Homocysteine and methionine metabolism. Additional aspects of sulfur metabolism. Components of the regulatory machinery for sulfur metabolism. Nitrogen Metabolism Proteasome Lipids Protein Glycosylation, Secretion, and Endocytosis N-linked protein glycosylation (dolichol) pathway. Secretory and endocytic pathways. ENVIRONMENTAL SENSING Major Signal Transduction Pathways Two-component regulatory systems. Heterotrimeric G proteins. Ras-like GTPases. cAMP signaling. PAKs and GCKs. (i) PAKs. (ii) GCKs. MAPKs. Calcium signaling. Protein phosphatases. Mammalian signaling proteins not found in Neurospora. Photobiology and Circadian Rhythms Heat Shock and Stress Responses GROWTH AND REPRODUCTION Cell Wall Glucan synthases. (i) (1,3)β-Glucan synthesis. (ii) (1,6)β-Glucan synthesis. Chitin substrate synthesis: the Leloir pathway. Cell wall precursors. Chitin synthases. Hyphal Morphogenesis Generation of hyphal polarity. (i) Proteins important for cell polarity development. (ii) Rho-type GTPases as key regulators of polarity. Cytoskeleton and motor proteins. (i) Structural components. (ii) Kinesins. (iii) Myosins. (iv) Dynein. Cyclin/CDK Machinery Asexual and Sexual Sporulation Macroconidiation. Meiosis and the sexual cycle. FUNGAL PATHOGENESIS AND HUMAN DISEASE Relationship to Animal and Plant Pathogens Animal pathogens. Plant pathogens. Human Disease Genes PERSPECTIVES AND FUTURE DIRECTIONS ACKNOWLEDGMENTS REFERENCES
SUMMARY
10,000 predicted proteins encoded by the genome sequence of the filamentous fungus Neurospora crassa. Seven major areas of Neurospora genomics and biology are covered. First, the basic features of the genome, including the automated assembly, gene calls, and global gene analyses are summarized. The second section covers components of the centromere and kinetochore complexes, chromatin assembly and modification, and transcription and translation initiation factors. The third area discusses genome defense mechanisms, including repeat induced point mutation, quelling and meiotic silencing, and DNA repair and recombination. In the fourth section, topics relevant to metabolism and transport include extracellular digestion; membrane transporters; aspects of carbon, sulfur, nitrogen, and lipid metabolism; the mitochondrion and energy metabolism; the proteasome; and protein glycosylation, secretion, and endocytosis. Environmental sensing is the focus of the fifth section with a treatment of two-component systems; GTP-binding proteins; mitogen-activated protein, p21-activated, and germinal center kinases; calcium signaling; protein phosphatases; photobiology; circadian rhythms; and heat shock and stress responses. The sixth area of analysis is growth and development; it encompasses cell wall synthesis, proteins important for hyphal polarity, cytoskeletal components, the cyclin/cyclin-dependent kinase machinery, macroconidiation, meiosis, and the sexual cycle. The seventh section covers topics relevant to animal and plant pathogenesis and human disease. The results demonstrate that a large proportion of Neurospora genes do not have homologues in the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe. The group of unshared genes includes potential new targets for antifungals as well as loci implicated in human and plant physiology and disease.
INTRODUCTION
Fungi are often found in association with other organisms. Numerous diseases of humans and other animals are caused by fungi, and the incidence of life-threatening fungal infections is on the rise, in parallel with the increased number of immunocompromised patients (453). Fungi are the most important group of plant pathogens, causing significant and often devastating losses in crop yield worldwide (8). Mycorrhizal fungi form symbiotic associations with the roots of higher plants (744) and effectively determine what type of plant ecosystem develops (see, e.g., reference 389).
The fungi have played a major role in the progress of biochemistry, genetics, and molecular biology. George W. Beadle and Edward L. Tatum (55) defined the role of genes in metabolism, and this led quickly to the mid-century revolution in genetics. Their work took advantage of the filamentous ascomycete Neurospora crassa (hereafter referred to as Neurospora), which was first described in 1843 as the causative agent of an orange mold infestation in French bakeries (Fig. 1) (607, 610). Neurospora was later domesticated as an experimental organism by Bernard O. Dodge (725) and Carl C. Lindegren (see, e.g., reference 476). Beadle and Tatum sought an organism displaying Mendelian genetics that could be grown on simple media and might display additional nutrient requirements arising by mutation. Their success emboldened others to use bacteria, algae, and other fungi in similar studies. Together with the elucidation of the structure of DNA in 1953, this work gave rise to molecular biology as we know it.
One of the attractive features of Neurospora as a model organism is its complex yet genetically and biochemically tractable life cycle (Fig. 2). Neurospora is multicellular and produces at least 28 morphologically distinct cell types (82), many of which are derived from hyphae. Neurospora vegetative hyphae are tip-growing cellular elements that undergo regular branching (294, 798, 800, 812) and are multinucleate. These hyphae contain incomplete cross walls (septa) (315) that allow the movement of organelles between compartments. Frequent fusion among hyphal filaments produces a complex hyphal network (the mycelium) (336) and promotes the formation of heterokaryons in which multiple genomes can contribute to the metabolism of a single mycelium. Specialized aerial hyphae are differentiated from vegetative hyphae in response to nutrient deprivation, desiccation, or various stresses, and these form chains of asexual spores (the multinucleate macroconidia) for dispersal (752). The timing of macroconidiation is controlled by a circadian rhythm, which in turn is modulated by exposure to blue light. Another type of asexual spore, the uninucleate microconidium, is differentiated from microconidiophores or directly from the vegetative hypha (82, 495, 752). Limiting nitrogen induces a type of hyphal aggregation that leads to generation of multicellular female sexual organs (protoperithecia) (564, 642). Mating is accomplished by chemotropic growth of a specialized female hypha from the protoperithecium toward the male cell (typically a conidium) in a process involving pheromones (81). Fertilization and meiosis result in development of the female structure into a beaked fruiting body (the perithecium) within which asci, each containing eight ordered sexual spores (ascospores), are formed (638).
30. A comparison with 17 Mb of finished BAC sequence (http://mips.gsf.de/proj/Neurospora/) confirmed the sequence accuracy. Only 12 discrepancies were identified in this comparison. Five lie at the end of contigs and are most probably caused by misaligned or low-quality terminal reads. Four are short insertions or deletions, ranging from 9 to 559 bp. The remaining three discrepancies appear to be instances in which the finished sequence does not correctly represent the genome, owing to chimerism in the BACs and in which the whole-genome assembly is correct.
The total genome size can be approximated from the draft assembly by estimating the size of gaps between contigs and scaffolds. The size of gaps between adjacent contigs in a scaffold can be derived from the size of clones spanning the gap. When these gap sizes are included, the total physical length of all scaffolds is estimated to be 39.9 Mb. The size of gaps between scaffolds is more difficult to estimate since spanning clones are not available. In addition, these gaps include difficult-to-sequence regions of the genome including the ribosomal DNA (rDNA) repeats, centromeres, and telomeres. A total of ~1.7 Mb of additional sequence (251) is probably accounted for by these regions. Based on these considerations, the genome size is estimated to be 41 Mb. The most recent estimate of the genome size based on pulsed-field gel electrophoresis of intact Neurospora chromosomes is 42.9 Mb (www.fgsc.net/fgn39/oneline.html). The size predicted from the sequence (41 Mb) is well within the limits of resolution for pulsed-field gel electrophoresis measurements of such large molecules.
Integration of the sequence with the genetic map. Approximately 1,000 genetic markers exist for Neurospora. The majority have been ordered on the genetic maps for the seven linkage groups and are described in a recent compendium (612). The Neurospora assembly was correlated with the genetic maps by using a subset of 252 markers for which there is sequence in Neurospora (or other closely related fungi). The marker sequences were located on the current assembly by using BLASTN and filtering for unique high-quality alignments (http://www.ogi.edu/satacad/ase [269]). The 243 (96% of the total) markers that aligned were then used to place contigs and scaffolds on the physical map, according to the genetic marker order. In all, 95% of the assembly was assigned to a linkage group; 85% of this sequence was further ordered and oriented within a linkage group.
Only a handful of discrepancies were noted between the physical and genetic maps. There were three cases where gene order differs between the two maps. Five markers were located in more than one contig, indicating places where the assembly failed to merge contigs. Twelve markers failed to be located within the physical map, indicating sequence gaps within the current assembly. Finally, nine scaffolds contained markers on different linkage groups, indicating either misplaced markers on the genetic map or contigs incorrectly linked within a scaffold.
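The marker placement step described above (locating marker sequences with BLASTN and keeping only unique high-quality alignments) can be illustrated with a minimal Python sketch. The record fields, the marker and contig names, and the identity and coverage thresholds are all hypothetical; they are chosen for illustration and are not the criteria used in the original analysis.

```python
from collections import defaultdict
from typing import NamedTuple

class Hit(NamedTuple):
    marker: str      # genetic marker name (hypothetical)
    contig: str      # contig the marker sequence aligned to
    identity: float  # percent identity of the alignment
    coverage: float  # fraction of the marker sequence aligned

def unique_high_quality(hits, min_identity=95.0, min_coverage=0.9):
    """Return {marker: contig} for markers with exactly one good hit.

    The thresholds are illustrative assumptions, not published values.
    """
    good = defaultdict(list)
    for h in hits:
        if h.identity >= min_identity and h.coverage >= min_coverage:
            good[h.marker].append(h.contig)
    # Markers with several good hits are ambiguous and are not placed.
    return {m: contigs[0] for m, contigs in good.items() if len(contigs) == 1}

hits = [
    Hit("markerA", "contig1", 99.0, 1.0),
    Hit("markerB", "contig2", 98.0, 0.95),
    Hit("markerB", "contig7", 97.5, 0.95),  # second good hit: ambiguous
    Hit("markerC", "contig3", 80.0, 0.5),   # low quality: dropped
]
placed = unique_high_quality(hits)
```

In this toy input only markerA is placed. Markers such as markerB, which align well to more than one contig, correspond to the cases noted above in which the assembly may have failed to merge contigs.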
Nuclear and mitochondrial genes. An automated annotation of the Neurospora draft genome sequence was performed by the Whitehead Institute Center for Genome Research (WICGR) by using the Calhoun annotation and analysis system. Three gene prediction algorithms (FGenesH, FGenesH+, and Genewise) were combined with available protein homology to predict protein-coding genes. Gene predictions were compared with BLAT alignments of available expressed sequence tags (ESTs) (63, 565, 900) to assess accuracy. A total of 10,082 protein-coding genes were predicted. Eliminating proteins shorter than 100 amino acids that lack protein or EST similarity reduces this number to 9,200. This number of genes is within the range of 9,200 to 13,000 estimated by previous authors (56, 409, 442, 565). An additional 26 protein-coding genes reside in the mitochondrial genome (see "Metabolic processes and transport" below).
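The filtering rule that reduces the predicted gene count can be written as a simple predicate. The following Python sketch only restates the rule given in the text; the function and argument names are hypothetical and do not come from the annotation pipeline.

```python
def keep_prediction(length_aa, has_protein_hit, has_est_hit, min_len=100):
    """Keep a predicted protein unless it is short AND unsupported.

    A prediction is discarded only when it is shorter than min_len amino
    acids and has neither protein similarity nor EST similarity.
    """
    return length_aa >= min_len or has_protein_hit or has_est_hit

# Hypothetical predictions: (length in amino acids, protein hit, EST hit)
predictions = [(350, False, False), (80, True, False), (60, False, False)]
kept = [p for p in predictions if keep_prediction(*p)]
```

Here the 60-residue prediction with no supporting evidence is the only one removed.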
Consistent with the greater biological complexity of filamentous fungi compared to both fission and budding yeast, Neurospora possesses nearly twice as many genes as Schizosaccharomyces pombe (~4,800) and S. cerevisiae (~6,300). Neurospora contains almost as many genes as Drosophila melanogaster (~14,300), despite the relative developmental complexity of the latter. In addition, 41% of the predicted Neurospora proteins do not have significant similarity to known or predicted proteins in other organisms and 57% do not have good matches to proteins in either S. cerevisiae or S. pombe (269). The Neurospora gene complement also displays greater structural complexity than those of the two yeasts. Neurospora genes possess a predicted 17,118 introns (1.7 introns per gene), compared to roughly 286 (0.04 intron per gene) and 4435 (0.95 intron per gene) in S. cerevisiae and S. pombe, respectively. However, as with the yeasts and other simple eukaryotes, Neurospora introns do appear to be biased toward the 5' regions of genes.
A total of 413 tRNA genes were identified using tRNAscan (487), including 234 (57%) with introns. Of this number, 396 are predicted to decode all standard amino acids and one could potentially decode UAG termination codons. Ten tRNA pseudogenes were identified, two of which were inferred to be mutated by RIP. An additional six tRNAs were predicted with undetermined specificity; one of these was inferred to be a relic of RIP.
All annotation data are available at the WICGR Neurospora crassa website (http://www-genome.wi.mit.edu/annotation/). In addition, a manually curated annotation of the Neurospora gene set is available at the Munich Information Center for Protein Sequences (MIPS) Neurospora crassa database (MNCDB; http://mips.gsf.de/proj/neurospora) (498). At present, MNCDB contains 8,500 Neurospora proteins; this number is expected to increase as manual gene prediction and annotation progress. MIPS protein codes were chosen according to the cosmid, BAC, and DNA shotgun contigs from which they were derived. Linkage was established with their respective counterparts in the WICGR database that were identified using automated gene prediction tools. The proteins in the WICGR database may differ from those in MNCDB, due to manual correction, but the proteins in the different databases are linked as long as partial matches are found.
Codon bias. The mRNA expression level is influenced by synonymous codon usage in a number of organisms. In particular, increasing codon bias is correlated with greater expression level in Escherichia coli (362), S. cerevisiae (156, 266, 606 [although see also reference 305]), as well as Caenorhabditis elegans, D. melanogaster, and Arabidopsis thaliana (208). Correspondence between tRNA gene copy number and codon usage has also been demonstrated for highly expressed genes in S. cerevisiae (816) and E. coli (362). It has been proposed, based on these and other data, that codon bias reflects coadaptation between codon usage and tRNA abundance in order to maximize the efficiency of protein translation for highly expressed genes. However, in mammalian genomes, codon bias has been attributed to regional variations in genomic G+C content (i.e., isochores). In support of this, it has been shown that in mammals the G+C content of regions flanking genes (816) and the GC content of introns (207) covary with the G+C content in the third position of codons.
Although Neurospora genes display significant variation in codon bias, the determinants of this bias are not known for filamentous fungi. To determine whether this variation might reflect mutational selection for translational efficiency, EST sequences from a number of previously characterized libraries were used to estimate relative transcript levels (63, 565, 900). In particular, a count of the number of distinct EST clones that align with a given gene (or flanking region) was used as an estimate of the relative transcript level for that gene. Two different measures of codon bias were used: the codon bias index (CBI) (65) and the effective number of codons (Nc). CBI is a measure of the amount of bias toward a particular set of favored codons, with a large CBI indicating greater bias; for this analysis, the set of favored codons from reference 481 was used. Nc is a measure of codon bias away from uniform codon usage, with a smaller Nc indicating greater bias.
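As an illustration of how a codon bias index of this kind is computed, the following Python sketch uses one common formulation, CBI = (N_fav - N_rand) / (N_tot - N_rand), where N_fav is the number of favored codons observed, N_tot is the number of codons considered, and N_rand is the number of favored codons expected if synonymous codons were used uniformly. The three-amino-acid codon table and the favored-codon set are toy assumptions; the analysis in the text used the favored codons from reference 481, which are not reproduced here.

```python
# Toy synonymous-codon table for three amino acids only (an illustrative
# subset, not the full genetic code).
SYNONYMS = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}
# Hypothetical favored codons, chosen only for this example.
FAVORED = {"TTC", "AAG", "GGT"}

CODON_TO_AA = {c: aa for aa, codons in SYNONYMS.items() for c in codons}

def codon_bias_index(cds):
    """CBI = (N_fav - N_rand) / (N_tot - N_rand) over codons in SYNONYMS."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    n_tot = n_fav = 0
    n_rand = 0.0
    for codon in codons:
        aa = CODON_TO_AA.get(codon)
        if aa is None:
            continue  # codon outside the toy table is ignored
        family = SYNONYMS[aa]
        n_tot += 1
        n_fav += codon in FAVORED
        # Expected favored count under uniform synonymous codon usage.
        n_rand += sum(c in FAVORED for c in family) / len(family)
    if n_tot == 0:
        return 0.0  # no scorable codons
    return (n_fav - n_rand) / (n_tot - n_rand)
```

With this definition, a sequence that uses only favored codons scores 1, and a sequence that uses synonymous codons evenly scores 0, which matches the description of a large CBI indicating greater bias.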
A statistically significant correspondence between estimated transcript level and codon bias was detected using both CBI (Spearman rank correlation coefficient R = 0.30, n = 10,082, P < 1e-197) and Nc (Spearman rank correlation coefficient R = -0.25, n = 10,082, P < 1e-138). Furthermore, a significant correspondence between estimated transcript level and the degree of correlation of codon usage with synonymous tRNA copy number was detected. In other words, more highly expressed genes showed a strong tendency to display a codon usage that was more closely aligned with a synonymous tRNA gene copy number. A significant correspondence between codon third-position G+C content in genes and estimated transcript levels was also detected; however, there was no significant relationship between intron G+C content and estimated transcript levels. These data suggest that, similar to the situation in S. cerevisiae, codon usage and tRNA abundance in Neurospora have coevolved to maximize the efficiency of protein translation for highly expressed genes.
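The Spearman rank correlation used here is the Pearson correlation computed on ranks, with tied values given their average rank. The following self-contained Python sketch shows the calculation on a handful of invented values; the EST counts and CBI values are hypothetical and are not data from the analysis.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical per-gene values: EST clone counts and codon bias index.
est_counts = [0, 1, 1, 5, 12]
cbi_values = [0.05, 0.10, 0.20, 0.30, 0.45]
r = spearman(est_counts, cbi_values)
```

Because a smaller Nc indicates greater bias, the same procedure applied to Nc is expected to give a coefficient of the opposite sign, as reported above.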
Comparative multigene family and domain analysis. Despite the presence of RIP, Neurospora possesses 527 multigene families, including 118 families expanded relative to their counterparts in S. cerevisiae. In addition, Neurospora possesses numerous Interpro protein domains that display expansions in number relative to other sequenced eukaryotes. Particularly surprising is the abundance of cytochrome P450 domains, which are numerous in plants and in Neurospora but very scarce in both S. cerevisiae and S. pombe. The cytochrome P450 enzyme domain, including the E-class P450 group 1 domain and E-class P450 group IV domain subclasses, is represented by 38 proteins in Neurospora. In contrast, S. cerevisiae and S. pombe contain only two to four proteins with these domains. Accounting for genome size, this represents a six- to eightfold increase in genes with these domains in Neurospora. Cytochrome P450s are known for playing roles in both detoxification and secondary metabolism, and the implications of their high representation in Neurospora have been discussed previously (269).
Other domains abundant in Neurospora include the zinc finger C2H2-type domain, the S-adenosylmethionine (SAM) binding motif domain, the short-chain dehydrogenase/reductase (SDR) superfamily domain, and the flavin adenine dinucleotide-dependent pyridine nucleotide-disulfide oxidoreductase domain. Interestingly, a number of domains involved in signaling appear underrepresented compared to other fungi and plants. These include the eukaryotic, serine/threonine, and protein tyrosine kinase domains. Other underrepresented domains include certain helicases, RNA binding protein motifs, and the AAA-ATPase superfamily domain.
In the following sections, different variations of the BLAST program (17) were used to search DNA or protein databases using DNA or protein sequences. The resulting e value depends on the database size; the databases differ in size, and many are growing over time. Hence, the magnitude of e should be treated as an indication and not as an absolute measure of the similarity between two sequences.
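The dependence of the e value on database size follows from the standard Karlin-Altschul expectation, E = m × n × 2^(-S'), where m is the query length, n is the total database length, and S' is the bit score. The Python sketch below is a simplified illustration: it ignores the effective-length corrections that BLAST applies, and the query length, bit score, and database sizes are hypothetical.

```python
def expect_value(bit_score, query_len, db_len):
    """Simplified Karlin-Altschul expectation: E = m * n * 2**(-S').

    bit_score is S' in bits, query_len is m, db_len is n. Edge-effect
    (effective length) corrections are deliberately omitted.
    """
    return query_len * db_len * 2.0 ** (-bit_score)

# The same alignment (same bit score) searched against two databases
# that differ 100-fold in size (hypothetical sizes).
e_small = expect_value(bit_score=50, query_len=300, db_len=4_000_000)
e_large = expect_value(bit_score=50, query_len=300, db_len=400_000_000)
ratio = e_large / e_small
```

An identical alignment therefore receives an e value that scales linearly with database size, which is why e values obtained against different databases, or against the same database at different times, are not directly comparable.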
CHROMATIN ASSEMBLY AND GENE REGULATION
Organization of centromeres. S. cerevisiae has the simplest eukaryotic centromere known, only ~125 bp of DNA associated with a single nucleosome. This short region is divided into three centromere DNA elements (CDEs) which are conserved on all 16 chromosomes and serve as binding sites for the sequence-specific DNA binding protein Cbf1 and the essential CBF3 complex (457). The sequence of CDE II is not conserved, but the length (~80 bp) and A+T content (~90%) are similar at all yeast centromeres. CDE III is associated with the histone H3 variant Cse4/CENP-A and the kinetochore chromatin binding protein Mif2/CENP-C. In contrast to S. cerevisiae, S. pombe centromeres are much larger (40 to 100 kb) and are composed of two inverted repeats surrounding a nonconserved core. The inner inverted repeats and the core sequence are associated with Cnp1/CENP-A nucleosomes and the Ctf19 homologues Mis6 and Mis12 (603, 772). The flanking regions are assembled into heterochromatin in part by the histone methyltransferase, Clr4, and the heterochromatin protein Swi6 (see "Genome defense, DNA repair, and recombination" below). Drosophila centromeres are large (400 to 500 kb) and are composed of 5-bp satellite sequences interspersed with transposons or transposon relics, while human centromeres are 0.5 to 5 Mb long and are homogeneously composed of 171-bp-long α-satellites (for a review, see reference 154). In most animals and in plants, the repeat sequences of the satellite arrays are not conserved but the array repeats usually approximate the length of a nucleosome repeat of DNA sequence (330).
The seven centromeres of Neurospora remain largely uncharacterized, even after the cloning of Cen VII (137) and detailed analysis of a 17-kb segment (121). It is clear, however, that Neurospora centromeres are large (~200 to 400 kb) and AT rich, like those in Drosophila (Table 2). As in flies, they appear to consist of an accumulation of complete or fragmented and rearranged transposon relics, in particular the gypsy-type Tgl1 or copia-type Tcen retrotransposon relics, the LINE-like element Tad, and a homolog of the S. cerevisiae Ty3 transposon, Tgl2 (121). All such transposon relics have been inactivated by the genome defense mechanism of RIP (see "RIP" below), which introduces CG-to-TA transition mutations and thus renders the DNA highly AT rich. Micro- and minisatellites and homopolymeric stretches can be identified in centromeric regions (498). No specific accumulation of tRNA genes has been noted close to the centromeres. Assembly 3 of the Neurospora genome sequence contains the sequence of most of the centromeres of all chromosomes, but some of the putative centromeric regions cannot yet be assigned to supercontigs on specific chromosomes. Similarly, the German sequencing project (498) has not yielded complete sequence information for the centromeres of linkage groups (LG) II and V. Curiously, no significant increase in the ratio of physical to genetic distance between known markers has been observed in regions near the centromeres of LG II and V (498), in contrast to that previously reported for LG III (183).
Kinetochore complexes and motors that move chromosomes. To move chromosomes during cell division, centromeres are attached to spindle microtubules via the kinetochore and various motor complexes. In S. cerevisiae, the Cse4p/CENP-A nucleosome is bound by a centromere protein clamp, consisting of a homodimer of Cbf1p, the essential CBF3 complex, and Mif2/CENP-C (274, 519). Interestingly, Neurospora has putative homologues of both the S. cerevisiae helix-loop-helix protein Cbf1p and the animal-type CENP-B proteins, which are presumably functionally equivalent (Table 3). Curiously, alignments of putative Neurospora CENP-B homologues with S. pombe proteins (Cbh1, Cbh2, and Abp1) involved in centromere binding revealed that the Neurospora CENP-B homologues contain RIP-type mutations, including numerous nonsense mutations. Because CENP-B and Cbh1-like proteins are also related to the Drosophila Pogo and human Tigger transposons, most of the Neurospora Cbh1 homologues have been considered to be transposon relics (717). CBF3 is an octameric complex composed of four types of subunit (four Ndc10p, two Ctf13p, and one each of Cep3p and Skp1p). Neurospora has one Skp1p homolog, SCON-3, which is part of an SCF complex (E3 ubiquitin ligase; see "Sulfur metabolism" below) involved in sulfur regulation (741), but there are no matches to the other three essential CBF3 subunits.
In mammals, the kinesin CENP-E, cytoplasmic dynein, microtubule tracking proteins, and disassemblases are involved in chromosome movement (see "Growth and Reproduction" below), whereas in yeast, dynein positions the spindle but is not involved in actual chromosome movement (154). It remains to be seen whether dynein is involved in chromosome segregation in Neurospora; nevertheless, effects similar to those in mammals have been observed in Tetrahymena and Drosophila (154). It is likely that both active motor movement and microtubule flux contribute to anaphase movements of chromosomes.
Chromosomes move through checkpoints. The mitotic (or spindle assembly) checkpoint blocks the entry into anaphase until the two kinetochores of duplicated chromatid pairs have attached to spindle microtubules. This ensures accurate segregation (655). It appears that a combination of unattached kinetochores coupled to lack of tension acting on both kinetochores of a chromatid pair causes the block (154). Genetic dissection in S. cerevisiae has identified seven components of the mitotic checkpoint (552), all of which appear conserved in Neurospora. The molecular interactions between kinetochores and all checkpoint proteins are not established, but it appears that Mad2p and Cdc20p play central roles in signaling the unattached kinetochore, either directly or through a signal-amplifying cascade which inactivates or sequesters Cdc20p. Loss or reduced levels of checkpoint proteins in metazoans cause chromosome missegregation, tumorigenesis, and apoptosis (154).
(c) Linker histones. Similar to other fungi (403), Neurospora has only one histone H1 gene (Table 4) (242). Studies with Neurospora, S. cerevisiae, Tetrahymena, Aspergillus nidulans, and Ascobolus immersus revealed that H1 is not essential in any of these organisms (44, 252, 640, 729, 817) but, rather, is implicated in the regulation of nitrogen and carbon metabolism (242, 329, 727).
(d) Histone fold motifs and HMG proteins. Short proteins with histone fold motifs are involved in transcriptional regulation in all eukaryotes (271, 639). The histone fold motif in CBFD/NF-YB/HMF is similar to domains found in archaebacteria (608). Three predicted Neurospora proteins containing histone folds are related to general transcription factors (Table 4): TATA binding factor (TBF) (NCU02017.1); CHRAC17 (NCU03073.1), a putative subunit of RNA polymerase II with homology to a subunit of the CHRAC chromatin remodeling factor from Drosophila; and HAP (NCU09248.1), a homologue of the CCAAT binding proteins Hap3p from S. cerevisiae, HAP-C from A. nidulans, Php3 from S. pombe, and NF-YB from humans.
Two sex-determining region, Y chromosome (SRY)-related high-mobility group (HMG) transcription factors, MATA-3 and MATa-1, have been characterized in Neurospora (233, 615). Three additional proteins have homology to mating peptides or carry a sterile alpha motif and an HMG-1-like box (Table 4). NCU03481.1 is related to a repressor of hypoxic genes (Rox1p) in S. cerevisiae and a virulence factor (Rfg1) in Candida albicans. NCU09387.1 and NCU02326.1 are related to S. pombe Ste11. Interestingly, the Ste11-related proteins from Neurospora are each other's closest homologues, which is unusual in Neurospora. HMG-like proteins are typically short, such as HMG1.2 (NCU09995.1; related to S. cerevisiae recombination proteins Nhp6A and Nhp6B) and HMG2A (NCU02819). The most unusual HMG protein predicted in Neurospora (NCU09120.1) has a HMG box at the C terminus and is most similar to human polyamine oxidases.
(ii) Histone modifications. Core and linker histones are extensively modified at the posttranslational level (826). Histone modifications have been studied both for their effects on the regulation of specific genes and for their importance to global regulatory phenomena. Recent work has uncovered an epigenetic "histone code" (384, 750, 761, 811) involved in transcriptional regulation (for reviews, see references 67 and 104) and other DNA transactions (for recent reviews, see references 245, 361, 691, and 810). Histone residues can be actively modified by acetylation and deacetylation of lysine; methylation of lysine and arginine; phosphorylation and dephosphorylation of serine, threonine, and histidine; ADP-ribosylation of glutamic acid; and ubiquitylation of the entire proteins (384, 826). Because the interrelationships between histone modifications are essential for the formation and maintenance of silent chromatin states (384, 445, 559), genes involved in Neurospora histone modification are described below (see "Genome defense, DNA repair, and recombination"). The combinatorial possibilities of the histone code are staggering, even without considering the facts that different modification states are possible on the same residue and that lysine and arginine can be mono-, di-, or trimethylated (see, e.g., reference 778).
(a) HATs. Histone acetyltransferases (HATs) transfer acetyl groups from acetyl coenzyme A (acetyl-CoA) to lysines, most often located within the amino-terminal tail of the core histones. Neurospora has representatives of many of the HATs involved in transcriptional activation and gene silencing (e.g., TAFII250, Gcn5p, Sas2p, Sas3p, Esa1p, and Elp3p [J. Dobosy and E. Selker, unpublished results]) but lacks homologues to S. cerevisiae Hpa1p and metazoan CBP and SRC (Table 5). The homologue of S. cerevisiae Gcn5p lacks a locus identifier because the predicted protein lies at the beginning of contig 3.38 (Table 5). Neurospora has three N-terminal acetyltransferases that correspond to the budding yeast proteins Nat1p, Nat3p, and Mak3p, all of which are involved in cell cycle control (Table 5) (415). Interestingly, three of the putative GNAT/RIM1-type acetyltransferases have bacterial proteins as their closest relatives. The range of substrates of the predicted Neurospora acetyltransferases remains unknown.
(c) HMTs. Histone methyltransferases (HMTs) add methyl groups supplied by SAM to lysines (Lys) or arginines (Arg) on core histones. Neurospora has a single member of each known HMT subfamily (Table 5), whereas metazoans, plants, and in some cases S. cerevisiae and S. pombe have multiple proteins for each. Neurospora is predicted to have nine proteins (SET-1 to SET-8 and DIM-5; M. Freitag, K. Adhvaryu, and E. Selker, unpublished data) with SET domains, a motif first found in the Drosophila Su(var) 3-9, Enhancer of zeste, and Trithorax proteins. SET domains are characteristic of lysine protein methyltransferases (for a review, see reference 445), although not all SET domain proteins are HMTs. Neurospora lacks some SET proteins identified in S. cerevisiae (e.g., Set5p and Set6p) and in humans but has two proteins (SET-6 and SET-8) that are either novel or restricted to only a subset of fungi. Both proteins align well by their Zn finger and Jumonji (Jmj) domains, but the pairing of these domains with the SET motif is rare. Like its homologues in S. pombe, animals, and plants (641, 690), Neurospora DIM-5 is a histone H3 Lys9 HMT (776, 778). Budding yeast Set1p and Set2p methylate histone H3 on Lys4 (106, 553, 660) and Lys36 (762), respectively, and Neurospora has striking homologues (SET-1 and SET-2) to these two proteins. Neurospora SET-3 is related to Drosophila ASH1, which methylates histone H4 on Lys20 and histone H3 on Lys4 and Lys9 (59), and SET-7 is related to E(z), which methylates histone H3 Lys9 and Lys27 (546). Human G9A also methylates histone H3 on Lys9 and Lys27 (770, 771), and Neurospora SET-5 appears related to the G9A subgroup.
Neurospora has three putative arginine methyltransferases (PRMTs) (for reviews, see references 432 and 511) that are homologous to human PRMT1, PRMT3, and PRMT5 (Table 5). Interestingly, Neurospora does not have a recognizable homologue of either PRMT2 or PRMT4 (CARM1). CARM1 methylates histone H3 Arg2, Arg17, and Arg26 (51, 180). PRMT3 has not yet been shown to methylate histones (249, 781), but PRMT1 methylates histone H4 Arg3 in vivo (696, 843) and PRMT5 and its homologues can methylate both histone H2A and H4 in vitro (103, 227). PRMT5 is a homologue of S. pombe Skb1 and S. cerevisiae Hsl7p, which methylate the protein kinases Shk1 (43) and Swe1p (150), respectively. The Neurospora homologue is called PP-2 (P. Bobrowicz and D. Ebbole, unpublished data).
Neurospora has one homologue of S. cerevisiae Dot1p (M. Freitag, C. Matsen, J. Murphy, G. Kothe and E. Selker, unpublished data) (Table 5), an HMT which methylates Lys79 within the globular domain of histone H3 and which is important in telomeric silencing in S. cerevisiae (740) and humans (232, 568).
(d) Histone kinases. Like histone acetylation and methylation, histone phosphorylation has been intensively studied and found to be important for chromosome condensation, the signaling of active versus silent chromatin states, transcription, regulated cell death, and DNA repair (145). All core histones and linker histone H1 can be phosphorylated in vitro (826), and all histone kinases that act on histones in vivo are involved in the control of cell cycle progression. Histone H3 Ser10 can be phosphorylated by at least two different kinases in S. cerevisiae, Snf1p (483) and Ipl1p/Aurora B (172, 351). The S. cerevisiae Snf1p kinase is a heterotrimer, composed of the Snf1p α subunit, the Snf4p γ subunit, and one of three different β subunits, Sip1p, Sip2p, or Gal83p (474). Neurospora has homologues to Snf1p and Snf4p but has only one β subunit, most closely related to Gal83p (Table 5). Phosphorylation of H3 Ser10 is a prerequisite of acetylation at histone H3 Lys14 and is usually required for gene activation (482, 572). Like H3, the centromeric H3 variant CENP-A can be phosphorylated, albeit at Ser7 (895). In addition to homologues of Snf1p and Ipl1p/Aurora B, filamentous fungi have a cell cycle kinase, Nim-A (Neurospora NIM-1 [635]), that phosphorylates histone H3 Ser10 (192).
In animals, histone H3 Thr11 can also be phosphorylated by the Dlk/Zip kinase in vivo (629). This modification is found on both H3 and CENP-A at assembled centromeres, suggesting that Thr11 phosphorylation rather than Ser10 phosphorylation may be involved in the maintenance of silent chromatin and kinetochore attachment. Curiously, these DAP (death-associated protein) kinase-like threonine kinases seem restricted to animals, since no homologue has been found in any fungus, including Neurospora.
Candidate histone phosphatases from Neurospora (PPP-1, PPH-1, and PZL-1) have been isolated by biochemical means, and their genes have been identified (see "Environmental sensing" below).
(e) Histone ubiquitylases. Our understanding of histone ubiquitylation is still fragmentary. Ubiquitylated histones H2A and H2B are the most abundant ubiquitylated proteins in eukaryotes (381). Ubiquitin is linked to histone H2A Lys119 (91) and H2B Lys120 or Lys123 in animals and S. cerevisiae, respectively (658, 789). All histone H2 variants known in animals can be ubiquitylated (381). Neurospora has several predicted ubiquitin-activating (E1) and ubiquitin-conjugating (E2) proteins that are predicted to be involved in histone ubiquitylation. For example, the Neurospora mus-8 gene encodes a Rad6p-like H2B ubiquitin-conjugating enzyme (Table 5) (658, 749). This enzyme affects gene silencing via histone H3 methylation states in S. cerevisiae, an example of "trans-tail" regulation of histone modifications (765).
(f) Histone ADP-ribosylases. Of the histone modifications, ribosylation is currently the least well understood. One class of enzymes thought to be involved in ribosylation and histone turnover is the poly(ADP-ribose) polymerases (PARPs) (16). In contrast to mammals, which have multiple PARPs, Neurospora has a single parp gene (G. Kothe and E. Selker, unpublished results) (Table 5). Whether any of the small GTP binding proteins with ADP ribosylation activity have effects on histones in vivo is unknown.
Chromatin assembly and remodeling. (i) CAFs. Eukaryotes use histone chaperones or chromatin assembly factors such as CAF and RCAF to guide histones prior to assembly into nucleosomes (for a review, see reference 517). As in other eukaryotes, Neurospora CAF-1 is predicted to be composed of three subunits (CAC-1 to CAC-3) (M. Freitag and E. Selker, unpublished data) (Table 6). Disruption mutants of cac-1, the gene encoding the largest and least well conserved subunit, are viable (Freitag and Selker, unpublished); this is similar to the situation in S. cerevisiae, where neither CAF nor the RCAF subunit of the antisilencing factor (Asf1p) is essential. These findings suggest the presence of additional chromatin assembly factors. CAC-2 and CAC-3 contain WD40 domains and are conserved among all eukaryotes studied. Neurospora CAC-3 is more closely related to retinoblastoma binding protein 48 (RBBP4; p48) from mammals than it is to S. cerevisiae Cac3p (Msi1p/Ira1p). In mammals, RBBP4/p48 also associates with HDAC complexes (653, 840).
Neurospora has two predicted proteins with domains characteristic of nucleosome assembly proteins (NAPs) (Table 6). All other eukaryotes with sequenced genomes have at least two paralogues. NAF-1 (nucleosome assembly factor 1) is a canonical NAP, which probably functions as a chaperone for the histone H2A-H2B dimer (139, 368). NAF-2 is related to phosphatase 2A inhibitor 2 (471), which is involved in chromatin decondensation in S. pombe and humans and may help balance competing kinase and phosphatase activities at histone H3 Ser10 (572).
Like S. cerevisiae and S. pombe, Neurospora has a single homologue of the proliferating-cell nuclear antigen protein (PCNA) (Table 6). The homotrimeric PCNA complex serves to mark newly replicated DNA at the replication foci to which the CAF complex localizes. PCNA is involved in many activities involving DNA, including replication, repair, and silencing (517).
(ii) CRFs. Chromatin-remodeling factors (CRFs) use the energy generated by their ATPase subunits to remove or position nucleosomes relative to the DNA substrate (for recent reviews, see references 58, 240, and 452). Neurospora has 24 predicted proteins related to the SWI/SNF ATPase/helicase domain (Table 7). In organisms in which they have been studied, these proteins have been implicated in chromatin remodeling, DNA repair, and activation of transcription and are typically found in large complexes. Ten of the predicted Neurospora proteins are homologues of previously identified CRFs. The remaining 14 may be involved in ATPase-dependent repair processes and transcription. One of these, MUS-25, is the previously identified Neurospora RAD54 homologue (312). The Neurospora complement of SWI/SNF ATPases is a subset of those found in S. cerevisiae, S. pombe, Arabidopsis, and animals.
CRFs were categorized by their putative conserved ATPase subunits alone. Usually, ATPase subunits of various CRFs from different organisms are more closely related to each other than to different CRF ATPases from the same organism. Compared to CRFs from the yeasts, Drosophila, and human, relatively few homologues to non-ATPase subunits can be identified in Neurospora (e.g., to the yeast RSC, SNF2, and ISW complexes, the Drosophila CHRAC and NURF complexes, and the human BRG and BRM complexes). While there are fewer CRF ATPases in the Neurospora genome than in the S. cerevisiae genome, it appears likely that one ATPase subunit may associate with different accessory proteins to form specific complexes, analogous to the situation in Drosophila, where ISWI is present in three complexes: ACF, NURF, and CHRAC (452).
Like S. cerevisiae, Neurospora has two ATPase homologues in the INO80 group of CRFs, CRF1-1 and CRF2-1 (Table 7). Ino80p is the only CRF that has helicase activity (728). CRF1-1 is a homologue of Swr1p, while CRF2-1 is a homologue of Ino80p. Both are related to DOMINO CRFs of Drosophila (667).
Arabidopsis, metazoans, S. pombe, and S. cerevisiae have several CRFs that are generally associated with global gene activation. In Drosophila, for example, the distribution of RNA polymerase II and the BRAHMA complex largely coincide (27). Like humans (BRG and BRM), S. cerevisiae has two such complexes (SNF2 and RSC) with the bromodomain ATPases Snf2p and Sth1p, respectively (392, 542). Strikingly, Neurospora has only one predicted bromodomain ATPase, CRF3-1 (Table 7). Similar to their homologues, Neurospora CRF3-1 and two polybromodomain proteins (PBD-1 and PBD-2 [Table 7]) predicted to be in RSC- or SNF2-like complexes contain bromodomains, which bind to acetylated lysine residues (581).
Most eukaryotes have more than one heterochromatin-associated complex of the Imitation Switch (ISWI) type. In S. cerevisiae, at least four ISWI complexes exist, two each with a specific ATPase, Isw1p or Isw2p, whereas in Drosophila, the single ISWI protein is present in three separate complexes: ACF, NURF, and CHRAC (for a review, see reference 452). This may be similar to the situation in Neurospora, where only one ISWI homologue, CRF4-1, exists but where subunits related to a yeast ISW2 component and Drosophila CHRAC subunits (Table 4, "Histone-like proteins") can be identified (Table 7). CR