Microbiology and Molecular Biology Reviews, September 2005, p. 373-392, Vol. 69, No. 3
1092-2172/05/$08.00+0 doi:10.1128/MMBR.69.3.373-392.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Department of Plant and Soil Sciences, and Kentucky Tobacco Research and Development Center, University of Kentucky, Cooper and University Drives, Lexington, Kentucky 40546,¹ Verdia Research Campus, Pioneer International, A Dupont Company, 700A Bay Road, Redwood City, California 94063,² Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637³
SUMMARY
INTRODUCTION
STRATEGIES FOR DIRECTED EVOLUTION IN PROTEIN DESIGN
  DNA Shuffling
  Whole-Genome Shuffling
  Heteroduplex
  Random Chimeragenesis on Transient Templates
  Assembly of Designed Oligonucleotides
  Mutagenic and Unidirectional Reassembly
  Exon Shuffling
  Y-Ligation-Based Block Shuffling
  Nonhomologous Recombination
  Combining Rational Design with Directed Evolution
APPLICATIONS OF DIRECTED EVOLUTION
  Directed Evolution of Nucleic-Acid-Modifying Enzymes
    Polymerases.
    Nucleases.
    Transposase.
    Integrase/recombinase.
    Reporter genes.
  Directed Evolution of Biochemical Catalysts
    Proteolytic enzymes.
    Cellulolytic enzymes.
    Enzymes for bioremediation.
    Lipases and esterases.
    Cytochrome P450 enzymes.
  Directed Evolution of Metabolic Pathways
  Directed Evolution of Pharmaceuticals
    Protein pharmaceuticals.
    Antibodies.
    Vaccines.
    Viruses.
    Therapeutic chemicals.
  Directed Evolution of Agriculturally Important Traits
    Existing traits. (i) Glyphosate tolerance. (ii) B. thuringiensis toxin. (iii) Golden rice.
    Next-generation traits. (i) Chitinase for antifungal properties. (ii) Mycotoxin detoxification. (iii) Viral vectors.
CLOSING REMARKS
ACKNOWLEDGMENTS
REFERENCES
| SUMMARY |
|---|
| INTRODUCTION |
|---|
Directed protein evolution is a general term used to describe various techniques for generation of protein mutants (variants) and selection of desirable functions. Over the last three decades, directed protein evolution has emerged as a powerful technology platform in protein engineering. This technology has been advanced considerably by the availability of molecular biology tools and emerging high-throughput screening technologies. These methodologies have simplified the experimental processes and facilitated the identification of mutants with even small improvements in desired function. Advanced recombinant DNA technologies have allowed the transfer of single structural genes or genes for an entire pathway to a suitable surrogate host for rapid propagation and/or high-level protein production. Furthermore, it is now possible to control the rate of mutagenesis in widely applied methods such as error-prone PCR and to modify proteins by systematic insertions or deletions. In addition, site-directed mutagenesis, site-saturation mutagenesis, and synthetic oligonucleotides can be used to expand the localized amino acid diversity. While functional complementation of mutant strains is still an excellent choice when possible, the development of sensitive instrumentation and the ability to miniaturize many chemical or biological assays allow the screening of large numbers of samples for selection of desired functions. The ability to rapidly obtain DNA sequence information for gene variants not only provides insight into protein sequence-function relationships but also enhances our ability to select the strategy best suited for the evolution of a particular protein. Thus, directed protein evolution has been expanded from the original in vivo approach (e.g., the evolution of EbgA) to include in vitro exploration.
One of the most effective strategies in directed protein evolution is to gradually accumulate mutations, either sequentially or by recombination, while applying selective pressure. This is typically achieved by the generation of libraries of mutants followed by efficient screening of these libraries for targeted functions and subsequent repetition of the process using improved mutants from the previous screening. Many formats of directed protein evolution have been, and continue to be, developed (8, 9).
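The cycle described above (generate a library, screen it, carry the best variant into the next round) can be sketched as a simple loop. This is a toy model only: the `toy_fitness` assay, the target sequence, the library size, and the number of rounds are all hypothetical choices made for illustration and do not come from any experiment cited here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq, target):
    """Hypothetical assay: count positions that match an arbitrary target."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq, rng):
    """Introduce one random point substitution (a stand-in for error-prone PCR)."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def evolve(parent, target, rounds=10, library_size=200, seed=0):
    """Repeat library generation and screening, carrying the best variant forward."""
    rng = random.Random(seed)
    best = parent
    history = [toy_fitness(best, target)]
    for _ in range(rounds):
        library = [mutate(best, rng) for _ in range(library_size)]
        library.append(best)  # keep the parent so a round never loses ground
        best = max(library, key=lambda s: toy_fitness(s, target))
        history.append(toy_fitness(best, target))
    return best, history
```

Calling `evolve("AAAAAAAAAA", "MKTAYIAKQR")` returns the best sequence found and the score after each round. Because the parent is retained in every library, the score history can never decrease.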
Here, we review the more recent progress in directed protein evolution (referred to as directed evolution hereafter) in a wide range of scientific disciplines and its impact on the chemical, pharmaceutical, and agricultural sciences. Although many strategies for directed evolution are described, we focus on the directed evolution of proteins through gradual accumulation of beneficial mutations, and examples of recombination-based approaches are used primarily to illustrate the power of this technology. The advances in screening technologies for identification of useful functions will not be discussed here, as they have been reviewed elsewhere (8, 184, 207, 273).
| STRATEGIES FOR DIRECTED EVOLUTION IN PROTEIN DESIGN |
|---|
Library diversity is created through mutagenesis or recombination. Traditionally, libraries have been generated by random point mutagenesis (using, for example, error-prone PCR) or by site-directed mutagenesis of a starting sequence. These libraries are screened (or selected), and the best variant is selected for additional mutagenesis. Because the frequency of beneficial mutations is generally low relative to that of deleterious mutations, only single beneficial mutations are added in each cycle of mutagenesis and screening. Indeed, the probability of improvement decreases rapidly when multiple mutations are made. Thus, iterative, point-mutation-based approaches are generally limited to improvements made in small steps.
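A rough way to see why the probability of improvement falls as mutations accumulate is to assume, purely for illustration, that each random mutation is independently deleterious with a fixed probability. Under that assumption the fraction of variants carrying no deleterious mutation shrinks geometrically with the number of mutations. The independence assumption and the example probability of 0.3 are hypothetical and are not taken from the text.

```python
def fraction_without_deleterious(n_mutations, p_deleterious):
    """Fraction of variants with no deleterious mutation.

    Assumes each of the n mutations is independently deleterious with
    probability p_deleterious (a simplifying, hypothetical model).
    """
    if n_mutations < 0:
        raise ValueError("n_mutations must be non-negative")
    if not 0.0 <= p_deleterious <= 1.0:
        raise ValueError("p_deleterious must be between 0 and 1")
    return (1.0 - p_deleterious) ** n_mutations


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, round(fraction_without_deleterious(n, 0.3), 4))
```

In this simplified picture, libraries that carry many random mutations per gene are dominated by impaired variants, which is consistent with adding mutations in small steps.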
DNA shuffling overcomes this limitation by allowing the direct recombination of beneficial mutations from multiple genes. In DNA shuffling, a population of DNA sequences is randomly fragmented and then reassembled into full-length, chimeric sequences by PCR (286, 287). In so-called "single-gene" formats, mutations are introduced during the reassembly process by controlling the error rate of DNA polymerase. After screening or application of selective pressure, progeny sequences encoding desirable functions are identified. These clones are then shuffled (bred) iteratively, creating offspring that contain multiple beneficial mutations. Because of this poolwise recombination of beneficial mutations, DNA shuffling gives rise to dramatic increases in the efficiency with which large phenotypic improvements are obtained.
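The poolwise recombination described above can be imitated in silico by switching between aligned parental sequences at random crossover points. The sketch below is a deliberately simplified model: it assumes parents of equal length that are already aligned, and it ignores the homology dependence of real fragment reassembly. The function names and parameters are hypothetical.

```python
import random


def shuffle_parents(parents, n_crossovers, rng):
    """Build one chimera by switching between aligned parents at random points."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    # Choose distinct internal crossover positions, then sort them.
    points = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + points + [length]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        template = rng.choice(parents)  # each block is copied from one parent
        segments.append(template[start:end])
    return "".join(segments)


def make_library(parents, size, n_crossovers=3, seed=0):
    """Generate a library of chimeras from a pool of parental sequences."""
    rng = random.Random(seed)
    return [shuffle_parents(parents, n_crossovers, rng) for _ in range(size)]
```

Every position of every chimera is inherited from one of the parents at that same position, so mutations that were present in different parents can end up together in a single offspring.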
While such methods are relatively efficient when small steps through sequence space are required, the relationship between library diversity, library size, and assay capability dictates that the evolution of phenotypes requiring larger steps through sequence space employ a more efficient search strategy. A simple and powerful way to do this is to use naturally occurring homologous genes as the source of starting diversity (64). In contrast to single-gene shuffling, in which library members are typically 95 to 99% identical, so-called "family shuffling" allows block exchanges of sequences that are typically >60% identical. In part because the sequence diversity comes from related, parental sequences that have survived natural selection ("functional" sequence diversity), much larger numbers of mutations are tolerated in a given sequence without introducing deleterious effects on the structure or function. The increased sequence diversity of these chimeric libraries thus results in sparse sampling of much greater regions of sequence and function space.
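The identity thresholds quoted above (95 to 99% for single-gene shuffling, more than 60% for family shuffling) refer to pairwise sequence identity. A minimal sketch of that calculation follows, assuming the two sequences are already aligned, have equal length, and contain no gaps. The function name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)
```

For example, `percent_identity("AAAA", "AAAT")` gives 75.0, which would place the pair in the range typical of family shuffling rather than single-gene shuffling.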
Even greater control over the incorporation of sequence diversity can be achieved through "synthetic shuffling." In this approach, no physical starting genes are required. Instead, a series of degenerate oligonucleotides that incorporate all desired diversity (for example, naturally occurring diversity and diversity identified by structural analysis) are used to assemble a library of full-length genes (217). In contrast with fragmentation-based methods, in synthetic shuffling every amino acid from a set of parents is allowed to recombine independently of every other amino acid. By breaking the linkages between amino acids normally present in parental genes, synthetic shuffling methods access unique regions of sequence space.
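Because every position recombines independently in synthetic shuffling, the theoretical library size is the product of the number of alternatives allowed at each position. The sketch below computes that product and enumerates the variants for a small case. The per-position options are made-up examples rather than data from reference 217.

```python
from itertools import product
from math import prod


def library_size(options_per_position):
    """Number of distinct variants when positions vary independently."""
    return prod(len(options) for options in options_per_position)


def enumerate_library(options_per_position):
    """List every variant (practical only for small libraries)."""
    return ["".join(combo) for combo in product(*options_per_position)]


if __name__ == "__main__":
    # Hypothetical diversity: 2, 1, 2, and 3 residues allowed at four positions.
    options = [["A", "V"], ["L"], ["S", "T"], ["D", "E", "K"]]
    print(library_size(options))
    print(enumerate_library(options)[:5])
```

The product grows very quickly with the number of variable positions, which is why library size and assay throughput constrain how much diversity can usefully be encoded.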
All directed evolution experiments must contend with the constraints described above: principally, the type and quality of diversity present in the library, the library size, and the ability of an assay to accurately identify desired clones from that library. To the extent that a desired phenotype is accessible within these constraints, standard DNA shuffling formats and other formats described below provide a rapid and powerful method to optimize activity. For more demanding phenotypes, such as de novo enzyme design, novel substrate specificity, novel enzyme chemistry, etc., there is a need to maximize the information content of a library so that larger steps through vast regions of sequence and function space may be efficiently explored.
Whole-genome shuffling is a powerful approach for the manipulation of organisms (52, 67). It allows the evolution of desired phenotypes by rapid genomic manipulation and stabilization. Directed whole-genome evolution is not limited to microorganisms. By a variety of means, genomes from eukaryotic cells, including regenerable cells from animals and plants, can be recombined recursively for accelerated phenotypic improvement.
-phosphorothioate deoxynucleoside triphosphates (194). Subsequent blunt-ending and treatment with the second restriction enzyme release truncated fragments of various lengths, and chimeras can then be generated by ligation to recircularize the vector. This approach has been combined with an additional recombination step to develop SCRATCHY (193). More recently, the NRR method has been described (23). NRR is based on DNase I fragmentation, blunt-end ligation/extension, and capping using two asymmetrical DNA hairpins to stop the extension. This method potentially provides higher flexibility in modulating fragment size and crossover frequency, as well as in the number of parental genes.
The major challenge facing all techniques for sequence-independent recombination of proteins is the presence of large numbers of nonfunctional progeny in the libraries (due to nonsense mutations caused by, for example, frameshifting and/or reversed DNA fragment orientation), which hinders the search for functional mutants. Therefore, it is critical that a high-throughput screen is in place for selection; alternatively, a preselection strategy, e.g., downstream fusion of a reporter or selection marker to eliminate mutants with internal stop codons, can be applied to generate high-quality libraries.
A powerful application of rational design is using it to focus library diversity for directed evolution experiments. In general, computational analysis of a protein's structure is first used to generate sequence diversity and to test those sequences for functional properties that can be modeled (scored) in silico. Only those variants that pass this prescreen are then synthesized and tested experimentally. In this manner, costly and time-consuming experimental searches are limited to regions of sequence space that are consistent with a protein's structure.
In an elegant example of structure-based computational design, Dwyer et al. introduced triosephosphate isomerase activity into a catalytically inert protein scaffold, ribose-binding protein (79). The design strategy consisted of three stages. First, a chemical and geometric definition of the catalytic machinery was generated. Second, a combinatorial search was performed to identify positions within the active site where the catalytic machinery and substrate could be placed, while simultaneously satisfying the above constraints. Third, the remainder of the active site was optimized to form a stereochemically complementary binding surface. A total of 14 designs were tested, and one of these exhibited a kcat/Km ratio of 1.5 × 10² for the conversion of dihydroxyacetone phosphate to glyceraldehyde-3-phosphate. This is about 3 orders of magnitude less than the ratio for wild-type triosephosphate isomerase but is nevertheless a rate enhancement of more than 10⁵ over that of the uncatalyzed reaction. Subsequently, the authors used directed evolution to improve the kcat/Km ratio of the designed enzyme. As is often the case, many of the accumulated changes identified by directed evolution lie in regions distal from the active site, and their effect on activity is therefore difficult to rationalize. A key issue for future design strategies lies in understanding how such mutations, which often contribute cooperatively and over long distances, improve activity (284).
One of the great advantages that emerges from the synthesis of rational design and directed evolution is that once a gene with even low levels of starting activity is obtained through design, it may be rapidly optimized by directed evolution (275). Thus, the goal of rational design becomes detecting even a weak starting activity from a focused library, rather than obtaining an optimized level of activity. The complementary use of rational design with directed evolution is a promising path towards the production of proteins with new and improved properties.
| APPLICATIONS OF DIRECTED EVOLUTION |
|---|
Polymerases. Molecular biology technologies such as DNA labeling, PCR, sequencing, site-directed mutagenesis, and some cloning often require DNA polymerases with high activity under suboptimal conditions, such as extreme temperatures and/or in the presence of inhibitors. Compartmentalized self-replication (CSR) is a useful strategy for directed evolution of DNA polymerases or RNA polymerases (89). CSR is based on a feedback loop consisting of a polymerase that replicates only its own encoding gene. Self-replication of polymerase variants generated by error-prone PCR is performed in separate compartments formed by water-in-oil emulsions. Genes encoding polymerases that are improved under the selection conditions used replicate at higher rates and eventually dominate the mutant population. CSR has been used for evolution of Taq polymerase in the presence of increasing amounts of the inhibitor heparin, resulting in the isolation of a variant that exhibits a 130-fold increase in heparin resistance (89).
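The feedback loop behind CSR can be illustrated with a small enrichment model in which each variant's gene is amplified in proportion to the activity of the polymerase it encodes, and the pool is renormalized after every round. The starting frequencies and relative replication yields below are hypothetical numbers chosen for illustration; they are not measurements from reference 89.

```python
def csr_rounds(frequencies, replication_yields, n_rounds):
    """Track variant frequencies over rounds of self-replication.

    frequencies: starting fraction of each variant in the pool.
    replication_yields: relative copies made per round by each variant's
        own polymerase (hypothetical values).
    """
    if len(frequencies) != len(replication_yields):
        raise ValueError("one yield is required per variant")
    current = list(frequencies)
    for _ in range(n_rounds):
        amplified = [f * y for f, y in zip(current, replication_yields)]
        total = sum(amplified)
        current = [a / total for a in amplified]  # renormalize the pool
    return current


if __name__ == "__main__":
    # A rare variant (1% of the pool) with a fourfold higher yield.
    print(csr_rounds([0.98, 0.01, 0.01], [1.0, 2.0, 4.0], 10))
```

Even when the most active variant starts as a small minority, its share grows geometrically with each round, which is the sense in which improved genes come to dominate the population.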
Directed evolution has been successfully applied to DNA polymerase for enhanced activity (233) and conversion to an efficient RNA polymerase (232, 333). The 2'-O-methyl-RNA is more stable and has been produced by chemical synthesis. Chelliserrykattil and Ellington established an efficient screening system for selection of highly active polymerases (47). This system creates a so-called "autogene" by cloning the T7 RNA polymerase under the control of its own promoter. In this system the polymerase variants with higher activity will generate more mRNA and can thus be selectively amplified by a reverse transcription-PCR process. The autogene system has allowed the identification of T7 RNA polymerase variants that can efficiently incorporate various 2'-modified nucleotides with good processivities (47, 48). Mixtures of the polymerase mutants with different specificities have produced transcripts with multiple modified nucleotides. DNA polymerase that is capable of incorporating 2'-O-methyl nucleotides has also been created by directed evolution (82).
Nucleases. Nucleases, including restriction endonucleases, are essential enzymes in modern molecular biology and thus are active targets for directed evolution. An intelligently designed selection by compartmentalization of each gene variant in a rabbit reticulocyte transcription/translation system overcomes limitations associated with in vivo screening techniques, allowing the efficient screening of restriction endonuclease libraries (74). Novel selection methods have also been developed for selection of restriction enzymes with altered substrate specificities (80, 168, 256, 353). DNA cleavage specificities have been created from the E. coli RNase P derivatives (59).
Transposase. Naumann and Reznikoff (216) used directed evolution to generate a mutated Tn5 bacterial transposase that could function on transposons with mutated end binding sequences. The Tn5 transposon encodes a 53-kDa transposase protein (Tnp) that facilitates the movement of the entire transposon by first binding to each of the two 19-bp specific binding sequences (known as outside end [OE]), followed by formation of a nucleoprotein complex, blunt-end cleavage, and then transfer to the target DNA. The transposon also promotes the movement of a single OE by using an additional 19-bp inside end sequence (IE). The wild-type Tn5 Tnp activity is inhibited in E. coli as a result of Dam methylation at the IE (IEME). In order to screen for a transposase mutant that functions with mutated inverted repeats, the IE was modified at position 12 from thymine to adenine (IE12A), which results in loss of recognition by the wild-type transposase. As a consequence, insertion of IE12A in the flanking region of the lacZ gene between the transcription and translation start sites results in an inactive transposon. Three rounds of gene shuffling and high-throughput screening for LacZ activity at about 10⁴ colonies per round, followed by analysis of the active variants for activities against OE and IE, allowed the isolation of a specific hyperactive Tnp variant (TnpsC7). While methylation of IE reduced the wild-type Tnp activity by 100-fold, TnpsC7 activity in the presence of IEME was markedly higher.
Integrase/recombinase. Improved site specificity for large genome modifications has been recently demonstrated for the wild-type φC31 integrase (265). Sclimenti et al. (265) applied two rounds of DNA shuffling in combination with a genetic screen that is capable of identifying improved variants expressing the lacZ reporter gene. This improved enzyme possesses strong preference for target-site DNA sequences and has 10- to 20-fold-higher absolute integration frequencies than the wild-type φC31 integrase. In addition to the demonstration of improved site specificity of this integrase, several other groups have successfully altered the site specificity of the Cre/Flp recombinases by directed evolution (35, 36, 252, 258, 314). The Cre recombinase catalyzes the integration, excision, and rearrangement of two 34-bp, double-stranded recombination sites known as loxP. Santoro and Schultz (258) designed a fluorescence-activated cell sorting-based screening for recombinases that recognize unnatural recombination sites. The screening system consists of a recombinase variant and a reporter gene plasmid, expressing either enhanced yellow fluorescent protein (YFP) or green fluorescent protein (GFP). Using this high-throughput selection system, the authors isolated recombinase variants that show high specificity for unnatural loxP sites and low activity for the wild-type loxP site. Site-specific manipulation of genomes by recombinases is a powerful functional genomic tool. Recombinases such as Cre have been widely used to mutagenize and replace genes in mice. Expanding the recombination sequences of recombinases will improve the efficiency and the quality of production of transgenic animals and plants. The ability to evolve proteins that interact with DNA has broad implications. Efforts to evolve other DNA-binding proteins, such as transcription factors, for tailor-made specificities are under way.
Reporter genes. Although reporter proteins usually do not modify nucleic acids themselves, in molecular biology they are often closely associated with other proteins that do. Directed evolution has been applied to optimize the physical properties of fluorescent proteins and small-molecule probes for real-time imaging of live cells (21, 40, 142). Fluorescent probes function as "passive" markers that provide high sensitivity for real-time visualization and tracking of cellular events without perturbing the cells. GFP is widely used for tracking protein localization in vivo and has been improved by directed evolution (65). Additional fluorescent variants such as YFP and cyan fluorescent protein have been generated by mutagenesis of the wild-type GFP. These fluorescent variants may be used as companion markers for protein colocalization and for tracking protein-protein interactions by fluorescence resonance energy transfer (FRET). Nguyen and Daugherty (220) addressed the dynamic range and sensitivity limitations associated with FRET by designing a strategy in which a cyan fluorescent protein-YFP fusion system is used to allow the detection of subtle improvements, enabling gradual optimization of FRET signals. When this system was coupled with random mutagenesis and targeted saturation mutagenesis, substantial enhancement of FRET dynamic range and sensitivity was achieved. Another example is the engineering of the Discosoma red fluorescent protein (DsRed). The wild-type, tetrameric DsRed has poor solubility that can affect the function and localization of the tagged proteins. DsRed is also slow in the chromophore maturation process. By applying seven rounds of site-directed mutagenesis and error-prone PCR followed by high-throughput visual screening for fluorescence in microbial cells, Bevis and Glick (21) isolated soluble DsRed variants that also mature 10 to 15 times faster than the wild-type protein.
While the improved DsRed isolated by Bevis and Glick retained its tetrameric state, Campbell et al. (40) evolved DsRed to an active monomeric form that matures 10 times faster than the wild-type protein. Their approach was a stepwise evolution of DsRed first to a dimer and then to a monomer. This sequential improvement of DsRed resulted in an active monomeric protein with improved solubility and shorter maturation time, leading to greater tissue penetration and spectral separation from autofluorescence and other fluorescent probes. The next generation of monomeric fluorescent proteins has been shown to be more photostable, to mature more completely, and to be more tolerant of fusion to other proteins (274). Another well-known reporter protein, beta-glucuronidase, has also been improved (200, 202). Further evolution successfully converted this enzyme into a beta-galactosidase (202). Beta-galactosidase activity has also been evolved from a fucosidase (72, 345).
Increasing protein solubility by directed evolution is not limited to reporter proteins. Overexpressed proteins in heterologous systems such as E. coli often fail to fold into their native states and are thus accumulated as insoluble inclusion bodies. An efficient method to generate more soluble forms of insoluble proteins is directed evolution. One way to screen for soluble variants is to fuse the variants of an insoluble protein to a reporter for heterologous expression, followed by screening of the reporter protein activity (reviewed by Waldo [317]). Yang et al. (336) utilized a GFP-based screening to evolve the solubility of the Mycobacterium tuberculosis Rv2002 gene product. While overexpression of Rv2002 in E. coli resulted in inclusion bodies, five soluble mutants were identified after three rounds of error-prone PCR and DNA shuffling. Because the Rv2002 mutants are fused with GFP, the soluble Rv2002-GFP emits brighter fluorescence than the wild-type protein. Enzymatic assays indicated that a soluble mutant Rv2002-M3 protein possesses high catalytic activity as an NADH-dependent 3α,20β-hydroxysteroid dehydrogenase.
Proteolytic enzymes. The serine endoprotease subtilisin is a commercially important enzyme. With annual sales over $500 million, the highest among industrial enzymes, subtilisins are widely applied as additives in laundry detergents and in other uses. A major challenge in improvement of most industrial enzymes is that the performance is defined not by any single property but by a complex mix of parameters. Although rational design and random mutagenesis have been used to improve single properties such as thermostability or activity in organic solvents, this often comes at the expense of other critical properties. Ness et al. (218) demonstrated multidimensional improvement of subtilisin by DNA shuffling. Twenty-five subtilisin gene fragments obtained from different Bacillus isolates were bred together with the full-length gene for a leading commercial protease and screened for thermostability, solvent stability, and pH dependence (at pH 5, pH 7.5, and pH 10). High frequencies of improvements (4 to 12%) in all parameters were achieved using a relatively small library (654 active clones). In addition, the diversity of combinations of properties ranged well beyond that of the properties of the parental enzymes. Sequence analysis of several high performers under each set of conditions revealed that variants with similar properties could be encoded by different sequences. Thermostability, for example, could be conferred by any one of at least three different genetic elements. Because of the importance of proteolytic enzymes, directed evolution of proteases and peptidases remains one of the most actively pursued research areas (10, 12, 34, 100, 160, 210, 211, 285, 297, 304, 327-329, 349).
Cellulolytic enzymes. Enzymes that hydrolyze carbohydrates are also active targets for directed evolution. Up to sevenfold enhancement of the thermostability of the endoglucanase EngB has been achieved by introducing sequence diversity from a partially homologous endoglucanase, EngD (213, 214). A library was constructed using genes encoding the cellulosomal endoglucanase EngB and noncellulosomal cellulase EngD from Clostridium cellulovorans. The more thermostable cellulosomal endoglucanases are of high industrial relevance. Cellulosomes from clostridia are efficient at hydrolyzing microcrystalline cellulose. The relatively high efficiency has been attributed to (i) the correct ratio between catalytic domains, which optimizes synergism between them; (ii) appropriate spacing between the individual components to further promote synergism; and (iii) the presence of different enzymatic activities (cellulolytic or hemicellulolytic) in the cellulosome, which can remove other polysaccharides in heterogeneous cell wall materials.
Applications of cell wall-loosening enzymes can be found in a variety of industrial processes. In the pulp and paper industry, enzymatic degradation of the hemicellulose-lignin complexes present in pulps preserves intact cellulose fibers and strongly reduces the amount of bleaching chemicals required. The enzyme laccase is of interest for biobleaching and has been improved in industrially relevant parameters by directed evolution (38). Other applications in which cellulosic hydrolases are used include improvement of dough quality in the baking industry, increasing the feed conversion efficiency of animal feed, clarifying juices, and producing xylose, xylobiose, and xylo-oligomers. In addition, cellulosic hydrolases are important in biomass conversion for novel biofuel and other valuable chemicals. In a broader aspect, directed evolution has been successfully applied to improve many enzymes involved in carbohydrate biosynthesis, modification, and degradation. Examples include ADP-glucose pyrophosphorylase (254), amylosucrase (310), aldolase (86, 326), sugar kinase (120), cellulase (153), amylases (19, 20, 154, 312), xylanases (49, 129, 203), glucose dehydrogenase (14), and beta-glucosidase (13).
Enzymes for bioremediation. Enzymes that cleave carbon-halogen bonds are being studied not only because of the important chemical reactions they catalyze but also for potential use in environmental sciences. Haloalkane dehalogenase converts alkylhalide functionality to an alcohol group with broad substrate specificity. This enzyme has been subjected to directed evolution for improved function in detoxification of halogenated compounds (30, 38, 95, 96, 240, 348). Organophosphate-degrading enzymes have been evolved and selected for broadened substrate specificity (53, 335). Broadened substrate specificity of a biphenyl dioxygenase has also been achieved (33, 87, 164, 291). Efforts in cleaning underground water contamination prompted the evolution of an enzyme for chlorinated ethene degradation (41).
Lipases and esterases. Lipases, which comprise another class of hydrolases, have broad industrial applications. Lipases catalyze the hydrolysis and synthesis of long-chain acylglycerols from triglycerides. For production of biofuel, a single transesterification reaction using lipases in organic solvents can convert vegetable oil to methyl or other short-chain alcohol esters. Biodegradable biopolymers such as polyphenols, polysaccharides, and polyesters show a considerable degree of diversity and complexity. Lipases and esterases are used as catalysts for polymer synthesis because they offer stereoselectivity, regioselectivity, and chemoselectivity under mild reaction conditions. Lipases are also used in synthesis of fine chemicals, agrochemicals, and pharmaceuticals.
Directed evolution of industrially important lipases has been extensively reviewed (131-134, 247-249). The enantioselectivity of lipases is of biochemical interest. The ability to engineer lipases with high enantioselectivities allows the production of desired enantiopure compounds. A Pseudomonas aeruginosa lipase has been evolved to increase enantioselectivity towards the chiral substrate 2-methyldecanoic acid p-nitrophenyl ester. A few rounds of directed evolution produced a greater than 25-fold improvement in enantioselectivity. It is interesting that the best variants contain five amino acid changes and most of them are located in the flexible loop regions (183, 249). Using the ADO approach, variants of two B. subtilis lipases with increased enantioselectivity have been identified by screening of a small number of variants (343). The substrate specificity and stability of lipases can also be modified by directed evolution (147, 282). The lipase from Bacillus thermocatenulatus BTL2 exhibits low phospholipase activity. A single round of random mutagenesis followed by screening of 6,000 variants generated progeny with more than a 10-fold increase in phospholipase activity (147). Most of the variants show reduced activities towards medium- and long-chain fatty acyl methyl esters compared to the wild-type enzyme. Moreover, in combination with structure-guided site-directed mutagenesis, further improvement of the phospholipase activity has been achieved. The best variant, which exhibits 17-fold improvement in phospholipase selectivity, has 1.5- to 4-fold-higher activity towards long-chain fatty acyl substrates. In an effort to achieve the opposite goal, the phospholipase A of Serratia has been converted to a lipase by using a combination of DNA shuffling and N-terminal truncations (281).
By sequential generations of random mutagenesis and screening, Moore and Arnold (212) evolved an esterase for deprotection of an antibiotic p-nitrobenzyl ester in aqueous organic solvents. A variant has been found to perform as well in 30% dimethylformamide as the wild-type enzyme in water, a 16-fold improvement in esterase activity. As in many other directed evolution experiments, the successful outcome of this work relied on the establishment of a high-throughput screening assay, this time using the p-nitrophenyl ester. In recent years, a great deal of effort has been devoted to the design of screening tools for improvement of lipases and esterases (91, 97). Droge et al. (77) reported the binding of a phosphonate suicide inhibitor to lipase A that is presented by phage display. The specific interaction with the suicide inhibitor provides a fast and reproducible method for selection of lipases with novel substrate specificities. Two new triglyceride analogue biotinylated suicide inhibitors have been designed, synthesized, and applied in directed evolution of phage-displayed lipolytic enzymes (70, 71).
Cytochrome P450 enzymes. The cytochrome P450 superfamily is a highly diversified set of heme-containing proteins, and members serve a wide spectrum of functions. In addition to the most common function of catalyzing hydroxylation, P450 proteins perform a variety of reactions, including N oxidation; sulfoxidation; epoxidation; N, S, and O dealkylation; peroxidation; deamination; desulfuration; and dehalogenation. In mammals they are critical for drug metabolism, blood hemostasis, cholesterol biosynthesis, and steroidogenesis. In plants they are involved in plant hormone synthesis, phytoalexin synthesis, flower petal pigment biosynthesis, and most likely hundreds of additional, unknown functions. In fungi they make ergosterol and are involved in pathogenesis by detoxification of host plant defenses. Bacterial P450s are key players in antibiotic synthesis. More recently, cytochrome P450 enzymes have shown promise in industrial applications as new methods for high-level production and high-throughput assays have been developed (4, 18, 306).
A number of cytochrome P450 enzymes have been the targets of directed evolution (50, 54, 83, 250, 255, 306, 307, 331, 332). Cytochrome P450 enzymes are often found to be poorly active, with narrow substrate specificity. The wild-type P450 BM-3, which is specific for long-chain fatty acids, was a target for rational design and directed evolution (181). Based on the crystal structure, eight amino acids were identified for creation of libraries by site-specific randomization mutagenesis of each residue. The libraries were screened by a spectroscopic assay using omega-p-nitrophenoxycarboxylic acids as substrates. By sequential evolution, variants showing specificity towards medium-chain substrates were identified. In a subsequent study (182), one of the variants was found to be able to efficiently hydroxylate indole, resulting in the formation of indigo and indirubin. Further characterization of this mutant revealed that it is capable of hydroxylating several alkanes and alicyclic, aromatic, and heterocyclic compounds, all of which are nonnatural substrates for the wild-type enzyme (6). Many cytochrome P450 monooxygenases are multimeric and membrane associated, with low catalytic efficiencies. Glieder et al. (92) evolved the Bacillus megaterium cytochrome P450 BM-3, which is specific for C12 to C18 fatty acids, to efficiently catalyze the conversion of C3 to C8 alkanes to alcohols. In this case the evolved enzyme exhibits a broad range of substrate specificities, including the gaseous alkane propane, as well as improved activity towards the natural fatty acid substrates. BM-3 has also been engineered to be significantly more tolerant to several cosolvents, including the organic cosolvents dimethyl sulfoxide and tetrahydrofuran (332). Furthermore, the regioselectivity and enantioselectivity of BM-3 have been engineered through a combination of in vitro evolution, and the selectivity appears to be retained in vivo with E. coli cells (238).
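The screening effort implied by site-specific randomization of individual residues can be estimated with a short calculation. The following Python sketch is illustrative only and is not taken from the studies cited above. It assumes that each randomized position is encoded by an NNK degenerate codon (32 equally likely codons) and that clones are sampled independently; the function names are hypothetical.

```python
import math


def coverage(num_variants, num_clones):
    """Probability that one given variant appears at least once among the
    screened clones, assuming all variants are equally likely."""
    return 1.0 - (1.0 - 1.0 / num_variants) ** num_clones


def clones_for_coverage(num_variants, target):
    """Smallest number of clones giving at least `target` probability of
    sampling any one given variant."""
    if num_variants < 1 or not 0.0 < target < 1.0:
        raise ValueError("need num_variants >= 1 and 0 < target < 1")
    if num_variants == 1:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / num_variants))


if __name__ == "__main__":
    nnk_codons = 4 * 4 * 2  # assumption: NNK randomization, 32 codons per site
    for sites in (1, 2):
        library_size = nnk_codons ** sites
        needed = clones_for_coverage(library_size, 0.95)
        print(sites, library_size, needed)
```

Under these assumptions, randomizing one residue at a time keeps each library small enough for a plate-based spectroscopic assay, whereas randomizing several positions simultaneously multiplies the number of clones that must be screened.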
Successful evolution of cytochrome P450 requires efficient high-throughput screens that are sensitive to the activities of interest. Horseradish peroxidase couples the phenolic products of hydroxylation of aromatic substrates to generate colored or fluorescent compounds that are easily detectable in high-throughput formats. Joo et al. (139) have taken advantage of this system by coexpressing the coupling enzymes with functional mono- and dioxygenases. Using fluorescent digital imaging, they screened libraries of cytochrome P450cam from Pseudomonas putida for novel activity of chlorobenzene hydroxylation. Joo et al. (140) also utilized this so-called "peroxide shunt" pathway to identify variants showing significantly improved activity for naphthalene hydroxylation in the absence of the NADPH cofactor. Interestingly, the P450 enzyme has recently been used as a model for computational structure-guided evolution (227).
(i) Whole genomes are shuffled (see above) and selected for desired phenotypes or products (239). The successful engineering of polyketide and lactic acid production in Lactobacillus (234, 347) has demonstrated that whole-genome shuffling is one of the most powerful tools in directed evolution of pathways. It is particularly useful when a pathway is not well characterized and key enzymes or genes have not yet been identified or cloned. Phenotypic improvement by whole-genome shuffling is an important milestone for bioprocess optimization. Together with novel techniques for cultivating and identifying previously unrecognized microorganisms (342) and information on biodiversity in terms of species, distribution, and ecosystem function (reviewed by Bull et al. [37]), whole-genome shuffling will continue to expand its impact to the production of high-value biomolecules.
(ii) The genes encoding key enzymes are heterologously expressed to alter an existing pathway. Introduction of an enzyme with novel specificity can redirect the metabolic flux in a host and result in production of new products (261, 321). These recombinant enzymes can be obtained from other organisms known to produce the compounds (299) or by directed evolution to create the desired specificity from an enzyme that normally catalyzes other reactions (144, 315). For instance, under anaerobic conditions yeast does not efficiently produce ethanol by using xylose. By heterologous expression of a xylose isomerase from the fungus Piromyces and selection of yeast transformants on xylose, Kuyper et al. (166) have isolated a mutant strain that exhibits a sixfold increase in the anaerobic growth rate on xylose and higher yields of ethanol. Pathway engineering often requires alteration of the substrate pools for the key steps. Thus, directly targeting enzymes responsible for the production of these substrates can enhance or even redirect biosynthetic pathways (177). To engineer a multienzyme pathway for novel carotenoid production in E. coli, Schmidt-Dannert and colleagues first introduced two genes to produce the precursor phytoene. Subsequently, a library of two shuffled desaturase genes from Erwinia was introduced for the desaturation of phytoene. Divergent lycopene-like compounds with different degrees and positions of desaturation were identified. The pathway of a chosen mutant was further modified by introducing a library of shuffled cyclase genes. The engineering of the carotenoid pathway represents a fine example of how directed evolution can be used to redesign a complex pathway (68, 147, 167, 175, 176, 178, 205, 206, 257, 262, 263, 305, 320, 324).
(iii) In nature, many pathway genes are organized in gene clusters or operons (171, 172). Well-known examples include pathways for polyketide biosynthesis (125) and biosynthesis of certain secondary metabolites (190). Early work using the ebg operon presented convincing arguments for directed evolution of an operon as an effective approach in pathway engineering (103, 105, 108, 109, 111). Directed evolution of naturally existing operons and, in some cases, artificially assembled operons offers a unique and coordinated approach to engineer novel functions. Another demonstration of this approach is the manipulation of an arsenate detoxification pathway by DNA shuffling (63). A plasmid containing the operon of four ars genes was shuffled and selected for increased resistance to arsenic. While the native operon does not confer E. coli resistance to arsenic, several rounds of selection resulted in cell growth in media where the arsenate concentration reached the solubility limit. In another example, the trehalose-6-phosphate synthase/phosphatase operon was evolved to achieve greater trehalose production in E. coli (159, 160). In E. coli, trehalose-6-phosphate synthase and trehalose-6-phosphate phosphatase are encoded by the otsBA operon. Directed evolution of the otsBA operon and screening for trehalose synthesis resulted in 15 positive clones and 12-fold improvement in trehalose production compared to that with the wild-type strain. The same strategy can be applied to artificial operons similar to that constructed for the production of the biopolymer poly(3-hydroxybutyrate-co-3-hydroxyhexanoate) (231). In another example, a metabolically engineered E. coli strain for astaxanthin production has been generated by overexpression of three metabolic enzymes from different origins: the E. coli isopentenyl diphosphate isomerase, the Archaeoglobus fulgidus geranylgeranyl diphosphate synthase (GPS), and the Agrobacterium aurantiacum astaxanthin biosynthesis enzymes (crtWZYIB gene products) (322). In a subsequent effort, repeated cycles of error-prone PCR, which employs a low-fidelity replication step to introduce random point mutations at each round of amplification, were used to evolve one of these key enzymes, GPS (321). A 100% improvement in lycopene production has been detected by screening for deeper orange color in 3,500 colonies. It is tempting to speculate that the application of directed evolution to the synthetic operon that contains isopentenyl diphosphate isomerase, GPS, and crtWZYIB might result in larger amounts of astaxanthin than the levels observed by single-gene evolution.
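The way error-prone PCR accumulates random point mutations over successive low-fidelity copying steps can be illustrated with a toy simulation. The Python sketch below is a simplified model and not the protocol used in the cited work. It assumes a uniform per-base error rate, unbiased substitutions, and a single lineage followed through each round; the template sequence, parameter values, and function names are hypothetical placeholders.

```python
import random

BASES = "ACGT"


def error_prone_pcr(template, rounds, error_rate, rng):
    """Follow one lineage through `rounds` of low-fidelity copying.

    At every round each base is replaced by a different base with
    probability `error_rate` (assumed uniform and unbiased).
    """
    seq = list(template)
    for _ in range(rounds):
        for i, base in enumerate(seq):
            if rng.random() < error_rate:
                seq[i] = rng.choice([b for b in BASES if b != base])
    return "".join(seq)


def count_mutations(parent, child):
    """Number of positions at which the child differs from the parent."""
    return sum(1 for a, b in zip(parent, child) if a != b)


def make_library(template, size, rounds, error_rate, seed=0):
    """Generate `size` independent variants from the same template."""
    rng = random.Random(seed)
    return [error_prone_pcr(template, rounds, error_rate, rng) for _ in range(size)]


if __name__ == "__main__":
    parent = "ATGGCTAGCAAGGGCGAGGAG" * 10  # arbitrary placeholder, not a real gene
    for variant in make_library(parent, size=5, rounds=3, error_rate=0.005):
        print(count_mutations(parent, variant))
```

In this model the expected mutation load scales with sequence length, error rate, and the number of rounds, which is why the mutation rate is treated as a tunable parameter when a library is built for screening.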
(iv) The characteristics of a metabolic pathway are a result of the dynamic interaction between its structural genes and the gene regulatory apparatus. Therefore, directed pathway evolution can be achieved by engineering of gene regulation factors that control these pathways (61). The recent exciting progress in engineering of artificial transcription factors has shown that this approach is not only feasible but also advantageous in certain areas of metabolic engineering. Notable advances have mainly been in the generation of artificial zinc finger transcription factors (17, 25, 75, 76, 127, 128, 135, 146, 174, 186, 187, 215, 266-271, 300). Chimeric proteins containing novel DNA-binding domains (such as polydactyl zinc fingers) have shown promise in high-throughput ligand-binding screens, genome-wide gene activation/repression, targeted DNA cleavage, DNA/chromatin modification, and site-specific integration (135). This strategy is particularly powerful when dealing with pathways that are undefined or normally inactive without induction. Engineered transcription factors can also be used to target a known gene regulatory region(s). For example, they can be evolved to bind specific promoter sequences proximal to the binding sites of known and natural transcription factors (94). Transcription factors and their target genes comprise the basic unit in the complex transcriptional regulatory network. Network-wide engineering must deal with higher levels of complexity. The ability to evolve the transcriptional network, however, represents a new possibility in pathway engineering. Yokobayashi et al. proposed the construction of an artificial transcriptional control network and provided examples of how such a genetic circuit can be optimized by a combination of rational design and directed evolution (338, 339). Metabolic pathways often respond to cell-cell communications. 
An elegantly designed "population control" system was constructed based on a quorum-sensing system, allowing a synthetic bacterial ecosystem to be controlled by cell-cell communication (340). Directed evolution of the major component of this system, the LuxR-type transcriptional regulators, revealed the evolutionary plasticity of the quorum-sensing mechanism (60). Another challenge in pathway engineering is to control the timing of gene expression. Inducible gene regulation systems such as the tetracycline/Tet repressor system can be used to switch pathways on and off. Evolving these systems to recognize novel inducers has tremendous practical implications in pathway engineering (264, 280).
Antibodies. Therapeutic antibodies represent the fastest growing area in pharmaceutical development. Considering that in nature the combinatorial antibody diversity is a result of somatic recombination, it is not surprising that directed evolution can be a powerful and practical tool for the creation of high-affinity antibodies in vitro. Techniques such as surface display facilitate high-throughput selection for desired activity (32, 62, 85, 124, 143, 295, 308). Recombination of phage-displayed, low-affinity immunoglobulin M antibodies resulted in variants with affinity increased by several orders of magnitude in just two rounds of evolution (85). The same strategy has yielded stable disulfide bond-free antibody single-chain fragments (244). The requirement for disulfide bond formation has hindered antibody production in systems such as E. coli, and disulfide bond-free antibodies not only potentially simplify production but also provide insight into antibody protein folding. Additional research has aimed at engineering antibodies to achieve extremely high affinities (15, 26, 66, 112, 137, 246). The gene for the llama heavy chain antibody fragment was evolved and selected for improvement in production (309). Antibody variants were identified that exhibited two- to fourfold increases in production while retaining their antigen specificity (341). Crystallographic analysis of one of the evolved antibodies revealed that the mutations conferring significant improvement in affinity do not directly contact the antigen, suggesting that it would be difficult to obtain such results via rational design. Nonetheless, the strategy of combining rational design and directed evolution should advance antibody engineering more rapidly than either approach alone.
Catalytic antibodies are also of interest for directed evolution (298, 301). Superior catalysts for aryl phosphate were generated from synthetic human antibody libraries (43). Antibodies have also been engineered for diagnostic purposes (161).
Vaccines. Directed evolution has played and continues to play an important role in the development of new vaccines (58, 188, 189, 197, 235, 245, 325). To boost immunity, directed evolution can be used to generate improved proteinaceous antigens or other immunomodulatory molecules, DNA vaccines, and whole viruses (see below). On the other hand, certain cytokines and allergens can be bred for down-regulation of allergic immune responses. Recursive library construction and selection allowed the isolation of high-affinity, protective mimotopes against Cryptococcus neoformans (16). Highly immunogenic mimotopes of the hepatitis C virus hypervariable regions have been selected by a combination of DNA shuffling and phage display-based screening (346). A DNA vaccine of the E7 oncogene has been developed and shown to provide protection against tumor cells (223). This strategy of rearranging oncogene sequences presents an advantage over wild-type oncogene-derived DNA vaccines, which carry a risk of de novo tumor induction. Toxic side effects have been associated with the direct administration of recombinant antitumor interleukin-12 protein. A DNA vaccine based on the interleukin-12 gene has been shown to reduce adverse side effects, while its potency and effectiveness have been further improved by directed evolution (179). In addition, high-affinity T-cell receptor variants can be generated and used for detecting peptide-major histocompatibility complex complexes on antigen-presenting cells (121).
Viruses. Breeding of viruses has tremendous practical implications in gene therapy and vaccine development (283). The feasibility was demonstrated using the murine leukemia viruses (MLVs). Family shuffling of six MLVs produced variants with novel tropism (283). The MLV envelope protein consists of two subunits, SU and TM, associated by a labile disulfide bond. This complex, which interacts with a cellular receptor and mediates fusion with the plasma membrane, is highly sensitive to physical forces during the manufacturing process. As a result, the concentration procedure commonly used for retrovirus vectors is ineffective for manufacturing stocks of high titer. To improve the resistance of the MLV envelope protein to the process of concentration by ultracentrifugation, the envelope regions of six ecotropic strains were shuffled (243). Screening for survival after three consecutive concentration steps resulted in 30- to 100-fold-improved stability compared to the parental viruses. In an effort to establish a pig-tailed macaque model for human immunodeficiency virus (HIV) infection, Pekrun et al. evolved an HIV type 1 variant with a substantially enhanced replication rate (237). In an interesting attempt to control the risks associated with pathogenic phenotypes of high-replicating viral vaccines, a tetracycline-inducible system was introduced to control HIV replication (199). By application of directed evolution, highly infectious viral variants have been isolated; however, the viral replication is strictly controlled by a doxycycline-dependent switching system. An alternate strategy to control viral replication by using the bacteriophage T7 polymerase has also been developed (31).
Therapeutic chemicals. The role of biocatalysis in pharmaceutical production has been rapidly expanding since the establishment of recombinant DNA technology (45, 123). The involvement of enzyme and metabolic pathway engineering in therapeutic chemical production is moving towards the mainstream in the industry, and directed evolution technologies are leading the advance. Applications of directed evolution in development of anti-infection agents were among the early examples demonstrating the power and effectiveness of the technologies. Evolution of polyketide synthases to generate novel antibiotic activities demonstrated that novel compounds can be identified even in small libraries (123). The modular nature of the polyketide synthetic pathway allows an efficient way to create large numbers of polyketide variants by replacing individual modules with a shuffled library (151). Directed evolution of a toluene-xylene monooxygenase resulted in variants that catalyze the synthesis of various valuable fine chemicals, such as catechol (311). The substrate specificity of the cephalosporin acylase has been altered for the improvement of cephalosporin and penicillin production (229, 278). Directed evolution has allowed the identification of "hot spots," in this case, a single amino acid residue crucial for substrate specificity. When this hot spot was subjected to saturation mutagenesis, variants with further improvement or novel specificity were identified (228). Protein engineering using site-directed and/or saturation mutagenesis, guided by information generated from directed evolution, can be an extremely powerful approach to create novel functionalities (73, 88, 208, 316).
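Saturation mutagenesis of a single hot spot amounts to enumerating every amino acid substitution at one position. The Python sketch below is a hedged illustration of that enumeration at the protein level; the example sequence, the position, and the function name are hypothetical and are not taken from the cephalosporin acylase studies cited above.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def saturation_variants(protein, position):
    """Return all 19 single-substitution variants at a 1-based position.

    Keys use the conventional label format, e.g. "T3A" for Thr at
    position 3 replaced by Ala. Assumes `protein` uses standard
    one-letter codes.
    """
    if not 1 <= position <= len(protein):
        raise ValueError("position is outside the sequence")
    wild_type = protein[position - 1]
    variants = {}
    for residue in AMINO_ACIDS:
        if residue == wild_type:
            continue  # skip the wild-type residue itself
        label = f"{wild_type}{position}{residue}"
        variants[label] = protein[:position - 1] + residue + protein[position:]
    return variants


if __name__ == "__main__":
    example = "MKTAYIAK"  # arbitrary placeholder peptide
    library = saturation_variants(example, 3)
    print(len(library))
    print(sorted(library)[:3])
```

Because a single position yields only 19 alternatives, such a library can be screened exhaustively, which is one reason hot spots identified by random approaches are attractive targets for follow-up saturation mutagenesis.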
In the 20 years since it has been possible to introduce transgenes into plants, many novel strategies have been devised to improve the quality of crops. Many strategies for pest control, cold tolerance, disease control, and other areas of improvement have had positive initial results in laboratory settings; however, the genes have not provided sufficient efficacy to produce commercially viable genetically modified (GM) products. In retrospect this makes sense, since many transgenes that were used in these experiments clearly had not been optimized for use in GM crop plants.
Directed evolution can be used to improve existing traits such as glyphosate resistance and Bacillus thuringiensis toxin expression in commercial crops. It can also be used to develop traits from programs in which initial leads (genes) provided insufficient efficacy. Furthermore, directed evolution can be applied to develop desirable gene functions from gene targets that have low or no activity, resulting in novel traits that would otherwise not have been possible (169).
Existing traits. (i) Glyphosate tolerance. Existing glyphosate resistance traits in corn, cotton, and soybean, based on expression of a microbial enolpyruvylshikimate-3-phosphate synthase that is not affected by the herbicide, are effective. However, there is clearly room for improvement. He et al. (116) bred E. coli and Salmonella enterica serovar Typhimurium enolpyruvylshikimate-3-phosphate synthases (the enzyme which, when carrying a specific mutation, conditions tolerance to the herbicide) to develop variants with superior properties. Several gene variants from a single round of directed evolution resulted in enzymes simultaneously improved over the best parent in multiple kinetic parameters, including a twofold-improved specific activity, a fivefold-improved Km for phosphoenolpyruvate, and a fivefold decrease in sensitivity to glyphosate. Interestingly, the mutations identified in that study do not coincide with the mutations identified previously by other researchers in their efforts to improve the properties of this enzyme. These results demonstrate that directed evolution can provide novel solutions to improving protein function even for proteins that have undergone extensive improvement through random mutage