Microbiology and Molecular Biology Reviews, September 2005, p. 373-392, Vol. 69, No. 3
1092-2172/05/$08.00+0     doi:10.1128/MMBR.69.3.373-392.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.

Laboratory-Directed Protein Evolution

Ling Yuan (1)*, Itzhak Kurek (2), James English (2), and Robert Keenan (3)

(1) Department of Plant and Soil Sciences, and Kentucky Tobacco Research and Development Center, University of Kentucky, Cooper and University Drives, Lexington, Kentucky 40546
(2) Verdia Research Campus, Pioneer International, A Dupont Company, 700A Bay Road, Redwood City, California 94063
(3) Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637

SUMMARY
INTRODUCTION
STRATEGIES FOR DIRECTED EVOLUTION IN PROTEIN DESIGN
    DNA Shuffling
    Whole-Genome Shuffling
    Heteroduplex
    Random Chimeragenesis on Transient Templates
    Assembly of Designed Oligonucleotides
    Mutagenic and Unidirectional Reassembly
    Exon Shuffling
    Y-Ligation-Based Block Shuffling
    Nonhomologous Recombination
    Combining Rational Design with Directed Evolution
APPLICATIONS OF DIRECTED EVOLUTION
    Directed Evolution of Nucleic-Acid-Modifying Enzymes
        Polymerases.
        Nucleases.
        Transposase.
        Integrase/recombinase.
        Reporter genes.
    Directed Evolution of Biochemical Catalysts
        Proteolytic enzymes.
        Cellulolytic enzymes.
        Enzymes for bioremediation.
        Lipases and esterases.
        Cytochrome P450 enzymes.
    Directed Evolution of Metabolic Pathways
    Directed Evolution of Pharmaceuticals
        Protein pharmaceuticals.
        Antibodies.
        Vaccines.
        Viruses.
        Therapeutic chemicals.
    Directed Evolution of Agriculturally Important Traits
        Existing traits.
        (i) Glyphosate tolerance.
        (ii) B. thuringiensis toxin.
        (iii) Golden rice.
        Next-generation traits.
        (i) Chitinase for antifungal properties.
        (ii) Mycotoxin detoxification.
        (iii) Viral vectors.
CLOSING REMARKS
ACKNOWLEDGMENTS
REFERENCES

   SUMMARY
Systematic approaches to directed evolution of proteins have been documented since the 1970s. The ability to recruit new protein functions arises from the considerable substrate ambiguity of many proteins. The substrate ambiguity of a protein can be interpreted as the evolutionary potential that allows a protein to acquire new specificities through mutation or to regain function via mutations that differ from the original protein sequence. All organisms have evolutionarily exploited this substrate ambiguity. When exploited in a laboratory under controlled mutagenesis and selection, it enables a protein to "evolve" in desired directions. One of the most effective strategies in directed protein evolution is to gradually accumulate mutations, either sequentially or by recombination, while applying selective pressure. This is typically achieved by the generation of libraries of mutants followed by efficient screening of these libraries for targeted functions and subsequent repetition of the process using improved mutants from the previous screening. Here we review some of the successful strategies in creating protein diversity and the more recent progress in directed protein evolution in a wide range of scientific disciplines and its impacts in chemical, pharmaceutical, and agricultural sciences.


   INTRODUCTION
The concept of laboratory-directed protein evolution is not new. Systematic approaches to directed evolution of proteins have been documented since the 1970s (39, 106, 110). One early example is the evolution of the EbgA protein from Escherichia coli, an enzyme having almost no β-galactosidase activity. Through intensive selection of a lacZ deletion strain of E. coli for growth on lactose as the sole carbon source, the wild-type EbgA was "evolved" into a β-galactosidase sufficient to replace lacZ gene function (39). Perhaps surprisingly, the evolution of new functions of an enzyme can require few mutations, as was the case for the EbgA protein: EbgA variants with newly acquired hydrolytic activities toward a variety of β-galactoside sugars contain only one to three mutations (102, 104, 107). Roy Jensen noted that the ability to recruit new protein functions arises from the considerable substrate ambiguity of many proteins (136). The substrate ambiguity of a protein can be interpreted as the evolutionary potential that allows a protein to acquire new specificities through mutation or to regain function via mutations that differ from the original protein sequence. All organisms have evolutionarily exploited this substrate ambiguity. When exploited in a laboratory under controlled mutagenesis and selection, it enables a protein to "evolve" in desired directions.

Directed protein evolution is a general term used to describe various techniques for generating protein mutants (variants) and selecting for desirable functions. Over the last three decades, directed protein evolution has emerged as a powerful technology platform in protein engineering. It has been advanced considerably by the availability of molecular biology tools and emerging high-throughput screening technologies, which have simplified the experimental processes and facilitated the identification of mutants with even small improvements in the desired function. Advanced recombinant DNA technologies have allowed the transfer of single structural genes, or of the genes for an entire pathway, to a suitable surrogate host for rapid propagation and/or high-level protein production. Furthermore, it is now possible to control the rate of mutagenesis in widely applied methods such as error-prone PCR and to modify proteins by systematic insertions or deletions. In addition, site-directed and site-saturation mutagenesis and synthetic oligonucleotides can be used to expand amino acid diversity at specific positions. While functional complementation of mutant strains is still an excellent choice when possible, the development of sensitive instrumentation and the ability to miniaturize many chemical or biological assays allow large numbers of samples to be screened for desired functions. The ability to rapidly obtain DNA sequence information for gene variants not only provides insight into protein sequence-function relationships but also enhances our ability to select the strategy best suited for the evolution of a particular protein. Thus, directed protein evolution has expanded from the original in vivo approach (e.g., the evolution of EbgA) to include in vitro exploration.

One of the most effective strategies in directed protein evolution is to gradually accumulate mutations, either sequentially or by recombination, while applying selective pressure. This is typically achieved by the generation of libraries of mutants followed by efficient screening of these libraries for targeted functions and subsequent repetition of the process using improved mutants from the previous screening. Many formats of directed protein evolution have been, and continue to be, developed (8, 9).

Here, we review the more recent progress in directed protein evolution (referred to as directed evolution hereafter) in a wide range of scientific disciplines and its impact on the chemical, pharmaceutical, and agricultural sciences. Although many strategies for directed evolution are described, we focus on the directed evolution of proteins through gradual accumulation of beneficial mutations; examples of recombination-based approaches are used primarily to illustrate the power of this technology. Advances in screening technologies for identifying useful functions are not discussed here, as they have been reviewed elsewhere (8, 184, 207, 273).


   STRATEGIES FOR DIRECTED EVOLUTION IN PROTEIN DESIGN
One of the primary goals of protein design is to generate proteins with new or improved properties. In addition to deepening our understanding of the design processes used in nature, the ability to confer a desired activity on a protein or enzyme has considerable practical application in the chemical, agricultural, and pharmaceutical industries. Two strategies are currently being employed towards this goal. The first is directed evolution, in which libraries of variants are searched experimentally for clones possessing the desired properties. The second is rational design, in which proteins are modified based on an understanding of the structural and mechanistic consequences of a particular change or set of changes. While the power of directed evolution is now widely appreciated, our present knowledge of structure-function relationships in proteins is still insufficient to make rational design a robust approach. In this section, we review a few methods and strategies of DNA mutagenesis and recombination for directed evolution, and we discuss ways in which rational design is now being used to facilitate the development of proteins with new and improved properties. Table 1 summarizes some of the methods that have been successfully utilized for directed evolution of a variety of proteins. This is not a complete list, as techniques and strategies of DNA mutagenesis and recombination for directed evolution are constantly arising (54, 148, 149, 150, 224, 236, 344, 352; reviewed by Farinas et al. [84] and by Lutz and Patrick [196]).


TABLE 1. Selected methodologies for directed evolution

 
DNA Shuffling

The goal of directed evolution is to accumulate improvements in activity through iterations of mutation and screening. The extent to which it succeeds depends critically on the delicate interplay between the quality of biological diversity present in the library, the size of the library, and the ability of an assay to meaningfully detect improvements in the desired activity. The strength of directed evolution lies in the ability of its scoring function (i.e., the assay) to mimic the property being evolved, while its weakness lies in the relatively small number of sequences that can be experimentally measured (on the order of 10^3 to 10^6 for high-throughput screening and >10^12 for display methods [251]).
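
As a rough illustration of the library-size constraint, the sketch below counts the protein variants carrying exactly k amino acid substitutions and compares that number with an assumed screening capacity. The protein length (300 residues) and capacity (10^4 clones) are hypothetical values chosen for illustration, and the coverage estimate assumes every variant is equally likely to be drawn, which real mutagenic libraries do not satisfy.

```python
from math import comb, exp

def point_mutant_library_size(length: int, k: int, alphabet: int = 20) -> int:
    """Distinct variants with exactly k substitutions in a protein of `length`
    residues; each substituted position can take any of alphabet - 1 residues."""
    return comb(length, k) * (alphabet - 1) ** k

def expected_coverage(library_size: int, screened: int) -> float:
    """Expected fraction of distinct variants seen after `screened` random draws,
    assuming all variants are equally likely (Poisson approximation)."""
    return 1.0 - exp(-screened / library_size)

if __name__ == "__main__":
    length, capacity = 300, 10**4  # hypothetical protein length and screen size
    for k in (1, 2, 3):
        size = point_mutant_library_size(length, k)
        print(f"k={k}: {size:.3e} variants, coverage ~{expected_coverage(size, capacity):.2e}")
```

Under these assumptions, screening 10^4 clones covers roughly 80% of the 5,700 single mutants but only a few hundredths of a percent of the roughly 1.6 × 10^7 double mutants.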

Library diversity is created through mutagenesis or recombination. Traditionally, libraries have been generated by random point mutagenesis (using, for example, error-prone PCR) or by site-directed mutagenesis of a starting sequence. These libraries are screened (or selected), and the best variant is chosen for additional mutagenesis. Because beneficial mutations are generally rare relative to deleterious ones, typically only a single beneficial mutation is added in each cycle of mutagenesis and screening. Indeed, the probability of improvement decreases rapidly when multiple mutations are made. Thus, iterative, point-mutation-based approaches are generally limited to improvements made in small steps.
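
The statement that the probability of improvement falls as mutations accumulate can be made concrete with a deliberately simple model (an assumption for illustration, not a result from the studies cited here): each random mutation is independently beneficial with probability p_b, deleterious with probability p_d, and otherwise neutral, and a variant counts as improved if it carries at least one beneficial and no deleterious mutation. Under this model, P(improved | k mutations) = (1 - p_d)^k - (1 - p_d - p_b)^k.

```python
def p_improved(k: int, p_beneficial: float, p_deleterious: float) -> float:
    """P(at least one beneficial and no deleterious mutation among k),
    assuming mutations act independently (toy model)."""
    # Probability that none of the k mutations is deleterious.
    p_none_deleterious = (1.0 - p_deleterious) ** k
    # Probability that all k mutations are neutral (no benefit gained).
    p_all_neutral = (1.0 - p_deleterious - p_beneficial) ** k
    return p_none_deleterious - p_all_neutral

if __name__ == "__main__":
    # Hypothetical rates: 1% of mutations beneficial, 50% deleterious.
    for k in (1, 2, 3, 5):
        print(k, round(p_improved(k, 0.01, 0.5), 4))
```

With these hypothetical rates the probability drops from 0.010 for one mutation to about 0.003 for five. The model ignores epistasis, which in real proteins can make the effects of mutations far from independent.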

DNA shuffling overcomes this limitation by allowing the direct recombination of beneficial mutations from multiple genes. In DNA shuffling, a population of DNA sequences is randomly fragmented and then reassembled into full-length, chimeric sequences by PCR (286, 287). In so-called "single-gene" formats, mutations are introduced during the reassembly process by controlling the error rate of the DNA polymerase. After screening or application of selective pressure, progeny sequences encoding desirable functions are identified. These clones are then shuffled (bred) iteratively, creating offspring that contain multiple beneficial mutations. Because beneficial mutations are recombined poolwise, DNA shuffling greatly increases the efficiency with which large phenotypic improvements are obtained.
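
The outcome of shuffling, as opposed to its biochemistry, can be mimicked in silico. The sketch below is an abstract model, not a simulation of DNA fragmentation or primerless PCR reassembly: aligned, equal-length parent sequences are cut at random breakpoints, each block is copied from a randomly chosen parent, and optional point mutations stand in for polymerase errors. The function name, parameters, and example parents are hypothetical.

```python
import random

def shuffle_chimera(parents, n_breakpoints, mutation_rate=0.0, rng=None):
    """Build one chimera from aligned, equal-length parent sequences (toy model)."""
    rng = rng or random.Random()
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to equal length")
    # Random breakpoints split the sequence into blocks.
    cuts = sorted(rng.sample(range(1, length), n_breakpoints))
    bounds = [0] + cuts + [length]
    # Each block is inherited from a randomly chosen parent.
    seq = []
    for start, end in zip(bounds, bounds[1:]):
        seq.extend(rng.choice(parents)[start:end])
    # Optional point mutations mimic polymerase errors during reassembly.
    for i, base in enumerate(seq):
        if rng.random() < mutation_rate:
            seq[i] = rng.choice([b for b in "ACGT" if b != base])
    return "".join(seq)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical parents, each carrying one "beneficial" change (G at 2, T at 8).
    parents = ["AAGAAAAAAA", "AAAAAAAATA"]
    library = [shuffle_chimera(parents, 3, rng=rng) for _ in range(1000)]
    both = sum(1 for s in library if s[2] == "G" and s[8] == "T")
    print(f"{both} of {len(library)} chimeras carry both changes")
```

Neither parent carries both changes, yet a sizable fraction of the simulated progeny does, which is the poolwise recombination effect described above.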

While point-mutation-based and single-gene shuffling methods are relatively efficient when small steps through sequence space are required, the relationship between library diversity, library size, and assay capability dictates that the evolution of phenotypes requiring larger steps through sequence space employ a more efficient search strategy. A simple and powerful way to do this is to use naturally occurring homologous genes as the source of starting diversity (64). In contrast to single-gene shuffling, in which library members are typically 95 to 99% identical, so-called "family shuffling" allows block exchanges of sequences that are typically >60% identical. In part because the sequence diversity comes from related, parental sequences that have survived natural selection ("functional" sequence diversity), much larger numbers of mutations are tolerated in a given sequence without introducing deleterious effects on structure or function. The increased sequence diversity of these chimeric libraries thus results in sparse sampling of much greater regions of sequence and function space.

Even greater control over the incorporation of sequence diversity can be achieved through "synthetic shuffling." In this approach, no physical starting genes are required. Instead, a series of degenerate oligonucleotides that incorporate all desired diversity (for example, naturally occurring diversity and diversity identified by structural analysis) are used to assemble a library of full-length genes (217). In contrast with fragmentation-based methods, in synthetic shuffling every amino acid from a set of parents is allowed to recombine independently of every other amino acid. By breaking the linkages between amino acids normally present in parental genes, synthetic shuffling methods access unique regions of sequence space.
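
The combinatorial consequence of letting every position recombine independently can be seen in a small sketch: the library is the Cartesian product of the residues observed at each aligned position, so its size is the product of the per-position choices. The three short parent sequences are invented for illustration, and the sketch assumes the parents are already aligned to equal length; real synthetic shuffling designs also weigh codon choice and oligonucleotide synthesis constraints that are ignored here.

```python
import random
from math import prod

def position_choices(parents):
    """Distinct residues observed at each aligned position of the parents."""
    return [sorted(set(column)) for column in zip(*parents)]

def synthetic_library_size(parents):
    """Library size when every position recombines independently."""
    return prod(len(choices) for choices in position_choices(parents))

def sample_variant(parents, rng=None):
    """Draw one library member: each position picks any parental residue."""
    rng = rng or random.Random()
    return "".join(rng.choice(choices) for choices in position_choices(parents))

if __name__ == "__main__":
    parents = ["MKTAY", "MRTAF", "MKSAF"]  # hypothetical aligned parents
    print(synthetic_library_size(parents))  # 8 = 1 * 2 * 2 * 1 * 2
    print(sample_variant(parents, random.Random(0)))
```

Members such as MRSAY, which combine residues never found together in any single parent, belong to this library, illustrating how breaking parental linkage opens new regions of sequence space.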

All directed evolution experiments must contend with the constraints described above: principally, the type and quality of diversity present in the library, the library size, and the ability of an assay to accurately identify desired clones from that library. To the extent that a desired phenotype is accessible within these constraints, standard DNA shuffling formats and the other formats described below provide a rapid and powerful way to optimize activity. For more demanding phenotypes, such as de novo enzyme design, novel substrate specificity, and novel enzyme chemistry, there is a need to maximize the information content of a library so that larger steps through vast regions of sequence and function space can be explored efficiently.

Whole-Genome Shuffling

Whole-cell biocatalysts are widely used for industrial applications such as conversion of feedstock to high-value products, production of high-value natural products, and production of protein pharmaceuticals. Fermentation-based bioprocesses are often limited by the sensitivity of microorganisms to temperature, pH, and solvents, resulting in low yield and productivity. Microorganisms are delicate and complex systems that can rarely be improved for industrial production by altering a single gene. Therefore, the ability to evolve an organism at the whole-genome level is highly desirable. A process known as whole-genome shuffling has been developed to accomplish this objective (347). This approach combines the advantages of family DNA shuffling with the benefit of crossing entire genomes, as occurs in conventional breeding (347). Traditional breeding is a long, continuous process of genetic recombination of parental genomes accompanied by phenotypic selection. It is usually limited to two parental genomes per generation and is affected by the genetic compatibility of the parents. Commercial microorganisms can also be improved by an asexual process of repeated cycles of random mutagenesis and screening, often referred to as classical strain improvement (CSI) (3). In whole-genome shuffling, by contrast, the driving force for accelerated evolution is the recursive recombination of multiple parents. The advantage of whole-genome shuffling over CSI has recently been demonstrated with Streptomyces fradiae, a strain commonly used for commercial production of the complex polyketide antibiotic tylosin (347), and with an industrial Lactobacillus strain for acid tolerance (234).

Using a low-production parental strain, two rounds of genome shuffling based on protoplast fusion of mixed populations and screening for tylosin production yielded mutant strains with productivities similar to that of the commercial strain SF2 (347). Notably, whereas obtaining SF2 required 20 rounds of CSI, about 1,000,000 assays, and 20 years, similar results were produced with 24,000 assays in 1 year of whole-genome shuffling. Patnaik et al. (234) demonstrated the use of whole-genome shuffling to improve acid tolerance in the production of lactic acid by lactobacilli. Lactobacillus strains with improved low-pH tolerance were first obtained by CSI to generate the initial biodiversity pool and then shuffled for five rounds by protoplast fusion. The improved strains produce threefold more lactic acid than the wild-type strains at pH 4.0.

Whole-genome shuffling is a powerful tool for manipulating organisms (52, 67). It allows the evolution of desired phenotypes through rapid genomic manipulation and stabilization. Directed whole-genome evolution is not limited to microorganisms: by a variety of means, genomes from eukaryotic cells, including regenerable cells from animals and plants, can be recombined recursively for accelerated phenotypic improvement.

Heteroduplex

In vitro recombination of large DNA sequences, such as operons or artificial chromosomes, is difficult to achieve. An alternative in vitro-in vivo method, in which recombination takes place during in vivo repair of a parental plasmid heteroduplex, has been suggested to be useful for recombining large genes or entire operons (313). A heteroduplex formed in vitro is used to transform bacterial cells, where repair of regions of nonidentity in the heteroduplex creates a library of new, recombined sequences composed of elements from each parent. However, this method, which relies on the ability of host cells to repair mismatched heteroduplexes, requires high homology between the parental genes and is limited to two parental genes per event.

Random Chimeragenesis on Transient Templates

Annealing of small fragments as primers, spiking of oligonucleotides as linkers between regions of low homology, and generation of completely synthetic chimeras are some of the approaches that have been designed to increase the frequency of recombination between low-homology sequences. For example, libraries generated by the random chimeragenesis on transient templates (RACHITT) method showed an average of 14 crossovers per parental gene, a much higher rate than with other reported methods (56). In addition, RACHITT-derived chimeric genes showed frequent recombination within short regions (a few nucleotides). RACHITT uses a single-stranded, full-length, uracil-containing transient template to which single-stranded partial donor fragments are annealed. Because fragments of one or more parental donor genes can anneal to the template simultaneously, this approach generates high-frequency crossovers. One common issue in "family DNA shuffling" is the bias against incorporation of the less homologous genes in the parental gene pool. By selecting one gene as the sole template, RACHITT can force the incorporation of a particular gene even when it shares relatively low homology with the others. In some cases, especially when the background activity of one parent is problematic for library screening, RACHITT allows this parent to be used as a fragmented donor, thus avoiding the presence of its wild-type gene in the library.
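
Crossover counts such as those reported for RACHITT libraries are obtained by comparing sequenced chimeras with their parents. One way to do this, sketched below as an assumption rather than the procedure used in the cited study, is to keep only positions where the aligned parents differ, note which parents match the chimera at each such site, and find the smallest number of parent switches that explains the chimera; a greedy left-to-right walk gives this minimum. The sketch assumes chimera and parents are aligned to equal length and skips sites matching no parent (for example, new point mutations). The example parents are hypothetical.

```python
def informative_sites(chimera, parents):
    """For each position where aligned parents differ, the set of parents
    whose residue matches the chimera (sites matching no parent are skipped)."""
    sites = []
    for i, residue in enumerate(chimera):
        column = [p[i] for p in parents]
        if len(set(column)) > 1:
            matches = {j for j, r in enumerate(column) if r == residue}
            if matches:
                sites.append(matches)
    return sites

def count_crossovers(chimera, parents):
    """Minimum number of parent switches consistent with the chimera."""
    sites = informative_sites(chimera, parents)
    if not sites:
        return 0
    crossovers = 0
    candidates = set(sites[0])
    for matches in sites[1:]:
        shared = candidates & matches
        if shared:
            candidates = shared       # the current segment can still be extended
        else:
            crossovers += 1           # no single parent explains both sites
            candidates = set(matches)
    return crossovers

if __name__ == "__main__":
    parents = ["ACGTACGTAC", "ACCTAGGTTC"]  # hypothetical aligned parents
    print(count_crossovers("ACGTAGGTAC", parents))
```

Because identical positions carry no information, crossovers can only be located to within the stretch between informative sites, which is why crossovers in short identity regions are notable.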

Assembly of Designed Oligonucleotides

Assembly of designed oligonucleotides (ADO) has been described as a useful technique for gene recombination (343). ADO uses sequence information from the nonconserved regions to design a set of synthetic degenerate oligonucleotides. The flanking region of each synthetic fragment contains conserved-region sequences that serve as linkers in homologous recombination. PCR assembly of the fragments is then performed in two steps. First, double-stranded DNA is formed by PCR of the single-stranded oligonucleotides in the absence of primers. The double-stranded DNA is then used as the template for PCR amplification of the whole gene, and the full-length gene products are ligated into an expression vector. The two major advantages of the method are that it allows crossovers to occur between low-homology fragments and that self-hybridization of parental genes is minimized or eliminated. High-quality libraries without a parental gene background are essential, especially when high-throughput screening is not available. The limitation imposed by the relatively short length of synthetic oligonucleotides could be overcome by fragment ligation. ADO has been successfully applied to improve the activities of two Bacillus subtilis lipases, LipA and LipB (343); a single library of 3,000 variants obtained by ADO was sufficient to identify six variants with improved enantioselectivity. The method can also create a large diversity of active variants and eliminate codon bias in parental genes.
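
Degenerate oligonucleotides of the kind used in ADO are usually written with IUPAC nucleotide ambiguity codes, and the number of sequences an oligonucleotide encodes is the product of the number of bases allowed at each position. The sketch below expands such codes; the NNK codon in the example is a common generic choice for saturating a single codon, not a design taken from the ADO study.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(oligo):
    """Number of distinct sequences encoded by a degenerate oligonucleotide."""
    return prod(len(IUPAC[base]) for base in oligo.upper())

def expand(oligo):
    """Enumerate every concrete sequence encoded by a degenerate oligonucleotide."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in oligo.upper()))]

if __name__ == "__main__":
    codons = expand("NNK")
    print(degeneracy("NNK"), "TAG" in codons, "TAA" in codons)
```

NNK yields 32 codons that cover all 20 amino acids while admitting only one stop codon (TAG), which is one reason such restricted degeneracies are preferred over NNN when library size is limited.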

Mutagenic and Unidirectional Reassembly

Song et al. (281) developed mutagenic and unidirectional reassembly (MURA) to generate libraries of DNA-shuffled and randomly truncated proteins. In this method, sequence changes generated by DNA shuffling and by incremental truncation can be introduced into a parental gene simultaneously in a single experiment. The MURA process consists of four steps. First, random fragments of the parental gene are obtained by PCR amplification or restriction digestion. Second, the fragments are reassembled in the presence of unidirectional primers that contain a specific restriction site. Third, the DNA fragments are gel purified and treated with T4 DNA polymerase or S1 nuclease to polish both termini. Fourth, they are digested with the primer-specific restriction enzyme. The MURA method has been used to generate an N-terminally truncated and DNA-shuffled library of Serratia sp. phospholipase A1 (PlaA) in order to shift the substrate specificity of PlaA from a phospholipase to a lipase (281). By high-throughput screening of 2,500 to 3,000 transformants, the authors isolated nine variants that exhibited both lipase and phospholipase activities; all showed high lipase activity while retaining phospholipase activity. All the mutant enzymes carried N-terminal deletions of 61 to 71 amino acids, a result of the MURA process, together with a relatively small number of amino acid substitutions. The dual activities exhibited by the truncated enzymes suggest that the N-terminal region is critical for interactions with phospholipid substrates.

Exon Shuffling

Exon shuffling is an evolutionary mechanism in which recombination of nonhomologous genes generates new genes encoding mosaic proteins. Natural exon shuffling has been described for a number of gene families, for example through domain organization and splice frame analysis of the hemostatic proteases and through structural and sequence analysis of SCAN domain-containing genes (78). On this basis, a method to evolve proteins by in vitro exon shuffling has been proposed (157). As in the natural process, in vitro exon shuffling can be carried out using a mixture of chimeric oligonucleotides that controls which exon or combination of exons is spliced together. One application of exon shuffling is to develop protein pharmaceuticals based on natural human gene sequences, thus potentially reducing the likelihood of immune responses (260). For example, it may be possible to minimize the immunogenicity of therapeutic proteins by constructing high-quality human gene libraries that lack random mutations. To construct such high-quality libraries, protocols such as that described by Zhao and Arnold (350) can be applied: inclusion of Mn2+ or Mg2+ and a high-fidelity DNA polymerase during amplification and reassembly can significantly reduce the point mutation rate. Exon-shuffled libraries of unrelated domains that share no sequence or functional homology can potentially generate new "humanized" genes with valuable functions.
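
The splice-frame constraint mentioned above can be expressed compactly. Each exon is described by the phase of the intron at its 5' and 3' boundaries (0, 1, or 2, the position within a codon at which the intron interrupts the reading frame); two exons can be joined without a frameshift only when the 3' phase of the first equals the 5' phase of the second. The sketch below enumerates frame-compatible orderings of a hypothetical exon set and assumes each exon's length is consistent with its boundary phases.

```python
from itertools import permutations

# Each exon: (name, 5' boundary phase, 3' boundary phase). Hypothetical set.
EXONS = [("E1", 0, 1), ("E2", 1, 1), ("E3", 1, 0), ("E4", 0, 0), ("E5", 2, 0)]

def frame_compatible(order):
    """True if every junction joins a 3' phase to an equal 5' phase."""
    return all(left[2] == right[1] for left, right in zip(order, order[1:]))

def shuffled_constructs(exons, n):
    """All ordered combinations of n distinct exons that preserve the frame."""
    return [tuple(name for name, _, _ in order)
            for order in permutations(exons, n)
            if frame_compatible(order)]

if __name__ == "__main__":
    for construct in shuffled_constructs(EXONS, 3):
        print("-".join(construct))
```

In this toy set, E5 (5' phase 2) can only start a construct because no exon ends in phase 2, whereas symmetric exons such as E2 and E4 (same phase at both ends) can be inserted wherever their phase matches, which is why symmetric exons are favored as shuffling modules.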

Y-Ligation-Based Block Shuffling

While many methods improve function by creating and recombining point mutations, Y-ligation-based block shuffling (YLBS) is a general methodology that mimics evolutionary processes such as domain shuffling, exon shuffling, and module shuffling, and it can be used to generate high-diversity libraries (155, 156). YLBS is based on repeated cycles of ligation of sequence blocks that have a stem and two branches (Y-ligation), formed by two types of single-stranded DNA. The ability to integrate desired blocks of variable size (from several amino acids to a whole domain) into proteins at any site and at any frequency can dramatically increase the diversity pool for directed evolution. YLBS can be an efficient technology for introducing or eliminating (by a deletion block or null block) peptides, exons, and domains.

Nonhomologous Recombination

While protein variants generated by homologous recombination or random point mutagenesis are more likely to maintain structural similarity to the parental proteins, nonhomologous recombination allows the efficient creation of new protein folds. This approach enables the generation of protein structural diversity that may or may not exist in nature, and it is potentially very useful in the evolution of multifunctional proteins. Several methods for nonhomologous recombination have been described. They include incremental truncation for the creation of hybrid enzymes (ITCHY) (225), sequence-independent site-directed chimeragenesis (119), sequence homology-independent protein recombination (276), and nonhomologous random recombination (NRR) (23). ITCHY libraries are created by cloning two genes (or gene fragments) in tandem in an expression vector containing two unique restriction sites. The linearized vector allows truncated fragments to be generated either by time-dependent exonuclease III digestion (224) or by incorporation of α-phosphorothioate deoxynucleoside triphosphates (194). Subsequent blunt-ending and treatment with the second restriction enzyme release truncated fragments of various lengths, and chimeras are then generated by ligation to recircularize the vector. This approach has been combined with an additional recombination step to develop SCRATCHY (193). More recently, the NRR method has been described (23). NRR is based on DNase I fragmentation, blunt-end ligation/extension, and capping with two asymmetrical DNA hairpins to stop the extension. This method potentially offers greater flexibility in modulating fragment size, crossover frequency, and the number of parental genes.
The major challenge facing all techniques for sequence-independent recombination of proteins is the presence of large numbers of nonfunctional progeny in the libraries (due to, for example, frameshifts and/or reversed DNA fragment orientation that introduce nonsense codons), which hinders the search for functional mutants. It is therefore critical to have a high-throughput screen in place; alternatively, a preselection strategy, e.g., downstream fusion of a reporter or selection marker to eliminate mutants with internal stop codons, can be applied to generate high-quality libraries.
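
The frameshift problem can be quantified for an ITCHY-style library. In a simplified model (an assumption that ignores stop codons created at the junction and reversed-orientation products), every fusion joins the first i nucleotides of gene A to gene B from nucleotide j onward, and the C-terminal portion of B keeps its original reading frame only when the junction falls at the same codon position in both genes (i mod 3 = j mod 3). The gene lengths in the example are hypothetical.

```python
def itchy_junctions(len_a, len_b):
    """All (i, j) junctions: first i nt of gene A fused to gene B from nt j on."""
    return [(i, j) for i in range(len_a + 1) for j in range(len_b + 1)]

def in_frame(i, j):
    """B's downstream reading frame is kept if both cuts share a codon position."""
    return i % 3 == j % 3

def fraction_in_frame(len_a, len_b):
    """Fraction of all junctions that keep gene B in its original frame."""
    junctions = itchy_junctions(len_a, len_b)
    return sum(in_frame(i, j) for i, j in junctions) / len(junctions)

if __name__ == "__main__":
    # Hypothetical 300-nt gene fragments.
    print(round(fraction_in_frame(300, 300), 4))
```

Only about one-third of junctions are in frame even before stop codons are considered, which illustrates the value of preselecting for in-frame fusions.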

Combining Rational Design with Directed Evolution

One of the most seductive features of rational/computational approaches to protein design is the ability to access vastly larger regions of sequence space (>10^25 sequences) than can be searched experimentally. The success of such approaches depends on the ability to accurately predict the fitness of a given sequence. For certain properties, such as protein stability, simple "packing" algorithms are capable of predicting sequences with reasonable accuracy. For more complex phenotypes, the successful application of purely rational/computational methods requires sophisticated scoring (energy) functions. The recent de novo design of a novel protein fold is a spectacular example of the increasing power of computational design (163).

A powerful application of rational design is using it to focus library diversity for directed evolution experiments. In general, computational analysis of a protein's structure is first used to generate sequence diversity and to test those sequences for functional properties that can be modeled (scored) in silico. Only those variants that pass this prescreen are then synthesized and tested experimentally. In this manner, costly and time-consuming experimental searches are limited to regions of sequence space that are consistent with a protein's structure.
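
The prescreen-then-synthesize workflow can be sketched as a filter over a combinatorial library. In the example below, the wild-type sequence, the targeted positions, the allowed residues, and the scoring function are all hypothetical; `toy_score` simply penalizes charged residues at two positions assumed to be buried, standing in for the energy functions that real design software evaluates.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

def prescreen(wild_type: str, positions: Sequence[int], allowed: Dict[int, str],
              score: Callable[[str], float], threshold: float) -> List[str]:
    """Enumerate variants at `positions` and keep those scoring <= threshold."""
    survivors = []
    for combo in product(*(allowed[p] for p in positions)):
        seq = list(wild_type)
        for pos, residue in zip(positions, combo):
            seq[pos] = residue
        variant = "".join(seq)
        if score(variant) <= threshold:  # lower score = more favorable here
            survivors.append(variant)
    return survivors

BURIED = (2, 3)  # hypothetical buried positions

def toy_score(seq: str) -> float:
    """Hypothetical score: one penalty unit per charged residue at a buried site."""
    return float(sum(seq[i] in "DEKR" for i in BURIED))

if __name__ == "__main__":
    wild_type = "MKLVAE"
    allowed = {2: "LIDK", 3: "VAE"}
    hits = prescreen(wild_type, BURIED, allowed, toy_score, threshold=0.0)
    print(len(hits), "of", 4 * 3, "variants pass:", hits)
```

Only the variants that pass would then be synthesized and assayed; the filter is only as good as the score, which is the limitation discussed above for complex phenotypes.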

In an elegant example of structure-based computational design, Dwyer et al. introduced triosephosphate isomerase activity into a catalytically inert protein scaffold, ribose-binding protein (79). The design strategy consisted of three stages. First, a chemical and geometric definition of the catalytic machinery was generated. Second, a combinatorial search was performed to identify positions within the active site where the catalytic machinery and substrate could be placed while simultaneously satisfying these constraints. Third, the remainder of the active site was optimized to form a stereochemically complementary binding surface. A total of 14 designs were tested, and one of these exhibited a kcat/Km ratio of 1.5 × 10^2 for the conversion of dihydroxyacetone phosphate to glyceraldehyde-3-phosphate. This is about 3 orders of magnitude lower than the ratio for wild-type triosephosphate isomerase but nevertheless represents a rate enhancement of more than 10^5 over the uncatalyzed reaction. Subsequently, the authors used directed evolution to improve the kcat/Km ratio of the designed enzyme. As is often the case, many of the changes accumulated during directed evolution lie in regions distal from the active site, and their effect on activity is therefore difficult to rationalize. A key issue for future design strategies lies in understanding how such mutations, which often act cooperatively and over long distances, improve activity (284).

One of the great advantages that emerges from the synthesis of rational design and directed evolution is that once a gene with even low levels of starting activity is obtained through design, it may be rapidly optimized by directed evolution (275). Thus, the goal of rational design becomes detecting even a weak starting activity from a focused library, rather than obtaining an optimized level of activity. The complementary use of rational design with directed evolution is a promising path towards the production of proteins with new and improved properties.


   APPLICATIONS OF DIRECTED EVOLUTION
Directed evolution is increasingly used in academic and industrial laboratories to improve protein stability, to enhance the activity or overall performance of enzymes and organisms, to alter enzyme substrate specificity, and to design new activities. Together with novel techniques for large-scale screening, directed evolution enables the selection of redesigned molecules without the need for detailed structural and mechanistic information (reviewed by Arnold [7] and Minshull and Stemmer [209]). In recent years, directed evolution has been broadly applied in research and product development involving recombinant DNA technologies, biocatalysts, metabolic pathway engineering, pharmaceuticals, and important agricultural traits. Regardless of the research discipline, some common themes can be observed in the application of directed evolution. For example, it increasingly appears to be the tool of choice for studying the evolution of, and the relationship between, protein structure and function (2, 114, 138, 192, 226, 259) and for interpreting the evolutionary significance of biomolecular systems (122, 323). It is also a popular tool for accelerating the adaptation of protein functions (e.g., stability, specificity, or affinity) to extreme conditions such as unusual temperatures and organic solvents (198, 204, 221, 222, 327-330), as well as for improving recombinant protein biosynthesis (152, 185). Directed evolution has also given rise to enzymes with altered specificities and activities (113-115, 126, 141, 294, 337), enhanced intramolecular interactions (292), modified protein-protein interactions (180), and altered metabolic pathways (263). The following sections present some examples of these applications.

Directed Evolution of Nucleic-Acid-Modifying Enzymes

An emerging area in biotechnology is the directed evolution of DNA-modifying enzymes. Improving or modifying the site selectivity of restriction endonucleases, recombinases, and other DNA-modifying enzymes (46, 57, 82) can lead to novel applications in genetic engineering, functional genomics, and gene therapy.

Polymerases. Molecular biology techniques such as DNA labeling, PCR, sequencing, site-directed mutagenesis, and some cloning procedures often require DNA polymerases that remain highly active under suboptimal conditions, such as extreme temperatures or the presence of inhibitors. Compartmentalized self-replication (CSR) is a useful strategy for the directed evolution of DNA or RNA polymerases (89). CSR is based on a feedback loop in which a polymerase replicates only its own encoding gene. Polymerase variants generated by error-prone PCR replicate their genes in separate compartments formed by water-in-oil emulsions. Genes encoding polymerases that perform better under the selection conditions replicate at higher rates and eventually dominate the mutant population. CSR has been used to evolve Taq polymerase in the presence of increasing amounts of the inhibitor heparin, resulting in the isolation of a variant with a 130-fold increase in heparin resistance (89).
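The CSR feedback loop can be caricatured with a short simulation in which each emulsion compartment holds one gene and the polymerase it encodes copies only that gene. This is a hypothetical sketch: the variant names and per-cycle copying efficiencies are invented, and amplification is modeled as unlimited exponential growth, ignoring the saturation that occurs in real PCR.

```python
import random
from collections import Counter


def csr_round(pool, efficiency, n_compartments, cycles, rng):
    """One toy round of compartmentalized self-replication.

    pool: Counter of gene copies per variant from the previous round.
    efficiency: variant -> fraction of templates copied per cycle under the
        selection condition (hypothetical values, e.g. with an inhibitor present).
    Each compartment receives one gene, and its polymerase copies only that
    gene, so the yield per compartment is (1 + efficiency) ** cycles.
    """
    variants = list(pool)
    weights = [pool[v] for v in variants]
    out = Counter()
    for v in rng.choices(variants, weights=weights, k=n_compartments):
        out[v] += (1 + efficiency[v]) ** cycles
    return out


def run_csr(efficiency, rounds=3, n_compartments=1000, cycles=20, seed=0):
    """Start from equal gene copies and return variant frequencies after selection."""
    rng = random.Random(seed)
    pool = Counter({v: 1.0 for v in efficiency})
    for _ in range(rounds):
        pool = csr_round(pool, efficiency, n_compartments, cycles, rng)
    total = sum(pool.values())
    return {v: pool[v] / total for v in efficiency}


freqs = run_csr({"wild_type": 0.1, "mutant_a": 0.3, "inhibitor_resistant": 0.9})
print({v: round(f, 4) for v, f in freqs.items()})
```

The compartment boundary is what links genotype to phenotype: a better polymerase enriches only its own gene, so selection needs no separate screening step.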

Directed evolution has been successfully applied to DNA polymerases to enhance their activity (233) and to convert them into efficient RNA polymerases (232, 333). 2'-O-Methyl RNA is more stable than unmodified RNA and has been produced by chemical synthesis. Chelliserrykattil and Ellington established an efficient screening system for the selection of highly active polymerases (47). This system creates a so-called "autogene" by placing the T7 RNA polymerase gene under the control of its own promoter. Polymerase variants with higher activity generate more mRNA and can thus be selectively amplified by reverse transcription-PCR. The autogene system has allowed the identification of T7 RNA polymerase variants that efficiently incorporate various 2'-modified nucleotides with good processivity (47, 48). Mixtures of polymerase mutants with different specificities have produced transcripts containing multiple modified nucleotides. A DNA polymerase capable of incorporating 2'-O-methyl nucleotides has also been created by directed evolution (82).

Nucleases. Nucleases, including restriction endonucleases, are essential enzymes in modern molecular biology and are thus active targets for directed evolution. An intelligently designed selection that compartmentalizes each gene variant in a rabbit reticulocyte transcription/translation system overcomes limitations associated with in vivo screening techniques and allows restriction endonuclease libraries to be screened efficiently (74). Novel methods have also been developed for selecting restriction enzymes with altered substrate specificities (80, 168, 256, 353). DNA cleavage specificities have also been created in derivatives of E. coli RNase P (59).

Transposase. Naumann and Reznikoff (216) used directed evolution to generate a mutated Tn5 bacterial transposase that could function on transposons with mutated end binding sequences. The Tn5 transposon encodes a 53-kDa transposase protein (Tnp) that moves the entire transposon by first binding to each of two 19-bp specific binding sequences (known as outside ends [OE]), followed by formation of a nucleoprotein complex, blunt-end cleavage, and transfer to the target DNA. The transposon can also promote movement using a single OE together with an additional 19-bp inside end (IE) sequence. The activity of wild-type Tn5 Tnp is inhibited in E. coli by Dam methylation at the IE (IEME). To screen for a transposase mutant that functions with mutated inverted repeats, the IE was modified at position 12 from thymine to adenine (IE12A), which abolishes recognition by the wild-type transposase. As a consequence, insertion of IE12A into the region flanking the lacZ gene, between the transcription and translation start sites, results in an inactive transposon. Three rounds of gene shuffling and high-throughput screening for LacZ activity at about 10⁴ colonies per round, followed by analysis of the active variants for activity against OE and IE, allowed the isolation of a specific hyperactive Tnp variant (TnpsC7). Whereas methylation of the IE reduced wild-type Tnp activity 100-fold, TnpsC7 activity in the presence of IEME was markedly higher.

Integrase/recombinase. Improved site specificity for large genome modifications has recently been demonstrated for an evolved variant of the φC31 integrase (265). Sclimenti et al. (265) applied two rounds of DNA shuffling in combination with a genetic screen that identifies improved variants by expression of a lacZ reporter gene. The improved enzyme has a strong preference for target-site DNA sequences and 10- to 20-fold-higher absolute integration frequencies than wild-type φC31 integrase. In addition to this improvement in integrase site specificity, several groups have successfully altered the site specificity of the Cre and Flp recombinases by directed evolution (35, 36, 252, 258, 314). Cre recombinase catalyzes integration, excision, and rearrangement of DNA through recombination between two 34-bp double-stranded recombination sites known as loxP. Santoro and Schultz (258) designed a fluorescence-activated cell sorting-based screen for recombinases that recognize unnatural recombination sites. The screening system consists of a recombinase variant and a reporter plasmid expressing either enhanced yellow fluorescent protein (YFP) or green fluorescent protein (GFP). Using this high-throughput system, the authors isolated recombinase variants with high specificity for unnatural loxP sites and low activity on the wild-type loxP site. Site-specific manipulation of genomes by recombinases is a powerful tool for functional genomics, and recombinases such as Cre have been widely used to mutagenize and replace genes in mice. Expanding the range of sequences recognized by recombinases should improve the efficiency and quality of transgenic animal and plant production. More broadly, the ability to evolve proteins that interact with DNA has wide implications, and efforts to evolve other DNA-binding proteins, such as transcription factors, with tailor-made specificities are under way.
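Screening for recombinases that are active on an unnatural site yet inactive on the wild-type loxP site amounts to a two-sided gate. The sketch below is a simplified, hypothetical model: it assumes each sorted cell yields one reporter signal for recombination at the unnatural site and one for the wild-type site, with arbitrary gate values. It does not reproduce the reporter construct of Santoro and Schultz.

```python
from dataclasses import dataclass


@dataclass
class SortedCell:
    variant: str
    unnatural_site_signal: float  # hypothetical reporter readout (arbitrary units)
    wild_type_site_signal: float  # hypothetical reporter readout (arbitrary units)


def dual_gate(cells, positive_min, negative_max):
    """Positive gate: bright for the unnatural site. Negative gate: dim for wild-type loxP."""
    return [
        c for c in cells
        if c.unnatural_site_signal >= positive_min and c.wild_type_site_signal < negative_max
    ]


cells = [
    SortedCell("parental_cre", 0.05, 0.95),
    SortedCell("promiscuous", 0.80, 0.85),
    SortedCell("retargeted", 0.75, 0.04),
]
print([c.variant for c in dual_gate(cells, positive_min=0.5, negative_max=0.2)])
# -> ['retargeted']
```

The negative gate is what separates a true specificity switch from mere promiscuity: the "promiscuous" variant passes the positive gate alone but is discarded because it still acts on the wild-type site.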

Reporter genes. Although reporter proteins usually do not modify nucleic acids themselves, in molecular biology they are often closely associated with proteins that do. Directed evolution has been applied to optimize the physical properties of fluorescent proteins and small-molecule probes for real-time imaging of live cells (21, 40, 142). Fluorescent probes function as "passive" markers that provide high sensitivity for real-time visualization and tracking of cellular events without perturbing the cells. GFP is widely used for tracking protein localization in vivo and has been improved by directed evolution (65). Additional fluorescent variants such as YFP and cyan fluorescent protein have been generated by mutagenesis of wild-type GFP. These variants may be used as companion markers for protein colocalization and for tracking protein-protein interactions by fluorescence resonance energy transfer (FRET). Nguyen and Daugherty (220) addressed the dynamic range and sensitivity limitations of FRET with a strategy in which a cyan fluorescent protein-YFP fusion system allows the detection of subtle improvements, enabling gradual optimization of FRET signals. When this system was coupled with random mutagenesis and targeted saturation mutagenesis, FRET dynamic range and sensitivity were substantially enhanced. Another example is the engineering of the Discosoma red fluorescent protein (DsRed). Wild-type, tetrameric DsRed has poor solubility, which can affect the function and localization of tagged proteins, and its chromophore matures slowly. By applying seven rounds of site-directed mutagenesis and error-prone PCR followed by high-throughput visual screening for fluorescence in microbial cells, Bevis and Glick (21) isolated soluble DsRed variants that also mature 10 to 15 times faster than the wild-type protein. 
While the improved DsRed isolated by Bevis and Glick retained its tetrameric state, Campbell et al. (40) evolved DsRed into an active monomeric form that matures 10 times faster than the wild-type protein. Their approach was a stepwise evolution of DsRed, first to a dimer and then to a monomer. This sequential improvement resulted in an active monomeric protein with improved solubility and a shorter maturation time, leading to greater tissue penetration and spectral separation from autofluorescence and other fluorescent probes. The next generation of monomeric fluorescent proteins has been shown to be more photostable, to mature more completely, and to be more tolerant of fusion to other proteins (274). Another well-known reporter protein, beta-glucuronidase, has also been improved (200, 202), and further evolution successfully converted this enzyme into a beta-galactosidase (202). Beta-galactosidase activity has also been evolved from a fucosidase (72, 345).
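The dynamic-range problem of FRET sensors follows from the Förster relation, in which transfer efficiency falls with the sixth power of the donor-acceptor distance r relative to the Förster radius R0: E = 1 / (1 + (r/R0)^6). The sketch below is illustrative only: the R0 of 5 nm and the distances are assumed round numbers, not measurements for the cyan fluorescent protein-YFP pair or for the sensors of Nguyen and Daugherty.

```python
def fret_efficiency(r_nm, r0_nm):
    """Förster relation: E = 1 / (1 + (r / R0)**6)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


R0 = 5.0  # assumed Förster radius in nm (illustrative round number)
for r in (4.0, 5.0, 6.0, 8.0):
    print(f"r = {r:.1f} nm -> E = {fret_efficiency(r, R0):.2f}")

# A hypothetical sensor whose conformational change moves the fluorophores from 6 to 4 nm
open_e, closed_e = fret_efficiency(6.0, R0), fret_efficiency(4.0, R0)
print(f"efficiency change: {open_e:.2f} -> {closed_e:.2f}")
```

Because E changes steeply only for distances near R0, a fusion whose distance change falls outside that window gives a small signal change; a screen sensitive enough to detect small gains lets such constructs be improved stepwise.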

Increasing protein solubility by directed evolution is not limited to reporter proteins. Proteins overexpressed in heterologous systems such as E. coli often fail to fold into their native states and thus accumulate as insoluble inclusion bodies. Directed evolution is an efficient way to generate more soluble forms of such proteins. One way to screen for soluble variants is to fuse variants of an insoluble protein to a reporter for heterologous expression and then screen for reporter activity (reviewed by Waldo [317]). Yang et al. (336) used a GFP-based screen to evolve the solubility of the Mycobacterium tuberculosis Rv2002 gene product. Whereas overexpression of Rv2002 in E. coli resulted in inclusion bodies, five soluble mutants were identified after three rounds of error-prone PCR and DNA shuffling. Because the Rv2002 mutants are fused to GFP, the soluble Rv2002-GFP fusions fluoresce more brightly than the wild-type fusion. Enzymatic assays indicated that one soluble mutant, Rv2002-M3, has high catalytic activity as an NADH-dependent 3α,20β-hydroxysteroid dehydrogenase.

Directed Evolution of Biochemical Catalysts

Since the 1980s, recombinant DNA technologies, and recombinant protein expression technology in particular, have revolutionized the chemical industry. Enzymatic catalysts are superior in many industrial processes because of their high selectivity and low energy requirements. However, many challenges remain before the potential of industrial enzymes can be fully exploited. To be effective and practical, these enzymes must be consistently available in large quantities and at low cost, and they must be active and stable under process conditions. In some cases, product inhibition poses problems. In addition, many enzymes required for specific reactions have yet to be identified and produced. Directed evolution offers viable solutions for enzyme optimization and for the development of novel specificities. This area of research has been the subject of a number of recent reviews (11, 27-29, 51, 90, 98, 123, 126, 161, 162, 230, 241, 242, 279, 296, 302, 318).

Proteolytic enzymes. The serine endoprotease subtilisin is a commercially important enzyme. With annual sales of over $500 million, the highest among industrial enzymes, subtilisins are widely used as additives in laundry detergents and for other purposes. A major challenge in improving most industrial enzymes is that performance is defined not by any single property but by a complex mix of parameters. Although rational design and random mutagenesis have been used to improve single properties such as thermostability or activity in organic solvents, this has often come at the expense of other critical properties. Ness et al. (218) demonstrated multidimensional improvement of subtilisin by DNA shuffling. Twenty-five subtilisin gene fragments obtained from different Bacillus isolates were bred together with the full-length gene for a leading commercial protease, and the progeny were screened for thermostability, solvent stability, and pH dependence (at pH 5, pH 7.5, and pH 10). High frequencies of improvement (4 to 12%) in all parameters were achieved with a relatively small library (654 active clones). In addition, the combinations of properties obtained extended well beyond those of the parental enzymes. Sequence analysis of several high performers under each set of conditions revealed that variants with similar properties could be encoded by different sequences; thermostability, for example, could be conferred by any one of at least three different genetic elements. Because of the importance of proteolytic enzymes, directed evolution of proteases and peptidases remains one of the most actively pursued research areas (10, 12, 34, 100, 160, 210, 211, 285, 297, 304, 327-329, 349).
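The reported figures allow a quick estimate of screening yield, and the requirement that a variant improve several properties at once can be written as a simple filter. In this sketch the property names follow the text, but the variant measurements and the "improved" criterion (any value above the parent's) are hypothetical.

```python
PROPERTIES = ("thermostability", "solvent_stability", "activity_pH5", "activity_pH7.5", "activity_pH10")


def expected_improved(library_size, frequency):
    """Expected number of improved clones for a given hit frequency."""
    return library_size * frequency


def improved_properties(variant, parent):
    """Properties in which the variant outperforms the parent (hypothetical criterion)."""
    return {p for p in PROPERTIES if variant[p] > parent[p]}


def multidimensional_hits(variants, parent):
    """Variants that outperform the parent in every screened property."""
    return [name for name, props in variants.items()
            if improved_properties(props, parent) == set(PROPERTIES)]


# 654 active clones with 4 to 12% improved per property
low, high = expected_improved(654, 0.04), expected_improved(654, 0.12)
print(f"{low:.0f} to {high:.0f} improved clones expected per property")
```

Ranking on one property at a time would miss the point of the experiment; the filter keeps only variants that improve on all axes, which is the "multidimensional" outcome the study reports.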

Cellulolytic enzymes. Enzymes that hydrolyze carbohydrates are also active targets for directed evolution. Up to sevenfold enhancement of the thermostability of the endoglucanase EngB has been achieved by introducing sequence diversity from a partially homologous endoglucanase, EngD (213, 214). A library was constructed using genes encoding the cellulosomal endoglucanase EngB and the noncellulosomal cellulase EngD from Clostridium cellulovorans. More thermostable cellulosomal endoglucanases are of high industrial relevance. Cellulosomes from clostridia are efficient at hydrolyzing microcrystalline cellulose. This relatively high efficiency has been attributed to (i) the correct ratio between catalytic domains, which optimizes synergism between them; (ii) appropriate spacing between the individual components to further promote synergism; and (iii) the presence of different enzymatic activities (cellulolytic or hemicellulolytic) in the cellulosome, which can remove other polysaccharides in heterogeneous cell wall materials.

Applications of cell wall-loosening enzymes can be found in a variety of industrial processes. In the pulp and paper industry, enzymatic degradation of the hemicellulose-lignin complexes present in pulps preserves intact cellulose fibers and strongly reduces the amount of bleaching chemicals required. The enzyme laccase is of interest for biobleaching and has been improved in industrially relevant parameters by directed evolution (38). Other applications of cellulosic hydrolases include improving dough quality in the baking industry, increasing the feed conversion efficiency of animal feed, clarifying juices, and producing xylose, xylobiose, and xylo-oligomers. In addition, cellulosic hydrolases are important for converting biomass into biofuels and other valuable chemicals. More broadly, directed evolution has been successfully applied to improve many enzymes involved in carbohydrate biosynthesis, modification, and degradation. Examples include ADP-glucose pyrophosphorylase (254), amylosucrase (310), aldolase (86, 326), sugar kinase (120), cellulase (153), amylases (19, 20, 154, 312), xylanases (49, 129, 203), glucose dehydrogenase (14), and beta-glucosidase (13).

Enzymes for bioremediation. Enzymes that cleave carbon-halogen bonds are being studied not only because of the important chemical reactions they catalyze but also for their potential use in environmental applications. Haloalkane dehalogenase, which has broad substrate specificity, converts alkyl halides to the corresponding alcohols. This enzyme has been subjected to directed evolution for improved detoxification of halogenated compounds (30, 38, 95, 96, 240, 348). Organophosphate-degrading enzymes have been evolved and selected for broadened substrate specificity (53, 335), and broadened substrate specificity of a biphenyl dioxygenase has also been achieved (33, 87, 164, 291). Efforts to clean up contaminated groundwater prompted the evolution of an enzyme for chlorinated ethene degradation (41).

Lipases and esterases. Lipases, another class of hydrolases, have broad industrial applications. Lipases catalyze the hydrolysis and synthesis of long-chain acylglycerols from triglycerides. For biofuel production, a single transesterification reaction using lipases in organic solvents can convert vegetable oil to methyl or other short-chain alcohol esters. Biodegradable biopolymers such as polyphenols, polysaccharides, and polyesters show a considerable degree of diversity and complexity. Lipases and esterases are used as catalysts for polymer synthesis, offering stereoselectivity, regioselectivity, and chemoselectivity under mild reaction conditions. Lipases are also used in the synthesis of fine chemicals, agrochemicals, and pharmaceuticals.

Directed evolution of industrially important lipases has been extensively reviewed (131-134, 247-249). The enantioselectivity of lipases is of particular biochemical interest, because lipases engineered for high enantioselectivity allow the production of desired enantiopure compounds. A Pseudomonas aeruginosa lipase has been evolved for increased enantioselectivity towards the chiral substrate 2-methyldecanoic acid p-nitrophenyl ester. A few rounds of directed evolution produced a greater than 25-fold improvement in enantioselectivity. Interestingly, the best variants contain five amino acid substitutions, most of which are located in flexible loop regions (183, 249). Using the ADO approach, the enantioselectivity of two B. subtilis lipases was increased by screening only a small number of variants (343). The substrate specificity and stability of lipases can also be modified by directed evolution (147, 282). The lipase BTL2 from Bacillus thermocatenulatus exhibits low phospholipase activity. A single round of random mutagenesis followed by screening of 6,000 variants generated progeny with more than a 10-fold increase in phospholipase activity (147). Most of these variants show reduced activity towards medium- and long-chain fatty acyl methyl esters compared with the wild-type enzyme. Combining this approach with structure-guided site-directed mutagenesis further improved phospholipase activity. The best variant, which exhibits a 17-fold improvement in phospholipase selectivity, has 1.5- to 4-fold-higher activity towards long-chain fatty acyl substrates. Pursuing the opposite goal, the phospholipase A of Serratia has been converted into a lipase by a combination of DNA shuffling and N-terminal truncations (281).
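Enantioselectivity in kinetic resolutions is commonly reported as the enantiomeric ratio E, which can be estimated from the fractional conversion c and the product's enantiomeric excess ee_p with the equation of Chen and coworkers for irreversible reactions, E = ln[1 - c(1 + ee_p)] / ln[1 - c(1 - ee_p)]. The function below is a minimal sketch of that calculation; the text does not say how the cited studies computed their values, and the example measurements are invented.

```python
import math


def enantiomeric_ratio(conversion, ee_product):
    """E = ln[1 - c(1 + ee_p)] / ln[1 - c(1 - ee_p)] for an irreversible kinetic resolution.

    conversion and ee_product are fractions (0.3 means 30%), not percentages.
    """
    numerator_arg = 1 - conversion * (1 + ee_product)
    denominator_arg = 1 - conversion * (1 - ee_product)
    if not (0 < conversion < 1) or not (0 <= ee_product < 1) or numerator_arg <= 0:
        raise ValueError("conversion/ee outside the range where the equation applies")
    return math.log(numerator_arg) / math.log(denominator_arg)


# Hypothetical measurements for a parent and an evolved lipase at 10% conversion
e_parent = enantiomeric_ratio(0.10, 0.10)
e_variant = enantiomeric_ratio(0.10, 0.90)
print(f"E: {e_parent:.1f} -> {e_variant:.1f} ({e_variant / e_parent:.0f}-fold)")
```

E = 1 means no discrimination between enantiomers; because E is independent of conversion in this model, it is a fairer basis for comparing variants than ee alone.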

Through sequential rounds of random mutagenesis and screening, Moore and Arnold (212) evolved an esterase for deprotection of an antibiotic p-nitrobenzyl ester in aqueous organic solvents. One variant performed as well in 30% dimethylformamide as the wild-type enzyme does in water, a 16-fold improvement in esterase activity. As in many other directed evolution experiments, the success of this work relied on a high-throughput screening assay, in this case using the p-nitrophenyl ester. In recent years, considerable effort has been devoted to designing screening tools for the improvement of lipases and esterases (91, 97). Droge et al. (77) reported the binding of a phosphonate suicide inhibitor to lipase A displayed on phage. This specific interaction with the suicide inhibitor provides a fast and reproducible method for selecting lipases with novel substrate specificities. Two new biotinylated triglyceride-analogue suicide inhibitors have been designed, synthesized, and applied in the directed evolution of phage-displayed lipolytic enzymes (70, 71).
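Screens based on p-nitrophenyl esters typically follow the release of yellow p-nitrophenol spectrophotometrically, and the Beer-Lambert law (A = εcl) converts an absorbance slope into a reaction rate. In this sketch the extinction coefficient of 18,000 M⁻¹ cm⁻¹ near 405 nm is an assumed value (it depends strongly on pH, because mainly the phenolate form absorbs there), and the path length and slopes are hypothetical.

```python
def rate_uM_per_min(absorbance_slope_per_min, epsilon_per_M_cm=18_000.0, path_cm=1.0):
    """Beer-Lambert: A = epsilon * c * l, so dc/dt = (dA/dt) / (epsilon * l).

    Returns micromolar product formed per minute; epsilon is an assumed value.
    """
    molar_per_min = absorbance_slope_per_min / (epsilon_per_M_cm * path_cm)
    return molar_per_min * 1e6


# Hypothetical plate-reader slopes; microplate path lengths are often under 1 cm
wild_type = rate_uM_per_min(0.005, path_cm=0.5)
variant = rate_uM_per_min(0.045, path_cm=0.5)
print(f"wild type {wild_type:.2f} uM/min, variant {variant:.2f} uM/min, {variant / wild_type:.0f}-fold")
```

For ranking variants within one plate, the raw slopes alone suffice, since ε and path length cancel in the ratio; the conversion matters when comparing against absolute activities.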

Cytochrome P450 enzymes. The cytochrome P450 superfamily is a highly diversified set of heme-containing proteins, and members serve a wide spectrum of functions. In addition to the most common function of catalyzing hydroxylation, P450 proteins perform a variety of reactions, including N oxidation; sulfoxidation; epoxidation; N, S, and O dealkylation; peroxidation; deamination; desulfuration; and dehalogenation. In mammals they are critical for drug metabolism, blood hemostasis, cholesterol biosynthesis, and steroidogenesis. In plants they are involved in plant hormone synthesis, phytoalexin synthesis, flower petal pigment biosynthesis, and most likely hundreds of additional, unknown functions. In fungi they make ergosterol and are involved in pathogenesis by detoxification of host plant defenses. Bacterial P450s are key players in antibiotic synthesis. More recently, cytochrome P450 enzymes have shown promise in industrial applications as new methods for high-level production and high-throughput assays have been developed (4, 18, 306).

A number of cytochrome P450 enzymes have been targets of directed evolution (50, 54, 83, 250, 255, 306, 307, 331, 332). Cytochrome P450 enzymes are often poorly active and have narrow substrate specificity. Wild-type P450 BM-3, which is specific for long-chain fatty acids, was a target for rational design and directed evolution (181). Based on the crystal structure, eight amino acids were selected, and libraries were created by site-specific randomization of each residue. The libraries were screened by a spectroscopic assay using ω-p-nitrophenoxycarboxylic acids as substrates. Through sequential evolution, variants with specificity towards medium-chain substrates were identified. In a subsequent study (182), one of these variants was found to hydroxylate indole efficiently, resulting in the formation of indigo and indirubin. Further characterization of this mutant revealed that it can hydroxylate several alkanes and alicyclic, aromatic, and heterocyclic compounds, none of which are natural substrates of the wild-type enzyme (6). Many cytochrome P450 monooxygenases are multimeric and membrane associated, with low catalytic efficiencies. Glieder et al. (92) evolved Bacillus megaterium cytochrome P450 BM-3, which is specific for C12 to C18 fatty acids, to efficiently catalyze the conversion of C3 to C8 alkanes to alcohols. The evolved enzyme accepts a broad range of substrates, including the gaseous alkane propane, and also shows improved activity towards the natural fatty acid substrates. BM-3 has also been engineered to be significantly more tolerant of several cosolvents, including the organic cosolvents dimethyl sulfoxide and tetrahydrofuran (332). Furthermore, the regioselectivity and enantioselectivity of BM-3 have been engineered through in vitro evolution, and the selectivity appears to be retained in vivo in E. coli cells (238).
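The number of clones to screen in a single-site randomization library can be estimated from the probability that a given codon is sampled at least once, P = 1 - (1 - 1/V)^N, for V equally likely codons and N clones. The text does not state the codon scheme used for the eight BM-3 positions; this sketch assumes NNK degeneracy (32 codons covering all 20 amino acids) and a 95% coverage target.

```python
import math


def clones_needed(codon_variants, coverage=0.95):
    """Smallest N with 1 - (1 - 1/V)**N >= coverage, i.e. N = ln(1 - P) / ln(1 - 1/V)."""
    return math.ceil(math.log(1 - coverage) / math.log(1 - 1 / codon_variants))


per_site = clones_needed(32)  # NNK degeneracy (assumption)
print(f"{per_site} clones per NNK library; {8 * per_site} for eight separate single-site libraries")
# -> 95 clones per NNK library; 760 for eight separate single-site libraries
```

The estimate assumes every codon is equally represented in the library; in practice, oversampling beyond this figure is common to absorb library bias.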

Successful evolution of cytochrome P450 enzymes requires efficient high-throughput screens that are sensitive to the activities of interest. Horseradish peroxidase couples the phenolic products of aromatic hydroxylation to generate colored or fluorescent compounds that are easily detected in high-throughput formats. Joo et al. (139) took advantage of this system by coexpressing the coupling enzymes with functional mono- and dioxygenases. Using fluorescent digital imaging, they screened libraries of cytochrome P450cam from Pseudomonas putida for a novel chlorobenzene hydroxylation activity. Joo et al. (140) also used the so-called "peroxide shunt" pathway to identify variants with significantly improved naphthalene hydroxylation activity in the absence of the NADPH cofactor. Interestingly, a P450 enzyme has recently been used as a model for computational structure-guided evolution (227).

Directed Evolution of Metabolic Pathways

The evolution of whole metabolic pathways is a particularly attractive concept, because most natural and novel compounds are produced by pathways rather than by single enzymes. Genetically up-regulating one enzyme activity in a pathway does not always increase the amount of final product. Metabolic pathway engineering therefore usually requires coordinated manipulation of all enzymes in the pathway. The potential for evolving a pathway in the laboratory has long been recognized. For instance, using the ebg operon of E. coli as a model, it has been demonstrated that a pathway can be redirected and that such evolution requires a series of mutations in several structural and regulatory genes (103, 109, 111). However, rather than being organized in operons, the genes of a pathway are often scattered across the genome, which makes such coordinated engineering difficult. Several strategies can be applied to the directed evolution of metabolic pathways, as follows.
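The point that up-regulating one enzyme need not raise pathway output can be illustrated with a deliberately crude bottleneck model, in which steady-state flux through a linear pathway is capped by the substrate supply and by the capacity of the slowest step. This is an assumption-laden sketch: real pathways distribute control across several steps, and the enzyme names and capacities below are hypothetical.

```python
def pathway_flux(supply, capacities):
    """Bottleneck approximation: flux = min(supply, capacity of each step)."""
    return min(supply, *capacities.values())


def overexpress(capacities, enzyme, fold):
    """Return a copy of the pathway with one enzyme's capacity multiplied by `fold`."""
    updated = dict(capacities)
    updated[enzyme] *= fold
    return updated


pathway = {"enzyme_1": 10.0, "enzyme_2": 2.0, "enzyme_3": 8.0}  # hypothetical capacities
print(pathway_flux(20.0, pathway))                              # limited by enzyme_2
print(pathway_flux(20.0, overexpress(pathway, "enzyme_1", 5)))  # unchanged
print(pathway_flux(20.0, overexpress(pathway, "enzyme_2", 3)))  # rises to 6.0
```

Even in this simplified picture, relieving one bottleneck (raising enzyme_2) soon shifts the limit to another step, which is why coordinated manipulation of several enzymes is usually needed.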

(i) Whole genomes are shuffled (see above) and selected for desired phenotypes or products (239). The successful improvement of polyketide production in Streptomyces and of lactic acid production in Lactobacillus (234, 347) has demonstrated that whole-genome shuffling is one of the most powerful tools for the directed evolution of pathways. It is particularly useful when a pathway is not well characterized and key enzymes or genes have not yet been identified or cloned. Phenotypic improvement by whole-genome shuffling is an important milestone for bioprocess optimization. Together with novel techniques for cultivating and identifying previously unrecognized microorganisms (342) and information on biodiversity in terms of species, distribution, and ecosystem function (reviewed by Bull et al. [37]), whole-genome shuffling will continue to expand its impact on the production of high-value biomolecules.
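Whole-genome shuffling relies on recursive recombination among a population of improved strains, so that beneficial mutations found in different strains can be combined. The toy model below is a hypothetical sketch: genomes are reduced to a handful of loci scored 1 (beneficial allele) or 0, the phenotype is simply additive, and recombination is modeled as random locus-by-locus inheritance from two parents, loosely standing in for protoplast fusion.

```python
import random


def recombine(population, n_progeny, rng):
    """Each progeny genome inherits every locus from one of two randomly paired parents."""
    progeny = []
    for _ in range(n_progeny):
        a, b = rng.sample(population, 2)
        progeny.append(tuple(rng.choice(alleles) for alleles in zip(a, b)))
    return progeny


def genome_shuffling(parents, rounds=4, n_progeny=500, keep=10, seed=0):
    """Recursive shuffling: recombine, select the best strains, and repeat."""
    rng = random.Random(seed)
    population = list(parents)
    for _ in range(rounds):
        pool = population + recombine(population, n_progeny, rng)
        # Hypothetical additive phenotype: number of beneficial alleles carried.
        population = sorted(pool, key=sum, reverse=True)[:keep]
    return population[0]


# Four hypothetical parent strains, each carrying two different beneficial mutations
parents = [tuple(1 if i // 2 == k else 0 for i in range(8)) for k in range(4)]
best = genome_shuffling(parents)
print(best, "beneficial alleles:", sum(best))
```

No parent carries more than two beneficial alleles, yet recombination plus selection assembles combinations that no single strain had, without requiring knowledge of which genes are involved.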

(ii) The genes encoding key enzymes are heterologously expressed to alter an existing pathway. Introduction of an enzyme with novel specificity can redirect the metabolic flux in a host and result in the production of new products (261, 321). These recombinant enzymes can be obtained from other organisms known to produce the compounds (299) or created by directed evolution from an enzyme that normally catalyzes other reactions (144, 315). For instance, under anaerobic conditions yeast does not efficiently produce ethanol from xylose. By heterologous expression of a xylose isomerase from the fungus Piromyces and selection of yeast transformants on xylose, Kuyper et al. (166) isolated a mutant strain that exhibits a sixfold increase in anaerobic growth rate on xylose and higher ethanol yields. Pathway engineering often requires alteration of the substrate pools for the key steps. Thus, directly targeting enzymes responsible for the production of these substrates can enhance or even redirect biosynthetic pathways (177). To engineer a multienzyme pathway for novel carotenoid production in E. coli, Schmidt-Dannert and colleagues first introduced two genes to produce the precursor phytoene. Subsequently, a library of two shuffled desaturase genes from Erwinia was introduced for the desaturation of phytoene. Divergent lycopene-like compounds with different degrees and positions of desaturation were identified. The pathway of a chosen mutant was further modified by introducing a library of shuffled cyclase genes. The engineering of the carotenoid pathway represents a fine example of how directed evolution can be used to redesign a complex pathway (68, 147, 167, 175, 176, 178, 205, 206, 257, 262, 263, 305, 320, 324).

(iii) In nature, many pathway genes are organized in gene clusters or operons (171, 172). Well-known examples include the pathways for polyketide biosynthesis (125) and for the biosynthesis of certain secondary metabolites (190). Early work using the ebg operon presented convincing arguments for the directed evolution of operons as an effective approach to pathway engineering (103, 105, 108, 109, 111). Directed evolution of naturally occurring operons and, in some cases, artificially assembled operons offers a unique, coordinated approach to engineering novel functions. Another demonstration of this approach is the manipulation of an arsenate detoxification pathway by DNA shuffling (63). A plasmid containing an operon of four ars genes was shuffled and selected for increased resistance to arsenic. While the native operon does not confer arsenic resistance on E. coli, several rounds of selection yielded cells that grew in media in which the arsenate concentration reached its solubility limit. In another example, the trehalose-6-phosphate synthase/phosphatase operon was evolved to achieve greater trehalose production in E. coli (159, 160). In E. coli, trehalose-6-phosphate synthase and trehalose-6-phosphate phosphatase are encoded by the otsBA operon. Directed evolution of the otsBA operon and screening for trehalose synthesis yielded 15 positive clones and a 12-fold improvement in trehalose production relative to the wild-type strain. The same strategy can be applied to artificial operons such as the one constructed for the production of the biopolymer poly(3-hydroxybutyrate-co-3-hydroxyhexanoate) (231). In yet another example, a metabolically engineered E. coli strain for astaxanthin production was generated by overexpressing metabolic enzymes from three different origins: E. coli isopentenyl diphosphate isomerase, Archaeoglobus fulgidus geranylgeranyl diphosphate synthase (GPS), and the Agrobacterium aurantiacum astaxanthin biosynthesis enzymes (crtWZYIB gene products) (322). In a subsequent effort, repeated cycles of error-prone PCR, which uses a low-fidelity replication step to introduce random point mutations in each round of amplification, were used to evolve one of these key enzymes, GPS (321). A 100% improvement in lycopene production was detected by screening 3,500 colonies for deeper orange color. It is tempting to speculate that applying directed evolution to the synthetic operon containing isopentenyl diphosphate isomerase, GPS, and crtWZYIB might yield more astaxanthin than single-gene evolution has.
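The error-prone PCR step can be sketched as repeated copying of a gene with a per-base error probability, so that the expected mutation load scales roughly with gene length, error rate, and number of duplications. This is a simplified, hypothetical model: it follows a single lineage, treats every cycle as a full duplication, allows only substitutions, and uses an assumed error rate and gene length rather than values from the GPS study.

```python
import random

BASES = "ACGT"


def error_prone_copy(template, error_rate, cycles, rng):
    """Copy one lineage `cycles` times; each copied base is replaced by a
    different random base with probability `error_rate` (substitutions only)."""
    seq = list(template)
    for _ in range(cycles):
        for i, base in enumerate(seq):
            if rng.random() < error_rate:
                seq[i] = rng.choice([b for b in BASES if b != base])
    return "".join(seq)


def expected_substitutions(length, error_rate, cycles):
    """First-order estimate of mutations per gene (ignores back-mutation)."""
    return length * error_rate * cycles


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)
gene = "".join(rng.choice(BASES) for _ in range(900))  # hypothetical 900-bp gene
mutants = [error_prone_copy(gene, 2e-4, 20, rng) for _ in range(100)]
mean_load = sum(hamming(gene, m) for m in mutants) / len(mutants)
print(f"expected ~{expected_substitutions(900, 2e-4, 20):.1f}, observed mean {mean_load:.2f}")
```

Tuning the error rate sets the average number of mutations per gene; a load of a few substitutions is typical of the low-level mutagenesis used before screening colonies, as in the color screen described above.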

(iv) The characteristics of a metabolic pathway result from the dynamic interaction between its structural genes and the gene regulatory apparatus. Therefore, directed pathway evolution can be achieved by engineering the regulatory factors that control these pathways (61). Recent progress in engineering artificial transcription factors has shown that this approach is not only feasible but also advantageous in certain areas of metabolic engineering. The most notable advances have been in the generation of artificial zinc finger transcription factors (17, 25, 75, 76, 127, 128, 135, 146, 174, 186, 187, 215, 266-271, 300). Chimeric proteins containing novel DNA-binding domains (such as polydactyl zinc fingers) have shown promise in high-throughput ligand-binding screens, genome-wide gene activation/repression, targeted DNA cleavage, DNA/chromatin modification, and site-specific integration (135). This strategy is particularly powerful for pathways that are undefined or normally inactive without induction. Engineered transcription factors can also be used to target known gene regulatory regions. For example, they can be evolved to bind specific promoter sequences near the binding sites of known natural transcription factors (94). Transcription factors and their target genes constitute the basic unit of the complex transcriptional regulatory network, and network-wide engineering must deal with higher levels of complexity. The ability to evolve the transcriptional network, however, opens new possibilities in pathway engineering. Yokobayashi et al. proposed the construction of an artificial transcriptional control network and showed how such a genetic circuit can be optimized by combining rational design and directed evolution (338, 339). Metabolic pathways often respond to cell-cell communication. An elegantly designed "population control" system was constructed based on a quorum-sensing system, allowing a synthetic bacterial ecosystem to be controlled by cell-cell communication (340). Directed evolution of the major component of this system, the LuxR-type transcriptional regulators, revealed the evolutionary plasticity of the quorum-sensing mechanism (60). Another challenge in pathway engineering is controlling the timing of gene expression. Inducible gene regulation systems such as the tetracycline/Tet repressor system can be used to switch pathways on and off, and evolving these systems to recognize novel inducers has tremendous practical implications for pathway engineering (264, 280).

Directed Evolution of Pharmaceuticals

Protein pharmaceuticals. Directed evolution has revolutionized the development of novel therapeutic proteins (5, 93, 118, 145, 157, 165, 173, 235, 253). DNA family shuffling of more than 20 human alpha interferon genes, followed by selection for antiviral and antiproliferative activity in murine cells, yielded a greater than 250,000-fold improvement (44). Interestingly, no random mutations were found in the most improved proteins; i.e., the novel chimeras were created entirely from the genetic diversity within the parental gene family, a result with intriguing implications for gene evolution. Homologous recombination approaches have also been successfully applied to improving the human tumor suppressor p53 (201, 334). Human prolyl endopeptidase is important in activating the melphalan prodrug, but the wild-type enzyme is thermolabile; robot-assisted directed evolution has significantly improved its thermostability (117). By combining receptor structure-based engineering and directed evolution, an amphioxus insulin-like peptide was converted to mammalian insulin (99). Another promising area for exploring functional diversity is the evolution of hormones and hormone receptors (55, 69, 293). Directed evolution has also increased the peroxidase activity of horse heart myoglobin (319). Therapeutic proteases and protease inhibitors are active targets for directed evolution as well (191, 288-290). The macromolecular protease inhibitor ecotin is of therapeutic value; by combining directed evolution with stepwise engineering, Stoop and Craik (288) generated ecotin libraries containing variants with significantly enhanced selectivity toward plasma kallikrein.
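Why shuffled interferon chimeras can be greatly improved without any random mutation follows from how family shuffling works: fragments of related parental genes reassemble by annealing in regions of sequence identity, so every position of a chimera is inherited from some parent. The Python sketch below illustrates this under simplifying assumptions: the parents are pre-aligned and of equal length, template switching happens with a hypothetical per-position probability `switch_prob`, and a switch is allowed only where two parents share an identical stretch of `window` bases. It is not a model of the cited interferon experiment, and the parental sequences are invented.

```python
import random

def family_shuffle(parents: list[str], switch_prob: float, window: int,
                   rng: random.Random) -> str:
    """Assemble one chimera from pre-aligned, equal-length parental genes.

    At each position, with probability `switch_prob`, the growing chimera may
    switch to another parent, but only if that parent is identical to the
    current template over the next `window` bases (a crude stand-in for
    homology-dependent annealing during reassembly PCR)."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to equal length")
    current = rng.randrange(len(parents))
    chimera = []
    for i in range(length):
        if rng.random() < switch_prob:
            stretch = parents[current][i:i + window]
            candidates = [j for j, p in enumerate(parents)
                          if j != current and p[i:i + window] == stretch]
            if candidates:
                current = rng.choice(candidates)  # crossover in a region of identity
        chimera.append(parents[current][i])  # every base is copied from a parent
    return "".join(chimera)

# Invented parental sequences that share stretches of identity.
PARENTS = [
    "ATGGCTAGCAAAGGAGAAGAACTTTTCACT",
    "ATGGCAAGCAAAGGTGAAGAACTGTTCACC",
    "ATGGCTTCCAAAGGAGAGGAACTTTTTACT",
]

rng = random.Random(0)
library = [family_shuffle(PARENTS, switch_prob=0.3, window=4, rng=rng) for _ in range(5)]
for chimera in library:
    print(chimera)
```

Because the chimera only ever copies bases from a parent, it can combine parental differences in new ways but never carries a base that is absent from all parents at that position. Real shuffling can also introduce point mutations during reassembly; this sketch deliberately omits them.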

Antibodies. Therapeutic antibodies represent the fastest-growing area in pharmaceutical development. Because the combinatorial diversity of natural antibodies arises from somatic recombination, it is not surprising that directed evolution is a powerful and practical tool for creating high-affinity antibodies in vitro. Techniques such as surface display facilitate high-throughput selection for the desired activity (32, 62, 85, 124, 143, 295, 308). Recombination of phage-displayed, low-affinity immunoglobulin M antibodies yielded variants whose affinity increased by several orders of magnitude in just two rounds of evolution (85). The same strategy has yielded stable, disulfide bond-free antibody single-chain fragments (244). Because the requirement for disulfide bond formation has hindered antibody production in hosts such as E. coli, disulfide bond-free antibodies could not only simplify production but also provide insight into antibody folding. Additional research has aimed at engineering antibodies with extremely high affinities (15, 26, 66, 112, 137, 246). The gene for a llama heavy chain antibody fragment was evolved and selected for improved production (309). Antibody variants were identified that exhibited two- to fourfold increases in production while retaining their antigen specificity (341). Crystallographic analysis of one of the evolved antibodies revealed that the mutations conferring significant improvement in affinity do not directly contact the antigen, suggesting that such results would be difficult to obtain by rational design. Nonetheless, combining rational design and directed evolution should accelerate antibody engineering beyond what either approach can achieve alone.

Catalytic antibodies are also of interest for directed evolution (298, 301). Superior catalysts for aryl phosphate hydrolysis were generated from synthetic human antibody libraries (43). Antibodies have also been engineered for diagnostic purposes (161).

Vaccines. Directed evolution has played, and continues to play, an important role in the development of new vaccines (58, 188, 189, 197, 235, 245, 325). To boost immunity, directed evolution can be used to generate improved protein antigens or other immunomodulatory molecules, DNA vaccines, and whole viruses (see below). Conversely, certain cytokines and allergens can be bred to down-regulate allergic immune responses. Recursive library construction and selection allowed the isolation of high-affinity, protective mimotopes against Cryptococcus neoformans (16). Highly immunogenic mimotopes of the hepatitis C virus hypervariable regions have been selected by a combination of DNA shuffling and phage display-based screening (346). A DNA vaccine based on a rearranged E7 oncogene has been developed and shown to provide protection against tumor cells (223). This strategy of rearranging oncogene sequences has an advantage over DNA vaccines derived from wild-type oncogenes, which carry a risk of de novo tumor induction. Direct administration of recombinant antitumor interleukin-12 protein has been associated with toxic side effects. A DNA vaccine based on the interleukin-12 gene has been shown to reduce these adverse effects, and its potency and effectiveness have been further improved by directed evolution (179). In addition, high-affinity T-cell receptor variants can be generated and used to detect complexes of peptide and major histocompatibility complex molecules on antigen-presenting cells (121).

Viruses. Breeding of viruses has tremendous practical implications for gene therapy and vaccine development (283). Its feasibility was demonstrated with murine leukemia viruses (MLV): family shuffling of six MLV strains produced variants with novel tropism (283). The MLV envelope protein consists of two subunits, SU and TM, associated by a labile disulfide bond. This complex, which interacts with a cellular receptor and mediates fusion with the plasma membrane, is highly sensitive to the physical forces encountered during manufacturing. As a result, the concentration procedure commonly used for retrovirus vectors is ineffective for producing high-titer stocks. To improve the resistance of the MLV envelope protein to concentration by ultracentrifugation, the envelope regions of six ecotropic strains were shuffled (243). Screening for survival after three consecutive concentration steps yielded variants with 30- to 100-fold-improved stability compared to the parental viruses. In an effort to establish a pig-tailed macaque model for human immunodeficiency virus (HIV) infection, Pekrun et al. evolved an HIV type 1 variant with a substantially enhanced replication rate (237). In an interesting attempt to control the risks associated with the pathogenic phenotypes of highly replicating viral vaccines, a tetracycline-inducible system was introduced to control HIV replication (199). Directed evolution yielded highly infectious viral variants whose replication nevertheless remains strictly controlled by a doxycycline-dependent switch. An alternative strategy for controlling viral replication using the bacteriophage T7 polymerase has also been developed (31).

Therapeutic chemicals. The role of biocatalysis in pharmaceutical production has expanded rapidly since the establishment of recombinant DNA technology (45, 123). Enzyme and metabolic pathway engineering for therapeutic chemical production is moving into the industry mainstream, and directed evolution technologies are leading the advance. Applications of directed evolution to the development of anti-infective agents were among the early examples demonstrating the power and effectiveness of these technologies. Evolution of polyketide synthases to generate novel antibiotic activities demonstrated that novel compounds can be identified even in small libraries (123). The modular nature of the polyketide biosynthetic pathway provides an efficient way to create large numbers of polyketide variants by replacing individual modules with a shuffled library (151). Directed evolution of a toluene-xylene monooxygenase resulted in variants that catalyze the synthesis of various valuable fine chemicals, such as catechol (311). The substrate specificity of cephalosporin acylase has been altered to improve cephalosporin and penicillin production (229, 278). Directed evolution has allowed the identification of "hot spots," in this case a single amino acid residue crucial for substrate specificity; when this hot spot was subjected to saturation mutagenesis, variants with further improvement or novel specificity were identified (228). Protein engineering by site-directed and/or saturation mutagenesis, guided by information generated from directed evolution, can be an extremely powerful approach to creating novel functionalities (73, 88, 208, 316).
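Saturation mutagenesis of a single hot spot can be reasoned about with simple arithmetic. The cited work's codon scheme is not stated here, so the Python sketch below assumes the commonly used degenerate NNK codon (N = A/C/G/T, K = G/T). It uses the standard genetic code to check that the 32 NNK codons encode all 20 amino acids plus one stop codon (TAG), and estimates how many clones must be screened so that any given codon is sampled at least once with probability F. The estimate assumes every codon is equally represented, so a given codon appears in a random clone with probability 1/V, where V = 32 is the number of codons, and solves 1 - (1 - 1/V)^n >= F for the smallest n.

```python
import itertools
import math

# Standard genetic code; codons enumerated in TCAG order (first, second, third base).
BASES_TCAG = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES_TCAG)
    for j, b in enumerate(BASES_TCAG)
    for k, c in enumerate(BASES_TCAG)
}

# IUPAC degenerate symbols used by the assumed NNK scheme.
IUPAC = {"N": "ACGT", "K": "GT"}

def expand_degenerate(codon: str) -> list[str]:
    """Expand a degenerate codon such as 'NNK' into all concrete codons."""
    choices = [IUPAC.get(base, base) for base in codon]
    return ["".join(c) for c in itertools.product(*choices)]

def clones_for_coverage(library_size: int, fraction: float) -> int:
    """Smallest number of randomly picked clones so that a given codon
    (all codons assumed equally frequent) is seen at least once with
    probability >= fraction."""
    return math.ceil(math.log(1 - fraction) / math.log(1 - 1 / library_size))

nnk = expand_degenerate("NNK")
encoded = {CODON_TABLE[c] for c in nnk}
print(len(nnk), "codons;", len(encoded - {"*"}), "amino acids;",
      sum(CODON_TABLE[c] == "*" for c in nnk), "stop codon(s)")
print("clones for 95% coverage:", clones_for_coverage(len(nnk), 0.95))
```

Under these assumptions, about 95 clones give a 95% chance of sampling any particular NNK codon at the hot spot. Amino-acid-level coverage differs because some amino acids are encoded by up to three NNK codons and others by only one.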

Directed Evolution of Agriculturally Important Traits

Agricultural biotechnology offers tremendous promise. Crop yields could be improved through resistance to pests (including weeds and insects) and diseases, as well as through tolerance to environmental stresses such as cold and drought. Other areas that may affect eventual yield include postharvest characteristics such as ripening control and prevention of sweetening in potatoes.

In the 20 years since it became possible to introduce transgenes into plants, many novel strategies have been devised to improve crop quality. Many strategies for pest control, cold tolerance, disease control, and other traits have given positive initial results in the laboratory; however, the genes have not been efficacious enough to produce commercially viable genetically modified (GM) products. In retrospect this makes sense, since many of the transgenes used in these experiments clearly had not been optimized for use in GM crop plants.

Directed evolution can be used to improve existing traits such as glyphosate resistance and Bacillus thuringiensis toxin expression in commercial crops. It can also be used to develop traits from programs in which initial leads (genes) provided insufficient efficacy. Furthermore, directed evolution can be applied to develop desirable gene functions from gene targets that have low or no activity, resulting in novel traits that would otherwise not have been possible (169).

Existing traits. (i) Glyphosate tolerance. Existing glyphosate resistance traits in corn, cotton, and soybean, based on expression of a microbial 5-enolpyruvylshikimate-3-phosphate synthase that is not affected by the herbicide, are effective. However, there is clearly room for improvement. He et al. (116) bred the E. coli and Salmonella enterica serovar Typhimurium 5-enolpyruvylshikimate-3-phosphate synthases (the enzyme which, when carrying a specific mutation, confers tolerance to the herbicide) to develop variants with superior properties. A single round of directed evolution yielded several variants that were simultaneously improved over the best parent in multiple kinetic parameters, including a twofold-improved specific activity, a fivefold-improved Km for phosphoenolpyruvate, and a fivefold decrease in sensitivity to glyphosate. Interestingly, the mutations identified in that study do not coincide with those identified previously by other researchers seeking to improve this enzyme. These results demonstrate that directed evolution can provide novel solutions to improving protein function even for proteins that have undergone extensive improvement through random mutage