**2.1 Computational design of artificial protein**

Structural design by computational simulation provides important information for determining the optimal protein. Tertiary structures are composed of many types of folds and motifs. Because of these shared elements, two proteins can have similar tertiary structures when their amino acid sequences have more than 30% homology (sequence identity), and structural similarity increases with increasing homology, according to the DBAli database. Moreover, homology-based design approaches such as consensus protein design [2], phylogeny-based design [3], and design combined with molecular dynamics simulation [4] have recently been developed. Therefore, the tertiary structure of an unknown protein can be predicted using a known homologous protein.
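The 30% criterion mentioned above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned with gaps written as `-`, and it measures homology as percent identity over the aligned, ungapped columns; the function names, the toy sequences, and the exact way the threshold is applied are illustrative assumptions, not a method taken from the cited works.

```python
# Minimal sketch: percent identity between two pre-aligned protein sequences.
# Assumption: gaps are written as "-" and gapped columns are ignored.

HOMOLOGY_THRESHOLD = 30.0  # percent, the rule of thumb quoted in the text


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def is_template_candidate(aligned_query: str, aligned_template: str,
                          threshold: float = HOMOLOGY_THRESHOLD) -> bool:
    """True when identity exceeds the threshold, i.e. the known protein
    may serve as a structural template for the unknown one."""
    return percent_identity(aligned_query, aligned_template) > threshold


# Hypothetical toy alignment (not real protein sequences).
query = "MKT-AYIAKQR"
template = "MKSLAYLAKER"
print(percent_identity(query, template))
print(is_template_candidate(query, template))
```

In practice the alignment itself would come from a dedicated alignment tool, and identity can be normalized in several ways (for example, by the shorter sequence length), so the value computed here is only one possible definition.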

In the twentieth century, precise prediction of the tertiary structure of proteins was difficult because computers were too slow. However, computing speed has improved drastically in the last 10 years. Using a recently developed supercomputer such as the K computer [5], complicated structural analyses of proteins can be completed in a considerably shorter time. In addition to the progress in computer technology, high-performance DNA sequencers have entered the next generation. The HiSeq 2000 DNA sequencer, whose flow cells carry a huge number of template DNAs, can determine gigabases of DNA sequence in a single run. Owing to this breakthrough in DNA sequencing technology, genome data for many organisms have been added to data libraries at high speed. Data on protein structures have also been enriched. The Protein Data Bank (PDB) is a data bank of experimental data, such as X-ray crystal structure analyses and NMR data, which are important for predicting the 3D structures of proteins [6]. The data in the PDB have increased rapidly in the past several years. Accordingly, computational analysis can now yield sufficiently accurate 3D structures of proteins and is one of the most prominent tools for designing artificial proteins.

**Figure 1.** *Scheme of the screening process for artificial protein or enzyme.*

**2.2 Site mutagenesis and random mutagenesis**

Mutations at the protein positions identified by computational structure simulation are generally introduced by site-directed mutagenesis or by saturation mutagenesis (site-directed mutagenesis at hotspots) [7]. For instance, the activity of α-amylase was enhanced 16.7-fold using site-directed mutagenesis [7], and the thermostability of transglutaminase was enhanced twelvefold at 60°C by saturation mutagenesis [8]. Truncation has also been used as a mutation strategy based on computationally designed results [9]. Enzymes often include domains that are unnecessary for function, and the characteristics of proteins are often improved by truncation. For example, enzymes truncated at the N-terminus and C-terminus, selected from a randomly truncated library, showed high thermostability [10] and activity [11]. The most important advantage of site-directed mutagenesis is that the improved protein can be selected rapidly, because the library size required for screening is much smaller than in random mutagenesis.

Random mutagenesis is also a powerful tool to obtain novel enzymes [9], and several procedures for random mutagenesis, such as error-prone PCR (ep-PCR), DNA shuffling, and the staggered extension process (StEP), have been proposed. In ep-PCR, random mutations are introduced by PCR using an error-prone polymerase under a high concentration of Mn²⁺ and/or nonuniform dNTP concentrations [12]. For example, the activity of the subtilisin E variant obtained by the ep-PCR method was 256 times higher in 60% DMF solution [13]. In DNA shuffling, many DNA variants containing mutations at different sites are digested with deoxyribonuclease, and PCR is performed using these fragments as template DNAs [14]. StEP is an improved method of DNA shuffling [15]. In this method, random mutations are introduced by repeating very short DNA extension steps in PCR with many DNA variants as template DNA. Random mutagenesis is a powerful tool to create unknown proteins.

**2.3 Novel proteins based on mutagenesis analysis**

Soils and seawater are abundant resources of microorganisms. According to a study of the 16S rRNA gene sequences of soil microorganisms [16], 99% of all microorganisms cannot grow under normal culture conditions, suggesting that prominent biocatalysts may be present among microorganisms which cannot be cultured in media.

A metagenome is the entirety of the genomes contained in a soil or water sample, and metagenomics, which deals with metagenomes, is another powerful tool to obtain proteins that we cannot even imagine [17, 18]. The procedure to obtain a metagenome is as follows: the DNAs of all microorganisms obtained from a target site are purified without culture and digested to appropriate lengths with a restriction enzyme, and a metagenomic library is constructed by combining the digested DNAs with suitable vectors. Metagenomic libraries of microorganisms living in special and harsh environments are especially useful because their enzymes can have extremely prominent activity, even though most of these microorganisms are hard to culture under normal conditions. Screening has recently been applied to environments such as hypersaline soda lake sediments [19] and hot environments [20, 21], and metagenome data are accumulating day by day.

**2.4 Display and screening to obtain important artificial proteins**

Construction of display libraries and screening are the most important steps in directed evolution technology [22]. In vitro and in vivo displays were proposed

*2.4.1 In vitro display*

*Introductory Chapter: Artificial Enzyme Produced by Directed Evolution Technology*

*DOI: http://dx.doi.org/10.5772/intechopen.85738*

*Current Topics in Biochemical Engineering*


