### **2.2 Biological distribution and functions**

To date, at least four hundred proteins with a CSαβ motif have been discovered and deposited in databases. Proteins with the motif are widely distributed among plants, insects, arachnids and mollusks (Sun et al., 2002; Zhu et al., 2005). They exhibit a wide spectrum of biological activities, including antimicrobial activity, enzyme inhibition, inhibition of protein translation and sweet taste (Assadi-Porter et al., 2000; Chen et al., 2005; Clauss & Mitchell-Olds, 2004; Spelbrink et al., 2004; Stec, 2006; Wong & Ng, 2005; Zhu et al., 2002). Proteins with the CSαβ motif usually serve a common function as defenders of their hosts (Lobo et al., 2007; Song et al., 2005; Zasloff, 2002).

Before designing a unique function into a protein scaffold, it is important to understand the relationship between each part of the scaffold and its biological function. Based on its three-dimensional structure, a protein scaffold containing a CSαβ motif can be divided into three parts: one α-helix, one β-sheet and three loops. It is well known that the three parts bear different biological functions (Liu et al., 2006; Thevissen et al., 1996; Wong & Ng, 2005; Zhao et al., 2002).


The α-helix is related to antimicrobial activity. As described previously, the helix forms a positively charged cluster. When the positively charged residues are replaced with neutral or negatively charged amino acids, the antimicrobial activity of the protein also changes. Because of this net positive charge, proteins with the motif are believed to interact with negatively charged cell membranes (Thevissen et al., 1996; Thomma et al., 2003). Several studies have demonstrated that plant defensins are able to pass through artificial membranes and cause ions to leak from one side of the membrane to the other. The mechanism by which plant defensins pass through membranes has not yet been revealed.
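The effect of such substitutions on charge can be illustrated with a rough calculation. The following is a minimal sketch, assuming that the net charge at neutral pH is approximated by counting lysine and arginine as +1 and aspartate and glutamate as -1, and ignoring histidine and the termini. The function name `net_charge` and the peptide segment are hypothetical and are not taken from any protein discussed here.

```python
# Approximate net charge at neutral pH (assumption: K/R = +1, D/E = -1;
# histidine and the N-/C-termini are ignored).
POSITIVE = set("KR")
NEGATIVE = set("DE")


def net_charge(sequence: str) -> int:
    """Return the approximate net charge of a peptide sequence."""
    seq = sequence.upper()
    positives = sum(residue in POSITIVE for residue in seq)
    negatives = sum(residue in NEGATIVE for residue in seq)
    return positives - negatives


# Hypothetical helix segment, for illustration only.
wild_type = "KRCKELGRK"
# Replace every lysine with a neutral residue (alanine).
neutral_mutant = wild_type.replace("K", "A")
# Replace every lysine with a negatively charged residue (glutamate).
acidic_mutant = wild_type.replace("K", "E")

for name, seq in [("wild type", wild_type),
                  ("K->A", neutral_mutant),
                  ("K->E", acidic_mutant)]:
    print(f"{name}: {seq} net charge {net_charge(seq):+d}")
```

In this toy example the charge falls from +4 to +1 with neutral substitutions and to -2 with acidic substitutions, which is the kind of change expected to weaken interaction with a negatively charged membrane.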

The role of the β-sheet has been less studied and discussed. Mutagenesis studies showed that when the hydrophobic residues in the β-sheet are substituted with alanine, the biological function of the mutated proteins is dramatically reduced (Walters et al., 2009; Yang et al., 2009). The maximal effects of the mutated proteins are only 30-40% of those of the wild type, even at a high protein concentration. A protein-protein docking model showed that the β-sheet could form an interaction interface with its counterpart. These data imply that the distribution of the hydrophobic residues in the β-sheet plays a role in protein-protein interactions and that the β-sheet could contribute to the specificity of the interaction between these proteins and their targets.
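The design of such an alanine substitution series can be sketched in a few lines. This is only an illustration, assuming that "hydrophobic" means V, I, L, M, F, W or Y and that each variant carries a single substitution. The function name `alanine_scan` and the strand sequence are hypothetical and do not reproduce the constructs of the cited studies.

```python
# Assumed set of hydrophobic residues; other definitions are possible.
HYDROPHOBIC = set("VILMFWY")


def alanine_scan(sequence: str) -> list[tuple[str, str]]:
    """Return (label, mutant sequence) for each single hydrophobic-to-alanine substitution."""
    variants = []
    for index, residue in enumerate(sequence):
        if residue in HYDROPHOBIC:
            # Label in the usual form: original residue, 1-based position, new residue.
            label = f"{residue}{index + 1}A"
            mutant = sequence[:index] + "A" + sequence[index + 1:]
            variants.append((label, mutant))
    return variants


# Hypothetical β-strand segment, for illustration only.
strand = "CVYGKLFC"
for label, mutant in alanine_scan(strand):
    print(label, mutant)
```

Each variant can then be compared with the wild type in a functional assay, as in the mutagenesis studies summarized above.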

(a) The core structure of the CSαβ motif. (b) Mosquito defensin (UniProt ID: Q17027).

(c) Wheat defensin (UniProt ID: P20158). (d) Scorpion toxin (UniProt ID: P13487).

Fig. 2. The core structure of the CSαβ motif and three-dimensional structures from different species. The structures are presented as colored ribbons. Red: α-helix, green: β-sheet, and cyan: loop. Protein structures were retrieved from the Protein Data Bank and visualized with PyMOL 0.99rc6.

The lengths of the loop regions differ from protein to protein, and the loops connecting the CSαβ motif can serve as functional epitopes (Figure 3) (Lay & Anderson, 2005; Wijaya et al., 2000; Zhao et al., 2002). For example, loop 1 of the *Arabidopsis thaliana* trypsin inhibitor (ATTp) and loop 2 of cowpea thionin are the functional epitopes for trypsin inhibition (Wijaya et al., 2000; Zhao et al., 2002). Loop 3 of *Vigna radiata* defensin 1 (VrD1) is the functional loop for insect α-amylase inhibition (Lin et al., 2007; Liu et al., 2006).

Fig. 3. Structure-function relationship of peptides with a CSαβ motif. Red: α-helix, green: β-strands, and cyan: loops.

### **2.3 Structural ultra-stability**

A protein scaffold with ultra-high stability can maintain its three-dimensional structure in extreme environments, so its functions are preserved. Therefore, when a protein scaffold is used in biomedical applications, its structural stability must be considered. The scaffold should be able to pass through the low-pH environment of the stomach, resist protease digestion, endure chaotropic agents at high concentrations, and so on. These criteria ensure that the engineered proteins reach their target sites and perform their functions inside the body without being destroyed. It has been reported that proteins with a CSαβ motif are ultra-stable in several extreme environments, such as 6 M urea and temperatures above 95°C, and resist protease digestion without structural change (Malavasic et al., 1996; Yang et al., 2009).

In Table 1, the properties of the CSαβ motif and the single-domain antibody are listed and compared (Holt et al., 2003; Skerra, 2007; Yang et al., 2009; Yang & Lyu, 2008). Because of advantages such as high specificity and affinity, antibodies are still the most popular protein scaffold for engineering. Antibodies are widely used both in routine laboratory experiments and in clinical diagnosis. In spite of their significant clinical success, several disadvantages, including high manufacturing cost, large size, undesired effector functions and a complex intellectual property situation, obstruct their development and application (Jones et al., 2008). A single-domain antibody consists only of antibody variable regions and has no constant regions (Holt et al., 2003). Compared with single-domain antibodies, proteins with a CSαβ motif have smaller molecular weights and higher structural stability (Holt et al., 2003; Skerra, 2007; Yang et al., 2009). As previously described, proteins with a CSαβ motif share low sequence identity but high structural similarity (Lin et al., 2007). Their functions are highly varied and their structures are ultra-stable. It is therefore possible to use the CSαβ motif as an engineering scaffold for biomedical applications (Yang & Lyu, 2008).


| Item | Single-domain antibody | CSαβ motif protein |
|---|---|---|
| Molecular weight | 11-15 kDa | 5-7 kDa |
| Generation of expression library | B cell mRNA | Synthetic |
| Water solubility | Less | High |
| Ultra stability | Unfolded in 6 M Gdn-HCl; Tm 60-78°C | Ultra stable in 6 M Gdn-HCl; Tm > 96°C |
| Post-translational modification | Glycosylation | Disulfide bridge |
| High functional diversity | Yes | Yes |
| Tolerance to amino acid substitution | Loop regions | Structural and loop regions |
| Enzyme inhibition | Not certain | Direct inhibition |
| Membrane binding | Not certain | Direct binding |
| Legal problem | Very complex | Simple |

Table 1. Comparison of properties between the CSαβ motif and the single-domain antibody.

### **2.4 Amino acid usage of the CSαβ motif**

Understanding the amino acid usage of a protein scaffold can reveal relationships among structures, functions and sequence residues (Kristensen et al., 1997; Yang et al., 2009). One approach to understanding these relationships completely is extensive amino acid substitution combined with analysis of protein sequences (Corzo et al., 2007; Wang et al., 2006). Amino acid substitutions have been performed in plant defensins, brazzein of *Pentadiplandra brazzeana* and VrD1 of *Vigna radiata*, and some key residue positions have been discovered (Assadi-Porter et al., 2010; Yang et al., 2009). In both cases, amino acid substitution does not significantly change the structure at any position along the sequence, but replacement at some positions affects biochemical function (Assadi-Porter et al., 2010; Yang et al., 2009).

It has been noted that certain amino acids have a preference for a given secondary structure (Chan et al., 1995; Zhong & Johnson, 1992). Understanding amino acid usage preferences is helpful for protein engineering and can serve as a foundation for designing innovative peptides. The two major classes of CSαβ motif proteins are plant defensins and scorpion toxins (Zhu et al., 2005). Currently, at least 140 scorpion toxin sequences and 180 plant defensin sequences are deposited in the SwissProt database, and the numbers are continually increasing. The scorpion toxin and plant defensin peptide sequences deposited in the SwissProt database were retrieved, and their amino acid usage preferences were analyzed separately (Table 2 and Table 3).

In Table 2 and Table 3, the twenty amino acids are listed in the top row, and the secondary structures and residue positions are listed in the two leftmost columns. The use of each amino acid at each position is counted and the usage frequency is calculated (from 0.00 to 1.00). Amino acids with a high usage frequency (> 0.20) are listed in the rightmost column. The results differ in interesting ways between scorpion toxins and plant defensins. In scorpion toxins, peptides tend to employ similar amino acids at the same position; therefore, the sequences of scorpion toxins are less diverse and more uniform. In plant defensins, the peptide sequences are more diverse. The major targets of scorpion toxins are ion channels on the neuron cell surface, whereas the targets of plant defensins vary from peptide to peptide (Zhu et al., 2005). This might be the reason for the difference in amino acid usage between the two classes of CSαβ motif peptides. In the helix region, scorpion toxins prefer charged residues, whereas plant defensins accommodate more types of amino acids. Most interesting are the two residues flanking the strictly conserved glycine at β22. In scorpion toxins, tyrosines are preferred at β21 and β23, with frequencies of 0.44 and 0.75, respectively. In plant defensins, a glycine is preferred at β21, preceding the glycine at β22, with a frequency of 0.35. The role of the two tyrosines surrounding the glycine at β22 of scorpion toxins is worth clarifying. Although the frequently used amino acids are listed, they provide only a reference for protein design, and a combination of these amino acids does not necessarily form a universal sequence.
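The counting procedure described above can be sketched as follows. This is a minimal illustration, assuming that the sequences are already aligned to equal length so that each column corresponds to one residue position. The function names `position_frequencies` and `high_frequency` and the toy sequences are hypothetical; they are not the SwissProt data or the exact procedure used for Table 2 and Table 3.

```python
from collections import Counter


def position_frequencies(aligned: list[str]) -> list[dict[str, float]]:
    """Return, for each aligned position, a dict of amino acid -> usage frequency (0.00 to 1.00)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    total = len(aligned)
    # zip(*aligned) iterates over the alignment column by column.
    return [
        {aa: count / total for aa, count in Counter(column).items()}
        for column in zip(*aligned)
    ]


def high_frequency(freqs: list[dict[str, float]], threshold: float = 0.20) -> list[list[str]]:
    """List, per position, the amino acids whose usage frequency exceeds the threshold."""
    return [sorted(aa for aa, f in column.items() if f > threshold) for column in freqs]


# Hypothetical aligned three-residue fragments, for illustration only.
toy_alignment = ["YGY", "YGY", "FGY", "KGW", "YGY"]
freqs = position_frequencies(toy_alignment)
for position, (column, frequent) in enumerate(zip(freqs, high_frequency(freqs)), start=1):
    print(position, {aa: round(f, 2) for aa, f in column.items()}, frequent)
```

In this toy alignment the middle position is fully conserved (frequency 1.00), while the flanking positions each have one residue above the 0.20 threshold, analogous to the conserved glycine and its neighboring residues discussed above.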


Table 2. High-frequency amino acids in the structural regions of scorpion β toxin.


Table 3. High-frequency amino acids in the structural regions of plant defensin.
