### **3.1.1 Amino acid hydrophobicity scales**

As stated above, protein hydrophobicity is determined by the hydrophobicity of the amino acids that compose it. Hence, it is necessary to quantify the hydrophobic contribution of each amino acid. For this purpose, different approaches have been proposed to assign a hydrophobicity value to each of the standard amino acids (Biswas et al., 2003; Kovacs et al., 2006). These methods are based on theoretical calculations, experimental determinations, or both. The resulting amino acid hydrophobicity scales differ both in the hydrophobicity value assigned to each amino acid and in the relative rank of each one. The scales have been classified into several categories by different authors (Lienqueo et al., 2002; Mahn et al., 2009), based on their underlying principles.

Despite the differences between the hydrophobicity values assigned to each residue by the different scales, a global tendency is clear. Isoleucine shows the highest hydrophobicity in most scales, followed by tryptophan. Glycine usually has an intermediate (neutral) hydrophobicity, and the lowest level is mostly assigned to aspartic acid (Lienqueo et al., 2007), making it the most hydrophilic amino acid. The suitability of a hydrophobicity scale depends on the intended use of the estimated protein or peptide hydrophobicity, as well as on the way this property is estimated. The scales proposed by Miyazawa & Jernigan (1996) and by Cowan & Whittaker (1990) are the most adequate to estimate protein hydrophobicity based on its three-dimensional structure (Lienqueo et al., 2007), with respect to its behavior in HIC. Additionally, Salgado et al. (2005) proposed that the scale developed by Wertz & Scheraga (1978) is the most adequate to estimate protein hydrophobicity based on the amino acid composition of the protein.


The Miyazawa & Jernigan (1996) scale is based on the three-dimensional structure of proteins and represents the contact energy between adjacent amino acids in folded proteins. The Wertz & Scheraga (1978) scale is also based on knowledge of the folded protein structure; it estimates amino acid hydrophobicity as the ratio between the number of buried residues and the number of residues exposed to the solvent, for each type of standard amino acid. Both scales were derived from sets comprising a significant number of proteins whose three-dimensional structures had been elucidated experimentally, and both have been classified as indirect scales (Mahn et al., 2009). The Cowan & Whittaker (1990) scale, in contrast, has been considered a direct scale: it assigns a hydrophobicity value to each standard amino acid based on the retention time of z-derivatives of each amino acid in HPLC. These scales are presented in Table 1.

Table 1. Amino acid hydrophobicity scales useful in HIC.

| Amino acid | Cowan & Whittaker (1990), original | Cowan & Whittaker (1990), normalized | Miyazawa & Jernigan (1996), original | Miyazawa & Jernigan (1996), normalized | Wertz & Scheraga (1978), original | Wertz & Scheraga (1978), normalized |
|---|---|---|---|---|---|---|
| ALA | 0.420 | 0.660 | 5.330 | 0.391 | 0.520 | 0.375 |
| ARG | -1.560 | 0.176 | 4.180 | 0.202 | 0.490 | 0.321 |
| ASN | -1.030 | 0.306 | 3.710 | 0.125 | 0.420 | 0.196 |
| ASP | -0.510 | 0.433 | 3.590 | 0.105 | 0.370 | 0.107 |
| CYS | 0.840 | 0.763 | 7.930 | 0.819 | 0.830 | 0.929 |
| GLN | -0.960 | 0.323 | 3.870 | 0.151 | 0.350 | 0.071 |
| GLU | -0.370 | 0.467 | 3.650 | 0.115 | 0.380 | 0.125 |
| GLY | 0.000 | 0.557 | 4.480 | 0.252 | 0.410 | 0.179 |
| HIS | -2.280 | 0.000 | 5.100 | 0.354 | 0.700 | 0.696 |
| ILE | 1.810 | 1.000 | 8.830 | 0.967 | 0.790 | 0.857 |
| LEU | 1.800 | 0.998 | 8.470 | 0.908 | 0.770 | 0.821 |
| LYS | -2.030 | 0.061 | 2.950 | 0.000 | 0.310 | 0.000 |
| MET | 1.180 | 0.846 | 8.950 | 0.987 | 0.760 | 0.804 |
| PHE | 1.740 | 0.983 | 9.030 | 1.000 | 0.870 | 1.000 |
| PRO | 0.860 | 0.768 | 3.870 | 0.151 | 0.350 | 0.071 |
| SER | -0.640 | 0.401 | 4.090 | 0.188 | 0.490 | 0.321 |
| THR | -0.260 | 0.494 | 4.490 | 0.253 | 0.380 | 0.125 |
| TRP | 1.460 | 0.914 | 7.660 | 0.775 | 0.860 | 0.982 |
| TYR | 0.510 | 0.682 | 5.890 | 0.484 | 0.640 | 0.589 |
| VAL | 1.340 | 0.885 | 7.630 | 0.770 | 0.720 | 0.732 |

### **3.1.2 Estimation of protein hydrophobicity**

There are different approaches to estimate protein hydrophobicity, based on different principles. The classical approach estimates the "average surface hydrophobicity" (φ<sub>surface</sub>) from the three-dimensional structure of the macromolecule in its native conformation (Lienqueo et al., 2002; Berggren et al., 2002). It considers only the amino acid residues that are accessible to the solvent at the protein surface, as determined from three-dimensional structural data. Each amino acid on the protein surface is assumed to make a hydrophobic contribution proportional to its solvent accessible area, and the hydrophobicity of each residue is given by the amino acid hydrophobicity scale developed by Miyazawa & Jernigan (1996) or Cowan & Whittaker (1990), in normalized form (see Table 1), as shown by equation (10).

$$\phi_{surface} = \frac{\sum_{i} \left( s_{aai} \cdot \phi_{aai} \right)}{s_p} \tag{10}$$

Here, φ<sub>surface</sub> is the calculated surface hydrophobicity of a given protein; *i* (*i* = 1, ..., 20) indexes the standard amino acids; *s*<sub>aai</sub> is the solvent accessible area occupied by amino acid *i*; φ<sub>aai</sub> is the hydrophobicity value assigned to amino acid *i* by the hydrophobicity scale; and *s*<sub>p</sub> is the total solvent accessible area of the entire protein. Note that for proteins with a prosthetic group, *s*<sub>p</sub> is larger than the sum of the solvent accessible areas occupied by the amino acids, whereas for proteins without a prosthetic group the two values are equal. Table 2 shows the average surface hydrophobicity for a group of proteins, calculated by equation (10) using the amino acid hydrophobicity scales given in Table 1. This method for estimating protein hydrophobicity has proven to be valid in several cases (Lienqueo et al., 2002; Lienqueo et al., 2003; Lienqueo et al., 2007); however, it is not valid for proteins that exhibit a highly heterogeneous distribution of hydrophobic patches on their surfaces (Mahn et al., 2004).
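As an illustration, equation (10) can be sketched in a few lines of Python. The scale values below are the normalized Miyazawa & Jernigan (1996) values from Table 1; the function name `surface_hydrophobicity` and the accessible areas in `example_areas` are assumptions made up for this sketch, not measured data.

```python
# Normalized Miyazawa & Jernigan (1996) values, copied from Table 1.
MJ_NORMALIZED = {
    "ALA": 0.391, "ARG": 0.202, "ASN": 0.125, "ASP": 0.105, "CYS": 0.819,
    "GLN": 0.151, "GLU": 0.115, "GLY": 0.252, "HIS": 0.354, "ILE": 0.967,
    "LEU": 0.908, "LYS": 0.000, "MET": 0.987, "PHE": 1.000, "PRO": 0.151,
    "SER": 0.188, "THR": 0.253, "TRP": 0.775, "TYR": 0.484, "VAL": 0.770,
}


def surface_hydrophobicity(area_by_residue, total_area=None, scale=MJ_NORMALIZED):
    """Equation (10): area-weighted hydrophobicity of the exposed residues.

    area_by_residue maps a three-letter residue code to the summed solvent
    accessible area of that residue type (any consistent area unit).
    total_area is s_p. If omitted, it is taken as the sum of the residue
    areas, which corresponds to a protein without a prosthetic group.
    """
    if total_area is None:
        total_area = sum(area_by_residue.values())
    if total_area <= 0:
        raise ValueError("total solvent accessible area must be positive")
    weighted = sum(area * scale[res] for res, area in area_by_residue.items())
    return weighted / total_area


# Hypothetical accessible areas, for illustration only.
example_areas = {"LYS": 400.0, "ASP": 300.0, "LEU": 200.0, "PHE": 100.0}
phi = surface_hydrophobicity(example_areas)
```

Passing a larger `total_area` than the sum of the residue areas mimics the prosthetic-group case described above and lowers the resulting value.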


Table 2. Surface hydrophobicity of proteins estimated by equation (10).

Genetic engineering is often used to improve the performance of separation and purification methods. In HIC specifically, performance has been improved by fusing short hydrophobic peptide tags such as T3, (TP)3, T3P2, T4, (TP)4, T6, T6P2, T8, (WP)2 and (WP)4 to a protein of interest (Brandmann et al., 2000; Rodenbrock et al., 2000; Fexby & Bülow, 2004), thus increasing its original hydrophobicity. This strategy has the advantage that changes in structure and function are minimized relative to the native protein. Furthermore, hydrophobic polypeptide tags make it possible to use simple and less expensive stationary phases (in comparison with affinity chromatography supports), such as those used in HIC.

As a consequence, methods to calculate the surface hydrophobicity of tagged proteins have been proposed. One of them, proposed by Simeonidis et al. (2005), computes the "tagged surface hydrophobicity" (φ<sub>tagged</sub>) by equation (11). The surface hydrophobicity of the tagged protein is estimated as the average surface hydrophobicity of the original protein (without the tag) plus the hydrophobicity of the peptide tag, assuming that the amino acids in the tag are fully exposed. In equation (11), *n<sub>k</sub>* is the number of amino acids of type *k* in the tag (usually hydrophobic amino acids, such as tryptophan, leucine and isoleucine), and *s*<sub>tag\_aak</sub> is the fully exposed surface of amino acid *k* in the tag.

$$\phi_{tagged} = \frac{\sum_{i} \left( s_{aai} \cdot \phi_{aai} \right)}{s_p} + \sum_{k} \left( \frac{s_{tag\_aak} \cdot n_k}{s_p + \sum_{k} \left( s_{tag\_aak} \cdot n_k \right)} \cdot \phi_{aak} \right) \tag{11}$$
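A minimal Python sketch of equation (11) follows. The first term is simply the untagged surface hydrophobicity from equation (10), so it is passed in as a number. The function name, the per-residue exposed areas, and the values used for the untagged protein are hypothetical and serve only to show the arithmetic; the two scale values are the normalized Miyazawa & Jernigan (1996) entries for TRP and PRO in Table 1.

```python
def tagged_surface_hydrophobicity(phi_surface, total_area, tag_counts,
                                  tag_areas, scale):
    """Equation (11): surface hydrophobicity of a protein carrying a peptide tag.

    phi_surface -- surface hydrophobicity of the untagged protein (equation 10)
    total_area  -- s_p, total solvent accessible area of the untagged protein
    tag_counts  -- {residue: n_k}, number of residues of type k in the tag
    tag_areas   -- {residue: s_tag_aak}, fully exposed area of one residue k
    scale       -- {residue: hydrophobicity value} from the chosen scale
    """
    # Total area contributed by the tag, assuming full exposure.
    tag_area = sum(tag_areas[res] * n for res, n in tag_counts.items())
    denominator = total_area + tag_area
    tag_term = sum(
        (tag_areas[res] * n / denominator) * scale[res]
        for res, n in tag_counts.items()
    )
    return phi_surface + tag_term


# (WP)4 tag: four tryptophan and four proline residues.
wp4_counts = {"TRP": 4, "PRO": 4}
# Hypothetical fully exposed areas per residue (illustration only).
exposed_area = {"TRP": 250.0, "PRO": 150.0}
# Normalized Miyazawa & Jernigan (1996) values from Table 1.
mj_scale = {"TRP": 0.775, "PRO": 0.151}

# Hypothetical untagged protein: phi_surface = 0.30, s_p = 8400.
phi_tagged = tagged_surface_hydrophobicity(0.30, 8400.0, wp4_counts,
                                           exposed_area, mj_scale)
```

With these made-up inputs the tag contributes 1600 area units, so the tag term is 0.0775 + 0.00906 and the tagged value rises from 0.30 to about 0.387.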

Despite the remarkable results achieved by the methods described above, the need to know the three-dimensional structure is a serious disadvantage. This is especially clear from the ratio between the number of proteins of known three-dimensional structure available in the PDB database (Bermann et al., 2000) and the number of proteins sequenced in the UniProtKB/Swiss-Prot database (Bairoch et al., 2005). As of January 2011, this ratio is close to 0.13 (70695/534420). This situation points to the need for a procedure based on low-level information, such as the amino acid composition. Salgado et al. (2005) developed a mathematical model to predict the average surface hydrophobicity of a protein based only on its amino acid composition, thereby avoiding the use of its three-dimensional structure.

Equation (12) shows the basic structure of the model. In this equation, ASH is the average surface hydrophobicity, *n<sub>i</sub>* is the number of amino acids of class *i* in the protein, $\hat{l}$ is the normalized length of the protein sequence, and the *c<sub>i</sub>* are adjustable parameters. The function *f* corrects the amino acid composition of the protein according to different assumptions about the tendency of each amino acid to be exposed to the solvent. The simplest form of *f* considers all amino acids to be completely exposed. The parameters needed to build the function *f* were determined on a large set of non-redundant proteins by Salgado et al. (2005).

$$ASH = c_0 + \sum_{i=1}^{20} c_i \cdot f\left(n_i\right) + c_{21} \cdot \hat{l} \tag{12}$$
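The structure of equation (12) can be sketched as follows. This is only an outline of the model's form: the coefficients, the normalized length, and the choice of the identity function as the default *f* are all hypothetical placeholders, since the fitted parameters and the exact definition of *f* from Salgado et al. (2005) are not reproduced here. For brevity the example uses three amino acid classes instead of twenty.

```python
def identity(n):
    """Placeholder for f: uses the raw residue count (an assumption)."""
    return n


def average_surface_hydrophobicity(counts, coefficients, c0, c21,
                                   normalized_length, f=identity):
    """Equation (12): ASH = c0 + sum_i c_i * f(n_i) + c21 * l_hat.

    counts            -- {residue class: n_i} from the amino acid composition
    coefficients      -- {residue class: c_i}, adjustable parameters
    c0, c21           -- intercept and coefficient of the normalized length
    normalized_length -- l_hat, normalized sequence length
    f                 -- composition correction function
    """
    composition_term = sum(
        c_i * f(counts.get(res, 0)) for res, c_i in coefficients.items()
    )
    return c0 + composition_term + c21 * normalized_length


# Hypothetical parameters and composition (illustration only).
toy_coefficients = {"LEU": 0.002, "LYS": -0.001, "GLY": 0.0}
toy_counts = {"LEU": 20, "LYS": 10, "GLY": 15}
ash = average_surface_hydrophobicity(toy_counts, toy_coefficients,
                                     c0=0.1, c21=0.05, normalized_length=0.5)
```

Keeping *f* as an argument reflects the text's point that different exposure assumptions lead to different correction functions while the linear structure of the model stays the same.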

### **3.2 Methods for predicting retention time in HIC**

The approaches discussed above to calculate protein hydrophobicity have been used to predict protein retention time by different methods. The simplest methodology uses straightforward quadratic models whose parameters depend on the chromatographic conditions used in the HIC run (Lienqueo et al., 2007) and whose variables are DRT and the average surface hydrophobicity of the protein to be separated (φ<sub>surface</sub>). The most appropriate hydrophobicity scale was found to be that proposed by Miyazawa & Jernigan (1996), in its normalized form. The general model is given by equation (13), where A', B' and C' are model parameters that depend on the chromatographic conditions, such as the type and concentration of salt and the type of stationary phase. These parameters have been obtained by fitting the quadratic model to experimental data. Table 3 shows the values of A', B' and C' obtained for different operating conditions. The model given by equation (13) is useful for predicting retention times of structurally stable proteins that have a relatively homogeneous distribution of surface hydrophobicity, such as ribonuclease A.

$$DRT = A' \cdot \phi_{surface}^2 + B' \cdot \phi_{surface} + C' \tag{13}$$
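The sketch below shows one way to evaluate equation (13) and to estimate A', B' and C' from (φ<sub>surface</sub>, DRT) pairs by ordinary least squares. The fitting procedure used in the cited work is not specified here, so least squares via the normal equations is an assumption, and the data points are synthetic values generated from hypothetical coefficients rather than experimental results.

```python
def predict_drt(phi_surface, a, b, c):
    """Equation (13): DRT = A' * phi^2 + B' * phi + C'."""
    return a * phi_surface ** 2 + b * phi_surface + c


def _det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(phis, drts):
    """Least-squares estimate of (A', B', C') from paired observations."""
    if len(phis) != len(drts) or len(phis) < 3:
        raise ValueError("need at least three paired observations")
    # s[k] = sum of phi^k, t[k] = sum of DRT * phi^k
    s = [sum(x ** k for x in phis) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(phis, drts)) for k in range(3)]
    # Normal equations for the unknowns (a, b, c).
    matrix = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    det = _det3(matrix)
    if det == 0:
        raise ValueError("observations do not determine a quadratic")
    solution = []
    for col in range(3):  # Cramer's rule, one unknown per column
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(_det3(replaced) / det)
    return tuple(solution)


# Synthetic data from hypothetical coefficients (not experimental values).
true_a, true_b, true_c = -2.0, 3.0, 0.1
phi_values = [0.1, 0.2, 0.3, 0.4, 0.5]
drt_values = [predict_drt(p, true_a, true_b, true_c) for p in phi_values]
fitted_a, fitted_b, fitted_c = fit_quadratic(phi_values, drt_values)
```

Because the synthetic data lie exactly on a parabola, the fit recovers the generating coefficients up to rounding; with real chromatographic data a separate set of parameters would be fitted for each set of operating conditions.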

Figure 5 shows a scheme of the methodology to predict DRT based on protein hydrophobicity. The procedure begins with the calculation of the protein surface accessible to the solvent and the fraction of that surface occupied by each kind of amino acid. This calculation requires a PDB file, i.e. the spatial coordinates of each atom composing the macromolecule, preferably determined experimentally through X-ray crystallography or nuclear magnetic resonance (NMR). Experimentally determined structures can be obtained from the Protein Data Bank (PDB; www.rcsb.org/pdb) database (Bermann et al., 2000). Additionally, three-dimensional models can be found in other databases such as ModBase (http://modbase.compbio.ucsf.edu/modbase-cgi/search\_form.cgi) (Pieper et al., 2009). A computational program or suite is also required to perform the calculation, such as the software GRASP (Nicholls et al., 1991). With this information, the average surface hydrophobicity is calculated by means of equation (10), using the Miyazawa & Jernigan hydrophobicity scale in its normalized form. Finally, the retention time of the protein can be estimated as DRT through a quadratic model such as equation (13).


Table 3. Parameters of equation (13) for different operating conditions.

The surface hydrophobicity of tagged proteins (φ<sub>tagged</sub>) has been used by Lienqueo et al. (2007) to predict the DRT of cutinases tagged with hydrophobic peptides in different HIC matrices, by means of equation (13) and the methodology represented in Figure 5. The model coefficients are constant for each set of operating conditions. This approach has proven effective in predicting the behavior of tagged proteins in HIC, since it showed a low deviation between predicted and experimental DRT (on the order of 2%) for the tagged cutinases studied. Finally, the ASH obtained from equation (12), based on amino acid composition, was used to predict chromatographic behavior in HIC, resulting in a performance 5% better than that of the model based on the three-dimensional structure of proteins (equation (10)) (Salgado et al., 2008).

Fig. 5. Methodology for predicting protein retention time in HIC based on surface hydrophobicity. Using a PDB file as input to the program GRASP, the total and partial accessible areas of the exposed amino acids are determined. Using an amino acid hydrophobicity scale and equation (10), the average surface hydrophobicity can be obtained. Then, through simple mathematical correlations, the DRT of the protein can be estimated.
