**Gibbs Free Energy Formula for Protein Folding**

Yi Fang

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/52410

## **1. Introduction**

Proteins are life's workhorses and nature's robots. They participate in every life process. They form the supporting structures of cells, fibres, tissues, and organs; they are catalysts that speed up life-critical chemical reactions; they transfer signals so that we can see, hear, and smell; they protect us against intruders such as bacteria and viruses; and they regulate life cycles to keep everything in order, to mention only a few of their functions.

The first thing that draws our attention to proteins is their size. Proteins are macromolecules, that is, large molecules. Inorganic molecules are usually small, consisting of a couple of atoms to a couple of dozen atoms. A small protein has thousands of atoms, and large ones have over ten thousand. With such a huge number of atoms, one can imagine how complicated a protein molecule is. Fortunately, there are some regularities in these huge molecules: proteins are polymers built up from monomers, or smaller building blocks. The monomers of proteins are amino acids, and life employs 20 different amino acids to form proteins. In the cell, amino acids are joined one by one into amino acid sequences. The order and length of an amino acid sequence are translated from DNA sequences by the universal genetic code. The bond joining one amino acid to the next in the sequence is a peptide bond (a covalent bond) with a quite regular, specific geometric pattern. Thus amino acid sequences are also called peptide chains. But the easy translation and the geometric regularities stop here. The peptide chain has everything required of a molecule: all covalent bonds are correctly formed. But to perform a protein's biological function, the peptide chain has to take a specific shape, called the protein's **native structure**. Only in this native structure does a protein perform its biological function. Proteins that fall into wrong shapes not only fail to perform their function but can also cause disasters. Many diseases are known to be caused by proteins taking wrong structures.

© 2012 Fang, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How does the peptide chain take its native structure? Is there another genetic code to guide the process? In fact, at this stage, life's most remarkable drama takes the stage. Once synthesized, the peptide chain of a protein folds spontaneously (some need the help of other proteins and molecules) into its native structure. This process is called **protein folding**. At this stage, everything is governed by simple but fundamental physical laws.


The **protein folding problem** can then be roughly divided into three aspects. The first is the folding process: how fast a peptide chain folds, and what the intermediate structures are between the initial shape and the native structure. The second is the mechanics of folding, such as what the driving force is. The third, the most direct application to biological study, is the prediction of the native structure of a protein from its peptide chain. All three parts of the protein folding problem can be given a unified treatment: writing down the Gibbs free energy formula $G(\mathbf{X})$ for any conformation $\mathbf{X} = (\mathbf{x}_1, \cdots, \mathbf{x}_i, \cdots, \mathbf{x}_M) \in \mathbf{R}^{3M}$ of a protein, where $\mathbf{x}_i \in \mathbf{R}^3$ is the atomic center of the atom $\mathbf{a}_i$.

The fundamental law for protein folding is the **Thermodynamic Principle**: the amino acid sequence of a protein determines its native structure, and the native structure has the minimum Gibbs free energy among all possible conformations, as stated in Anfinsen (1973). Let $\mathbf{X}$ be a conformation of a protein. Is there a natural Gibbs free energy function $G(\mathbf{X})$? The answer must be positive, as G. N. Lewis said in 1933: "There can be no doubt but that in quantum mechanics one has the complete solution to the problems of chemistry." (quoted from Bader (1990), page 130). Protein folding is a problem in biochemistry, so why has such a formula $G(\mathbf{X})$ not been found, and what is the formula? This chapter tries to give the answers.

First, the Gibbs free energy formula is given. It has two versions: the chemical balance version (1) and the geometric version (2).

#### **1.1. The formula**

Atoms in a protein are classified into classes $H_i$, $1 \le i \le H$, according to their levels of hydrophobicity. The chemical balance version of the formula is:

$$G(\mathbf{X}) = \mu_e N_e(\mathbf{X}) + \sum_{i=1}^{H} \mu_i N_i(\mathbf{X}), \tag{1}$$

where $N_e(\mathbf{X})$ is the mean number of electrons in the space enclosed by the first hydration shell of $\mathbf{X}$ and $\mu_e$ is the corresponding chemical potential; $N_i(\mathbf{X})$ is the mean number of water molecules in the first hydration layer that are in direct contact with the atoms in $H_i$, and $\mu_i$ is their chemical potential.

Let $M_{\mathbf{X}}$ (see FIGURE 3) be the molecular surface for the conformation $\mathbf{X}$, and define $M_{\mathbf{X}_i} \subset M_{\mathbf{X}}$ as the set of points in $M_{\mathbf{X}}$ that are closer to atoms in $H_i$ than to any atoms in $H_j$, $j \neq i$. Then the geometric version of $G(\mathbf{X})$ is:

$$G(\mathbf{X}) = \nu_e \mu_e V(\Omega_{\mathbf{X}}) + d_w \nu_e \mu_e A(M_{\mathbf{X}}) + \sum_{i=1}^{H} \nu_i \mu_i A(M_{\mathbf{X}_i}), \quad \nu_e, \nu_i > 0, \tag{2}$$

where $V(\Omega_{\mathbf{X}})$ is the volume of the domain $\Omega_{\mathbf{X}}$ enclosed by $M_{\mathbf{X}}$, $d_w$ is the diameter of a water molecule, and $A(M_{\mathbf{X}})$ and $A(M_{\mathbf{X}_i})$ are the areas of $M_{\mathbf{X}}$ and $M_{\mathbf{X}_i}$. The two versions are related by $\nu_e[V(\Omega_{\mathbf{X}}) + d_w A(M_{\mathbf{X}})] = N_e(\mathbf{X})$ and $\nu_i A(M_{\mathbf{X}_i}) = N_i(\mathbf{X})$, $1 \le i \le H$. The $\nu_e$ and $\nu_i$ are independent of $\mathbf{X}$; they are the average numbers of particles per unit volume and per unit area, respectively.
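To make the relation between the two versions concrete, the following minimal sketch evaluates both formulas from the same geometric inputs and confirms that they agree. It assumes the volume and areas have already been computed by some other means; the function names and all numerical values are placeholders chosen for illustration, not values from the chapter.

```python
import math

def particle_numbers(volume, area, class_areas, nu_e, nu, d_w):
    """Return (N_e, [N_1, ..., N_H]) from the geometric quantities."""
    if len(class_areas) != len(nu):
        raise ValueError("class_areas and nu must have one entry per class")
    n_e = nu_e * (volume + d_w * area)               # N_e = nu_e [V + d_w A]
    n_i = [v * a for v, a in zip(nu, class_areas)]   # N_i = nu_i A(M_Xi)
    return n_e, n_i

def gibbs_chemical(n_e, n_i, mu_e, mu):
    """Chemical balance version (1): mu_e N_e + sum_i mu_i N_i."""
    return mu_e * n_e + sum(m * n for m, n in zip(mu, n_i))

def gibbs_geometric(volume, area, class_areas, nu_e, mu_e, nu, mu, d_w):
    """Geometric version (2): volume term + whole-surface term + class-area terms."""
    surface_sum = sum(v * m * a for v, m, a in zip(nu, mu, class_areas))
    return nu_e * mu_e * volume + d_w * nu_e * mu_e * area + surface_sum

# Placeholder numbers (not measured values); the class areas sum to the total area.
params = dict(volume=10.0, area=20.0, class_areas=[12.0, 8.0],
              nu_e=2.0, nu=[1.0, 0.5], d_w=0.5)
mu_e, mu = -1.0, [3.0, -2.0]

n_e, n_i = particle_numbers(**params)
g_chem = gibbs_chemical(n_e, n_i, mu_e, mu)
g_geom = gibbs_geometric(mu_e=mu_e, mu=mu, **params)
print(math.isclose(g_chem, g_geom))  # the two versions agree on the same inputs
```

The sketch only checks the algebraic equivalence of (1) and (2); computing the molecular surface and its per-class areas for a real conformation is a separate geometric problem.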

Before the actual derivation is given, some basic facts are stated, such as hydrophobicity, protein structures, and the environment in which the protein folds. The methods used in experimental measurements and theoretical derivations of the Gibbs free energy of protein folding are briefly described to give the motivation and idea of the derivation. A critique of previous derivations clarifies the necessary concepts, identifies what is important in the derivation, and sets up the thermodynamic system that best fits what is currently known about the protein folding process. Then both classical and quantum statistical derivations are given. The only difference is that the volume and whole-surface-area terms of formula (2) are missing from the classically derived formula; only the quantum statistical method gives these terms. After the derivations, some remarks are made. A direct application of the Gibbs free energy formula (2) is the *ab initio* prediction of proteins' native structures. Gradient formulas of $G(\mathbf{X})$ are given so that Newton's fastest descent method can be applied. Finally, it should be emphasized that the gradient $\nabla G(\mathbf{X})$ can be used not only to predict the native structure; it is actually the force that drives proteins to fold, as stated in Ben-Naim (2012). In the Appendix, integrated gradient formulas of $G(\mathbf{X})$ on the molecular surface are given.

## **2. Proteins**


#### **2.1. Amino acids**

There are 20 different amino acids that appear in natural proteins. All amino acids have a common part, the **back bone**, consisting of the 9 atoms in FIGURE 1 (everything except R).

NH2 is the amino group and COOH is the carboxyl group of the back bone. A single amino acid is in a polar state, so the amino group gains one more hydrogen from the carboxyl group, or perhaps the amino group loses one electron to the carboxyl group. Geometrically this is irrelevant, since after forming peptide bonds the amino group loses one H to become NH and the carboxyl group loses one OH to become CO. Thus an amino acid in the sequence is also called a **residue**.

**Figure 1.** A generic amino acid.

The group R in FIGURE 1 is called the **side chain**; it distinguishes the 20 different amino acids. A side chain can be as small as a single hydrogen atom, as in Glycine, or as large as 18 atoms including two rings, as in Tryptophan. Fifteen amino acids have side chains that contain more than 7 atoms, i.e., more atoms than the back bone of a residue in an amino acid sequence. Except in Glycine, a $\mathrm{C}^{\beta}$ carbon in the side chain forms a covalent bond with the **central carbon** $\mathrm{C}^{\alpha}$ of the back bone.

#### **2.2. Hydrogen bonds**

A hydrogen bond is the attractive interaction of a hydrogen atom with an electronegative atom (the **acceptor**), such as nitrogen, oxygen, or fluorine (thus the name "hydrogen bond", which must not be confused with a covalent bond to hydrogen). The hydrogen must be covalently bonded to another electronegative atom (forming a **donor group**) to create the hydrogen bond. These bonds can occur between molecules (intermolecular) or within different parts of a single molecule (intramolecular). The hydrogen bond is stronger than the van der Waals interaction but weaker than a covalent or ionic bond. Hydrogen bonds occur in both inorganic molecules such as water and organic molecules such as DNA, RNA, and proteins.


Some amino acids' side chains contain hydrogen bond donors or acceptors that can form hydrogen bonds either with other side chains in the same protein (intramolecular hydrogen bonds) or with surrounding water molecules (intermolecular hydrogen bonds). Amino acids whose side chains contain neither donors nor acceptors of hydrogen bonds are classified as hydrophobic.

#### **2.3. Hydrophobicity levels**

Every atom in a protein belongs to a moiety, or atom group. According to the moiety's ability to form hydrogen bonds, the atom is assigned a hydrophobicity level. All hydrophobicity scales are measured or theorized for some aspect of an individual amino acid, either as an independent molecule or as a residue in a protein, in various environments (solvent, pH value, temperature, pressure, etc.). This is like taking a snapshot of an object with a complicated shape: snapshots taken from different angles of view are all different. Therefore, there are many different classifications of hydrophobicity; for example, Eisenberg and McLachlan (1986) use five classes, C, O/N, O$^-$, N$^+$, S. Let a protein have $M$ atoms $\{\mathbf{a}_1, \cdots, \mathbf{a}_i, \cdots, \mathbf{a}_M\}$. One can assume that there are $H$ hydrophobicity classes, such that $\{\mathbf{a}_1, \cdots, \mathbf{a}_i, \cdots, \mathbf{a}_M\} = \cup_{i=1}^{H} H_i$.
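As a small illustration of the decomposition of the atom set into hydrophobicity classes, the sketch below groups atom indices by class label. The five labels are the classes quoted from Eisenberg and McLachlan (1986); the per-atom labels themselves are hypothetical, since assigning a real atom to a class depends on its moiety and is not specified here.

```python
from collections import defaultdict

# The five class labels quoted in the text; "O-" and "N+" stand for the charged classes.
CLASSES = ("C", "O/N", "O-", "N+", "S")

def partition_by_class(atom_labels):
    """Group atom indices by hydrophobicity class label."""
    classes = defaultdict(list)
    for index, label in enumerate(atom_labels):
        if label not in CLASSES:
            raise ValueError(f"unknown hydrophobicity class: {label!r}")
        classes[label].append(index)
    return dict(classes)

# Hypothetical labels for a six-atom fragment (illustrative only).
labels = ["O/N", "C", "C", "O/N", "C", "S"]
parts = partition_by_class(labels)

# Every atom lands in exactly one class, so the classes cover the atom set without overlap.
covered = sorted(i for members in parts.values() for i in members)
print(covered == list(range(len(labels))))
```

Only non-empty classes appear in the result, which matches the convention that $H$ counts the classes actually present in the protein.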

#### **2.4. Protein structures**

Let a molecule have $M$ atoms, listed as $(\mathbf{a}_1, \cdots, \mathbf{a}_i, \cdots, \mathbf{a}_M)$. A representation of a structure $\mathbf{X}$ of this molecule is the series of atomic centers (nuclear centers) $\mathbf{x}_i \in \mathbf{R}^3$ of the atoms $\mathbf{a}_i$. Hence it can be written as a point in $\mathbf{R}^{3M}$, $\mathbf{X} = (\mathbf{x}_1, \cdots, \mathbf{x}_i, \cdots, \mathbf{x}_M)$. The space $\mathbf{R}^{3M}$ is then called the *control space*. The real shape of the structure $\mathbf{X}$ is realized in $\mathbf{R}^3$, called the *behavior space* as defined in Bader (1990); it is a bunch of overlapping balls (spheres), $P_{\mathbf{X}} = \cup_{i=1}^{M} B(\mathbf{x}_i, r_i)$, where $r_i$ is the van der Waals radius of the atom $\mathbf{a}_i$ and $B(\mathbf{x}, r)$ is the closed ball $\{\mathbf{y} : |\mathbf{y} - \mathbf{x}| \le r\} \subset \mathbf{R}^3$ of center $\mathbf{x}$ and radius $r$.
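The two views of a structure, a point in the control space and a union of balls in the behavior space, can be sketched in a few lines. This is only an illustration under assumed inputs: the two-atom conformation and the radii are made up, and the function names are not from the chapter.

```python
import math

def as_control_point(conformation):
    """Flatten M atomic centers into one point of R^{3M} (the control space)."""
    return [coordinate for center in conformation for coordinate in center]

def in_space_filling_model(point, conformation, radii):
    """True if `point` lies in P_X, the union of the closed balls B(x_i, r_i)."""
    return any(math.dist(point, center) <= radius
               for center, radius in zip(conformation, radii))

# Two-atom toy conformation; coordinates and radii are hypothetical, in angstroms.
X = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
radii = [1.7, 1.2]

inside = in_space_filling_model((2.5, 0.0, 0.0), X, radii)   # within the second ball only
outside = in_space_filling_model((5.0, 0.0, 0.0), X, radii)  # beyond both balls
print(len(as_control_point(X)), inside, outside)
```

`math.dist` requires Python 3.8 or later. The membership test uses `<=` because the balls are closed.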

Protein native structures are complicated. Unlike the famous double-helix structure of DNA, the only general pattern for protein structure is no pattern at all. To study the native structures of proteins, people divide the structures into different levels and classify them.

The amino acid sequence of a protein is called its **primary structure**. Regular patterns of local (along the sequence) structures such as helices, strands, and turns are called the **secondary structure**; they contain many intramolecular hydrogen bonds in regular patterns. The global assembly of these secondary structures, connected by turns and irregular loops, is called the **tertiary structure**. For proteins that have multiple amino acid sequences or are structurally associated with other molecules there are also **quaternary structures**; see Branden and Tooze (1999) and Finkelstein and Ptitsyn (2002).


The secondary structures are local structures, usually helices, strands, and turns. Their common feature is a regular geometric arrangement of the main chain atoms, which gives good opportunities to form hydrogen bonds between different residues. Several strands may form a sheet, stabilized by a regular pattern of hydrogen bonds. Turns and loops are necessary for the extended long chain to turn into a sphere-like shape. Turns are short, 3 or 4 residues long. Loops involve many residues but have no regular pattern of hydrogen bonds. Loops often form the working place of the protein and therefore appear on the outer surface of the native structure.

**Figure 2.** $P_{\mathbf{X}}$ is a bunch of overlapping balls, called the space-filling model, or CPK model.

## **3. Some functions in thermodynamics**

A thermodynamic system consists of particles in a region $\Gamma \subset \mathbf{R}^3$ and a bath or environment surrounding it. A wall, usually the boundary $\partial\Gamma$, separates the system from its surroundings. If neither energy nor matter can be exchanged through the wall, the system is an **isolated system**. If only energy can be exchanged, the system is a **closed system**. If both energy and matter can be exchanged with the surroundings, the system is an **open system**.

For an open system $\Gamma$ with a variable number of particles, in contact with a surrounding thermal and particle bath, let $U$, $T$, $S$, $P$, $V$, $\mu$, and $N$ be the inner energy, temperature, entropy, pressure, volume, chemical potential, and number of particles of the system $\Gamma$, respectively. Then

$$\mathrm{d}U = T\mathrm{d}S - P\mathrm{d}V + \mu\mathrm{d}N. \tag{3}$$

for Protein Folding 7

Protein folding studies the structure of the protein molecule, what is the native structure and why and how the protein folds to it. All these aspects are specific properties of a particle, the protein molecule. To get the Gibbs free energy formula *G*(**X**) for each conformation **X**, statistical mechanics is needed with careful specification of the thermodynamic system.

Statistical mechanics uses ensembles of all microscopic states under the same macroscopic character, for example, all microscopic states corresponding to the same energy *E*. The

where *β* = 1/*kT*, *k* the Boltzmann constant and *T* the temperature. If there are only a series

Various of thermodynamic quantities, such as the inner energy of the system, can be put as

∑<sup>∞</sup>

If only the Halminltonian *<sup>H</sup>*(**q**, **<sup>p</sup>**) is known, where **<sup>q</sup>** = (**q**1, ··· , **<sup>q</sup>***i*, ··· , **<sup>q</sup>***N*) <sup>∈</sup> <sup>Γ</sup>*<sup>N</sup>* is the position of the *<sup>N</sup>* particles in the thermodynamic system <sup>Γ</sup> <sup>⊂</sup> **<sup>R</sup>**<sup>3</sup> under study, and **p** = (**p**1, ··· , **p***i*, ··· , **p***N*) momentums of these particles, the *canonical phase-space density* of

where *N*! is the Gibbs corrector because that the particles in the system is indistinguishable. Z(*T*, *V*, *N*) is called the *canonic partition function*, it depends on the system's temperature *T*, volume *V*, and particle number *N*. Note that under the assumption of the canonic ensemble, they are all fixed for the fixed thermodynamical system Γ. Especially, *V* = *V*(Γ) =

*<sup>T</sup>* [�*H*� <sup>+</sup> *kT* ln <sup>Z</sup>(*T*, *<sup>V</sup>*, *<sup>N</sup>*)] . (13)

*F* = *U* − *TS* = −*kT* ln Z(*T*, *V*, *N*), *G* = *PV* + *F* = *PV* − *kT* ln Z(*T*, *V*, *N*). (14)

From which the Helmholtz free energy *F* = *F*(Γ) and the Gibbs free energy *G* = *G*(Γ) are

*<sup>n</sup>*=<sup>1</sup> exp(−*βEn*)

*<sup>i</sup>*=<sup>1</sup> *Ei* exp(−*βEi*)

**<sup>R</sup>**3*<sup>N</sup>* exp[−*βH*(**q**, **<sup>p</sup>**)]d**p***<sup>N</sup>* <sup>=</sup> exp[−*βH*(**q**, **<sup>p</sup>**)]

*pi* <sup>=</sup> exp(−*βEi*) ∑<sup>∞</sup>

energy levels *E*1, *E*2, ··· , then the probability distribution for canonic ensemble is

*<sup>U</sup>* <sup>=</sup> �*Ei*� <sup>=</sup> <sup>∑</sup><sup>∞</sup>

*pc*(**q**, **<sup>p</sup>**) = exp[−*βH*(**q**, **<sup>p</sup>**)]

<sup>Γ</sup>*<sup>N</sup>* <sup>d</sup>**q***<sup>N</sup>*

*N*!*h*3*<sup>N</sup>*

 Γ*<sup>N</sup>* d**q***<sup>N</sup>* **R**3*<sup>N</sup>*

1 *N*!*h*3*<sup>N</sup>* 

implicitly set that <sup>Γ</sup> <sup>⊂</sup> **<sup>R</sup>**<sup>3</sup> has a volume.

*<sup>S</sup>* <sup>=</sup> *<sup>S</sup>*(Γ) = �−*<sup>k</sup>* ln *pc*� <sup>=</sup> *<sup>k</sup>*

*pE* ∝ exp(−*βE*), (9)

. (10)

Gibbs Free Energy Formula for Protein Folding 53

<sup>Z</sup>(*T*, *<sup>V</sup>*, *<sup>N</sup>*) . (12)

<sup>Γ</sup> d**q**

*<sup>n</sup>*=<sup>1</sup> exp(−*βEn*) . (11)

[*βH*(**q**, **<sup>p</sup>**) + ln <sup>Z</sup>(*T*, *<sup>V</sup>*, *<sup>N</sup>*)]*pc*(**q**, **<sup>p</sup>**)d**p***<sup>N</sup>*

**4.1. The canonic ensemble**

the means:

the system then is

Then the entropy *S* is

<sup>=</sup> <sup>1</sup>

obtained,

probability of this ensemble then is proportional to

By Legendre transformations various extensive quantities can be derived,

$$F = \mathcal{U} - TS, \quad G = \mathcal{U} - TS + PV, \quad \phi = F - \mu N = \mathcal{U} - TS - \mu N \tag{4}$$

where *F*, *G*, and *φ* are Helmholtz, Gibbs free energies, and thermodynamic potential respectively. Then

$$\mathbf{d}^2 \mathbf{d}F = -\mathbf{S} \mathbf{d}T - P \mathbf{d}V + \mu \mathbf{d}N, \quad \mathbf{d}G = -\mathbf{S} \mathbf{d}T + V \mathbf{d}P + \mu \mathbf{d}N, \quad \mathbf{d}\phi = -\mathbf{S} \mathbf{d}T - P \mathbf{d}V - N \mathbf{d}\mu. \tag{5}$$

This shows that *U* = *U*(*S*, *V*, *N*), *F* = *F*(*T*, *V*, *N*), *G* = *G*(*T*, *P*, *N*), and *φ* = *φ*(*T*, *V*, *μ*). All **extensive** quantities satisfy a linear homogeneous relation: under a scaling transformation that enlarges the actual amount of matter by a factor *λ*, every extensive quantity is multiplied by *λ*. The quantities *U*, *S*, *V*, *N*, *F*, *G*, *φ* are extensive, while *T*, *P*, *μ* are **intensive**. Thus

$$
\lambda U = U(\lambda S, \lambda V, \lambda N), \quad \lambda F = F(T, \lambda V, \lambda N), \quad \lambda G = G(T, P, \lambda N), \quad \lambda \phi(T, V, \mu) = \phi(T, \lambda V, \mu). \tag{6}
$$

From equations (5), (∂*φ*/∂*V*)<sub>*T*,*μ*</sub> = −*P*. By equations (6),

$$\phi = \frac{\mathbf{d}(\lambda \phi)}{\mathbf{d}\lambda} = V \left(\frac{\partial \phi}{\partial V}\right)\_{T,\mu} = -PV \tag{7}$$

and

$$
\phi(T, V, \mu) = -PV.\tag{8}
$$

Equation (8) is true for any open thermodynamic system.
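The passage from (6) to (7) can be spelled out as follows. This is only a sketch of the usual homogeneity argument; it uses nothing beyond the last relation in (6) and the differential of *φ* in (5). Differentiating both sides of *λφ*(*T*, *V*, *μ*) = *φ*(*T*, *λV*, *μ*) with respect to *λ* gives

$$\phi(T, V, \mu) = \frac{\mathrm{d}}{\mathrm{d}\lambda} \phi(T, \lambda V, \mu) = V \left(\frac{\partial \phi}{\partial V}\right)\_{T,\mu}(T, \lambda V, \mu),$$

and setting *λ* = 1 and inserting (∂*φ*/∂*V*)<sub>*T*,*μ*</sub> = −*P* yields *φ* = −*PV*.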

## **4. Statistical mechanics**

Thermodynamics is a phenomenological theory of macroscopic phenomena that neglects the individual properties of the particles in a system. Statistical mechanics is the bridge between macroscopic and microscopic behavior. In statistical mechanics, the particles in a system obey either classical or quantum dynamical laws, and the macroscopic quantities are statistical averages of the corresponding microscopic quantities. If the particles obey classical dynamical laws, the theory is classical statistical mechanics; if they obey quantum dynamical laws, it is quantum statistical mechanics. In both cases the averaging that yields macroscopic quantities from microscopic ones follows the same principle and formalism.

Protein folding studies the structure of the protein molecule: what the native structure is, and why and how the protein folds to it. All these aspects are specific properties of a single particle, the protein molecule. To get the Gibbs free energy formula *G*(**X**) for each conformation **X**, statistical mechanics is needed, together with a careful specification of the thermodynamic system.

### **4.1. The canonic ensemble**


Statistical mechanics uses ensembles of all microscopic states that share the same macroscopic character, for example, all microscopic states corresponding to the same energy *E*. The probability of such an ensemble is then proportional to

$$p\_E \propto \exp(-\beta E),\tag{9}$$

where *β* = 1/*kT*, *k* is the Boltzmann constant, and *T* is the temperature. If there is only a series of energy levels *E*<sub>1</sub>, *E*<sub>2</sub>, ···, then the probability distribution for the canonic ensemble is

$$p\_i = \frac{\exp(-\beta E\_i)}{\sum\_{n=1}^{\infty} \exp(-\beta E\_n)}.\tag{10}$$

Various thermodynamic quantities, such as the inner energy of the system, can be expressed as means:

$$U = \langle E\_i \rangle = \frac{\sum\_{i=1}^{\infty} E\_i \exp(-\beta E\_i)}{\sum\_{n=1}^{\infty} \exp(-\beta E\_n)}. \tag{11}$$

If only the Hamiltonian *H*(**q**, **p**) is known, where **q** = (**q**<sub>1</sub>, ···, **q**<sub>*i*</sub>, ···, **q**<sub>*N*</sub>) ∈ Γ<sup>*N*</sup> are the positions of the *N* particles in the thermodynamic system Γ ⊂ **R**<sup>3</sup> under study and **p** = (**p**<sub>1</sub>, ···, **p**<sub>*i*</sub>, ···, **p**<sub>*N*</sub>) are the momenta of these particles, the *canonical phase-space density* of the system is

$$p\_c(\mathbf{q}, \mathbf{p}) = \frac{\exp[-\beta H(\mathbf{q}, \mathbf{p})]}{\frac{1}{N! h^{3N}} \int\_{\Gamma^N} \mathrm{d}\mathbf{q}^N \int\_{\mathbb{R}^{3N}} \exp[-\beta H(\mathbf{q}, \mathbf{p})] \mathrm{d}\mathbf{p}^N} = \frac{\exp[-\beta H(\mathbf{q}, \mathbf{p})]}{\mathcal{Z}(T, V, N)}, \tag{12}$$

where *N*! is the Gibbs correction factor, needed because the particles in the system are indistinguishable. Z(*T*, *V*, *N*) is called the *canonic partition function*; it depends on the system's temperature *T*, volume *V*, and particle number *N*. Note that under the assumption of the canonic ensemble, all three are fixed for the fixed thermodynamic system Γ. In particular, *V* = *V*(Γ) = ∫<sub>Γ</sub> d**q** implicitly assumes that Γ ⊂ **R**<sup>3</sup> has a volume.

Then the entropy *S* is

$$S = S(\Gamma) = \langle -k\ln p\_c \rangle = \frac{k}{N!h^{3N}} \int\_{\Gamma^N} \mathrm{d}\mathbf{q}^N \int\_{\mathbb{R}^{3N}} \left[ \beta H(\mathbf{q}, \mathbf{p}) + \ln \mathcal{Z}(T, V, N) \right] p\_c(\mathbf{q}, \mathbf{p}) \mathrm{d}\mathbf{p}^N$$

$$= \frac{1}{T} \left[ \langle H \rangle + kT \ln \mathcal{Z}(T, V, N) \right]. \tag{13}$$

From which the Helmholtz free energy *F* = *F*(Γ) and the Gibbs free energy *G* = *G*(Γ) are obtained,

$$F = U - TS = -kT \ln \mathcal{Z}(T, V, N), \quad G = PV + F = PV - kT \ln \mathcal{Z}(T, V, N). \tag{14}$$

Therefore, to obtain the Gibbs free energy one has to really calculate ln Z(*T*, *V*, *N*), a task that often cannot be done.
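For a system with only a handful of discrete energy levels the partition function can be summed directly, which makes (10), (11), (13), and (14) easy to check numerically. The sketch below is an illustration under that assumption only; the function name `canonical_summary` and the three energy levels are hypothetical and are not taken from the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def canonical_summary(energies, temperature):
    """Return (probabilities, U, S, F) for a finite list of energy levels.

    Follows (10) for the probabilities, (11) for U, S = <-k ln p> as in (13),
    and F = -kT ln Z as in (14).
    """
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)
    # Shift by the lowest level so the exponentials stay well scaled.
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z_shifted = sum(weights)
    probs = [w / z_shifted for w in weights]
    inner_energy = sum(p * e for p, e in zip(probs, energies))
    entropy = -K_B * sum(p * math.log(p) for p in probs if p > 0.0)
    # ln Z = ln(z_shifted) - beta * e_min, hence F = e_min - kT ln(z_shifted).
    helmholtz = e_min - K_B * temperature * math.log(z_shifted)
    return probs, inner_energy, entropy, helmholtz


# Hypothetical three-level system at T = 300 K (energies in joules).
levels = [0.0, 2.0e-21, 5.0e-21]
probs, U, S, F = canonical_summary(levels, 300.0)
print("probabilities:", probs)
print("F - (U - T S):", F - (U - 300.0 * S))
```

The last printed value should vanish up to rounding, which is the identity *F* = *U* − *TS* of (14). For a protein in solution the sum runs over a continuum of states, and this direct evaluation is exactly what becomes infeasible.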

### **4.2. The grand canonic ensemble**

The grand canonic ensemble, or macroscopic ensemble, deals with an open thermodynamic system Γ, i.e., not only energy but also matter particles can be exchanged between Γ and the environment. Therefore, the particle number *N* in Γ is variable.

In classical mechanics, suppose that the phase space is (**q**, **p**) ∈ Γ<sup>*N*</sup> × **R**<sup>3*N*</sup>. Let *H* be the Hamiltonian; the grand canonic phase-space density is

$$p\_{gc}(\mathbf{q}, \mathbf{p}, N) = \frac{\exp[-\beta(H - \mu N)]}{\sum\_{N=0}^{\infty} \frac{1}{N! h^{3N}} \int\_{\Gamma^N} \mathrm{d}\mathbf{q}^N \int\_{\mathbb{R}^{3N}} \exp[-\beta(H(\mathbf{q}, \mathbf{p}) - \mu N)] \mathrm{d}\mathbf{p}^N} = \frac{\exp[-\beta(H - \mu N)]}{\mathcal{Z}(T, V, \mu)},\tag{15}$$

where *V* = *V*(Γ) is the volume of the system. By definition the entropy is

$$S(\Gamma) = \langle -k\ln p\_{gc} \rangle = k \sum\_{N=0}^{\infty} \int\_{\Gamma^N} \mathrm{d}\mathbf{q}^N \int\_{\mathbb{R}^{3N}} \{\beta[H(\mathbf{q}, \mathbf{p}) - \mu N] + \ln \mathcal{Z}\} p\_{gc}(\mathbf{q}, \mathbf{p}) \mathrm{d}\mathbf{p}^N$$

$$= \frac{1}{T} \left[ \langle H \rangle - \mu \langle N \rangle + kT \ln \mathcal{Z} \right]. \tag{16}$$

Here ⟨*H*⟩ = *U* is the inner energy of the system Γ and ⟨*N*⟩ = *N*(Γ) is the mean number of particles in Γ. More importantly, the function −*kT* ln Z(*T*, *V*, *μ*) is nothing but the grand canonic potential *φ*, which by equation (8) is just −*PV*. Thus

$$G = U + PV - TS = \mu \langle N \rangle. \tag{17}$$
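The step from (16) to (17) can be written out explicitly. The following is a short sketch that uses only (8), (16), and the identification *φ* = −*kT* ln Z(*T*, *V*, *μ*) stated above, so that *kT* ln Z = *PV*:

$$TS = \langle H \rangle - \mu \langle N \rangle + kT \ln \mathcal{Z} = U - \mu \langle N \rangle + PV, \qquad G = U + PV - TS = U + PV - \left(U - \mu \langle N \rangle + PV\right) = \mu \langle N \rangle.$$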


## **5. Experimental measuring and theoretical derivation of the Gibbs free energy of protein folding**

The newly synthesized peptide chain of a protein automatically folds to its native structure in the physiological environment. A change of environment will denature the protein, i.e., the protein no longer performs its biological function. That denaturation does not change the protein molecule itself, and that the only thing changed is its structure, was first theorized by Hsien Wu on the basis of his own extensive experiments, Hsien Wu (1931). It was found that after removing the agents that caused the change of environment, some proteins automatically retake their native structure; this is called renaturation or refolding. After many experiments in denaturation and renaturation, Anfinsen summarized the Thermodynamic Principle as the fundamental law of protein folding, Anfinsen (1973). Anfinsen's work shows that the protein refolds spontaneously after the denaturing agents are removed. Therefore, in the physiological or a similar environment, the native structure has the minimum Gibbs free energy; and in a changed environment, the denatured structure(s) will have the smaller Gibbs free energy. The Thermodynamic Principle of protein folding is then just the general thermodynamic law: if a change happens spontaneously, the end state has smaller Gibbs free energy than the initial state.

To apply the Thermodynamic Principle in the research of protein folding, it is necessary to know the Gibbs free energy formula *G*(**X**) for each conformation **X**. Until now, a theoretical derivation of *G*(**X**) has rarely been attempted and has not been successful. Most knowledge of the Gibbs free energy of protein folding comes from experimental observations.

### **5.1. Experimental measuring of** Δ*G*


The basic principle of experimentally measuring Δ*G*, the difference in Gibbs free energy between the native and the denatured structures of a protein, is as follows. For protein molecules in a solution, the criterion that the protein is in its native structure is that it performs its biological function; otherwise the protein is denatured, that is, not in the native structure. The level of biological function indicates the degree of denaturation. Let *B* be the native structure and denote its molar concentration by [*B*]. Let *A* be a non-native structure of the same protein in the solution and [*A*] its molar concentration.

Three things should be borne in mind: 1. the environment is the physiological environment or a similar one in which the protein can fold spontaneously; 2. an individual molecule cannot be measured directly, so the measurement is per mole, and *R* = *N*<sub>*A*</sub>*k* should be used instead of *k*, where *N*<sub>*A*</sub> is Avogadro's number; 3. the environment in reality has constant pressure *P*, hence the enthalpy *H* = *U* + *PV* can replace the inner energy *U*, where *V* is the volume of the system (the system is a subset of the whole **R**<sup>3</sup>).

As expressed in (9), the probabilities that the protein takes the conformations *A* and *B* are

$$p\_A \propto W\_A \exp\left(-\frac{H\_A}{RT}\right), \quad p\_B \propto W\_B \exp\left(-\frac{H\_B}{RT}\right),\tag{18}$$

where *H*<sub>*A*</sub> = *U*<sub>*A*</sub> + *PV* and *H*<sub>*B*</sub> = *U*<sub>*B*</sub> + *PV* are the enthalpies per mole of *A* and *B*, and *W*<sub>*A*</sub> (*W*<sub>*B*</sub>) is the number of ways in which the enthalpy *H*<sub>*A*</sub> (*H*<sub>*B*</sub>) can be achieved by microscopic states. The quantities [*A*] and [*B*] are assumed to be measurable in experiment. Therefore their ratio *K* = [*A*]/[*B*] is also measurable. Then

$$
\triangle G^{\circ} = -RT\ln K.\tag{19}
$$

To see that equation (19) is true, note that the ratio *K* is equal to the ratio *p*<sub>*A*</sub>/*p*<sub>*B*</sub> and that the entropies per mole are *S*<sub>*A*</sub> = *R* ln *W*<sub>*A*</sub> and *S*<sub>*B*</sub> = *R* ln *W*<sub>*B*</sub>; therefore

$$-RT\ln K = -RT\ln\frac{p\_A}{p\_B} = -RT\left(\frac{H\_B}{RT} - \frac{H\_A}{RT}\right) - RT(\ln W\_A - \ln W\_B)$$

$$= H\_A - H\_B - T(R\ln W\_A - R\ln W\_B) = H\_A - H\_B - T(S\_A - S\_B)$$

$$= H\_A - TS\_A - (H\_B - TS\_B) = G\_A - G\_B = \triangle G^\circ. \tag{20}$$
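Equations (18) to (20) can be checked with a few lines of code. The sketch below is only an illustration: the function names and the enthalpy and multiplicity values are hypothetical, chosen to show that computing Δ*G*° from the concentration ratio, as in (19), agrees with computing it from enthalpies and entropies, as in (20).

```python
import math

R_GAS = 8.314462618  # molar gas constant in J/(mol K)


def delta_g_from_ratio(conc_a, conc_b, temperature):
    """Delta G = G_A - G_B = -RT ln([A]/[B]) in J/mol, as in (19)."""
    if conc_a <= 0.0 or conc_b <= 0.0:
        raise ValueError("both concentrations must be positive")
    return -R_GAS * temperature * math.log(conc_a / conc_b)


def delta_g_from_states(h_a, w_a, h_b, w_b, temperature):
    """(H_A - T S_A) - (H_B - T S_B) with S = R ln W, as in (20)."""
    s_a = R_GAS * math.log(w_a)
    s_b = R_GAS * math.log(w_b)
    return (h_a - temperature * s_a) - (h_b - temperature * s_b)


def boltzmann_ratio(h_a, w_a, h_b, w_b, temperature):
    """p_A / p_B from (18); the common proportionality constant cancels."""
    rt = R_GAS * temperature
    return (w_a / w_b) * math.exp(-(h_a - h_b) / rt)


# Hypothetical values: a denatured state A with higher enthalpy and far more
# microscopic realizations than the native state B.
T = 298.15
H_A, W_A = 200.0e3, 1.0e30   # J/mol, dimensionless
H_B, W_B = 0.0, 1.0
K = boltzmann_ratio(H_A, W_A, H_B, W_B, T)
print("K =", K)
print("from (19):", delta_g_from_ratio(K, 1.0, T))
print("from (20):", delta_g_from_states(H_A, W_A, H_B, W_B, T))
```

With these made-up numbers *K* is much smaller than 1 and Δ*G*° is positive, i.e., the native structure *B* has the lower Gibbs free energy.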

In reality, however, the measurability of the ratio *K* is only theoretical, since in the physiological environment *K* ≅ 0, i.e., almost all protein molecules take the native structure *B*.

There is no way to change the native structure *B* to *A* while keeping the environment unchanged. In experiments, one has to change the environment to denature the protein, that is, to change its shape from the native structure *B* to another conformation *A*. Heating the solution is a simple way to change the environment. During the heating, the system absorbs an amount of heat *H* and its temperature increases from *T*<sub>0</sub> to *T*<sub>1</sub>. Then

$$G(A, T\_1) - G(B, T\_0) = f(H),\tag{21}$$


where *f*(*H*) is a function depending on *H* whose value is obtained from experiment. What is really needed is

$$
\triangle G = G(A, T\_0) - G(B, T\_0). \tag{22}
$$

To get Δ*G*, interpolation of equation (21) is used to estimate the value at *T*<sub>0</sub>. Other methods of changing the environment face the same problem, i.e., the interpolation has neither a theoretical nor an observational basis.

Equation (19) may explain why Δ*G* is used whenever the Gibbs free energy is referred to: in experiments, only Δ*G* can be obtained. In a theoretical derivation this rule no longer has to be followed; moreover, without a base structure to compare to, the notation Δ*G* would look strange.

More importantly, it should be emphasized again that the Thermodynamic Principle really says that in the physiological environment the native structure has the minimum Gibbs free energy, and that in another environment the native structure no longer has the minimum Gibbs free energy. In summary,

$$G(B, T\_0) < G(A, T\_0), \quad G(A, T\_1) < G(B, T\_1). \tag{23}$$

One should always keep in mind that the environment must be made clear before any comparison.

When deriving the Gibbs free energy formula, the first thing is likewise to make clear what the environment is. Another fact that should be borne in mind is that the environment does not change during the protein folding process.

Remember that after the denaturing agent is removed, some proteins spontaneously refold to their native structure; this is called refolding or renaturation. Distinguishing the original protein folding problem from the protein refolding problem is another important issue. Only in the refolding case can a theoretical derivation let the environment change, for example, by lowering the temperature to room temperature (around 300 K). Some discussions of protein folding are really about refolding, because they start by changing the environment from nonphysiological to physiological.

While experiment has no way to change the native structure without disturbing the environment, theory can play a role instead. Formulas (1) and (2) give us the chance to compare Δ*G*, as long as accurate values of the chemical potentials are known.

### **5.2. Theoretical consideration of the protein folding problem**

Protein folding is a highly practical field. Very little attention has been paid to its theoretical part. For example, almost nobody has seriously considered the Gibbs free energy formula. Instead, all kinds of empirical models are tried in computer simulation, without any justification from fundamental principles.

One attempt to derive the Gibbs free energy formula theoretically from the canonic ensemble is summarized by Lazaridis and Karplus (2003). The theoretical part of it is reported below, and the reason why it is not successful is briefly pointed out. Their notations, such as **R** = **X** for the conformation, *A* = *F* for the Helmholtz free energy, *Q* = Z for the partition function, Λ = *h*, etc., are kept in this section.

The protein folding system is treated as the set of all conformations plus the surrounding water molecules, with a phase point (**R**, **r**), where **r** denotes the coordinates of the *N* water molecules plus their orientations. The Hamiltonian *H* can be decomposed as

$$H = H\_{mm} + H\_{mw} + H\_{ww}, \tag{24}$$

where *mm* means interactions inside the protein, *mw* interactions between the protein and water molecules, and *ww* water-to-water interactions, all at the atomic level. Triplet interactions *mmm*, *mmw*, etc., can also be considered, but for simplicity only the pairwise atomic interactions are taken.

Applying the canonic ensemble, the canonic partition function is

$$Q = \frac{\int \exp(-\beta H) \mathbf{d} \mathbf{r}^N d\mathbf{R}^M}{N! \Lambda^{3M} \Lambda^{3N}} = \frac{Z}{N! \Lambda^{3M} \Lambda^{3N}}.$$

and the Helmholtz free energy is given by

$$A = -kT \ln Q = -kT \ln \left[ \int \exp(-\beta H) \mathbf{d} \mathbf{r}^{N} \mathbf{d} \mathbf{R}^{M} \right] + kT \ln(N! \Lambda^{3M} \Lambda^{3N}).\tag{25}$$

To separate the contributions made by water molecules and the conformations, the *effective energy W* is defined,

$$\exp(-\beta W) = \exp(-\beta H\_{mm}) \frac{\int \exp(-\beta H\_{mw} - \beta H\_{ww}) \mathrm{d}\mathbf{r}^{N}}{\int \exp(-\beta H\_{ww}) \mathrm{d}\mathbf{r}^{N}} = \exp(-\beta H\_{mm}) \exp(-\beta X). \tag{26}$$

Define


$$\langle \exp(-\beta H\_{mw}) \rangle\_o = \frac{\int \exp(-\beta H\_{mw}) \exp(-\beta H\_{ww}) \mathrm{d}\mathbf{r}^N}{\int \exp(-\beta H\_{ww}) \mathrm{d}\mathbf{r}^N}. \tag{27}$$

The effective energy *W*(**R**) is:

$$W(\mathbf{R}) = H\_{mm}(\mathbf{R}) + X(\mathbf{R}) = H\_{mm}(\mathbf{R}) - kT \ln \langle \exp(-\beta H\_{mw}) \rangle\_o \equiv H\_{mm}(\mathbf{R}) + \triangle G^{\rm slv}(\mathbf{R}). \tag{28}$$

The term Δ*G*<sup>slv</sup>(**R**) is called the *solvation free energy*, while *H*<sub>*mm*</sub> is the *intra-macromolecular energy*.

After changing **R** to interior coordinates **q**, it is stated that

$$Z = V8\pi^2 \int \exp(-\beta H\_{ww}) \mathrm{d}\mathbf{r}^N \int \exp(-\beta W) \mathrm{d}\mathbf{q}, \tag{29}$$

because the interior coordinates have only 3*M* − 6 dimensions, and integrating the remaining 6 dimensions over the system gives the value *V*8*π*<sup>2</sup>; this implies that each **x**<sub>*i*</sub> in **R** can be any point in the system, which has volume *V*. As usual, the probability of finding the system at the configuration **q** is:

$$p(\mathbf{q}) = \frac{\exp[-\beta W(\mathbf{q})]}{\int \exp[-\beta W(\mathbf{q})] \mathbf{dq}}.\tag{30}$$
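Equations (27), (28), and (30) can be illustrated numerically. The sketch below is not part of the derivation of Lazaridis and Karplus (2003); it assumes, purely for illustration, that the solvent has a finite list of states and that there are finitely many conformations, so that the integrals over **r**<sup>*N*</sup> and **q** become sums. The function names and all numbers in the example are hypothetical.

```python
import math


def solvation_free_energy(h_mw, h_ww, kT):
    """X(R) = -kT ln <exp(-beta H_mw)>_o, with the integrals in (27) replaced by sums.

    h_mw[j] and h_ww[j] are the protein-water and water-water energies of
    solvent state j for one fixed conformation R.
    """
    beta = 1.0 / kT
    shift = min(h_ww)  # cancels in the ratio; only keeps the exponentials well scaled
    solvent_weights = [math.exp(-beta * (e - shift)) for e in h_ww]
    numerator = sum(w * math.exp(-beta * e) for w, e in zip(solvent_weights, h_mw))
    return -kT * math.log(numerator / sum(solvent_weights))


def effective_energy(h_mm, h_mw, h_ww, kT):
    """W(R) = H_mm(R) + X(R), as in (28)."""
    return h_mm + solvation_free_energy(h_mw, h_ww, kT)


def conformation_probabilities(w_values, kT):
    """Discrete analogue of (30): p is proportional to exp(-beta W)."""
    w_min = min(w_values)
    weights = [math.exp(-(w - w_min) / kT) for w in w_values]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy data: two conformations, three solvent states,
# all energies in units of kT.
kT = 1.0
h_ww = [0.0, 0.5, 1.0]
w_values = [
    effective_energy(-2.0, [-1.0, -0.5, 0.0], h_ww, kT),
    effective_energy(-1.0, [-3.0, -2.0, -1.5], h_ww, kT),
]
print("W values:", w_values)
print("probabilities:", conformation_probabilities(w_values, kT))
```

In this toy setting the second conformation has the higher intra-macromolecular energy but the more favorable solvation term, so it ends up with the lower *W* and the higher probability. The example only shows how the quantities combine; it says nothing about how to evaluate the real multiple integrals.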



mind that useful Gibbs free energy formula should be calculable.

system T**<sup>X</sup>** associated with **X**; 3. use the grand canonic ensemble.

indexed by **R**, *G* = *G*(**R**).

be applied.

**formula**

imitate successful classical examples.

Consequently,

$$\int p(\mathbf{q}) \ln p(\mathbf{q}) d\mathbf{q} = -\ln Z + \ln \int \exp(-\beta H\_{\text{ww}}) d\mathbf{r}^{N} + \ln(V8\pi^{2}) - \beta \int p(\mathbf{q}) \mathcal{W}(\mathbf{q}) d\mathbf{q}, \tag{31}$$

From equation (25),

$$\begin{aligned} A &= -kT \ln \int \exp(-\beta H\_{ww}) \mathrm{d}\mathbf{r}^N + kT \ln\left(\frac{\Lambda^{3M}}{V8\pi^2}\right) + \int p(\mathbf{q}) W(\mathbf{q}) \mathrm{d}\mathbf{q} + kT \int p(\mathbf{q}) \ln p(\mathbf{q}) \mathrm{d}\mathbf{q} \\ &= A^o + kT \ln\left(\frac{\Lambda^{3M}}{V8\pi^2}\right) + \langle W \rangle - TS^{\text{conf}}, \end{aligned} \tag{32}$$

where *A*<sup>*o*</sup> = −*kT* ln ∫ exp(−*βH*<sub>*ww*</sub>) d**r**<sup>*N*</sup> is the Helmholtz free energy of the pure solvent; the term −*TS*<sup>conf</sup> = *kT* ∫ *p*(**q**) ln *p*(**q**) d**q** is the contribution of the configurational entropy of the macromolecule to the free energy.
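The conformation-dependent part of equation (32) can be checked numerically. The following is only a minimal sketch under stated assumptions: the interior coordinates are reduced to a single coordinate *q*, the function `double_well` is a hypothetical effective energy *W*(*q*) in kcal/mol chosen purely for illustration, and integrals are replaced by the trapezoidal rule. It builds *p*(**q**) as in equation (30) and confirms that ⟨*W*⟩ − *TS*<sup>conf</sup> equals −*kT* ln ∫ exp(−*βW*) d**q**, which is what equations (31) and (32) state once the solvent and constant terms are set aside.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def trapezoid(values, step):
    """Trapezoidal rule on a uniform grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def conformational_terms(w, q_grid, temperature):
    """Return p(q), <W>, -T*S_conf and -kT ln(integral of exp(-beta W))."""
    kt = K_B * temperature
    dq = q_grid[1] - q_grid[0]
    w_vals = [w(q) for q in q_grid]
    w_min = min(w_vals)  # shift energies to avoid overflow in exp
    boltz = [math.exp(-(wv - w_min) / kt) for wv in w_vals]
    z_w = trapezoid(boltz, dq)
    p = [b / z_w for b in boltz]  # equation (30) on the grid
    mean_w = trapezoid([pi * wi for pi, wi in zip(p, w_vals)], dq)
    # -T*S_conf = kT * integral of p ln p
    minus_ts = kt * trapezoid(
        [pi * math.log(pi) if pi > 0.0 else 0.0 for pi in p], dq
    )
    free = w_min - kt * math.log(z_w)
    return p, mean_w, minus_ts, free


def double_well(q):
    """Hypothetical effective energy W(q), illustration only."""
    return 5.0 * (q * q - 1.0) ** 2


grid = [-2.0 + 4.0 * i / 400 for i in range(401)]
p, mean_w, minus_ts, free = conformational_terms(double_well, grid, 298.15)
print(mean_w + minus_ts, free)  # the two numbers agree
```

Note that the check says nothing about how *W* itself is obtained; it only illustrates the bookkeeping between ⟨*W*⟩ and the configurational entropy.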

The Gibbs free energy is *G* = *A* + *PV*. Since the *PV* term is thought to be negligible under ambient conditions, the Gibbs and Helmholtz free energies are considered identical.

Now, for any subset *A* ⊂ Γ, restricting the integrals to *A* gives the Helmholtz free energy *A*<sub>*A*</sub>, i.e.,

$$A\_A = A^o + kT \ln \left(\frac{\Lambda^{3M}}{V 8\pi^2}\right) + \langle \mathcal{W} \rangle\_A - TS\_A^{\text{conf}}.\tag{33}$$

Thus for two different subsets *A* and *B*, the difference in the Helmholtz free energy is

$$
\begin{split}
\triangle A &= A\_B - A\_A = \langle \mathcal{W} \rangle\_B - \langle \mathcal{W} \rangle\_A - T(S\_B^{\text{conf}} - S\_A^{\text{conf}}) \\ &= \triangle \langle H\_{\text{mm}} \rangle + \triangle \langle \triangle G^{\text{slv}} \rangle - T \triangle S^{\text{conf}}.
\end{split}
\tag{34}
$$

In particular, "If A is the denatured state and B the native state, both of which have to be defined in some way and both of which include many configurations, Eq. (34) gives the free energy of folding."
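To make the role of the two subsets in equation (34) concrete, here is a small sketch under loud assumptions: each subset is replaced by a finite list of conformations with hypothetical effective energies (in kcal/mol), integrals over **q** become sums, and the terms common to both subsets (*A*<sup>*o*</sup> and the Λ term) are dropped because they cancel in the difference. The function names are illustrative and do not come from Lazaridis and Karplus (2003).

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def subset_free_energy(w_values, temperature):
    """Return (<W>_A, -T*S_A, <W>_A - T*S_A) for a finite set of conformations."""
    kt = K_B * temperature
    w_min = min(w_values)
    weights = [math.exp(-(w - w_min) / kt) for w in w_values]
    z = sum(weights)
    probs = [x / z for x in weights]
    mean_w = sum(p * w for p, w in zip(probs, w_values))
    minus_ts = kt * sum(p * math.log(p) for p in probs if p > 0.0)
    return mean_w, minus_ts, mean_w + minus_ts


def folding_free_energy(w_denatured, w_native, temperature=298.15):
    """Difference B - A as in equation (34), with A denatured and B native."""
    mean_a, mts_a, a_a = subset_free_energy(w_denatured, temperature)
    mean_b, mts_b, a_b = subset_free_energy(w_native, temperature)
    return {
        "delta_mean_W": mean_b - mean_a,
        "minus_T_delta_S": mts_b - mts_a,
        "delta_A": a_b - a_a,
    }


# Hypothetical data: 1000 equally likely denatured conformations at W = 0
# and a single native conformation at W = -5 kcal/mol.
result = folding_free_energy([0.0] * 1000, [-5.0])
print(result)
```

In this toy case the energetic gain of the native state is partly offset by the loss of configurational entropy, which is exactly the competition that equation (34) expresses.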

#### **5.3. Critics of the derivation in Lazaridis and Karplus (2003)**

Protein folding is considered a very practical research field; the dominating activities are computer simulations with empirical models. There are very few theoretical discussions of protein folding. The derivation in Lazaridis and Karplus (2003) is a rare example, and it deserves analysis to see why there has been no theoretical progress in this field for decades. Many lessons can be learned from this example.

One important lesson from the derivation in Lazaridis and Karplus (2003) is that when dealing with thermodynamics and statistical mechanics, the thermodynamic system must be clearly defined. The system occupies a region of **R**<sup>3</sup>: what is it, and how is it delimited?

More importantly, the derivation considers not just one conformation **R** but all conformations of a single protein. As a single point **R** ∈ **R**<sup>3*M*</sup>, the conformation **R** carries no structural features, i.e., this particle is structureless. But the object of research is the conformation of the protein, so conformations cannot be treated as structureless particles. Classical derivations such as that for the ideal gas are indeed set up this way, but that is because the interest there lies not in the individual particle's structure but in the macroscopic properties of the ideal gas. The lesson is that instead of considering all conformations together in one system, a specific thermodynamic system has to be tailored to each individual conformation **R**. Such a system contains only one conformation **R**, with its structural geometry, together with other particles such as water molecules; thus the Gibbs free energy of such a system will be indexed by **R**, *G* = *G*(**R**).


Perhaps the biggest lesson is that when solving a problem, one should concentrate on the specific features of the problem in designing ways to attack it, not just imitate successful classical examples.

The derivation of Lazaridis and Karplus (2003) offers the effective energy *W*(**R**) as a substitute for the Gibbs free energy, without a theoretical basis for its relation to the Thermodynamic Principle. Moreover, the formula for *W*(**R**) says nothing about how to calculate it; everything is buried in multiple integrations without clear delimitation. Since *W* is the only function of an individual conformation **R**, Lazaridis and Karplus (2003) point out that "The function W defines a hyper-surface in the conformation space of the macromolecule in the presence of equilibrated solvent and, therefore, includes the solvation entropy. This hyper-surface is now often called an 'energy landscape'. It determines the thermodynamics and kinetics of macromolecular conformational transitions." From this comment it can be seen that the authors are not against quantities defined for individual conformations, such as *W*(**R**), and consider them important to the study of protein folding. If the "effective energy" *W*(**R**) is replaced by the Gibbs free energy *G*(**R**), the comment really makes sense. The lesson is never to invent theoretical concepts without a firm theoretical basis. Another is always to keep in mind that a useful Gibbs free energy formula should be calculable.

From now on, the notation **X** = **R** will be used to represent a conformation. To put the Thermodynamic Principle into practice, rather than merely talk about it, what is really needed is *G*(**X**), the Gibbs free energy of each individual conformation **X**, not the effective energy *W*(**R**). One hopes that the formula *G*(**X**) will be calculable, not buried in multiple integrations. To get such a formula, the grand canonic ensemble and eventually quantum statistics have to be applied.

## **6. Necessary preparations for the derivation of the Gibbs free energy formula**

Summarizing what has been learned from the critique of the derivation in Lazaridis and Karplus (2003), any attempt to derive the Gibbs free energy formula has to: 1. clearly state all assumptions used in the derivation; 2. for each conformation **X**, set up a thermodynamic system 𝒯<sub>**X**</sub> associated with **X**; and 3. use the grand canonic ensemble.

## **6.1. The assumptions**

All assumptions here are based on well-known facts of consensus among protein folding students. Let U be a protein with *M* atoms (**a**<sub>1</sub>, ··· , **a**<sub>*i*</sub>, ··· , **a**<sub>*M*</sub>). A structure (conformation) of U is a point **X** = (**x**<sub>1</sub>, ··· , **x**<sub>*i*</sub>, ··· , **x**<sub>*M*</sub>) ∈ **R**<sup>3*M*</sup>, where **x**<sub>*i*</sub> ∈ **R**<sup>3</sup> is the atomic center (nuclear) position of **a**<sub>*i*</sub>. Alternatively, the conformation **X** corresponds to a subset in **R**<sup>3</sup>, *P*<sub>**X**</sub> = ∪<sub>*i*=1</sub><sup>*M*</sup> *B*(**x**<sub>*i*</sub>, *r*<sub>*i*</sub>) ⊂ **R**<sup>3</sup>, where *r*<sub>*i*</sub>'s are van der Waals radii.

1. The proteins discussed here are monomeric, single domain, self folding globular proteins.

2. Therefore, in the case of our selected proteins, the environment of the protein folding, the physiological environment, is pure water, there are no other elements in the environment, no chaperonins, no co-factors, etc. This is a rational simplification, at least when one considers the environment as only the first hydration shell of a conformation, as in our derivation of the *G*(**X**).

3. During the folding, the environment does not change.

4. Anfinsen (1973) showed that before folding, the polypeptide chain already has its main chain's and each residue's covalent bonds correctly formed. Hence, our conformations should satisfy the following steric conditions set in Fang (2005) and Fang and Jing (2010): there are *ε*<sub>*ij*</sub> > 0, 1 ≤ *i* < *j* ≤ *M*, such that for any two atoms **a**<sub>*i*</sub> and **a**<sub>*j*</sub> in *P*<sub>**X**</sub> = ∪<sub>*k*=1</sub><sup>*M*</sup> *B*(**x**<sub>*k*</sub>, *r*<sub>*k*</sub>),

$$\begin{array}{ll} \epsilon\_{ij} \le |\mathbf{x}\_{i} - \mathbf{x}\_{j}|, & \text{no covalent bond between } \mathbf{a}\_{i} \text{ and } \mathbf{a}\_{j};\\ d\_{ij} - \epsilon\_{ij} \le |\mathbf{x}\_{i} - \mathbf{x}\_{j}| \le d\_{ij} + \epsilon\_{ij}, & d\_{ij} \text{ is the standard bond length between } \mathbf{a}\_{i} \text{ and } \mathbf{a}\_{j}. \end{array} \tag{35}$$

All conformations satisfying the steric conditions (35) will be denoted as 𝔛 and in this chapter only **X** ∈ 𝔛 will be considered.

5. A water molecule is treated as a single particle centered at the oxygen nuclear position **w** ∈ **R**<sup>3</sup>, and the covalent bonds in it are fixed. In the Born-Oppenheimer approximation, only the conformation **X** is fixed, all particles, water molecules or electrons in the first hydration shell of *P*<sub>**X**</sub>, are moving.

6. As in section 2.3, there are *H* hydrophobic levels *H*<sub>*i*</sub>, *i* = 1, ··· , *H*, such that ∪<sub>*i*=1</sub><sup>*H*</sup> *H*<sub>*i*</sub> = (**a**<sub>1</sub>, ··· , **a**<sub>*i*</sub>, ··· , **a**<sub>*M*</sub>).

## **6.2. The thermodynamic system** 𝒯<sub>**X**</sub>

Let *d*<sub>*w*</sub> be the diameter of a water molecule and *M*<sub>**X**</sub> be the molecular surface of *P*<sub>**X**</sub> as defined in Richards (1977) with the probe radius *d*<sub>*w*</sub>/2, see FIGURE 3. Define

$$\mathcal{R}\_{\mathbf{X}} = \{ \mathbf{x} \in \mathbb{R}^3 \colon \text{dist}(\mathbf{x}, M\_{\mathbf{X}}) \le d\_w \} \setminus P\_{\mathbf{X}} \tag{36}$$

as the first hydration shell surrounding *P*<sub>**X**</sub>, where dist(**x**, *S*) = inf<sub>**y**∈*S*</sub> |**x** − **y**|. Then 𝒯<sub>**X**</sub> = *P*<sub>**X**</sub> ∪ ℛ<sub>**X**</sub> will be our thermodynamic system of protein folding at the conformation **X**.

**Figure 3.** Two dimensional presenting of molecular surface Richards (1977) and solvent accessible surface Lee and Richards (1971). This figure was originally in Fang and Jing (2010).

Let *I*<sub>*i*</sub> ⊂ {1, 2, ··· , *M*} be the subset such that **a**<sub>*j*</sub> ∈ *H*<sub>*i*</sub> if and only if *j* ∈ *I*<sub>*i*</sub>. Define *P*<sub>**X***i*</sub> = ∪<sub>*j*∈*I*<sub>*i*</sub></sub> *B*(**x**<sub>*j*</sub>, *r*<sub>*j*</sub>) ⊂ *P*<sub>**X**</sub> and as shown in FIGURE 4,

$$\mathcal{R}\_{\mathbf{X}i} = \{ \mathbf{x} \in \mathcal{R}\_{\mathbf{X}} \colon \text{dist}(\mathbf{x}, P\_{\mathbf{X}i}) \le \text{dist}(\mathbf{x}, P\_{\mathbf{X}} \backslash P\_{\mathbf{X}i}) \}, \quad 1 \le i \le H. \tag{37}$$

Let *V*(Ω) be the volume of Ω ⊂ **R**<sup>3</sup>, then

$$\mathcal{R}\_{\mathbf{X}} = \cup\_{i=1}^{H} \mathcal{R}\_{\mathbf{X}i}, \quad V(\mathcal{R}\_{\mathbf{X}}) = \sum\_{i=1}^{H} V(\mathcal{R}\_{\mathbf{X}i}), \text{ and for } i \neq j, \quad V(\mathcal{R}\_{\mathbf{X}i} \cap \mathcal{R}\_{\mathbf{X}j}) = 0. \tag{38}$$

Since *M*<sub>**X**</sub> is a closed surface, it divides **R**<sup>3</sup> into two regions Ω<sub>**X**</sub> and Ω′<sub>**X**</sub> such that ∂Ω<sub>**X**</sub> = ∂Ω′<sub>**X**</sub> = *M*<sub>**X**</sub> and **R**<sup>3</sup> = Ω<sub>**X**</sub> ∪ *M*<sub>**X**</sub> ∪ Ω′<sub>**X**</sub>. Note that *P*<sub>**X**</sub> ⊂ Ω<sub>**X**</sub> and all nuclear centers of atoms in the water molecules in ℛ<sub>**X**</sub> are contained in Ω′<sub>**X**</sub>. Moreover, Ω<sub>**X**</sub> is bounded and therefore has a volume *V*(Ω<sub>**X**</sub>). For *S* ⊂ **R**<sup>3</sup>, denote by *S̄* the closure of *S*. Define the hydrophobicity subsurface *M*<sub>**X***i*</sub>, 1 ≤ *i* ≤ *H*, as

$$M\_{\mathbf{X}i} = M\_{\mathbf{X}} \cap \overline{\mathcal{R}\_{\mathbf{X}i}}.\tag{39}$$

Let *A*(*S*) be the area of a surface *S* ⊂ **R**<sup>3</sup>, then

$$M\_{\mathbf{X}} = \cup\_{i=1}^{H} M\_{\mathbf{X}i}, \quad A(M\_{\mathbf{X}}) = \sum\_{i=1}^{H} A(M\_{\mathbf{X}i}), \text{ and if } i \neq j, \text{ then } A(M\_{\mathbf{X}i} \cap M\_{\mathbf{X}j}) = 0. \tag{40}$$
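The partition of the hydration shell in equations (37) and (38) can be illustrated with a short Monte Carlo sketch. It rests on a simplification that is not part of the chapter: the shell is approximated as the set of points outside *P*<sub>**X**</sub> whose distance to the van der Waals union *P*<sub>**X**</sub> is at most `shell_width`, instead of using the distance to the molecular surface *M*<sub>**X**</sub> as in equation (36). All names (`dist_to_balls`, `classify_point`, `estimate_shell_volumes`) and the sample coordinates are hypothetical.

```python
import math
import random


def dist_to_balls(x, balls):
    """Distance from point x to a union of balls [(center, radius), ...]; 0 inside."""
    return min(max(0.0, math.dist(x, c) - r) for c, r in balls)


def classify_point(x, balls_by_class, shell_width):
    """Return the class index i with x in R_Xi (equation (37)), or None."""
    all_balls = [b for balls in balls_by_class for b in balls]
    d = dist_to_balls(x, all_balls)
    if d == 0.0 or d > shell_width:
        return None  # inside the protein or beyond the approximated shell
    dists = [dist_to_balls(x, balls) for balls in balls_by_class]
    # The nearest hydrophobicity class owns the point; ties have zero volume.
    return min(range(len(dists)), key=dists.__getitem__)


def estimate_shell_volumes(balls_by_class, shell_width, n_samples=20000, seed=0):
    """Monte Carlo estimates of V(R_Xi) for every class i."""
    rng = random.Random(seed)
    all_balls = [b for balls in balls_by_class for b in balls]
    lo = [min(c[k] - r for c, r in all_balls) - shell_width for k in range(3)]
    hi = [max(c[k] + r for c, r in all_balls) + shell_width for k in range(3)]
    box_volume = math.prod(h - l for h, l in zip(hi, lo))
    counts = [0] * len(balls_by_class)
    for _ in range(n_samples):
        x = tuple(rng.uniform(l, h) for l, h in zip(lo, hi))
        i = classify_point(x, balls_by_class, shell_width)
        if i is not None:
            counts[i] += 1
    return [box_volume * c / n_samples for c in counts]


# Two hypothetical atoms in different hydrophobicity classes (lengths in angstroms).
classes = [[((-1.5, 0.0, 0.0), 1.7)], [((1.5, 0.0, 0.0), 1.7)]]
print(estimate_shell_volumes(classes, shell_width=2.8))
```

Because every shell point is assigned to exactly one class, the estimated volumes add up to the estimated volume of the whole shell, mirroring equation (38).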

Although the shape of each atom in a molecule is well defined by the theory of atoms in molecules as in Bader (1990) and Popelier (2000), what concerns us here is the overall shape of the structure *P*<sub>**X**</sub>. The cutoff of electron density *ρ* ≥ 0.001 au in Bader (1990) and Popelier (2000) gives the overall shape of a molecular structure that is just like *P*<sub>**X**</sub>, a bunch of overlapping balls. Moreover, the boundary of the *ρ* ≥ 0.001 au cutoff is very similar to the molecular surface *M*<sub>**X**</sub>, which was defined by Richards (1977) and was shown in Tuñón *et al.* (1992) and Jackson and Sternberg (1993) to have more physical meaning as the boundary surface of the conformation *P*<sub>**X**</sub>.

**Figure 4.** Note that ℛ<sub>**X***i*</sub> generally is not connected, i.e., it may have more than one block.

## **7. Gibbs free energy formula: Classical statistical mechanics derivation**

The grand canonic ensemble (macroscopic ensemble) will be applied to derive the desired Gibbs free energy formula *G*(**X**). In addition to letting the number of water molecules vary, the assumption is that the chemical potential *μ* differs for water molecules contacting different hydrophobicity levels *H*<sub>*i*</sub> (or falling in ℛ<sub>**X***i*</sub>). Counting the numbers *N*<sub>*i*</sub> of water molecules that contact atoms in *H*<sub>*i*</sub>, the *N* and *μ* in equation (15) should be modified to *N* = (*N*<sub>1</sub>, ··· , *N*<sub>*i*</sub>, ··· , *N*<sub>*H*</sub>), *μ* = (*μ*<sub>1</sub>, ··· , *μ*<sub>*i*</sub>, ··· , *μ*<sub>*H*</sub>). Let (**q**, **p**) ⊂ ℛ<sub>**X**</sub><sup>*M*</sup> × **R**<sup>3*M*</sup> be the water molecules' phase space for a fixed *N*, where *M* = ∑<sub>*i*=1</sub><sup>*H*</sup> *N*<sub>*i*</sub> (in this section *M* counts water molecules, not the atoms of the protein). Let *H*<sub>**X**</sub> = *H*<sub>**X**</sub>(**q**, **p**) be the Hamiltonian. The grand canonic phase density function will be

$$\begin{aligned} p\_{\mathbf{X}}(\mathbf{q}, \mathbf{p}, N) &= \frac{\exp\{-\beta[H\_{\mathbf{X}}(\mathbf{q}, \mathbf{p}) - \sum\_{i=1}^{H} \mu\_{i} N\_{i}]\}}{\sum\_{M=0}^{\infty} \frac{1}{M! h^{3M}} \sum\_{N\_1 + \dots + N\_H = M} \prod\_{i=1}^{H} \int\_{\mathcal{R}\_{\mathbf{X}i}^{N\_{i}}} \mathrm{d}\mathbf{q}^{N\_{i}} \int\_{\mathbb{R}^{3M}} \exp\{-\beta[H\_{\mathbf{X}} - \sum\_{i=1}^{H} \mu\_{i} N\_{i}]\} \mathrm{d}\mathbf{p}^{3M}} \\ &= \frac{\exp\{-\beta[H\_{\mathbf{X}}(\mathbf{q}, \mathbf{p}) - \sum\_{i=1}^{H} \mu\_{i} N\_{i}]\}}{\mathcal{Z}(T, V, \mu)}. \end{aligned} \tag{41}$$

The entropy *S*(**X**) = *S*(T**X**) is


$$\begin{aligned} S = \langle -k\ln p\_{\mathbf{X}} \rangle &= -k\sum\_{M=0}^{\infty} \sum\_{N\_1 + \dots + N\_H = M} \prod\_{i=1}^{H} \int\_{\mathcal{R}\_{\mathbf{X}i}^{N\_i}} \mathrm{d}\mathbf{q}^{N\_i} \int\_{\mathbb{R}^{3M}} p\_{\mathbf{X}} \ln p\_{\mathbf{X}} \, \mathrm{d}\mathbf{p}^M \\ &= -k\sum\_{M=0}^{\infty} \sum\_{N\_1 + \dots + N\_H = M} \prod\_{i=1}^{H} \int\_{\mathcal{R}\_{\mathbf{X}i}^{N\_i}} \mathrm{d}\mathbf{q}^{N\_i} \int\_{\mathbb{R}^{3M}} \left[ \beta \sum\_{i=1}^H \mu\_i N\_i - \beta H\_{\mathbf{X}}(\mathbf{q}, \mathbf{p}) - \ln \mathcal{Z} \right] p\_{\mathbf{X}} \, \mathrm{d}\mathbf{p}^M \end{aligned} \tag{42}$$

$$\begin{aligned} S &= \frac{1}{T} \left[ \langle H \rangle - \sum\_{i=1}^{H} \mu\_{i} \langle N\_{i} \rangle + kT \ln \mathcal{Z}(T, V, \mu) \right] \\ &= \frac{1}{T} \left[ U(\mathcal{T}\_{\mathbf{X}}) - \sum\_{i=1}^{H} \mu\_{i} N\_{i}(\mathcal{T}\_{\mathbf{X}}) + kT \ln \mathcal{Z}(T, V, \mu) \right] \end{aligned} \tag{43}$$

where *U*(**X**) = *U*(𝒯<sub>**X**</sub>) = ⟨*H*⟩ is the internal energy and *N*<sub>*i*</sub>(**X**) = ⟨*N*<sub>*i*</sub>⟩ is the mean number of water molecules in ℛ<sub>**X***i*</sub>. By equation (8), *kT* ln 𝒵(*T*, *V*, *μ*) = −*φ*(*T*, *V*, *μ*) = *PV*(𝒯<sub>**X**</sub>). Therefore, from *G* = *U* + *PV* − *TS*,

$$G(\mathbf{X}) = G(\mathcal{T}\_\mathbf{X}) = U(\mathbf{X}) + PV(\mathcal{T}\_\mathbf{X}) - TS(\mathcal{T}\_\mathbf{X}) = \sum\_{i=1}^{H} \mu\_i N\_i(\mathbf{X}).\tag{44}$$
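The cancellation that leads from equation (43) to equation (44) can be verified on a toy system. The sketch below is only an illustration under explicit assumptions: the phase-space integrals are replaced by a sum over a finite list of states, each state has a hypothetical energy and occupation numbers (*N*<sub>1</sub>, *N*<sub>2</sub>), the chemical potentials are made-up values in kcal/mol, and *PV* is identified with *kT* ln 𝒵 as stated above. The function name `grand_canonical_check` is not from the chapter.

```python
import math
from itertools import product

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def grand_canonical_check(states, mu, temperature):
    """states: list of (energy, (N_1, ..., N_H)); returns (G, [<N_1>, ..., <N_H>])."""
    kt = K_B * temperature
    exponents = [
        -(e - sum(m * n for m, n in zip(mu, ns))) / kt for e, ns in states
    ]
    shift = max(exponents)  # stabilise the exponentials
    weights = [math.exp(x - shift) for x in exponents]
    z_shifted = sum(weights)
    ln_z = shift + math.log(z_shifted)
    probs = [w / z_shifted for w in weights]
    u = sum(p * e for p, (e, _) in zip(probs, states))  # U = <H>
    mean_n = [
        sum(p * ns[i] for p, (_, ns) in zip(probs, states))
        for i in range(len(mu))
    ]
    entropy = -K_B * sum(p * math.log(p) for p in probs if p > 0.0)
    pv = kt * ln_z  # PV = kT ln Z
    g = u + pv - temperature * entropy  # G = U + PV - TS
    return g, mean_n


def toy_energy(n1, n2):
    """Hypothetical state energy in kcal/mol, illustration only."""
    return -0.3 * n1 + 0.2 * n2 + 0.05 * (n1 + n2) ** 2


states = [(toy_energy(n1, n2), (n1, n2)) for n1, n2 in product(range(4), repeat=2)]
mu = (-0.5, 0.1)
g, mean_n = grand_canonical_check(states, mu, 298.15)
print(g, sum(m * n for m, n in zip(mu, mean_n)))  # the two numbers agree
```

The agreement does not depend on the particular energies chosen: *U* and *kT* ln 𝒵 cancel identically, leaving only the chemical-potential term.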

The Gibbs free energy given in formula (44) does not involve any integration at all, just counting the number of water molecules contacting atoms in *H*<sub>*i*</sub>. Furthermore, in contrast to the effective energy, the potential function *H*<sub>mm</sub> plays no role at all, a surprise indeed. But formula (44) is still not easy to calculate: counting the number of water molecules actually requires more knowledge of the conformation's boundary, the molecular surface *M*<sub>**X**</sub>. Formula (44) can be directly transferred into a geometric version.

#### **7.1. Converting formula (44) to a geometric version**

Since every water molecule in ℛ<sub>**X***i*</sub> has contact with the surface *M*<sub>**X***i*</sub>, *N*<sub>*i*</sub>(**X**) is proportional to the area *A*(*M*<sub>**X***i*</sub>). Therefore, there are *ν*<sub>*i*</sub> > 0 such that

$$\nu\_i A(M\_{\mathbf{X}i}) = N\_i(\mathbf{X}), \quad 1 \le i \le H. \tag{45}$$

Substituting into (44),

$$G(\mathbf{X}) = G(\mathcal{T}\_\mathbf{X}) = \sum\_{i=1}^{H} \mu\_i \nu\_i A(M\_{\mathbf{X}i}). \tag{46}$$

For each conformation **X**, the molecular surface *M*<sub>**X**</sub> is calculable, see Connolly (1983). The areas *A*(*M*<sub>**X**</sub>) and *A*(*M*<sub>**X***i*</sub>) are also calculable. Therefore, unlike the formula given in (34), this formula is calculable. Moreover, our derivation theoretically justifies the surface area models that will be discussed later; the only difference is that the molecular surface area is used here instead of the solvent accessible surface area.
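Once the subsurface areas are available, formula (46) is a plain weighted sum. The following sketch assumes the areas *A*(*M*<sub>**X***i*</sub>) have already been computed by some molecular-surface program; the numerical values of the areas, of *μ*<sub>*i*</sub> and of *ν*<sub>*i*</sub> below are hypothetical placeholders, not fitted parameters, and the function names are illustrative.

```python
def water_counts(areas, nu):
    """Mean water numbers N_i(X) = nu_i * A(M_Xi), equation (45)."""
    if len(areas) != len(nu):
        raise ValueError("areas and nu must have the same length H")
    return [v * a for v, a in zip(nu, areas)]


def gibbs_free_energy(areas, mu, nu):
    """G(X) = sum_i mu_i * nu_i * A(M_Xi), equation (46)."""
    if not (len(areas) == len(mu) == len(nu)):
        raise ValueError("areas, mu and nu must have the same length H")
    if any(a < 0 for a in areas) or any(v <= 0 for v in nu):
        raise ValueError("areas must be non-negative and nu_i positive")
    return sum(m * n for m, n in zip(mu, water_counts(areas, nu)))


# Hypothetical example with H = 2 hydrophobicity levels.
areas = [100.0, 50.0]  # subsurface areas in square angstroms
mu = [-0.5, 0.2]       # chemical potentials per water molecule
nu = [0.1, 0.1]        # water molecules per unit area
print(gibbs_free_energy(areas, mu, nu))
```

Comparing two conformations of the same protein then only requires recomputing the areas, since *μ*<sub>*i*</sub> and *ν*<sub>*i*</sub> do not depend on **X**.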


But something is still missing: the volume $V(\mathcal{T}\_{\mathbf{X}})$, an important thermodynamic quantity, does not appear here at all. There seems to be no way to bring $V(\mathcal{T}\_{\mathbf{X}})$ in within classical statistical mechanics. To resolve this, quantum statistical mechanics is necessary.

## **8. A quantum statistical theory of protein folding**

In 1929 Dirac wrote: "The underlying physical laws necessary for the mathematical theory of ... the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble." (quoted from Bader (1990), page 132). Yes, the multidimensional Schrödinger equation for protein folding is beyond our ability to solve, no matter how fast and powerful our computers are. But mathematical theory guarantees that there is a complete set of eigenvalues (energy levels) and eigenfunctions of the Schrödinger equation in the Born-Oppenheimer approximation. Moreover, in statistical mechanics an ensemble collects all (energy) states of the same system. Although one cannot have exact solutions to the Schrödinger equation, its eigenvalues are known to exist theoretically. Thus one can apply the grand canonical ensemble to obtain the desired Gibbs free energy formula $G(\mathbf{X})$. This is the main idea of the derivation.

### **8.1. The Schrödinger equation**

For any conformation $\mathbf{X} \in \mathfrak{X}$, let $\mathbf{W} = (\mathbf{w}\_1, \cdots, \mathbf{w}\_i, \cdots, \mathbf{w}\_N) \in \mathbf{R}^{3N}$ be the nuclear centers of water molecules in $\mathcal{R}\_{\mathbf{X}}$ and $\mathbf{E} = (\mathbf{e}\_1, \cdots, \mathbf{e}\_i, \cdots, \mathbf{e}\_L) \in \mathbf{R}^{3L}$ be the positions of all electrons in $\mathcal{T}\_{\mathbf{X}}$. Then the Hamiltonian for the system $\mathcal{T}\_{\mathbf{X}}$ is

$$\hat{H} = \hat{T} + \hat{V} = -\sum\_{i=1}^{M} \frac{\hbar^2}{2m\_i} \nabla\_i^2 - \frac{\hbar^2}{2m\_w} \sum\_{i=1}^{N} \nabla\_i^2 - \frac{\hbar^2}{2m\_e} \sum\_{i=1}^{L} \nabla\_i^2 + \hat{V}(\mathbf{X}, \mathbf{W}, \mathbf{E}), \tag{47}$$

where $m\_i$ is the nuclear mass of atom $\mathbf{a}\_i$, $m\_w$ and $m\_e$ are the masses of a water molecule and an electron, $\nabla\_i^2$ is the Laplacian in the corresponding $\mathbf{R}^3$, and $\hat{V}$ is the potential.

#### **8.2. The first step of the Born-Oppenheimer approximation**

Depending on the shape of $P\_{\mathbf{X}}$, for each $i$, $1 \le i \le H$, the maximum number $N^{\mathbf{X}}\_i$ of water molecules contained in $\mathcal{R}\_{\mathbf{X}i}$ varies. Theoretically all cases are considered, i.e., there are $0 \le N\_i \le N^{\mathbf{X}}\_i$ water molecules in $\mathcal{R}\_{\mathbf{X}i}$, $1 \le i \le H$. Let $M\_0 = 0$ and $M\_i = \sum\_{j \le i} N\_j$, and let $\mathbf{W}\_i = (\mathbf{w}\_{M\_{i-1}+1}, \cdots, \mathbf{w}\_{M\_{i-1}+j}, \cdots, \mathbf{w}\_{M\_i}) \in \mathcal{R}\_{\mathbf{X}i}^{N\_i}$, $1 \le i \le H$, and $\mathbf{W} = (\mathbf{W}\_1, \mathbf{W}\_2, \cdots, \mathbf{W}\_{M\_H}) \in \prod\_{i=1}^{H} \mathcal{R}\_{\mathbf{X}i}^{N\_i}$ denote the nuclear positions of water molecules in $\mathcal{R}\_{\mathbf{X}}$. Likewise, all possible numbers $0 \le N\_e < \infty$ of electrons in $\mathcal{T}\_{\mathbf{X}}$ are considered. Let $\mathbf{E} = (\mathbf{e}\_1, \mathbf{e}\_2, \cdots, \mathbf{e}\_{N\_e}) \in \mathbf{R}^{3N\_e}$ denote their positions. For each fixed $\mathbf{X} \in \mathfrak{X}$ and $N = (N\_1, \cdots, N\_H, N\_e)$, the Born-Oppenheimer approximation has the Hamiltonian


$$\hat{H}\_{\mathbf{X}} = -\frac{\hbar^2}{2} \left\{ \frac{1}{m\_w} \sum\_{j=1}^{M\_H} \nabla\_j^2 + \frac{1}{m\_e} \sum\_{\nu=1}^{N\_e} \nabla\_\nu^2 \right\} + \hat{V}(\mathbf{X}, \mathbf{W}, \mathbf{E}).\tag{48}$$

The eigenfunctions $\psi^{\mathbf{X},N}\_i(\mathbf{W}, \mathbf{E}) \in L^2\_0\left(\prod\_{i=1}^{H} \mathcal{R}\_{\mathbf{X}i}^{N\_i} \times \mathcal{T}\_{\mathbf{X}}^{N\_e}\right) = \mathcal{H}\_{\mathbf{X},N}$, $1 \le i < \infty$, comprise an orthonormal basis of $\mathcal{H}\_{\mathbf{X},N}$. Denote their eigenvalues (energy levels) by $E^i\_{\mathbf{X},N}$; then $\hat{H}\_{\mathbf{X}} \psi^{\mathbf{X},N}\_i = E^i\_{\mathbf{X},N} \psi^{\mathbf{X},N}\_i$.

#### **8.3. Grand partition function and grand canonic density operator**

Since the numbers $N\_i$ and $N\_e$ vary, the grand canonic ensemble should be adopted. Let $\mu\_i$ be the chemical potentials, that is, the Gibbs free energy per water molecule in $\mathcal{R}\_{\mathbf{X}i}$. Let $\mu\_e$ be the electron chemical potential. The grand canonic density operator is as in equation (15), see also Greiner *et al.* (1994) and Dai (2007),

$$\hat{\rho}\_{\mathbf{X}} = \exp\left\{-\beta \left[\hat{H}\_{\mathbf{X}} - \sum\_{i=1}^{H} \mu\_{i} \hat{N}\_{i} - \mu\_{e} \hat{N}\_{e} - \phi(\mathbf{X})\right]\right\},\tag{49}$$

where $\phi(\mathbf{X})$ is the grand canonic potential $\phi$ in equation (8) with the index $\mathbf{X}$, and the grand partition function is

$$\begin{aligned} \exp\left[-\beta\phi(\mathbf{X})\right] &= \text{Trace}\left\{\exp\left[-\beta\left(\hat{H}\_{\mathbf{X}} - \sum\_{i=1}^{H} \mu\_{i}\hat{N}\_{i} - \mu\_{e}\hat{N}\_{e}\right)\right]\right\} \\ &= \sum\_{i,N} \exp\left\{-\beta\left[E\_{\mathbf{X},N}^{i} - \sum\_{i=1}^{H} \mu\_{i}N\_{i} - \mu\_{e}N\_{e}\right]\right\}. \end{aligned} \tag{50}$$
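To make the structure of (49) and (50) concrete, here is a small Python sketch on a toy spectrum. It assumes a finite, made-up list of states, each given as an energy level together with its particle numbers, which stands in for the (in practice incomputable) eigenvalues $E^i\_{\mathbf{X},N}$. The names `State`, `grand_partition`, `grand_potential` and `density_weights` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# A state is (energy level E, particle numbers N); here N = (N_1, N_e).
State = Tuple[float, Tuple[int, ...]]


def shifted_energy(state: State, mu: Sequence[float]) -> float:
    """E - sum_j mu_j * N_j, the bracket in the second line of (50)."""
    energy, counts = state
    return energy - sum(m * n for m, n in zip(mu, counts))


def grand_partition(states: Sequence[State], mu: Sequence[float], beta: float) -> float:
    """Sum of exp(-beta * (E - mu.N)) over all listed states, as in (50)."""
    return sum(math.exp(-beta * shifted_energy(s, mu)) for s in states)


def grand_potential(states: Sequence[State], mu: Sequence[float], beta: float) -> float:
    """phi defined through exp(-beta * phi) = grand partition function."""
    return -math.log(grand_partition(states, mu, beta)) / beta


def density_weights(states: Sequence[State], mu: Sequence[float], beta: float) -> List[float]:
    """Diagonal entries of the density operator (49) in the energy eigenbasis."""
    phi = grand_potential(states, mu, beta)
    return [math.exp(-beta * (shifted_energy(s, mu) - phi)) for s in states]


# Hypothetical toy spectrum and chemical potentials (arbitrary units).
toy_states: List[State] = [(0.0, (0, 0)), (1.0, (1, 0)), (1.5, (1, 1)), (2.5, (2, 1))]
toy_mu = (0.3, -0.2)
weights = density_weights(toy_states, toy_mu, beta=1.0)
```

Because $\phi(\mathbf{X})$ is defined by (50), the weights returned by `density_weights` sum to one, which is the statement that the density operator in (49) has unit trace.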

#### **8.4. The Gibbs free energy** *G*(**X**)

As in equation (16), under the grand canonic ensemble the entropy $S(\mathbf{X}) = S(\mathcal{T}\_{\mathbf{X}})$ of the system $\mathcal{T}\_{\mathbf{X}}$ is

$$\begin{aligned} S(\mathbf{X}) &= -k\,\text{Trace}(\hat{\rho}\_{\mathbf{X}} \ln \hat{\rho}\_{\mathbf{X}}) = -k \langle \ln \hat{\rho}\_{\mathbf{X}} \rangle = k \beta \left\langle \hat{H}\_{\mathbf{X}} - \phi(\mathbf{X}) - \sum\_{i=1}^{H} \mu\_{i} \hat{N}\_{i} - \mu\_{e} \hat{N}\_{e} \right\rangle \\ &= \frac{1}{T} \left[ \langle \hat{H}\_{\mathbf{X}} \rangle - \langle \phi(\mathbf{X}) \rangle - \sum\_{i=1}^{H} \mu\_{i} \langle \hat{N}\_{i} \rangle - \mu\_{e} \langle \hat{N}\_{e} \rangle \right] \\ &= \frac{1}{T} \left[ U(\mathbf{X}) - \phi(\mathbf{X}) - \sum\_{i=1}^{H} \mu\_{i} N\_{i}(\mathbf{X}) - \mu\_{e} N\_{e}(\mathbf{X}) \right]. \end{aligned} \tag{51}$$

Denote $\langle \hat{N}\_i \rangle = N\_i(\mathbf{X})$ as the mean number of water molecules in $\mathcal{R}\_{\mathbf{X}i}$, $1 \le i \le H$, and $\langle \hat{N}\_e \rangle = N\_e(\mathbf{X})$ as the mean number of electrons in $\mathcal{T}\_{\mathbf{X}}$. The inner energy $\langle \hat{H}\_{\mathbf{X}} \rangle$ of the system $\mathcal{T}\_{\mathbf{X}}$ is denoted as $U(\mathbf{X}) = U(\mathcal{T}\_{\mathbf{X}})$. By equation (8) and the remark after it, $\phi(\mathbf{X})(T, V, \mu\_1, \cdots, \mu\_H, \mu\_e) = -PV(\mathbf{X})$, where $V(\mathbf{X}) = V(\mathcal{T}\_{\mathbf{X}})$ is the volume of the thermodynamic system $\mathcal{T}\_{\mathbf{X}}$. Thus by equation (51) the Gibbs free energy $G(\mathbf{X}) = G(\mathcal{T}\_{\mathbf{X}})$ in formula (1) is obtained:

$$G(\mathbf{X}) = G(\mathcal{T}\_{\mathbf{X}}) = PV(\mathbf{X}) + U(\mathbf{X}) - TS(\mathbf{X}) = \sum\_{i=1}^{H} \mu\_i N\_i(\mathbf{X}) + \mu\_e N\_e(\mathbf{X}).\tag{52}$$
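The step from (51) to (52) can be checked numerically on a toy system. The sketch below is only an illustration under stated assumptions: a finite, invented list of (energy, particle numbers) states replaces the true spectrum, the Boltzmann constant is set to 1, and the function name `gibbs_two_ways` is hypothetical. It computes $G = U + PV - TS$ with $PV = -\phi$ and compares it with $\sum\_i \mu\_i N\_i + \mu\_e N\_e$.

```python
import math
from typing import Sequence, Tuple

K_B = 1.0  # Boltzmann constant in toy units (assumption)

State = Tuple[float, Tuple[int, ...]]  # (energy level, particle numbers)


def gibbs_two_ways(states: Sequence[State], mu: Sequence[float],
                   temperature: float) -> Tuple[float, float]:
    """Return (U + PV - T*S, sum_j mu_j * <N_j>) for a toy grand canonical ensemble."""
    beta = 1.0 / (K_B * temperature)
    # E - mu.N for every state, the exponent appearing in (50).
    shifted = [e - sum(m * n for m, n in zip(mu, counts)) for e, counts in states]
    xi = sum(math.exp(-beta * x) for x in shifted)          # grand partition function
    phi = -math.log(xi) / beta                               # grand canonic potential
    p = [math.exp(-beta * (x - phi)) for x in shifted]       # weights from (49)

    u = sum(pi * e for pi, (e, _) in zip(p, states))         # inner energy <H>
    n_mean = [sum(pi * counts[j] for pi, (_, counts) in zip(p, states))
              for j in range(len(mu))]                       # mean particle numbers
    s = -K_B * sum(pi * math.log(pi) for pi in p)            # entropy, first form in (51)

    g_thermo = u - phi - temperature * s                     # U + PV - TS with PV = -phi
    g_counting = sum(m * n for m, n in zip(mu, n_mean))      # right-hand side of (52)
    return g_thermo, g_counting


toy_states = [(0.0, (0, 0)), (1.0, (1, 0)), (1.5, (1, 1)), (2.5, (2, 1))]
g_thermo, g_counting = gibbs_two_ways(toy_states, mu=(0.3, -0.2), temperature=0.8)
```

The two returned numbers agree up to rounding for any choice of toy spectrum, which reflects that (52) is an identity of the grand canonic ensemble and does not depend on the particular energy levels.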

### **8.5. Converting formula (1) to geometric form (2)**

As in the classical statistical mechanics case,

$$\nu\_i A(M\_{\mathbf{X}i}) = N\_i(\mathbf{X}), \quad 1 \le i \le H. \tag{53}$$


Similarly, there will be a $\nu\_e > 0$ such that $\nu\_e V(\mathcal{T}\_{\mathbf{X}}) = N\_e(\mathbf{X})$. By the definition of $\mathcal{T}\_{\mathbf{X}}$ and $\Omega\_{\mathbf{X}}$, roughly $V(\mathcal{T}\_{\mathbf{X}} \setminus \Omega\_{\mathbf{X}}) = d\_w A(M\_{\mathbf{X}})$. Thus

$$N\_e(\mathbf{X}) = \nu\_e V(\mathcal{T}\_{\mathbf{X}}) = \nu\_e [V(\Omega\_{\mathbf{X}}) + V(\mathcal{T}\_{\mathbf{X}} \setminus \Omega\_{\mathbf{X}})] = \nu\_e V(\Omega\_{\mathbf{X}}) + d\_w \nu\_e A(M\_{\mathbf{X}}). \tag{54}$$

Substituting equations (45) and (54) into formula (1), formula (2) is obtained.
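Written out, the substitution is a one-line computation. The display below is a sketch of that step using only (52), (53) and (54); formula (2) itself is stated earlier in the chapter and is not reproduced in this section, so the last line should be compared with (2) rather than read as a quotation of it.

$$\begin{aligned} G(\mathbf{X}) &= \sum\_{i=1}^{H} \mu\_i N\_i(\mathbf{X}) + \mu\_e N\_e(\mathbf{X}) \\ &= \sum\_{i=1}^{H} \mu\_i \nu\_i A(M\_{\mathbf{X}i}) + \mu\_e \nu\_e V(\Omega\_{\mathbf{X}}) + d\_w \mu\_e \nu\_e A(M\_{\mathbf{X}}). \end{aligned}$$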

## **9. Some remarks**

The question in applying fundamental physical laws directly to the protein folding problem is: can it be done? It should be checked how rigorous the derivation is and whether there are any fundamental errors. Possible ways to modify the formula or the derivation will also be discussed.

By applying quantum statistics, the protein folding problem is treated theoretically. A theory is useful only if it can explain the observed facts and if it can simplify and improve research methods as well as clarify concepts. It will be shown that *G*(**X**) can do exactly these.

If the same theoretical result can be derived from two different disciplines, it is often not just by chance. An early phenomenological mathematical model, Fang (2005), starting from purely geometric reasoning, arrived at formula (2) with just two hydrophobic levels, hydrophobic and hydrophilic.

A theory also has to be falsifiable, that is, it must make a prediction that can be checked. The fundamental prediction is that minimizing formula (1) or (2) yields the native structures for the amino acid sequences of the proteins covered by the assumptions of the formulas. That can only be done after the actual values of the chemical potentials appearing in the formulas are determined for the physiological environment.

### **9.1. How rigorous is the derivation?**

Two common tools in physics, the first step of the Born-Oppenheimer approximation in quantum mechanics and the grand canonic ensemble in statistical physics, are applied to obtain formula (1).

## *9.1.1. The Born-Oppenheimer approximation*


The Born-Oppenheimer approximation "treats the electrons as if they are moving in the field of fixed nuclei. This is a good approximation because, loosely speaking, electrons move much faster than nuclei and will almost instantly adjust themselves to a change in nuclear position." Popelier (2000). Since the mass of a water molecule is much less than the mass of a protein, this approximation can be extended: when **X** changes, the other particles, electrons and water molecules, will quickly adjust themselves to the change as well.

## *9.1.2. The statistical physics in general and the grand canonic ensemble in particular*

"Up to now there is no evidence to show that statistical physics itself is responsible for any mistakes," the Preface of Dai (2007). Here the ensemble theory of statistical mechanics is applied to only one protein molecule and the particles in its immediate environment. This is justified since, as pointed out in Dai (2007) page 10, "When the duration of measurement is short, or the number of particles is not large enough, the concept of ensemble theory is still valid." And among different ensembles, "Generally speaking, the grand canonic ensemble, with the least restrictions, is the most convenient in the mathematical treatment." Dai (2007) page 16. In fact, the canonic ensemble has been tried, but it leads to a result in which the eigenvalues of the quantum mechanical system have to be actually calculated, which is impossible to do accurately.

The derivations in this chapter only put together two very common and sound practices, the Born-Oppenheimer approximation (only the first step) and the grand canonic ensemble, and apply them to the protein folding problem. As long as protein folding obeys the fundamental physical laws, there should not be any serious error in the derivation.

## **9.2. Equilibrium and quasi-equilibrium**

A protein's structure will never be in equilibrium; in fact, even the native structure is only a snapshot of the constantly vibrating structure. The best description of conformation $\mathbf{X}$ is given in Chapter 3 of Bader (1990). Simply speaking, a conformation $\mathbf{X}$ actually is any point $\mathbf{Y}$ such that all $\mathbf{y}\_i$ are contained in a union of tiny balls centered at $\mathbf{x}\_i$, $i = 1, \cdots, M$. In this sense, only a quasi-equilibrium description (such as that of the heat engine, Bailyn (1994) page 94) of the thermodynamic states of protein folding can be anticipated. This has been built into the Thermodynamic Principle of Protein Folding. So quantities such as $S(\mathbf{X})$, $\phi(\mathbf{X})$, and $G(\mathbf{X})$ can only be understood in this sense. That is, observing a concrete folding process one will see a series of conformations $\mathbf{X}\_i$, $i = 1, 2, 3, \cdots$. The Thermodynamic Principle then says that, measuring the Gibbs free energy $G(\mathbf{X}\_i)$, one will observe that $G(\mathbf{X}\_i)$ eventually converges to a minimum value and the $\mathbf{X}\_i$ eventually approach the native structure, while at no time is any conformation $\mathbf{X}\_i$ or thermodynamic system $\mathcal{T}\_{\mathbf{X}\_i}$ really in an equilibrium state.

### **9.3. Potential energy plays no role in protein folding**

Formulas (1) and (2) theoretically show that the hydrophobic effect is the driving force of protein folding; it is not just a solvent free energy added to the pairwise interactions such as the Coulomb interaction, as all force fields assume. Only in the physiological environment does the hydrophobic effect work towards the native structure; otherwise it will drive denaturation, as discussed in the explanation of folding and unfolding. Formulas (1) and (2) show that the Gibbs free energy is actually independent of the potential energy, which is counterintuitive and somewhat surprising. The explanation is that during the folding process all covalent bonds in the main chain and in each side chain are kept invariant; the potential energy has already played its role in the synthesis process that forms the peptide chain, which of course can also be described by quantum mechanics. According to Anfinsen (1973), protein folding takes place after the synthesis of the whole peptide chain, so the synthesis process can be skipped and attention can be focused on the folding process.


To check this, an experiment should be designed that can suddenly put proteins in a different environment. Formulas (1) and (2) should be written as $G(\mathbf{X}, En\_N)$. Indeed, the chemical potentials $\mu\_e$ and $\mu\_i$'s are Gibbs free energies per corresponding particle, $\mu = u + Pv - Ts$. Two environment parameters, temperature $T$ and pressure $P$, appear explicitly in $\mu$; the inner energy $u$ and entropy $s$ may also depend implicitly on the environment. According to formulas (1) and (2), if $\mu\_i < 0$, then exposing more $H\_i$ atoms to water (making $A(M\_{\mathbf{X}i})$ larger) will reduce the Gibbs free energy. If $\mu\_i > 0$, then the reverse will happen. To increase or reduce the $H\_i$ atoms' exposure to water ($A(M\_{\mathbf{X}i})$), the conformation has to change. The conformation keeps adjusting until a conformation $\mathbf{X}\_N$ is obtained such that the net effect of any change of the conformation will either increase some $H\_i$ atoms' exposure to water while $\mu\_i > 0$ or reduce $H\_i$ atoms' exposure to water while $\mu\_i < 0$. In other words, $G(\mathbf{X}, En\_N)$ achieves its minimum at $G(\mathbf{X}\_N, En\_N)$. Protein folding, at least for the proteins considered in the assumptions, is explained very well by formulas (1) and (2).

In a changed environment, the chemical potentials $\mu\_e$ and $\mu\_i$'s in formulas (1) and (2) change their values. With the changed chemical potentials, $G(\mathbf{X}, En\_U)$ has the same form as $G(\mathbf{X}, En\_N)$ but different chemical potentials. Therefore, the structure $\mathbf{X}\_U$ will be stable according to the second inequality in (55). The process is exactly the same as described for protein folding, provided the method of changing the environment does not introduce new kinds of (non-water) particles, for example, if only temperature or pressure is changed.

Even in a new environment that includes new kinds of particles, formulas (1) and (2) can still partially explain the denaturation; the only difference is that more obstacles prevent the protein from denaturing to $\mathbf{X}\_U$, but in any case it will end in some structure other than $\mathbf{X}\_N$, that is, the protein is denatured. Actually, this is a hint of how to modify the current formulas to extend them to general proteins.

### **9.5. Explain hydrophobic effect and the role played by hydrogen bonding**

In 1959, by reviewing the literature, Kauzmann concluded that the hydrophobic effect is the main driving force in protein folding, Kauzmann (1959). An empirical correlation between hydrophobic free energy and aqueous cavity surface area was noted as early as Reynolds *et al.* (1974), giving justification of the hydrophobic effect. Various justifications of the hydrophobic effect have been published, based on empirical models of protein folding, for example, Dill (1990). But the debate continues to the present; some still insist that it is the hydrogen bond rather than the hydrophobic effect that plays the main role as driving force in protein folding, for example, Rose *et al.* (2006). The theoretically derived formulas (1) and (2) can explain why the hydrophobic effect is indeed the driving force. A simulation of reducing the hydrophobic area alone, by Fang and Jing (2010), shows that the result is the appearance of regularly patterned intramolecular hydrogen bonds associated with the secondary structures.

In fact, according to formulas (1) and (2), if $\mu\_i < 0$, then making more $H\_i$ atoms appear on the boundary of $P\_{\mathbf{X}}$ will reduce the Gibbs free energy. If $\mu\_i > 0$, then the reverse will happen: reducing the exposure of $H\_i$ atoms to water will reduce the Gibbs free energy. This gives a theoretical explanation of the hydrophobic effect. The kinetic formula $\mathbf{F}\_i = -\nabla\_{\mathbf{x}\_i} G(\mathbf{X})$ (to be discussed later) gives the force that pushes the conformation to change towards the native structure. The mechanism stated above works through the chemical potentials $\mu\_i$ for the various levels of hydrophobicity. In physiological environment, all hydrophobic $H\_i$'s will have positive $\mu\_i$, all

The steric conditions (35) simply preserve this early synthesis result: not every **X** = (**x**<sub>1</sub>, ··· , **x**<sub>*i*</sub>, ··· , **x**<sub>*M*</sub>) is eligible to be a conformation; it has to satisfy the steric conditions (35). The steric conditions not only respect the bond lengths, they also reflect many physicochemical properties of a conformation. They are defined via the allowed minimal atomic distances, such that for non-bonded atoms the allowed minimal distances are shorter between oppositely charged or polarized atoms, a little longer between non-polar ones, and much longer (generally greater than the sum of their radii) between like-charged ones, etc. For example, the minimal distance between sulfur atoms in cysteine residues is set so that disulfide bonds are allowed. And for any newly found intramolecular covalent bond between side chains, such as the isopeptide bonds in Kang and Baker (2011), the steric conditions can easily be modified to allow the newly found phenomenon.

The drawback of the steric conditions is that the minimization in equation (57) becomes a constrained minimization.

## **9.4. Unified explanation of folding and denaturation**

Protein denaturation happens easily, even when the environment is only slightly changed, as described by Hsien Wu (1931). (Hsien Wu (1931) is the 13th article, theorizing the results of a series of experiments; a preliminary report was read before the XIIIth International Congress of Physiology at Boston, August 19-24, 1929, and published in the *Am. J. Physiol.* for October 1929. In it Hsien Wu first suggested that the denatured protein is still the same molecule and that only its structure has changed.) Anfinsen showed in various experiments that, after denaturation by a changed environment, certain globular proteins spontaneously refold to their native structure once the denaturing agent is removed, Anfinsen (1973). The spontaneous renaturation suggests that protein folding does not need outside help, at least for the class of proteins considered in this chapter. Therefore, the fundamental law of thermodynamics asserts that in the environments in which a protein can fold, the native structure must have the minimum Gibbs free energy. The same is true for denaturation: in a denaturing environment the native structure no longer has the minimum Gibbs free energy; some other structure(s) will have it. Thus, letting *En* denote the environment, any formula for the Gibbs free energy should be stated as *G*(**X**, *En*) instead of just *G*(**X**), unless the environment is specified, as in this chapter. Let *En<sub>N</sub>* be the physiological environment and *En<sub>U</sub>* be some denaturing environment, **X**<sub>*N*</sub> be the native structure and **X**<sub>*U*</sub> be one of the stable denatured structures in *En<sub>U</sub>*; then the thermodynamic principle for both protein folding and unfolding should be that

$$G(\mathbf{X}\_N, \mathit{En}\_N) < G(\mathbf{X}\_U, \mathit{En}\_N), \quad G(\mathbf{X}\_N, \mathit{En}\_U) > G(\mathbf{X}\_U, \mathit{En}\_U). \tag{55}$$

To check this, an experiment should be designed that can suddenly put proteins in a different environment. Formulas (1) and (2) should be written as *G*(**X**, *En<sub>N</sub>*). Indeed, the chemical potentials *μ<sub>e</sub>* and *μ<sub>i</sub>*'s are Gibbs free energies per corresponding particle, *μ* = *u* + *Pv* − *Ts*. Two environment parameters, temperature *T* and pressure *P*, appear explicitly in *μ*; the internal energy *u* and entropy *s* may also depend implicitly on the environment. According to formulas (1) and (2), if *μ<sub>i</sub>* < 0, then exposing more *H<sub>i</sub>* atoms to water (making *A*(*M*<sub>**X**<sub>*i*</sub></sub>) larger) will reduce the Gibbs free energy. If *μ<sub>i</sub>* > 0, then the reverse happens. To increase or reduce the exposure of the *H<sub>i</sub>* atoms to water (*A*(*M*<sub>**X**<sub>*i*</sub></sub>)), the conformation has to change. The conformation keeps adjusting until a conformation **X**<sub>*N*</sub> is obtained such that the net effect of any further change will either increase the water exposure of some *H<sub>i</sub>* atoms with *μ<sub>i</sub>* > 0 or reduce the water exposure of some *H<sub>i</sub>* atoms with *μ<sub>i</sub>* < 0. In other words, *G*(**X**, *En<sub>N</sub>*) achieves its minimum at **X**<sub>*N*</sub>. Protein folding, at least for the proteins considered in the assumptions, is explained very well by formulas (1) and (2).


In a changed environment, the chemical potentials *μ<sub>e</sub>* and *μ<sub>i</sub>*'s in formulas (1) and (2) change their values. Hence *G*(**X**, *En<sub>U</sub>*) has the same form as *G*(**X**, *En<sub>N</sub>*) but with different chemical potentials. Therefore, the structure **X**<sub>*U*</sub> will be stable, according to the second inequality in (55). The process is exactly the same as described for protein folding, provided that the change of environment does not introduce new kinds of (non-water) particles, for example, if only the temperature or pressure is changed.

Even in a new environment that includes new kinds of particles, formulas (1) and (2) can still partially explain the denaturation; the only difference is that more obstructions prevent the protein from denaturing to **X**<sub>*U*</sub>. In any case, it will end in some structure other than **X**<sub>*N*</sub>, that is, the protein is denatured. Actually, this is a hint of how to modify the current formulas to extend them to general proteins.
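The case in which only the chemical potentials change can be illustrated with a minimal sketch. It keeps only class-resolved surface terms of the form *ν<sub>i</sub>μ<sub>i</sub>A<sub>i</sub>* and ignores every other term of formula (2); the two hydrophobicity classes, all numerical values, and the function names are hypothetical and chosen only so that inequality (55) can be checked by hand.

```python
from typing import Sequence


def surface_free_energy(mu: Sequence[float], nu: Sequence[float],
                        area: Sequence[float]) -> float:
    """Class-resolved surface part of G: sum over classes of nu_i * mu_i * A_i."""
    if not (len(mu) == len(nu) == len(area)):
        raise ValueError("mu, nu and area need one entry per hydrophobicity class")
    return sum(n * m * a for n, m, a in zip(nu, mu, area))


def satisfies_55(g_nn: float, g_un: float, g_nu: float, g_uu: float) -> bool:
    """Inequality (55): X_N has lower G in En_N, X_U has lower G in En_U."""
    return g_nn < g_un and g_nu > g_uu


# Hypothetical toy values in arbitrary units (not taken from the chapter).
# Class 0 is hydrophobic, class 1 is hydrophilic.
NU = (1.0, 1.0)
MU_EN_N = (1.0, -1.0)     # physiological environment: mu_0 > 0, mu_1 < 0
MU_EN_U = (-0.5, -1.0)    # assumed denaturing environment: mu_0 changed sign
AREA_X_N = (10.0, 90.0)   # compact structure, hydrophobic area mostly buried
AREA_X_U = (60.0, 100.0)  # expanded structure, more area of both classes exposed

g_nn = surface_free_energy(MU_EN_N, NU, AREA_X_N)  # G(X_N, En_N)
g_un = surface_free_energy(MU_EN_N, NU, AREA_X_U)  # G(X_U, En_N)
g_nu = surface_free_energy(MU_EN_U, NU, AREA_X_N)  # G(X_N, En_U)
g_uu = surface_free_energy(MU_EN_U, NU, AREA_X_U)  # G(X_U, En_U)

print(g_nn, g_un, g_nu, g_uu)
print(satisfies_55(g_nn, g_un, g_nu, g_uu))
```

The same function is evaluated in both environments; only the chemical potentials differ, which is the point made above for changes of temperature or pressure alone.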

## **9.5. Explain hydrophobic effect and the role played by hydrogen bonding**

In 1959, after reviewing the literature, Kauzmann concluded that the hydrophobic effect is the main driving force in protein folding, Kauzmann (1959). An empirical correlation between hydrophobic free energy and aqueous cavity surface area was noted as early as Reynolds *et al.* (1974), giving justification of the hydrophobic effect. Various justifications of the hydrophobic effect have been published, based on empirical models of protein folding, for example, Dill (1990). But the debate continues to the present: some still insist that it is the hydrogen bond, rather than the hydrophobic effect, that plays the main role of driving force in protein folding, for example, Rose *et al.* (2006). The theoretically derived formulas (1) and (2) can explain why the hydrophobic effect is indeed the driving force. A simulation of reducing hydrophobic area alone by Fang and Jing (2010) shows that the result is the appearance of regularly patterned intramolecular hydrogen bonds associated with the secondary structures.

In fact, according to formulas (1) and (2), if *μ<sub>i</sub>* < 0, then making more *H<sub>i</sub>* atoms appear on the boundary of *P*<sub>**X**</sub> will reduce the Gibbs free energy. If *μ<sub>i</sub>* > 0, then the reverse happens: reducing the exposure of *H<sub>i</sub>* atoms to water will reduce the Gibbs free energy. This gives a theoretical explanation of the hydrophobic effect. The kinetic formula **F**<sub>*i*</sub> = −∇<sub>**x**<sub>*i*</sub></sub>*G*(**X**) (to be discussed later) gives the force that pushes the conformation to change towards the native structure.

The mechanism stated above works through the chemical potentials *μ<sub>i</sub>* for the various levels of hydrophobicity. In the physiological environment, all hydrophobic *H<sub>i</sub>*'s will have positive *μ<sub>i</sub>*, and all hydrophilic *H<sub>i</sub>*'s will have negative *μ<sub>i</sub>*. Thus the conformation *P*<sub>**X**</sub> changes such that the most hydrophilic *H<sub>i</sub>* (*μ<sub>i</sub>* = min(*μ*<sub>1</sub>, ··· , *μ<sub>H</sub>*)) gets the first priority to appear on the boundary, and the most hydrophobic *H<sub>i</sub>* (*μ<sub>i</sub>* = max(*μ*<sub>1</sub>, ··· , *μ<sub>H</sub>*)) gets the first priority to hide in the hydrophobic core to avoid contact with water molecules, etc. One should keep in mind that the steric conditions (35) have to be obeyed all the time.


But the hydrophobic effect actually works partially through hydrogen bond formation. This is well represented by the chemical potentials in formulas (1) and (2). In fact, the values of the chemical potentials reflect the ability of the atoms or atom groups to form hydrogen bonds, either with another atom group in the protein or with water molecules. This gives a way to determine the values of the hydrophilic chemical potentials theoretically or experimentally: checking the actual energy value of the hydrogen bond.

According to Finkelstein and Ptitsyn (2002), the energies of hydrogen bonds appearing in proteins (intermolecular or intramolecular) are as follows (the positive sign means that energy is needed to break the bond):

O–H···O (21 kJ mol<sup>−1</sup> or 5.0 kcal mol<sup>−1</sup>); O–H···N (29 kJ mol<sup>−1</sup> or 6.9 kcal mol<sup>−1</sup>); N–H···N (13 kJ mol<sup>−1</sup> or 3.1 kcal mol<sup>−1</sup>); N–H···O (8 kJ mol<sup>−1</sup> or 1.9 kcal mol<sup>−1</sup>).

For hydrophobic ones the situation is more complicated; the common view is that exposure to water reduces the entropy, which certainly comes from the inability to form hydrogen bonds with water molecules. Hence, although the hydrophobic effect is the driving force of protein folding, it works through the atoms' ability or inability to form hydrogen bonds with water molecules.

How can the intramolecular hydrogen bonds be explained? It seems that formulas (1) and (2) do not address this issue. A possible theory is that the amino acid sequence of a protein is highly selected in evolution; in fact, only a tiny number of amino acid sequences can really become a protein.

Indeed, suppose that on average each species (or "kind" of prokaryote) has 10<sup>5</sup> proteins (*Homo sapiens* has around 3 × 10<sup>5</sup>), and assume that each protein has 100 variants (versions with tiny differences in the peptide sequence of the protein); then there are at most 10<sup>47</sup> peptide sequences that can really produce a natural protein. Now further suppose that only one in 10<sup>13</sup> of the theoretically protein-producing peptide sequences on the earth gets a chance to be realized; then there will be at most 10<sup>60</sup> possible protein-producing peptide sequences. A huge number! The number of peptide sequences of length less than or equal to *n* is

$$N(n) = \sum\_{i=1}^{n} 20^i = \frac{20^{n+1} - 20}{19} = \frac{20}{19}(20^n - 1) \cong 20^{n+0.0171} \cong 10^{1.301(n+0.0171)}.\tag{56}$$

The longest amino acid sequence in the record of the ExPASy Proteomics Server has 35,213 residues. Then *N*(35,213) > 10<sup>1.3×35,213</sup> > 10<sup>45,060</sup>, and the ratio of the number of potentially protein-producing peptide sequences to the number of all possible sequences of length up to 35,213 is less than 10<sup>60</sup>/10<sup>45,060</sup> = 10<sup>−45,000</sup>, a number so tiny that it is indistinguishable from zero. Even assuming that the longest peptide sequence has only 400 residues, the ratio is still less than 10<sup>−460</sup>. How small is the chance that a random peptide sequence happens to be a protein's peptide sequence!
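The arithmetic behind formula (56) and the two ratios can be checked with a short sketch. The function names are introduced here for illustration only; the bound of 10<sup>60</sup> protein-producing sequences is the assumption made in the text above.

```python
import math


def count_sequences(n: int, alphabet: int = 20) -> int:
    """Exact number of sequences of length 1..n: (20^(n+1) - 20) / 19 for alphabet=20."""
    return (alphabet ** (n + 1) - alphabet) // (alphabet - 1)


def log10_count(n: int) -> float:
    """log10 of N(n); Python integers are exact, so no overflow occurs."""
    return math.log10(count_sequences(n))


# Upper bound on protein-producing sequences assumed in the text: 10^60.
LOG10_PROTEIN_SEQUENCES = 60

# Offset in the exponent of formula (56): 20/19 = 20^offset.
offset = math.log(20 / 19, 20)
print(f"offset = {offset:.4f}")

for n in (400, 35213):
    log_ratio = LOG10_PROTEIN_SEQUENCES - log10_count(n)
    print(f"n = {n}: log10 N(n) = {log10_count(n):.2f}, log10 ratio = {log_ratio:.2f}")
```

Working with base-10 logarithms of exact integers avoids floating-point overflow, since *N*(35,213) itself has more than 45,000 digits.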

With these highly specially selected peptide sequences, one can assume that while the various hydrophobic surfaces shrink to form a hydrophobic core, residues are put in positions to form secondary structures and their associated hydrogen bonds. This sounds a little too arbitrary. But the huge number of candidate peptide sequences makes the evolutionary selection not only possible but also probable. Moreover, a simulation of shrinking the hydrophobic surface area alone indeed produced secondary structures and hydrogen bonds. The simulation was reported by Fang and Jing (2010). Without calculating any dihedral angles or electronic charges, without any arbitrary parameter, paying no attention to any particular atom's position, just by reducing the hydrophobic surface area (it was assumed there that there are only two kinds of atoms, hydrophobic and hydrophilic), secondary structures and hydrogen bonds duly appeared. The proteins used in the simulation are 2i9c, 2hng, and 2ib0, with 123, 127, and 162 residues. No simulation with any kind of empirical or theoretical model had achieved such a success. More than anything, this simulation should show that the hydrophobic effect alone gives more chance of forming intramolecular hydrogen bonds. Indeed, pushing hydrophilic atoms to make hydrogen bonds with water molecules gives other, non-boundary hydrophilic groups more chance to form intramolecular hydrogen bonds.

Again, formula (2) can partly explain the success of this simulation: when there are only two hydrophobicity classes in formula (2), the hydrophobic area represents the main positive part of the Gibbs free energy, so reducing it reduces the Gibbs free energy, no matter what the actual values of the chemical potentials are.

## **9.6. Explanation of the successes of surface area models**


In 1995, Wang *et al.* (1995) compared 8 empirical energy models by testing their ability to distinguish native structures from their close neighboring compact non-native structures. Their WZS models are accessible surface area models with 14 hydrophobicity classes of atoms, ∑<sub>*i*=1</sub><sup>14</sup> *σ<sub>i</sub>A<sub>i</sub>*. Each combination of two of the three targeting proteins was used to train WZS to get the *σ<sub>i</sub>*, hence there are three models, WZS1, WZS2, and WZS3. Among the 8 models, the WZS models performed the best, distinguishing all 6 targeting proteins. The worst performer was the force field AMBER 4.0, which failed to distinguish any of the 6 targets.

These tests and the successes of various surface area models, such as Eisenberg and MacLachlan (1986), showed that, instead of watching numerous pairwise atomic interactions, the surface area models, though seemingly too simple, have surprising power. Now formula (2) gives them a theoretical justification. On the other hand, the successes of these models also reinforce the theoretical results.
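The linear form shared by such surface area models, a sum of *σ<sub>i</sub>A<sub>i</sub>* over atom classes, can be sketched as follows. This is not the WZS implementation: it uses three classes instead of 14, the coefficients and areas are invented for illustration, and the criterion "the native structure has the lowest energy" is an assumption about how discrimination is judged.

```python
from typing import Iterable, Sequence


def surface_area_energy(sigma: Sequence[float], areas: Sequence[float]) -> float:
    """Linear surface area model: sum over classes of sigma_i * A_i."""
    if len(sigma) != len(areas):
        raise ValueError("sigma and areas need one entry per atom class")
    return sum(s * a for s, a in zip(sigma, areas))


def discriminates(sigma: Sequence[float], native: Sequence[float],
                  decoys: Iterable[Sequence[float]]) -> bool:
    """True if the native structure scores strictly lower than every decoy."""
    e_native = surface_area_energy(sigma, native)
    return all(e_native < surface_area_energy(sigma, d) for d in decoys)


# Hypothetical coefficients and per-class surface areas (illustration only).
SIGMA = (0.02, -0.01, -0.03)
NATIVE = (100.0, 400.0, 300.0)
DECOYS = [(250.0, 380.0, 250.0), (180.0, 420.0, 260.0)]

print(surface_area_energy(SIGMA, NATIVE))
print(discriminates(SIGMA, NATIVE, DECOYS))
```

In a trained model the coefficients would be fitted to known structures rather than chosen by hand as they are here.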

There is a gap between the accessible surface area model in Eisenberg and MacLachlan (1986) and the experimental results (surface tension), as pointed out in Tuñón *et al.* (1992). The gap disappears when the molecular surface area is used in place of the accessible surface area: in Tuñón *et al.* (1992) it was shown that a value of 72-73 cal/mol/Å<sup>2</sup> assigned to the molecular surface area fits the macroscopic experimental data perfectly. Later, Jackson and Sternberg (1993) asserted that the molecular surface is the real boundary of a protein in its native structure.

By the definition of Ω<sub>**X**</sub>, as shown in FIGURE 3 and FIGURE 4, water molecules in contact with *P*<sub>**X**</sub> must be outside the molecular surface *M*<sub>**X**</sub>. Since the accessible surface is in the middle of the first hydration shell, it is better to use the molecular surface *M*<sub>**X**</sub> as the boundary of the conformation *P*<sub>**X**</sub>. Moreover, the conversion of the mean numbers *N<sub>i</sub>*(**X**) to surface area, *N<sub>i</sub>*(**X**) = *ν<sub>i</sub>A*(*M*<sub>**X**<sub>*i*</sub></sub>), only works for the molecular surface, not for the accessible surface. This can explain the conclusion that the molecular surface is a much better boundary than the accessible surface, as stated in Tuñón *et al.* (1992) and Jackson and Sternberg (1993).



In fact, the advantage of the solvent accessible surface is that, by its definition, one knows exactly which part of the surface each atom occupies and can therefore calculate its share of the surface area. This fact may partly account for why there are so many models based on the solvent accessible surface, even though people knew of the aforementioned gap. For other surfaces, one has to define the part of the surface that belongs to a specific hydrophobicity class. This was resolved in Fang (2005) via the distance function definition, as is used here.

All surface area models neglected one element, the volume of the structure. As early as the 1970s, Richards and his colleagues pointed out that the native structure of globular proteins is very dense, or compact (density = 0.75, Richards (1977)). To make a conformation denser, obviously one should shrink the volume *V*(Ω<sub>**X**</sub>). The model in Fang (2005) introduced a volume term but kept the oversimplification that all atoms are either hydrophobic or hydrophilic. The derivation of formulas (1) and (2) shows that the volume term should be counted, but it may be that *ν<sub>e</sub>μ<sub>e</sub>* is very small, in which case the volume may really be irrelevant.

## **9.7. Coincidence with phenomenological mathematical model**

If a theoretical result can be derived from two different disciplines, the likelihood of its correctness is dramatically increased. Indeed, from a purely geometric consideration, a phenomenological mathematical model, *G*(**X**) = *aV*(Ω<sub>**X**</sub>) + *bA*(*M*<sub>**X**</sub>) + *cA*(*M*<sub>**X**<sub>1</sub></sub>), *a*, *b*, *c* > 0, was created in Fang (2005). (It was assumed that there are only two hydrophobicity levels, hydrophobic and hydrophilic; the hydrophilic surface area *A*(*M*<sub>**X**<sub>2</sub></sub>) is absorbed into *A*(*M*<sub>**X**</sub>) by *A*(*M*<sub>**X**<sub>2</sub></sub>) = *A*(*M*<sub>**X**</sub>) − *A*(*M*<sub>**X**<sub>1</sub></sub>).) It was based on the well-known global geometric characteristics of the native structure of globular proteins: 1. high density; 2. small surface area; 3. hydrophobic core, as demonstrated and summarized in Richards (1977) and Novotny *et al.* (1984). Thus, to obtain the native structure, one should shrink the volume (increasing the density) and the surface area, and form a better hydrophobic core (reducing the hydrophobic surface area *A*(*M*<sub>**X**<sub>1</sub></sub>)), simultaneously and cohesively.

The agreement between formula (2) and the phenomenological mathematical model of Fang (2005) can hardly be accidental. Most likely, it is the same natural law reflected in different disciplines. The advantage of formula (2) is that every term in it has a physical meaning.

## **10. Applications**

After the derivation, it is appropriate to point out some immediate applications of the formula *G*(**X**).

## **10.1. Energy surface or landscape**

An obvious application is the construction of the Gibbs free energy surface, or landscape. Empirical estimates are no longer needed: the Gibbs free energy formula *G* : X → **R** gives a graph (**X**, *G*(**X**)) over the space X (all eligible conformations for a given protein), and this is nothing but the Gibbs free energy surface. Mathematically it is a 3*M*-dimensional hypersurface. The characteristics that concern students of energy surface theory, such as how rugged the surface is, how many local minima there are, and whether there is a funnel, can be answered by direct calculation with the formula.

Since the function *G* is actually defined on the whole of **R**<sup>3*M*</sup> (a domain of **R**<sup>3*M*</sup> containing all of X is enough), mathematical tools can be used to study its graph and to compare the results with the restricted conformations. One important question is: does the absolute minimum structure belong to X?
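Questions such as "how many local minima are there?" become simple counting once *G* can be evaluated. The following Python sketch illustrates the idea on a one-dimensional toy profile. The function `toy_energy` is a hypothetical stand-in, not formula (2), and a real landscape would be sampled over many coordinates rather than one.

```python
import math

def count_local_minima(values):
    """Count strict interior local minima in a sequence of sampled energies."""
    return sum(
        1
        for k in range(1, len(values) - 1)
        if values[k] < values[k - 1] and values[k] < values[k + 1]
    )

def toy_energy(theta):
    # Hypothetical rugged profile: a broad well plus a faster ripple.
    return (1.0 - math.cos(theta)) + 0.3 * math.cos(5.0 * theta)

# Sample one full turn of the single coordinate theta on a uniform grid.
n = 3600
samples = [toy_energy(2.0 * math.pi * k / n - math.pi) for k in range(n + 1)]
minima = count_local_minima(samples)
```

The same counting applies to any sampled path through conformation space; only the energy function changes.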

## **10.2. Structure prediction**

Prediction of protein structures is the most important method for revealing proteins' functions and working mechanisms, and it has become a bottleneck in the rapidly developing life sciences. With more and more powerful computers, this problem is being attacked on all fronts. Various models, homologous or *ab initio*, full-atom or coarse-grained, with numerous parameters of which many are quite arbitrary, are used to achieve the goal. Although computing power grows exponentially, prediction power does not follow. At this moment, one should take a deep breath and recall what the great physicist Fermi said: "There are two ways of doing calculations in theoretical physics. One way, and this is the way I prefer, is to have a clear physical picture of the process that you are calculating. The other way is to have a precise and self consistent mathematical formalism." And "I remember my friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk." Quoted from Dyson (2004).

These remarks apply to any scientific calculation, not just to theoretical physics. In the current situation, all *ab initio* prediction models are actually empirical, with many parameters introduced to ensure some success. Fermi's comments remind us that a theory should be based on fundamental physical laws and contain no arbitrary parameters. Looking at formulas (1) and (2), one sees immediately that they are neat, precise, and self-consistent mathematical formulas. Furthermore, they include no arbitrary parameters, and all terms in them have clear physical meanings. The chemical potentials *μ<sub>e</sub>* and *μ<sub>i</sub>*, and the geometric constants *ν<sub>e</sub>* and *ν<sub>i</sub>*, can be evaluated by theory or experiment; they are not arbitrary at all.

But a theory has to be developed and tested until it is justified or falsified. For interested researchers, the tasks are to determine the correct values of the chemical potentials in formula (1) and of the geometric ratios *ν<sub>e</sub>* and *ν<sub>i</sub>* in formula (2). Many estimates exist, but they are either for the solvent accessible surface area, as in Eisenberg and McLachlan (1986), and hence do not fit the experimental data, as pointed out in Tuñón *et al.*, or they do not distinguish different hydrophobicity levels, as in Tuñón *et al.* (1992). To obtain the correct values of the chemical potentials and geometric constants, the commonly used method of training with data can be employed, in which one can also test the formulas' ability to discriminate between native and nearby compact non-native structures. After that, a direct test is to predict the native structure from the amino acid sequence of a protein by solving the following minimization problem:

$$G(\mathbf{X}\_N) = \inf\_{\mathbf{X} \in \mathcal{X}} G(\mathbf{X}).\tag{57}$$
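The discrimination test mentioned above can be sketched in a few lines of Python. The sketch assumes that formula (2) has the linear form whose derivative is equation (62), namely a volume term, a total-area term, and one area term per hydrophobicity class. All numerical values are placeholders chosen for illustration; they are not fitted or measured, and the function and variable names are hypothetical.

```python
def gibbs_free_energy(volume, area, class_areas, nu_e, mu_e, d_w, nu, mu):
    """Assumed linear form: nu_e*mu_e*V + d_w*nu_e*mu_e*A + sum_i nu_i*mu_i*A_i."""
    if not (len(class_areas) == len(nu) == len(mu)):
        raise ValueError("one nu_i and one mu_i are needed per hydrophobicity class")
    total = nu_e * mu_e * volume + d_w * nu_e * mu_e * area
    total += sum(n * m * a for n, m, a in zip(nu, mu, class_areas))
    return total

# Placeholder parameters (NOT fitted or measured values).
params = dict(nu_e=0.01, mu_e=1.0, d_w=1.4, nu=[0.1, 0.1], mu=[2.0, -1.0])

# Two hypothetical candidates: (volume, total area, per-class areas).
candidates = {
    "compact": (1000.0, 500.0, [100.0, 400.0]),
    "extended": (1200.0, 900.0, [400.0, 500.0]),
}

scores = {
    name: gibbs_free_energy(v, a, class_areas, **params)
    for name, (v, a, class_areas) in candidates.items()
}
best = min(scores, key=scores.get)  # candidate with the lowest G
```

In a training setting, the parameters would be adjusted until native structures score lower than their nearby non-native decoys.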

This is the first time that a theoretically derived formula for the Gibbs free energy is available. Before this, all *ab initio* predictions were not really *ab initio*. A combined (theoretical and experimental) search for the values of the chemical potentials will be the key to the success of *ab initio* prediction of protein structure.

## **10.3. Gradient**

With formula (2) as the Gibbs free energy, the minimization in equation (57) can be pursued by Newton's fastest descending method, that is, by steepest descent along −∇*G*. To state the result, some definitions are necessary.

### *10.3.1. Molecular graphs*

Given a molecule *U*, let *V* be the set of atoms in *U*, let *N* = |*V*| be the number of atoms, and label the atoms **a**<sub>1</sub>, **a**<sub>2</sub>, ···, **a**<sub>*N*</sub>. For 1 ≤ *i*, *j* ≤ *N*, define *B<sub>ij</sub>* = *n* if atoms *i* and *j* are connected by a bond with valency *n* (*n* is not necessarily a whole number); if *i* and *j* do not form a bond, then *B<sub>ij</sub>* = 0. The molecular formula of *U* in chemistry can be seen as a graph *G*(*U*) = (*V*, *E*), where *V* is the vertex set of *G*(*U*) and *E* is its edge set. An edge in *E* is denoted by {*i*, *j*}: if two atoms **a**<sub>*i*</sub> and **a**<sub>*j*</sub> are connected by a covalent bond, i.e., *B<sub>ij</sub>* = *n* ≥ 1, then {*i*, *j*} ∈ *E*. Call *G*(*U*) the **molecular graph** of *U*. FIGURE 1 is a molecular graph if the side chain R consists of only one atom, as in the amino acid glycine.

A graph *G* is connected if from any vertex *v* one can follow the edges of the graph to arrive at any other vertex. If a graph is not connected, then it has several **connected components**, each of which is itself a connected graph. All molecular graphs are connected.
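The definitions above translate directly into a small data structure. The Python sketch below builds the adjacency sets of a molecular graph from the bond valencies *B<sub>ij</sub>* and finds its connected components by breadth-first search. The five-atom molecule and all function names are hypothetical, chosen only to illustrate the definitions.

```python
from collections import deque

def molecular_graph(n_atoms, bond_orders):
    """Build adjacency sets from a dict {(i, j): B_ij}; an edge needs B_ij >= 1."""
    adjacency = {i: set() for i in range(1, n_atoms + 1)}
    for (i, j), order in bond_orders.items():
        if order >= 1:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency

def connected_components(adjacency):
    """Return the connected components as a list of vertex sets (breadth-first search)."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in component:
                    component.add(w)
                    queue.append(w)
        seen |= component
        components.append(component)
    return components

# Hypothetical five-atom molecule: a chain 1-2-3-4 with a double bond 3=5.
bonds = {(1, 2): 1, (2, 3): 1, (3, 4): 1, (3, 5): 2}
graph = molecular_graph(5, bonds)
is_connected = len(connected_components(graph)) == 1
```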

### *10.3.2. Rotatable bonds*

Let **b** = **a**<sub>*α*</sub>**a**<sub>*β*</sub> be a covalent bond in the molecule *U* connecting two atoms **a**<sub>*α*</sub> and **a**<sub>*β*</sub>. The bond **b** is rotatable if and only if: 1. the valency of **b** is not greater than 1; 2. if one deletes {*α*, *β*} from the molecular graph *G*(*U*), the remaining graph *G*(*U*)\{*α*, *β*} = (*V*, *E*\{*α*, *β*}) has exactly two connected components, and neither component has rotational symmetry around the bond **b**.
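The two graph-theoretic parts of this definition can be checked mechanically. The Python sketch below is an illustration under stated assumptions: the bond dictionary and helper names are hypothetical, and the rotational-symmetry requirement is deliberately left out because it depends on three-dimensional geometry rather than on the graph. The bonds it returns are therefore only candidates.

```python
def components_without_edge(n_atoms, edges, removed):
    """Connected components of the graph after deleting one edge (depth-first search)."""
    adjacency = {i: set() for i in range(1, n_atoms + 1)}
    for i, j in edges:
        if {i, j} != set(removed):
            adjacency[i].add(j)
            adjacency[j].add(i)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], {start}
        while stack:
            v = stack.pop()
            for w in adjacency[v] - component:
                component.add(w)
                stack.append(w)
        seen |= component
        components.append(component)
    return components

def rotatable_candidates(n_atoms, bond_orders):
    """Bonds with valency <= 1 whose deletion leaves exactly two components.

    The rotational-symmetry condition is NOT checked here.
    """
    edges = [bond for bond, order in bond_orders.items() if order >= 1]
    return [
        bond
        for bond in edges
        if bond_orders[bond] <= 1
        and len(components_without_edge(n_atoms, edges, bond)) == 2
    ]

# Hypothetical molecule: ring 1-2-3, tail 3-4-5, and a double bond 5=6.
bonds = {(1, 2): 1, (2, 3): 1, (1, 3): 1, (3, 4): 1, (4, 5): 1, (5, 6): 2}
candidates = rotatable_candidates(6, bonds)
```

Ring bonds fail the test because deleting one of them leaves the graph connected, and the double bond fails the valency condition.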

### *10.3.3. Derivatives of G*(**X**)

Let $\mathbf{x}\_i = (x\_i, y\_i, z\_i)$ and write $\mathbf{F} = -\nabla\_{\mathbf{x}\_i} G(\mathbf{X}) = -(G\_{x\_i}, G\_{y\_i}, G\_{z\_i})(\mathbf{X})$. The calculation of $G\_{x\_i}(\mathbf{X})$, for example, is via the Lie vector field induced by moving the atomic position $\mathbf{x}\_i$. In fact, any infinitesimal change of the structure $\mathbf{X}$ induces a Lie vector field $\vec{L} : \mathbf{X} \to \mathbf{R}^3$. For example, moving $\mathbf{x}\_i$ to $\mathbf{x}\_i + (\Delta x\_i, 0, 0)$ while keeping the other nuclear centers fixed induces $\vec{L}\_{x\_i} : \mathbf{X} \to \mathbf{R}^3$ such that $\vec{L}\_{x\_i}(\mathbf{x}\_i) = (1, 0, 0)$ and $\vec{L}\_{x\_i}(\mathbf{x}\_j) = (0, 0, 0)$ for $j \neq i$. The fields $\vec{L}\_{y\_i}$ and $\vec{L}\_{z\_i}$ are described similarly. Then write $G\_{x\_i} = G\_{\vec{L}\_{x\_i}}$, etc., and

$$\nabla\_{\mathbf{x}\_i} G(\mathbf{X}) = (G\_{\vec{L}\_{x\_i}}, G\_{\vec{L}\_{y\_i}}, G\_{\vec{L}\_{z\_i}})(\mathbf{X}).\tag{58}$$

Rotating around a covalent bond $b\_{ij}$ also induces a Lie vector field $\vec{L}\_{b\_{ij}} : \mathbf{X} \to \mathbf{R}^3$. In fact, if $\mathbf{a}\_i\mathbf{a}\_j$ form the covalent bond $b\_{ij}$, then the bond axis is

$$\mathbf{b}\_{ij} = \frac{\mathbf{x}\_j - \mathbf{x}\_i}{|\mathbf{x}\_j - \mathbf{x}\_i|}. \tag{59}$$

If $\mathbf{b}\_{ij}$ is rotatable, denote the set of all nuclear centers in one component by $R\_{b\_{ij}}$ and the set of the others by $F\_{b\_{ij}}$. One can rotate all centers in $R\_{b\_{ij}}$ around $\mathbf{b}\_{ij}$ by a certain angle while keeping all centers in $F\_{b\_{ij}}$ fixed. The induced Lie vector field $\vec{L}\_{b\_{ij}}$ will be

$$\vec{L}\_{b\_{ij}}(\mathbf{x}\_k) = (\mathbf{x}\_k - \mathbf{x}\_i) \wedge \mathbf{b}\_{ij}, \text{ if } \mathbf{x}\_k \in R\_{b\_{ij}};\tag{60}$$

$$
\vec{L}\_{b\_{ij}}(\mathbf{x}\_k) = \vec{0}, \text{ if } \mathbf{x}\_k \in F\_{b\_{ij}}. \tag{61}
$$
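Equations (59) to (61) can be evaluated directly once the coordinates and the rotating set are known. The Python sketch below reads the wedge ∧ as the ordinary cross product in **R**<sup>3</sup> and keeps the operand order of equation (60). The four-center chain, the choice of rotating set, and the function names are hypothetical illustrations.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def bond_axis(x_i, x_j):
    """Unit vector from x_i to x_j, as in equation (59)."""
    d = sub(x_j, x_i)
    length = math.sqrt(sum(c * c for c in d))
    return tuple(c / length for c in d)

def rotation_field(coords, i, j, rotating):
    """Equations (60)-(61): (x_k - x_i) ^ b_ij on the rotating set, zero elsewhere."""
    b = bond_axis(coords[i], coords[j])
    return {
        k: cross(sub(x_k, coords[i]), b) if k in rotating else (0.0, 0.0, 0.0)
        for k, x_k in coords.items()
    }

# Hypothetical four-center chain; the bond 2-3 lies along the z axis.
coords = {1: (1.0, 0.0, -1.0), 2: (0.0, 0.0, 0.0),
          3: (0.0, 0.0, 1.5), 4: (1.0, 0.0, 2.0)}
field = rotation_field(coords, i=2, j=3, rotating={3, 4})
```

Centers lying on the bond axis get the zero vector, and every nonzero field vector is perpendicular to the axis, as expected for an infinitesimal rotation.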

Any such Lie vector field $\vec{L}$ generates a family of conformations $\mathbf{X}\_t = (\mathbf{x}\_{1t}, \cdots, \mathbf{x}\_{it}, \cdots, \mathbf{x}\_{Mt})$, where $\mathbf{x}\_{kt} = \mathbf{x}\_k + t\vec{L}(\mathbf{x}\_k)$, $k = 1, \cdots, M$. Moreover, the Lie vector field $\vec{L}$ can be extended to the molecular surface $M\_{\mathbf{X}}$, as shown in Appendix A.

The derivative $G\_{\vec{L}}(\mathbf{X})$ is given by

$$G\_{\vec{L}}(\mathbf{X}) = \nu\_e\mu\_e V\_{\vec{L}}(\Omega\_{\mathbf{X}}) + d\_w\nu\_e\mu\_e A\_{\vec{L}}(M\_{\mathbf{X}}) + \sum\_{i=1}^{H} \nu\_i\mu\_i A\_{\vec{L}}(M\_{\mathbf{X}\_i}),\tag{62}$$

with

$$V\_{\vec{L}}(\Omega\_{\mathbf{X}}) = -\int\_{M\_{\mathbf{X}}} \vec{L} \bullet \vec{N} \,\mathrm{d}\mathcal{H}^2, \quad A\_{\vec{L}}(M\_{\mathbf{X}}) = -2 \int\_{M\_{\mathbf{X}}} H(\vec{L} \bullet \vec{N}) \,\mathrm{d}\mathcal{H}^2, \tag{63}$$

where $\vec{N}$ is the outer unit normal of $M\_{\mathbf{X}}$, $H$ the mean curvature of $M\_{\mathbf{X}}$, and $\mathcal{H}^2$ the Hausdorff measure. Define $f\_{t,i} : \mathbf{R}^3 \to \mathbf{R}$ as $f\_{t,i}(\mathbf{x}) = \mathrm{dist}(\mathbf{x}, M\_{\mathbf{X}\_{t,i}}) - \mathrm{dist}(\mathbf{x}, M\_{\mathbf{X}\_t} \backslash M\_{\mathbf{X}\_{t,i}})$, and define on $M\_{\mathbf{X}}$

$$\nabla\_{M\_{\mathbf{X}}} f\_{0,i} = \nabla f\_{0,i} - (\nabla f\_{0,i} \bullet \vec{N})\,\vec{N}, \quad f'\_{0,i} = \left. \frac{\partial f\_{t,i}}{\partial t} \right|\_{t=0}, \quad \frac{\mathrm{d} f\_{0,i}}{\mathrm{d}t} = \vec{L} \bullet \nabla f\_{0,i} + f'\_{0,i}, \tag{64}$$

then, letting $\vec{\eta}$ be the unit outward conormal vector of $\partial M\_{\mathbf{X}\_i}$ (normal to $\partial M\_{\mathbf{X}\_i}$ but tangent to $M\_{\mathbf{X}}$),

$$A\_{\vec{L}}(M\_{\mathbf{X}\_i}) = -2 \int\_{M\_{\mathbf{X}\_i}} H(\vec{L} \bullet \vec{N}) \,\mathrm{d}\mathcal{H}^2 + \int\_{\partial M\_{\mathbf{X}\_i}} \left[ \vec{L} \bullet \vec{\eta} - \frac{\frac{\mathrm{d}f\_{0,i}}{\mathrm{d}t}}{|\nabla\_{M\_{\mathbf{X}}} f\_{0,i}|} \right] \mathrm{d}\mathcal{H}^1. \tag{65}$$

$\mathbf{X}\_t$ is all the information needed to calculate the molecular surface $M\_{\mathbf{X}\_t}$; see Connolly (1983). For calculation, the above formulas have to be translated into formulas on the molecular surface $M\_{\mathbf{X}}$. These translations are given in Appendix A; they are calculable (all integrals can be expressed by analytic formulas in the variables $\mathbf{X}$) and were calculated piecewise on $M\_{\mathbf{X}}$.

### *10.3.4. The gradient*

Let a protein U have $L$ rotatable bonds $(\mathbf{b}\_1, \cdots, \mathbf{b}\_i, \cdots, \mathbf{b}\_L)$. Let $\theta\_i$ denote the dihedral angle around the rotatable bond $\mathbf{b}\_i$. A conformation $\mathbf{X}$ of U can be expressed in terms of these rotatable dihedral angles $\Theta = (\theta\_1, \cdots, \theta\_i, \cdots, \theta\_L)$; then

$$G(\mathbf{X}) = G(\Theta),\tag{66}$$

and the gradient of *G* can be written as

$$\nabla G(\Theta) = \left( \frac{\partial G}{\partial \theta\_1}, \cdots, \frac{\partial G}{\partial \theta\_i}, \cdots, \frac{\partial G}{\partial \theta\_L} \right) (\Theta) = (G\_{\vec{L}\_{\mathbf{b}\_1}}, \cdots, G\_{\vec{L}\_{\mathbf{b}\_i}}, \cdots, G\_{\vec{L}\_{\mathbf{b}\_L}}) (\mathbf{X}).\tag{67}$$

If the rotation around $\mathbf{b}\_i$ by the angle $-sG\_{\vec{L}\_{\mathbf{b}\_i}}(\mathbf{X})$, applied to $R\_{\mathbf{b}\_i}$ with the atoms in $F\_{\mathbf{b}\_i}$ held fixed, is denoted by $M\_i$, a new conformation $\mathbf{Y}\_s = M\_L \circ M\_{L-1} \circ \cdots \circ M\_1(\mathbf{X})$ is obtained, where $s > 0$ is a suitable step length. That is to say, the dihedral angles of $\mathbf{Y}\_s$ are

$$\left[ \theta\_1 - sG\_{\vec{L}\_{\mathbf{b}\_1}}(\mathbf{X}), \cdots, \theta\_i - sG\_{\vec{L}\_{\mathbf{b}\_i}}(\mathbf{X}), \cdots, \theta\_L - sG\_{\vec{L}\_{\mathbf{b}\_L}}(\mathbf{X}) \right].$$

The order of the rotations is in fact irrelevant, i.e., in any order, the same conformation $\mathbf{Y}\_s$ is always obtained, as proved in Fang and Jing (2008) and in Appendix A. In this way one can change the structure quickly by rotating around all rotatable bonds simultaneously.
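The order independence can also be checked numerically on a small example. The Python sketch below rotates a hypothetical five-center chain around two nested bonds in both orders, always taking the rotation axis from the current conformation, and compares the results. It uses the standard Rodrigues rotation formula. The coordinates, angles, and function names are assumptions made for illustration, and a numerical check on one example is of course not a proof.

```python
import math

def rotate_about_line(p, a, b, omega):
    """Rodrigues rotation of point p about the line a + t*b (|b| = 1) by angle omega."""
    v = [p[k] - a[k] for k in range(3)]
    dot = sum(b[k] * v[k] for k in range(3))
    cross = [b[1] * v[2] - b[2] * v[1],
             b[2] * v[0] - b[0] * v[2],
             b[0] * v[1] - b[1] * v[0]]
    c, s = math.cos(omega), math.sin(omega)
    return tuple(a[k] + (1.0 - c) * dot * b[k] + c * v[k] + s * cross[k]
                 for k in range(3))

def rotate_bond(coords, bond, rotating, omega):
    """Rotate the centers in `rotating` about the current axis of `bond`; fix the rest."""
    a0, a1 = coords[bond[0]], coords[bond[1]]
    d = [a1[k] - a0[k] for k in range(3)]
    norm = math.sqrt(sum(x * x for x in d))
    b = [x / norm for x in d]
    return {k: rotate_about_line(p, a0, b, omega) if k in rotating else p
            for k, p in coords.items()}

# Hypothetical chain 0-1-2-3-4 with two rotatable bonds; R_b is contained in R_a.
chain = {0: (0.0, 0.0, 0.0), 1: (1.5, 0.0, 0.0), 2: (2.0, 1.4, 0.0),
         3: (3.5, 1.6, 0.3), 4: (4.0, 3.0, 0.8)}
bond_a, moved_a, omega_a = (1, 2), {2, 3, 4}, 0.7
bond_b, moved_b, omega_b = (2, 3), {3, 4}, -1.1

a_then_b = rotate_bond(rotate_bond(chain, bond_a, moved_a, omega_a),
                       bond_b, moved_b, omega_b)
b_then_a = rotate_bond(rotate_bond(chain, bond_b, moved_b, omega_b),
                       bond_a, moved_a, omega_a)

# Largest coordinate difference between the two orders of rotation.
max_gap = max(abs(a_then_b[k][c] - b_then_a[k][c]) for k in chain for c in range(3))
```

The key detail is that the axis of the inner bond is re-read after the outer rotation has moved it, which is what makes the two compositions agree.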

This is actually Newton's fastest descending method: each step moves in the direction in which the Gibbs free energy *G*(**X**) decreases most rapidly. The aforementioned simulations of Fang and Jing (2010) used this method.
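The update of the dihedral angles described above can be sketched as a short Python loop. This is a minimal illustration, not the implementation used in Fang and Jing (2010): `toy_G` is a hypothetical smooth stand-in for *G*(Θ), the gradient is approximated by central finite differences instead of equation (67), and the step length and iteration count are arbitrary choices.

```python
import math

TARGET = (1.0, -0.5, 2.0)  # hypothetical minimizer of the toy energy

def toy_G(theta):
    """Hypothetical stand-in for G(Theta); it is minimal when theta equals TARGET."""
    return sum(1.0 - math.cos(t - t0) for t, t0 in zip(theta, TARGET))

def numerical_gradient(G, theta, h=1e-6):
    """Central finite-difference approximation of the gradient of G at theta."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((G(up) - G(down)) / (2.0 * h))
    return grad

def descend(G, theta0, step, iterations):
    """Repeat theta_i <- theta_i - step * dG/dtheta_i for all angles at once."""
    theta = list(theta0)
    history = [G(theta)]
    for _ in range(iterations):
        grad = numerical_gradient(G, theta)
        theta = [t - step * g for t, g in zip(theta, grad)]
        history.append(G(theta))
    return theta, history

theta_min, history = descend(toy_G, [0.0, 0.0, 0.0], step=0.5, iterations=200)
```

A fixed step length is the simplest choice; in practice the step *s* would be tuned, for example by a line search, so that *G* decreases at every iteration.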

## **10.4. Kinetics**

There is evidence that the native structure of some proteins is not the global minimum of the Gibbs free energy, but only a local minimum. If the native structure of a protein achieves the global minimum of the Gibbs free energy, the folding process is **thermodynamic**; if it is only a local minimum, the folding process is **kinetic**; see Lazaridis and Karplus (2003).

With formula (2) and the gradient just obtained, one actually has the kinetics in hand. In fact, for any atomic position $\mathbf{x}\_i$, the kinetic force is $\mathbf{F}\_i(\mathbf{X}) = -\nabla\_{\mathbf{x}\_i} G(\mathbf{X})$, Dai (2007). With formula (2) these quantities are readily calculable, as mentioned above. The resulting Newton's fastest descending method will help us find the native structure in either the thermodynamic or the kinetic case; here the two cases are combined by the Gibbs free energy formula (2) and its derivatives.

The method of moving along $-\nabla G$ was used in the simulation in Fang and Jing (2010).

## **11. Conclusion**

A quantum statistical theory of protein folding for monomeric, single-domain, self-folding globular proteins is suggested. The assumptions of the theory fit all observed realities of protein folding. The resulting formulas (1) and (2) do not have any arbitrary parameters, and all terms in them have clear physical meaning. Potential energies involving pairwise interactions between atoms do not appear in them.

Formulas (1) and (2) have explanatory power. They give a unified explanation of folding and denaturation, and of the hydrophobic effect in protein folding and its relation to hydrogen bonding. The formulas also explain the relative successes of surface area protein folding models. The relation between the kinetics and thermodynamics of protein folding is discussed, and the driving force formulas that come from the Gibbs free energy formula (2) are also given. Energy surface theory will be much easier to handle. The concept of �*G* is clarified.

## **Appendix**

### **A. Calculations on the molecular surface**

#### **A.1. Rotation order**

Let $P\_{\mathbf{X}} = \cup\_{i=1}^{N} B(\mathbf{x}\_i, r\_i)$ and let $\mathbf{x}\_{\alpha 0}$ and $\mathbf{x}\_{\alpha 1}$ be bonded by $b\_\alpha$; the rotation line of $b\_\alpha$ is $\mathbf{x}\_{\alpha 0} + t\,\frac{\mathbf{x}\_{\alpha 1} - \mathbf{x}\_{\alpha 0}}{|\mathbf{x}\_{\alpha 1} - \mathbf{x}\_{\alpha 0}|} = \mathbf{x}\_{\alpha 0} + t\,\mathbf{b}\_\alpha$. Each $b\_\alpha$ divides $\{\mathbf{x}\_1, \cdots, \mathbf{x}\_M\}$ into two groups $F\_\alpha$ and $R\_\alpha$: balls in $R\_\alpha$ will be rotated while balls in $F\_\alpha$ will be fixed. Note that these partitions are independent of $P\_{\mathbf{X}}$; they depend only on the molecular graph of the protein molecule. Let $M\_\alpha$ be this rotation-fixation. It will be shown that

$$M\_{\alpha} \circ M\_{\beta}(\mathbf{X}) = M\_{\beta} \circ M\_{\alpha}(\mathbf{X}), \quad \mathbf{X} \in (\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_M), \quad 1 \le \alpha, \beta \le L. \tag{1}$$

The formula for rotating a point $X$ around a line $L : \mathbf{y} = \mathbf{x} + t\mathbf{b}$ ($|\mathbf{b}| = 1$) by an angle $\omega$ is $R(X) = \mathbf{x} + \mathbf{A}(\omega)(X - \mathbf{x})$. Let $I$ be the identity matrix, $B = \mathbf{b}\mathbf{b}^T$, and $Z\_b$ the matrix such that the outer (cross) product $\mathbf{b} \wedge X = Z\_b X$; then the orthogonal matrix $\mathbf{A}(\omega) = (1 - \cos\omega)B + \cos\omega\, I + \sin\omega\, Z\_b$.
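The rotation formula above can be sketched in a few lines of code. The following is a minimal illustration, not the implementation used in the cited simulations; the function names `rotation_matrix`, `mat_vec`, and `rotate_about_line` are hypothetical, and the axis `b` is assumed to be already normalized.

```python
import math

def rotation_matrix(b, omega):
    """A(omega) = (1 - cos w) b b^T + cos w I + sin w Z_b for a unit axis b."""
    bx, by, bz = b
    c, s = math.cos(omega), math.sin(omega)
    # Z_b is the skew-symmetric matrix with Z_b X = b ^ X (cross product).
    Z = [[0.0, -bz, by], [bz, 0.0, -bx], [-by, bx, 0.0]]
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return [[(1 - c) * b[i] * b[j] + c * I[i][j] + s * Z[i][j]
             for j in range(3)] for i in range(3)]

def mat_vec(A, v):
    """Product of a 3x3 matrix with a 3-vector."""
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]

def rotate_about_line(X, x, b, omega):
    """R(X) = x + A(omega)(X - x): rotate X about the line x + t*b."""
    A = rotation_matrix(b, omega)
    d = mat_vec(A, [X[i] - x[i] for i in range(3)])
    return [x[i] + d[i] for i in range(3)]

# Example: rotate (1, 0, 0) by 90 degrees about the z-axis through the origin.
Y = rotate_about_line([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0], math.pi / 2)
```

In this example `Y` is, up to rounding, the point (0, 1, 0), as expected for a quarter turn about the z-axis.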

The topology of a protein molecule guarantees that if two bonds $b\_\alpha$ and $b\_\beta$ satisfy $R\_\alpha \subset R\_\beta$, then $\{\mathbf{x}\_{\alpha 0}, \mathbf{x}\_{\alpha 1}\} \subset R\_\beta$. Let $b\_1$ and $b\_2$ be two bonds, $L\_1 : \mathbf{x} = \mathbf{x}\_1 + t\mathbf{b}\_1$ and $L\_2 : \mathbf{x} = \mathbf{x}\_2 + t\mathbf{b}\_2$ the two rotation lines, and $X \in (\mathbf{x}\_1, \mathbf{x}\_2, \cdots, \mathbf{x}\_N)$. To prove equation (1), there are only two cases to consider: $R\_1 \subset R\_2$ and $R\_1 \cap R\_2 = \emptyset$. In either case, if $X \in F\_1 \cap F\_2$, then $M\_1 \circ M\_2(X) = M\_2 \circ M\_1(X) = X$. If $X \in R\_1 \subset R\_2$, then

$$M\_2 \circ M\_1(X) = \mathbf{x}\_2 + \mathbf{A}\_2(\omega\_2)(\mathbf{x}\_1 - \mathbf{x}\_2) + \mathbf{A}\_2(\omega\_2)\mathbf{A}\_1(\omega\_1)(X - \mathbf{x}\_1). \tag{2}$$

On the other hand, $b\_1$ and hence $L\_1$ itself will be rotated by $M\_2$: $L\_3 = M\_2(L\_1) = \mathbf{x}\_3 + t\mathbf{b}\_3$, where $\mathbf{x}\_3 = \mathbf{x}\_2 + \mathbf{A}\_2(\omega\_2)(\mathbf{x}\_1 - \mathbf{x}\_2)$ and $\mathbf{b}\_3 = \mathbf{A}\_2(\omega\_2)\mathbf{b}\_1$. Since $X \in R\_1 \subset R\_2$ and $M\_2(X) \in R\_1$ (in the new conformation $M\_2(P)$, where rotation around $b\_1$ is rotation around $L\_3$), $M\_1 \circ M\_2(X)$ is the rotation $R\_3$ of $M\_2(X)$ around $L\_3$ by the angle $\omega\_1$, thus

$$M\_1 \circ M\_2(X) = \mathbf{x}\_2 + \mathbf{A}\_2(\omega\_2)(\mathbf{x}\_1 - \mathbf{x}\_2) + \mathbf{A}\_3(\omega\_1)\mathbf{A}\_2(\omega\_2)(X - \mathbf{x}\_1). \tag{3}$$

Let $\mathbf{v} \in \mathbf{R}^3$ be an arbitrary vector. Writing $\mathbf{A}\_1(\omega\_1) = \mathbf{A}\_1$, $\mathbf{A}\_2(\omega\_2) = \mathbf{A}\_2$, and $\mathbf{A}\_3(\omega\_1) = \mathbf{A}\_3$, we have

$$\mathbf{A}\_2\mathbf{A}\_1\mathbf{v} = (1 - \cos\omega\_1)(\mathbf{b}\_1 \bullet \mathbf{v})\mathbf{A}\_2\mathbf{b}\_1 + \cos\omega\_1\mathbf{A}\_2\mathbf{v} + \sin\omega\_1\mathbf{A}\_2(\mathbf{b}\_1 \wedge \mathbf{v}). \tag{4}$$

For any rotation matrix $O$ (orthogonal with determinant 1), $(O\mathbf{b}\_1) \bullet (O\mathbf{v}) = \mathbf{b}\_1 \bullet \mathbf{v}$ and $O(\mathbf{b}\_1 \wedge \mathbf{v}) = (O\mathbf{b}\_1) \wedge (O\mathbf{v})$. Then by $\mathbf{b}\_3 = \mathbf{A}\_2(\omega\_2)\mathbf{b}\_1$,

$$\begin{aligned} \mathbf{A}\_3 \mathbf{A}\_2 \mathbf{v} &= (1 - \cos \omega\_1)[\mathbf{b}\_3 \bullet (\mathbf{A}\_2 \mathbf{v})] \mathbf{b}\_3 + \cos \omega\_1 (\mathbf{A}\_2 \mathbf{v}) + \sin \omega\_1 \mathbf{b}\_3 \wedge (\mathbf{A}\_2 \mathbf{v}) \\ &= (1 - \cos \omega\_1)(\mathbf{b}\_1 \bullet \mathbf{v}) \mathbf{A}\_2 \mathbf{b}\_1 + \cos \omega\_1 \mathbf{A}\_2 \mathbf{v} + \sin \omega\_1 \mathbf{A}\_2 (\mathbf{b}\_1 \wedge \mathbf{v}). \end{aligned} \tag{5}$$

Since $\mathbf{v}$ was arbitrary, equations (2) to (5) show that equation (1) is true.

If $R\_1 \cap R\_2 = \emptyset$ and $X \in R\_2$, then $X$ and $M\_2(X) \in F\_1$, hence $M\_1 \circ M\_2(X) = M\_2(X) = M\_2 \circ M\_1(X)$.
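The nested case $R\_1 \subset R\_2$ of the argument above can be checked numerically. The sketch below uses arbitrary illustrative coordinates and angles (they are assumptions, not data from any protein), and the helper names `rot` and `unit` are hypothetical. It applies the two rotations in both orders, moving bond 1 along with $M\_2$ in the second order, and compares the results.

```python
import math

def rot(X, x, b, w):
    """Rotate point X by angle w about the line x + t*b (|b| = 1)."""
    v = [X[i] - x[i] for i in range(3)]
    c, s = math.cos(w), math.sin(w)
    dot = sum(b[i] * v[i] for i in range(3))
    cross = [b[1] * v[2] - b[2] * v[1],
             b[2] * v[0] - b[0] * v[2],
             b[0] * v[1] - b[1] * v[0]]
    # (1 - cos w)(b . v) b + cos w v + sin w (b ^ v), then shift back by x
    return [x[i] + (1 - c) * dot * b[i] + c * v[i] + s * cross[i] for i in range(3)]

def unit(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

# Bond 2 is "upstream": everything attached to bond 1 also moves with bond 2.
x2, b2, w2 = [0.0, 0.0, 0.0], unit([1.0, 2.0, 2.0]), 0.7
x1, b1, w1 = [1.5, 0.3, -0.4], unit([0.0, 1.0, 1.0]), -1.1
X = [2.0, 1.0, 0.5]  # a point in R_1, hence also in R_2

# Order M_2 o M_1: rotate about L_1 first, then about L_2 (equation (2)).
first = rot(rot(X, x1, b1, w1), x2, b2, w2)

# Order M_1 o M_2: M_2 carries bond 1 to the line L_3 = x3 + t*b3 (equation (3)).
x3 = rot(x1, x2, b2, w2)
tip = rot([x1[i] + b1[i] for i in range(3)], x2, b2, w2)
b3 = [tip[i] - x3[i] for i in range(3)]
second = rot(rot(X, x2, b2, w2), x3, b3, w1)

max_diff = max(abs(first[i] - second[i]) for i in range(3))
```

Here `max_diff` is at the level of floating-point rounding, which is consistent with equation (1) for this pair of bonds.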


The molecular surface consists of faces, so all integrals can be evaluated piecewise on the faces. There are three kinds of faces: convex, concave, and saddle, Connolly (1983). The formulas on each kind of face are given below. The notation $\dot{\mathbf{x}}$ means $L(\mathbf{x})$ with $L$ the corresponding Lie vector field. All van der Waals radii $r\_i$, as well as the probe radius $r\_p$, are constants.

#### **A.2. Convex face**

A convex face is a piece of spherical surface lying on some $S\_i = \partial B(\mathbf{x}\_i, r\_i)$ and bounded by circular arcs $\gamma\_\nu$, $\nu = 1, \cdots, n\_F$. Let $\mathbf{v}\_\nu^0$ and $\mathbf{v}\_\nu^1$ be $\gamma\_\nu$'s vertices, $\mathbf{c}\_\nu$ and $r\_\nu$ the center and radius of $\gamma\_\nu$'s circle, $r\_\nu\phi\_\nu$ the arc length of $\gamma\_\nu$, $\mathbf{e}\_3^\nu = (z\_{\nu 1}, z\_{\nu 2}, z\_{\nu 3})$ the unit vector in the direction of $(\mathbf{v}\_\nu^0 - \mathbf{c}\_\nu) \wedge (\mathbf{v}\_\nu^1 - \mathbf{c}\_\nu)$, $d\_\nu = \mathbf{e}\_3^\nu \bullet (\mathbf{c}\_\nu - \mathbf{x}\_i)$, $\mathbf{e}\_1^\nu = \frac{\mathbf{v}\_\nu^0 - \mathbf{c}\_\nu}{r\_\nu} = (x\_{\nu 1}, x\_{\nu 2}, x\_{\nu 3})$, and $\mathbf{e}\_2^\nu = \mathbf{e}\_3^\nu \wedge \mathbf{e}\_1^\nu = (y\_{\nu 1}, y\_{\nu 2}, y\_{\nu 3})$, $1 \le \nu \le n\_F$. A point $\mathbf{x}$ on $F$ has the form $\mathbf{x} = \mathbf{x}\_i - r\_i N$ and $X\_\alpha(\mathbf{x}) = \dot{\mathbf{x}}\_i - r\_i \dot{N}$. By $N \bullet \dot{N} \equiv 0$ and the general divergence formula on the sphere,

$$r\_i \int\_F (X\_\alpha \bullet N) H \, \mathrm{d}\mathcal{H}^2 = \int\_F X\_\alpha \bullet N \, \mathrm{d}\mathcal{H}^2 = \frac{-1}{r\_i} \dot{\mathbf{x}}\_i \bullet \sum\_{\nu=1}^{n\_F} (X\_\nu, Y\_\nu, Z\_\nu), \tag{6}$$


where

$$\begin{aligned} X\_{\nu} &= \frac{r\_{\nu}^{2}}{2} \{ \phi\_{\nu} z\_{\nu 1} + \sin \phi\_{\nu} [\cos \phi\_{\nu} (x\_{\nu 2} y\_{\nu 3} + x\_{\nu 3} y\_{\nu 2}) + \sin \phi\_{\nu} (y\_{\nu 2} y\_{\nu 3} - x\_{\nu 2} x\_{\nu 3})] \} \\ &\quad + r\_{\nu} d\_{\nu} z\_{\nu 2} \left[ y\_{\nu 3} \sin \phi\_{\nu} - x\_{\nu 3} \left( 1 - \cos \phi\_{\nu} \right) \right], \end{aligned} \tag{7}$$

$$\begin{aligned} Y\_{\nu} &= \frac{r\_{\nu}^{2}}{2} \{ \phi\_{\nu} z\_{\nu 2} + \sin \phi\_{\nu} [\cos \phi\_{\nu} (x\_{\nu 3} y\_{\nu 1} + x\_{\nu 1} y\_{\nu 3}) + \sin \phi\_{\nu} (y\_{\nu 1} y\_{\nu 3} - x\_{\nu 1} x\_{\nu 3})] \} \\ &\quad + r\_{\nu} d\_{\nu} z\_{\nu 3} \left[ y\_{\nu 1} \sin \phi\_{\nu} - x\_{\nu 1} \left( 1 - \cos \phi\_{\nu} \right) \right], \end{aligned} \tag{8}$$

$$\begin{aligned} Z\_{\nu} &= \frac{r\_{\nu}^{2}}{2} \{ \phi\_{\nu} z\_{\nu 3} + \sin \phi\_{\nu} [\cos \phi\_{\nu} (x\_{\nu 1} y\_{\nu 2} + x\_{\nu 2} y\_{\nu 1}) + \sin \phi\_{\nu} (y\_{\nu 1} y\_{\nu 2} - x\_{\nu 1} x\_{\nu 2})] \} \\ &\quad + r\_{\nu} d\_{\nu} z\_{\nu 1} \left[ y\_{\nu 2} \sin \phi\_{\nu} - x\_{\nu 2} \left( 1 - \cos \phi\_{\nu} \right) \right]. \end{aligned} \tag{9}$$

#### **A.3. Concave face**

A concave face *F* is a spherical polygon on the probe sphere *S* when *S* is simultaneously tangent to 3 balls *B*(**x***i*,*ri*), 1 ≤ *i* ≤ 3. *F* is expressed by parameters *ti* ≥ 0, *i* = 1, 2, 3,

$$\mathbf{x} = \mathbf{p} + rN = \mathbf{p} + r \frac{t\_1 \mathbf{x}\_1 + t\_2 \mathbf{x}\_2 + t\_3 \mathbf{x}\_3 - \mathbf{p}}{|t\_1 \mathbf{x}\_1 + t\_2 \mathbf{x}\_2 + t\_3 \mathbf{x}\_3 - \mathbf{p}|}, \quad t\_1 + t\_2 + t\_3 = 1. \tag{10}$$

$$\phi\_t(\mathbf{x}) = \mathbf{p}(t) + rN(t) = \mathbf{p}(t) + r \frac{t\_1 \mathbf{x}\_1(t) + t\_2 \mathbf{x}\_2(t) + t\_3 \mathbf{x}\_3(t) - \mathbf{p}(t)}{|t\_1 \mathbf{x}\_1(t) + t\_2 \mathbf{x}\_2(t) + t\_3 \mathbf{x}\_3(t) - \mathbf{p}(t)|},\tag{11}$$

$X\_\alpha(\mathbf{x}) = \left.\frac{\mathrm{d}\phi\_t(\mathbf{x})}{\mathrm{d}t}\right|\_{t=0} = \dot{\mathbf{p}} + r\dot{N}$. Using $|\mathbf{p}(t) - \mathbf{x}\_i(t)| = r\_i + r =$ constant, let $b\_i = (\mathbf{x}\_i - \mathbf{p}) \bullet \dot{\mathbf{x}}\_i$, $\mathbf{b} = (b\_1, b\_2, b\_3)^T$, and

$$A = \begin{pmatrix} \mathbf{x}\_1 - \mathbf{p} \\ \mathbf{x}\_2 - \mathbf{p} \\ \mathbf{x}\_3 - \mathbf{p} \end{pmatrix},$$

then $\det A \neq 0$ and $\dot{\mathbf{p}} = A^{-1}\mathbf{b}$. By $X\_\alpha \bullet N = \dot{\mathbf{p}} \bullet N$,

$$r \int\_F (X\_\alpha \bullet N) H \, \mathrm{d}\mathcal{H}^2 = -\int\_F X\_\alpha \bullet N \, \mathrm{d}\mathcal{H}^2 = \frac{1}{r} \dot{\mathbf{p}} \bullet \sum\_{i=1}^{3} (X\_i, Y\_i, Z\_i). \tag{12}$$

Here the *Xi*, *Yi*, and *Zi* are the same as in equations (7) to (9).
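The probe-centre velocity used in equation (12) comes from a 3×3 linear system: differentiating $|\mathbf{p}(t) - \mathbf{x}\_i(t)|^2 =$ constant gives $(\mathbf{x}\_i - \mathbf{p}) \bullet \dot{\mathbf{p}} = (\mathbf{x}\_i - \mathbf{p}) \bullet \dot{\mathbf{x}}\_i = b\_i$. A minimal sketch of that solve follows; the function names `det3` and `probe_velocity` and the sample coordinates are illustrative assumptions, and Cramer's rule is used only to keep the example free of external libraries.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def probe_velocity(centers, velocities, p):
    """Solve A p_dot = b, where row i of A is x_i - p and b_i = (x_i - p) . x_i_dot."""
    A = [[c[j] - p[j] for j in range(3)] for c in centers]
    b = [sum(A[i][j] * velocities[i][j] for j in range(3)) for i in range(3)]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("probe centre is (nearly) coplanar with the three atom centres")
    p_dot = []
    for k in range(3):
        # Cramer's rule: replace column k of A by b.
        Ak = [[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        p_dot.append(det3(Ak) / d)
    return p_dot

# Illustrative data only (not taken from a real structure).
centers = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 1.0]]
velocities = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
p = [0.5, 0.5, 0.5]
p_dot = probe_velocity(centers, velocities, p)

# Residuals of the tangency constraints (x_i - p) . (p_dot - x_i_dot) = 0.
residuals = [sum((centers[i][j] - p[j]) * (p_dot[j] - velocities[i][j]) for j in range(3))
             for i in range(3)]
```

The residuals vanish up to rounding, so each distance $|\mathbf{p} - \mathbf{x}\_i|$ is preserved to first order. In a real computation the centres and the probe position would of course satisfy $|\mathbf{p} - \mathbf{x}\_i| = r\_i + r$.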


Assume that $\mathbf{x}\_1$ has different water association with $\mathbf{x}\_2$ and $\mathbf{x}\_3$, let $R\_i = r\_i + r$, $d\_{ij} = |\mathbf{x}\_i - \mathbf{x}\_j|$, $y\_{ij} = (R\_i^2 - R\_j^2)/2d\_{ij}$. Then $f\_P(\mathbf{x}) = (\mathbf{x} - \mathbf{p}) \bullet \mathbf{n}\_k$, where $\mathbf{n}\_k = (\mathbf{x}\_k - \mathbf{x}\_1)/d\_{1k}$ is the directed unit normal of the dividing plane $P\_k$ (passing through $\mathbf{p}$ and $\mathbf{t}\_{1k} = \frac{1}{2}(\mathbf{x}\_1 + \mathbf{x}\_k) + y\_{1k}\mathbf{n}\_k$ and perpendicular to it), $k = 2, 3$. The projection of $\partial W \cap F$ on the $\mathbf{x}\_1\mathbf{x}\_2\mathbf{x}\_3$ plane is in the form of one or two curves $\gamma\_k$ ($\{j, k\} = \{2, 3\}$),

$$t\_k = A\_k t\_j + B\_k, \quad 0 \le t\_j \le z\_j, \quad A\_k = \frac{d\_{1j} \cos \omega}{-d\_{1k}}, \quad B\_k = \frac{d\_{1k} + 2y\_{1k}}{2d\_{1k}}, \tag{13}$$

where $\cos\omega = \frac{(\mathbf{x}\_2 - \mathbf{x}\_1) \bullet (\mathbf{x}\_3 - \mathbf{x}\_1)}{d\_{12}d\_{13}}$. $F \cap W$ is a spherical polygon with arcs $\gamma\_\nu$, $1 \le \nu \le n$, including some $\gamma\_k$ as above, so $\int\_{W \cap F}(X\_\alpha \bullet N)H\,\mathrm{d}\mathcal{H}^2$ has a form similar to that in equation (12).

Let $\mathbf{A}\_k = \mathbf{x}\_j - \mathbf{x}\_1 + A\_k(\mathbf{x}\_k - \mathbf{x}\_1)$, $\mathbf{B}\_k = B\_k(\mathbf{x}\_k - \mathbf{x}\_1) + (\mathbf{x}\_1 - \mathbf{p})$, $\mathbf{C}\_k = \mathbf{B}\_k \wedge \mathbf{A}\_k$. Treating $A\_k$ and $B\_k$ as constants, set $H\_k = \dot{\mathbf{p}} \bullet \mathbf{C}\_k$, $J\_k = \dot{\mathbf{A}}\_k \bullet \mathbf{C}\_k$, and $K\_k = \dot{\mathbf{B}}\_k \bullet \mathbf{C}\_k$. Let $a\_k t\_j^2 + b\_k t\_j + c\_k = |\mathbf{A}\_k t\_j + \mathbf{B}\_k|^2 > 0$, then $\Delta\_k = 4a\_k c\_k - b\_k^2 > 0$. By $\eta = N'\_{t\_j} \wedge N / |N'\_{t\_j}|$ and $\mathrm{d}\mathcal{H}^1 = r|N'\_{t\_j}|\,\mathrm{d}t\_j$,

$$\begin{split} \int\_{(\partial W \cap F) \cap \gamma\_k} X\_\alpha \bullet \eta \, \mathrm{d}\mathcal{H}^1 &= \frac{2rH\_k}{\sqrt{\Delta\_k}} \left( \arctan \frac{2a\_k z\_j + b\_k}{\sqrt{\Delta\_k}} - \arctan \frac{b\_k}{\sqrt{\Delta\_k}} \right) \\ &\quad + \frac{2r^2 J\_k}{\Delta\_k} \left( 2\sqrt{c\_k} - \frac{b\_k z\_j + 2c\_k}{\sqrt{a\_k z\_j^2 + b\_k z\_j + c\_k}} \right) \\ &\quad + \frac{2r^2 K\_k}{\Delta\_k} \left( \frac{2a\_k z\_j + b\_k}{\sqrt{a\_k z\_j^2 + b\_k z\_j + c\_k}} - \frac{b\_k}{\sqrt{c\_k}} \right). \end{split} \tag{14}$$

Let $U\_k = (\mathbf{A}\_k \bullet \mathbf{n}\_k)^{\cdot}$, $V\_k = (\mathbf{B}\_k \bullet \mathbf{n}\_k)^{\cdot}$, $W\_k = |\mathbf{C}\_k \bullet \mathbf{n}\_k| > 0$, then

$$\begin{split} \int\_{\gamma\_k} \frac{\frac{\mathrm{d}f\_P}{\mathrm{d}t}}{|\nabla\_{M\_P} f\_P|}\,\mathrm{d}\mathcal{H}^1 = \frac{\pm 2r^2}{W\_k} \Bigg( & \frac{(2a\_k z\_j + b\_k)V\_k}{\sqrt{a\_k z\_j^2 + b\_k z\_j + c\_k}} - \frac{b\_k V\_k}{\sqrt{c\_k}} \\ & + 2\sqrt{c\_k}\,U\_k - \frac{(b\_k z\_j + 2c\_k)U\_k}{\sqrt{a\_k z\_j^2 + b\_k z\_j + c\_k}} \Bigg), \end{split} \tag{15}$$

where the sign is determined by orientation.
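The bracketed expressions in equations (14) and (15) have the shape of three standard integrals of the quadratic $q(t) = a\_k t^2 + b\_k t + c\_k$ over $[0, z\_j]$, namely of $1/q$, $t/q^{3/2}$, and $1/q^{3/2}$, and the positivity of $\Delta\_k$ follows from the Lagrange identity $\Delta\_k = 4|\mathbf{C}\_k|^2$. The sketch below checks both statements numerically for arbitrary sample vectors; the vectors, the upper limit `z`, and the helper names are assumptions made for illustration only.

```python
import math

def dot(u, v):
    return sum(u[i] * v[i] for i in range(3))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def midpoint(f, lo, hi, n=20000):
    """Composite midpoint rule, used only as an independent numerical check."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Sample vectors standing in for A_k and B_k (illustrative values).
A_vec = [1.0, 0.5, -0.2]
B_vec = [0.3, -1.0, 0.8]
z = 0.9

# Coefficients of |A t + B|^2 = a t^2 + b t + c and the discriminant-like Delta.
a, b, c = dot(A_vec, A_vec), 2 * dot(A_vec, B_vec), dot(B_vec, B_vec)
delta = 4 * a * c - b * b
C_vec = cross(B_vec, A_vec)
lagrange_gap = abs(delta - 4 * dot(C_vec, C_vec))

def q(t):
    return a * t * t + b * t + c

sq = math.sqrt(delta)
closed = [
    2 / sq * (math.atan((2 * a * z + b) / sq) - math.atan(b / sq)),   # integral of 1/q
    2 / delta * (2 * math.sqrt(c) - (b * z + 2 * c) / math.sqrt(q(z))),  # integral of t/q^(3/2)
    2 / delta * ((2 * a * z + b) / math.sqrt(q(z)) - b / math.sqrt(c)),  # integral of 1/q^(3/2)
]
numeric = [
    midpoint(lambda t: 1 / q(t), 0.0, z),
    midpoint(lambda t: t / q(t) ** 1.5, 0.0, z),
    midpoint(lambda t: 1 / q(t) ** 1.5, 0.0, z),
]
errors = [abs(closed[i] - numeric[i]) for i in range(3)]
```

The three closed forms agree with the quadrature to well below the tolerance of the midpoint rule, and `lagrange_gap` is at rounding level, so $\Delta\_k > 0$ whenever $\mathbf{A}\_k$ and $\mathbf{B}\_k$ are not parallel.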

#### **A.4. Saddle face**

A saddle face $F$ is generated when the probe $S$ is simultaneously tangent to two balls $B(\mathbf{x}\_1, r\_1)$ and $B(\mathbf{x}\_2, r\_2)$ and rolls around the axis $\mathbf{e}\_2 = \frac{\mathbf{x}\_2 - \mathbf{x}\_1}{d\_{12}}$. The starting and stopping positions of the probe center are $\mathbf{p}$ and $\mathbf{q}$. Let $y = [(r\_1 + r)^2 - (r\_2 + r)^2]/2d\_{12}$ and $\mathbf{t} = \frac{1}{2}(\mathbf{x}\_1 + \mathbf{x}\_2) + y\mathbf{e}\_2$, $R = |\mathbf{p} - \mathbf{t}|$, $\mathbf{e}\_1 = (\mathbf{p} - \mathbf{t})/R$, $\mathbf{e}\_3 = \mathbf{e}\_1 \wedge \mathbf{e}\_2$, then $F$ is parametrized by $0 \le \psi \le \psi\_s$, $\theta\_1 \le \theta \le \theta\_2$,

$$\mathbf{x}(\psi, \theta) = \mathbf{t} + (R - r\cos\theta)(\cos\psi\,\mathbf{e}\_1 + \sin\psi\,\mathbf{e}\_3) + r\sin\theta\,\mathbf{e}\_2, \tag{16}$$


where, letting $\omega\_s = \arccos[(\mathbf{p} - \mathbf{t}) \bullet (\mathbf{q} - \mathbf{t})/R^2]$, $\psi\_s = \omega\_s$ or $2\pi - \omega\_s$, $\theta\_1 = \arctan[-(d\_{12} + 2y)/2R]$, and $\theta\_2 = \arctan[(d\_{12} - 2y)/2R]$. These data are uniquely determined by the conformation $P$, see Connolly (1983). Let $\theta\_k(t)$ and $\phi\_s(t)$ be similarly defined for the conformation $P\_t$. One can define $\phi\_t(\psi) = \frac{\psi\,\psi\_s(t)}{\psi\_s}$, $\phi\_t(\theta) = \frac{\theta\_1(t)(\theta\_2 - \theta) + \theta\_2(t)(\theta - \theta\_1)}{\theta\_2 - \theta\_1}$, and $U(\psi) = \cos\psi\,\mathbf{e}\_1 + \sin\psi\,\mathbf{e}\_3$, then for the same $0 \le \phi \le \phi\_s$ and $\theta\_1 \le \theta \le \theta\_2$,

$$\phi\_t(\mathbf{x}) = \mathbf{t}(t) + [R - r\cos\phi\_t(\theta)]\,U(\phi\_t(\psi)) + r\sin\phi\_t(\theta)\,\mathbf{e}\_2(t). \tag{17}$$
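The saddle-face data defined above ($y$, $\mathbf{t}$, $R$, the frame, and the limits $\theta\_1$, $\theta\_2$) can be computed directly from the two atoms and one probe position. The following sketch does so and evaluates the parametrization of equation (16); the function names and the sample radii are assumptions chosen for illustration. As a sanity check, the surface points at $\theta\_1$ and $\theta\_2$ should lie on the two atom spheres.

```python
import math

def sub(u, v):
    return [u[i] - v[i] for i in range(3)]

def norm(u):
    return math.sqrt(sum(a * a for a in u))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def saddle_parameters(x1, r1, x2, r2, r, p):
    """Frame and angular limits of a saddle face from two atoms and a probe centre p."""
    d12 = norm(sub(x2, x1))
    e2 = [a / d12 for a in sub(x2, x1)]
    y = ((r1 + r) ** 2 - (r2 + r) ** 2) / (2 * d12)
    t = [(x1[i] + x2[i]) / 2 + y * e2[i] for i in range(3)]
    R = norm(sub(p, t))
    e1 = [a / R for a in sub(p, t)]
    e3 = cross(e1, e2)
    theta1 = math.atan(-(d12 + 2 * y) / (2 * R))
    theta2 = math.atan((d12 - 2 * y) / (2 * R))
    return {"t": t, "R": R, "e1": e1, "e2": e2, "e3": e3,
            "theta1": theta1, "theta2": theta2}

def saddle_point(s, r, psi, theta):
    """x(psi, theta) of equation (16)."""
    rho = s["R"] - r * math.cos(theta)
    return [s["t"][i]
            + rho * (math.cos(psi) * s["e1"][i] + math.sin(psi) * s["e3"][i])
            + r * math.sin(theta) * s["e2"][i] for i in range(3)]

# Illustrative geometry: two atoms on the x-axis and a probe tangent to both.
x1, r1 = [0.0, 0.0, 0.0], 1.7
x2, r2 = [3.0, 0.0, 0.0], 1.5
r = 1.4
p = [1.7, math.sqrt((r1 + r) ** 2 - 1.7 ** 2), 0.0]  # |p - x1| = r1 + r, |p - x2| = r2 + r

s = saddle_parameters(x1, r1, x2, r2, r, p)
contact1 = saddle_point(s, r, 0.0, s["theta1"])  # should touch the sphere of atom 1
contact2 = saddle_point(s, r, 0.0, s["theta2"])  # should touch the sphere of atom 2
gap1 = abs(norm(sub(contact1, x1)) - r1)
gap2 = abs(norm(sub(contact2, x2)) - r2)
```

Both gaps are at rounding level for this configuration, which agrees with the stated values of $\theta\_1$ and $\theta\_2$.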

Let $\dot{U} = \cos\psi\,\dot{\mathbf{e}}\_1 + \sin\psi\,\dot{\mathbf{e}}\_3$ and $U' = -\sin\psi\,\mathbf{e}\_1 + \cos\psi\,\mathbf{e}\_3$, then

$$\begin{split} X\_\alpha(\mathbf{x}) &= \dot{\mathbf{t}} + (\dot{R} + r\dot{\theta}\sin\theta)U + (R - r\cos\theta)(\dot{U} + \dot{\psi}U') \\ &\quad + r\dot{\theta}\cos\theta\,\mathbf{e}\_2 + r\sin\theta\,\dot{\mathbf{e}}\_2. \end{split} \tag{18}$$

On $F$, $N = -\cos\theta\,U(\psi) + \sin\theta\,\mathbf{e}\_2$, $\mathrm{d}\mathcal{H}^2 = r(R - r\cos\theta)\,\mathrm{d}\theta\,\mathrm{d}\psi$, $2H = \frac{R - 2r\cos\theta}{r(R - r\cos\theta)}$. Let $J = J(\psi\_s) = \int\_0^{\psi\_s} U(\psi)\,\mathrm{d}\psi$, then

$$\begin{split} 4 \int\_F X\_\alpha \bullet N \, \mathrm{d}\mathcal{H}^2 &= 4rR(\phi\_s \dot{\mathbf{t}} \bullet \mathbf{e}\_2 - R J \bullet \dot{\mathbf{e}}\_2)(\cos\theta\_1 - \cos\theta\_2) \\ &\quad + 4rR(\phi\_s \dot{R} + \dot{\mathbf{t}} \bullet J)(\sin\theta\_1 - \sin\theta\_2) \\ &\quad - r^2(\phi\_s \dot{\mathbf{t}} \bullet \mathbf{e}\_2 - R J \bullet \dot{\mathbf{e}}\_2)(\cos 2\theta\_1 - \cos 2\theta\_2) \\ &\quad + r^2(\phi\_s \dot{R} + \dot{\mathbf{t}} \bullet J)[2(\theta\_1 - \theta\_2) + \sin 2\theta\_1 - \sin 2\theta\_2], \end{split} \tag{19}$$

$$\begin{split} 2\int\_F (X\_\alpha \bullet N) H \, \mathrm{d}\mathcal{H}^2 &= 2R(\phi\_s \dot{\mathbf{t}} \bullet \mathbf{e}\_2 - R J \bullet \dot{\mathbf{e}}\_2)(\cos\theta\_1 - \cos\theta\_2) \\ &\quad + 2R(\phi\_s \dot{R} + \dot{\mathbf{t}} \bullet J)(\sin\theta\_1 - \sin\theta\_2) \\ &\quad - r(\phi\_s \dot{\mathbf{t}} \bullet \mathbf{e}\_2 - R J \bullet \dot{\mathbf{e}}\_2)(\cos 2\theta\_1 - \cos 2\theta\_2) \\ &\quad + r(\phi\_s \dot{R} + \dot{\mathbf{t}} \bullet J)[2(\theta\_1 - \theta\_2) + \sin 2\theta\_1 - \sin 2\theta\_2]. \end{split} \tag{20}$$
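The normal and the area element quoted for the saddle face can be checked by finite differences on the parametrization (16). The sketch below fixes an arbitrary orthonormal frame and sample values of $R$, $r$, $\psi$, $\theta$ (all of them illustrative assumptions), and compares $|\mathbf{x}\_\psi \wedge \mathbf{x}\_\theta|$ with $r(R - r\cos\theta)$.

```python
import math

R, r = 2.5, 1.4                      # sample values with R > r
e1, e2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
e3 = [0.0, 0.0, 1.0]                 # e3 = e1 ^ e2; the centre t is taken as the origin

def dot(u, v):
    return sum(u[i] * v[i] for i in range(3))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def U(psi):
    return [math.cos(psi) * e1[i] + math.sin(psi) * e3[i] for i in range(3)]

def x(psi, theta):
    """Parametrization (16) with t = 0."""
    u = U(psi)
    return [(R - r * math.cos(theta)) * u[i] + r * math.sin(theta) * e2[i] for i in range(3)]

def normal(psi, theta):
    """N = -cos(theta) U(psi) + sin(theta) e2."""
    u = U(psi)
    return [-math.cos(theta) * u[i] + math.sin(theta) * e2[i] for i in range(3)]

psi, theta, h = 0.8, -0.3, 1e-6
# Central differences for the two tangent vectors.
x_psi = [(x(psi + h, theta)[i] - x(psi - h, theta)[i]) / (2 * h) for i in range(3)]
x_theta = [(x(psi, theta + h)[i] - x(psi, theta - h)[i]) / (2 * h) for i in range(3)]

area_fd = math.sqrt(dot(cross(x_psi, x_theta), cross(x_psi, x_theta)))
area_formula = r * (R - r * math.cos(theta))
n = normal(psi, theta)
tangency = max(abs(dot(n, x_psi)), abs(dot(n, x_theta)))
```

For these values the finite-difference area element matches $r(R - r\cos\theta)$ and `n` is orthogonal to both tangent vectors, up to the accuracy of the differences.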

Assume that **x**<sup>1</sup> is hydrophobic and **x**<sup>2</sup> is not, then the dividing plane *P* passing through **p** and **t** and is perpendicular to **e**2. The curve *∂W* ∩ *F* is given by **x**(*ψ*) = **t** + (*R* −*r*)*U*(*ψ*), 0 ≤ *φ* ≤ *φs*, on which dH<sup>1</sup> = (*<sup>R</sup>* <sup>−</sup> *<sup>r</sup>*)d*φ*. The hydrophobic surface integral on *<sup>F</sup>* then is the same as in equation (20), except *θ*<sup>1</sup> = 0. Since on *∂W* ∩ *F*, *η* = *N*� <sup>∧</sup>*<sup>N</sup>* <sup>=</sup> **<sup>e</sup>**2, <sup>d</sup>*θ*(*t*) <sup>d</sup>*<sup>t</sup>* <sup>|</sup>*t*=<sup>0</sup> <sup>=</sup> ˙ *<sup>θ</sup>*<sup>0</sup> <sup>=</sup> ˙ *<sup>θ</sup>*1*θ*2<sup>−</sup> ˙ *θ*2*θ*<sup>1</sup> *<sup>θ</sup>*2−*θ*<sup>1</sup> , by equation (18),

$$\int_{\partial W \cap F} X_{\alpha} \bullet \eta \,\mathrm{d}\mathcal{H}^{1} = (R - r)\psi_{s}(r\dot{\theta}_{0} + \dot{\mathbf{t}} \bullet \mathbf{e}_{2}) - (R - r)^{2}\, \dot{\mathbf{e}}_{2} \bullet J. \tag{21}$$
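The step from (18) to (21) can be spelled out as follows. This is a sketch under two assumptions that the text does not state explicitly: the frame (**e**<sub>1</sub>, **e**<sub>2</sub>, **e**<sub>3</sub>) stays orthonormal along the motion, so that *U* • **e**<sub>2</sub> = *U*′ • **e**<sub>2</sub> = 0 and *U̇* • **e**<sub>2</sub> = −*U* • **ė**<sub>2</sub>, and *ψ<sub>s</sub>* denotes the angular range of *ψ* on *F*. On the curve *θ* = 0 the terms of (18) carrying sin *θ* vanish, which leaves

$$\begin{aligned} X_{\alpha} \bullet \mathbf{e}_{2}\big|_{\theta = 0} &= \dot{\mathbf{t}} \bullet \mathbf{e}_{2} + (R - r)\,\dot{U} \bullet \mathbf{e}_{2} + r\dot{\theta}_{0} \\ &= \dot{\mathbf{t}} \bullet \mathbf{e}_{2} - (R - r)\,U \bullet \dot{\mathbf{e}}_{2} + r\dot{\theta}_{0}, \\ \int_{\partial W \cap F} X_{\alpha} \bullet \eta \,\mathrm{d}\mathcal{H}^{1} &= \int_{0}^{\psi_{s}} \bigl[\dot{\mathbf{t}} \bullet \mathbf{e}_{2} + r\dot{\theta}_{0} - (R - r)\,U(\psi) \bullet \dot{\mathbf{e}}_{2}\bigr](R - r)\,\mathrm{d}\psi \\ &= (R - r)\psi_{s}(r\dot{\theta}_{0} + \dot{\mathbf{t}} \bullet \mathbf{e}_{2}) - (R - r)^{2}\,\dot{\mathbf{e}}_{2} \bullet J, \end{aligned}$$

where the last line uses *J* = ∫<sub>0</sub><sup>*ψ<sub>s</sub>*</sup> *U*(*ψ*)d*ψ*.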

Let **n**<sub>*j*</sub> = **e**<sub>2</sub>. Then

$$f_{P_t}(\phi_t(\mathbf{x})) = [\phi_t(\mathbf{x}) - \mathbf{t}(t)] \bullet \mathbf{n}_j(t), \qquad |\nabla_{M_P} f_P| = \mathbf{n}_j \bullet \eta = 1,$$

and

$$\left.\frac{\mathrm{d}f_{P_t}(\phi_t(\mathbf{x}))}{\mathrm{d}t}\right|_{t=0} = \left.\frac{\mathrm{d}}{\mathrm{d}t}[\phi_t(\mathbf{x}) - \mathbf{t}(t)]\right|_{t=0} \bullet \mathbf{e}_2 + [(R - r)U] \bullet \dot{\mathbf{e}}_2 = r\dot{\theta}_0,$$

so that

$$\int_{\partial W_{P} \cap F} \frac{\frac{\mathrm{d}f_{P}}{\mathrm{d}t}}{\left|\nabla_{M_{P}} f_{P}\right|} \,\mathrm{d}\mathcal{H}^{1} = (R - r)\psi_{s}\, r \dot{\theta}_{0}. \tag{22}$$

**A.4. Saddle face**

A saddle face *F* is generated when the probe *S* is simultaneously tangent to two balls *B*(**x**<sub>1</sub>, *r*<sub>1</sub>) and *B*(**x**<sub>2</sub>, *r*<sub>2</sub>) and rolls around the axis **e**<sub>2</sub> = (**x**<sub>2</sub> − **x**<sub>1</sub>)/*d*<sub>12</sub>. The starting and stopping positions of the probe center are **p** and **q**. Let

$$y = \frac{(r_1 + r)^2 - (r_2 + r)^2}{2d_{12}}, \quad \mathbf{t} = \tfrac{1}{2}(\mathbf{x}_1 + \mathbf{x}_2) + y\,\mathbf{e}_2, \quad R = |\mathbf{p} - \mathbf{t}|, \quad \mathbf{e}_1 = (\mathbf{p} - \mathbf{t})/R, \quad \mathbf{e}_3 = \mathbf{e}_1 \wedge \mathbf{e}_2,$$

then *F* is parametrized by 0 ≤ *ψ* ≤ *ψ<sub>s</sub>*, *θ*<sub>1</sub> ≤ *θ* ≤ *θ*<sub>2</sub>,

$$\mathbf{x}(\psi, \theta) = \mathbf{t} + (R - r\cos\theta)(\cos\psi\,\mathbf{e}_1 + \sin\psi\,\mathbf{e}_3) + r\sin\theta\,\mathbf{e}_2, \tag{16}$$

where, with *ω<sub>s</sub>* = arccos[(**p** − **t**) • (**q** − **t**)/*R*<sup>2</sup>], *ψ<sub>s</sub>* = *ω<sub>s</sub>* or 2*π* − *ω<sub>s</sub>*, and *θ*<sub>1</sub> = arctan[−(*d*<sub>12</sub> + 2*y*)/2*R*], *θ*<sub>2</sub> = arctan[(*d*<sub>12</sub> − 2*y*)/2*R*]. These data are uniquely determined by the conformation *P*, see Connolly (1983). Let *θ<sub>k</sub>*(*t*) and *ψ<sub>s</sub>*(*t*) be similarly defined for the conformation *P<sub>t</sub>*. One can define

$$\phi_t(\psi) = \frac{\psi\,\psi_s(t)}{\psi_s}, \quad \phi_t(\theta) = \frac{\theta_1(t)(\theta_2 - \theta) + \theta_2(t)(\theta - \theta_1)}{\theta_2 - \theta_1}, \quad U(\psi) = \cos\psi\,\mathbf{e}_1 + \sin\psi\,\mathbf{e}_3,$$

then for the same 0 ≤ *ψ* ≤ *ψ<sub>s</sub>* and *θ*<sub>1</sub> ≤ *θ* ≤ *θ*<sub>2</sub>,

$$\phi_t(\mathbf{x}) = \mathbf{t}(t) + [R - r\cos\phi_t(\theta)]\,U(\phi_t(\psi)) + r\sin\phi_t(\theta)\,\mathbf{e}_2(t). \tag{17}$$

Let *U̇* = cos *ψ* **ė**<sub>1</sub> + sin *ψ* **ė**<sub>3</sub> and *U*′ = −sin *ψ* **e**<sub>1</sub> + cos *ψ* **e**<sub>3</sub>, then

$$X_{\alpha}(\mathbf{x}) = \dot{\mathbf{t}} + (\dot{R} + r\dot{\theta}\sin\theta)U + (R - r\cos\theta)(\dot{U} + \dot{\psi}U') + r\dot{\theta}\cos\theta\,\mathbf{e}_2 + r\sin\theta\,\dot{\mathbf{e}}_2. \tag{18}$$

On *F*,

$$N = -\cos\theta\,U(\psi) + \sin\theta\,\mathbf{e}_2, \quad \mathrm{d}\mathcal{H}^2 = r(R - r\cos\theta)\,\mathrm{d}\theta\,\mathrm{d}\psi, \quad 2H = \frac{R - 2r\cos\theta}{r(R - r\cos\theta)}.$$

Let

$$J = J(\psi_s) = \int_0^{\psi_s} U(\psi)\,\mathrm{d}\psi,$$

which is the vector *J* used in equations (19) to (21).

## **Author details**

Yi Fang

*Department of Mathematics, Nanchang University, 999 Xuefu Road, Honggutan New District, Nanchang, 330031, China*

### **12. References**

	- [19] Kauzmann, W. (1959) Some factors in the interpretation of protein denaturation. *Adv. Protein Chem.* 14, 1-63.
	- [20] Lazaridis, T., and Karplus, M. (2003) Thermodynamics of protein folding: a microscopic view. *Biophysical Chemistry*, 100, 367-395.

**Chapter 4**

**Information Capacity of Quantum Transfer Channels and Thermodynamic Analogies**

Bohdan Hejna

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/50466

©2012 Hejna, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1. Introduction**

We will begin with a simple type of *stationary stochastic*<sup>1</sup> systems of quantum physics, treating them within the framework of the **Shannon** Information Theory and Thermodynamics but starting from their *algebraic representation*. Based on this algebraic description, a model of information transmission in those systems is stated by defining the Shannon information in terms of variables describing the system state. Measurement on these systems is then defined as a spectral decomposition of the measured quantities, the *operators*. The information capacity formulas, here of the *narrow-band* nature, are consequently derived for the simple system governed by the **Bose–Einstein** (B–E) Law [bosonic (photonic) channel] and for the one governed by the **Fermi–Dirac** (F–D) Law [fermionic (electron) channel]. *The non-zero value of the average input energy needed for the existence of information transmission in F–D systems* is stated [11, 12].

<sup>1</sup> We deal with a system that takes, at times *t* = 0, 1, . . ., states *θ<sub>t</sub>* from a state space **Θ**. If, for any *t*<sub>0</sub>, the relative frequencies of events *B* ⊂ **Θ**,

$$\frac{1}{T}\sum_{t=t_0+1}^{t_0+T} I_B(\theta_t),$$

tend for *T* → ∞ to probabilities *p*<sub>*t*<sub>0</sub></sub>(*B*), we speak of a stochastic system. If these probabilities do not depend on the starting time *t*<sub>0</sub>, we speak of a stationary stochastic system.

Further, the *wide–band* information capacity formulas for the B–E and F–D cases are stated. The original *thermodynamic* capacity derivation for the wide–band photonic channel, as stated by **Lebedev–Levitin** in 1966, is also revised. This revision is motivated by the apparent relationship between the B–E (photonic) wide–band information capacity and the *heat efficiency* of a certain *heat cycle*, which is further considered as the demonstrating model for processes of information transfer in the original wide–band photonic channel. By analyzing the information characteristics of a *model reverse* heat cycle, whose information arrangement is set up to be most analogous to the structure of the photonic channel considered, we see the necessity of returning the transfer medium (the channel itself) to its initial state as a condition for a *sustained, repeatable* transfer. This is not considered in [12, 30], where only a single act of information transfer is treated. Or the return is

