**7. The modeling based on double helix initiation parameters**

Equation 21 establishes how to calculate the total free energy of a sequence of length *N*, according to the NN model, using the methodology based on the modeling by end effects.

On the other hand, from the statistical mechanics viewpoint, the free energy of duplex formation, Δ*G*<sub>T</sub>, is related to the equilibrium constant *K*<sub>eq</sub> as follows:

$$
\Delta \mathbf{G}\_T = -RT \ln \mathbf{K}\_{\text{eq}}.\tag{23}
$$

Whenever nucleation is the limiting process, the two-state model establishes that, once the process is initiated, the helix extends to both ends of the chain [7]. The partition function, or the equilibrium constant *K*<sub>eq</sub> for duplex formation, can then be calculated as follows:

$$K\_{\rm eq} = \sigma \prod\_{i=1}^{N} s\_{i}, \tag{24}$$

where *σ* is the nucleation equilibrium constant and *s<sub>i</sub>* is the propagation equilibrium constant, which refers to the addition of the *i*th base pair to the preexisting duplex. For heteropolymers, *σ* and *s<sub>i</sub>* depend on the composition of the chain. Inserting Eq. 24 into Eq. 23, we obtain:

$$
\Delta \mathbf{G}\_{\rm T} = -RT \ln \sigma - RT \sum\_{i=1}^{N} \ln s\_{i}, \tag{25}
$$

that is,

**1.** Mean values and errors are essentially the same, independently of the modeling.

**2.** Defining the root-mean-square deviation per dimer *χ* [13, 16] as follows:

$$
\chi^2 = \sum\_{i=1}^{108} \left[ \Delta G\_{\text{exp}}(i) - \Delta G\_{\text{theor.}}(i) \right]^2 \Big/ \sum\_{i=1}^{108} N(i), \tag{22}
$$

the free energy irreducible parameters in Tables 1 and 2 are such that they minimize *χ*. The quantity *χ* defines a global minimal deviation between the theoretical values, calculated from the irreducible parameter set for the free energies of the 108 sequences, and the experimental values. In Eq. 22, Δ*G*<sub>exp</sub>(*i*) is the experimental value of the free energy for the *i*th sequence, Δ*G*<sub>theor.</sub>(*i*) is its corresponding theoretical value, and ∑<sub>*i*=1</sub><sup>108</sup> *N*(*i*) is the total number of duplex dimers for the ensemble of 108 sequences. The value obtained for *χ* considering 10 (or 12) parameters is precisely the same, namely 0.14 kcal/mol per dimer [16], which also coincides with that of the 12-parameter model using the values reported by SantaLucia for the free energies of the 10 duplex dimers [3–6]. This means that, considering only the overall data ensemble quality, there is no practical reason to prefer a model with a greater number of parameters.

**3.** The intrinsic errors obtained for the contributions of the ends are appreciably larger than the errors for the other irreducible parameters. Thus, in all the decomposition schemes, the contributions of the ends are not well defined, that is, we could not differentiate their orientation (for example, we could not differentiate A/T from T/A). Therefore, either the available experimental data are not yet sufficiently precise or this modeling is still inadequate to account for end effects.

**4.** It is also verified that the C/G or G/C end pairing is only slightly more stable than the A/T or T/A end pairing. However, the intrinsic errors in the data shown in Tables 1 and 2 are considerable, allowing portions of the ranges of possible values of the end parameters to coincide. Thus, strictly speaking, in the modeling based on end effects, there is no differentiation between the terminal base pairs.

**5.** The errors of the irreducible parameters for free energy were estimated in the following way: Guerra and Licinio selected 100 sets of 80 sequences chosen randomly and then calculated the mean deviation for the parameters obtained from each set [16].

As shown, end contributions are fit to the experimental data with large errors, as compared to the fits of the other NN or dimer contributions. Moreover, the end contributions A/T and T/A, as well as C/G and G/C, could not be differentiated from each other. More than that, we could not distinguish between the weak and the strong terminal base pairs. However, using both sets, one can calculate free energies for DNA oligomers at least as well as standard models considering a larger set of parameters do [3–6]. Guerra and Licinio [16] also extended their analysis and obtained equivalent sets of irreducible parameters for enthalpy and entropy. By simultaneously minimizing the deviations from melting temperatures and entropies of the chains, they obtained the most precise set, which is capable of predicting melting temperatures for DNA

196 Nucleic Acids - From Basic Aspects to Laboratory Tools

$$
\Delta \mathbf{G}\_T = \Delta \mathbf{G}\_{\text{nuc}} - RT \sum\_{i=1}^N \ln s\_i, \tag{26}
$$

where Δ*G*<sub>nuc</sub> = −*RT* ln *σ* is the nucleation free energy.
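As a numerical check of Eqs. 23–26, the following minimal Python sketch evaluates the duplex free energy both from the product form of the equilibrium constant and from the sum of logarithms. The values of *σ* and of the propagation constants, the temperature of 310.15 K, and the function name `duplex_free_energy` are assumptions made only for illustration; they are not fitted parameters.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def duplex_free_energy(sigma, s, temperature=310.15):
    """Return (dG_T, dG_nuc) in kcal/mol for K_eq = sigma * prod(s_i)."""
    rt = R * temperature
    dg_nuc = -rt * math.log(sigma)                   # nucleation term, -RT ln(sigma)
    dg_prop = -rt * sum(math.log(si) for si in s)    # propagation terms, -RT sum ln(s_i)
    return dg_nuc + dg_prop, dg_nuc

# Hypothetical constants, for illustration only
sigma = 1.0e-3
s = [8.0, 5.0, 12.0, 6.0]
dg_total, dg_nuc = duplex_free_energy(sigma, s)

# Cross-check: apply Eq. 23 directly to the product form of Eq. 24
k_eq = sigma * math.prod(s)
dg_direct = -R * 310.15 * math.log(k_eq)

print(f"dG_T (sum of logs) = {dg_total:.4f} kcal/mol")
print(f"dG_T (from K_eq)   = {dg_direct:.4f} kcal/mol")
```

Both routes give the same number, since the logarithm of a product is the sum of the logarithms. The nucleation term is positive whenever *σ* < 1.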

Equation 26 can be conveniently rewritten as follows:

$$
\Delta \mathbf{G}\_T = \Delta \mathbf{G}\_{\text{nuc}} - RT \ln s\_k - RT \sum\_{i=1 \atop i \neq k}^{N} \ln s\_i. \tag{27}
$$

Eqs. 26 and 27 have the same meaning, but in writing Eq. 27 in the form shown, we suppose that the formation of the first base pair of the duplex occurs at the *k*th site. Therefore, by comparing Eq. 27 with Eq. 21, we can see that the nucleation free energy corresponds to the end effects in the NN approach, except for the term −*RT* ln *s<sub>k</sub>*, that is,

$$
\Delta G\_{\text{nuc}} - RT \ln s\_k = -RT \ln \sigma s\_k = \Delta G \left( E b\_1 \right) + \Delta G \left( b\_N E \right). \tag{28}
$$

The quantity Δ*G*<sub>nuc</sub> − *RT* ln *s<sub>k</sub>*, as shown in Eq. 28, corresponds to the initiation free energy, Δ*G*<sub>init</sub>, and, correspondingly, *σs<sub>k</sub>* is the initiation equilibrium constant associated with the formation of the first base pair of the duplex. Furthermore, in the light of the NN modeling, the initiation free energy plays the role of the end effects. Finally, the sum of the propagation free energies corresponds, also in the light of the NN model, to the sum of the dimer free energies, according to the following equation:

$$-RT \sum\_{i=1 \atop i \neq k}^{N} \ln s\_i = \sum\_{i=1}^{N-1} \Delta G \left( b\_i b\_{i+1} \right). \tag{29}$$

Recently, Guerra and Licinio connected the two approaches, namely the NN and the statistical mechanics approaches, and calculated the equilibrium constants and free energies for nucleation and propagation of a double helix in the following transition reactions [16]:

$$\begin{array}{l} \text{poly } A + \text{poly } T \rightleftharpoons \text{poly } A \cdot T \\ \text{poly } C + \text{poly } G \rightleftharpoons \text{poly } C \cdot G. \end{array} \tag{30}$$

For the above homopolymers, they obtained the following nucleation free energies, at standard 1 mol concentration:

$$\begin{aligned} \Delta G\_{\text{nuc}} \left( \text{poly } A \cdot T \right) &= 1.81 \text{ kcal/mol} \\ \Delta G\_{\text{nuc}} \left( \text{poly } C \cdot G \right) &= 1.69 \text{ kcal/mol}. \end{aligned} \tag{31}$$

These values were obtained using the end-effect values calculated from the simultaneous least-mean-square-deviation fit of the NN model to the 108-sequence data compiled by Allawi and SantaLucia [2], listed in Tables 1 and 2, and the values experimentally obtained for A/T and C/G base pairings compiled by the Frank-Kamenetskii group [18]. Since the end effects were obtained with intrinsically large errors, the nucleation free energies for the poly *A*·*T* and poly *C*·*G* homopolymers can be considered essentially the same. This result seems strange because nucleation free energies are expected to depend on the oligomer composition as a whole. It could indicate that end effects, as usually accounted for in the NN models, are improperly represented, with the consequences of poor fitting parameters and an incoherent interpretation of the nucleation. Thus, the usual modeling by end effects must be seen as a didactic and heuristic approximation for DNA properties, and a better modeling needs to be discussed.
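To give a feeling for how close the two nucleation free energies of Eq. 31 are, the sketch below converts them into nucleation equilibrium constants through *σ* = exp(−Δ*G*<sub>nuc</sub>/*RT*). The temperature is not stated alongside Eq. 31, so the value of 310.15 K used here is an assumption, as is the helper name `nucleation_constant`.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def nucleation_constant(dg_nuc, temperature=310.15):
    """sigma = exp(-dG_nuc / RT); the default temperature is an assumption."""
    return math.exp(-dg_nuc / (R * temperature))

# Nucleation free energies from Eq. 31, in kcal/mol
dg_nuc = {"poly A.T": 1.81, "poly C.G": 1.69}
sigma = {name: nucleation_constant(value) for name, value in dg_nuc.items()}
ratio = sigma["poly C.G"] / sigma["poly A.T"]

for name, value in sigma.items():
    print(f"{name}: sigma = {value:.3e}")
print(f"sigma(C.G) / sigma(A.T) = {ratio:.2f}")
```

Under the assumed temperature, the two constants differ by a factor close to one, in line with the statement that the two nucleation free energies are essentially the same.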


As a more appropriate modeling is necessary, we will look for a more precise interpretation of the nucleation free energy term in the expansion of the free energy of a duplex oligomer. To this end, we first write the free energy for the formation of a duplex oligomer as found in some approaches in the literature [4, 6, 19]:

$$
\Delta \mathbf{G}\_{\rm T} = \Delta \mathbf{G}\_{\rm init} + \sum\_{i=1}^{N-1} \Delta \mathbf{G} \left( b\_{i} b\_{i+1} \right) + \Delta \mathbf{G}\_{\rm sym} \tag{32}
$$

where, according to these references, Δ*G*<sub>init</sub> is the "initiation" or "nucleation" free energy. According to the same references, this quantity is related to the difficulty of aligning the two strands and forming the first WC base pair, "nucleating" the double helix which, after this step, propagates to the ends of the chain. In particular, in the work of Manyanga et al. [19], Δ*G*<sub>init</sub> is indiscriminately called the initiation or nucleation free energy. However, the term Δ*G*<sub>init</sub> in Eq. 32 is the initiation free energy, as can be verified by returning to the discussion that follows Eq. 28. In fact, Eq. 28 shows that the initiation free energy Δ*G*<sub>init</sub> is obtained from the nucleation free energy Δ*G*<sub>nuc</sub> by adding a term related to the "propagation" of the first WC base pair, −*RT* ln *s<sub>k</sub>*. Therefore, it becomes clear that the initiation and nucleation free energies are effectively different quantities. It is also clear that Eq. 32 is meaningful if and only if Δ*G*<sub>init</sub> is the initiation free energy. Thus, we can state the problem: how does the term Δ*G*<sub>nuc</sub> depend on the sequence composition? Answering this question will help us to understand why the modeling by end effects that has been used is theoretically incorrect.

The question posed in the last paragraph will guide us throughout this section. To answer it, consider first the general reaction of formation of a double helix of length *N*. Such a duplex is formed from two separated, complementary strands *S* and *S*′. This process is the chemical reaction *S* + *S*′ ⇄ *S*·*S*′. Figure 3 shows a scheme of the state of the two strands before and after the nucleation of the double helix. Before the nucleation, all the bases in each of the two strands are in the single-strand state, and the two strands are sufficiently distant from each other. Thereafter, during the nucleation, all the bases remain in the single-strand state, but the strands approach each other via juxtaposition of the bases *b<sub>k</sub>* and *b<sub>k</sub>*′ (1 ≤ *k* ≤ *N*). We thereby suppose that the nucleation occurs at the *k*th site of the double chain. Finally, after the nucleation, the first WC base pair is formed, that is, the base pair *b<sub>k</sub>* / *b<sub>k</sub>*′. Following the nucleation event and the formation of the first base pair, the double helix propagates in both directions, extending to the two ends of the chain, if the transition is a two-state process. As shown in Figure 3, the formation of the first WC base pair consists of one nucleation step followed by one propagation step. Therefore, the equilibrium constant *σs<sub>k</sub>* refers to the formation of the first WC base pair, through the establishment of hydrogen bonds between the bases *b<sub>k</sub>* and *b<sub>k</sub>*′. If the free energy associated with the formation of the first base pair is Δ*G*<sub>init</sub>, then we can write the equilibrium constant *σs<sub>k</sub>* as

$$\sigma \mathbf{s}\_k = \exp\left\{-\frac{\Delta \mathbf{G}\_{\rm init}}{RT}\right\} \tag{33}$$

that is,

$$
\Delta \mathbf{G}\_{\rm init} = -RT \ln \sigma \mathbf{s}\_{k} = \Delta \mathbf{G}\_{\rm nuc} - RT \ln \mathbf{s}\_{k}. \tag{34}
$$

In order to consider the propagation of the double helix from the nucleating base pair *b<sub>k</sub>* / *b<sub>k</sub>*′ toward both ends, Eq. 24 can be modified to give:

$$K\_{\rm eq} = \left(\prod\_{i=1}^{k-1} s\_i^{\leftarrow} \right) \sigma s\_k \left(\prod\_{i=k+1}^N s\_i^{\rightarrow} \right). \tag{35}$$

In Eq. 35, *σ* is the nucleation equilibrium constant, *κ* = *σs<sub>k</sub>* is the initiation equilibrium constant (which is related to the formation of the first WC base pair *b<sub>k</sub>* / *b<sub>k</sub>*′), and *s<sub>i</sub>*<sup>←</sup> (*i* < *k*) and *s<sub>i</sub>*<sup>→</sup> (*i* > *k*) are the propagation equilibrium constants related to the propagation of the double helix, by stacking of the base pair *b<sub>i</sub>* / *b<sub>i</sub>*′ on the preexisting duplex, in the 3'–5' (downward) and 5'–3' (upward) directions, respectively. Thus, substituting Eq. 35 into Eq. 23, we obtain

$$
\Delta \mathbf{G}\_T = -RT \ln \left[ \left( \prod\_{i=1}^{k-1} \mathbf{s}\_i^{\leftarrow} \right) \sigma \mathbf{s}\_k \left( \prod\_{i=k+1}^N \mathbf{s}\_i^{\rightarrow} \right) \right] = -\sum\_{i=1}^{k-1} RT \ln \mathbf{s}\_i^{\leftarrow} - RT \ln \sigma \mathbf{s}\_k - \sum\_{i=k+1}^N RT \ln \mathbf{s}\_i^{\rightarrow}. \tag{36}
$$

As the propagation equilibrium constant depends on the local composition, we assign to the propagation equilibrium constant for the addition of the *i*th base pair, in the downward direction, a value such that


$$-RT\ln s\_i^{\leftarrow} = \Delta G \big( b\_i b\_{i+1} \big) \tag{37}$$

Analogously, the propagation equilibrium constant for the addition of the (*i*+1)th base pair, in the upward direction, assumes a value such that

$$-RT\ln s\_{i+1}^{\rightarrow} = \Delta G \big( b\_i b\_{i+1} \big) \tag{38}$$

Thus, from Eqs. 37 and 38, the propagation equilibrium constants are, in the light of the NN approach, given by

$$s\_i^{\leftarrow} = s\_{i+1}^{\rightarrow} = \exp\left[-\frac{\Delta G \left(b\_i b\_{i+1}\right)}{RT}\right] \tag{39}$$

The first summation in Eq. 36, −∑<sub>*i*=1</sub><sup>*k*−1</sup> *RT* ln *s<sub>i</sub>*<sup>←</sup>, refers to the sum of the free energies of all the duplex dimers in the downward direction relative to the nucleating base pair *b<sub>k</sub>* / *b<sub>k</sub>*′. In other words, this term is the total free energy related to the propagation of the double helix, starting from the nucleating base pair *b<sub>k</sub>* / *b<sub>k</sub>*′ and propagating in the downward direction. From Eq. 37, we clearly have −∑<sub>*i*=1</sub><sup>*k*−1</sup> *RT* ln *s<sub>i</sub>*<sup>←</sup> = ∑<sub>*i*=1</sub><sup>*k*−1</sup> Δ*G*(*b<sub>i</sub>b*<sub>*i*+1</sub>). The second summation in Eq. 36, −∑<sub>*i*=*k*+1</sub><sup>*N*</sup> *RT* ln *s<sub>i</sub>*<sup>→</sup>, refers to the free energies of all the duplex dimers in the upward direction relative to the base pair *b<sub>k</sub>* / *b<sub>k</sub>*′, that is, it is the total free energy related to the propagation of the double helix, starting from the base pair *b<sub>k</sub>* / *b<sub>k</sub>*′ and propagating in the upward direction. Applying Eq. 38, we have −∑<sub>*i*=*k*+1</sub><sup>*N*</sup> *RT* ln *s<sub>i</sub>*<sup>→</sup> = ∑<sub>*i*=*k*</sub><sup>*N*−1</sup> Δ*G*(*b<sub>i</sub>b*<sub>*i*+1</sub>). Thus, Eq. 36 can be rewritten as follows:

$$\begin{split} \Delta G\_{\mathrm{T}} &= -RT \ln \left[ \left( \prod\_{i=1}^{k-1} s\_i^{\leftarrow} \right) \sigma s\_k \left( \prod\_{i=k+1}^{N} s\_i^{\rightarrow} \right) \right] \\ &= \sum\_{i=1}^{k-1} \Delta G \left( b\_i b\_{i+1} \right) - RT \ln \sigma s\_k + \sum\_{i=k}^{N-1} \Delta G \left( b\_i b\_{i+1} \right), \end{split} \tag{40}$$

that is,

$$\begin{split} \Delta \mathbf{G}\_{\mathrm{T}} &= -RT \ln \left[ \left( \prod\_{i=1}^{k-1} \mathbf{s}\_{i}^{\leftarrow} \right) \sigma \mathbf{s}\_{k} \left( \prod\_{i=k+1}^{N} \mathbf{s}\_{i}^{\rightarrow} \right) \right] \\ &= -RT \ln \sigma \mathbf{s}\_{k} + \sum\_{i=1}^{N-1} \Delta \mathbf{G} \left( \mathbf{b}\_{i} \mathbf{b}\_{i+1} \right), \end{split} \tag{41}$$

where Δ*G*<sub>init</sub> = −*RT* ln *σs<sub>k</sub>* is the initiation free energy, and ∑<sub>*i*=1</sub><sup>*N*−1</sup> Δ*G*(*b<sub>i</sub>b*<sub>*i*+1</sub>) is the sum of the dimer free energies. Defining Δ*G*(*Ob<sub>k</sub>*) = −*RT* ln *s<sub>k</sub>* as the free energy change associated with the "propagation" of the first WC base pair, we can rewrite Eq. 41 as

$$\begin{split} \Delta \mathbf{G}\_{\mathrm{T}} &= \Delta \mathbf{G}\_{\mathrm{init}} + \sum\_{i=1}^{N-1} \Delta \mathbf{G} \Big( b\_{i} b\_{i+1} \Big) \\ &= \Delta \mathbf{G}\_{\mathrm{nuc}} + \Delta \mathbf{G} \Big( O b\_{k} \Big) + \sum\_{i=1}^{N-1} \Delta \mathbf{G} \Big( b\_{i} b\_{i+1} \Big). \end{split} \tag{42}$$
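The passage from Eq. 36 to Eq. 41 implies that the total free energy does not depend on the site *k* at which nucleation occurs. The following minimal Python sketch checks this by evaluating Eq. 36 term by term for every possible *k*. The dimer free energies, the initiation term, the temperature, and the function name `total_free_energy` are hypothetical choices for illustration only.

```python
import math

R = 1.987e-3   # gas constant, kcal/(mol*K)
T = 310.15     # assumed temperature, K
RT = R * T

def total_free_energy(dimer_dg, dg_init, k):
    """Evaluate Eq. 36 term by term for nucleation at site k (1 <= k <= N)."""
    n = len(dimer_dg) + 1                       # N base pairs give N-1 dimers
    # Eq. 39: one propagation constant per dimer b_i b_{i+1}; s[i-1] belongs to dimer i
    s = [math.exp(-dg / RT) for dg in dimer_dg]
    # Downward propagation: s_i(<-) for i = 1..k-1 uses dimer i (Eq. 37)
    downward = -RT * sum(math.log(s[i - 1]) for i in range(1, k))
    # Upward propagation: s_i(->) for i = k+1..N uses dimer i-1 (Eq. 38)
    upward = -RT * sum(math.log(s[i - 2]) for i in range(k + 1, n + 1))
    return downward + dg_init + upward

# Hypothetical dimer free energies dG(b_i b_{i+1}) and initiation term, kcal/mol
dimer_dg = [-1.0, -1.4, -2.2, -0.9, -1.3]
dg_init = 2.0
totals = [total_free_energy(dimer_dg, dg_init, k) for k in range(1, len(dimer_dg) + 2)]
expected = dg_init + sum(dimer_dg)              # right-hand side of Eq. 41

for k, value in enumerate(totals, start=1):
    print(f"k = {k}: dG_T = {value:.6f} kcal/mol")
```

Every value of *k* returns the same total, equal to the initiation term plus the sum of all dimer free energies.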

Equation 42 shows that the free energy of duplex formation can be written in terms of either the initiation or the nucleation free energy, giving two completely equivalent approaches (the two equalities in Eq. 42). We prefer the first, however, because it allows the initiation free energy for the duplex formation to be obtained directly, as will be shown in the next section. In addition, the nucleation free energy can be calculated from the initiation free energy, as shown in Eq. 34. To apply Eq. 42, we assume that the nucleation event can occur by the strands approaching each other via juxtaposition of any bases *b<sub>k</sub>* and *b<sub>k</sub>*′ (1 ≤ *k* ≤ *N*), with equal probability. The "nucleating" base pair, in turn, can be an A/T or a C/G base pair. Thus, if the formation of the first base pair can occur at any site along the double chain with the same probability, we can write the observable initiation free energy as follows:

$$
\left\langle \Delta G\_{\rm init} \right\rangle = p\_{A/T} \Delta G^{\circ} \left( A/T \right) + p\_{C/G} \Delta G^{\circ} \left( C/G \right). \tag{43}
$$

In Eq. 43, ⟨Δ*G*<sub>init</sub>⟩ is the observable initiation free energy, and *p*<sub>*A*/*T*</sub> and *p*<sub>*C*/*G*</sub> = 1 − *p*<sub>*A*/*T*</sub> are, respectively, the probabilities that the first base pair formed in the DNA double chain is an A/T or a C/G base pair. Finally, Δ*G*°(*A*/*T*) and Δ*G*°(*C*/*G*) are the free energy changes associated with the formation of the first base pair if it is an A/T or a C/G base pair, respectively. As our approach is built on the hypothesis that the formation of the first base pair can occur at any site along the chain with equal probability, the probabilities *p*<sub>*A*/*T*</sub> and *p*<sub>*C*/*G*</sub> are simply the compositions of A/T and C/G base pairs. Then, we have *p*<sub>*A*/*T*</sub> = *χ*<sub>*A*/*T*</sub> = *n*<sub>*A*/*T*</sub> / *N* and *p*<sub>*C*/*G*</sub> = *χ*<sub>*C*/*G*</sub> = *n*<sub>*C*/*G*</sub> / *N*, where *χ*<sub>*X*/*Y*</sub> and *n*<sub>*X*/*Y*</sub> are, respectively, the relative occurrence (composition) and the number of *X*/*Y* base pairs occurring along the duplex oligomer in question. Equation 34 shows how the nucleation free energy can be calculated from the initiation free energy. Therefore, the observable nucleation free energy can be written as

$$
\left\langle \Delta G\_{\text{nuc}} \right\rangle = \left\langle \Delta G\_{\text{init}} \right\rangle + \left\langle RT \ln s\_{k} \right\rangle. \tag{44}
$$
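The composition-weighted average of Eq. 43, which supplies the first term on the right side of Eq. 44, can be sketched in a few lines of Python. Here the probabilities are taken as the base-pair compositions, as stated in the text. The numerical values of the two base-pair free energies are hypothetical placeholders, the function name `observable_initiation` is ours, and counting both A and T on the given strand as A/T-type pairs is an assumption of this sketch.

```python
def observable_initiation(sequence, dg_at, dg_cg):
    """Composition-weighted initiation free energy of Eq. 43 (kcal/mol)."""
    seq = sequence.upper()
    n_total = len(seq)
    n_at = sum(base in "AT" for base in seq)   # sites whose base pair is A/T or T/A
    p_at = n_at / n_total                      # p_A/T = n_A/T / N
    p_cg = 1.0 - p_at                          # p_C/G = 1 - p_A/T
    return p_at * dg_at + p_cg * dg_cg

# Hypothetical free energies for forming the first base pair, kcal/mol
DG_AT = 2.5
DG_CG = 1.5

example = observable_initiation("CGTTGA", DG_AT, DG_CG)
print(f"<dG_init> = {example:.3f} kcal/mol")
```

For a sequence with equal numbers of weak and strong base pairs, the result is simply the arithmetic mean of the two placeholder values; for a homopolymer it reduces to the single corresponding value.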

The equilibrium constant *s<sub>k</sub>* is associated with the first propagation step, that is, with the formation of the first WC base pair, which can be an A/T or a C/G base pair. Invoking again our simplifying hypothesis, which establishes that the formation of the first base pair can occur with equal probability at any site along the chain, we can write

$$
\left\langle RT \ln s\_{k} \right\rangle = -\sum\_{\left\{b\_{1}b\_{2}\right\}} p\_{b\_{1}b\_{2}} \Delta G \left(b\_{1}b\_{2}\right),
\tag{45}
$$

where the summation is over all the possible duplex dimers occurring along the chain, that is, *b*<sub>1</sub> and *b*<sub>2</sub> can be any of the four nucleotides A, T, C, or G. In Eq. 45, *p*<sub>*b*<sub>1</sub>*b*<sub>2</sub></sub> is the probability that the base pair *b*<sub>2</sub> / *b*<sub>2</sub>′ is preceded by the base pair *b*<sub>1</sub> / *b*<sub>1</sub>′. Such probabilities are equal to the compositions of dimers along the double chain, that is, *p*<sub>*b*<sub>1</sub>*b*<sub>2</sub></sub> = *χ*<sub>*b*<sub>1</sub>*b*<sub>2</sub></sub>, where *χ*<sub>*b*<sub>1</sub>*b*<sub>2</sub></sub> is the composition of the duplex dimer *b*<sub>1</sub>*b*<sub>2</sub> − *b*<sub>2</sub>′*b*<sub>1</sub>′. Therefore,

$$
\left\langle RT \ln s\_k \right\rangle = -\left\langle \Delta G \left( Ob\_k \right) \right\rangle = -\sum\_{\left\{ b\_1 b\_2 \right\}} \chi\_{b\_1 b\_2} \Delta G \left( b\_1 b\_2 \right). \tag{46}
$$

Equation 44 can be rewritten as follows:


$$
\left\langle \Delta G\_{\text{nuc}} \right\rangle = -\sum\_{\left\{b\_1 b\_2\right\}} \chi\_{b\_1 b\_2} \Delta G \left( b\_1 b\_2 \right) + \chi\_{A/T} \Delta G^{\circ} \left( A/T \right) + \chi\_{C/G} \Delta G^{\circ} \left( C/G \right). \tag{47}
$$
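A minimal Python sketch of Eq. 47 is given below. It counts the duplex dimers and base pairs along a given strand and combines their compositions with a table of dimer free energies. All numerical values are hypothetical placeholders, not the fitted irreducible parameters. The names `canonical_dimer` and `mean_nucleation_free_energy`, the normalization of the dimer compositions by *N* − 1, and the merging of each dimer with its reverse complement into one of the 10 duplex dimers are assumptions of this sketch.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def canonical_dimer(dimer):
    """Give a 5'->3' dimer and its reverse complement the same label."""
    rev_comp = "".join(COMPLEMENT[b] for b in reversed(dimer))
    return min(dimer, rev_comp)

def mean_nucleation_free_energy(sequence, dimer_dg, dg_at, dg_cg):
    """Evaluate the right-hand side of Eq. 47 for one strand (kcal/mol)."""
    seq = sequence.upper()
    n = len(seq)
    counts = Counter(canonical_dimer(seq[i:i + 2]) for i in range(n - 1))
    chi_dimer = {d: c / (n - 1) for d, c in counts.items()}   # dimer compositions
    chi_at = sum(b in "AT" for b in seq) / n                  # A/T base-pair composition
    chi_cg = 1.0 - chi_at
    propagation = sum(chi * dimer_dg[d] for d, chi in chi_dimer.items())
    return -propagation + chi_at * dg_at + chi_cg * dg_cg

# Hypothetical free energies (kcal/mol) for the 10 unique duplex dimers
DIMER_DG = {"AA": -1.0, "AT": -0.9, "TA": -0.6, "CA": -1.4, "AC": -1.4,
            "AG": -1.3, "GA": -1.3, "CG": -2.2, "GC": -2.2, "CC": -1.8}
DG_AT, DG_CG = 2.5, 1.5   # hypothetical first-base-pair free energies

homopolymer = mean_nucleation_free_energy("AAAA", DIMER_DG, DG_AT, DG_CG)
mixed = mean_nucleation_free_energy("CGTTGA", DIMER_DG, DG_AT, DG_CG)
print(f"poly-A example: {homopolymer:.3f} kcal/mol; mixed example: {mixed:.3f} kcal/mol")
```

Because the compositions enter linearly, two sequences with different dimer content generally give different values, which is the composition dependence discussed in the text.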

From Eq. 47, it becomes clear that the nucleation free energy depends on the composition of the DNA double strand, owing to the presence of the terms *χ*<sub>*A*/*T*</sub>, *χ*<sub>*C*/*G*</sub>, and *χ*<sub>*b*<sub>1</sub>*b*<sub>2</sub></sub> on the right side of the equation. According to Eq. 47, as there are 10 possible duplex dimers, ⟨Δ*G*<sub>nuc</sub>⟩ must be a function of 10 parameters: the already known eight polymeric irreducible parameters plus the two parameters related to the formation of the first base pair, as defined in Eq. 43. We can simplify the approach contained in Eq. 47 by discriminating the bases *b*<sub>1</sub> and *b*<sub>2</sub> only according to the weak–strong classification criterion. In this way, Eq. 47 becomes

$$\begin{split} \left\langle \Delta G\_{\text{nuc}} \right\rangle = &-\chi\_{ww} \Delta G (ww) - \chi\_{ws} \Delta G (ws) - \chi\_{sw} \Delta G (sw) - \chi\_{ss} \Delta G (ss) \\ &+ \chi\_{w} \Delta G^{\circ} (A/T) + \chi\_{s} \Delta G^{\circ} (C/G), \end{split} \tag{48}$$

where Δ*G*(*ww*) is the mean free energy of a stack of two weak base pairs, *χ<sub>ww</sub>* is its composition, and so on. Using Eq. 12, we obtain the following values for the mean dimer free energies:

$$\begin{aligned} \Delta G(ww) &= S + V\_{z} + M\_{zz} \\ \Delta G(ss) &= S - V\_{z} + M\_{zz} \\ \Delta G(ws) &= \Delta G(sw) = S - M\_{zz}. \end{aligned} \tag{49}$$

Inserting Eq. 49 into Eq. 48, we can obtain:

$$\begin{aligned} \left\langle \Delta G\_{\text{nuc}} \right\rangle &= -S - V\_z \left( \chi\_{ww} - \chi\_{ss} \right) - M\_{zz} \left( \chi\_{ww} + \chi\_{ss} - \chi\_{ws} - \chi\_{sw} \right) \\ &\quad + \chi\_{w} \Delta G^{\circ} \left( A/T \right) + \chi\_{s} \Delta G^{\circ} \left( C/G \right). \end{aligned} \tag{50}$$
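The step from Eq. 48 to Eq. 50 can be verified numerically. The sketch below computes the weak–strong dimer compositions of a strand, evaluates Eq. 48 with the mean dimer free energies of Eq. 49, and compares the result with the closed form of Eq. 50. The values chosen for *S*, *V<sub>z</sub>*, *M<sub>zz</sub>* and for the two first-base-pair free energies are hypothetical, and the function names are ours.

```python
def weak_strong_compositions(sequence):
    """Return (chi_dimer, chi_w, chi_s) under the weak (A,T) / strong (C,G) classes."""
    classes = ["w" if base in "AT" else "s" for base in sequence.upper()]
    n = len(classes)
    chi = {"ww": 0.0, "ws": 0.0, "sw": 0.0, "ss": 0.0}
    for first, second in zip(classes, classes[1:]):
        chi[first + second] += 1.0 / (n - 1)    # the four compositions sum to 1
    chi_w = classes.count("w") / n
    return chi, chi_w, 1.0 - chi_w

def nucleation_eq48(chi, chi_w, chi_s, S, Vz, Mzz, dg_at, dg_cg):
    # Mean dimer free energies from Eq. 49
    dg = {"ww": S + Vz + Mzz, "ss": S - Vz + Mzz, "ws": S - Mzz, "sw": S - Mzz}
    return -sum(chi[d] * dg[d] for d in chi) + chi_w * dg_at + chi_s * dg_cg

def nucleation_eq50(chi, chi_w, chi_s, S, Vz, Mzz, dg_at, dg_cg):
    return (-S
            - Vz * (chi["ww"] - chi["ss"])
            - Mzz * (chi["ww"] + chi["ss"] - chi["ws"] - chi["sw"])
            + chi_w * dg_at + chi_s * dg_cg)

# Hypothetical parameters, kcal/mol
params = dict(S=-1.5, Vz=0.4, Mzz=0.05, dg_at=2.5, dg_cg=1.5)
chi, chi_w, chi_s = weak_strong_compositions("CGTTGACCAT")
value48 = nucleation_eq48(chi, chi_w, chi_s, **params)
value50 = nucleation_eq50(chi, chi_w, chi_s, **params)
print(f"Eq. 48: {value48:.6f}  Eq. 50: {value50:.6f}")
```

The two expressions agree because the four dimer compositions add up to one, which is what turns the *S* contributions into the single term −*S* in Eq. 50.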

Equation 42 can be used to predict the free energy of any duplex oligomer if we know the values of all the polymeric irreducible parameters for free energy plus the free energy changes associated with the formation of the first base pair. We can now return to the set of 108 sequences compiled by Allawi and SantaLucia to obtain the set of eight polymeric irreducible parameters together with these two additional parameters. This will be done in the following section.

**Figure 3.** The formation of the first WC base pair. (a) Strands *S* and *S*′ are sufficiently distant from each other. All the bases in both chains are in the single-strand state. (b) Strands *S* and *S*′ approach each other. However, all the bases are still in the single-strand state. (c) The first base pair, namely *b<sub>k</sub>* / *b<sub>k</sub>*′, is formed through the establishment of H bonds between the bases *b<sub>k</sub>* and *b<sub>k</sub>*′. After that, the double helix propagates in both directions, extending to the ends of the chain [17].
