188 Nucleic Acids - From Basic Aspects to Laboratory Tools

In quantum mechanics language, a |*x*⟩ base state, for example, is a ring-number or purine–pyrimidine class state, whereas |*A*⟩ = |*x*⟩ + |*y*⟩ + |*z*⟩ is an adenine molecular state decomposed in terms of proper nucleotide class subspaces. Any pure nucleotide state can thus be represented in terms of molecular class states.

**Figure 2.** Orthonormal *x*–*y*–*z* base set and tetrahedral DNA-nucleotide set representation. Each of the three axes distinguishes a specific molecular class feature. Purines are distinguished from pyrimidines through the *x*-coordinate. Amino is distinguished from keto through the *y*-coordinate. Finally, weak WC hydrogen-bridge binding is distinguished from stronger binding through the *z*-coordinate [14].

Each possible nucleotide pair shares one fundamental molecular structural characteristic, forming a group within a given class that is distinct from the group formed by the complementary pair in the same class. This is apparent in Eq. 3, which reflects the intrinsic cubic symmetry of the tetrahedron. From here on, we construct our approach on a complete nucleotide representation, which in turn describes the properties associated with each molecule by decomposing them in terms of three differential affinity groups or classes. The choice of a tetrahedral set is thus natural and convenient for its intrinsic orthogonality and symmetry properties, which are related to common molecular group classifications. Its main advantage, however, is that it fulfills the need for a three-dimensional bijective representation of a four-element set.

**4. Irreducible representation**

Returning to the quantum mechanics formulation, our intention is to exploit the remaining invariants and redundancies in the structure of the matrix operator of Eq. 2 in order to further reduce its number of parameters. The three-dimensional nucleotide basis should be kept in mind. The sequence-dependent states of an observable then assume discrete values given by the most compact expansion of its expectation, as follows:

$$\mathbf{E} = \sum_{i} \left( S + \left\langle V \middle| b(i) \right\rangle + \left\langle b(i) \middle| M \middle| b(i+1) \right\rangle \right) \tag{4}$$

which replaces Eq. 2. In Eq. 4, |*b*(*i*)⟩ are still the sequence nucleotide states at coordinate *i*, given in terms of class states by Eq. 3.

The bracket notation indicates vector and dyadic contractions as usual. The expansion in Eq. 4 is quite intuitive, in the sense that the first two terms represent linear contributions to a property from the sequence composition, whereas the third term comprises nonlinear effects due to NN interference or differential stacking interactions. Comparison with Eq. 2 allows the identification of its components. The first term is a constant or mean contribution to the observable, given as the invariant trace of the square expectation periodic matrix, *S* = *Tr*(*Θ*). The trace represents a molecular-state-independent contraction of the self-matrix diagonal, where, by construction, any pure nucleotide component (Eq. 3) squares to one (*b<sub>μ</sub>*<sup>2</sup> = 1). The remaining cross terms of the self-matrix similarly contract to a vector because all pure nucleotide states |*b*(*i*)⟩ also have cyclically multiplicative class components (*b<sub>x</sub>* = *b<sub>y</sub>b<sub>z</sub>*, etc.). This contraction gives the second term as an order-independent or global-composition contribution, with components ⟨*V*| = 4 *Re*(*Θ*<sub>*y*(1)*z*(1)</sub>, *Θ*<sub>*x*(1)*z*(1)</sub>, *Θ*<sub>*x*(1)*y*(1)</sub>). The third term is an NN or first-order sequence stacking contribution to the observable. The stacking matrix *M* is a second-rank tensor whose elements are given from the cross expectation matrix as *M<sub>μν</sub>* = 2 *Re*(*Θ*<sub>*μ*(1)*ν*(2)</sub>). The symmetrical sum of the expectation matrix Hermitian conjugates results in a fully contracted real formulation.
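The three terms of Eq. 4 can be sketched numerically as follows. This is only an illustration under stated assumptions: the class-state coordinates are inferred from the verbal description of the *x*, *y*, and *z* classes together with |*A*⟩ = |*x*⟩ + |*y*⟩ + |*z*⟩, the function name `observable` and its `circular` flag are hypothetical, and the parameter values are arbitrary rather than fitted.

```python
import numpy as np

# Class-state coordinates (x, y, z), inferred from the text: x separates
# purines (+1) from pyrimidines (-1), y amino (+1) from keto (-1), and
# z weak (+1) from strong (-1) pairing, with |A> = |x> + |y> + |z>.
BASES = {
    "A": np.array([1.0, 1.0, 1.0]),
    "T": np.array([-1.0, -1.0, 1.0]),
    "C": np.array([-1.0, 1.0, -1.0]),
    "G": np.array([1.0, -1.0, -1.0]),
}

def observable(seq, S, V, M, circular=True):
    """Evaluate the Eq. 4 sum for a sequence, given S, V (3,) and M (3, 3)."""
    vecs = [BASES[b] for b in seq]
    n = len(vecs)
    total = 0.0
    for i, b in enumerate(vecs):
        total += S + V @ b                      # constant and composition terms
        if circular or i + 1 < n:               # the last base wraps only if circular
            total += b @ M @ vecs[(i + 1) % n]  # nearest-neighbor stacking term
    return float(total)

# Hypothetical parameter values, chosen only to exercise the function.
S = 1.0
V = np.array([0.0, 0.0, 0.5])
M = 0.1 * np.eye(3)
print(observable("ACGT", S, V, M))
```

Treating a linear sequence as having one fewer stacking term than bases is a simplification made here for illustration; end effects are not represented in this sketch.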

Decomposing the expectation of a nucleotide sequence observable as in Eq. 4 naturally leads to an irreducible 13-parameter description of physical properties (*S*, *V<sub>μ</sub>*, and *M<sub>μν</sub>*), which we call the symmetrical set, within the NN approximation. Note that a traditional description of stacking-dependent properties is often stated in terms of the NN dimer composition, that is, as a linear combination of the 16 ordered 5′-3′ NN dimer values *E<sub>ij</sub>*:

$$\mathbf{E} = \sum_{i,j=A,\,T,\,C,\,G} N_{ij}\, \mathbf{E}_{ij} \tag{5}$$

However, the NN dimer set is overspecified: only a smaller set of NN combinations can be obtained a priori by inverting Eq. 5, because Eq. 5 is supplemented by independent composition closure relations. For implicit circular sequences (or for very long sequences, i.e., polynucleotides), these can be taken as any three of the following:

$$\begin{aligned}
\sum_{b=A,\,T,\,C,\,G} \left( N_{Ab} - N_{bA} \right) &= 0 \\
\sum_{b=A,\,T,\,C,\,G} \left( N_{Tb} - N_{bT} \right) &= 0 \\
\sum_{b=A,\,T,\,C,\,G} \left( N_{Cb} - N_{bC} \right) &= 0 \\
\sum_{b=A,\,T,\,C,\,G} \left( N_{Gb} - N_{bG} \right) &= 0
\end{aligned} \tag{6}$$

reducing the number of independent dimers in the set to an arbitrary choice of 13. Similar arguments hold for linear oligomers. In comparison, the decomposition of physical properties in the symmetrical set proposed here operates at a more fundamental level: from the beginning, it includes only a priori linearly independent terms and gives contributions to the observable in the hierarchical form of three expectation tensors of increasing rank, corresponding to different levels of analysis. The 16 NN expectations can, in turn, be easily obtained as linear combinations of the 13 symmetrical-set tensor components. In that case, it is useful to rewrite Eq. 4 in a form appropriate for NN dimer decomposition as follows:

$$\mathbf{E}_{b(1)b(2)} = S + \left\langle V \middle| \frac{b(1) + b(2)}{2} \right\rangle + \left\langle b(1) \middle| M \middle| b(2) \right\rangle, \tag{7}$$

where, to correctly account for additivity, as given by Eq. 5 for each dimer in a sequence, the two nucleotide linear contributions are halved. Explicitly, applying Eq. 3 to Eq. 7 gives:

$$\begin{aligned}
\mathbf{E}_{TA} &= S + V_z - M_{xx} - M_{xy} - M_{xz} - M_{yx} - M_{yy} - M_{yz} + M_{zx} + M_{zy} + M_{zz} \\
\mathbf{E}_{AT} &= S + V_z - M_{xx} - M_{xy} + M_{xz} - M_{yx} - M_{yy} + M_{yz} - M_{zx} - M_{zy} + M_{zz} \\
\mathbf{E}_{CA} &= S + V_y - M_{xx} - M_{xy} - M_{xz} + M_{yx} + M_{yy} + M_{yz} - M_{zx} - M_{zy} - M_{zz} \\
\mathbf{E}_{TG} &= S - V_y - M_{xx} + M_{xy} + M_{xz} - M_{yx} + M_{yy} + M_{yz} + M_{zx} - M_{zy} - M_{zz}
\end{aligned} \tag{8}$$

and so on. Tensor elements can be determined either by inversion from reported dimer values or self-consistently from fits to raw polynucleotide data using Eqs. 8 and 5, or directly from Eq. 4. From a theoretical point of view, molecular symmetry arguments or ab initio calculations could be used to estimate the tensor structure and values.
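The claim that the 16 dimer values follow from 13 independent tensor components can be checked numerically. The sketch below assumes the class-state coordinates implied by the text (with |*A*⟩ = |*x*⟩ + |*y*⟩ + |*z*⟩), builds the linear map of Eq. 7 from the symmetrical set to the 16 ordered dimers, and recovers the tensor values by least squares. The names `design_row` and `X` are hypothetical, and the parameter values are random placeholders, not measured data.

```python
import itertools
import numpy as np

# Class-state coordinates inferred from the text (x: purine/pyrimidine,
# y: amino/keto, z: weak/strong pairing).
BASES = {
    "A": np.array([1.0, 1.0, 1.0]),
    "T": np.array([-1.0, -1.0, 1.0]),
    "C": np.array([-1.0, 1.0, -1.0]),
    "G": np.array([1.0, -1.0, -1.0]),
}
DIMERS = ["".join(p) for p in itertools.product("ATCG", repeat=2)]

def design_row(dimer):
    """Coefficients of (S, V_x, V_y, V_z, M_xx, M_xy, ..., M_zz) in Eq. 7."""
    u, w = BASES[dimer[0]], BASES[dimer[1]]
    return np.concatenate(([1.0], (u + w) / 2.0, np.outer(u, w).ravel()))

X = np.array([design_row(d) for d in DIMERS])   # 16 dimers x 13 parameters
rank = np.linalg.matrix_rank(X)

# Round trip with arbitrary (hypothetical) tensor values: tensors -> dimers -> tensors.
rng = np.random.default_rng(0)
params = rng.normal(size=13)
dimer_values = X @ params
recovered, *_ = np.linalg.lstsq(X, dimer_values, rcond=None)
print(rank, np.allclose(recovered, params))
```

Least squares is used here only as one convenient way to invert the overdetermined 16-by-13 system; with noisy experimental dimer values the same call would return a best-fit symmetrical set rather than an exact inverse.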

#### **4.1. Double strands**

For measurements concerning double strands, aside from end effects, it is well known that complementary strand symmetry further reduces the problem to the statement of only 10 conjugated NN dimer pair values (see the expressions in Eq. 12) linked through two independent composition closure relations as follows:

A Review on the Thermodynamics of Denaturation Transition of DNA Duplex Oligomers in the Context... http://dx.doi.org/10.5772/62574 191

$$\begin{aligned}
\sum_{b=A,\,T,\,C,\,G} \left( N_{Ab} - N_{bA} \right) &= 0, \\
\sum_{b=A,\,T,\,C,\,G} \left( N_{Cb} - N_{bC} \right) &= 0,
\end{aligned} \tag{9}$$

so that only eight independent parameters should result, although the difficulty of defining a 10-dimer set of parameters from a given set of experimental data persists. In that case, the complementary-strand A/T and C/G pairing symmetry in a dimer, as expressed in Eq. 3, gives the conjugate NN base component relations as follows:


$$\begin{aligned}
b'_x(1) &= -b_x(2); \quad b'_x(2) = -b_x(1), \\
b'_y(1) &= -b_y(2); \quad b'_y(2) = -b_y(1), \\
b'_z(1) &= b_z(2); \quad\;\; b'_z(2) = b_z(1),
\end{aligned} \tag{10}$$

where primed bases correspond to the complementary dimer and the numerals correspond to the first and second nucleotides along the 5′-3′ direction of each strand; that is, both the order and the *x*, *y* coordinates are inverted for the conjugate pair.
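As a quick consistency check, the component relations of Eq. 10 can be compared with the ordinary letter-level reverse complement. The sketch below assumes the class-state coordinates implied by the text; the helper names `conjugate_dimer_vectors` and `reverse_complement` are hypothetical.

```python
import numpy as np

# Class-state coordinates inferred from the text (x: purine/pyrimidine,
# y: amino/keto, z: weak/strong pairing).
BASES = {
    "A": np.array([1.0, 1.0, 1.0]),
    "T": np.array([-1.0, -1.0, 1.0]),
    "C": np.array([-1.0, 1.0, -1.0]),
    "G": np.array([1.0, -1.0, -1.0]),
}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
FLIP_XY = np.array([-1.0, -1.0, 1.0])   # complementation inverts x and y, keeps z

def conjugate_dimer_vectors(dimer):
    """Class vectors (b'(1), b'(2)) of the complementary dimer, per Eq. 10."""
    b1, b2 = BASES[dimer[0]], BASES[dimer[1]]
    return FLIP_XY * b2, FLIP_XY * b1

def reverse_complement(dimer):
    """Complementary-strand dimer read 5'->3', at the letter level."""
    return COMPLEMENT[dimer[1]] + COMPLEMENT[dimer[0]]

for d in ("AG", "TA", "CG", "GT"):
    rc = reverse_complement(d)
    v1, v2 = conjugate_dimer_vectors(d)
    same = np.array_equal(v1, BASES[rc[0]]) and np.array_equal(v2, BASES[rc[1]])
    print(d, "->", rc, same)
```

Under these assumed coordinates, the sign flips of Eq. 10 and the letter-level reverse complement describe the same conjugate dimer.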

The double-strand expansion can be given as a function of a single-strand sequence by taking into account the aforementioned implicit symmetries (that is, by adding the contributions from both strands to Eq. 7, using Eq. 10, and then redefining the tensor set as **E**′<sub>*b*1*b*2</sub> = **E**<sub>*b*1*b*2</sub> + **E**<sub>*b*1′*b*2′</sub>). It is clear in that case that

$$V_x = V_y = 0, \quad M_{xy} = M_{yx}, \quad M_{xz} = -M_{zx}, \quad M_{yz} = -M_{zy}, \tag{11}$$

correctly reducing the number of independent elementary tensor set values to 8. From Eqs. 7 and 11, the decomposition for the 10 paired NNs gives a self-consistent set of expectations obeying

$$\begin{aligned}
\mathbf{E}_{TA} &= S + V_z - M_{xx} - M_{yy} + M_{zz} - 2M_{xy} - 2M_{xz} - 2M_{yz} \\
\mathbf{E}_{AT} &= S + V_z - M_{xx} - M_{yy} + M_{zz} - 2M_{xy} + 2M_{xz} + 2M_{yz} \\
\mathbf{E}_{AA/TT} &= S + V_z + M_{xx} + M_{yy} + M_{zz} + 2M_{xy} \\
\mathbf{E}_{AG/CT} &= S + M_{xx} - M_{yy} - M_{zz} - 2M_{xz} \\
\mathbf{E}_{GA/TC} &= S + M_{xx} - M_{yy} - M_{zz} + 2M_{xz} \\
\mathbf{E}_{AC/GT} &= S - M_{xx} + M_{yy} - M_{zz} - 2M_{yz} \\
\mathbf{E}_{CA/TG} &= S - M_{xx} + M_{yy} - M_{zz} + 2M_{yz} \\
\mathbf{E}_{GG/CC} &= S - V_z + M_{xx} + M_{yy} + M_{zz} - 2M_{xy} \\
\mathbf{E}_{CG} &= S - V_z - M_{xx} - M_{yy} + M_{zz} + 2M_{xy} + 2M_{xz} - 2M_{yz} \\
\mathbf{E}_{GC} &= S - V_z - M_{xx} - M_{yy} + M_{zz} + 2M_{xy} - 2M_{xz} + 2M_{yz}
\end{aligned} \tag{12}$$

while the symmetrical set of eight tensor parameters can be inferred from the inverse relations

$$\begin{aligned}
S &= \frac{1}{16} \left[ 2\left(\mathbf{E}_{AA/TT} + \mathbf{E}_{AG/CT} + \mathbf{E}_{GA/TC} + \mathbf{E}_{AC/GT} + \mathbf{E}_{CA/TG} + \mathbf{E}_{GG/CC} \right) + \left(\mathbf{E}_{TA} + \mathbf{E}_{AT} + \mathbf{E}_{CG} + \mathbf{E}_{GC} \right) \right] \\
V_z &= \frac{1}{8} \left[ 2\left(\mathbf{E}_{AA/TT} - \mathbf{E}_{GG/CC} \right) + \left(\mathbf{E}_{TA} + \mathbf{E}_{AT} - \mathbf{E}_{CG} - \mathbf{E}_{GC} \right) \right] \\
M_{xx} &= \frac{1}{16} \left[ 2\left(\mathbf{E}_{AA/TT} + \mathbf{E}_{AG/CT} + \mathbf{E}_{GA/TC} - \mathbf{E}_{AC/GT} - \mathbf{E}_{CA/TG} + \mathbf{E}_{GG/CC} \right) - \left(\mathbf{E}_{TA} + \mathbf{E}_{AT} + \mathbf{E}_{CG} + \mathbf{E}_{GC} \right) \right] \\
M_{yy} &= \frac{1}{16} \left[ 2\left(\mathbf{E}_{AA/TT} - \mathbf{E}_{AG/CT} - \mathbf{E}_{GA/TC} + \mathbf{E}_{AC/GT} + \mathbf{E}_{CA/TG} + \mathbf{E}_{GG/CC} \right) - \left(\mathbf{E}_{TA} + \mathbf{E}_{AT} + \mathbf{E}_{CG} + \mathbf{E}_{GC} \right) \right] \\
M_{zz} &= \frac{1}{16} \left[ 2\left(\mathbf{E}_{AA/TT} - \mathbf{E}_{AG/CT} - \mathbf{E}_{GA/TC} - \mathbf{E}_{AC/GT} - \mathbf{E}_{CA/TG} + \mathbf{E}_{GG/CC} \right) + \left(\mathbf{E}_{TA} + \mathbf{E}_{AT} + \mathbf{E}_{CG} + \mathbf{E}_{GC} \right) \right] \\
M_{xy} &= \frac{1}{16} \left[ 2\left(\mathbf{E}_{AA/TT} - \mathbf{E}_{GG/CC} \right) - \left(\mathbf{E}_{TA} + \mathbf{E}_{AT} - \mathbf{E}_{CG} - \mathbf{E}_{GC} \right) \right] \\
M_{xz} &= \frac{1}{8} \left( -\mathbf{E}_{TA} + \mathbf{E}_{AT} + \mathbf{E}_{CG} - \mathbf{E}_{GC} \right) = \frac{1}{4} \left( -\mathbf{E}_{AG/CT} + \mathbf{E}_{GA/TC} \right) \\
M_{yz} &= \frac{1}{8} \left( -\mathbf{E}_{TA} + \mathbf{E}_{AT} - \mathbf{E}_{CG} + \mathbf{E}_{GC} \right) = \frac{1}{4} \left( -\mathbf{E}_{AC/GT} + \mathbf{E}_{CA/TG} \right)
\end{aligned} \tag{13}$$

This decomposition clarifies the meaning of the composition-free *S* term as the mean expectation value of the 16-dimer ensemble, and of *V<sub>z</sub>* as the half-differential expectation between AT-containing and CG-containing dimers. Most importantly, the double determinations of the *M<sub>xz</sub>* and *M<sub>yz</sub>* values in the last two expressions of Eq. 13 should coincide for a self-consistent set of dimer values. Explicitly, self-consistency introduces links related to composition order symmetry among dimer properties as follows:

$$\begin{aligned}
\mathbf{E}_{AT} - \mathbf{E}_{TA} + \mathbf{E}_{CG} - \mathbf{E}_{GC} &= 2 \left( \mathbf{E}_{GA/TC} - \mathbf{E}_{AG/CT} \right), \\
\mathbf{E}_{AT} - \mathbf{E}_{TA} + \mathbf{E}_{GC} - \mathbf{E}_{CG} &= 2 \left( \mathbf{E}_{CA/TG} - \mathbf{E}_{AC/GT} \right).
\end{aligned} \tag{14}$$

Note that, analogous to the composition closure relations (Eq. 9), the dimer expectation self-consistency relations (Eq. 14) may also be combined to read as follows:

$$\begin{aligned}
\sum_{b=A,\,T,\,C,\,G} \left( \mathbf{E}_{Ab} - \mathbf{E}_{bA} \right) &= 0, \\
\sum_{b=A,\,T,\,C,\,G} \left( \mathbf{E}_{Cb} - \mathbf{E}_{bC} \right) &= 0.
\end{aligned} \tag{15}$$
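The inverse relations and the self-consistency links can be verified numerically on synthetic data. The sketch below assumes the class-state coordinates implied by the text and a stacking matrix that is symmetric in *xy* and antisymmetric in *xz* and *yz*, with only *V<sub>z</sub>* retained in the vector term. It generates the 16 duplex dimer values from eight arbitrary parameters and then recovers *S*, *V<sub>z</sub>*, and both determinations of *M<sub>xz</sub>* and *M<sub>yz</sub>*. The function name `duplex_dimer_values` is hypothetical and the parameter values are random placeholders.

```python
import numpy as np

# Class-state coordinates inferred from the text (x: purine/pyrimidine,
# y: amino/keto, z: weak/strong pairing).
BASES = {
    "A": np.array([1.0, 1.0, 1.0]),
    "T": np.array([-1.0, -1.0, 1.0]),
    "C": np.array([-1.0, 1.0, -1.0]),
    "G": np.array([1.0, -1.0, -1.0]),
}

def duplex_dimer_values(S, Vz, Mxx, Myy, Mzz, Mxy, Mxz, Myz):
    """Dimer values from Eq. 7 with V_x = V_y = 0 and the constrained M."""
    M = np.array([[Mxx, Mxy, Mxz],
                  [Mxy, Myy, Myz],
                  [-Mxz, -Myz, Mzz]])
    values = {}
    for a in "ATCG":
        for b in "ATCG":
            u, w = BASES[a], BASES[b]
            values[a + b] = S + Vz * (u[2] + w[2]) / 2.0 + u @ M @ w
    return values

rng = np.random.default_rng(1)
S, Vz, Mxx, Myy, Mzz, Mxy, Mxz, Myz = rng.normal(size=8)   # hypothetical values
E = duplex_dimer_values(S, Vz, Mxx, Myy, Mzz, Mxy, Mxz, Myz)

# S is the 16-dimer mean; V_z contrasts AT-only with CG-only dimers.
S_rec = sum(E.values()) / 16.0
Vz_rec = (E["AA"] + E["TT"] + E["AT"] + E["TA"]
          - E["GG"] - E["CC"] - E["CG"] - E["GC"]) / 8.0

# Two independent determinations each of M_xz and M_yz.
Mxz_a = (E["AT"] - E["TA"] + E["CG"] - E["GC"]) / 8.0
Mxz_b = (E["GA"] - E["AG"]) / 4.0
Myz_a = (E["AT"] - E["TA"] + E["GC"] - E["CG"]) / 8.0
Myz_b = (E["CA"] - E["AC"]) / 4.0

print(np.isclose(S_rec, S), np.isclose(Vz_rec, Vz))
print(np.isclose(Mxz_a, Mxz_b), np.isclose(Myz_a, Myz_b))
```

With experimental dimer tables, a mismatch between the two determinations of *M<sub>xz</sub>* or *M<sub>yz</sub>* would be one way to quantify how far a reported set departs from self-consistency.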

#### **5. The modeling based on end effects**

We now extend the irreducible model to investigate how it accommodates end effects. For circular DNA, or even for a DNA polymer, knowing the eight (polymeric) irreducible parameters (*S*, *V<sub>z</sub>*, and the six elements of the *M* matrix) is sufficient for the prediction of additive physical properties. For an oligomer, additional end effects become important and need to be accounted for. To account for such effects correctly, consider the following duplex sequence:


$$\begin{aligned}
&E\, b_1 b_2 b_3 \cdots b_N\, E \\
&E\, b'_1 b'_2 b'_3 \cdots b'_N\, E
\end{aligned} \tag{16}$$

where, according to the notation introduced by Gray [10, 11], *E* is a pseudo-base indicating the terminations of the sequence. The pseudo-base *E* simply represents one of the NNs of the end base pairs and, from this viewpoint, indicates interactions between the end base pairs and the surrounding solvent. Following the line of reasoning suggested by Licínio and Guerra [13] and introduced in Section 3 of this review, |*A*⟩, |*T*⟩, |*C*⟩, and |*G*⟩ in Eq. 3 would correspond to the 3D part of 4D vectors with the fourth component equal to zero, and |*E*⟩ would be a new molecular state, linearly independent of |*A*⟩, |*T*⟩, |*C*⟩, and |*G*⟩, written as follows:

$$\left| E \right\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \tag{17}$$

Then, applying Eq. 7 and considering Eq. 11 for the duplex dimer *Eb*<sub>1</sub>−*b*′<sub>1</sub>*E*, we obtain the contribution of the end base pair *b*<sub>1</sub>/*b*′<sub>1</sub> to the thermodynamic stability of the sequence; analogous reasoning applies to the end base pair *b<sub>N</sub>*/*b*′<sub>*N*</sub>. Thus, for the pseudo-duplex dimer *Eb*<sub>1</sub>−*b*′<sub>1</sub>*E*,

$$\mathbf{E}\left(E b_{1}\right) = A + B x_{1} + C y_{1} + D z_{1}, \tag{18}$$

where *A*, *B*, *C*, and *D* are parameters that determine the property under consideration, and, for the pseudo-duplex dimer *b<sub>N</sub>E*−*Eb*′<sub>*N*</sub>,

$$\mathbf{E}\left(b_{N} E\right) = A + B x'_{N} + C y'_{N} + D z'_{N}, \tag{19}$$

where, in Eqs. 18 and 19, *x<sub>k</sub>* is the *x*-component of the vector |*b<sub>k</sub>*⟩, and so on. According to Eqs. 18 and 19, the orientation of the end base pair is important; for example, an A/T end base pair would not produce the same effect as a T/A end base pair. Therefore, at least in theory, it is necessary to discriminate four end pairings, listed in the following:

$$\begin{array}{cccc} EA, & ET, & EC, & EG \\ ET, & EA, & EG, & EC \end{array} \tag{20}$$

Finally, we can conclude that the four possible end base pairs in Eq. 20 can be expanded in terms of four parameters, namely *A*, *B*, *C*, and *D*. Consequently, for a duplex oligomer, the additional four parameters related to the ends should be added to the eight polymeric parameters already known, producing a total of 12 irreducible parameters, in the light of the modeling based on the end effects.
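A minimal sketch of how the 12 parameters might be combined for a duplex oligomer is given below. It assumes the class-state coordinates implied by the text, simple additivity of the internal NN terms and the two end terms, and a stacking matrix constrained as in Eq. 11. The function names and all numerical values are hypothetical placeholders, not fitted parameters.

```python
import numpy as np

# Class-state coordinates inferred from the text (x: purine/pyrimidine,
# y: amino/keto, z: weak/strong pairing).
BASES = {
    "A": np.array([1.0, 1.0, 1.0]),
    "T": np.array([-1.0, -1.0, 1.0]),
    "C": np.array([-1.0, 1.0, -1.0]),
    "G": np.array([1.0, -1.0, -1.0]),
}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def nn_term(b1, b2, S, Vz, M):
    """Duplex NN dimer contribution from Eq. 7 (M is assumed to obey Eq. 11)."""
    u, w = BASES[b1], BASES[b2]
    return S + Vz * (u[2] + w[2]) / 2.0 + u @ M @ w

def end_term(base, A, B, C, D):
    """End contribution A + B*x + C*y + D*z for the given base (Eqs. 18 and 19)."""
    x, y, z = BASES[base]
    return A + B * x + C * y + D * z

def oligomer_property(seq, S, Vz, M, A, B, C, D):
    """Internal NN terms plus the two end terms, assuming plain additivity."""
    core = sum(nn_term(seq[i], seq[i + 1], S, Vz, M) for i in range(len(seq) - 1))
    left = end_term(seq[0], A, B, C, D)                 # Eq. 18, uses b_1
    right = end_term(COMPLEMENT[seq[-1]], A, B, C, D)   # Eq. 19, uses b'_N
    return core + left + right

# Hypothetical parameter values: 8 polymeric plus 4 end parameters.
S, Vz = -1.5, 0.4
M = np.array([[0.30, 0.10, 0.20],
              [0.10, -0.40, 0.05],
              [-0.20, -0.05, 0.60]])
A, B, C, D = 0.8, 0.1, -0.2, 0.3

# The four end pairings map to four independent combinations of A, B, C, D.
end_matrix = np.array([[1.0, *BASES[b]] for b in "ATCG"])
print(np.linalg.matrix_rank(end_matrix))
print(oligomer_property("ACGGT", S, Vz, M, A, B, C, D))
```

Because the right-hand end term is evaluated on the complementary base, the sketch returns the same value whichever strand of the duplex is read, while still distinguishing an A/T end from a T/A end whenever *B* or *C* is nonzero.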
