**Thermodynamics of Microarray Hybridization**

Raul Măluţan and Pedro Gómez Vilda

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51624

## **1. Introduction**

Microarrays exploit the hybridization properties of nucleic acids to monitor deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) abundance on a genomic scale in different types of cells. The hybridization process takes place between surface-bound DNA sequences, the probes, and the DNA or RNA sequences in solution, the targets. Hybridization is the process of combining complementary, single-stranded nucleic acids into a single molecule. Nucleotides bind to their complements under normal conditions, so two perfectly complementary strands bind to each other readily. Conversely, because of the different geometries of the nucleotides, even a single mismatch between the two strands weakens the binding and, under stringent conditions, can prevent it.

In oligonucleotide microarrays hundreds of thousands of oligonucleotides are synthesized *in situ* by means of photochemical reaction and mask technology. Probe design in these microarrays is based on complementarity to the selected gene or an expressed sequence tag (EST) reference sequence. An important component in designing an oligonucleotide array is ensuring that each probe binds to its target with high specificity.

The dynamics of the hybridization process underlying genomic expression are complex: the thermodynamic factors influencing molecular interaction are still an active field of research [1], and their effects are not taken into account by the algorithms currently used to estimate gene expression.

## **2. State of the art**

Many techniques have been developed to identify trends in the expression levels inferred from DNA microarray data, and attention has recently turned to methods that obtain accurate expression levels from raw data, based on the underlying principles of hybridization thermodynamics and kinetics. The development of DNA chips for rapidly screening and sequencing unknown DNA segments relies mainly on the ability to predict the thermodynamic stability of the complexes formed by the oligonucleotide probes.


The thermodynamics of nucleic acids have been studied from different points of view. Wu *et al.* [2] analyze the temperature-independent and temperature-dependent thermodynamic parameters of DNA/DNA and RNA/DNA oligonucleotide duplexes. The differences between DNA polymer and oligonucleotide nearest-neighbour thermodynamic trends, together with the salt dependence of nucleic acid denaturation, allowed SantaLucia [3] to show that salt effects depend on length whereas the nearest-neighbour propagation energies do not.

An early study on DNA microarray hybridization [4] found that hybridization depended strongly on the rate constants for DNA adsorption and desorption in the regions of the surface not covered by probes, on the two-dimensional diffusion coefficient, and on the size of probes and targets. It also suggested that sparse probe coverage may provide results equal to or better than those obtained with a surface totally covered with DNA probes. A theoretical analysis of the kinetics of DNA hybridization demonstrated that diffusion was important in determining the time required to reach equilibrium, and that this time was proportional to the equilibrium binding constant and to the concentration of binding sites [5].

Newer studies on hybridization kinetics and thermodynamics report that perfect match sequences require less time to reach saturation than mismatches. The experimental results of Dai *et al.* [6], however, show the inverse temporal behaviour: surface-bound oligos hybridizing primarily with their perfect complement sequence tend to equilibrate more slowly than those whose binding is dominated by mismatch duplexes. Under these assumptions, it has been demonstrated [7] that extending the hybridization time can increase the accuracy of expression ratios, and that this effect may be more dramatic for larger fold changes. Separating specific from non-specific binding events can avoid confusion about which RNA hybridizes to the probes. In this case, analysis of the perfect match (PM) and mismatch (MM) intensities in terms of simple single-base related parameters indicates that the intensity of the complementary MM probe introduces a systematic source of variation compared with the intensity of the respective PM probe [8].

The hybridization of nucleic acids was modelled [9] on the assumption that the process goes through an intermediate state in which an initial short contact region has a single-stranded conformation prior to binding.

Hybridization theory made it possible to develop models that yield improved measures of expression for data analysis. Naef and Magnasco [10] propose a simpler model to describe the probe effect that considers only the sequence composition of the probes. They demonstrate that the interactions between nearest neighbours add much predictive power for specific signal probe effects. The stochastic model proposed by Wu and Irizarry [11] can be used to improve the expression measure or in the normalization and summarization of the data.

## **3. DNA hybridization**


DNA is a nucleic acid that contains the genetic instructions directing the biological development of all cellular forms of life and of many viruses. DNA is a long polymer of nucleotides and encodes the sequence of the amino-acid residues in proteins using the genetic code, a triplet code of nucleotides. DNA is organized as two complementary strands, head-to-toe, held together by hydrogen bonds. Each strand of DNA is a chain of chemical "building blocks", called nucleotides, of which there are four types: adenine (A), cytosine (C), guanine (G) and thymine (T). Between the two strands, each base can bond with only one predetermined other base: A with T and C with G are the only possible combinations.
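The pairing rule can be written down in a few lines. The following is a minimal sketch in Python, not part of the original chapter; the names `PAIR`, `reverse_complement` and `is_perfect_duplex` are illustrative assumptions. Because the two strands run head-to-toe, the complement is read in the reverse direction.

```python
# Watson-Crick pairing rule: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written in its own 5'->3' direction."""
    return "".join(PAIR[base] for base in reversed(strand.upper()))


def is_perfect_duplex(strand_a: str, strand_b: str) -> bool:
    """True when strand_b (5'->3') pairs with every base of strand_a (5'->3')."""
    return strand_b.upper() == reverse_complement(strand_a)


print(reverse_complement("ATGC"))         # GCAT
print(is_perfect_duplex("ATGC", "GCAT"))  # True
```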

Hybridization refers to the annealing of two nucleic acid strands following the base pairing rule. As shown in Figure 1, at high temperatures, approximately 90°C to 100°C, the complementary strands of DNA separate (denature), yielding single-stranded molecules. Under appropriate conditions of time and temperature, e.g. 65°C, two single strands will renature to form the double-stranded molecule. Nucleic acid hybrids can be formed between two strands of DNA, two strands of RNA, or one strand of DNA and one of RNA. Nucleic acid hybridization is useful in detecting DNA or RNA sequences that are complementary to any isolated nucleic acid.

**Figure 1.** DNA-RNA hybridization. Hybridization is the process of combining complementary, single-stranded nucleic acids into a single molecule. (from [12])

Finding the location of a gene or gene product by adding specific radioactive or chemically tagged probes for the gene and detecting the location of the radioactivity or chemical on the chromosome or in the cell after hybridization is called *in-situ* hybridization.


In the same way, in microarray technology, hybridization is used to compare mRNA abundance in two samples, or in one sample and a control. RNA from the sample and the control is extracted and labeled with two different fluorescent labels, *e.g.* a red dye for the RNA from the sample population and a green dye for that from the control population. Both extracts are washed over the microarray, and gene sequences from the extracts hybridize to their complementary single-stranded DNA molecules previously attached to the microarray. Then, to measure the abundance of the hybridized RNA, the array is excited by a laser.

In oligonucleotide microarrays the hybridization process occurs in the same way. The only difference is that the sequences laid over the chip come in pairs: a perfect match (PM) probe, a 25-nucleotide sequence perfectly complementary to a same-length sequence of the gene, and a mismatch (MM) probe, a 25-nucleotide sequence designed to correspond to the PM probe but with the middle base, the 13th one, replaced by its complementary base, as in Figure 2. The MM probes give estimates of the random hybridization and cross-hybridization signals. One principle to be followed in the design of oligonucleotide arrays is ensuring that the probes bind to their targets with high accuracy. When the two strands are completely complementary, they bind by specific hybridization, as can be seen in Figure 3. If, on the contrary, there are mismatches between the nucleotides of the strands and they still bind, the process is called non-specific hybridization or cross-hybridization.
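The PM/MM construction can be illustrated with a short Python sketch. It is not taken from the chapter: the function name `mismatch_probe` is an assumption, and the example probe is the Perfect Match sequence printed in Figure 2.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm_probe: str) -> str:
    """Derive the MM probe by complementing the middle (13th) base of a 25-mer."""
    if len(pm_probe) != 25:
        raise ValueError("expected a 25-mer probe")
    middle = 12  # zero-based index of the 13th base
    return pm_probe[:middle] + COMPLEMENT[pm_probe[middle]] + pm_probe[middle + 1:]


pm = "CAGTCTTCCTGAGGATACTATGTGG"  # PM probe as printed in Figure 2
print(mismatch_probe(pm))         # CAGTCTTCCTGACGATACTATGTGG
```

The result matches the Mismatch probe shown in Figure 2, where the central G is replaced by C.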

The hybridization process has been studied from the point of view of the interaction between base pairs, the interaction with unintended targets, and its kinetics. Because in practice the DNA chips are immersed in the target solution for a relatively short time, reaching equilibrium is not guaranteed. Yet a full analysis of the reaction kinetics requires knowledge of the equilibrium state. An understanding of the equilibrium state is also necessary to identify the relative importance of kinetic controls on the performance of DNA microarrays. The effect of cross-hybridization on probe intensity is predictable in oligonucleotide microarrays, and models for avoiding it have been developed [14], [15], [16]; some aspects of them are described in the following section.

**Figure 3.** Cross-hybridization on a nucleotide probe. In specific hybridization the sequences are completely complementary, while in non-specific or cross hybridization the sequences contain mismatches. (from [17])

## **4. Technical factors affecting gene expression**

#### **4.1. Thermodynamics parameters**


[Figure 2 content: mRNA reference sequence TGTGATGGTGGAATGGTCAGAAGGACTCCTATGATACACCCACGCA; Perfect Match probe CAGTCTTCCTGAGGATACTATGTGG; Mismatch probe CAGTCTTCCTGACGATACTATGTGG; probe sets of PM and MM probe cells.]

**Figure 2.** Perfect Match – Mismatch probe set strategy. A 25-mer sequence complementary to the selected part of the mRNA sequence forms a Perfect Match probe, while the Mismatch probe is created artificially by replacing the middle base with its complement. In an oligonucleotide array a gene is represented by 11 to 20 probes. (modified from [13])

Black and Hartley [18] define enthalpy as the sum of the internal energy of a thermodynamic system and the energy associated with the work done by the system on the atmosphere, which is the product of the pressure and the volume, as in equation (1):

$$H = U + pV \tag{1}$$

Because enthalpy is a property, its value can be determined for a simple compressible substance once two independent, intensive thermodynamic properties of the substance are known, and the change in enthalpy is independent of the path followed between two equilibrium states.

In [18] the entropy, *S*, was defined using the following equation:

$$dS = \frac{\delta Q}{T} \tag{2}$$

where *δQ* is an amount of heat introduced into the system and *T* is the absolute temperature, held constant. Since this definition involves only differences in entropy, the entropy itself is defined only up to an arbitrary additive constant.

| Sequence | $\Delta H^{\circ}$ (kcal/mol) | $\Delta S^{\circ}$ (cal/(K·mol)) |
|---|---|---|
| AA/TT | -7.9 | -22.2 |
| AT/TA | -7.2 | -20.4 |
| TA/AT | -7.2 | -21.3 |
| CA/GT | -8.5 | -22.7 |
| GT/CA | -8.4 | -22.4 |
| CT/GA | -7.8 | -21.0 |
| GA/CT | -8.2 | -22.2 |
| CG/GC | -10.6 | -27.2 |
| GC/CG | -9.8 | -24.4 |
| GG/CC | -8.0 | -19.9 |
| Init. w/term G·C | 0.1 | -2.8 |
| Init. w/term A·T | 2.3 | 4.1 |
| Symmetry correction | 0 | -1.4 |

**Table 1.** Unified oligonucleotide $\Delta H^{\circ}$ and $\Delta S^{\circ}$ nearest-neighbour parameters in 1 M NaCl. The table shows the values of the total enthalpy and entropy for the dimer duplexes as used in [3].

The models described below use the state-function parameters enthalpy and entropy. State functions define the properties of a thermodynamic state. In a change between two thermodynamic states, the change in the value of a state function is denoted by the symbol Δ.

The standard enthalpy change, $\Delta H^{\circ}$, is the difference in the standard enthalpies of formation between the products and the reactants. This state function is associated with changes in bonding between reactants and products. Changes in enthalpy during reactions are measured by calorimetry experiments.

The standard entropy change, $\Delta S^{\circ}$, is the difference in standard entropies between reactants and products. Entropy is a measure of the degree of disorder in a chemical system due to bond rotations, other molecular motions, and aggregation. The more random (disordered) a system, the greater its entropy. The larger a structure, the more degrees of freedom it has, and the greater its entropy.

#### **4.2. Interaction between pairs**

The stability of a nucleic acid duplex is affected by the interaction between the nucleotide bases. The thermodynamics of double helix formation of DNA/DNA, RNA/RNA or DNA/RNA can be estimated with nearest-neighbour parameters. The enthalpy change, $\Delta H^{\circ}$, the entropy change, $\Delta S^{\circ}$, the free energy change, $\Delta G^{\circ}$, and the melting temperature, *Tm*, were obtained on the basis of the nearest-neighbour model.

The nearest-neighbour model for nucleic acids, known as the NN model, assumes that the stability of a given base pair depends on the identity and orientation of neighbouring base pairs [3]. Earlier NN parameter sets were reported in [15] and [19].

In the NN model, sequence-dependent stability is considered in terms of nearest-neighbour doublets. In duplex DNA there are 10 such unique internal nearest-neighbour doublets. Listed in the 5'-3' direction, these are AT/AT, TA/TA, AA/TT, AC/GT, CA/TG, TC/GA, CT/AG, CG/CG, GC/GC and GG/CC. Dimer duplexes are represented with a slash separating strands in antiparallel orientation, *e.g.* AC/TG means 5'-AC-3' Watson–Crick base-paired with 3'-TG-5'.

The total difference in the free energy of the folded and unfolded states of a DNA duplex can be approximated at 37 °C with a nearest-neighbour model:

$$\begin{aligned} \Delta G^{\circ}(\text{total}) &= \sum_{i} n_{i} \Delta G^{\circ}(i) + \Delta G^{\circ}(\text{init w/term G} \cdot \text{C}) \\ &+ \Delta G^{\circ}(\text{init w/term A} \cdot \text{T}) + \Delta G^{\circ}(\text{sym}) \end{aligned} \tag{3}$$

where $\Delta G^{\circ}(i)$ are the standard free-energy changes for the 10 possible Watson-Crick nearest neighbours, *e.g.* $\Delta G^{\circ}(1) = \Delta G^{\circ}_{37}(\text{AA/TT})$, $\Delta G^{\circ}(2) = \Delta G^{\circ}_{37}(\text{TA/AT})$, and so on; $n_i$ is the number of occurrences of each nearest neighbour *i*; and $\Delta G^{\circ}(\text{sym})$ equals +0.43 kcal/mol if the duplex is self-complementary and zero if it is not. The total difference in the free energy at 37 °C, $\Delta G^{\circ}_{37}$, can be computed from the $\Delta H^{\circ}$ and $\Delta S^{\circ}$ parameters using the equation:


$$
\Delta G^{\circ}_{37} = \Delta H^{\circ} - T\Delta S^{\circ} \tag{4}
$$
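As an illustration of equations (3) and (4), the sketch below sums the Table 1 parameters over a duplex and converts the totals to a free energy. It is a minimal Python sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, one initiation term is applied per terminal base pair, and $\Delta S^{\circ}$ is taken in cal/(K·mol) and therefore divided by 1000 before being combined with $\Delta H^{\circ}$ in kcal/mol.

```python
# Table 1 parameters: (dH in kcal/mol, dS in cal/(K*mol)).
# Key "AA/TT" means 5'-AA-3' paired with 3'-TT-5'.
NN = {
    "AA/TT": (-7.9, -22.2), "AT/TA": (-7.2, -20.4), "TA/AT": (-7.2, -21.3),
    "CA/GT": (-8.5, -22.7), "GT/CA": (-8.4, -22.4), "CT/GA": (-7.8, -21.0),
    "GA/CT": (-8.2, -22.2), "CG/GC": (-10.6, -27.2), "GC/CG": (-9.8, -24.4),
    "GG/CC": (-8.0, -19.9),
}
INIT_GC = (0.1, -2.8)    # initiation with terminal G.C
INIT_AT = (2.3, 4.1)     # initiation with terminal A.T
SYMMETRY = (0.0, -1.4)   # self-complementary duplexes only
COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}


def nn_key(step: str) -> str:
    """Map any of the 16 dinucleotide steps onto one of the 10 unique doublets."""
    key = f"{step}/{COMP[step[0]]}{COMP[step[1]]}"
    if key in NN:
        return key
    rc = COMP[step[1]] + COMP[step[0]]  # same doublet read from the other strand
    return f"{rc}/{COMP[rc[0]]}{COMP[rc[1]]}"


def duplex_thermo(seq: str, temp_c: float = 37.0):
    """Return (dH, dS, dG) for seq paired with its perfect complement."""
    seq = seq.upper()
    dh = ds = 0.0
    for i in range(len(seq) - 1):
        h, s = NN[nn_key(seq[i:i + 2])]
        dh, ds = dh + h, ds + s
    for end in (seq[0], seq[-1]):  # assumption: one initiation term per terminal pair
        h, s = INIT_GC if end in "GC" else INIT_AT
        dh, ds = dh + h, ds + s
    if seq == "".join(COMP[b] for b in reversed(seq)):
        dh, ds = dh + SYMMETRY[0], ds + SYMMETRY[1]
    dg = dh - (temp_c + 273.15) * ds / 1000.0  # equation (4), dS converted to kcal
    return dh, ds, dg


dh, ds, dg = duplex_thermo("CGTTGA")
print(f"dH = {dh:.1f} kcal/mol, dS = {ds:.1f} cal/(K*mol), dG37 = {dg:.2f} kcal/mol")
```

For the example sequence the sketch gives a free energy of roughly -5.4 kcal/mol at 37 °C.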

For a specific temperature one can compute the total free energy using the values from Table 1. As described in [19], the melting temperature *Tm* is defined as the temperature at which half of the strands are in the double-helical state and half are in the random-coil state. A random-coil state is a polymer conformation where the monomer subunits are oriented randomly while still being bonded to adjacent units.

For self-complementary oligonucleotides, the *Tm* for individual melting curves was calculated from the fitted parameters using the following equation:

$$T_m = \Delta H^{\circ} / \left(\Delta S^{\circ} + R \ln C_T\right) \tag{5}$$

where *R* is the gas constant, 1.987 cal/(K·mol), *CT* is the total strand concentration, and *Tm* is given in K; $\Delta H^{\circ}$ must be expressed in cal/mol so that its units match those of *R*. For non-self-complementary molecules, *CT* in equation (5) was replaced by *CT*/4.
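Equation (5) can be evaluated directly once the total enthalpy and entropy of a duplex are known. The following Python sketch is an illustration only: the function name, the input values (-41.2 kcal/mol and -115.4 cal/(K·mol)) and the strand concentration of 0.1 mM are hypothetical, and the enthalpy is multiplied by 1000 on the assumption that it is supplied in kcal/mol.

```python
import math

R = 1.987  # gas constant in cal/(K*mol)


def melting_temperature(dh_kcal: float, ds_cal: float, c_total: float,
                        self_complementary: bool = False) -> float:
    """Equation (5): melting temperature in kelvin.

    dh_kcal  total enthalpy change in kcal/mol
    ds_cal   total entropy change in cal/(K*mol)
    c_total  total strand concentration in mol/L
    """
    # Non-self-complementary duplexes use C_T / 4 in place of C_T.
    conc = c_total if self_complementary else c_total / 4.0
    return (dh_kcal * 1000.0) / (ds_cal + R * math.log(conc))


tm = melting_temperature(-41.2, -115.4, 1e-4)
print(f"Tm = {tm:.1f} K ({tm - 273.15:.1f} degrees C)")
```

With these inputs the melting temperature comes out at about 302 K. For the same inputs, treating the duplex as self-complementary gives a higher value, because the concentration term is not divided by 4.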



The nearest-neighbour parameters of Delcourt *et al.* (1991) [20], SantaLucia *et al.* (1996) [19], Sugimoto *et al.* (1996) [15] and Allawi *et al.* (1997) [21] were evaluated from the analysis of optical melting curves of a variety of short synthetic DNA duplexes in 1 M Na⁺.

The observed trend in nearest-neighbour stabilities at 37°C is GC/CG = CG/GC > GG/CC > CA/GT = GT/CA = GA/CT = CT/GA > AA/TT > AT/TA > TA/AT, as in Table 2. This trend suggests that both sequence and base composition are important determinants of DNA duplex stability. It has long been recognized that DNA stability depends on the percentage of G-C content.
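The stability ordering can be checked numerically from the Table 1 parameters. The short Python sketch below is an illustration, not part of the original analysis: it applies $\Delta G^{\circ}_{37} = \Delta H^{\circ} - T\Delta S^{\circ}$ to each doublet, assuming $\Delta S^{\circ}$ is given in cal/(K·mol), and sorts the doublets from most to least stable.

```python
# Table 1 parameters: (dH in kcal/mol, dS in cal/(K*mol)).
TABLE1 = {
    "AA/TT": (-7.9, -22.2), "AT/TA": (-7.2, -20.4), "TA/AT": (-7.2, -21.3),
    "CA/GT": (-8.5, -22.7), "GT/CA": (-8.4, -22.4), "CT/GA": (-7.8, -21.0),
    "GA/CT": (-8.2, -22.2), "CG/GC": (-10.6, -27.2), "GC/CG": (-9.8, -24.4),
    "GG/CC": (-8.0, -19.9),
}


def delta_g37(dh: float, ds: float) -> float:
    """Free energy at 37 degrees C (310.15 K) in kcal/mol."""
    return dh - 310.15 * ds / 1000.0


# Most negative free energy first, i.e. most stable doublet first.
ranked = sorted(TABLE1, key=lambda name: delta_g37(*TABLE1[name]))
for name in ranked:
    print(f"{name}: {delta_g37(*TABLE1[name]):.2f}")
```

The G-C rich doublets (GC/CG, CG/GC, GG/CC) come out as the most stable and TA/AT as the least stable, in line with the trend stated above.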

Thermodynamics of Microarray Hybridization 471

(6)

represents the number of RNA molecule

are the binding energies for gene specific and

(7)

(8)

According with their method, the observed signal *I*ij for probe *i* in the probe set for gene *j* is

1 1 *ij ij*

where *B* is the background intensity, *N*j is the number of expressed mRNA molecules

respectively nonspecific binding. These energies are calculated as the weighted sum of

 <sup>1</sup> , *ij k k k E bb* 

 \* \*\* <sup>1</sup> , *ij k k k E bb* 

The positional-dependent-nearest-neighbour model appears to indicate that the two ends of a probe contribute less to binding stability according to their weight factors, see Figure 4. a). It also can be observed that there is a dip in the gene specific binding weight factors of MM probes around the mismatch position, probably due the mismatch which destabilizes the duplex structure. In Figure 4. b) it can be noted that stacking energies in the positionaldependent-nearest-neighbour model can give an explanation for the presence of negative

This model, together with the nearest neighbour model solves the problem of binding on microarrays, but still there are factors that affect the gene expression measuring. One of them affects the process of competing adsorption and desorption of target RNA to from

*<sup>k</sup>* are the weight factors that depend on the position along the probe from

*b b* is the same as the stacking energy used in nearest neighbour

*j ij <sup>E</sup> <sup>E</sup> <sup>N</sup> <sup>N</sup> I B e e* 

\* \*

modelled as:

stacking energies:

probe pair signals.

model [15].

*k* and \* 

the 5' to 3' end, and <sup>1</sup> , *k k*

probe-target duplexes at the chip surface.

**Figure 4.** a) weight factors; b) nearest-neighbour stacking energy. (from [16])

where

contributing to gene specific binding, *N*\*

contributing to nonspecific binding, *E* and *E*\*


The observed trend in nearest-neighbor stabilities at 37 °C is GC/CG = CG/GC > GG/CC > CA/GT = GT/CA = GA/CT = CT/GA > AA/TT > AT/TA > TA/AT, as in Table 2. This trend suggests that both sequence and base composition are important determinants of DNA duplex stability. It has long been recognized that DNA stability depends on the percent G-C content.

Δ*G*<sub>37</sub> (kcal/mol):

| Sequence | Delcourt *et al.* | SantaLucia *et al.* | Sugimoto *et al.* | Allawi *et al.* |
|---|---|---|---|---|
| AA/TT | -0.67 | -1.02 | -1.20 | -1.00 |
| AT/TA | -0.62 | -0.73 | -0.90 | -0.88 |
| TA/AT | -0.70 | -0.60 | -0.90 | -0.58 |
| CA/GT | -1.19 | -1.38 | -1.70 | -1.45 |
| GT/CA | -1.28 | -1.43 | -1.50 | -1.44 |
| CT/GA | -1.17 | -1.16 | -1.50 | -1.28 |
| GA/CT | -1.12 | -1.46 | -1.50 | -1.30 |
| CG/GC | -1.87 | -2.09 | -2.80 | -2.17 |
| GC/CG | -1.85 | -2.28 | -2.30 | -2.24 |
| GG/CC | -1.55 | -1.77 | -2.10 | -1.84 |
| Average | -1.20 | -1.39 | -1.64 | -1.42 |
| Init. w/term G•C | NA | 0.91 | 1.70 | 0.98 |
| Init. w/term A•T | NA | 1.11 | 1.70 | 1.03 |

**Table 2.** Comparison of computed NN free energy parameters at 37 °C

On the other hand, the nearest neighbour Δ*H* parameters from Table 1 do not follow this trend. This suggests that stacking, hydrogen bonding, and other contributions to Δ*H* present a complicated sequence dependence.

## **4.3. Interaction with unintended targets**

As seen in previous sections, the major issue in oligonucleotide microarray technology is the selection of probe sequences with high sensitivity and specificity. It has been shown [22] that the use of MM probes for assessing non-specific binding is unreliable. Since duplex formation in solution has been studied using the nearest neighbour model [3], [15], probe selection in microarray design has been approached with a model derived from it [16]. The model of Zhang *et al.* modifies the nearest neighbour model in two ways: first, it assigns a different weight factor to each nucleotide position on a probe, reflecting that different parts of a probe may contribute differently to binding stability; second, it takes into account two modes of probe binding: gene specific binding, *i.e.* formation of DNA-RNA duplexes with exactly complementary sequences, and non-specific binding, *i.e.* formation of duplexes with many mismatches between the probe and the attached RNA molecule. They called their model the positional-dependent-nearest-neighbour model.

According to their method, the observed signal *I*<sub>ij</sub> for probe *i* in the probe set for gene *j* is modelled as:

$$I_{ij} = \frac{N_j}{1 + e^{E_{ij}}} + \frac{N^*}{1 + e^{E_{ij}^*}} + B \tag{6}$$

where *B* is the background intensity, *N*<sub>j</sub> is the number of expressed mRNA molecules contributing to gene specific binding, *N*\* is the number of RNA molecules contributing to non-specific binding, and *E*<sub>ij</sub> and *E*\*<sub>ij</sub> are the binding energies for gene specific and non-specific binding, respectively. These energies are calculated as weighted sums of stacking energies:

$$E_{ij} = \sum_k \omega_k \, \varepsilon\left(b_k, b_{k+1}\right) \tag{7}$$

$$E_{ij}^* = \sum_k \omega_k^* \, \varepsilon^*\left(b_k, b_{k+1}\right) \tag{8}$$

where *ω*<sub>k</sub> and *ω*\*<sub>k</sub> are the weight factors, which depend on the position *k* along the probe from the 5' to the 3' end, and *ε*(*b*<sub>k</sub>, *b*<sub>k+1</sub>) is the same as the stacking energy used in the nearest neighbour model [15].
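As a minimal sketch of how equations (6)-(8) fit together, the following Python snippet evaluates the modelled signal for a single probe. All numeric values here (the stacking-energy table, the flat weight profile, *N*<sub>j</sub>, *N*\* and *B*) are made-up placeholders chosen for illustration, not the fitted parameters of [16], and the function names are hypothetical.

```python
import math

def binding_energy(probe, weights, stacking):
    """Weighted sum of stacking energies over adjacent bases, as in eqs. (7)-(8)."""
    if len(weights) != len(probe) - 1:
        raise ValueError("need one weight per adjacent base pair")
    return sum(w * stacking[probe[k:k + 2]] for k, w in enumerate(weights))

def pdnn_signal(e_specific, e_nonspecific, n_gene, n_star, background):
    """Modelled probe signal of eq. (6): specific + non-specific + background."""
    specific = n_gene / (1.0 + math.exp(e_specific))
    nonspecific = n_star / (1.0 + math.exp(e_nonspecific))
    return specific + nonspecific + background

# Placeholder parameters only (NOT fitted values): a toy stacking table in which
# each G or C in a dinucleotide lowers the energy by 0.5.
BASES = "ACGT"
toy_stacking = {a + b: -1.0 - 0.5 * ((a in "GC") + (b in "GC"))
                for a in BASES for b in BASES}
toy_stacking_ns = {pair: 0.5 * e for pair, e in toy_stacking.items()}

probe = "ACGTTGCA"                      # hypothetical 8-mer, 5' to 3'
flat_weights = [1.0] * (len(probe) - 1)  # position-independent weights

e_ij = binding_energy(probe, flat_weights, toy_stacking)
e_ij_star = binding_energy(probe, flat_weights, toy_stacking_ns)
signal = pdnn_signal(e_ij, e_ij_star, n_gene=1000.0, n_star=200.0, background=50.0)
print(f"E = {e_ij:.2f}, E* = {e_ij_star:.2f}, I = {signal:.1f}")
```

In a real application the flat weights would be replaced by position-dependent profiles, which is what distinguishes this model from the plain nearest neighbour sum.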

The positional-dependent-nearest-neighbour model indicates, through its weight factors, that the two ends of a probe contribute less to binding stability, see Figure 4 a). There is also a dip in the gene specific binding weight factors of MM probes around the mismatch position, probably because the mismatch destabilizes the duplex structure. Figure 4 b) shows that the stacking energies in the positional-dependent-nearest-neighbour model can explain the presence of negative probe pair signals.

This model, together with the nearest neighbour model, addresses the problem of binding on microarrays, but other factors still affect the measurement of gene expression. One of them is the process of competing adsorption and desorption of target RNA to form probe-target duplexes at the chip surface.

**Figure 4.** a) weight factors; b) nearest-neighbour stacking energy. (from [16])

## **4.4. Kinetic processes in hybridization thermodynamics**

#### *4.4.1. Derivation of the Langmuir isotherm*

For molecules in contact with a solid surface at a fixed temperature, the Langmuir Isotherm, developed by Irving Langmuir in 1916, describes the partitioning between the gas phase and adsorbed species as a function of applied pressure.

The adsorption process between gas phase molecules, A, vacant surface sites, S, and occupied surface sites, SA, can be represented by the following chemical equation, assuming that there are a fixed number of surface sites present on the surface, as in Figure 5.

$$S + A \rightleftharpoons SA \tag{9}$$


When considering adsorption isotherms it is conventional to adopt a definition of surface coverage (*θ*) which defines the maximum (saturation) surface coverage of a particular adsorbate on a given surface always to be unity, *i.e.* *θ*<sub>max</sub> = 1.

#### *4.4.2. Thermodynamic derivation*

An equilibrium constant *k* can be written in terms of the concentrations of "reactants" and "products":

$$k = \frac{[SA]}{[S][A]} \tag{10}$$

where:

[SA] is proportional to the surface coverage of adsorbed molecules, *i.e.* proportional to *θ*;

[S] is proportional to the number of vacant sites, (1 – *θ*);

[A] is proportional to the pressure of gas, *P*.

Thus it is possible to define another equilibrium constant, *b*:

$$b = \frac{\theta}{(1-\theta)P} \tag{11}$$

**Figure 5.** Adsorption process.

Rearranging equations (10) and (11), one obtains the expression for the surface coverage:

$$\theta = \frac{bP}{1 + bP} \tag{12}$$

#### *4.4.3. Kinetic derivation*


The equilibrium that may exist between gas adsorbed on a surface and molecules in the gas phase is a dynamic state, *i.e.* the equilibrium represents a state in which the rate of adsorption of molecules onto the surface is exactly counterbalanced by the rate of desorption of molecules back into the gas phase. It should therefore be possible to derive an isotherm for the adsorption process simply by considering and equating the rates for these two processes.

The rate of adsorption will be proportional to the pressure of the gas and the number of vacant sites for adsorption. If the total number of sites on the surface is *N*, then the rate of change of the surface coverage due to adsorption is:

$$\frac{d\theta}{dt} = k\_a p N \left(1 - \theta\right) \tag{13}$$

The rate of change of the coverage due to the adsorbate leaving the surface (desorption) is proportional to the number of adsorbed species:

$$\frac{d\theta}{dt} = -k_d N \theta \tag{14}$$

In these equations, *k*<sub>a</sub> and *k*<sub>d</sub> are the rate constants for adsorption and desorption respectively, and *p* is the pressure of the adsorbate gas. At equilibrium, the coverage is independent of time and thus the adsorption and desorption rates are equal. Solving this condition gives the relation for *θ*, equation (12), with *b* = *k*<sub>a</sub>/*k*<sub>d</sub>. Here *b* is only a constant if the enthalpy of adsorption is independent of coverage.
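For completeness, the equilibrium step can be written out explicitly. Equating the magnitudes of the adsorption rate (13) and the desorption rate (14), and identifying the pressure *p* with *P* of equation (12), gives:

$$
\begin{aligned}
k_a p N \left(1 - \theta\right) &= k_d N \theta \\
k_a p &= \left(k_d + k_a p\right)\theta \\
\theta &= \frac{k_a p}{k_d + k_a p} = \frac{\left(k_a / k_d\right) p}{1 + \left(k_a / k_d\right) p}
\end{aligned}
$$

This has the form of the Langmuir isotherm (12) with *b* = *k*<sub>a</sub>/*k*<sub>d</sub>; the total number of sites *N* cancels, so the equilibrium coverage does not depend on it.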

#### *4.4.4. Dynamic adsorption model*

Burden *et al.* [14] developed a dynamic adsorption model based on the Langmuir isotherm. If *x* is the concentration of mRNA target and *θ*(*t*) is the fraction of sites occupied by probe-target duplexes, then in the forward adsorption reaction target mRNA attaches to probes at a rate *k*<sub>f</sub>*x*(1 − *θ*(*t*)), proportional to the concentration of specific target mRNA and to the fraction 1 − *θ*(*t*) of unoccupied probes; in the backward desorption reaction, target mRNA detaches from probes at a rate *k*<sub>b</sub>*θ*(*t*), proportional to the fraction of occupied probes. The fraction of probe sites occupied by probe-target duplexes is then given by the differential equation:

$$\frac{d\theta(t)}{dt} = k_f x \left(1 - \theta(t)\right) - k_b \theta(t) \tag{15}$$

For the initial condition *θ*(0) = 0, equation (15) has the following solution:

$$\theta(t) = \frac{x}{x + K}\left[1 - e^{-(x + K) k_f t}\right] \tag{16}$$


where *K* = *k*<sub>b</sub>/*k*<sub>f</sub>.
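The closed-form coverage can be checked numerically against the differential equation it solves. The sketch below, with arbitrary illustrative rate constants and hypothetical function names, compares the closed-form expression (with *K* = *k*<sub>b</sub>/*k*<sub>f</sub>) against a simple forward-Euler integration of equation (15) starting from zero coverage.

```python
import math

def coverage_closed_form(t, x, k_f, k_b):
    """Closed-form coverage theta(t) of eq. (16), with K = k_b / k_f."""
    K = k_b / k_f
    return x / (x + K) * (1.0 - math.exp(-(x + K) * k_f * t))

def coverage_euler(t_end, x, k_f, k_b, steps=100_000):
    """Forward-Euler integration of eq. (15), starting from theta(0) = 0."""
    dt = t_end / steps
    theta = 0.0
    for _ in range(steps):
        theta += dt * (k_f * x * (1.0 - theta) - k_b * theta)
    return theta

# Illustrative values only (arbitrary units, not taken from the chapter)
x, k_f, k_b = 2.0, 0.5, 1.0
for t in (0.5, 2.0, 10.0):
    exact = coverage_closed_form(t, x, k_f, k_b)
    approx = coverage_euler(t, x, k_f, k_b)
    print(f"t = {t:5.1f}  closed form = {exact:.5f}  Euler = {approx:.5f}")
```

For long times both values approach *x*/(*x* + *K*), the equilibrium coverage predicted by the Langmuir isotherm.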

Using equation (16), Burden *et al.* estimate the measured fluorescence intensity *I*, with *I*<sub>0</sub> as the background intensity at zero concentration, to be:

$$I(x, t) = I_0 + \frac{b x}{x + K}\left[1 - e^{-(x + K) k_f t}\right] \tag{17}$$

At equilibrium, the intensity *I*(*x*) at target concentration *x* follows the *Langmuir isotherm* (12):

$$I(x) = I_0 + \frac{b x}{x + K} \tag{18}$$

**Figure 6.** Hyperbolic response function for the intensity *I(x)* according to the Langmuir isotherm.
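The hyperbolic response can be illustrated with a short sketch. The parameter values below are arbitrary placeholders and the function names are hypothetical; the point is only that the equilibrium intensity rises from *I*<sub>0</sub> at zero concentration towards *I*<sub>0</sub> + *b* at saturation, and sits exactly halfway between the two at *x* = *K*.

```python
import math

def intensity(x, t, i0, b, K, k_f):
    """Time-dependent intensity: background plus scaled Langmuir coverage."""
    return i0 + b * x / (x + K) * (1.0 - math.exp(-(x + K) * k_f * t))

def intensity_equilibrium(x, i0, b, K):
    """Equilibrium (long-time) intensity, the hyperbolic Langmuir response."""
    return i0 + b * x / (x + K)

# Illustrative values only (arbitrary units)
i0, b, K, k_f = 100.0, 5000.0, 2.0, 0.5
for x in (0.0, K, 10 * K, 1000 * K):
    print(f"x = {x:8.1f}  I_eq = {intensity_equilibrium(x, i0, b, K):8.1f}")
```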

## **5. Hybridization dynamics compensation**

## **5.1. Modelling hybridization by thermodynamics**

It is well known that hybridization processes may be viewed from the standpoint of general thermodynamic conditions [23], meaning that the hybridization probability of a given test segment is determined by its thermodynamic conditions, *i.e.* by its hybridization temperature. Accordingly, the hybridization process can be described by the reaction equation:

$$P + T \underset{k_b}{\overset{k_f}{\rightleftharpoons}} C \tag{19}$$

where *P* represents the number of oligonucleotides available for hybridization, *T* the concentration of free RNA target, *C* the number of bound complexes, and *k*<sub>f</sub> and *k*<sub>b</sub> are the respective forward and backward rate constants for the reaction. This equation has as a natural solution the following expression in the time domain:

$$C(t) = \frac{T}{T + K}\left[1 - \exp\left(-t/\tau\right)\right] \tag{20}$$

where *K*, defined as in equation (16), is an equilibrium dissociation constant, and *τ* = 1/[*k*<sub>f</sub>(*T* + *K*)] denotes a characteristic time over which the system reaches equilibrium.

Recent studies [24], [25] confirm the hypothesis that the hybridization process for each of the probe pairs follows a time model like the one in Figure 7. This model predicts that the probability of hybridization will be almost zero if too little time is allowed for the experiment, and that, in the limit, saturation will take place if enough time is allowed.

In practice, the differences in hybridization dynamics can be handled by using multiple regression to bring PM-MM probe pairs to equivalent thermodynamic conditions by processing diachronic hybridization experiments [26].

This procedure is explained in more detail in the following paragraphs.

**Figure 7.** Theoretical model for perfect match hybridization. Intensity of perfect match versus hybridization time. (adapted from [24])

## **5.2. Exponential regression model**


From equation (20) one can assume that a model to solve the multiple regression problem implicit in this study will have the following form:

$$y = a\left(1 - e^{-bx}\right) \tag{21}$$

where *a* and *b* are parameters to be estimated adaptively using *least square fitting* and the *gradient method*.

Vertical least squares fitting proceeds by minimizing the sum of the squares of the vertical deviations, *R*<sup>2</sup>, with respect to the parameters *a* and *b*:

$$R^2 = \sum_i \left[y_i - a\left(1 - e^{-b x_i}\right)\right]^2 \tag{22}$$


where:

$$
\varepsilon\_i = y\_i - a \left( 1 - e^{-bx\_i} \right) \tag{23}
$$

is the estimation error incurred for each component.

With this notation, equation (22) becomes:

$$R^2 = \sum_i \varepsilon_i^2 \tag{24}$$

The condition for *R*<sup>2</sup> to be at a minimum is that

$$\frac{\partial \left( R^2 \right)}{\partial a} = 0 \tag{25}$$

$$\frac{\partial \left(R^2\right)}{\partial b} = 0\tag{26}$$

From equations (24), (25) and (26) one obtains:

$$\frac{\partial \left(R^2\right)}{\partial a} = 2\sum_i \varepsilon_i \frac{\partial \varepsilon_i}{\partial a} = -2\sum_i \varepsilon_i \left(1 - e^{-b x_i}\right) = 0 \tag{27}$$

$$\frac{\partial \left(R^2\right)}{\partial b} = 2\sum_i \varepsilon_i \frac{\partial \varepsilon_i}{\partial b} = -2\sum_i \varepsilon_i \, a x_i e^{-b x_i} = 0 \tag{28}$$

A solution to equations (27) and (28) can be found using the gradient method. In this case the parameters are computed adaptively:

$$a_{k+1} = a_k - \beta_a \frac{\partial \left(R^2\right)}{\partial a} = a_k + 2\beta_a \sum_i \varepsilon_{i,k}\left(1 - e^{-b_k x_i}\right) \tag{29}$$

$$b_{k+1} = b_k - \beta_b \frac{\partial \left(R^2\right)}{\partial b} = b_k + 2\beta_b \sum_i \varepsilon_{i,k} \, a_k x_i e^{-b_k x_i} \tag{30}$$

where *ε*<sub>i,k</sub> is defined as in equation (23), evaluated at *a*<sub>k</sub> and *b*<sub>k</sub>, and *β*<sub>a</sub> and *β*<sub>b</sub> are the adjustment step sizes.
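The adaptive update can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic and noise-free, the step sizes, starting point and iteration count are arbitrary choices that happen to be stable for this data set, and the function name is hypothetical. The gradients are written out in full, including the factor of 2 from differentiating the squared error and the factor *x*<sub>i</sub> in the derivative with respect to *b*.

```python
import math

def fit_exponential(xs, ys, a0, b0, beta_a, beta_b, iterations):
    """Gradient-descent fit of y = a * (1 - exp(-b * x)) by least squares."""
    a, b = a0, b0
    for _ in range(iterations):
        # Estimation errors at the current parameter values
        errors = [y - a * (1.0 - math.exp(-b * x)) for x, y in zip(xs, ys)]
        # Partial derivatives of the summed squared error
        grad_a = -2.0 * sum(e * (1.0 - math.exp(-b * x)) for e, x in zip(errors, xs))
        grad_b = -2.0 * sum(e * a * x * math.exp(-b * x) for e, x in zip(errors, xs))
        # Simultaneous update of both parameters
        a, b = a - beta_a * grad_a, b - beta_b * grad_b
    return a, b

# Synthetic, noise-free data generated from known parameters (illustrative only)
true_a, true_b = 2.0, 0.5
xs = [0.5 * i for i in range(1, 21)]          # x = 0.5, 1.0, ..., 10.0
ys = [true_a * (1.0 - math.exp(-true_b * x)) for x in xs]

a_hat, b_hat = fit_exponential(xs, ys, a0=1.0, b0=1.0,
                               beta_a=0.01, beta_b=0.01, iterations=10_000)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
```

With real, noisy intensities the step sizes would need tuning to the scale of the data, since a step that is too large makes the iteration diverge.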

## **5.3. Application for experimental data**

The experimental part has been complemented with artificially simulated test probes used for algorithmic validation. A diachronic database was also produced to estimate hybridization time constants for different gene segments.

Considering these assumptions, data records have been created from experimental data fitted by the above-mentioned models. The time dynamics of hybridization for both probe sets and their profiles were evaluated at certain time intervals.


Firstly, the diachronic data distribution for an evolution from 0 to 30 minutes is shown in Figure 8 for both the PM probe set and the MM probe set. The following figures, *i.e.* Figure 9 and Figure 10, show this time evolution for 60 and 120 minutes, following the model in equation (20).

**Figure 8.** Time dynamics of hybridization corresponding to perfect and mismatch probes, for a maximum of 30 minutes.

The next step in the data analysis was to look at the probe profiles at certain times. Figure 11 shows the regression parameters obtained for time constants. The profiles of the perfect and mismatch probes were extracted for two different time values, underlining the fact that, if some probes are allowed enough time, the mismatches will also hybridize completely.

**Figure 9.** Time dynamics of hybridization corresponding to perfect and mismatch probes, for a maximum of 60 minutes.


**Figure 10.** Time dynamics of hybridization corresponding to perfect and mismatch probes, for a maximum of 120 minutes.

Considering this and applying the regression algorithm, we observed that the algorithm searches for the matching values of the expression levels of the probe sets and for the estimated values of the perfect and mismatch probes. One of the steps of this iterative algorithm can be seen in Figure 12.

**Figure 11.** Profiles corresponding to perfect and mismatch probes for time constants, at 30 and 100 minutes.

Once the iterative process was complete, certain probes had reached their target. In the expression level estimation most of the perfect match probes obtained the expected values, while some of the mismatch probes did not reach their target, see Figure 13. Similar results were obtained in the case of matching hybridization for time constants.

**Figure 12.** Top template shows the iterative matching for hidden expression levels. Bottom template shows the iterative matching for perfect and mismatch hybridization.

**Figure 13.** Results for the iterative process of matching.

## **6. Conclusions**


The thermodynamics of oligonucleotide hybridization processes where PM-MM results do not show the expected behaviour, thus affecting to the reliability of expression estimation, was studied in this chapter and the following conclusions were emphasized:

 Modelling the hybridization process through thermodynamical principles reproduces exponential-like behaviour for each P-T segment pair.

Thermodynamics of Microarray Hybridization 481

[5] Livshits M A, Mirzabekov A D (1996) Theoretical analysis of the kinetics of DNA hybridization with gel-immobilized oligonucleotides. Biophysical Journal. 71:2795 –

[6] Dai H, Meyer M, Stepaniants S, Ziman M, Stoughton R (2002) Use of hybridization kinetics for differentiating specific from non-specific binding to oligonucleotide

[7] Dorris D R, *et al.* (2003) Oligodeoxyribonucleotide probe accessibility on a threedimensional DNA microarray surface and the effect of hybridization time on the

[8] Binder H, Preibisch S (2005) Specific and Nonspecific Hybridization of Oligonucleotide

[9] Wang J Y, Drlica K (2003) Modelling hybridization kinetics. Mathematical Bioscience.

[10] Naef F, Magnasco M O (2003) Solving the riddle of the bright mismatches: Labeling and effective binding in oligonucleotide arrays. Physical Review E., 68:011906-1 – 011906-4 [11] Wu Z, Irizarry R A (2004) Stochastic models inspired by hybridization theory for short oligonucleotide arrays. Proc. of the 8th Annual International Conference on Research in

[13] Lipshutz R L, Fodor S P A, Gingeras T R, Lockhart D J (1999) High density synthetic

[14] Burden C, Pittelkow Y E, Wilson S R (2004) Statistical Analysis of Adsorption Models for Oligonucleotide Microarrays. Statistical Applications in Genetics and Molecular

[15] Sugimoto N *et al* (1996) Improved thermodynamic parameters and helix initiation factor

[16] Zhang L, Miles M F, Aldape K D (2003) A model of molecular interactions on short

[17] Huang J C, Morris Q D, Hughes T R, Frey B J (2005) GenXHC: a probabilistic generative model for cross-hybridization compensation in high-density genome-wide microarray

[18] Black W Z, Hartley J G (1991) Thermodynamics. Second Edition. SI Version. Harper

[19] SantaLucia Jr J, Allawi H T, Seneviratne P A (1996) Improved Nearest-Neighbor Parameters for Predicting DNA Duplex Stability. Biochemistry. 35(11): 3555 – 3562 [20] Delcourt S G, Blake R D (1991) Stacking Energies in DNA. The Journal of Biological

[21] Allawi H T, SantaLucia Jr J. Thermodynamics and NMR of Internal G•T Mismatches in

[22] Li C, Wong W H (2001) Model-based analysis of oligonucleotide arrays: Expression

[23] El Samad H, Khammash M, Petzold L, Gillespie D (2005) Stochastic Modelling of Gene Regulatory Networks. Int. Journal of Robust and Nonlinear Control. 15(15):691 – 711

to predict stability of DNA duplexes. Nucleic Acids Research. 24:4501 – 4505

oligonucleotide microarrays, Nature Biotechnology. 21(7):818 – 821

index computation and outlier detection. PNAS USA. 98(1):31 – 36

microarrays. Nucleic Acids Research. 30(16): e86.1 – e86.8

accuracy of expression ratios. BMC Biotechnology. 3:6

Probes on Microarrays. Biophysical Journal. 89:337 – 352

oligonucleotide arrays. Nature Genetics Supplement. 21:20 – 24

Computational Molecular Biology. 98 – 106 [12] www.accessexcellence.org/RC/VL/GG/index.html

data. Bioinformatics. 21:i222 – i231

Chemistry. 266: 15160 – 15169

DNA. Biochemistry. 36:10581 – 10594

2801

183:37 – 47

Biology. 3:35

Collins Publisher


## **Author details**

Raul Măluţan\* *Communications Department, Technical University of Cluj Napoca, Cluj Napoca, Romania* 

Pedro Gómez Vilda *DATSI, Universidad Politécnica de Madrid, Madrid, Spain* 

## **Acknowledgement**

This work was supported by the project "Development and support of multidisciplinary postdoctoral programmes in major technical areas of national strategy of Research - Development - Innovation" 4D-POSTDOC, contract no. POSDRU/89/1.5/S/52603, project cofunded by the European Social Fund through Sectoral Operational Programme Human Resources Development 2007-2013.

## **7. References**


<sup>\*</sup> Corresponding Author


- [1] Malutan R, Gómez Vilda P, Berindan Neagoe I, Borda M (2011) Hybridization Dynamics Compensation in Microarray Experiments. Advances in Intelligent and Soft Computing. 93:255 – 261
- [2] Wu P, Nakano S, Sugimoto N (2002) Temperature dependence of thermodynamic properties for DNA/DNA and RNA/DNA duplex formation. European Journal of Biochemistry. 269:2821 – 2830
- [3] SantaLucia Jr. J (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. PNAS on Biochemistry. 95:1460 – 1465
- [4] Chan V, Graves D.J., McKenzie S.E. (1995) The Biophysics of DNA Hybridization with Immobilized Oligonucleotides Probes. Biophysical Journal. 69:2243 – 2255
- [24] Dai H, Meyer M, Stepaniants S, Ziman M, Stoughton R (2002) Use of hybridization kinetics for differentiating specific from non-specific binding to oligonucleotide microarrays. Nucleic Acids Research. 30(16):e86.1 – e86.8

- Modelling the hybridization process through thermodynamical principles reproduces exponential-like behaviour for each P-T segment pair.
- The hybridization process should be confined to the time interval where linear growth is granted, that is, at the beginning of the exponential curve shown in Figure 6. Adaptive fitting may be used to predict and regress expression levels on a specific test probe to common thermodynamic conditions. Time constants may be inferred from the regression parameters adaptively.
- The main features of the PM-MM probe sets may be reproduced from probabilistic modelling.
- It may be expected that more precise and robust estimations could be produced using this technique with diachronically expressed hybridization experiments.
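The concluding remarks state that the hybridization signal behaves exponentially for each P-T segment pair and that time constants may be inferred by regression. The sketch below is one way to illustrate that idea and is not the authors' algorithm. It assumes a saturating model y(t) = A(1 - exp(-t/tau)) with a known plateau A, and the names `saturation_curve`, `estimate_tau`, `amplitude` and `tau` are hypothetical. It recovers tau by a least-squares line through the origin on the linearized form ln(1 - y/A) = -t/tau.

```python
import math


def saturation_curve(t, amplitude, tau):
    """Assumed exponential-like signal: A * (1 - exp(-t / tau))."""
    return amplitude * (1.0 - math.exp(-t / tau))


def estimate_tau(times, signal, amplitude):
    """Estimate tau from ln(1 - y/A) = -t / tau.

    Fits a straight line through the origin by least squares.
    Samples at or above the plateau are skipped because the
    logarithm is undefined there.
    """
    num = 0.0  # accumulates sum of t * z
    den = 0.0  # accumulates sum of t * t
    for t, y in zip(times, signal):
        ratio = 1.0 - y / amplitude
        if ratio <= 0.0:
            continue
        z = math.log(ratio)
        num += t * z
        den += t * t
    if den == 0.0 or num == 0.0:
        raise ValueError("not enough usable samples to estimate tau")
    slope = num / den  # slope equals -1 / tau
    return -1.0 / slope


# Synthetic, noise-free data generated from the assumed model (illustrative values).
times = [0.5 * k for k in range(1, 21)]
signal = [saturation_curve(t, 1000.0, 4.0) for t in times]
tau_hat = estimate_tau(times, signal, 1000.0)

# Slope of the initial, approximately linear regime: y ~ (A / tau) * t for t << tau.
initial_slope = 1000.0 / tau_hat
```

With measured data the plateau A would itself have to be estimated, and noise near saturation would make the logarithm unstable, which is consistent with the remark that measurements should be confined to the early, nearly linear part of the curve.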

**Chapter 19**

**Probing the Thermodynamics of Photosystem I by Spectroscopic and Mutagenic Methods**

Xuejing Hou and Harvey J.M. Hou

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/2615

© 2012 Hou and Hou, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1. Introduction**

The thermodynamics of a chemical reaction is fundamental to a complete understanding of the reaction at the molecular level and involves elucidating the energy levels of reactants and products, the direction of the reaction, and its driving force or spontaneity (Tadashi, 2011). Most chemical reactions are enthalpy driven and are determined by the chemical bonding energies of the reactants and products. However, some chemical reactions or processes are entropy driven and are largely due to the probability or disorder of the system during the reaction. Protein denaturation and the dissolution of potassium iodide in water are such examples. In chemistry and biology, especially for electron transfer reactions, the entropy changes are often assumed to be small and negligible. The understanding of the thermodynamics of electron transfer reactions is relatively limited (Mauzerall, 2006).

To study the thermodynamics of reactions in chemistry and biology, the photosynthetic reaction is an excellent model system. Photosynthesis involves multiple electron transfer reactions driven by sunlight at room temperature and neutral pH (Blankenship, 2002; Diner and Rappaport, 2002; Golbeck, 2006). Understanding the light-induced electron transfer reactions in photosynthesis will provide fundamental knowledge of chemical reactions and guide the design and fabrication of artificial photosynthetic systems to address the global energy and environmental crisis of the 21st century (Lewis and Nocera, 2006). In particular, storing solar energy through a water splitting reaction that mimics photosynthesis might solve energy and pure water problems at the same time (Kanan and Nocera, 2008; Cook et al., 2010; Hou, 2010, 2011). The electron transfer reactions in photosynthesis involve four major chlorophyll binding protein complexes: photosystem II, cytochrome b6f, photosystem I, and ATP synthase (Figure 1). Photosystem I and photosystem II belong, respectively, to the two different types of reaction centers found in nature. Type I reaction centers incorporate a phylloquinone or menaquinone as secondary electron

