**Molecular Interactions in Chromatographic Retention: A Tool for QSRR/QSPR/QSAR Studies**

Vilma Edite Fonseca Heinzen<sup>1\*</sup>, Berenice da Silva Junkes<sup>2</sup>, Carlos Alberto Kuhnen<sup>3</sup> and Rosendo Augusto Yunes<sup>1</sup>

*<sup>1</sup>Department of Chemistry, Federal University of Santa Catarina, University Campus, Trindade, Florianópolis, Santa Catarina, <sup>2</sup>Federal Institute of Education, Science and Technology of Santa Catarina, Florianópolis, SC, <sup>3</sup>Department of Physics, Federal University of Santa Catarina, University Campus, Trindade, Florianópolis, Santa Catarina, Brazil*

#### **1. Introduction**

24 Molecular Interactions


Molecular interactions play a fundamental role in the chemical and physical properties of any physicochemical system. In gas chromatography (GC), retention is a very complex process. It involves multiple intermolecular forces, such as dispersion (London) forces, orientation (dipole–dipole or Keesom) forces, induction (dipole–induced dipole or Debye) forces, and electron donor–acceptor forces including hydrogen bonding, which lead to the partition of the solute between the gas and liquid phases (Kaliszan, 1987; Peng, 2000; Yao et al., 2002). Other factors, such as steric hindrance of substituent groups within the solute molecule, can also affect chromatographic behavior (Fritz et al., 1979; Peng et al., 1988). Correlations between gas chromatographic retention indices (RIs) and molecular parameters provide significant information on molecular structure, retention time and the possible mechanisms of absorption and elution (Körtvélyesi et al., 2001). Several topological, geometric, electronic and quantum-chemical descriptors have been used in research on quantitative structure–property and structure–activity relationships (QSPR/QSAR) (Karelson et al., 1996; Katritzky & Gordeeva, 1993; Kier & Hall, 1990). Topological descriptors have proven effective in predicting diverse physicochemical and biological properties of various types of compounds (Amboni et al., 2000; Arruda et al., 1993; Estrada, 2001a, 2001b; Heinzen & Yunes, 1993, 1996; Heinzen et al., 1999a; Kier & Hall, 1986; Randic, 1993, 2001; Ren, 2002a, 2002b, 2002c; Sabljic & Trinajstic, 1981). In general, these indices are numbers that encode relevant information about the structure of molecules. 
Most of the measured physicochemical properties are steric properties, and therefore they may be reasonably well described by topological indices. However, in some cases, these indices also contain structural information related to the electronic and/or polar features of molecules (Galvez et al., 1994; Hall et al., 1991). The molecular size, shape, polarity, and ability to participate in hydrogen bonding are among the different factors that can contribute to the physicochemical properties or biological activities of a molecule. It is well known that these factors are related to intermolecular interactions such as van der Waals forces.

The use of graph-theoretical topological indices in QSPR/QSAR studies has attracted great interest in recent years. Topological indices have become a powerful tool for predicting numerous physicochemical properties and/or biological activities of compounds, as well as for molecular design. One of the most extensively studied properties is chromatographic retention (Estrada & Gutierrez, 1999; Ivanciuc, O. et al., 2000; Ivanciuc T. & Ivanciuc, O., 2002; Katritzky et al., 1994; Katritzky et al., 2000; Pompe & Novic, 1999; Ren, 1999, 2002a). Quantitative structure–chromatographic retention relationship (QSRR) studies have been widely carried out by gas chromatography (GC) and high-performance liquid chromatography (HPLC) (Markuszewski & Kaliszan, 2002). Topological indices (TI) are obtained *via* mathematical operations on the molecular graphs of compounds (Ivanciuc, O. et al., 2002; Kier & Hall, 1976; Liu, S.-S. et al. 2002; Marino et al., 2002; Rios-Santamarina et al., 2002; Toropov & Toropova, 2002), in contrast to the physicochemical characterization used in traditional QSAR (García-Domenech et al., 2002). One of the main advantages of TI is that they can be easily and rapidly computed for any constitutional formula and yield good correlations. However, they also have important disadvantages, including the difficulty of encoding stereochemical information, for example to distinguish between *cis-* and *trans-*isomers, and their lack of physical meaning. Many topological indices have been proposed since the pioneering studies by Wiener (Wiener, 1947) and by Kier and Hall on their use in QSAR (Kier & Hall, 1976). Examples of TI developed for QSAR/QSRR studies include Estrada's edge weights based on quantum-chemical parameters (Estrada, 2002) and Ren's atom-type AI topological indices derived from topological distance sums and vertex degree (Ren, 2002d).

Based on a hypothesis about chromatographic behavior, our group developed a topological index called the semi-empirical topological index (*IET*). This index was initially developed to predict the chromatographic retention of linear and branched alkanes and linear alkenes, with the objective of differentiating their *cis-* and *trans-*isomers and obtaining QSRR models (Heinzen et al., 1999b). The excellent results encouraged our group to extend the new topological descriptor to other classes of compounds (Amboni et al., 2002a, 2002b; Arruda et al., 2008; Junkes et al., 2002a, 2002b, 2003a, 2003b, 2004; Junkes et al., 2005; Porto et al., 2008). The equation used to calculate the *IET* was derived from the molecular graph, and the values of the carbon atoms and functional groups were attributed on the basis of the observed experimental chromatographic behavior, supported by theoretical considerations. This approach was adopted because of the difficulty in obtaining a complete theoretical description of the interaction between the stationary phase and the solute. On the basis of theoretical equations or hypotheses alone it is not possible, for example, to estimate how the molecular conformation of the solute affects the intermolecular forces. It therefore seems reasonable to assume that the experimental behavior can provide insights into these factors, which can then be applied to other processes involved in QSPR studies. Thus, the semi-empirical topological index has a clear physical meaning.


The semi-empirical topological index (*IET*) allowed the creation of a new descriptor, the electrotopological index, *ISET*, which was recently developed by our group and applied in QSPR studies to predict the chromatographic retention indices of a large number of organic compounds, including aliphatic hydrocarbons (alkanes and alkenes), aldehydes, ketones, esters and alcohols (Souza et al., 2008, 2009a, 2009b, 2010). For these series of molecules, the new descriptor can be calculated quickly from the net atomic charges given by the semi-empirical quantum-chemical AM1 method, correlated with the approximate numerical values that the semi-empirical topological index attributes to the primary, secondary, tertiary and quaternary carbon atoms. Thus, unifying the quantum-chemical and topological methods provided a three-dimensional picture of the atoms in the molecule. It is important to note that the AM1 method gives more reliable charges, dipole moments and bond lengths than low-level *ab initio* calculations, that is, those employing a minimal basis set, which are also more time-consuming. Although calculated partial atomic charges may be less reliable than other molecular properties, and different semi-empirical methods give net charges in poor numerical agreement, they are easy to calculate and at least indicate the trends of the charge density distribution in the molecules. Since many chemical reactions and physicochemical properties depend strongly on local electron densities, net atomic charges and other charge-based descriptors are currently used as chemical reactivity indices.

For alkanes and alkenes, this correlation allowed the creation of a new semi-empirical electrotopological index (*ISET*) for QSRR models, based on the fact that the interactions between the solute and the stationary phase are due to electrostatic and dispersive forces. This index is able to distinguish between *cis-* and *trans-*isomers directly from the net atomic charges of the carbon atoms obtained from quantum-chemical calculations (Souza et al., 2008). For polar molecules such as aldehydes, ketones, esters and alcohols, the presence of heteroatoms such as oxygen considerably changes the charge distribution relative to the corresponding hydrocarbons, leading to a small increase in the interactions between the solute and the stationary phase (Souza et al., 2009a, 2009b, 2010). An appropriate way to calculate the *ISET* for these compounds was developed, taking into account the dipole moment of the molecules and the atomic charges of the heteroatoms and of the carbon atoms attached to them. If the stationary phase is considered non-polar, the interaction between these molecules and the stationary phase is electrostatic, with a contribution from dispersive forces. Relative to the corresponding hydrocarbons, these interactions are only slightly increased, as a result of the charge redistribution caused by the heteroatom; this redistribution also accounts for the dipole moment of the molecules. The main changes in charge distribution caused by the oxygen heteroatoms occur in their immediate neighborhood, and the excess charge on these atoms gives rise to electrostatic interactions that are stronger than the weak dispersive interactions.

#### **2. Semi-empirical topological index (*IET*)**

Three important factors led us to develop the semi-empirical topological index: (i) no single topological index was able to differentiate between the *cis-* and *trans-*isomeric structures of alkenes; (ii) if every carbon atom is assigned a value of 100, as in the Kováts convention, the experimental results do not allow a constant value to be determined for each of the different types of carbon atom (secondary, tertiary and quaternary) in alkanes; (iii) when the Kováts retention indices of highly branched alkanes are correlated with the number of carbon atoms, the resulting linear correlation is unacceptable. It is known that chromatographic separation results from the forces that operate between solute molecules and the molecules of the stationary phase. The retention of alkanes and alkenes depends on the number of carbon atoms and on the interaction of each specific carbon atom with the stationary phase. This interaction is determined by the electrical properties of each carbon atom and by the steric hindrance caused by the other carbon atoms attached to it. The values attributed to the carbon atoms were derived from the experimental chromatographic behavior of the molecules, which reflects the actual electrical and steric characteristics of the carbon atoms; for this reason the index is called semi-empirical. The molecules were represented according to molecular graph theory, with the carbon atoms as the vertices of the graph and the hydrogen atoms suppressed (Hansen & Jurs, 1988); hence it is called a topological index.

#### **2.1 Calculation of** *IET* **for alkanes and alkenes**

Values were attributed to the carbon atoms (the vertices of the molecular graphs) according to the following considerations. (i) According to the Kováts convention, the retention index is a linear function of the number of carbon atoms for the n-alkanes (Kováts, 1968). Branched alkanes, however, do not follow this linear relationship, because the retention of the tertiary and quaternary carbon atoms is decreased by the steric effects of their neighboring groups. Secondary, tertiary and quaternary carbon atoms must therefore have values lower than the 100 u.i. previously attributed to every carbon atom by Kováts. (ii) On the basis of the experimental chromatographic behavior, the following approximate values were attributed: 100 u.i. for the methyl carbon atom, in agreement with Kováts, 90 u.i. for secondary, 80 u.i. for tertiary and 70 u.i. for quaternary carbon atoms. All values were divided by 100 to make them consistent with common topological values. (iii) The contribution of these carbon atoms to the chromatographic retention also depends on the neighboring substituent groups, owing to steric effects. To account for these effects, it was observed that the experimental RI values decreased with increasing branching, following a logarithmic trend. Therefore, the logarithm of the value of each adjacent carbon atom was added. Thus, the new semi-empirical topological index (*IET*) is expressed as:

$$I_{ET} = \sum_{i} \left( C_i + \delta_i \right) \tag{1}$$

$$\delta_i = \sum_{j \sim i} \log C_j$$

where Ci is the value attributed to each carbon atom i in the molecule, δi is the sum of the logarithms of the values of the carbon atoms adjacent to atom i (C1, C2, C3 and C4), and j ~ i means that atom j is adjacent to atom i. For alkanes, where all carbon values are at most 1, these logarithms are zero or negative, so every non-methyl neighbor lowers the contribution of a carbon atom, in line with the decrease in retention observed with increasing branching. (iv) For alkenes, the main interaction between the solute and the stationary phase is the dispersive force, which is reduced by neighboring steric effects; however, electrostatic forces are also involved. The influence of conformational effects on the intermolecular forces makes it very difficult to predict these effects on theoretical grounds alone. For this reason, the values attributed to the carbon atoms of the double bond in alkenes were calculated by numerical approximation based on the experimental retention indices, as described in our previous publications (Heinzen et al., 1999b; Junkes et al., 2002a).
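To make Equation (1) concrete for alkanes, the following Python sketch computes *IET* from a hydrogen-suppressed graph given as a list of C–C bonds. It assumes that "log" denotes the decimal logarithm and derives each carbon's type (primary to quaternary) from its number of carbon neighbors; the function name `iet_alkane` is our own, and the alkene double-bond values, which were fitted numerically, are not covered.

```python
import math

# Section 2.1 values (Kovats units divided by 100), indexed by the number of carbon
# neighbours in the hydrogen-suppressed graph: primary (methyl), secondary,
# tertiary, quaternary.
ALKANE_C = {1: 1.0, 2: 0.9, 3: 0.8, 4: 0.7}


def iet_alkane(bonds):
    """Equation (1) for an acyclic alkane given as a list of C-C bonds (i, j).

    Assumes decimal logarithms and at least one C-C bond (methane is not handled).
    """
    neighbours = {}
    for i, j in bonds:
        neighbours.setdefault(i, []).append(j)
        neighbours.setdefault(j, []).append(i)
    # C_i from the carbon type, i.e. from the vertex degree
    c = {atom: ALKANE_C[len(adj)] for atom, adj in neighbours.items()}
    total = 0.0
    for atom, adj in neighbours.items():
        delta = sum(math.log10(c[nb]) for nb in adj)  # delta_i: sum over j ~ i of log C_j
        total += c[atom] + delta
    return total


n_pentane = [(0, 1), (1, 2), (2, 3), (3, 4)]
isopentane = [(0, 1), (1, 2), (2, 3), (1, 4)]   # 2-methylbutane
neopentane = [(0, 1), (0, 2), (0, 3), (0, 4)]   # 2,2-dimethylpropane

for name, graph in [("n-pentane", n_pentane), ("2-methylbutane", isopentane),
                    ("2,2-dimethylpropane", neopentane)]:
    print(f"{name:20s} IET = {iet_alkane(graph):.4f}")
```

For the three C5 isomers this prints IET values of about 4.4255, 4.3178 and 4.0804, decreasing with branching as the experimental retention does. For n-alkanes each additional CH2 adds 0.9 + 2 log 0.9 ≈ 0.8085, so under these assumptions *IET* is exactly linear in the number of carbon atoms, consistent with the Kováts convention.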

#### **2.2 Calculation of** *IET* **for compounds with oxygen-containing functional groups**

The values attributed to the carbon atoms and functional groups (the vertices of the molecular graphs) were based on the following considerations. (i) For this group of compounds, the main intermolecular forces that contribute to their chromatographic behavior on low-polarity stationary phases are dispersive and inductive forces, and the values attributed to the functional groups are also based on the experimental retention indices. (ii) The –COO– (ester), C=O (ketone or aldehyde) and C–OH (alcohol) groups were each treated as a single vertex of the molecular graph. This was done because calculating individual values for the carbon and oxygen atoms of these groups proved difficult and gave inconsistent results; treating each group as a single vertex gave better numerical approximations, capable of reflecting the experimental chromatographic behavior of these compounds. (iii) The same considerations used to develop the semi-empirical topological method for the prediction of retention indices of alkanes and alkenes (Heinzen et al., 1999b; Junkes et al., 2002a) were employed to develop the *IET* for oxo-compounds. (iv) The contribution of the carbon atoms and functional groups to the chromatographic retention is represented by a single symbol, Ci, as in Equation (1), which therefore applies to the entire set of compounds studied: Ci is the value attributed to the –COO– (ester), C=O (ketone or aldehyde) or C–OH (alcohol) group and/or to each carbon atom i in the molecule, and δi is the sum of the logarithms of the values of each adjacent carbon atom (C1, C2, C3 and C4) and/or of the adjacent –COO–, C=O or C–OH group, where ~ means 'adjacent to'. In a first step, an approximate *IET* (*IETa*) was calculated for each compound by inserting its experimental Kováts retention index into the equation previously obtained for linear alkanes containing 3 to 10 carbon atoms (Heinzen et al., 1999b). (v) Subsequently, the values of Ci for primary and secondary carbon atoms previously attributed to alkanes (Heinzen et al., 1999b) and the approximate *IET* calculated above were used in Equation (1) to calculate the values of the –COO–, C=O and C–OH groups of linear compounds. In this way, values were attributed to each class of functional group according to the position of the group in the carbon chain. (vi) One of the fundamental factors taken into account in the development of this topological index was the importance of steric and other intramolecular interactions between the functional group and nearby atoms. Therefore, for branched molecules, the carbon atoms in the α, β and γ positions with respect to the functional group were assigned values different from those previously attributed to alkanes (Heinzen et al., 1999b), as described in the literature (Amboni et al., 2002a, 2002b; Junkes et al., 2003b, 2004).
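Steps (iv) and (v) amount to solving Equation (1) for one unknown vertex value. The sketch below assumes decimal logarithms and, because the coefficients of the earlier alkane regression are not reproduced here, uses as a stand-in the straight line implied by the Section 2.1 values together with the Kováts definition RI = 100n for n-alkanes. Since *IET* increases monotonically with the group value X, a simple bisection finds it. All function and vertex names are hypothetical.

```python
import math

LOG = math.log10  # assumption: "log" in Equation (1) is the decimal logarithm

# IET increment per CH2 for n-alkanes with the Section 2.1 values (0.9 and 1.0)
S = 0.9 + 2 * LOG(0.9)


def iet(values, bonds):
    """Equation (1): sum of vertex values C_i plus, for every vertex, the logs of its neighbours."""
    total = sum(values.values())
    for i, j in bonds:
        # a bond i-j puts log C_j into delta_i and log C_i into delta_j
        total += LOG(values[i]) + LOG(values[j])
    return total


def approx_iet_from_ri(ri):
    """Approximate IET (IETa) from a retention index, via the n-alkane line.

    For n-alkanes IET(n) = 2 + (n - 2) * S and RI = 100 n, so IET = 2 + S * (RI/100 - 2).
    This is a stand-in for the published alkane regression, not a copy of it.
    """
    return 2.0 + S * (ri / 100.0 - 2.0)


def solve_group_value(values, bonds, group, target, lo=1e-6, hi=10.0, tol=1e-10):
    """Bisection for the value X of vertex `group` such that Equation (1) gives `target`.

    dIET/dX = 1 + k / (X ln 10) > 0 (k = number of neighbours), so the root is unique.
    """
    def residual(x):
        trial = dict(values)
        trial[group] = x
        return iet(trial, bonds) - target

    if residual(lo) > 0 or residual(hi) < 0:
        raise ValueError("target IET is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip on a pentan-2-one-like chain CH3-[C=O]-CH2-CH2-CH3, where "G" is the
# single C=O vertex; 1.5 is an arbitrary trial value, not a published one.
carbons = {"C1": 1.0, "C3": 0.9, "C4": 0.9, "C5": 1.0}
chain = [("C1", "G"), ("G", "C3"), ("C3", "C4"), ("C4", "C5")]
target = iet({**carbons, "G": 1.5}, chain)
print(round(solve_group_value(carbons, chain, "G", target), 6))
```

In practice the target would be `approx_iet_from_ri` applied to the compound's experimental retention index; the round trip above, which prints 1.5, only checks that the solver inverts Equation (1).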

The values of Ci for the carbon atoms and the values attributed to the functional groups of esters, aldehydes, ketones and alcohols are listed in Table 1 of Junkes et al. (2004).

#### **2.3 Calculation of** *IET* **for alkylbenzene compounds**

The same considerations employed in generating the semi-empirical topological index, *IET*, for linear and branched alkanes and alkenes (Heinzen et al., 1999b; Junkes et al., 2002a) were applied to the alkylbenzenes. First, the molecules were represented by hydrogen-suppressed molecular graphs based on chemical graph theory (Hansen & Jurs, 1988), with the carbon atoms as the vertices. The contribution of each carbon atom to the chromatographic retention is represented by a single symbol, Ci, in Equation (1), where Ci is the value attributed to the (=C<) fragments and/or to each carbon atom i in the molecule, and δi is the sum of the logarithms of the values of each adjacent carbon atom (C1, C2, C3 and C4). The values of Ci for the carbon atoms of linear, branched, *ortho*-, *meta*- and *para*-substituted, trisubstituted and tetrasubstituted alkylbenzenes are given by Porto et al. (2008).
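In an alkylbenzene graph, a ring carbon that carries a substituent has three carbon neighbors. We read the (=C<) fragments as these substituted ring carbons; this is our interpretation, not a statement taken from Porto et al. (2008). Because the tabulated Ci values depend on the substitution pattern, the hypothetical helpers below build the hydrogen-suppressed graph of a benzene ring with linear alkyl chains and classify disubstitution as *ortho*, *meta* or *para*; no Ci values are assigned.

```python
from collections import defaultdict


def alkylbenzene_graph(substituents):
    """Hydrogen-suppressed graph of a benzene ring bearing linear alkyl chains.

    `substituents` maps a ring position (0-5) to a chain length (1 = methyl).
    Ring atoms are numbered 0-5; chain atoms are numbered from 6 upwards.
    Returns (adjacency, labels).
    """
    adj = defaultdict(set)

    def bond(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for k in range(6):                  # aromatic ring as a 6-cycle
        bond(k, (k + 1) % 6)
    next_id = 6
    for pos, length in substituents.items():
        prev = pos
        for _ in range(length):         # grow a linear chain from the ring carbon
            bond(prev, next_id)
            prev, next_id = next_id, next_id + 1

    labels = {}
    for atom, nbrs in adj.items():
        if atom < 6:
            # three carbon neighbours -> substituted ring carbon, read here as (=C<)
            labels[atom] = "=C<" if len(nbrs) == 3 else "=CH-"
        else:
            labels[atom] = "sp3 C"
    return adj, labels


def disubstitution_pattern(p, q):
    """ortho/meta/para from the shortest distance between two ring positions."""
    d = abs(p - q) % 6
    return {1: "ortho", 2: "meta", 3: "para"}[min(d, 6 - d)]


adj, labels = alkylbenzene_graph({0: 1, 3: 1})   # p-xylene
print(sorted(a for a, lab in labels.items() if lab == "=C<"), disubstitution_pattern(0, 3))
```

For *p*-xylene it reports ring atoms 0 and 3 as (=C<) fragments and the pattern as para; the resulting graph is what Equation (1) would then be evaluated on, once the published Ci values are supplied.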

#### **2.4 Calculation of** *IET* **for halogenated aliphatic compounds**

This approach is based on representing the molecules by hydrogen-suppressed molecular graphs, derived from chemical graph theory, in which the carbon atoms (Ci) are the graph vertices. Like the carbon atoms, the C–X and X–C–X fragments (where X = chlorine, bromine or iodine) are each treated as a single vertex of the molecular graph, as previously done for the functional groups (Heinzen et al., 1999b). The *IET* is given by Equation (1), where Ci is the value attributed to each carbon atom i and/or to the C–X or X–C–X fragments in the molecule, and δi is the sum of the logarithms of the values of each adjacent carbon atom (C1, C2, C3 and C4) and/or of the adjacent C–X and X–C–X fragments. The values attributed to the carbon atoms and to the C–X and X–C–X fragments (Ci) of halogenated hydrocarbons were calculated by numerical approximation based on the experimental retention index (RIExp) values and supported by theoretical considerations. The values of Ci for linear and branched halogenated aliphatic compounds are reported by Arruda et al. (2008).
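The contraction step described above, in which each carbon is merged with its halogen atoms into a single C–X or X–C–X vertex before Equation (1) is applied, can be sketched as follows. The input format (an element map plus a bond list, hydrogens omitted) and the function name are hypothetical, and carbons bearing three or more halogens are rejected because the text covers only these two fragment types.

```python
HALOGENS = {"Cl", "Br", "I"}   # X = chlorine, bromine or iodine, as in the text


def contract_halogen_fragments(atoms, bonds):
    """Collapse each carbon and its halogen atoms into one vertex.

    atoms: dict atom_id -> element symbol ("C", "Cl", "Br", "I"), hydrogens omitted.
    bonds: list of (a, b) pairs.
    Returns (labels, edges): labels maps each carbon id to "C", "C-X" or "X-C-X";
    edges is the set of carbon-carbon bonds of the contracted graph.
    """
    n_hal = {a: 0 for a, el in atoms.items() if el == "C"}
    edges = set()
    for a, b in bonds:
        ea, eb = atoms[a], atoms[b]
        if ea == "C" and eb == "C":
            edges.add(frozenset((a, b)))       # skeleton bond survives contraction
        elif ea == "C" and eb in HALOGENS:
            n_hal[a] += 1                      # halogen absorbed into its carbon
        elif eb == "C" and ea in HALOGENS:
            n_hal[b] += 1
    names = {0: "C", 1: "C-X", 2: "X-C-X"}
    labels = {}
    for c, k in n_hal.items():
        if k not in names:
            raise ValueError(f"carbon {c} carries {k} halogens; not covered here")
        labels[c] = names[k]
    return labels, edges


# 1,2-dichloroethane: two C-X vertices joined by one edge
labels, edges = contract_halogen_fragments(
    {0: "C", 1: "C", 2: "Cl", 3: "Cl"}, [(0, 1), (0, 2), (1, 3)])
print(labels, edges)
```

Each labeled vertex would then receive its Ci from the published table, and δi would be summed over the contracted edges exactly as for the alkanes.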

#### **2.5 Development of QSRR models using the** *IET*

As the starting point, the *IET* was developed for alkanes on a low-polarity stationary phase. These are the simplest compounds, and their properties depend almost entirely on topological features. This topological descriptor was subsequently extended to other classes of organic compounds with more complex structural features. A summary of the best simple linear regression models (RI = b + a·*IET*) and the statistical data for each data set, obtained in previous QSRR studies, is given in Table 1.
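The models summarized in Table 1 are simple least-squares lines RI = b + a·*IET*. The sketch below fits such a line and reports the correlation coefficient r and the standard error of estimate s, two statistics commonly quoted for QSRR models. As a self-check it uses the n-alkanes C3 to C10, whose Kováts indices are 100n by definition and whose *IET* follows from Equation (1) with the Section 2.1 values (assuming decimal logarithms), so the fit is exact; real applications would substitute experimental RI values for a compound class. The function names are our own.

```python
import math


def fit_ri_vs_iet(iet, ri):
    """Ordinary least squares for RI = b + a*IET; returns (a, b, r, s)."""
    n = len(iet)
    mx, my = sum(iet) / n, sum(ri) / n
    sxx = sum((x - mx) ** 2 for x in iet)
    syy = sum((y - my) ** 2 for y in ri)
    sxy = sum((x - mx) * (y - my) for x, y in zip(iet, ri))
    a = sxy / sxx                       # slope
    b = my - a * mx                     # intercept
    r = sxy / math.sqrt(sxx * syy)      # Pearson correlation coefficient
    resid = [y - (b + a * x) for x, y in zip(iet, ri)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))  # standard error of estimate
    return a, b, r, s


def iet_n_alkane(n):
    """Equation (1) for an n-alkane: two methyls (1.0), n - 2 CH2 (0.9), 2n - 4 log terms."""
    return 2.0 + 0.9 * (n - 2) + (2 * n - 4) * math.log10(0.9)


carbons = range(3, 11)                  # propane ... decane
x = [iet_n_alkane(n) for n in carbons]
y = [100.0 * n for n in carbons]        # Kovats indices of n-alkanes, by definition
a, b, r, s = fit_ri_vs_iet(x, y)
print(f"RI = {b:.2f} + {a:.2f} IET   (r = {r:.6f}, s = {s:.2e})")
```

Under these assumptions the fit recovers a ≈ 123.69 and b ≈ -47.38 with r = 1, that is, the line implied by Equation (1) and the Kováts definition; any scatter in real data shows up in r and s.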

#### **3. The semi-empirical electrotopological index,** *ISET*

The semi-empirical topological index (*IET*) discussed in the previous section allowed the creation of a new descriptor, the electrotopological index, *ISET*, which was developed and applied in QSPR studies to predict the retention index, boiling point and octanol/water partition coefficient (log P) of a large number of organic compounds, including aliphatic hydrocarbons (alkanes and alkenes), aldehydes, ketones, esters and alcohols (Souza et al.,


data set of compounds, obtained in previous QSRR studies, is given in Table 1.

**2.3 Calculation of** *IET* **for alkylbenzene compounds** 

**2.4 Calculation of** *IET* **for halogenated aliphatic compounds** 

obtained in Arruda et al. (Arruda et al., 2008).

**2.5 Development of QSRR models using the** *IET*

**3. The semi-empirical electrotopological index,** *ISET*

(Porto et al., 2008).


Table 1. Summary of the best simple linear regressions (RICalc = a + b IET) found for different data set on low polarity stationary phases.

2008, 2009a, 2009b, 2010). This new descriptor for this series of molecules can be quickly calculated from atomic charges obtained through the semi-empirical quantum-chemical, AM1 method (Bredow & Jug, 2005; Smith, 1996), since it was found that atomic charges correlated with the approximate numerical values attributed by the semi-empirical topological index to the primary, secondary, tertiary and quaternary carbons atoms.
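To make these carbon-type values concrete, the sketch below computes *IET* from Equation (1) for acyclic alkanes. It assumes that each carbon takes one of the four values quoted in Section 3.1 (1.0, 0.9, 0.8 and 0.7 for primary, secondary, tertiary and quaternary carbons) and uses base-10 logarithms, as in the worked examples of this chapter. It is a simplified sketch under that assumption: the full tables of Ci values for alkenes, alkylbenzenes and halogenated compounds in the cited papers are not reproduced, and the function name `iet_alkane` is hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Carbon-type values (C_i) for primary, secondary, tertiary and quaternary
# carbons, keyed by the number of carbon neighbours in the H-suppressed graph.
# Simplifying assumption: no other fragment-specific values are used.
C_BY_DEGREE: Dict[int, float] = {1: 1.0, 2: 0.9, 3: 0.8, 4: 0.7}


def iet_alkane(n_atoms: int, bonds: List[Tuple[int, int]]) -> float:
    """I_ET = sum_i (C_i + delta_i), where delta_i is the sum of log10 C_j over
    the carbon atoms j bonded to atom i (alkane-only sketch of Equation 1)."""
    neighbours: Dict[int, List[int]] = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Assign C_i from the carbon type (number of carbon neighbours).
    c_values: Dict[int, float] = {}
    for i, nbrs in neighbours.items():
        if len(nbrs) not in C_BY_DEGREE:
            raise ValueError(f"atom {i} has {len(nbrs)} carbon neighbours")
        c_values[i] = C_BY_DEGREE[len(nbrs)]

    return sum(
        c_values[i] + sum(math.log10(c_values[j]) for j in neighbours[i])
        for i in range(n_atoms)
    )


if __name__ == "__main__":
    print(iet_alkane(4, [(0, 1), (1, 2), (2, 3)]))  # n-butane
    print(iet_alkane(4, [(0, 1), (0, 2), (0, 3)]))  # 2-methylpropane
```

For n-butane the result is 3.8 + 4 log 0.9, and for its branched isomer 2-methylpropane it is 3.8 + 3 log 0.8, so the logarithmic neighbour terms are what separate the two isomers.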

#### **3.1 Calculation of** *ISET* **for alkanes and alkenes**

For alkanes and alkenes, the above-mentioned correlation allowed the creation of a new semi-empirical electrotopological index (*ISET*) for QSRR models, based on the fact that the interactions between the solute and the stationary phase are due to electrostatic and dispersive forces (Souza et al., 2008). This new index, *ISET*, is able to distinguish between *cis*- and *trans*-isomers directly from the net atomic charges of the carbon atoms obtained from quantum-chemical calculations. More precisely, *ISET* was developed as a refinement of the earlier semi-empirical topological index, *IET*. The Ci values that were originally attributed from experimental chromatographic retention and theoretical deductions correlate very well with the net atomic charges of the carbon atoms. Thus, the values attributed to the vertices of the hydrogen-suppressed graph (Ci) are calculated from the correlation between the net atomic charge of each carbon atom, obtained from semi-empirical quantum-chemical calculations, and the Ci values for primary, secondary, tertiary and quaternary carbon atoms (1.0, 0.9, 0.8 and 0.7, respectively) derived from experimental data. It is therefore possible to calculate *ISET* (the semi-empirical electrotopological index) from net atomic charges obtained by Mulliken population analysis with the semi-empirical AM1 method, through their correlation with the values attributed to the different types of carbon atoms. This shows that *ISET* encodes information on the charge distribution of the solute, which drives the dispersive and electrostatic interactions between the solute (alkanes and alkenes) and the stationary phase (Souza et al., 2008).

Since the interactions between the solute and the stationary phase are dispersive for alkanes and electrostatic for alkenes, the chromatographic retention depends strongly on the electronic charge distribution over the carbon atoms of these molecules. A simple linear regression was obtained between the SETi values of the carbon atoms, taken from experimental gas chromatographic retention (1.0, 0.9, 0.8 and 0.7 for primary, secondary, tertiary and quaternary carbon atoms, respectively), and the net atomic charges (δi) of these atoms, as given in Equation (2).

$$\mathrm{SET}_i = -1.77125\,\delta_i + 0.62417 \tag{2}$$

This indicates that the physical reality encoded by the semi-empirical topological index (*IET*) developed in our laboratory is directly related to the net atomic charges, which, as is well known, govern important intermolecular interactions. The interactions between the non-polar stationary phases and the compounds analyzed by gas chromatography are therefore determined predominantly by the electronic charge distribution of the solute molecules. From Equation (2), knowledge of the net atomic charges is sufficient to calculate the SETi value of any kind of carbon atom, not only the values given by the carbon models (1.0, 0.9, 0.8 and 0.7) or listed in specific tables. Hence, this way of calculating the SETi values of the carbon atoms allows a new index to be defined, named the semi-empirical electrotopological index, *ISET*. Taking into account the steric effects of the neighboring carbon atoms, as in the calculation of *IET*, the new index can be calculated according to Equation (3).
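Equation (2) is a one-line mapping from net atomic charge to SETi, and inverting it shows which charge corresponds to each of the four carbon-type values. The minimal Python sketch below does both; the function and constant names are illustrative only. As a check against the data of this section, the charge of -0.155 listed for C5 of 3-hexanone in Table 3 gives 0.8987, against 0.8988 tabulated, the small gap coming from the three-decimal rounding of the charges.

```python
# Coefficients of Equation (2): SET_i = -1.77125 * delta_i + 0.62417,
# where delta_i is the AM1 Mulliken net atomic charge of atom i.
SLOPE = -1.77125
INTERCEPT = 0.62417

CARBON_TYPES = {"primary": 1.0, "secondary": 0.9, "tertiary": 0.8, "quaternary": 0.7}


def set_from_charge(delta: float) -> float:
    """SET_i of an atom from its net atomic charge (Equation 2)."""
    return SLOPE * delta + INTERCEPT


def charge_for_set(set_value: float) -> float:
    """Inverse of Equation (2): the net atomic charge that maps onto a given SET_i."""
    return (set_value - INTERCEPT) / SLOPE


if __name__ == "__main__":
    for label, value in CARBON_TYPES.items():
        print(f"{label:>10s}: SET = {value:.1f} <-> delta = {charge_for_set(value):+.4f}")
    print(f"delta = -0.155 -> SET = {set_from_charge(-0.155):.4f}")
```

The inverted charges (about -0.212, -0.156, -0.099 and -0.043 from primary to quaternary) make the trend explicit: the more negative the carbon atom, the larger its SETi value.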

$$I_{\mathrm{SET}} = \sum_{i} \Big( \mathrm{SET}_i + \sum_{j} \log \mathrm{SET}_j \Big) \tag{3}$$

In the above expression, the sum over *i* runs over all the atoms of the molecule (excluding the H atoms) and the inner sum over *j* runs over the atoms bonded to atom *i*. The *cis*-2-pentene and *trans*-2-pentene molecules represented in the graph below are taken as an example of the *ISET* calculation.

The net atomic charges and SETi values for the above molecules are given in Table 2 below.


Table 2. The net atomic charge (δ*i*) and the *SETi* values for each carbon atom of the *cis*-2-pentene and *trans*-2-pentene molecules.

The *ISET* calculation now follows:



As expected on physical-chemical grounds, the AM1 calculations show that the optimized structures of the *cis*- and *trans*-isomers have slightly different charge distributions. As can be seen from the above results, the Mulliken population analysis gives slightly different net atomic charges for the carbon atoms of each isomer, and the resulting differences in the SETi values are sufficient to give different *ISET* values.
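The pipeline of Section 3.1 (AM1 charges converted to SETi with Equation (2), followed by the graph sum of Equation (3)) can be sketched as follows. Atoms are identified by labels and bonds are listed as pairs; bond order is ignored, since Equation (3) only uses adjacency in the hydrogen-suppressed graph. The charges of Table 2 are not reproduced here, so the example only sets up the 2-pentene graph, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Bond = Tuple[str, str]


def set_from_charge(delta: float) -> float:
    """Equation (2): SET_i from the net atomic charge delta_i."""
    return -1.77125 * delta + 0.62417


def iset(values: Dict[str, float], bonds: List[Bond]) -> float:
    """Equation (3): sum over heavy atoms i of SET_i plus log10 SET_j
    for every heavy atom j bonded to i."""
    neighbours: Dict[str, List[str]] = {atom: [] for atom in values}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return sum(
        values[i] + sum(math.log10(values[j]) for j in neighbours[i])
        for i in values
    )


def iset_from_charges(charges: Dict[str, float], bonds: List[Bond]) -> float:
    """Convert AM1 net atomic charges to SET_i values, then apply Equation (3)."""
    return iset({atom: set_from_charge(q) for atom, q in charges.items()}, bonds)


# Hydrogen-suppressed graph of 2-pentene (C1-C2=C3-C4-C5). The cis and trans
# isomers share this graph, so their I_SET values can differ only through the
# charges passed to iset_from_charges.
PENTENE_BONDS: List[Bond] = [("C1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C4", "C5")]
```

Because both isomers share `PENTENE_BONDS`, any difference between their *ISET* values in this sketch comes entirely from the charge dictionaries, which mirrors the argument made in the paragraph above.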

#### **3.2 Calculation of** *ISET* **for compounds with oxygen-containing functional groups**

#### **3.2.1 Ketones and aldehydes**

For polar molecules such as aldehydes, ketones, esters and alcohols, the presence of a heteroatom such as oxygen considerably changes the charge distribution relative to the corresponding hydrocarbons, producing a small increase in the interactions between the solute and the stationary phase. An appropriate way of calculating *ISET* was developed that takes into account the dipole moment of these molecules and the atomic charges of the heteroatoms and of the carbon atoms attached to them (Souza et al., 2009a). Since the stationary phase is considered a non-polar material, the interactions are only slightly increased relative to the corresponding hydrocarbons, owing to the charge redistribution that occurs in the presence of the heteroatom. This charge redistribution accounts for the dipole moment of the molecules. Thus, the dipolar charge distribution in such molecules leads to a small increase in the interactions of the solute with the stationary phase relative to hydrocarbons, whose dipole moment is zero or almost zero. Clearly, the major effects of the (oxygen) heteroatoms on the charge distribution occur in their neighborhood, and the excess charge on these atoms leads to electrostatic interactions that are stronger than the weak dispersive dipolar interactions (Christian, 1990).

Regarding chromatographic retention, it can be observed, for instance, that 2-hexanone, 3-hexanone and hexanal have experimental retention indices of 767, 764 and 776, respectively, whereas the corresponding hydrocarbon without the heteroatom, heptane, has a retention index of 700. The presence of the heteroatom (oxygen) therefore increases the retention index by around 10%. Hence, the interactions between the molecules and the stationary phase are only slightly increased, and this is clearly due to the charge redistribution that occurs in the presence of the heteroatom. This charge redistribution accounts for the dipole moment of molecules such as aldehydes and ketones. The dispersive forces between these molecules and the stationary phase include charge-dipole and dipole–induced dipole interactions, which are weak relative to the electrostatic interactions. Thus, the dipolar charge distribution in such molecules leads to a small increase in the interactions of the solute with the stationary phase relative to hydrocarbons, whose dipole moment is zero. These considerations suggest that the index can be calculated as in Equation (3), applying it also to the heteroatoms, but with subtle modifications that incorporate the effects of the dispersive dipolar interactions.
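The "around 10%" figure can be checked directly from the retention indices quoted above. The short Python sketch below computes the relative increase of each oxygenated compound over heptane; the dictionary and function names are illustrative only.

```python
# Experimental retention indices quoted in the text.
RI_HEPTANE = 700.0
RI_OXYGENATED = {"2-hexanone": 767.0, "3-hexanone": 764.0, "hexanal": 776.0}


def percent_increase(ri: float, ri_reference: float) -> float:
    """Relative increase of a retention index over a reference, in percent."""
    return 100.0 * (ri - ri_reference) / ri_reference


if __name__ == "__main__":
    for name, ri in RI_OXYGENATED.items():
        print(f"{name}: {percent_increase(ri, RI_HEPTANE):.1f} % above heptane")
```

The three increases, roughly 9.6 %, 9.1 % and 10.9 %, are consistent with the estimate of about 10% given in the text.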

All of these factors can be included in the calculation of the retention index through a small increase in the SETi values of the heteroatoms and of the carbon atoms attached to them. This was done by multiplying the SETi values of these atoms by a function Aµ that depends on the dipole moment of the molecule and on the net charges of the oxygen and carbon atoms (to include both the electrostatic and the dispersive interactions). Since Aµ must equal 1 when the dipole moment is zero or almost zero (as for alkanes and alkenes), a first attempt at this function considers a linear dependence on the molecular dipole moment **µ**, that is, **Aµ = 1 + (µ/µF)**, where **µF** is a local function (in units of dipole moment) in the sense that it depends on the net charges of the oxygen and carbon atoms. On the one hand, this definition of **Aµ** works only if **µ**/**µF** < 1, since **Aµ** must reflect the small increase in the interactions due to dipolar dispersive forces. On the other hand, good choices for the definition of **µF** for ketones and aldehydes (as shown below) make the ratio **µ**/**µF** much greater than unity, which clearly shows that the above definition of **Aµ** cannot be applied. Since **µ**/**µF** > 1, **Aµ** cannot be a polynomial function of **µ**/**µF**. Thus, **Aµ** must depend more weakly on the dipole moment than a linear function does, and this weak dependence can be achieved with a logarithmic function, since **f(x) = x** increases much faster than **f(x) = log(1 + x)**. Taking these factors into account, a definition of **Aµ** can be obtained that equals unity for µ = 0 and depends only logarithmically on the dipole moment of the molecule, as given in Equation (4):

$$A_{\mu} = 1 + \log\left(1 + \frac{\mu}{\mu_{F}}\right), \tag{4}$$

where **µ** is the calculated molecular dipole moment and **µF** is a local function that depends on the charges of the atoms of the C=O bond. Clearly, **µF** must be directly related to the net charge of the oxygen atom, since it must reflect some contribution to the electrostatic interaction between these molecules and the stationary phase. In this regard, **µF** may also be related to the atomic charge of the carbon atom of the C=O functional group, or to the difference between the atomic charges of these atoms. Hence, **µF** can be defined in different ways, and several definitions were tested in preliminary calculations. After these preliminary calculations, the best choice for ketones was **µF** = **d|QC - QO|**, where **d** is the calculated C=O bond length and **|QC - QO|** is the absolute value of the difference between the atomic charges of the carbon and oxygen atoms. This definition of **µF** is an attempt to account for the contribution of the atomic charges of the oxygen atom and of the carbon atom bonded to it to the electrostatic interactions. In aldehydes, the carbonyl carbon is bonded to a hydrogen atom, so the net positive charge in this polar region of the molecule must be taken as the sum of the atomic charges of the carbon and hydrogen atoms. Accordingly, for aldehydes the best choice was **µF** = **d|QC + QH - QO|**. Equation (4) therefore indicates that the presence of the dipole moment increases the interaction between the molecules and the stationary phase, and that this contribution may be screened by the charge located on the heteroatom (oxygen) if **µ**/**µF** < 1, or enhanced if **µ**/**µF** > 1. For ketones and aldehydes, the local function **µF** is smaller than the dipole moment, showing that **Aµ** receives an appreciable contribution from the atomic charges of these atoms. This reveals the contribution of oxygen to the electrostatic interaction between the solute and the stationary phase. Therefore, to include the dispersive dipolar interactions in the calculation of the retention index, the SETi values of the heteroatom (oxygen) and of the carbon atom attached to it are multiplied by the dipolar function **Aµ** given in Equation (4). That is, in this model *ISET* is calculated as in Equation (5):

$$I_{\mathrm{SET}} = \sum_{i} \Big( A_{\mu}\,\mathrm{SET}_i + \sum_{j} \log \big(A_{\mu}\,\mathrm{SET}_j\big) \Big) \tag{5}$$

where the SETi values are obtained from Equation (2). As in Equation (3), the sum over *i* runs over all the atoms of the molecule (excluding the H atoms) and the inner sum over *j* runs over the atoms bonded to atom *i*. The dipolar function **Aµ** is applied to the oxygen atom and the carbon atom bonded to it, and is taken as unity for the remaining carbon atoms of the molecule. Equation (5) reduces to Equation (3) when the dipole moment of the molecule is zero or almost zero, as for alkanes and alkenes, since **Aµ** = 1 for µ = 0.
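Equations (4) and (5), together with the ketone and aldehyde definitions of **µF**, can be sketched in Python as below. This is a minimal illustration under stated assumptions: the molecular dipole moment, bond length and charges are assumed to come from an AM1 calculation and to be expressed in whatever consistent units the original parameterization used (the text does not specify them), and all function names are hypothetical. Feeding `iset_dipolar` the SETi values of 3-hexanone listed in Table 3, with **Aµ** taken as the ratio of the tabulated AµSETi and SETi values for O1 (about 1.719), reproduces the worked value of 6.1931 to within about 0.001.

```python
import math
from typing import Dict, Iterable, List, Tuple


def a_mu(mu: float, mu_f: float) -> float:
    """Equation (4): A_mu = 1 + log10(1 + mu / mu_F); equals 1 when mu = 0."""
    return 1.0 + math.log10(1.0 + mu / mu_f)


def mu_f_ketone(d_co: float, q_c: float, q_o: float) -> float:
    """Ketones: mu_F = d |Q_C - Q_O|, with d the calculated C=O bond length."""
    return d_co * abs(q_c - q_o)


def mu_f_aldehyde(d_co: float, q_c: float, q_h: float, q_o: float) -> float:
    """Aldehydes: mu_F = d |Q_C + Q_H - Q_O|; the formyl H charge is added to C."""
    return d_co * abs(q_c + q_h - q_o)


def iset_dipolar(
    set_values: Dict[str, float],
    bonds: List[Tuple[str, str]],
    scaled_atoms: Iterable[str],
    amu: float,
) -> float:
    """Equation (5): SET_i of the atoms in scaled_atoms (the carbonyl O and C)
    are multiplied by A_mu, all other atoms keep A_mu = 1, and the graph sum
    of Equation (3) is applied to the scaled values."""
    scaled_labels = set(scaled_atoms)
    v = {a: (amu * s if a in scaled_labels else s) for a, s in set_values.items()}
    neighbours: Dict[str, List[str]] = {a: [] for a in v}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return sum(v[i] + sum(math.log10(v[j]) for j in neighbours[i]) for i in v)


if __name__ == "__main__":
    # 3-hexanone, SET_i values from Table 3; C3 is the carbonyl carbon.
    set_3_hexanone = {"O1": 1.1346, "C1": 0.9892, "C2": 0.9998, "C3": 0.2268,
                      "C4": 0.9998, "C5": 0.8988, "C6": 0.9998}
    bonds = [("C1", "C2"), ("C2", "C3"), ("C3", "O1"), ("C3", "C4"),
             ("C4", "C5"), ("C5", "C6")]
    amu = 1.9507 / 1.1346  # A_mu implied by the tabulated A_mu*SET of O1
    print(round(iset_dipolar(set_3_hexanone, bonds, {"O1", "C3"}, amu), 4))
```

In a full calculation `amu` would instead come from `a_mu(mu, mu_f_ketone(...))` or `a_mu(mu, mu_f_aldehyde(...))` using the AM1 dipole moment, charges and C=O bond length.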

The 3-hexanone and hexanal molecules represented in the graph below are taken as an example of the *ISET* calculation.

The net atomic charges and SETi values are given in Table 3 below.

Table 3. The net atomic charge (δ*i*) and the *SETi* values for each carbon and oxygen atom of the 3-hexanone and hexanal molecules.

| Atom | 3-hexanone δ*i* | 3-hexanone *SETi* | 3-hexanone *AµSETi* | hexanal δ*i* | hexanal *SETi* | hexanal *AµSETi* |
|------|------|------|------|------|------|------|
| O1 | -0.288 | 1.1346 | 1.9507 | -0.289 | 1.1363 | 1.9328 |
| C1 | -0.206 | 0.9892 | – | +0.183 | 0.2995 | 0.5094 |
| C2 | -0.212 | 0.9998 | – | -0.233 | 1.0371 | – |
| C3 | +0.224 | 0.2268 | 0.3899 | -0.155 | 0.8988 | – |
| C4 | -0.212 | 0.9998 | – | -0.158 | 0.9041 | – |
| C5 | -0.155 | 0.8988 | – | -0.159 | 0.9059 | – |
| C6 | -0.212 | 0.9998 | – | -0.211 | 0.9980 | – |

The *ISET* calculation now follows:

$$I_{\mathrm{SET}} = \sum_{i} \Big( A_{\mu}\,\mathrm{SET}_i + \sum_{j} \log \big(A_{\mu}\,\mathrm{SET}_j\big) \Big)$$

3-hexanone (contribution of each atom *i*):

$$
\begin{aligned}
\text{O1:}\quad & 1.9507 + \log 0.3899 = 1.5416\\
\text{C1:}\quad & 0.9892 + \log 0.9998 = 0.9891\\
\text{C2:}\quad & 0.9998 + \log 0.9892 + \log 0.3899 = 0.5860\\
\text{C3:}\quad & 0.3899 + \log 0.9998 + \log 1.9507 + \log 0.9998 = 0.6799\\
\text{C4:}\quad & 0.9998 + \log 0.3899 + \log 0.8988 = 0.5444\\
\text{C5:}\quad & 0.8988 + \log 0.9998 + \log 0.9998 = 0.8986\\
\text{C6:}\quad & 0.9998 + \log 0.8988 = 0.9535\\
I_{\mathrm{SET}}:\quad & 6.1931
\end{aligned}
$$

hexanal (contribution of each atom *i*):

$$
\begin{aligned}
\text{O1:}\quad & 1.9328 + \log 0.5094 = 1.6398\\
\text{C1:}\quad & 0.5094 + \log 1.9328 + \log 1.0371 = 0.8114\\
\text{C2:}\quad & 1.0371 + \log 0.5094 + \log 0.8988 = 0.6978\\
\text{C3:}\quad & 0.8988 + \log 1.0371 + \log 0.9041 = 0.8708\\
\text{C4:}\quad & 0.9041 + \log 0.8988 + \log 0.9059 = 0.8148\\
\text{C5:}\quad & 0.9059 + \log 0.9041 + \log 0.9980 = 0.8612\\
\text{C6:}\quad & 0.9980 + \log 0.9059 = 0.9550\\
I_{\mathrm{SET}}:\quad & 6.6508
\end{aligned}
$$

#### **3.2.2 Esters**

For esters, the major effects on the charge distribution are due to the presence of the two oxygen atoms, and they occur on these atoms and in their neighborhood (the adjacent carbon atoms). The excess charge on these atoms leads to electrostatic interactions that are stronger than the weak dispersive dipolar interactions. For esters, all these factors were included in the calculation of the retention index through a small increase in the *SETi* values of the heteroatoms and of the carbon atoms attached to them (Souza et al., 2009b). As for ketones and aldehydes, it was verified that the dipole moment of the molecule alone is not sufficient to explain the chromatographic behavior of these molecules. It was therefore necessary to introduce an equivalent local dipole moment of the (–COOC–) group, which contributes to the increase in the retention value. This was done by multiplying the *SETi* values of the atoms of the O=C–O–C group by the function **Aµ**, which depends on the dipole moment of the molecule and on the net charges of the oxygen and carbon atoms (to include both the electrostatic and the dispersive interactions). The same approach used for ketones and aldehydes was applied to esters, that is, **Aµ** was taken to depend more weakly than linearly on the dipole moment, as in Equation (4). For esters, **µF** is an equivalent local dipole moment (in units of dipole moment) that depends on the charges of the atoms of the O=C–O–C group. Clearly, **µF** must be directly related to the net charges of the oxygen atoms, since it must reflect some contribution to the electrostatic interaction between these atoms and the stationary phase. In this regard, **µF** may also be related to the atomic charge of the carbon atom of the C=O functional group, or to the difference between the atomic charges of these atoms. Hence, **µF** can be defined in different ways, and several definitions were tested in preliminary calculations.

Esters have two oxygen atoms, so two local functions can be defined, one depending on the charges and bond length of the C=O1 bond and the other on the charges and bond length of the C–O2 bond. Calculations were therefore performed with different definitions of the equivalent local dipole moment. The preliminary calculations showed that, for esters, the charge difference QO - QC does not give reasonable results, because the charges of the oxygen atoms mask the charge of the carbonyl carbon. The best choice for esters was **µF1 = d1|QO1|** and **µF2 = d2|QO2|**, where **d1** and **d2** are the calculated C1=O1 and C1–O2 bond lengths and **|QO1|** and **|QO2|** are the absolute values of the atomic charges of the oxygen atoms O1 and O2. The equivalent local dipole moment is then calculated as the magnitude of the vector sum of the two dipole moments, that is, **µF = (µF1² + µF2² + 2µF1µF2 cos θ)<sup>1/2</sup>**, where **θ** is the angle between the C=O1 and C–O2 bonds. For formates, a specific charge distribution occurs in the polar region of the molecule, and the best model for the local moment was the one that accounts for the contribution to the electrostatic interactions of the atomic charges of the oxygen atoms, the carbon atoms and the H atom of the C1O1O2CAl group of the formate molecules (CAl denotes the carbon on the alcohol side). Thus, the equivalent dipoles were built from the net charges of the HC1O1, HC1O2 and O2CAl groups of atoms. The equivalent dipoles associated with these net charges are:

**µF1** = **d1|QH + QC1 - QO1|**, **µF2** = **d2|QH + QC1 - QO2|** and **µF3** = **d3|QCAl - QO2|**, where **d1** and **d2** are the calculated C1=O1 and C1–O2 bond lengths and **d3** is the calculated CAl–O2 bond length. In a first approach, the local moments **µF2** and **µF3** are considered to be collinear, and another equivalent dipole is obtained from their difference, **µF4** = **µF2 - µF3**. The final equivalent local moment is then calculated as above, that is, **µF = (µF1² + µF4² + 2µF1µF4 cos θ)<sup>1/2</sup>**, where **θ** is the angle between the C=O1 and C–O2 bonds. Hence, for formates the charge of the hydrogen atom attached to the carbon atom of the COO functional group is also considered, as for aldehydes, because it contributes explicitly to the positive charge of the local polar region of the molecule. These best definitions of **µF** imply that the present approach to calculating the retention index captures important polar features of organic functions such as ketones, aldehydes and esters through the information carried by the local moment **µF**. In other words, according to Equation (4), the presence of a dipole moment increases the interaction between the molecules and the stationary phase, and this contribution may be screened by the charge located on the heteroatoms and on the carbon atom of the C=O group if µF > µ, or enhanced if µF < µ. For esters, the local function **µF** is smaller than the dipole moment, showing that **Aµ** receives an appreciable contribution from the atomic charges of those atoms. This confirms, for esters, the contribution of the oxygen atoms to the electrostatic interaction between the solute and the stationary phase.
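Both the ester and the formate definitions of the equivalent local moment end in the magnitude of a vector sum of two local moments separated by the angle θ. A minimal Python sketch of these definitions follows; charges, bond lengths and θ are assumed to come from the optimized AM1 geometry, units are left to the caller, and the function names are hypothetical. The difference µF4 = µF2 - µF3 is kept signed, following the collinear treatment described in the text.

```python
import math


def combine_local_moments(mu1: float, mu2: float, theta_deg: float) -> float:
    """Magnitude of the vector sum of two local moments at angle theta:
    (mu1^2 + mu2^2 + 2 mu1 mu2 cos theta)^(1/2)."""
    t = math.radians(theta_deg)
    # max() guards against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, mu1 ** 2 + mu2 ** 2 + 2.0 * mu1 * mu2 * math.cos(t)))


def mu_f_ester(d1: float, q_o1: float, d2: float, q_o2: float, theta_deg: float) -> float:
    """Esters: mu_F1 = d1 |Q_O1| (C1=O1) and mu_F2 = d2 |Q_O2| (C1-O2)."""
    return combine_local_moments(d1 * abs(q_o1), d2 * abs(q_o2), theta_deg)


def mu_f_formate(d1: float, d2: float, d3: float, q_h: float, q_c1: float,
                 q_o1: float, q_o2: float, q_cal: float, theta_deg: float) -> float:
    """Formates: local moments of the H-C1-O1, H-C1-O2 and O2-C_Al groups."""
    mu_f1 = d1 * abs(q_h + q_c1 - q_o1)
    mu_f2 = d2 * abs(q_h + q_c1 - q_o2)
    mu_f3 = d3 * abs(q_cal - q_o2)
    mu_f4 = mu_f2 - mu_f3  # mu_F2 and mu_F3 treated as collinear
    return combine_local_moments(mu_f1, mu_f4, theta_deg)
```

The result would then be used as `mu_f` in Equation (4), in the same way as the ketone and aldehyde definitions.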

Therefore, for esters the *ISET* value is calculated with Equation (5), where the *SETi* values are obtained from Equation (2) using net atomic charges from AM1 calculations. As mentioned above, in Equation (5) the *SETi* values of the atoms of the C1O1O2CAl group are multiplied by the dipolar function **Aµ**, which is taken as unity for the remaining carbon atoms of the molecule. Hence, Equation (5) is a general definition of the electrotopological index that can be applied to different organic functions, which are distinguished through appropriate definitions of the equivalent local moment **µF**. Preliminary applications of *ISET* as given by Equation (5) showed that this expression overestimates the calculated retention index of branched esters and underestimates it for methyl esters. This finding indicates the need to consider other definitions of the local moment **µF** for branched and methyl esters. However, another simple option is to take into account the steric effects in branched esters and methyl esters. The simplest way to do this is to consider the steric hindrance of the CAl carbon atom of the C1O1O2CAl group and of the carbon atom attached to the acid side of the COO functional group (here named the CAc carbon). As seen in Equation (3), the log SETj term accounts precisely for the steric effect of atom *j*. Thus, to include a steric correction (sc) in Equation (5) for branched esters, the term **sc = n log SET(CAc) + n log SET(CAl)** was added, where **n** is the number of branches of the ester. For methyl esters, on the other hand, the CAl carbon is bonded to three H atoms, and the overestimated steric effects of the **log AµSETj** terms in Equation (5) must be removed. This is easily achieved by adding a second steric correction (ssc), **ssc = -log SET(CAl)**, to Equation (5).
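The two steric corrections are additive terms on top of Equation (5). The sketch below applies them to an already computed *ISET* value; the SET values of the CAc and CAl carbons are taken as inputs, and the function name is hypothetical. Whether both corrections may apply to the same molecule (for example, a branched methyl ester) is not stated in the text, so this sketch simply adds whichever corrections are requested.

```python
import math


def corrected_iset(
    iset_eq5: float,
    set_c_ac: float,
    set_c_al: float,
    n_branches: int = 0,
    methyl_ester: bool = False,
) -> float:
    """Add the ester steric corrections to a value from Equation (5).

    sc  = n log10 SET(C_Ac) + n log10 SET(C_Al)   (branched esters, n branches)
    ssc = -log10 SET(C_Al)                        (methyl esters)
    """
    value = iset_eq5
    if n_branches > 0:
        value += n_branches * (math.log10(set_c_ac) + math.log10(set_c_al))
    if methyl_ester:
        value -= math.log10(set_c_al)
    return value
```

When the SET values are below 1 their logarithms are negative, so sc lowers the index (Equation (5) overestimates branched esters) while ssc raises it (Equation (5) underestimates methyl esters).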
This approach gave very good results, which shows that in this model the complex steric effects in branched esters can be included simply by considering the steric hindrance, expressed through the net charges (via the *SETi* values), of the two carbon atoms bound to the alcohol and acid sides of the COO functional group. The calculation of *ISET* for a large number of molecules is easily carried out with a FORTRAN code developed in our laboratory, which calculates *ISET* by reading the output data (calculated net charges, dipole moment and atomic positions) from AM1 semi-empirical calculations.
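The two steric corrections for esters can be written as small functions. This is a sketch under stated assumptions, not the authors' code: `log` is taken as base 10 (the base is not stated in this section), the SET values from Equation (2) and the uncorrected Equation (5) value `iset_eq5` are simply passed in, and all function names are hypothetical. The text treats branched esters and methyl esters as separate cases; if a molecule were flagged for both, this sketch just adds the two terms, which is our assumption.

```python
import math


def branched_ester_sc(set_cac, set_cal, n_branches):
    """sc = n*log SET(C_Ac) + n*log SET(C_Al); log assumed to be base 10."""
    return n_branches * math.log10(set_cac) + n_branches * math.log10(set_cal)


def methyl_ester_ssc(set_cal):
    """ssc = -log SET(C_Al); log assumed to be base 10."""
    return -math.log10(set_cal)


def corrected_iset(iset_eq5, set_cac, set_cal, n_branches=0, methyl_ester=False):
    """Add the steric corrections to an ISET value already computed with Equation (5).

    set_cac, set_cal: SET values (Equation 2) of the C_Ac and C_Al carbons; must be > 0.
    """
    total = iset_eq5
    if n_branches > 0:
        total += branched_ester_sc(set_cac, set_cal, n_branches)
    if methyl_ester:
        total += methyl_ester_ssc(set_cal)
    return total


# Hypothetical inputs, for illustration only.
print(corrected_iset(6.8, set_cac=1.9, set_cal=2.1, n_branches=1))
print(corrected_iset(5.4, set_cac=1.8, set_cal=2.3, methyl_ester=True))
```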

#### **3.2.3 Alcohols**


As observed for the preceding compounds, for alcohols the major effects on the charge distribution are due to the presence of the oxygen atom and occur on that atom and on its neighbors (the adjacent carbon atoms). The excess charge on these atoms leads to electrostatic interactions that are stronger than the weak dispersive interactions. It is therefore necessary to introduce an equivalent local dipole moment for each organic function that contributes to increasing the retention value. For alcohols, as for ketones, aldehydes and esters, this was achieved by multiplying the *SETi* values of the atoms belonging to the C-OH group by the function **Aµ** defined in Equation (4), with **µF** being the equivalent local dipole moment, which depends on the charges of the atoms of the C-OH group (Souza et al., 2010). **µF** must be directly related to the net atomic charge of the oxygen atom, since it has to reflect this atom's contribution to the electrostatic interaction with the stationary phase. **µF** may also be related to the atomic charge of the carbon atom of the C-OH group, or to the difference between the atomic charges of these two atoms. Hence, as with the other organic functions, **µF** can be defined in different ways, and several of these definitions can be tested in preliminary calculations. For primary alcohols, the terminal carbon atom of the C-O bond is attached to two hydrogen atoms, so the net positive charge in this polar region of the molecule must be taken as the sum of the atomic charges of the carbon and hydrogen atoms.
Thus, for primary alcohols we found that the best definition of the local moment involves the charges of all atoms at the polar head of the molecule:

$$\mu_F = d\,\left\lvert Q_C + \frac{Q_{H1} + Q_{H2}}{2} - Q_O - Q_{HO}\right\rvert,$$

where $d$ is the calculated C-O bond length and the absolute value is taken of the difference between the net atomic charge of the carbon ($Q_C$) plus the average charge of the two hydrogen atoms attached to it, $(Q_{H1} + Q_{H2})/2$, and the charges of the oxygen atom ($Q_O$) and of the hydrogen attached to it ($Q_{HO}$). For secondary, tertiary and quaternary alcohols, the best choice of local moment involves the net atomic charges of the C and O atoms only, $\mu_F = d\,\lvert Q_C - Q_O\rvert$, where $d$ is the C-O bond length and $\lvert Q_C - Q_O\rvert$ is the absolute value of the difference between the charges of the carbon atom and of the oxygen atom attached to it. These definitions of **µF** attempt to account for the contribution to the electrostatic interactions that originates in the polar region of the molecule. This again shows that Equation (4) represents a dipolar contribution to the interactions between the molecules and the stationary phase (arising from the molecular dipole moment), which is decreased by the charge of the heteroatoms (oxygen atoms) when **µF** > µ and increased when **µF** < µ. This reveals the contribution of oxygen to the electrostatic interaction between the solute and the stationary phase.
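The two alcohol definitions of **µF** can be sketched as a single function. The function name, the `alcohol_class` strings and the example charges are hypothetical; the charges and C-O bond length are assumed to come from an AM1 output, and the result stays in whatever units the inputs use.

```python
def alcohol_local_moment(d_co, q_c, q_o, alcohol_class, q_h1=0.0, q_h2=0.0, q_ho=0.0):
    """Equivalent local moment mu_F of the C-OH group (sketch).

    d_co: C-O bond length; q_c, q_o: net charges on the carbinol C and on O.
    q_h1, q_h2: charges of the two H atoms on the carbinol carbon (primary only).
    q_ho: charge of the hydroxyl H (primary only).
    """
    if alcohol_class == "primary":
        # Whole polar head: C plus the mean of its two H, minus O and the hydroxyl H.
        return d_co * abs(q_c + (q_h1 + q_h2) / 2.0 - q_o - q_ho)
    if alcohol_class in ("secondary", "tertiary", "quaternary"):
        # Only the C and O charges are used.
        return d_co * abs(q_c - q_o)
    raise ValueError(f"unknown alcohol class: {alcohol_class!r}")


# Hypothetical values, for illustration only (not AM1 results).
print(alcohol_local_moment(1.42, -0.05, -0.33, "primary", q_h1=0.07, q_h2=0.07, q_ho=0.19))
print(alcohol_local_moment(1.43, 0.02, -0.33, "secondary"))
```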

For alcohols, the *ISET* values are calculated as in Equation (5), where the *SETi* values are obtained using Equation (2) through AM1 calculations of the net atomic charges.

#### **3.3 Development of QSPR models using the** *ISET*

The molecular descriptor *ISET* was first developed for alkanes and alkenes on a low-polarity stationary phase and then extended to oxo-compounds by including the molecular dipole moment and a local dipole moment in its definition. The best simple linear regression models between the retention index and the descriptor (RI = b + a *ISET*), together with the statistical data for each class of compounds obtained in previous QSRR studies, are given in Table 4. For esters and alcohols, good correlations between the retention index and *ISET* were also obtained on stationary phases of different polarities (not included in Table 4). The statistical results obtained with *ISET* (Table 4) are better than or equivalent to those obtained by multiple linear regression using many molecular descriptors.


Table 4. Summary of the best simple linear regressions (RI$_{calc}$ = b + a *ISET*) found for different classes of compounds on low-polarity stationary phases.

| No. | Compounds | Phase | N | a | b | r² | r²CV | SD | Ref. |
|---|---|---|---|---|---|---|---|---|---|
| 01 | Alkanes and alkenes | SQ | 179 | 120.8 | -36.8 | 0.9980 | 0.9980 | 10.7 | Souza et al., 2008 |
| 02 | Aldehydes and ketones | HP-1 | 42 | 123.9 | -13.2 | 0.9993 | 0.9993 | 11.7 | Souza et al., 2009a |
| 03 | Esters | SE-30 | 100 | 115.0 | -74.7 | 0.9981 | 0.9980 | 7.6 | Souza et al., 2009b |
| 04 | Alcohols | SE-30 | 31 | 126.0 | -186.6 | 0.9990 | 0.9977 | 9.3 | Souza et al., 2010 |

As can be seen from Table 4, the QSRR models for 179 representative linear and branched alkanes and alkenes, obtained with *ISET* using the net atomic charges to calculate the Ci fragment values of *IET* more precisely, showed good statistical quality. This new descriptor *ISET* contains information on the 3D features of molecules, discriminates between geometrical isomers, such as *cis*- and *trans*-alkenes, and between conformers, and reproduces the correct elution sequence for the majority of the compounds.

The results obtained for aldehydes and ketones are similar to those reported by Ren (Ren, 2003) with multiple linear regression models for 33 aldehydes and ketones using the Xu and AI topological indices, and by Héberger and co-workers using quantum-chemical descriptors (SW and µ) (Héberger et al., 2001) and physico-chemical properties (TBp, MW, log P) (Héberger et al., 2000) for 31 and 35 compounds, respectively. For esters, the results obtained by simple linear regression with *ISET* are better than those reported by Lu et al. (Lu et al., 2006). For the SE-30 and OV-7 stationary phases, the results are also better than those of Liu et al. (Liu et al., 2007), and for more polar stationary phases the statistical parameters differ only slightly. Both of these studies used multiple linear regression (MLR) between RI and topological indices for 90 saturated esters on stationary phases of different polarities.

Several authors have developed MLR-based QSRR models to predict the RI values of saturated alcohols. For example, Guo et al. (Guo et al., 2000), using MLR analysis and an artificial neural network technique, obtained r² = 0.9982, SD = 8.21 and N = 19 for an SE-30 stationary phase. In another study, the best statistical parameters of the MLR models obtained by Farkas and Héberger (Farkas & Héberger, 2005) with four molecular descriptors were r = 0.9804, SD = 14.22, r²CV = 0.9801 and N = 44 for an OV-1 stationary phase. Our results on low-polarity stationary phases, using *ISET* as a single descriptor, are thus of comparable statistical quality to those of these studies. Furthermore, the statistical parameters of the present approach agree well with those obtained for alkanes and alkenes, aldehydes and ketones, saturated esters and alcohols using the previously developed semi-empirical topological index, *IET*. These results clearly show that *ISET* is a molecular descriptor that appropriately embodies the net atomic charges and charge distribution of molecules, since the retention index reflects the intermolecular interactions between the stationary phase and the molecules.
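Once an *ISET* value is available, the Table 4 regressions give a retention index directly. The sketch below only stores the published coefficients; the dictionary keys, function names and the example *ISET* value are hypothetical, and a prediction is only meaningful for the compound class and stationary phase of the corresponding row (and, presumably, within the *ISET* range of its training set).

```python
# (phase, a, b) for RI_calc = b + a * ISET, copied from Table 4.
TABLE_4 = {
    "alkanes and alkenes": ("SQ", 120.8, -36.8),
    "aldehydes and ketones": ("HP-1", 123.9, -13.2),
    "esters": ("SE-30", 115.0, -74.7),
    "alcohols": ("SE-30", 126.0, -186.6),
}


def predict_ri(compound_class: str, iset: float) -> float:
    """Retention index predicted by the Table 4 line for one compound class."""
    _phase, a, b = TABLE_4[compound_class]
    return b + a * iset


def iset_for_ri(compound_class: str, ri: float) -> float:
    """Invert the same straight line: the ISET value that corresponds to a given RI."""
    _phase, a, b = TABLE_4[compound_class]
    return (ri - b) / a


# Hypothetical ISET value, for illustration only.
print(predict_ri("alcohols", 7.0))
```

Keeping the stationary phase next to the coefficients is a reminder that each line is phase-specific; the esters and alcohols models on more polar phases mentioned in the text have different coefficients that are not listed here.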

That properties determined by intermolecular forces can be adequately modeled by the *ISET* descriptor is readily seen in its relationship with the boiling point (BP). For alcohols, a good correlation was obtained with a simple linear model (BP = b + a *ISET*), as shown in Table 5. The QSPR model obtained for the experimental BP of 134 compounds showed high values of the coefficient of determination and of the cross-validation coefficient, indicating good predictive capacity.


Table 5. The coefficients and statistical parameters for linear regression between experimental boiling point and *ISET*.

This model explains 98.20% of the variance in the experimental values, and most of these compounds (N = 101) were not included in the initial set used to build *ISET*, which shows the external stability of the model. These results are similar to those obtained with *IET* for 146 aliphatic alcohols and are comparable with those obtained by Ren (Ren, 2002b) using MLR models with five descriptors for 138 compounds.

The octanol-water partition coefficient (log P), a measure of hydrophobicity, is widely used in Quantitative Structure-Activity Relationship (QSAR) models for predicting the pharmaceutical properties of molecules. Since the partition coefficient is determined by intermolecular forces, it is expected to be describable by a molecular descriptor such as *ISET*. The statistical results of the simple linear regression between experimental log P values and *ISET* are shown in Table 6 for each class of compounds.


Table 6. The coefficients and statistical parameters for linear regression between experimental log P values and *ISET*.

The results in Table 6 indicate that the partition coefficients calculated with *ISET* agree well with the experimental values. The QSPR models obtained with *ISET* showed high correlation coefficients (r > 0.99), and leave-one-out cross-validation demonstrates that the final models are statistically significant and reliable (r²CV > 0.98). As can be observed, the model explains more than 99% of the variance in the experimental values for this set of compounds. Among the various classes of compounds, the best results obtained with *ISET* are for hydrocarbons (Table 6), which is consistent with the fact that the model was initially developed for this class of organic compounds. As can also be seen in Table 6, the lowest standard deviation was obtained for aldehydes, whereas for alcohols the correlation was stronger. The range of standard deviations obtained confirms the applicability of the present approach to different classes of organic compounds. For alcohols, the earlier approach of Duchowicz et al. (Duchowicz et al., 2004), based on flexible topological descriptors and on the optimization of correlation weights of local graph invariants, was applied to model the octanol/water partition coefficient of a representative set of 62 alcohols, giving a satisfactory prediction with a standard deviation of 0.22. More recently, Liu et al. (Liu et al., 2009) carried out a QSPR study to predict log P for 58 aliphatic alcohols using novel molecular indices based on graph theory, dividing the molecular structure into substructures; their models showed good stability and robustness, and the values predicted by multiple linear regression were close to the experimental values (r = 0.9959, SD = 0.15).
The above results show the reliability of the present model, which is based on the semi-empirical calculation of atomic charges and local dipole moments and uses only one descriptor, *ISET*. This new approach to polar molecules, with the remodeled *ISET* index including the contribution of the molecular dipole moment and an effective local dipole moment associated with the net charges of the atoms of the carbonyl group, opens new possibilities for studies of the chromatographic and other properties of different organic functions.
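The statistics used throughout this section (r², leave-one-out r²CV and SD) can be computed for any single-descriptor model with a short script. This sketch is ours, not the authors': it assumes that SD is the standard error of estimate with n - 2 degrees of freedom and that r²CV = 1 - PRESS/SST, which are common QSRR conventions but are not defined in this section, and the (ISET, RI) pairs in the usage lines are made up.

```python
import math
from typing import Dict, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = b + a*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def regression_stats(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """a, b, r2, leave-one-out r2_cv and SD for a one-descriptor model y = b + a*x."""
    n = len(x)
    a, b = fit_line(x, y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - (b + a * xi)) ** 2 for xi, yi in zip(x, y))
    press = 0.0  # predicted residual sum of squares
    for i in range(n):
        # Refit without compound i, then predict compound i.
        x_train = [v for j, v in enumerate(x) if j != i]
        y_train = [v for j, v in enumerate(y) if j != i]
        a_i, b_i = fit_line(x_train, y_train)
        press += (y[i] - (b_i + a_i * x[i])) ** 2
    return {
        "a": a,
        "b": b,
        "r2": 1.0 - sse / sst,
        "r2_cv": 1.0 - press / sst,
        "sd": math.sqrt(sse / (n - 2)),  # standard error of estimate, n - 2 d.o.f.
    }


# Made-up (ISET, RI) pairs, for illustration only; not data from this chapter.
iset = [3.0, 4.1, 5.0, 6.2, 7.1, 8.0]
ri = [330.0, 470.0, 590.0, 720.0, 840.0, 930.0]
for key, value in regression_stats(iset, ri).items():
    print(f"{key} = {value:.4f}")
```

Because each left-out residual is at least as large as the corresponding fitted residual, r²CV can never exceed r² for this model; a small gap between the two, as in Table 4, indicates that no single compound dominates the fit.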

#### **4. Conclusions**

It is known that chromatographic separation results from the forces that operate between solute molecules and the molecules of the stationary phase. These forces are called van der Waals forces, since van der Waals recognized them as the cause of the non-ideal behavior of real gases. Intermolecular forces are usually classified into two distinct categories: i) directional, induction and dispersion forces, which are non-specific; and ii) hydrogen bonding and charge-transfer (electron-pair donor-acceptor) forces, which are specific.

In the development of the semi-empirical topological index (*IET*), the retention of alkanes was considered to be due to the number of carbon atoms and to the interaction of each specific carbon atom with the stationary phase, regarded as non-polar; this interaction is determined by the electrical characteristics of the atom and by the steric hindrance of the other carbon atoms attached to it. In this case only dispersion forces act: the continuous motion of the electrons produces, at any instant, a small dipole moment that fluctuates and polarizes the electron systems of neighboring atoms or molecules. For alkenes, some carbon atoms of greater electronegativity give the molecules a dipole moment, so that electrostatic forces play an important role in addition to dispersion forces. In this method, however, the behavior of this kind of carbon atom is determined from experimental data and listed in specific tables. Because these values were obtained from experimental data, they encode the real physical interaction forces. For oxo-compounds, the presence of atoms whose electronegativity differs from that of carbon introduces a dipole moment in the functional group and changes the dipole moment of the whole molecule. These factors were taken into account to obtain the values for the different functional groups, which were thus able to encode the physical forces involved in chromatographic separation.

The new semi-empirical electrotopological index (*ISET*) demonstrated that the values for non-tetrahedral carbon atoms and for functional groups (taking into account the new local dipole created by the heteroatom) can be calculated from net atomic charges obtained from quantum-chemical calculations. In the case of esters, the major effects are due to the presence of the two oxygen atoms and their adjacent carbon atoms. As for aldehydes and ketones, it was verified that introducing the dipole moment of the molecule is not sufficient to explain the chromatographic behavior. It was therefore necessary to introduce an equivalent local dipole moment of the ester group, which contributes to increasing the retention value. For esters, two local functions must be considered, according to the charges and bond lengths of the C=O and C-O bonds. The semi-empirical electrotopological index was thus developed as a refinement of the previously developed semi-empirical topological index, unifying the quantum-chemical and topological methods to provide a three-dimensional picture of the atoms in the molecule.

The *IET* and *ISET* were developed to predict chromatographic retention indices and other physico-chemical properties through quantitative structure-retention (property) relationships (QSRR/QSPR). The efficiency and applicability of these descriptors were demonstrated by the good statistical quality and high internal stability of the models obtained for the different classes of compounds studied.

#### **5. Acknowledgement**

The authors acknowledge the support from CNPq and CRQ-XIII (Brazil) for this research.

#### **6. References**

Christian, R. (1990). Solvents and Solvent Effects in Organic Chemistry, 2nd ed., VCH, Germany.

Duchowicz, P. R.; Castro, E. A.; Toropov, A. A.; Nesterova, A. I. & Nabiev, O. M. (2004). QSPR modeling of the octanol/water partition coefficient of alcohols by means of optimization of correlation weights of local graph invariants. *Journal of the Argentine Chemical Society*, Vol.92, pp. 29–42.

Estrada, E. & Gutierrez, Y. (1999). Modeling chromatographic parameters by novel graph theoretical sub-structural approach. *Journal of Chromatography A*, Vol.858, pp. 187–199.

Estrada, E. (2001). Generalization of topological indices. *Chemical Physics Letters*, Vol.336, pp. 248–252.

Estrada, E. (2001). Recent Advances on the Role of Topological Indices in Drug Discovery Research. *Current Medicinal Chemistry*, Vol.8, pp. 1573–1588.

Estrada, E. (2002). The Structural Interpretation of the Randić Index. *Internet Electronic Journal of Molecular Design*, Vol.1, pp. 360–366, http://www.biochempress.com.

Farkas, O. & Héberger, K. (2005). Comparison of ridge regression, partial least-squares, pairwise correlation, forward- and best subset selection methods for prediction of retention indices for aliphatic alcohols. *Journal of Chemical Information and Modeling*, Vol.45, pp. 339–346.

Fritz, D. F.; Sahil, A. & Kováts, E. (1979). Study of the adsorption effects on the surface of poly(ethylene glycol)-coated column packings. *Journal of Chromatography*, Vol.186, pp. 63–80.

Galvez, J.; Garcia, R.; Salabert, M. T. & Soler, R. (1994). *Journal of Chemical Information and Computer Sciences*, Vol.34, pp. 520–525.

García-Domenech, R.; Catalá-Gregori, A.; Calabuig, C.; Antón-Fos, G.; del Castillo, L. & Gálvez, J. (2002). Predicting Antifungal Activity: A Computational Screening Using Topological Descriptors. *Internet Electronic Journal of Molecular Design*, Vol.1, pp. 339–350, http://www.biochempress.com.

Guo, W.; Lu, Y. & Zheng, X. M. (2000). The predicting study for chromatographic retention index of saturated alcohols by MLR and ANN. *Talanta*, Vol.51, pp. 479–488.

Hall, L. H.; Mohney, B. & Kier, L. B. (1991). *Journal of Chemical Information and Computer Sciences*, Vol.31, pp. 76–82.

Hansen, P. J. & Jurs, P. C. (1988). Chemical applications of graph theory. Part I. Fundamentals and topological indices. *Journal of Chemical Education*, Vol.65, pp. 574–580.

Heinzen, V. E. F. & Yunes, R. A. (1993). *Journal of Chromatography A*, Vol.654, pp. 83–89.

Heinzen, V. E. F. & Yunes, R. A. (1996). *Journal of Chromatography A*, Vol.719, pp. 462–467.

Heinzen, V. E. F.; Cechinel Filho, V. & Yunes, R. A. (1999a). *Il Farmaco*, Vol.54, pp. 125–129.

Heinzen, V. E. F.; Soares, M. F. & Yunes, R. A. (1999b). Semi-empirical topological method for prediction of the chromatographic retention of cis- and trans-alkene isomers and alkanes. *Journal of Chromatography A*, Vol.849, pp. 495–506.

Héberger, K.; Görgényi, M. & Sjöström, M. (2000). Partial least squares modeling of retention data of oxo compounds in gas chromatography. *Chromatographia*, Vol.51, pp. 595–600.

Katritzky, A. R.; Chen, K.; Maran, U. & Carlson, D. A. (2000). QSRR correlation and predictions of GC retention indexes for methyl-branched hydrocarbons produced by insects. *Analytical Chemistry*, Vol.72, pp. 101–109.

Kier, L. B. & Hall, L. H. (1976). Molecular connectivity chemistry and drug research, Academic Press, New York, USA.

Kier, L. B. & Hall, L. H. (1986). Molecular connectivity in structure activity studies, Research Studies Press, Letchworth, UK.

Kier, L. B. & Hall, L. H. (1990). An Electrotopological State Index for Atoms in Molecules. *Pharmaceutical Research*, Vol.7, pp. 801–807.

Körtvélyesi, T.; Görgényi, M. & Héberger, K. (2001). Correlation between retention indices and quantum-chemical descriptors of ketones and aldehydes on stationary phases of different polarity. *Analytica Chimica Acta*, Vol.428, pp. 73–82.

Kováts, E. (1968). Zu Fragen der Polarität. Die Methode der Linearkombination der Wechselwirkungskräfte (LKWW). *Chimia*, Vol.22, pp. 459–462.

Liu, S.-S.; Liu, H.-L.; Shi, Y.-Y. & Wang, L.-S. (2002). QSAR of Cyclooxygenase-2 (COX-2) Inhibition by 2,3-Diarylcyclopentenones Based on MEDV-13. *Internet Electronic Journal of Molecular Design*, Vol.1, pp. 310–318, http://www.biochempress.com.

Liu, F.; Liang, Y.; Cao, C. & Zhou, N. (2007). QSPR study of GC retention indices for saturated esters on seven stationary phases based on novel topological indices. *Talanta*, Vol.72, pp. 1307–1315.

Liu, F.; Cao, C. & Cheng, B. (2011). A quantitative structure-property relationship (QSPR) study of aliphatic alcohols by the method of dividing the molecular structure into substructure. *International Journal of Molecular Sciences*, Vol.12, pp. 2448–2462.

Lu, C.; Guo, W. & Yin, C. (2006). Quantitative structure-retention relationship study of gas chromatographic retention indices of saturated esters on different stationary phases using novel topological indices. *Analytica Chimica Acta*, Vol.561, pp. 96–102.

Marino, D. J. G.; Peruzzo, P. J.; Castro, E. A. & Toropov, A. A. (2002). QSAR Carcinogenic Study of Methylated Polycyclic Aromatic Hydrocarbons Based on Topological Descriptors Derived from Distance Matrices and Correlation Weights of Local Graph Invariants. *Internet Electronic Journal of Molecular Design*, Vol.1, pp. 115–133, http://www.biochempress.com.

Markuszewski, M. & Kaliszan, R. (2002). Quantitative structure-retention relationships in affinity high-performance liquid chromatography. *Journal of Chromatography B*, Vol.768, pp. 55–66.

Peng, C. T.; Ding, S. F.; Hua, R. L. & Yang, Z. C. (1988). Prediction of retention indexes I. Structure-retention index relationship on apolar columns. *Journal of Chromatography*, Vol.436, pp. 137–172.

Peng, C. T. (2000). Prediction of retention indices V. Influence of electronic effects and column polarity on retention index. *Journal of Chromatography A*, Vol.903, pp. 117–143.

Pompe, M. & Novic, M. (1999). Prediction of gas-chromatographic retention indices using topological descriptors. *Journal of Chemical Information and Computer Sciences*, Vol.39, pp. 59–67.

