*Sino-Nasal and Olfactory System Disorders*

retention property. The values of the 28 descriptors are shown in Table S1 (in Supplementary Material).

#### *3.1.2 Stepwise multiple linear regression (MLR)*

The stepwise multiple linear regression (MLR) procedure, based on forward selection and backward elimination (with a critical probability of *P* < 0.05 for all descriptors and for the complete model), was employed to determine the best regression model.

The 2D-QSRR model built using stepwise MLR is represented by the following equation:

$$\mathrm{Log(LRI)} = 2.935 + 1.682 \times 10^{-3} \times \mathrm{V} \tag{1}$$

N = 24; r = 0.990; r<sup>2</sup> = 0.980; RMSE = 0.008; F = 1085.981; *P* < 0.0001.

In this equation, V is the Connolly solvent-excluded volume, N is the number of compounds, r is the correlation coefficient, r<sup>2</sup> is the coefficient of determination, RMSE is the root mean square of the errors, F is the Fisher's criterion, and *P* is the significance level.

The correlation coefficient r is high and the RMSE is low, which indicates that the model is reliable. A *P* value much smaller than 0.05 indicates that the regression equation is statistically significant; thus, we can conclude with confidence that the model provides a significant amount of information [29, 30].

The predicted Log(LRI) values calculated from Eq. (1) are given in **Table 4** in comparison with the observed values. The correlation between the predicted and observed Log(LRI) is shown in **Figure 2**.

**Figure 2.** *Correlations of observed and predicted Log(LRI) with MLR stepwise (training set in blue; test set in red).*

#### *3.1.3 Internal validation (cross-validation)*

The 2D-QSRR model expressed by the stepwise MLR equation is validated by its appreciable value of r<sup>2</sup><sub>CV</sub> obtained using the leave-one-out (LOO) procedure. A value of r<sup>2</sup><sub>CV</sub> greater than 0.5 is the basic condition for qualifying a

#### *3.1.4 External validation*

The model built from the 24 training set alkylated phenols is used to predict the retention property values (Log(LRI)) of the remaining five molecules (test set). The results obtained with the stepwise MLR model are sufficient to establish its performance, as confirmed by the test on these five compounds (r<sub>test</sub> = 0.938; r<sup>2</sup><sub>test</sub> = 0.880).
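As a minimal sketch of this external check, the snippet below recomputes the test set correlation from the five rows flagged as test set in **Table 4** (observed values and stepwise MLR predictions, rounded to three decimals). The function name `pearson_r` is illustrative only, and because the inputs are rounded, the result can only approximate the reported r<sub>test</sub>.

```python
import math

# Observed and stepwise-MLR-predicted Log(LRI) for the five test-set rows of Table 4.
observed = [3.067, 3.165, 3.150, 3.146, 3.191]
predicted = [3.076, 3.162, 3.152, 3.161, 3.161]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_test = pearson_r(observed, predicted)
print(f"r_test = {r_test:.3f}, r2_test = {r_test ** 2:.3f}")
```

With these rounded inputs the computed value falls close to the reported statistic; small differences are expected from rounding alone.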

#### *3.1.5 Domain of applicability*

Evaluating the applicability domain of the 2D-QSRR model is an important step in establishing that the model makes reliable predictions within the chemical space for which it was developed [31]. In this chapter, we used the leverage approach [20]. The leverage h<sub>i</sub> of a given chemical compound is defined as follows:

$$h_i = \mathbf{x}_i^{\mathrm{T}} (\mathbf{X}^{\mathrm{T}} \mathbf{X})^{-1} \mathbf{x}_i \quad (i = 1, \dots, n) \tag{2}$$

where x<sub>i</sub> is the descriptor row vector of the query compound and X is the descriptor matrix of the training set compounds used to develop the model. As a prediction tool, the warning leverage h\* is defined as follows:

$$h^{*} = 3(P + 1)/n \tag{3}$$

**Figure 3.** *Williams plot to evaluate the applicability domain of stepwise MLR model.*


**Table 3.**

where n is the number of training compounds and *P* is the number of descriptors in the model.
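The two definitions above can be sketched numerically. This is an illustrative example, not the authors' implementation: the descriptor values are hypothetical, the function names are invented for the example, and prepending a column of ones to X (so that the intercept is part of the design matrix) is an assumption that the chapter does not spell out.

```python
import numpy as np


def leverages(train_descriptors, query_descriptors=None):
    """Leverage h_i = x_i^T (X^T X)^-1 x_i for each query row (Eq. 2)."""
    X = np.asarray(train_descriptors, dtype=float)
    X = X.reshape(len(X), -1)
    # Assumption: include the intercept as a leading column of ones.
    X = np.column_stack([np.ones(len(X)), X])
    if query_descriptors is None:
        Q = X  # leverages of the training compounds themselves
    else:
        Q = np.asarray(query_descriptors, dtype=float)
        Q = Q.reshape(len(Q), -1)
        Q = np.column_stack([np.ones(len(Q)), Q])
    xtx_inv = np.linalg.inv(X.T @ X)
    # Row-wise quadratic form q_i^T (X^T X)^-1 q_i.
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)


def warning_leverage(n_train, n_descriptors):
    """Warning leverage h* = 3(P + 1)/n (Eq. 3)."""
    return 3 * (n_descriptors + 1) / n_train


# Hypothetical single-descriptor values for 24 training compounds.
volumes = np.linspace(80.0, 230.0, 24)
h = leverages(volumes)
h_star = warning_leverage(n_train=24, n_descriptors=1)
print(h_star, np.flatnonzero(h > h_star) + 1)
```

For n = 24 training compounds and P = 1 descriptor, Eq. (3) gives h\* = 3 × 2 / 24 = 0.25, which is the warning limit quoted for the Williams plot.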


*2D- and 3D-QSRR Studies of Linear Retention Indices for Volatile Alkylated Phenols*


*DOI: http://dx.doi.org/10.5772/intechopen.89576*


From the Williams plot (**Figure 3**), all the compounds in the data set lie within the applicability domain of the model (the warning leverage limit is 0.250) except one training compound (no. 18), whose leverage value is greater than the warning h\* value and which could therefore be a high-leverage compound influencing the performance of the model. However, its standardized residual is very low and within the established limit [32]. As a result, this compound can be considered influential in fitting the model but not necessarily an outlier to be deleted from the training dataset; thus, the model can be applied with confidence within the defined applicability domain.

For all the compounds in the training and test sets, the standardized residuals are smaller than three standard deviation units (3δ) except for one test compound (no. 29). Thus, compound no. 29 can be regarded as an outlier. Because this compound belongs to the test set, there is no need to remove it from the data set.

Therefore, the prediction of the linear retention index values (Log(LRI)) by the developed stepwise MLR model is reliable.
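The residual screen described above can be sketched as follows. The observed and stepwise MLR values are copied from **Table 4** in row order, and rows are identified by their position in that table. Dividing each residual by the reported RMSE of 0.008 is an assumption made for illustration; the chapter does not state how the residuals were standardized, and `flag_outliers` is a hypothetical helper name.

```python
# Observed and stepwise-MLR-calculated Log(LRI), in the row order of Table 4.
observed = [
    3.067, 3.095, 3.104, 3.103, 3.124, 3.137, 3.136, 3.151, 3.165, 3.165,
    3.150, 3.152, 3.152, 3.196, 3.225, 3.251, 3.276, 3.300, 3.142, 3.128,
    3.128, 3.114, 3.148, 3.136, 3.171, 3.158, 3.168, 3.146, 3.191,
]
calculated = [
    3.076, 3.104, 3.105, 3.105, 3.128, 3.130, 3.130, 3.161, 3.162, 3.162,
    3.152, 3.155, 3.155, 3.191, 3.220, 3.249, 3.277, 3.306, 3.133, 3.133,
    3.133, 3.132, 3.133, 3.134, 3.162, 3.161, 3.162, 3.161, 3.161,
]
RMSE = 0.008  # reported for the stepwise MLR model; used here as the scale


def flag_outliers(obs, calc, scale, limit=3.0):
    """Return 1-based row positions whose |residual / scale| exceeds the limit."""
    flagged = []
    for position, (o, c) in enumerate(zip(obs, calc), start=1):
        if abs((o - c) / scale) > limit:
            flagged.append(position)
    return flagged


print(flag_outliers(observed, calculated, RMSE))
```

Under this assumption only the last row of the table, the one with the largest MLR residual, exceeds the three-unit limit, in line with the single test set outlier reported in the text.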

#### **3.2 3D-QSRR study**

| No | Log(LRI) (obs.) | Log(LRI) (calc.), 2D-QSRR MLR | Residual | Log(LRI) (calc.), 3D-QSRR CoMFA | Residual |
|---|---|---|---|---|---|
| 1<sup>a</sup> | 3.067 | 3.076 | 0.009 | 3.062 | 0.005 |
| 2 | 3.095 | 3.104 | 0.010 | 3.094 | 0.001 |
| 3 | 3.104 | 3.105 | 0.001 | 3.103 | 0.001 |
| 4 | 3.103 | 3.105 | 0.001 | 3.100 | 0.003 |
| 5 | 3.124 | 3.128 | 0.004 | 3.124 | 0.000 |
| 6 | 3.137 | 3.130 | 0.007 | 3.141 | 0.004 |
| 7 | 3.136 | 3.130 | 0.006 | 3.137 | 0.001 |
| 8 | 3.151 | 3.161 | 0.010 | 3.149 | 0.002 |
| 9<sup>a</sup> | 3.165 | 3.162 | 0.003 | 3.150 | 0.015 |
| 10 | 3.165 | 3.162 | 0.003 | 3.163 | 0.002 |
| 11<sup>a</sup> | 3.150 | 3.152 | 0.001 | 3.150 | 0.000 |
| 12 | 3.152 | 3.155 | 0.003 | 3.147 | 0.005 |
| 13 | 3.152 | 3.155 | 0.003 | 3.150 | 0.002 |
| 14 | 3.196 | 3.191 | 0.005 | 3.196 | 0.000 |
| 15 | 3.225 | 3.220 | 0.005 | 3.226 | 0.001 |
| 16 | 3.251 | 3.249 | 0.003 | 3.250 | 0.001 |
| 17 | 3.276 | 3.277 | 0.002 | 3.280 | 0.004 |
| 18 | 3.300 | 3.306 | 0.007 | 3.298 | 0.002 |
| 19 | 3.142 | 3.133 | 0.009 | 3.137 | 0.005 |
| 20 | 3.128 | 3.133 | 0.005 | 3.132 | 0.004 |
| 21 | 3.128 | 3.133 | 0.006 | 3.127 | 0.001 |
| 22 | 3.114 | 3.132 | 0.018 | 3.128 | 0.014 |
| 23 | 3.148 | 3.133 | 0.015 | 3.144 | 0.004 |
| 24 | 3.136 | 3.134 | 0.003 | 3.136 | 0.000 |
| 25 | 3.171 | 3.162 | 0.010 | 3.168 | 0.003 |
| 26 | 3.158 | 3.161 | 0.002 | 3.170 | 0.012 |
| 27 | 3.168 | 3.162 | 0.006 | 3.167 | 0.001 |
| 28<sup>a</sup> | 3.146 | 3.161 | 0.015 | 3.164 | 0.018 |
| 29<sup>a</sup> | 3.191 | 3.161 | 0.029 | 3.175 | 0.016 |

*LRI: linear retention indices.*

*<sup>a</sup> Test set.*

#### **Table 4.**

*Actual and predicted Log(LRI) along with residual of training and test sets using stepwise MLR and CoMFA models.*

### *3.2.1 Molecular alignment*

All other compounds were aligned on the basis of the common structure (compound no. 1). The alignment of the training and test set compounds, obtained using the Distill module, is shown in **Figure 4**.

#### *3.2.2 CoMFA result*

The 3D-QSRR model was obtained from the CoMFA analysis, and its statistical parameters are listed in **Table 3**. The Log(LRI) values predicted by the CoMFA model and the observed values are given in **Table 4**. The correlations between the predicted and observed Log(LRI) values are illustrated in **Figure 5**.

Cross-validation is used as an internal test of the quality of the PLS models. To evaluate the predictive power of the QSRR model (external test), the Log(LRI) values of the remaining five molecules (test set) are predicted by the CoMFA model constructed with the 24 training set compounds (**Table 3**).
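The leave-one-out statistic used as the internal test can be illustrated with a short sketch. It swaps in an ordinary one-descriptor least-squares fit (NumPy `polyfit`) for the PLS regression of CoMFA and runs on synthetic data, so it only shows how a cross-validated r<sup>2</sup> is accumulated. The definition used, 1 − PRESS/SS<sub>total</sub>, is the common one and is an assumption here, since the chapter does not state its formula; `loo_q2` is a hypothetical name.

```python
import numpy as np


def loo_q2(x, y):
    """Leave-one-out cross-validated r^2 (1 - PRESS / SS_total) for a straight-line fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        # Refit the line without compound i, then predict compound i.
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_total


# Synthetic data: 24 hypothetical descriptor values with small random noise.
rng = np.random.default_rng(0)
x_demo = np.linspace(80.0, 230.0, 24)
y_exact = 2.935 + 1.682e-3 * x_demo
y_noisy = y_exact + rng.normal(0.0, 0.008, size=x_demo.size)

q2_exact = loo_q2(x_demo, y_exact)
q2_noisy = loo_q2(x_demo, y_noisy)
print(round(q2_exact, 6), round(q2_noisy, 3))
```

A noise-free line gives a value of essentially 1, while added noise lowers it; the 0.5 threshold mentioned for the 2D model applies to the same kind of statistic.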

In **Table 3**, r<sup>2</sup><sub>CV</sub> is the square of the LOO cross-validation (CV) coefficient; r<sup>2</sup> is the square of the non-CV coefficient; SE is the standard error of prediction; F is the F-test value; N is the optimum number of components; and r<sup>2</sup><sub>test</sub> is the external validation correlation coefficient for the test set compounds.

The 3D-QSRR model gave good statistical results for the CoMFA model in terms of the r<sup>2</sup> value (r<sup>2</sup> = 0.998). This approach has good predictive capability and gives good results (r<sup>2</sup><sub>CV</sub> = 0.956). The model was able to establish a satisfactory relationship between the molecular descriptors and the linear retention indices of the studied compounds. The results obtained by the CoMFA analysis are sufficient to establish the performance of the model, as confirmed by the test on the five compounds (**Table 3**).

**Figure 4.** *3D-QSRR structure superposition and alignment of training set using molecule no. 18 as a template.*

#### **Figure 5.**

*Correlations of observed and predicted Log(LRI) derived from CoMFA model (training set in blue; test set in red).*

### **3.3 Comparison between 2D- and 3D-QSRR results and design of novel alkylated phenols**

To compare the stepwise MLR and CoMFA models, **Table 5** lists the main statistical indicators for the 2D- and 3D-QSRR models.

A comparison of the quality of the stepwise MLR and CoMFA models (**Table 5**) shows that both approaches have good predictive capability and give good results. Both models were able to establish a suitable relationship between the chemical descriptors and the linear retention indices of the studied molecules.

Multidimensional-QSRR correlates the retention property with the physicochemical and structural descriptors of a series of molecules. It is routinely used to predict the retention of new molecules and to propose molecules with preferred properties. The constructed models can be used to design new alkylated phenols with higher or lower property values (Log(LRI)).

In this way, we can design new compounds by adding suitable substituents and calculate their property using the stepwise MLR equation, which indicates a positive correlation between Log(LRI) and the Connolly solvent-excluded volume (V).


The results show that, to increase the retention property of alkylated phenols, the Connolly solvent-excluded volume (V) of these molecules should be increased. Conversely, to decrease the property, V should be decreased by choosing suitable substituents, and the resulting property can be calculated using the regression equation. This study is a first step toward coding a particular odor of this group of molecules; it is to be followed by a molecular docking study aimed at understanding the mechanism by which this type of molecule activates the olfactory receptors present in the nasal cavity.
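As a small worked illustration of this design logic, the sketch below evaluates the stepwise MLR regression (Log(LRI) = 2.935 + 1.682 × 10<sup>−3</sup> × V) for a few hypothetical V values and inverts it for a target Log(LRI). The function names and the V values are assumptions made for the example, and predictions are only meaningful inside the applicability domain of the model.

```python
INTERCEPT = 2.935
SLOPE = 1.682e-3  # change in Log(LRI) per unit of Connolly solvent-excluded volume V


def predict_log_lri(volume):
    """Log(LRI) predicted by the stepwise MLR equation for a given V."""
    return INTERCEPT + SLOPE * volume


def volume_for_target(log_lri):
    """V that the regression associates with a target Log(LRI)."""
    return (log_lri - INTERCEPT) / SLOPE


# Hypothetical V values for candidate structures.
for volume in (100.0, 150.0, 200.0):
    print(volume, round(predict_log_lri(volume), 3))

# V required by the regression to reach a target Log(LRI) of 3.300.
print(round(volume_for_target(3.300), 1))
```

Because the slope is positive, a larger substituent (larger V) always raises the predicted Log(LRI), which is the trend described in the text.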

We can also use the results of the 3D-QSRR study to design new alkylated phenols with higher or lower retention property values (Log(LRI)). The CoMFA contour plots identify the molecular fragments, functional groups, and physicochemical properties that correlate strongly with the linear retention indices of this series. The CoMFA steric and electrostatic contours are shown in **Figure 6**.

The steric interaction is represented by green and yellow contours, while the electrostatic interaction is denoted by red and blue contours. The green region around positions 2, 3, 4, 5, and 6 (the carbon bearing the -OH group is counted as position 1) (**Figure 6a**) indicates that bulky groups are favored and might increase the property. This explains why the property of the alkylated phenols bearing a group bigger than an Et group (compounds 8 to 18) is higher than that of the other compounds. It also explains why, among the alkylated phenols bearing a group smaller than a Pr group, the property of the dimethylated phenols is higher than that of the monoalkylated phenols, and the property of the trimethylated phenols is higher than that of both the dimethylated and monoalkylated phenols. The largest green region is observed around position 4, suggesting that groups with steric tolerance are required at this position to reach the green area and thus increase the property; this can further explain why compounds 14, 15, 16, 17, and 18 have a higher property than all other compounds.

| | 2D-QSRR | 3D-QSRR |
|---|---|---|
| Method | Stepwise MLR | PLS (CoMFA) |
| r | 0.990 | 0.999 |
| r<sup>2</sup> | 0.980 | 0.998 |
| r<sub>CV</sub> | 0.988 | 0.979 |
| r<sup>2</sup><sub>CV</sub> | 0.977 | 0.959 |
| r<sub>test</sub> | 0.938 | 0.955 |
| r<sup>2</sup><sub>test</sub> | 0.880 | 0.913 |

#### **Table 5.**

*Statistical parameters for the stepwise MLR and PLS models using multidimensional-QSRR analyses.*

#### **Figure 6.**

*(a) Std\* coeff. contour maps of CoMFA analysis with 2 Å grid spacing in combination with compound 18. Steric fields: green contours (80% contribution) indicate regions where bulky groups increase property, while yellow contours (20% contribution) indicate regions where bulky groups decrease property. (b) Std\* coeff. contour maps of CoMFA analysis with 2 Å grid spacing in combination with compound 18. Electrostatic fields: blue contours (80% contribution) indicate regions where groups with positive charges increase property, while red contours (20% contribution) indicate regions where groups with negative charges increase property.*

The CoMFA electrostatic contour plot is displayed in **Figure 6b**. A blue contour indicates that substituents should be electron deficient, and a red contour indicates that substituents should be electron rich. The blue contour near positions 2, 3, 4, and 5 (**Figure 6b**) indicates that electron-donating substituents (such as alkyl groups) are beneficial for the property in this area. The electrostatic contour map also displays a region of red contours next to positions 1 and 6, indicating that groups with negative charges may increase the property.

All these findings may be used to design improved compounds with higher or lower retention property, as observed in the CoMFA maps, by adding suitable substituents.
