Chemometrics: Theory and Application 123

122 Multivariate Analysis in Management, Engineering and the Sciences

Thus, chemometrics has proven to be a broad discipline that can be used in several areas of knowledge.

#### **2. Pattern recognition**

In analytical chemistry, once a data set is available, it is important to find similarities and differences between the samples based on their measurements. The methods used for this must be chosen according to the information available about the samples. They can be unsupervised (HCA and PCA) or supervised (KNN).

#### **2.1. Unsupervised methods**

This group comprises two methods, Hierarchical Cluster Analysis (HCA) and Principal Component Analysis (PCA). Their goal is to evaluate whether there is any clustering in the data set without using class information about the samples.

**Figure 2.** Clustering by PCA

#### *2.1.1. Hierarchical Cluster Analysis (HCA)*

Hierarchical Cluster Analysis is a technique that evaluates the distances between samples and groups them in a plot called a dendrogram. These distances can be calculated with different metrics, for example the Euclidean, Mahalanobis or Manhattan distance, given by Equations 1, 2 and 3, respectively:

$$\text{Distance} = \sqrt{(X\_1 - Y\_1)^2 + (X\_2 - Y\_2)^2 + \dots + (X\_n - Y\_n)^2} \tag{1}$$

where *Xn* and *Yn* are the coordinates of samples X and Y in the *n*th dimension of the row space.

$$\text{Distance} = \sqrt{(\mathbf{X}\_i - \mathbf{Y}\_j)^{T}\mathbf{C}^{-1}(\mathbf{X}\_i - \mathbf{Y}\_j)} \tag{2}$$

where **X***i* and **Y***j* are the column vectors of objects *i* and *j*, respectively, and **C** is the covariance matrix.

$$\text{Distance} = \sum\_{l=1}^{p} |X\_l - Y\_l| \tag{3}$$

where *Xl* and *Yl* are the *l*th coordinates of the vectors **X** and **Y**, and *p* is the number of variables.
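The three metrics in Equations 1 to 3 can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' implementation. The function names and the small data matrix are hypothetical, and the covariance matrix **C** for Equation 2 is assumed to be estimated from the same samples.

```python
import numpy as np

def euclidean(x, y):
    """Equation 1: square root of the summed squared coordinate differences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def mahalanobis(x, y, cov):
    """Equation 2: sqrt((x - y)^T C^-1 (x - y)) for a covariance matrix C."""
    diff = np.asarray(x, float) - np.asarray(y, float)
    # Solving C z = diff avoids forming the explicit inverse of C.
    z = np.linalg.solve(np.asarray(cov, float), diff)
    return float(np.sqrt(diff @ z))

def manhattan(x, y):
    """Equation 3: sum of the absolute coordinate differences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sum(np.abs(x - y)))

# Hypothetical data: 4 samples (rows) x 2 variables (columns).
data = np.array([[1.0, 2.0], [2.0, 3.5], [3.0, 3.0], [4.0, 6.0]])
C = np.cov(data, rowvar=False)   # covariance matrix estimated from the samples
a, b = data[0], data[3]
print(euclidean(a, b), manhattan(a, b), mahalanobis(a, b, C))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean one. A non-identity **C** rescales and decorrelates the variables before the distance is taken.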

Once the distances have been estimated, the dendrogram can be plotted. A general dendrogram is shown below (Figure 3). In this dendrogram the samples are identified by letters and the distances by numbers. The samples belonging to cluster A are at a distance of 0.2 from one another, while sample B is at a distance of 0.5 from cluster A. These distance values change according to the metric used to calculate them.

**Figure 3.** A general dendrogram, with the distances shown at the top and the samples on the right side
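A naive agglomerative procedure shows how the branch heights of a dendrogram arise. The sketch below assumes single linkage, where the distance between two clusters is the distance between their two closest members. The text does not specify the linkage rule, and other rules would give different heights. The 1-D coordinates and labels are hypothetical, chosen so that the merge distances match the 0.2 and 0.5 of the example above.

```python
import numpy as np

def single_linkage(points, labels):
    """Naive agglomerative clustering with single linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    i.e. the branch heights a dendrogram would display.
    """
    pts = np.asarray(points, float)
    if pts.ndim == 1:
        pts = pts[:, None]
    # Full Euclidean distance matrix between samples.
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    clusters = {lab: [i] for i, lab in enumerate(labels)}
    merges = []
    while len(clusters) > 1:
        names = list(clusters)
        best = None
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                # Single linkage: distance between the closest members.
                d = min(dist[p, q] for p in clusters[names[i]] for q in clusters[names[j]])
                if best is None or d < best[2]:
                    best = (names[i], names[j], d)
        a, b, d = best
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, float(d)))
    return merges

# Hypothetical example: A1 and A2 form cluster A, B lies further away.
for step in single_linkage([0.0, 0.2, 0.7], ["A1", "A2", "B"]):
    print(step)
```

The loop is O(n³) and only meant to expose the mechanics. In practice a library routine such as SciPy's hierarchical clustering would be used.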

#### *2.1.2. Principal Components Analysis (PCA)*

Principal Component Analysis (PCA) aims to represent the distances between the points using a few axes in the row plot. Each row of the data matrix is a point in the plot (Figure 2), and the aim is to study the relationships between these samples in order to find their similarities and differences. This general example uses two principal components (PC1 and PC2). PC1 is the direction that describes the maximum amount of variance in the data, while PC2 describes the largest part of the remaining variance. The percentages of variance described by the retained PCs should add up to a value close to 100%. Another property of the PCs concerns their orientation: they are always orthogonal to one another.

PCA can also be used to determine which variables are most important in a process. This analysis uses both the factors (columns of the matrix) and the objects (rows of the matrix). The *loadings* are used when the aim is to determine which variables are most important for the process. The *scores* are used when the aim is to study the relationships between objects.

#### **2.2. Supervised methods**

Supervised methods are used when the goal is to construct a model, based on the class membership of known samples, that can classify future samples. KNN is a widely used technique for this purpose.

#### *2.2.1. K-Nearest Neighbor (K-NN)*

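The KNN technique uses samples or clusters of known class to identify other samples or clusters. For this, the distances between the object to be classified and the known samples must be calculated, for example with the Euclidean, Mahalanobis or Manhattan distance. The smallest distances are identified and the object is assigned to the class of its nearest neighbor. When *k* > 1 neighbors are considered, the object is assigned to the class holding the majority among them. The classification therefore depends on the number of objects of each class among the nearest neighbors.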
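A minimal k-nearest-neighbor classifier can be sketched as follows. It assumes Euclidean distance and a simple majority vote among the *k* closest training samples. The function name, data and class labels are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Assign x to the majority class among its k nearest training samples."""
    train_X = np.asarray(train_X, float)
    d = np.sqrt(((train_X - np.asarray(x, float)) ** 2).sum(axis=1))  # Euclidean
    nearest = np.argsort(d)[:k]          # indices of the k smallest distances
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set with two classes.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
y = ["active", "active", "active", "inactive", "inactive", "inactive"]
print(knn_predict(X, y, [1.1, 1.0], k=3))
```

Choosing an odd *k* avoids ties in a two-class problem. With unequal class sizes, the larger class tends to dominate the vote.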

#### **3. Chemometrics in medicinal chemistry**

#### **3.1. The QSAR principle: Hansch analysis**

The development of new drugs is a continuous challenge, since countless diseases still lack an adequate pharmaceutical treatment. Modern medicinal chemists are especially concerned with methods based on rational and quantitative procedures, aimed at focusing on potentially efficient candidates. In this context, chemometric methods are very important in quantitative structure-activity relationship (QSAR) studies. QSAR presupposes that the biological activity (BA), measured through a biological response (BR), is related to the chemical structure (CS):

$$BR = f\{\text{CS}\}\tag{4}$$


The first attempt to quantitatively relate chemical structure to chemical behavior in a series of structurally related compounds dates back to the late 1930s. Hammett [3], studying meta- and para-substituted benzoic acids at 25 °C, established linear relationships between the ionization constant of the R = X substituted benzoic acid (*KX*) and the ionization constant of the non-substituted benzoic acid (R = H):

$$\begin{aligned} \left(m\text{-}/p\text{-}R\right) \text{C}\_6\text{H}\_4\text{COOH} &\rightleftharpoons \left(m\text{-}/p\text{-}R\right) \text{C}\_6\text{H}\_4\text{COO}^- + \text{H}^+ \\\\ \sigma &= \log\left(\frac{K\_X}{K\_H}\right) = \log\left(K\_X\right) - \log\left(K\_H\right) \end{aligned} \tag{5}$$
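Equation 5 is simple arithmetic on the two ionization constants. Since pK = -log K, it can equivalently be written as σ = pK*H* - pK*X*. The sketch below uses hypothetical constants, not measured values, and hypothetical function names.

```python
import math

def hammett_sigma(K_X, K_H):
    """Equation 5: sigma = log10(K_X / K_H) = log10(K_X) - log10(K_H)."""
    return math.log10(K_X) - math.log10(K_H)

def hammett_sigma_from_pKa(pKa_X, pKa_H):
    """Same quantity written with pKa = -log10(K): sigma = pKa_H - pKa_X."""
    return pKa_H - pKa_X

# Hypothetical ionization constants (not measured values).
K_H = 1.0e-4   # non-substituted acid
K_X = 4.0e-4   # substituted acid, four times more dissociated
print(round(hammett_sigma(K_X, K_H), 3))
```

A positive σ means the substituent increases the acid's ionization constant, as electron-withdrawing groups do.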

The σ constant is group-specific and represents the electronic effect (of inductive and resonance type) exerted by the R group. In 1964, Corwin Hansch [4] combined the use of the electronic constants with the lipophilic parameter (π), which represents the contribution of each R group to the overall lipophilicity:

$$
\pi = \log\left(\frac{P\_X}{P\_H}\right) = \log\left(P\_X\right) - \log\left(P\_H\right) \tag{6}
$$

where *PX* is the octanol-water partition coefficient of the X-substituted compound and *PH* is the partition coefficient of the non-substituted compound. A QSAR equation can then relate some kind of BR to the electronic (σ) and lipophilic (π) effects of the R groups that distinguish the members of the series. For example, with the negative logarithm of the minimal inhibitory concentration (MIC) of a series of antimicrobial compounds (-log(MIC)) as the BR, the equation can be expressed as

$$-\log\left(MIC\right) = \log\left(\frac{1}{MIC}\right) = a \cdot \sigma + b \cdot \pi + c \tag{7}$$

where *a*, *b* and *c* are the multiple regression coefficients.
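Equation 7 is an ordinary multiple linear regression. As a sketch, the coefficients *a*, *b* and *c* can be estimated by least squares with NumPy. The σ and π values below are hypothetical. The responses are synthetic, generated from assumed coefficients (a = 1.0, b = 0.8, c = 4.0) so that the fit can be checked.

```python
import numpy as np

# Hypothetical congeneric series: one entry per compound.
sigma = np.array([0.0, 0.2, -0.2, 0.8, 0.1, 0.5])
pi = np.array([0.0, 0.7, 0.5, -0.3, 0.1, 1.0])
# Synthetic log(1/MIC) values built from assumed coefficients.
log_inv_mic = 1.0 * sigma + 0.8 * pi + 4.0

# Design matrix for Equation 7: columns for sigma, pi and the constant term c.
A = np.column_stack([sigma, pi, np.ones_like(sigma)])
(a, b, c), *_ = np.linalg.lstsq(A, log_inv_mic, rcond=None)
print(f"log(1/MIC) = {a:.2f}*sigma + {b:.2f}*pi + {c:.2f}")
```

With real data the fit is not exact, and the statistics discussed for MLR (r, s, q², F) are needed to judge it.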

Hansch's hypothesis was that the BR may be related to specific physico-chemical properties of each substituent attached to the basic skeleton, in a congeneric series of compounds with similar BA. This hypothesis led to the proposal of numerous descriptors of different kinds, useful for identifying the principal effects involved in drug action.

#### **3.2. Physico-chemical descriptors**


There are several physico-chemical descriptors useful in QSAR studies. They can be divided into categories (constitutional, topological, stereochemical and electronic descriptors), besides the so-called indicator variables.

#### *3.2.1. Constitutional descriptors*

This kind of descriptor is related to the presence of structural features that can affect the BA. Examples include the number of unsaturated bonds, the number of hydrogen-bond donors and the average ring size.

#### *3.2.2. Topological descriptors*

These descriptors represent molecular shape and connectivity, such as branching, spacing groups and unsaturations. The Kier [5] and Wiener [6] descriptors are typical.
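Both kinds of topological descriptor can be computed from the hydrogen-suppressed molecular graph. The sketch below computes two of them:

* the Wiener index, the sum of the bond-count distances between all pairs of atoms;
* the first-order connectivity index, ¹χ = Σ (δ*i*δ*j*)^(-1/2) over all bonds, where δ is the number of non-hydrogen neighbors of an atom. This is the simple form popularized by Kier and Hall.

The adjacency-list representation and the function names are assumptions made for illustration.

```python
from collections import deque
import math

def distances_from(adj, start):
    """Breadth-first search: bond-count distance from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def wiener_index(adj):
    """Sum of shortest-path distances over all unordered pairs of atoms."""
    total = sum(sum(distances_from(adj, atom).values()) for atom in adj)
    return total // 2   # each pair was counted twice

def connectivity_index(adj):
    """First-order connectivity index: sum over bonds of (deg_i * deg_j) ** -0.5."""
    bonds = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    return sum(1.0 / math.sqrt(len(adj[u]) * len(adj[v])) for u, v in bonds)

# Hydrogen-suppressed carbon skeletons as adjacency lists.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
for name, g in [("n-butane", n_butane), ("isobutane", isobutane)]:
    print(name, wiener_index(g), round(connectivity_index(g), 3))
```

The branched isomer has the smaller Wiener index (9 against 10), which illustrates how these indices encode branching.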

#### *3.2.3. Steric (or stereochemical) descriptors*

Steric descriptors describe effects related to the size of chemical groups and to steric hindrance. The Taft steric parameter, *Es* [7], is a common example.

#### *3.2.4. Electronic descriptors*

These variables are related to molecular electron density and are usually calculated by quantum-chemical methods. Examples include dipole moments, atomic partial charges, the highest occupied molecular orbital energy (HOMO) and the lowest unoccupied molecular orbital energy (LUMO).

#### *3.2.5. Indicator variable and Taylor analysis*

Indicator variables are a useful way to convert qualitative information into quantitative information, such as the occurrence of a given structural feature. The variable is set to 1 when the feature is present and 0 otherwise. The Taylor QSAR approach [8] employs indicator variables.
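In a regression, an indicator variable is simply an extra 0/1 column in the design matrix. In this sketch the following are all hypothetical:

* the substituent list;
* the choice of OH as the feature of interest;
* the π values;
* the coefficients used to generate the synthetic response.

```python
import numpy as np

# Hypothetical series: which compounds carry the structural feature?
substituents = ["H", "Cl", "OH", "CH3", "OH", "Br"]
feature = {"OH"}   # assumed feature of interest

# Indicator variable: 1 when the feature is present, 0 otherwise.
I = np.array([1 if s in feature else 0 for s in substituents])
pi = np.array([0.0, 0.7, -0.7, 0.5, -0.7, 0.9])

# Synthetic response from assumed coefficients: BR = 0.9*pi + 0.6*I + 3.0
br = 0.9 * pi + 0.6 * I + 3.0
A = np.column_stack([pi, I, np.ones_like(pi)])
coef, *_ = np.linalg.lstsq(A, br, rcond=None)
print(I, np.round(coef, 3))
```

The fitted coefficient of I estimates the average shift in BR associated with the presence of the feature.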

#### **3.3. Chemometric methods applied to drug design**

Chemometric statistical methods find a wide field of application in QSAR, since multivariate problems are inherent to it.

#### *3.3.1. Discriminatory and classificatory methods*

These methods group and classify compounds and variables into classes or categories that share similarities. They are very useful in pattern-recognition problems and in the dimensionality reduction of complex systems.

#### *3.3.2. Principal Component Analysis (PCA)*

Principal component (PC) methods aim to combine correlated variables by projecting them onto a new coordinate system, so that fewer variables are obtained, with no intercorrelation among them. In this new system of axes, the variability is maximal along PC1 and decreases along the subsequent axes (PC2, PC3...). All the axes are orthogonal to each other, which allows one to work with just the first components (usually PC1, PC2 and PC3). Thus, from a multivariable and commonly multicollinear universe, one can obtain a simpler system that retains almost the same amount of information. Denoting by X the data matrix, of dimension I×J (I molecules and J descriptors), PCA generates two matrices, T and L, such that

$$\mathbf{X} = \mathbf{T} \mathbf{L}^T \tag{8}$$


The matrix T contains the scores and represents the position of the compounds in a new coordinate system whose axes are the components. L is the loading matrix. Plotting the PCs instead of the original descriptors, one obtains groups governed by the similarities among the data.
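Equation 8 can be obtained from the singular value decomposition (SVD) of the data matrix. The sketch assumes that X is column-centered; autoscaling is also common but is not specified in the text. It uses random hypothetical data. With all components kept, T L^T reproduces the centered X exactly. The squared singular values give the percentage of variance explained by each PC.

```python
import numpy as np

def pca(X, n_components):
    """PCA of a column-centered data matrix via the SVD.

    Returns scores T (I x A), loadings L (J x A), the percentage of total
    variance explained by each retained PC, and the centered matrix.
    """
    Xc = X - X.mean(axis=0)                      # center each descriptor (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores
    L = Vt[:n_components].T                      # loadings (orthonormal columns)
    explained = 100 * s ** 2 / np.sum(s ** 2)
    return T, L, explained[:n_components], Xc

# Hypothetical matrix: 6 molecules (rows) x 4 descriptors (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
T, L, pct, Xc = pca(X, n_components=4)
print(np.round(pct, 1))            # decreasing share of variance per PC
print(np.allclose(Xc, T @ L.T))    # Equation 8 holds when all PCs are kept
```

Keeping only the first two or three columns of T and L gives the reduced representation used in scores and loadings plots.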

#### *3.3.3. Hierarchical Cluster Analysis (HCA)*

This analysis is also useful for classifying compounds, since it allows patterns and clusters to be distinguished visually. The tree-like plot, called a dendrogram, places similar compounds on the same branches. These branches are drawn from a similarity matrix, **S**, each element of which is the similarity index between two samples *k* and *l*, *Skl*:

$$S\_{kl} = 1.0 - \frac{d\_{kl}}{d\_{max}}\tag{9}$$

In this expression, *dkl* is the Euclidean distance between *k* and *l*, and *dmax* is the maximum distance. Ferreira [9] describes a PCA/HCA analysis of a series of 25 1,4-naphthoquinones with antitumour activity. Using electronic descriptors, it was possible to distinguish active from inactive compounds (Figure 4). The loading values indicate that the presence of high-density groups in the side chain and terminal positions favours activity. The same profile arises from the dendrogram analysis.

**Figure 4.** PC1 versus PC2 scores plot.


**Figure 5.** Dendrogram for a naphthoquinone series
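The similarity matrix of Equation 9 follows directly from the matrix of Euclidean distances. The scores below are hypothetical, and *dmax* is taken as the largest distance in the set, as in the text.

```python
import numpy as np

def similarity_matrix(X):
    """Equation 9: S_kl = 1 - d_kl / d_max, with Euclidean distances d_kl."""
    X = np.asarray(X, float)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    return 1.0 - d / d.max()

# Hypothetical scores of four compounds on PC1 and PC2.
scores = np.array([[0.0, 0.0], [0.3, 0.4], [3.0, 4.0], [2.7, 3.6]])
S = similarity_matrix(scores)
print(np.round(S, 2))
```

Identical samples get S = 1 and the most distant pair gets S = 0. A dendrogram built on S therefore joins the most similar compounds first.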

#### **3.4. Multivariate regression**

To construct a QSAR equation (Eq. 4), it is necessary to adopt some kind of multivariate fitting method in order to correlate the descriptors with the BR. The main methods are multilinear regression (MLR), principal component regression (PCR) and partial least squares (PLS).

#### *3.4.1. Multilinear regression (MLR)*

The objective of this method is to obtain a relationship between the BR and a number of descriptors limited to 1/5 of the number of compounds, as an equation of the form:

$$BR = \alpha\_1 \left( \pm \varepsilon\_1 \right) \cdot D\_1 + \alpha\_2 \left( \pm \varepsilon\_2 \right) \cdot D\_2 + \alpha\_3 \left( \pm \varepsilon\_3 \right) \cdot D\_3 + \dots + \varepsilon \tag{10}$$


in which *αi* are the regression coefficients, *Di* the descriptors, *εi* the confidence intervals of the coefficients and ε the independent term. Statistical validation of the model is very important. It requires consistency in the units of the *Di* descriptors and, necessarily, in the magnitude of their values. Statistical parameters such as the correlation coefficient (*r*), the sample standard deviation (*s*), the cross-validation coefficient (*q²*) and the Fisher test (*F*) are used for this task. MLR is quite sensitive to multicollinearity: intercorrelated variables (typically, with r² > 0.6) must not be used together. This is a common problem in multi-descriptor systems that can be handled with other regression methods.
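The statistics mentioned above can be sketched with NumPy under the following assumptions:

* *r* is computed as the square root of the coefficient of determination.
* *s* uses n - k - 1 degrees of freedom.
* *F* is the usual ratio of regression to residual mean squares.
* *q²* comes from leave-one-out cross-validation, q² = 1 - PRESS/SS_tot.

The collinearity check applies the r² > 0.6 rule of thumb from the text. The data and function names are hypothetical.

```python
import numpy as np

def mlr_stats(D, y):
    """Fit y = D a + c by least squares and return coef, r, s, F and LOO q^2."""
    D, y = np.asarray(D, float), np.asarray(y, float)
    n, k = D.shape
    A = np.column_stack([D, np.ones(n)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    r = np.sqrt(1 - ss_res / ss_tot)                       # correlation coefficient
    s = np.sqrt(ss_res / (n - k - 1))                      # standard deviation
    F = ((ss_tot - ss_res) / k) / (ss_res / (n - k - 1))   # Fisher statistic
    # Leave-one-out cross-validation: predict each compound from the others.
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        c_i, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ c_i) ** 2
    q2 = 1 - press / ss_tot
    return coef, r, s, F, q2

def collinear_pairs(D, threshold=0.6):
    """Descriptor index pairs whose squared correlation exceeds the threshold."""
    R2 = np.corrcoef(np.asarray(D, float), rowvar=False) ** 2
    k = R2.shape[0]
    return [(i, j) for i in range(k) for j in range(i + 1, k) if R2[i, j] > threshold]

# Hypothetical data: 10 compounds, 2 descriptors (within the 1/5 rule).
rng = np.random.default_rng(1)
D = rng.normal(size=(10, 2))
y = 1.5 * D[:, 0] - 0.7 * D[:, 1] + 2.0 + rng.normal(scale=0.1, size=10)
coef, r, s, F, q2 = mlr_stats(D, y)
print(np.round(coef, 2), round(r, 3), round(s, 3), round(F, 1), round(q2, 3))
print(collinear_pairs(D))
```

Since each left-out prediction is at least as far off as the fitted value, q² never exceeds r². A large gap between the two is a warning sign of overfitting.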

#### *3.4.2. Principal component regression (PCR)*

In order to avoid multicollinearity, the regression can be performed not with the descriptors themselves but with their principal components (PCs), generated by a PCA treatment. The main advantage of this approach is the assurance that all the new variables are orthogonal and uncorrelated, although it is still necessary to analyze the loading matrix (**L**) to interpret them. In this kind of regression, the variables are defined to maximize the variance of the descriptor matrix, without forcing a correlation with the BR.
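A minimal PCR sketch works in four steps:

1. Center the descriptors.
2. Decompose them by SVD.
3. Regress the response on the first scores.
4. Map the coefficients back through the loadings.

The nearly duplicated descriptor, the number of retained components and the function name are hypothetical choices, meant to show how PCR copes with multicollinearity.

```python
import numpy as np

def pcr_fit(D, y, n_components):
    """Principal component regression: regress y on the first PCA scores of D."""
    D, y = np.asarray(D, float), np.asarray(y, float)
    mean_D, mean_y = D.mean(axis=0), y.mean()
    Dc = D - mean_D
    _, _, Vt = np.linalg.svd(Dc, full_matrices=False)
    L = Vt[:n_components].T                 # loadings of the retained PCs
    T = Dc @ L                              # scores: uncorrelated regressors
    b, *_ = np.linalg.lstsq(T, y - mean_y, rcond=None)
    beta = L @ b                            # coefficients in descriptor space
    intercept = mean_y - mean_D @ beta
    return beta, intercept

# Hypothetical, strongly collinear descriptors: column 2 nearly copies column 1.
rng = np.random.default_rng(2)
d1 = rng.normal(size=12)
D = np.column_stack([d1, d1 + rng.normal(scale=0.01, size=12), rng.normal(size=12)])
y = 1.0 * d1 + 0.5 * D[:, 2] + 3.0
beta, c = pcr_fit(D, y, n_components=2)
print(np.round(beta, 2), round(c, 2))
```

Dropping the tiny-variance PC removes the unstable direction (column 1 minus column 2). The effect of the duplicated descriptor is therefore shared about equally between the two columns instead of producing large, opposite-signed coefficients.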

#### *3.4.3. Partial least squares (PLS)*

As in PCR, latent variables (components) are employed. In this case, however, they are extracted so as to maximize their covariance with the BR matrix, so that each component is a good predictor of the BR. This is the most widely used regression method, and it is suited to 3D-QSAR problems. In these problems, a set of previously aligned compounds is placed within a grid of interaction points with a molecular probe. The interaction energy at each point is a variable in the QSAR equation. These variables are in turn correlated with the BR to obtain a three-dimensional profile of the critical sites that favour or disfavour interaction with a hypothetical biological receptor.
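For a single response, PLS is often computed with the NIPALS algorithm (PLS1). The sketch below is one common formulation, not necessarily the one used in any particular QSAR package. The data and function names are hypothetical, and no cross-validation is performed to choose the number of components.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS1 regression (single response) with the NIPALS algorithm.

    Each weight vector w maximizes the covariance between the X scores and y.
    Returns coefficients for centered data plus the means used for prediction.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)     # weight: direction of max covariance with y
        t = E @ w                  # X scores
        tt = t @ t
        p = E.T @ t / tt           # X loadings
        q_a = f @ t / tt           # y loading
        E = E - np.outer(t, p)     # deflate X
        f = f - q_a * t            # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)   # coefficients in the original X space
    return B, x_mean, y_mean

def pls1_predict(model, X_new):
    B, x_mean, y_mean = model
    return (np.asarray(X_new, float) - x_mean) @ B + y_mean

# Hypothetical descriptor matrix (8 compounds x 5 descriptors) and response.
rng = np.random.default_rng(3)
X = rng.normal(size=(8, 5))
y = X @ np.array([1.0, -0.5, 0.0, 0.3, 0.0]) + 2.0
model = pls1(X, y, n_components=2)
print(np.round(pls1_predict(model, X) - y, 3))   # residuals with 2 components
```

When the number of components equals the rank of the centered X, PLS1 reproduces the ordinary least-squares solution. Its usefulness lies in stopping earlier, with many correlated variables such as 3D-QSAR grid energies.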

#### **4. Design of experiments**

The search for new energy sources such as biodiesel, and for their production processes, is of great importance today. Factorial design is an important tool for reducing search time, reagent waste and, consequently, operating costs [10]. A factorial design is performed to determine which experimental variables, and which interactions between them, have a significant influence on the responses of interest [11].

After the significant variables are selected, the experimental methodology and the influence of each variable on the reaction yield must be evaluated. For this, a full factorial statistical design can be used:

* the independent variables are the nature and concentration of the catalyst, the temperature and the alcohol-to-oil molar ratio;
* the dependent variable is the yield of esters produced;
* the variables that were not selected must be kept fixed throughout the experiment [12].

In a subsequent step, a design must be chosen that allows the effects of the different variables to be estimated with a reduced number of experiments. In the screening study, the main effects of the variables and their second-order interactions are usually obtained by full or fractional factorial designs. These experiments identify the best experimental conditions and the simultaneous effects of the variables on the reaction yield, which are extremely important for understanding the behavior of the system [13].

p-values lower than 0.05 indicate that the factors variable (1), variable (2), variable (3) and variable (4), as well as the interactions between the variables, are statistically significant at the 95% confidence level. These parameters, evaluated at a low (-1) and a high (+1) level, affect the process significantly in a positive or negative manner. Figure 6 shows the resulting Pareto chart [7].
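The effects plotted in a Pareto chart can be estimated from a two-level design as the difference between the mean responses at the high (+1) and low (-1) levels. The sketch uses a 2³ full factorial with hypothetical factor names and yields. Assessing significance (p-values) would also require an estimate of the experimental error, for example from replicates or center points, which is not computed here.

```python
import itertools
import numpy as np

# Full 2^3 factorial design in coded units (-1 = low level, +1 = high level).
factors = ["catalyst", "temperature", "molar_ratio"]   # hypothetical names
design = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)

# Hypothetical ester yields (%) for the 8 runs, in the same order as `design`.
yields = np.array([62.0, 70.0, 66.0, 78.0, 71.0, 80.0, 74.0, 90.0])

def effect(column, y):
    """Effect = mean response at +1 minus mean response at -1."""
    return y[column > 0].mean() - y[column < 0].mean()

effects = {}
for name, col in zip(factors, design.T):
    effects[name] = effect(col, yields)                         # main effects
for (i, a), (j, b) in itertools.combinations(enumerate(factors), 2):
    effects[f"{a} x {b}"] = effect(design[:, i] * design[:, j], yields)  # 2-way

# Sorting by absolute value gives the bar order of a Pareto chart.
for name, e in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:28s} {e:+.2f}")
```

A fractional factorial follows the same arithmetic on a subset of these runs, at the price of confounding some interactions with main effects.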


**Figure 6.** Pareto chart resulting from the fractional factorial design, used to evaluate the effects of each variable and their interactions on the reaction yield.

Multivariate optimization of the parameters proceeds in two stages. First, conditions are chosen for a preliminary assessment of the experimental variables (fractional factorial design). Then response surface methodology (central composite design) is applied to the variables that the screening identified as possibly affecting biodiesel synthesis. The generated model and the set of significant effects can be evaluated through response surface methodology, as shown in Figures 7 and 8, together with their influence on the response, i.e., the reaction yield. The dark area indicates the conditions under which the process has the highest yield.
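The coded points of a central composite design, and the second-order model usually fitted to them in response surface methodology, can be sketched as follows. The rotatable axial distance α = (2^k)^(1/4), the three center points, the function names and the assumed surface used to generate yields are all illustrative assumptions, not details taken from the study.

```python
import itertools
import numpy as np

def central_composite(k, n_center=3, alpha=None):
    """Coded points of a central composite design for k factors.

    2**k factorial corners at +/-1, 2k axial (star) points at +/-alpha and
    n_center replicated center points. The default alpha is rotatable.
    """
    if alpha is None:
        alpha = (2 ** k) ** 0.25
    corners = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    axial = np.vstack([sign * alpha * np.eye(k)[i]
                       for i in range(k) for sign in (-1.0, 1.0)])
    center = np.zeros((n_center, k))
    return np.vstack([corners, axial, center])

def quadratic_terms(X):
    """Columns of a second-order model: 1, x_i, x_i**2 and x_i*x_j (i < j)."""
    k = X.shape[1]
    cols = [np.ones(len(X))]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(k), 2)]
    return np.column_stack(cols)

def fit_quadratic(X, y):
    """Least-squares coefficients of the second-order response surface."""
    coef, *_ = np.linalg.lstsq(quadratic_terms(X), y, rcond=None)
    return coef

# Two coded variables; hypothetical yields generated from an assumed surface.
X = central_composite(2)
true = np.array([85.0, 2.0, -3.0, -4.0, -1.5, 1.0])   # b0, b1, b2, b11, b22, b12
y = quadratic_terms(X) @ true
print(X.shape, np.round(fit_quadratic(X, y), 3))
```

Negative squared-term coefficients, as in this assumed surface, describe a maximum inside the explored region, the kind of optimum a response surface plot displays.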

**Figure 7.** (a) Response surface generated by the central composite design for the optimization of variables 1 and 3

**Figure 8.** Zoom applied to a region of the response surface.

Thus, statistical analysis has proven to be an important tool for evaluating, selecting and proposing new technological routes. This can be done by evaluating either the raw materials or the process parameters that most influence the transesterification reaction used to obtain biofuels.

### **5. Conclusion of chapter**

This chapter aimed to show the versatility of chemometric tools in several areas. Applications of chemometric theory in drug design and natural products chemistry were shown, but chemometrics is not limited to these areas. We hope to have expanded the range of chemometrics

#### **Author details**

Hilton Túlio Lima dos Santos and Wagner Freitas *University of São Paulo (USP), Brazil* 

André Maurício de Oliveira *Federal Center of Technology – Minas Gerais (CEFET - MG), Brazil* 

Patrícia Gontijo de Melo *Federal University of Uberlândia (UFU), Brazil* 

Ana Paula Rodrigues de Freitas *State University of São Paulo (UNESP), Brazil* 

### **6. References**

[1] Otto M. Chemometrics – Statistics and Computer Application in Analytical Chemistry. Wiley-VCH; 1999.

[2] Beebe K., Pell R., Seasholtz M. Chemometrics – A Practical Guide. Wiley-Interscience; 1998.

[3] Hammett, L. P. (1937). J. Am. Chem. Soc. 59: 96.

[4] Hansch, C. (1969). A Quantitative Approach to Biochemical Structure-Activity Relationships. Acc. Chem. Res. 2: 232-239.

[5] Hall, L. H.; Kier, L. B. (1976). Molecular connectivity in chemistry and drug research. Boston: Academic Press.

[6] Wiener, H. (1947). Structural determination of paraffin boiling points. J. Am. Chem. Soc. 69 (1): 17-20.

[7] Taft, R. W. (1952). Linear free energy relationships from rates of esterification and hydrolysis of aliphatic and ortho-substituted benzoate esters. J. Am. Chem. Soc. 74: 2729-2732.

[8] Hansch, C.; Sammes, P. G.; Taylor, J. B. (1990). Comprehensive medicinal chemistry: the rational design, mechanistic study & therapeutic application of chemical compounds, vol. 4. Oxford: Pergamon Press.

[9] Ferreira, M. M. C. (2002). J. Braz. Chem. Soc. 13 (6): 742-753.

[10] Charoenchaitrakool, M., & Thienmethangkoon, J. (2011). Statistical optimization for biodiesel production from waste frying oil through two-step catalyzed process. Fuel Processing Technology, 92(1), 112-118.

[11] Berrios, M., Gutiérrez, M. C., Martín, M. A., & Martín, A. (2009). Application of the factorial design of experiments to biodiesel production from lard. Fuel Processing Technology, 90(12), 1447-1451.

#### **6. References**

	- [12] Melo, P. G. (2012). Production and characterization of obtained from Macaúba (Acrocomia aculeata). Master degree thesis. Univesity of Federal of Uberlandia – Brazil.

**Chapter 8** 

© 2012 Smidt et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


**Ageing and Deterioration of Materials in the Environment – Application of Multivariate Data Analysis**

E. Smidt, M. Schwanninger, J. Tintner and K. Böhm

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/53984

**1. Introduction**

Ageing and deterioration of materials are key processes within the perpetual conversion of organic and inorganic matter. As far as natural cycles of organic substances are concerned, a balance between syntheses, metabolic products, degradation and recycling can be expected. With respect to inorganic materials, weathering is the dominant natural process of ageing. It comprises the transformation of chemical compounds and is caused by abiotic and biotic factors. The formation of new mineral phases closes the loop. Anthropogenic activities influence this well-balanced metabolism through the increased consumption of resources and the accompanying acceleration of turnover rates. This development is paralleled by a significant environmental impact caused by increasing concentrations of metabolic products. Greenhouse gases in particular have become a crucial topic because of their global effect and their contribution to climate change. The fate of carbon, a key element in the global cycle, therefore attracts much attention [1-4]. Carbon sequestration and the minimisation of gaseous emissions such as CO2 and methane are promoted to slow down the turnover. In many cases, deterioration and degradation are accompanied not only by the release of harmful substances but also by the loss of valuable resources. Preventing negative environmental effects and using resources carefully therefore require responsible management of products, substances and elements. Several elements, such as nitrogen, phosphorus and sulphur, which are released as different compounds during the degradation of organic matter, are also a focus of interest [5]. Their ambivalent role as both nutrients and pollutants has led to several techniques of resource recovery [6].

Ageing of materials or products implies changes of the original state, but it does not necessarily comprise only deterioration or degradation. Ageing can also mean the formation of new substances and stabilisation. In some cases this effect is desirable. Ageing of

