186 Advanced Applications for Artificial Neural Networks

In the work of Nadai et al., entitled "Inference of the biodiesel cetane number by multivariate techniques", a method was implemented that consists of the successive application of principal components analysis (PCA), fuzzy clustering and ANN to a dataset composed of structural information from proton nuclear magnetic resonance (<sup>1</sup>H NMR) of biodiesel fatty esters [24].

The aim of that work was to obtain the cetane number of different types of complex mixtures from data of pure substances (esters). The authors pointed out two main characteristics that affect the cetane number values: the number of carbon-carbon double bonds and the structure of the alcohol moiety in each fatty ester [24].

In 2014, in the research "Neural network prediction of biodiesel kinematic viscosity at 313 K", Meng et al. predicted the kinematic viscosity of biodiesel with artificial neural networks. The authors used 105 samples of biodiesel collected from the literature, and 19 types of fatty acid methyl esters (FAMEs) were set as inputs. The results obtained suggested ANNs as an option for predicting kinematic viscosity, with a correlation coefficient of 0.9774 [25]. In the paper "Application of artificial neural networks to predict viscosity, iodine value and induction period of biodiesel focused on the study of oxidative stability", Barradas Filho and collaborators optimized ANN models to predict viscosity, iodine value and induction period (oxidative stability) of 98 biodiesel samples from their fatty ester composition [26].

Also in the work of Barradas Filho et al., the ANNs were optimized by varying the activation functions, the number of neurons in the hidden layers and the convergence methods. The evaluation criteria of the models were MSE, RMSE, MAPE and the r and R<sup>2</sup> coefficients. After optimization, the ANN results were compared to those of other models and showed the best performance [26].

In 2017, the work "Attesting compliance of biodiesel quality using composition data and classification methods" of Lopes et al. compared four classification methods (decision tree classifier, K-nearest neighbors, support vector machine and artificial neural networks) to evaluate the compliance of biodiesel samples concerning some quality parameters. This work aimed to obtain an alternative method with more accuracy when compared to other alternative methods [27].

#### 3. Performance evaluation parameters

After adjustment of classification or calibration models, it is important to have parameters to evaluate their performance from the results obtained. Some of the most widely used parameters are presented and briefly explained in this section. These parameters can also be employed, for instance, to compare and choose among different methods applied to the same problem.

#### 3.1. Evaluation parameters for classification

The first step in organizing classification results for better visualization is to build a confusion matrix, as in the example of Table 2. The actual classes are represented in the columns and the predicted classes in the rows. The number of apple samples classified as apples is registered in cell AA, the number of apples classified as bananas is in cell AB, and those classified as coconuts are in cell AC. The same goes for the other fruit classes. The principal diagonal of the matrix represents the correctly classified samples (cells AA, BB and CC), and the other cells represent the misclassified ones. An ideal classifier would provide a confusion matrix in which all the cells outside the principal diagonal are zero.

Table 2. 3 × 3 confusion matrix of results of fruits classification.

The evaluation parameters for classification methods are based on rates that can be obtained from the confusion matrix. Although called rates here, they are integer counts: the numbers of samples classified and split according to some criteria, as will be explained below.

In the example given in Table 2, for the banana class, the true positive rate (TP) corresponds to the number of bananas correctly classified as bananas (5 samples, cell BB), and the true negative rate (TN) to the samples of the other classes (apple and coconut) classified in any class other than banana (21 samples, cells AA, AC, CA and CC). The false positive rate (FP) is the number of samples of other classes misclassified as bananas (2 samples, cells AB and CB), and the false negative rate (FN) corresponds to the banana samples not classified as bananas (4 samples, cells BA and BC).

For the apple class, the TP, TN, FP and FN rates are 9, 19, 3 and 1, respectively, and for the coconut class, the rates in the same sequence are 11, 17, 2 and 2. Once the TP, TN, FP and FN rates have been obtained for each class, their average values over all classes can be used to calculate global evaluation parameters. The main ones are briefly explained below with the fruits example.
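As a minimal sketch of how these per-class values follow from a confusion matrix, the Python snippet below uses a hypothetical 3 × 3 matrix. The entries of Table 2 are not reproduced in this text, so the off-diagonal values are an assumption, chosen only so that the resulting TP, TN, FP and FN match the values quoted above. The function name `per_class_counts` is likewise illustrative. Rows are predicted classes and columns are actual classes, as described in the text.

```python
# Hypothetical confusion matrix (rows = predicted class, columns = actual class).
# The diagonal (9, 5, 11) comes from the text; the off-diagonal entries are
# assumed values chosen to reproduce the TP, TN, FP and FN quoted in the text.
CLASSES = ["apple", "banana", "coconut"]
MATRIX = [
    [9, 2, 1],   # predicted apple
    [1, 5, 1],   # predicted banana
    [0, 2, 11],  # predicted coconut
]


def per_class_counts(matrix, k):
    """Return (TP, TN, FP, FN) for class index k, one class versus the rest."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fp = sum(matrix[k]) - tp                 # predicted as k, actually another class
    fn = sum(row[k] for row in matrix) - tp  # actually k, predicted as another class
    tn = total - tp - fp - fn
    return tp, tn, fp, fn


for k, name in enumerate(CLASSES):
    print(name, per_class_counts(MATRIX, k))
```

With these assumed entries, the three tuples printed are the (TP, TN, FP, FN) values stated in the text for apple, banana and coconut.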

The accuracy (ACC), given by Eq. (1), reflects the overall ability of the method to classify correctly. For the fruits example, ACC is 85.42%. Since it is calculated from the averaged rates, this value is the mean percentage of correct decisions (TP and TN) over the three class-versus-rest evaluations.

$$\text{ACC} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \times 100\% \tag{1}$$

The sensitivity (SENS), also called "recall", can be considered as a global TP rate. The SENS of the fruits classification is 78.13%, calculated by Eq. (2).

$$\text{SENS} = \frac{\text{TP}}{\text{TP} + \text{FN}} \times 100\% \tag{2}$$

The specificity (SPEC) can be calculated by Eq. (3) and it is a global TN rate. For the fruits example, SPEC is 89.06%.

$$\text{SPEC} = \frac{\text{TN}}{\text{TN} + \text{FP}} \times 100\% \tag{3}$$

The false positive rate (FPR) can be interpreted as a global rate of FP for all the classes combined, and it is complementary to the SPEC. In the example discussed here, FPR is 10.94%, calculated by Eq. (4).

$$\text{FPR} = \frac{\text{FP}}{\text{TN} + \text{FP}} \times 100\% = 100\% \text{ - SPEC} \tag{4}$$

Analogously, the false negative rate (FNR) is a global rate of FN (Eq. (5)). For the fruits classification, FNR is 21.87% and it is complementary to the SENS.

$$\text{FNR} = \frac{\text{FN}}{\text{TP} + \text{FN}} \times 100\% = 100\% \text{ - SENS} \tag{5}$$

The ACC, SENS, SPEC, FPR and FNR are some of the main evaluation parameters for classification. Here an example of three classes was presented, giving a 3 × 3 confusion matrix; therefore, the evaluation parameters were calculated from the average TP, TN, FP and FN rates.
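The calculation described in this section can be sketched in Python as follows. The per-class rates are the ones quoted in the text; the function name `global_parameters` and the dictionary layout are illustrative choices, not part of the original work.

```python
# Per-class (TP, TN, FP, FN) rates quoted in the text for the fruits example.
COUNTS = {
    "apple": (9, 19, 3, 1),
    "banana": (5, 21, 2, 4),
    "coconut": (11, 17, 2, 2),
}


def global_parameters(counts):
    """Average TP, TN, FP, FN over classes and apply Eqs. (1)-(5), in percent."""
    n = len(counts)
    tp, tn, fp, fn = (sum(c[i] for c in counts.values()) / n for i in range(4))
    return {
        "ACC": 100 * (tp + tn) / (tp + tn + fp + fn),
        "SENS": 100 * tp / (tp + fn),
        "SPEC": 100 * tn / (tn + fp),
        "FPR": 100 * fp / (tn + fp),
        "FNR": 100 * fn / (tp + fn),
    }


params = global_parameters(COUNTS)
for name, value in params.items():
    print(f"{name}: {value:.4f}%")
```

The values obtained agree with the percentages given in the text, apart from the rounding of the last digit. FPR and SPEC add up to 100%, as do FNR and SENS.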

Problems with only two classes are simpler and more widespread in the literature, usually involving samples that "have" or "do not have" a specific characteristic and giving a 2 × 2 confusion matrix. In this case, TP, TN, FP and FN rates are obtained only for the "positive class" and the evaluation parameters are directly calculated by these rates instead of by the averages.

A two-class example, already cited in Section 2, is the classification of biodiesel samples according to their compliance with some quality parameters. For each criterion, the samples were split into "compliant" and "non-compliant" [27].

#### 3.2. Evaluation parameters for calibration

The evaluation parameters for calibration are quite different from those for classification. In calibration, these parameters are based on the difference between the actual response, obtained experimentally by a reference method, and the predicted response, estimated by the calibration method.

The oxidative stabilities (h) of some biodiesel samples from the case study of Section 4 are shown in Table 3, with the actual (y) and predicted (y′) responses given in hours. The residuals are the differences between the actual and the predicted responses. The other columns contain values used in the equations of the evaluation parameters explained below.

Table 3. Actual and predicted values of oxidative stability (h) of biodiesel samples.

The RMSE is an average deviation between the actual and the predicted values, and it has the same unit as the responses. For the example from Table 3, the calculated RMSE is 0.34 h, which means that the estimated responses differ, on average, by ±0.34 h from the actual values. In papers, the RMSE is often abbreviated as RMSEC, RMSEP, RMSEV and RMSECV when calculated for the samples of calibration (training), prediction (test), validation and cross-validation, respectively. It is given by Eq. (6), in which n is the number of samples (n = 8 in this example).

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - y_i' \right)^2} \tag{6}$$

Another important error metric is the MAPE, which is a relative measure of the prediction accuracy, calculated by Eq. (7). For the example of oxidative stabilities of biodiesel, the MAPE is 2.48%; that is, on average, the predicted values deviate by 2.48% from their actual values.

$$\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - y_i'}{y_i} \right| \times 100\% \tag{7}$$
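A minimal Python sketch of Eqs. (6) and (7) is given below. The entries of Table 3 are not reproduced in this text, so the actual and predicted values in the snippet are hypothetical and serve only to illustrate the calculation; the function names `rmse` and `mape` are also illustrative.

```python
import math

# Hypothetical actual (y) and predicted (y') induction periods, in hours.
# These are NOT the values of Table 3; they only illustrate the calculation.
Y_ACTUAL = [6.0, 8.0, 10.0, 12.0]
Y_PRED = [6.3, 7.6, 10.5, 11.4]


def rmse(y, y_pred):
    """Root mean squared error, Eq. (6); same unit as the responses."""
    n = len(y)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_pred)) / n)


def mape(y, y_pred):
    """Mean absolute percentage error, Eq. (7); actual values must be nonzero."""
    n = len(y)
    return 100 * sum(abs((a - p) / a) for a, p in zip(y, y_pred)) / n


print(f"RMSE = {rmse(Y_ACTUAL, Y_PRED):.3f} h")
print(f"MAPE = {mape(Y_ACTUAL, Y_PRED):.2f} %")
```

In this hypothetical set every prediction is off by 5% of its actual value, so the MAPE is 5%, while the RMSE is expressed in hours and is dominated by the largest absolute residuals.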

The Pearson correlation coefficient (r, Eq. (8)) is a measure of the linear relationship between two variables. It ranges from −1 to +1, and the closer its absolute value is to 1, the more linearly correlated the variables are. In calibration methods, the r coefficient is used to compare the actual and the predicted values. Since y and y′ are expected to be equal, this represents a direct relationship and the ideal r coefficient is therefore +1.

$$r(y, y') = \frac{n \sum_{i=1}^{n} y_i y_i' - \left(\sum_{i=1}^{n} y_i\right) \left(\sum_{i=1}^{n} y_i'\right)}{\sqrt{\left[n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right] \left[n \sum_{i=1}^{n} y_i'^2 - \left(\sum_{i=1}^{n} y_i'\right)^2\right]}} \tag{8}$$

For the oxidative stabilities, for example, r is 0.9977, which represents a high correlation between the actual and the predicted responses. However, it is important to analyze the correlation graphically with a scatter plot of the actual (y, on the x-axis) and the predicted (y′, on the y-axis) values, because even a perfect correlation coefficient does not guarantee that the samples lie along the identity line y′ = y.
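The need for the graphical check can be illustrated with a short Python sketch of Eq. (8). The data are hypothetical: the predictions are deliberately constructed with a systematic bias, so that they are perfectly correlated with the actual values and yet none of them lies on the identity line. The function name `pearson_r` is illustrative.

```python
import math


def pearson_r(y, y_pred):
    """Pearson correlation coefficient computed term by term as in Eq. (8)."""
    n = len(y)
    sum_y, sum_p = sum(y), sum(y_pred)
    sum_yp = sum(a * p for a, p in zip(y, y_pred))
    sum_y2 = sum(a * a for a in y)
    sum_p2 = sum(p * p for p in y_pred)
    numerator = n * sum_yp - sum_y * sum_p
    denominator = math.sqrt((n * sum_y2 - sum_y ** 2) * (n * sum_p2 - sum_p ** 2))
    return numerator / denominator


# Hypothetical actual values (h) and a deliberately biased set of predictions:
# every prediction is 20 % too high plus 1 h, so no point lies on y' = y.
y_actual = [4.0, 6.0, 8.0, 10.0]
y_biased = [1.2 * a + 1.0 for a in y_actual]

r = pearson_r(y_actual, y_biased)
largest_residual = max(abs(a - p) for a, p in zip(y_actual, y_biased))
print(f"r = {r:.4f}, largest |y - y'| = {largest_residual:.1f} h")
```

Here r is essentially +1 although the largest residual is 3 h, which is why r should be read together with an error metric such as the RMSE and with the scatter plot.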


| Step | Parameter | MLP 4-3-1-1 |
| --- | --- | --- |
| Training | RMSEC (h) | 1.31 |
| | MAPE (%) | 8.35 |
| | R<sup>2</sup> | 0.9306 |
| | r | 0.9647 |
| Validation | RMSEV (h) | 0.43 |
| | MAPE (%) | 5.51 |
| | R<sup>2</sup> | 0.9733 |
| | r | 0.9866 |
| Test | RMSEP (h) | 0.67 |
| | MAPE (%) | 6.89 |
| | R<sup>2</sup> | 0.9544 |
| | r | 0.9769 |

Table 4. Evaluation parameters calculated from the results obtained by the ANN MLP 4-3-1-1 for the oxidative stability (h) of biodiesel samples.

Although the R<sup>2</sup> coefficient can be obtained by taking the square of the correlation coefficient, they have different meanings. The R<sup>2</sup>, calculated by Eq. (9), in which y<sub>m</sub> is the average of the actual values, expresses how much of the total variance is explained by the calibration model, and it can range from 0 to +1. For the oxidative stability example of Table 3 (r = 0.9977), the R<sup>2</sup> obtained is 0.9954, which means that 99.54% of the total data variance is explained by the regression, and the remaining 0.46% is attributed to residuals.

$$R^2 = r^2 = \frac{\sum_{i=1}^n \left(y_i' - y_m\right)^2}{\sum_{i=1}^n \left(y_i - y_m\right)^2} \tag{9}$$

Some of the main evaluation parameters for regression have been explained in this section. Besides the numerical parameters, it is also quite important to perform a graphical evaluation of the results by the correlation and residual plots. More details on this will be provided in the case study of the next section.

#### 4. Case study

Biodiesel, like any other fuel, needs to meet certain specifications so that it can be marketed with quality and safety. These quality parameters are established by standards in each country or region, such as the standards EN 14214 (Europe), ASTM D6751 (USA) and RANP 45/2014 (Brazil) [27].

Among the parameters of biodiesel quality, there are general parameters, which are also applied to petroleum diesel, and there is a special group of parameters related to the chemical composition and purity of the vegetable oils. These parameters can be grouped into four sets: contaminants from the raw material, parameters related to the evaluation of the production process, properties inherent to molecular structures and parameters related to the storage process [7].

One of the main problems associated with the quality of biodiesel is the possibility of its oxidation, caused by the presence of unsaturations in its ester molecules, which is one of the most relevant differences between biodiesel and mineral diesel composition. The main products formed by the oxidation of biodiesel can cause formation of insoluble gums in the engine, filter clogging, injector coking and corrosion of the metal parts of the engine. Therefore, many researchers consider the evaluation of the oxidative stability of biodiesel a very important analysis, because it is directly related to the deterioration capacity (oxidation) and to the time for which the biofuel can be stored (induction period) [26].

The oxidative stability of biodiesel is measured by the method EN 14112, also called the Rancimat method, which consists of a system composed of a reaction vessel connected to a cell monitored by an electrode. The sample is placed in the vessel in a heating block at 110 °C and a continuous stream of air is bubbled through. The increase in temperature and the presence of oxygen induce the accelerated oxidation of biodiesel. The primary products are formed, followed by secondary products of the oxidation, among which are short-chain volatile organic acids. These acids are carried to a cell containing deionized water and promote the increase in conductivity, which is measured by an electrode coupled to a device that records the conductivity as a function of time [28].

The induction period used to evaluate the oxidative stability of biodiesel is the time at which the conductivity curve increases rapidly, corresponding to the emergence of the secondary products of the oxidation. The standards EN 14214 and RANP 45/2014 state that the minimum oxidative stability of biodiesel should be 8 h at 110 °C [29, 30], while ASTM D6751 specifies 3 h of oxidative stability [31].
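As a small illustration of how these limits turn an induction period into the two-class "compliant" or "non-compliant" label mentioned in Section 3.1, the Python sketch below checks a value against the three minimums quoted above. The function name `compliance` and the sample value are hypothetical, and treating a result exactly equal to the minimum as compliant is an assumption.

```python
# Minimum induction periods (h) at 110 °C quoted in the text.
MIN_INDUCTION_PERIOD_H = {
    "EN 14214": 8.0,
    "RANP 45/2014": 8.0,
    "ASTM D6751": 3.0,
}


def compliance(induction_period_h):
    """Map each standard to True when the induction period meets its minimum."""
    return {
        standard: induction_period_h >= minimum
        for standard, minimum in MIN_INDUCTION_PERIOD_H.items()
    }


# Hypothetical sample with an induction period of 6.5 h.
print(compliance(6.5))
```

A sample with 6.5 h would meet the ASTM D6751 minimum but not those of EN 14214 and RANP 45/2014.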

Aiming to reduce the time, complexity and costs of analyzing biodiesel quality parameters, some papers in the literature report analytical methodologies alternative to official methods. In this context, the Rancimat method is a relevant case to be studied due to the long analysis time, since a sample of biodiesel that meets EN 14214 requirements will be under analysis for more than 8 h to obtain an oxidative stability result.

A case study of an application of ANN to predict oxidative stability of biodiesel will be presented below to better illustrate the main steps from data preprocessing and selection of sample sets (for training, validation and test) to the optimization of ANN configuration and application. Finally, some performance measures will be evaluated and discussed in the case study. All data handling, preprocessing, subset partitioning and ANN regression were carried out with software MATLAB® 2013a (MathWorks), PLS\_Toolbox (Eigenvector) and an algorithm implemented in MATLAB [32].
