**2. Acquisition data base**

Multivariate analysis of infrared signals makes it possible to handle absorption data, called spectra, associated with more than one frequency or wavelength at the same time. In the oil industry, its applications include predicting the quality of distillates such as naphtha, gasoline, diesel and kerosene (Kim, Cho, Park, 2000).

The carbon-hydrogen (CH), oxygen-hydrogen (OH) and nitrogen-hydrogen (NH) chemical bonds are responsible for the absorption of infrared radiation, but their absorptions are not very intense and they overlap, creating broad spectral bands that are correlated and difficult to interpret. However, a multivariate approach has proven quite adequate for modeling the physical and chemical properties of samples from the input variables, namely the absorbed infrared radiation (Behzadian et al., 2010).

The wavelengths of the polychromatic radiation emitted by the source are resolved by means of a Michelson interferometer. The beam splitter has a refractive index such that approximately half of the radiation is directed to the fixed mirror and the other half is reflected toward the movable mirror; both beams are then reflected back by the mirrors. The movement of the movable mirror creates optical path differences, which give rise to wave interference. Modern instrumentation uses an improved Michelson interferometer, the "Wishbone" system, illustrated in Fig. 4. In this system, instead of one movable mirror and one fixed mirror, two corner-cube mirrors move together, attached to a single support.

Fig. 4. "Wishbone" interferometric system employed in modern NIR spectrophotometers based on Fourier Transform. A, beam splitter; B, corner cubic mirrors; C, anchor, and D, "wishbone" (Pasquini, 2003)

An interferogram is a plot of the signal intensity received by the detector versus the optical path difference between the beams. Because the Fourier transform expresses a periodic phenomenon as a series of sines and cosines (see Fig. 5), the interferogram can be transformed into a transmittance spectrum. The amount of radiation absorbed is then obtained as the cologarithm of the transmittance spectrum, that is, the absorbance $A = -\log_{10} T$.
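The chain from interferogram to absorbance can be illustrated with a small numerical sketch. It is only an illustration under stated assumptions: the radiation is taken to contain three hypothetical cosine components, the helper names (`interferogram`, `dft_magnitude`, `absorbance`) are invented here, and a direct discrete Fourier transform stands in for whatever processing a real instrument applies.

```python
import cmath
import math

def interferogram(frequencies, intensities, n_points):
    """Detector signal versus optical path difference for a sum of cosines."""
    return [
        sum(a * math.cos(2 * math.pi * nu * k / n_points)
            for nu, a in zip(frequencies, intensities))
        for k in range(n_points)
    ]

def dft_magnitude(signal):
    """Magnitude of the discrete Fourier transform (direct O(n^2) sum)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * m * k / n)
                for k, x in enumerate(signal))) / n
        for m in range(n // 2)
    ]

def absorbance(transmittance):
    """Cologarithm of the transmittance: A = -log10(T)."""
    return -math.log10(transmittance)

# Hypothetical radiation with components at 5, 12 and 20 cycles per scan.
signal = interferogram([5, 12, 20], [1.0, 0.5, 0.25], n_points=64)
spectrum = dft_magnitude(signal)

# Indices of the three largest spectral magnitudes recover the components.
peaks = sorted(range(len(spectrum)), key=spectrum.__getitem__, reverse=True)[:3]
print(sorted(peaks))
print(absorbance(0.1))  # absorbance at 10 % transmittance
```

In a real measurement the transmittance would come from the ratio of a sample spectrum to a reference spectrum; a single value is used here only to show the cologarithm step.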

Fig. 5. Interferogram of radiation containing several wavelengths (adapted from Smith, 2011)

**3. Multivariate analysis**

36 Fuel Injection in Automotive Engineering


Fig. 3. Diesel engine (Marshall, 2002)


The most adequate mathematical models were characterized using the multivariate technique of partial least squares (PLS) regression. In this technique, the original data matrix is represented by factors, or latent variables. Only the portion of the spectral data that correlates with the property being assessed is included in this representation.

The first factor, calculated by the statistical program The Unscrambler®, has the highest correlation between the spectral data and the property of interest. The residual spectrum, which is the original spectrum minus the portion represented by the first factor, is then evaluated by the same program. Thus, the second factor has the highest correlation between the residual spectrum and the property. This procedure is repeated until all the important information that correlates with the property under study has been represented by the factors or latent variables. Some caution is needed in determining the appropriate number of factors, because too few factors will not include all the necessary spectral information and too many will add noise (see Fig. 6).
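The factor-by-factor procedure described above can be sketched in a few lines. This is a minimal NIPALS-style PLS1 written with the standard library only; it is not the implementation used by The Unscrambler®, the function names (`pls1_fit`, `pls1_predict`, `loo_press`) and the data are hypothetical, and leave-one-out prediction error is assumed here as one common way to compare different numbers of factors.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def pls1_fit(X, y, n_factors):
    """Extract PLS factors one at a time, deflating X and y after each factor."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # residual spectra
    f = [v - y_mean for v in y]                                # residual property
    factors = []
    for _ in range(n_factors):
        # Weights: direction in the residual spectra with maximum covariance with f.
        w = [dot([row[j] for row in E], f) for j in range(p)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:  # nothing left that relates to the property
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                         # scores
        tt = dot(t, t)
        load = [dot([row[j] for row in E], t) / tt for j in range(p)]  # loadings
        q = dot(f, t) / tt                                     # property vs. scores
        # Deflation: subtract the part represented by this factor.
        E = [[row[j] - ti * load[j] for j in range(p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        factors.append((w, load, q))
    return x_mean, y_mean, factors

def pls1_predict(model, x):
    """Predict the property for one spectrum by repeating the deflation steps."""
    x_mean, y_mean, factors = model
    e = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in factors:
        t = dot(e, w)
        y_hat += q * t
        e = [ei - t * lj for ei, lj in zip(e, load)]
    return y_hat

def loo_press(X, y, n_factors):
    """Leave-one-out prediction error sum of squares for a given factor count."""
    press = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_factors)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    return press

# Hypothetical, noise-free data: six samples, three input variables.
X = [[1.0, 0.2, 0.1], [0.8, 0.5, 0.3], [0.6, 0.9, 0.2],
     [0.4, 0.4, 0.7], [0.2, 0.7, 0.9], [0.9, 0.1, 0.6]]
y = [2 * a - b + 0.5 * c + 1 for a, b, c in X]

for n_factors in (1, 2, 3):
    print(n_factors, loo_press(X, y, n_factors))
```

Because the toy data contain no noise, the error simply falls as factors are added; with real spectra the error would be expected to rise again once the extra factors start fitting noise, which is the trade-off the paragraph above describes.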

Fig. 6. Optimized number of components (Naes et al., 2002)

This regression model has the advantages of using the entire spectrum, being quick and giving a stable result. In addition, partial least squares regression, which uses fewer factors than the principal component regression (PCR) method, is more robust to noise and in the presence of weaker correlations.

The decomposition of the original variables into principal components can be represented by Equation (1). For the k-th principal component, $t_{ik}$ is the score of sample $i$, which indicates the differences or similarities between samples, and $\upsilon_{kj}$ is the loading, the parameter that relates the original variable $X_{ij}$ to the latent variable. The loading represents how much a variable contributes to a principal component, taking the variation of the data into account:

$$t_{ik} = \upsilon_{k1} X_{i1} + \upsilon_{k2} X_{i2} + \dots + \upsilon_{kq} X_{iq} \tag{1}$$

The principal components are then related to the concentration, or property of interest, according to Equation (2), where h is the number of principal components used. Fig. 7 represents Equation (2) in matrix form:

$$y_i = b_0 + b_1 t_{i1} + b_2 t_{i2} + \dots + b_h t_{ih} + e_i \tag{2}$$
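As a sketch of what such a matrix form can look like, Equations (1) and (2) may be written compactly for all samples at once. The matrix symbols below are labels chosen here and are not necessarily those used in Fig. 7: $\mathbf{X}$ is assumed to hold the $n$ samples by $q$ original variables, $\mathbf{V}$ the $h \times q$ loadings $\upsilon_{kj}$, $\mathbf{T}$ the $n \times h$ scores $t_{ik}$, $\mathbf{b} = (b_1, \dots, b_h)^{\mathsf{T}}$ the regression coefficients, $\mathbf{1}$ a column of ones and $\mathbf{e}$ the residuals.

$$\mathbf{T} = \mathbf{X}\,\mathbf{V}^{\mathsf{T}}, \qquad \mathbf{y} = b_0\,\mathbf{1} + \mathbf{T}\,\mathbf{b} + \mathbf{e}$$

Row $i$ of the first relation reproduces Equation (1) for each component $k$, and row $i$ of the second reproduces Equation (2).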

Fig. 8 illustrates the absorbance at three wavelengths. The first principal component, PC1, is the linear combination of the absorbance values that represents the maximum variance between samples. The projection of a sample point onto the PC1 axis is its PC1 score (see Fig. 9).
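The idea of PC1 as a direction of maximum variance, with scores obtained by projection, can be illustrated numerically. The sketch below makes several assumptions that are not in the text: the absorbance values are invented, the data are mean-centered before the decomposition, and power iteration on the covariance matrix is used as one simple way to find the dominant direction. The function name `first_principal_component` is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def first_principal_component(X, n_iter=500):
    """Return PC1 loadings, PC1 scores and the fraction of variance PC1 explains."""
    n, q = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(q)]
    Xc = [[row[j] - mean[j] for j in range(q)] for row in X]  # mean-centered data
    cov = [[dot([r[a] for r in Xc], [r[b] for r in Xc]) / (n - 1)
            for b in range(q)] for a in range(q)]
    v = [1.0 / q ** 0.5] * q                                  # starting direction
    for _ in range(n_iter):                                   # power iteration
        v = [dot(row, v) for row in cov]
        norm = dot(v, v) ** 0.5
        v = [c / norm for c in v]
    scores = [dot(row, v) for row in Xc]                      # projections onto PC1
    pc1_var = dot(scores, scores) / (n - 1)
    total_var = sum(cov[j][j] for j in range(q))
    return v, scores, pc1_var / total_var

# Hypothetical absorbances of five samples at three wavelengths.
X = [[0.10, 0.20, 0.30],
     [0.20, 0.41, 0.59],
     [0.30, 0.60, 0.91],
     [0.40, 0.79, 1.20],
     [0.50, 1.01, 1.49]]

loadings, scores, explained = first_principal_component(X)
print(loadings)   # contribution of each wavelength to PC1
print(scores)     # one PC1 score per sample
print(explained)  # fraction of the total variance captured by PC1
```

Because the three invented wavelengths vary almost in proportion, nearly all of the variance falls on PC1, so a single score per sample carries most of the information in the three absorbances.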

Fig. 7. Matrix representation of mathematical model


Fig. 8. Samples measured with three variables

Multivariate Modeling in Quality Control of Viscosity in Fuel: An Application in Oil Industry 41

Fig. 9. First principal component in 3D

PC2 must be orthogonal to PC1 and is positioned to capture the maximum residual variance. When the data variability cannot be explained by a single principal component (red and green samples), a second PC is needed, and so on. The score for PC2 is obtained by projection, in a manner analogous to the previous case (Fig. 10).

Fig. 10. Second principal component in 3D

The remaining principal components are obtained by the same procedure. Note that for a set of 100 different wavelengths, for example, 100 principal components are not necessary to represent the data variability.

The variability of the spectrum can be compressed into a smaller number of principal components without significant loss of analytical information. After this compression, the scores are used as independent variables in the regression to obtain the dependent variable y (concentration or physicochemical property).

PLS determines the principal components that are best with respect to the variable y and that explain the variable x as well as possible (see Fig. 11).

Fig. 11. Data compression in principal components

**4. Proposed method, results and analysis**

The first step is to collect samples of kerosene and diesel oil over a reasonable period of time, to obtain a data set that best reflects all possible operational variability, such as changes in the cast of oil and in the operating conditions of the units.

In the second step, laboratory-scale experiments are performed to characterize the product, with the aim of determining, from the samples, the real kinematic viscosity to be modeled. The samples were also subjected to infrared radiation.

In the third step, the mathematical models were developed using The Unscrambler® and Excel® software, associating the information on absorbed infrared radiation with the physicochemical property. Finally, the model is implemented on an industrial scale to forecast the viscosity in real time, giving the production area strong decision-making power and making it possible to increase the profitability of the blending process.

For each oil studied, a mathematical model with 800 input variables was developed. To help determine the number of latent variables and minimize the residual variance, the full cross-validation method was used, a mathematical algorithm that successively leaves samples out of the data set. A model built from the remaining samples is then tested by comparing its predictions with the true values of the excluded samples. The models are developed using The Unscrambler® program. Several forms of preprocessing were
