**5. Method used**

88 Multivariate Analysis in Management, Engineering and the Sciences

on Figure 5), should be highlighted in a deeper study. The turbines for generating energy are located in stretch F. In addition, this stretch has the highest water column and is the most instrumented one. It is made up of many blocks, each of which has instruments in the concrete structure and in the foundation that provide data about its physical behavior. This study was developed based on the data collected in this stretch of the dam.

| Stretch | Structure | Length (m) | Maximum height (m) |
|---|---|---|---|
| 1 | Auxiliary Dam (Saddle Dam), earth | 2294 | 30 |
| 2 | Auxiliary Dam (Saddle Dam), rock-fill | 1984 | 70 |
| 3 and 7 | Lateral Dams, counterfort | 1438 | 81 |
| 4 | Deviation Structure, mass concrete | 170 | 162 |
| 5 | Main Dam (stretch F), relieved gravity | 612 | 196 |
| 9 | Auxiliary Dam, earth | 872 | 25 |

| Other stretches | Characteristics |
|---|---|
| 6 | Powerhouse, 20 generator units |
| 8 | Spillway, 350 m length |

**Table 1.** Characteristics of the stretches of Itaipu.

In stretch F there are extensometers, piezometers, triorthogonal meters, water level gauges and foundation instrumentation (seepage flow meters). Among these instruments, the multiple-point rod extensometers, which are installed in boreholes, were selected for the analysis. This type of instrument is considered one of the most important because it measures vertical displacement, one of the most important observations when monitoring the behavior of the dam structure. There are 30 extensometers located in stretch F.

The procedure for the methodology used for the analysis of the problem of *Itaipu* is the following:

In the first phase, the data were selected and it was decided that the methodology would be applied only to the extensometers located in stretch F.

In the second phase, the data given by *Itaipu* were converted into spreadsheets, from which the necessary data used for developing this study were extracted.

In the third phase, the data were standardized in order to receive the subsequent application of the clustering methods.

The methodology was applied to the data of 30 extensometers located in different blocks of stretch F of the dam which, having one or two point rods each, total 72 displacement measurements. These measurements are identified as follows: equip4\_1 means rod 1 of extensometer 4, and so on.

The data used in this study are stored monthly and correspond to the period from January 1995 to December 2004, totaling 120 readings. This period was chosen at the suggestion of the Itaipu engineering team because it follows the construction of the dam and precedes the automatic data acquisition system. While that system was being implemented, some instruments had no manual readings. In addition, 11 automated instruments (24 rods in total) underwent modifications that might have influenced the subsequent readings: the instrument head was exchanged for one 70 cm longer. The 120 readings used here are therefore free of these irregularities.

During data pre-processing, it was found that most instruments are read monthly, but some had more than one reading per month; in these cases, the monthly average was used. Moreover, some instruments had missing readings; in these cases, interpolation was performed through time series, that is, an adequate model was fitted following the Box & Jenkins methodology, using Statgraphics [13]. In this way, it was possible to ensure that all 72 rods had 120 readings (10 years). See [14] for more information about interpolation techniques with time series.
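The pre-processing described above can be sketched in a few lines of Python. This is only an illustration: the study filled gaps with Box & Jenkins models in Statgraphics, whereas the sketch swaps in plain linear interpolation as a much simpler stand-in. The function names and the sample readings are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(readings):
    """Average all readings that fall in the same (year, month).

    `readings` is an iterable of ((year, month), value) pairs.
    """
    buckets = defaultdict(list)
    for key, value in readings:
        buckets[key].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def month_range(start, end):
    """Yield every (year, month) from `start` to `end`, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fill_gaps_linear(series, start, end):
    """Return one value per month, linearly interpolating interior gaps.

    Assumption: linear interpolation replaces the Box & Jenkins models
    used in the study. Months before the first or after the last known
    reading are left as None.
    """
    months = list(month_range(start, end))
    values = [series.get(m) for m in months]
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            values[i] = values[left] + step * (i - left)
    return dict(zip(months, values))


# Hypothetical readings for one rod: two readings in February, none in March.
raw = [((1995, 1), 1.0), ((1995, 2), 1.2), ((1995, 2), 1.4), ((1995, 4), 1.9)]
filled = fill_gaps_linear(monthly_means(raw), (1995, 1), (1995, 4))
```

Here the two February readings collapse to their mean and March is filled from its neighbors, so every month in the range ends up with exactly one value.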

Thus, the input matrix of structural geotechnical instrumentation data (matrix *Q*) is of order *a* × *b*, where *a* is the number of patterns and *b* is the number of attributes. For the structural geotechnical instrumentation data of Itaipu, *a* = 72 (number of patterns) and *b* = 120 (number of attributes).
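As a rough illustration of how a matrix such as *Q* might be standardized before clustering, the sketch below scales every attribute (column) to zero mean and unit standard deviation. Standardizing by column and using the sample standard deviation are assumptions; the text does not state which convention was adopted. The function name and the tiny 3 × 2 matrix are hypothetical.

```python
from statistics import mean, stdev


def standardize_columns(q):
    """Return a copy of `q` with every column scaled to mean 0 and std 1.

    `q` is a list of rows (patterns); columns are attributes.
    Assumption: the sample standard deviation (n - 1 denominator) is used.
    A constant column would cause a division by zero and is not handled.
    """
    columns = list(zip(*q))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in q
    ]


# Hypothetical matrix with a = 3 patterns and b = 2 attributes.
q = [[1.0, 10.0], [2.0, 14.0], [3.0, 12.0]]
z = standardize_columns(q)
```

For the Itaipu data the same function would receive 72 rows of 120 values each.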

Next, Multivariate Analysis was applied and the patterns were grouped using Ward's hierarchical clustering method. The grouping was performed to find groups of similar instruments, with the aim of establishing technical justifications for their formation. In addition, Factor Analysis was applied to the data in order to rank the rods of the extensometers through a weighted average of factor scores. Factor Analysis was then applied within each group formed by the cluster analysis. With groups of instruments showing similar behavior, a ranking was performed within each group to indicate the most relevant instruments, which would be chosen, for example, if readings had to be intensified.
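To make the grouping step concrete, the following is a minimal pure-Python sketch of Ward's method, not the software actually used in the study. It repeatedly merges the two clusters whose union gives the smallest increase in the within-cluster sum of squares. The function name, the stopping rule (a fixed number of clusters) and the sample series are assumptions made for illustration.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's minimum-variance criterion.

    At each step, merge the two clusters A and B with the smallest
        delta(A, B) = |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2
    Returns a list of clusters, each a sorted list of point indices.
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        (ia, ca), (ib, cb) = a, b
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(ia) * len(ib) / (len(ia) + len(ib)) * dist2

    while len(clusters) > n_clusters:
        # Exhaustive search for the cheapest pair (fine for small inputs).
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: merge_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        (ia, ca), (ib, cb) = clusters[i], clusters[j]
        na, nb = len(ia), len(ib)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[i] = (ia + ib, centroid)
        del clusters[j]

    return [sorted(members) for members, _ in clusters]


# Four hypothetical standardized series with three readings each.
series = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [5.0, 5.1, 5.0], [5.1, 5.0, 4.9]]
groups = ward_clusters(series, n_clusters=2)
```

The exhaustive pair search is cubic in the number of patterns, which is acceptable for 72 rods; a library implementation would be preferable for larger data sets.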

#### **5.1. Statistical multivariate analysis**

#### *5.1.1. Factor analysis*

Factor Analysis is a multivariate statistical method whose objective is to explain the correlations among a large set of variables in terms of a small set of unobserved random variables called factors. Suppose the random vector *X* is composed of *p* random variables, *X'* = [*x*<sub>1</sub> *x*<sub>2</sub> *x*<sub>3</sub> ... *x*<sub>p</sub>]. To study the covariance structure of this vector, *X* is observed *n* times, so that its parameters *E(X)* = *μ* and *V(X)* = *Σ* can be estimated and the relationship between the variables represented by the covariance matrix *Σ* or the correlation matrix *ρ*. Factor analysis groups the variables in order to explain the influence of latent (unobserved) variables, or factors. Within a group, the variables are highly correlated with each other, and from one group to another the correlations are low. Each group represents a factor, which is responsible for the observed correlations.

Itaipu Hydroelectric Power Plant Structural Geotechnical Instrumentation


Temporal Data Under the Application of Multivariate Analysis – Grouping and Ranking Techniques 91


The covariance matrix of the vector *X* can be written exactly as *V(X)* = *Σ* = *LL'* + *Ψ*, where the main diagonal of *LL'* holds the communalities. With all *p* factors, the communality of each variable is *h*<sub>i</sub><sup>2</sup> = *l*<sub>i1</sub><sup>2</sup> + *l*<sub>i2</sub><sup>2</sup> + ... + *l*<sub>ip</sub><sup>2</sup>. Considering only the *m* main factors, it becomes *h*<sub>i</sub><sup>2</sup> = *l*<sub>i1</sub><sup>2</sup> + *l*<sub>i2</sub><sup>2</sup> + ... + *l*<sub>im</sub><sup>2</sup>, *i* = 1, 2, ..., *p*. Thus, the communality *h*<sub>i</sub><sup>2</sup> is the part of the variance of the random variable *x*<sub>i</sub> that comes from the *m* factors, and the part due to the remaining *p* - *m* less important factors is called the specific variance *ψ*<sub>i</sub>. Hence, *V(x*<sub>i</sub>*)* = *h*<sub>i</sub><sup>2</sup> + *ψ*<sub>i</sub>, *i* = 1, 2, ..., *p*.
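The decomposition of each variable's variance into communality and specific variance can be checked numerically. The sketch below assumes standardized variables (unit variance) and uses a hypothetical loading matrix with *p* = 3 and *m* = 2; the function names are illustrative only.

```python
def communalities(loadings):
    """h_i^2: sum of squared loadings over the m retained factors."""
    return [sum(l * l for l in row) for row in loadings]


def specific_variances(variances, loadings):
    """psi_i = V(x_i) - h_i^2, the variance not explained by the factors."""
    return [v - h for v, h in zip(variances, communalities(loadings))]


# Hypothetical loadings for p = 3 standardized variables on m = 2 factors.
loadings = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7]]
h2 = communalities(loadings)
psi = specific_variances([1.0, 1.0, 1.0], loadings)
```

In this example the first variable is explained almost entirely by the factors (communality 0.82), while nearly half of the variance of the third one remains specific.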

There are many criteria for defining the number of factors *m*. The most widely used is the Kaiser criterion [15], which suggests that the number of extracted factors should equal the number of eigenvalues greater than one of *Σ* or *ρ*.

If *X* is a random vector with *p* components and parameters *E(X)* = *μ* and *V(X)* = *Σ*, then in the orthogonal factor model *X* depends linearly on a few unobserved random variables *F*<sub>1</sub>, *F*<sub>2</sub>, ..., *F*<sub>m</sub>, called common factors, and on *p* additional sources of variation *ε*<sub>1</sub>, *ε*<sub>2</sub>, ..., *ε*<sub>p</sub>, called errors or specific factors.

The Factor Analysis model is given below, where *μ*<sub>i</sub> is the mean of the *i*-th variable, *ε*<sub>i</sub> is the *i*-th error, or specific factor, *F*<sub>j</sub> is the *j*-th common factor and *l*<sub>ij</sub> is the weight (loading) of the *j*-th factor *F*<sub>j</sub> on the *i*-th variable *x*<sub>i</sub>. Equation 1 shows the model in matrix terms.

$$\begin{cases} x\_1 - \mu\_1 = l\_{11} F\_1 + l\_{12} F\_2 + \dots + l\_{1m} F\_m + \varepsilon\_1 \\ x\_2 - \mu\_2 = l\_{21} F\_1 + l\_{22} F\_2 + \dots + l\_{2m} F\_m + \varepsilon\_2 \\ \dots \\ x\_p - \mu\_p = l\_{p1} F\_1 + l\_{p2} F\_2 + \dots + l\_{pm} F\_m + \varepsilon\_p \end{cases} \qquad i = 1, 2, \dots, p; \quad j = 1, 2, \dots, m; \quad m \le p$$

$$\mathbf{X} = \boldsymbol{\mu} + \mathbf{L} \mathbf{F} + \boldsymbol{\varepsilon} \tag{1}$$

To estimate the loadings *l*<sub>ij</sub> and the specific variances *ψ*<sub>i</sub>, the principal component method can be used, which is briefly described below [15].

Let (*λ*<sub>i</sub>, *e*<sub>i</sub>) be the eigenvalue-eigenvector pairs of the sample covariance matrix *S*, with *λ*<sub>1</sub> ≥ *λ*<sub>2</sub> ≥ ... ≥ *λ*<sub>p</sub> ≥ 0, and let *m* < *p* be the number of common factors. The matrix of estimated loadings is given by *L* = *CD*<sup>1/2</sup>, where *C* is the matrix of the eigenvectors associated with the *m* largest eigenvalues and *D* is the diagonal matrix whose diagonal elements are those eigenvalues.
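The principal component estimate of the loadings, combined with the Kaiser criterion, might be sketched with NumPy as follows. This is an illustrative sketch rather than the Statgraphics procedure used in the study: it works on the correlation matrix, and the function name and the synthetic data (four noisy copies of one common signal) are assumptions.

```python
import numpy as np


def pc_loadings(data):
    """Principal-component estimate of the loadings, L = C D^(1/2).

    `data` has one row per observation and one column per variable.
    The correlation matrix R is used, and the number of factors m is
    chosen by the Kaiser criterion (eigenvalues greater than one).
    """
    r = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(r)  # ascending order
    order = np.argsort(eigenvalues)[::-1]          # largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    m = int(np.sum(eigenvalues > 1.0))
    # Scale each retained eigenvector by the square root of its eigenvalue.
    loadings = eigenvectors[:, :m] * np.sqrt(eigenvalues[:m])
    return loadings, eigenvalues[:m]


# Synthetic data: four variables driven by a single common factor plus noise.
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
data = np.hstack([common + 0.3 * rng.normal(size=(200, 1)) for _ in range(4)])
loadings, eigenvalues = pc_loadings(data)
```

Because the four variables share one strong common signal, only one eigenvalue exceeds one and a single factor is retained. The sign of each loading column is arbitrary, as it is for any eigenvector.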


In applying this method, the observations are first centered or standardized; in other words, the correlation matrix *R* (the estimator of *ρ*) is used in order to avoid problems of scale. The estimated specific variances are given by the diagonal elements of the matrix *S* - *LL'* (or *R* - *LL'* when the correlation matrix is used).

In many applications it is necessary to estimate the value of each (unobserved) factor for an individual observation *X*. These values are called factor scores. When the loadings are estimated by principal components, the estimated factor scores are *F* = (*L'L*)<sup>-1</sup>*L'*(*X* - *X̄*) for the original variables and *F* = (*L'L*)<sup>-1</sup>*L'z* for the standardized variables.
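The factor score formula for standardized variables, *F* = (*L'L*)<sup>-1</sup>*L'z*, can be illustrated with a short NumPy sketch. The loading matrix and scores below are hypothetical, and the observations are generated without noise so that the formula should recover the scores exactly.

```python
import numpy as np


def factor_scores(z, loadings):
    """Least-squares factor scores, F = (L'L)^(-1) L' z, for each row of z.

    `z` holds standardized observations (n rows, p columns) and
    `loadings` is the p x m matrix L. Returns an n x m array.
    """
    ltl = loadings.T @ loadings
    # Solve (L'L) F' = L' z' instead of forming the inverse explicitly.
    return np.linalg.solve(ltl, loadings.T @ z.T).T


loadings = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7]])  # hypothetical L
true_scores = np.array([[1.0, -2.0], [0.5, 0.3]])
z = true_scores @ loadings.T  # noise-free observations, z = L f
scores = factor_scores(z, loadings)
```

Solving the linear system rather than inverting *L'L* is a numerical design choice; both give the same result when *L'L* is well conditioned.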

According to [15], the rotation of factors seeks a structure in which each variable has a high loading on a single factor and low or moderate loadings on the other factors. This leads to a simpler structure to interpret. Kaiser suggested an analytical measure known as the *Varimax* criterion [15] to carry out the rotation.

The rotated loading scaled by the square root of the communality is defined by

$$\tilde{l}\_{ij}^{\*} = \frac{\hat{l}\_{ij}^{\*}}{\hat{h}\_i}$$

*Varimax* selects the orthogonal transformation *T* that makes *V* (given by equation 2) as large as possible; in other words, the procedure starts from the estimated loadings *L* and yields the rotated loadings *L*<sup>\*</sup> = *LT*. Hence, the criterion is to maximize *V*.

$$V = \frac{1}{p} \sum\_{j=1}^{m} \left[ \sum\_{i=1}^{p} \tilde{l}\_{ij}^{\*4} - \frac{1}{p} \left( \sum\_{i=1}^{p} \tilde{l}\_{ij}^{\*2} \right)^{2} \right] \tag{2}$$
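To show what the criterion rewards, the sketch below evaluates *V* for two hypothetical loading matrices: one with a simple structure and the same matrix rotated by 45 degrees so that every variable loads equally on both factors. It only computes *V* for given loadings; searching for the maximizing rotation *T* is not shown. The function name and both matrices are assumptions.

```python
from math import sqrt


def varimax_criterion(loadings):
    """Kaiser's normalized varimax criterion V for a p x m loading matrix.

    Each loading is first divided by the square root of its variable's
    communality; V then adds up, factor by factor, the spread of the
    squared scaled loadings. A variable with zero communality is not
    handled (it would cause a division by zero).
    """
    p = len(loadings)
    m = len(loadings[0])
    h = [sqrt(sum(l * l for l in row)) for row in loadings]
    scaled = [[l / h_i for l in row] for row, h_i in zip(loadings, h)]
    total = 0.0
    for j in range(m):
        squares = [row[j] ** 2 for row in scaled]
        total += sum(s * s for s in squares) - sum(squares) ** 2 / p
    return total / p


a = 1 / sqrt(2)
simple = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # simple structure
mixed = [[a, a], [a, a], [-a, a], [-a, a]]                 # rotated by 45 degrees
v_simple = varimax_criterion(simple)
v_mixed = varimax_criterion(mixed)
```

The simple structure gives the larger value of *V*, which is why maximizing *V* tends to produce loadings that are easier to interpret.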

In Factor Analysis, the communality *h*<sub>i</sub><sup>2</sup> is the portion of the variance of a variable that is attributed to the factors; it represents the percentage of the variation of the variable that is not random but comes from the factors. The criterion used to classify the patterns is to sort the variables (instruments) according to their factor scores. The factor scores thus act as a measure that distinguishes the behavior of each instrument and serve as a practical and simple quality control of the instrument's measurements.

To rank the variables (instruments), a final factor score was used, given by equation (3), where *m* is the number of factors extracted, *λ*<sub>i</sub> are the eigenvalues and *f*<sub>i</sub> are the factor scores.

$$\text{final factor score} = \frac{\sum\_{i=1}^{m} \lambda\_i f\_i}{\sum\_{i=1}^{m} \lambda\_i} \tag{3}$$
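Equation (3) is an eigenvalue-weighted average, which a few lines of Python can illustrate. The eigenvalues and factor scores below are invented for the example, `equip4_2` and `equip7_1` are hypothetical rod names following the naming scheme of the study, and ranking from the highest to the lowest final score is an assumption about the ordering.

```python
def final_factor_score(eigenvalues, scores):
    """Eigenvalue-weighted average of one pattern's m factor scores."""
    if len(eigenvalues) != len(scores):
        raise ValueError("need one factor score per eigenvalue")
    weighted = sum(lam * f for lam, f in zip(eigenvalues, scores))
    return weighted / sum(eigenvalues)


def rank_patterns(eigenvalues, scores_by_pattern):
    """Sort pattern names by final factor score, highest first."""
    finals = {
        name: final_factor_score(eigenvalues, f)
        for name, f in scores_by_pattern.items()
    }
    return sorted(finals, key=finals.get, reverse=True)


# Hypothetical eigenvalues (m = 2) and factor scores for three rods.
eigenvalues = [3.0, 1.0]
scores = {
    "equip4_1": [1.0, 0.0],
    "equip4_2": [0.0, 2.0],
    "equip7_1": [-1.0, 1.0],
}
ranking = rank_patterns(eigenvalues, scores)
```

Because the first eigenvalue is three times the second, a rod's score on the first factor dominates its final score.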

The Factor Analysis was carried out with the aid of the *Statgraphics* software [13].
