*5.1.2. Cluster analysis*

Clustering groups patterns so that those within the same group are very similar to each other and different from the patterns in other groups. According to [16], cluster analysis is an analytical technique used to develop meaningful subgroups of objects. Its objective is to classify the objects into a small number of mutually exclusive groups. According to [17], it is important to favor a small number of groups in cluster analysis.

Itaipu Hydroelectric Power Plant Structural Geotechnical Instrumentation

Temporal Data Under the Application of Multivariate Analysis – Grouping and Ranking Techniques 93

Clustering algorithms can be divided into categories in many ways, according to their characteristics. The two main classes are the hierarchical methods and the nonhierarchical methods.

The hierarchical methods include techniques that link the items progressively so as to obtain various levels of clustering. They can be subdivided into divisive and agglomerative methods. The agglomerative hierarchical method initially treats each pattern as its own group and iteratively merges the two most similar groups into a new group, until a single group contains all patterns. On the other hand, the divisive hierarchical method starts with a single group and performs successive subdivisions [18].

The most popular hierarchical clustering methods are Single Linkage, Complete Linkage, Average Linkage and Ward's Method. A hierarchical clustering is most commonly represented by a dendrogram, which shows the clustering of the patterns and the levels of similarity at which the groups are formed. A dendrogram can be cut at different levels, yielding different groups [19].

In the dendrogram (figure 6), two groups can be seen when a cut is made at the level indicated in the figure: the first composed of patterns *P1, P2* and *P5*, and the second composed of patterns *P3* and *P4*.

**Figure 6.** Example of dendrogram.
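The agglomerative procedure and the dendrogram cut described above can be sketched in a few lines of Python. This is only an illustration: the coordinates of *P1* to *P5* are hypothetical values chosen to reproduce the grouping of the example, single linkage is assumed as the joining rule, and stopping at `k` groups stands in for cutting the dendrogram at the corresponding level.

```python
import math

def single_linkage(points, k):
    """Merge the two closest groups repeatedly until k groups remain."""
    # Agglomerative start: every pattern is its own group.
    groups = [[name] for name in points]

    def group_distance(a, b):
        # Single linkage: distance between the closest members of two groups.
        return min(math.dist(points[p], points[q]) for p in a for q in b)

    while len(groups) > k:
        # Find the pair of groups with the smallest dissimilarity.
        i, j = min(
            ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
            key=lambda ij: group_distance(groups[ij[0]], groups[ij[1]]),
        )
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return [sorted(g) for g in groups]

# Hypothetical coordinates, not taken from the study.
patterns = {
    "P1": (1.0, 1.0),
    "P2": (1.2, 0.9),
    "P3": (5.0, 5.0),
    "P4": (5.1, 4.8),
    "P5": (0.8, 1.3),
}
groups = single_linkage(patterns, k=2)
print(groups)
```

Stopping at a different `k` corresponds to cutting the dendrogram at another level and yields a different set of groups.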

Nonhierarchical, or partitioning, methods seek a partition directly, without the need for hierarchical associations. A partition of the elements into *k* groups is selected by optimizing some criterion [18].

The best-known nonhierarchical method is the *k*-means clustering method [15]. Normally, the *k* clusters it finds are of better quality than the *k* clusters produced by hierarchical methods. Partitioning methods are advantageous in applications that involve larger data sets.
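As an illustration of how a partitioning method works, the following is a minimal sketch of *k*-means using the common alternating assignment and update steps. The data points and the initial centroids are hypothetical, and the function name `k_means` is introduced here only for the example; it is not the implementation used in the study.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means from given initial centroids (illustrative sketch)."""
    for _ in range(iterations):
        # Assignment step: each pattern goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical two-dimensional patterns and starting centroids.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 3.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]
final_centroids, final_clusters = k_means(data, [(1.0, 1.0), (8.0, 8.0)])
print(final_centroids)
print(final_clusters)
```

Each pass costs time proportional to the number of patterns times *k*, with no pairwise distance matrix to store, which is one reason partitioning methods scale to larger data sets.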

Methods from the field of Multivariate Statistics were used because they are well established. Multivariate statistical analysis is an old approach that has only recently become feasible, thanks to fast and inexpensive computation.

The clustering of patterns is based on measures of similarity and dissimilarity. A similarity measure evaluates how alike the objects are: the higher its value, the more similar the objects. The best-known similarity measure is the correlation coefficient. A dissimilarity measure evaluates how different the objects are: the higher its value, the less similar the objects. The best-known dissimilarity measure is the Euclidean distance.
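The two measures can be compared on a small example. The sketch below assumes the Pearson form of the correlation coefficient and uses three hypothetical series; it shows that the two measures can disagree, since correlation looks at the shape of the series while Euclidean distance looks at their absolute values.

```python
import math

def euclidean_distance(x, y):
    """Dissimilarity: larger values mean less similar objects."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation(x, y):
    """Similarity (Pearson correlation): larger values mean more similar objects."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical series.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]  # same trend as a, different scale
c = [4.0, 3.0, 2.0, 1.0]  # opposite trend, same range as a

print(correlation(a, b), euclidean_distance(a, b))
print(correlation(a, c), euclidean_distance(a, c))
```

Here `a` and `b` are perfectly correlated yet farther apart in Euclidean distance than `a` and `c`, so the choice of measure determines which objects count as close.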

According to [15], Ward's method joins two clusters based on the "loss of information", for which the criterion is the sum of squared errors *(SQE)*. For each cluster *i*, the mean (or centroid) of the cluster is calculated, along with the cluster's sum of squared errors *(SQEi)*, which is the sum of the squared errors of each pattern in the cluster relative to the mean. For *k* clusters there are *SQE1, SQE2, ..., SQEk*, where *SQE* is defined by equation 4.

$$\text{SQE} = \text{SQE}_1 + \text{SQE}_2 + \dots + \text{SQE}_k \tag{4}$$
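For reference, the per-cluster term can be written out explicitly. The notation below is an assumption made for illustration and does not appear in the original: cluster $i$ holds $n_i$ patterns $x_{ij}$, and $\bar{x}_i$ is its mean (centroid). Under the usual definition of the sum of squared errors,

$$\text{SQE}_i = \sum_{j=1}^{n_i} \lVert x_{ij} - \bar{x}_i \rVert^2, \qquad \bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}$$

so a cluster containing a single pattern has $\text{SQE}_i = 0$, and the total grows as clusters become more heterogeneous.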

For each pair of clusters *m* and *n*, the mean (or centroid) of the cluster that would be formed (cluster *mn*) is calculated first. Then the sum of squared errors of cluster *mn* (*SQEmn*) is calculated, and the total *SQE* is updated according to equation 5.

$$\text{SQE} = \text{SQE}_1 + \text{SQE}_2 + \dots + \text{SQE}_k - \text{SQE}_m - \text{SQE}_n + \text{SQE}_{mn} \tag{5}$$

The clusters *m* and *n* whose merge yields the smallest increase in the sum of squared errors *(SQE)* (the smallest loss of information) are joined. According to [16], this method tends to produce clusters of the same size because of the reduction of their internal variation.
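One merge step of Ward's method, as described above, can be sketched as follows. The increase caused by joining clusters *m* and *n* is taken as *SQEmn* − *SQEm* − *SQEn*, which is the difference between equations 5 and 4. The clusters are hypothetical, and the helper names (`centroid`, `sqe`, `best_ward_merge`) are introduced only for this example.

```python
def centroid(cluster):
    """Mean of the patterns in a cluster, coordinate by coordinate."""
    return tuple(sum(v) / len(cluster) for v in zip(*cluster))

def sqe(cluster):
    """Sum of squared errors of the patterns relative to the cluster mean."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)

def best_ward_merge(clusters):
    """Return (increase, m, n) for the pair whose merge raises total SQE least."""
    best = None
    for m in range(len(clusters)):
        for n in range(m + 1, len(clusters)):
            merged = clusters[m] + clusters[n]
            increase = sqe(merged) - sqe(clusters[m]) - sqe(clusters[n])
            if best is None or increase < best[0]:
                best = (increase, m, n)
    return best

# Hypothetical clusters of two-dimensional patterns.
clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(1.0, 1.0)],
    [(10.0, 10.0), (10.0, 12.0)],
]
increase, m, n = best_ward_merge(clusters)
total_before = sum(sqe(c) for c in clusters)  # equation 4
total_after = total_before + increase         # equation 5
print(m, n, increase, total_before, total_after)
```

Repeating this step, replacing clusters `m` and `n` by their union each time, until one cluster remains gives the full agglomerative sequence that a dendrogram displays.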

Cluster analysis was carried out with the *Statgraphics* software [13]. The dissimilarity measure used was the Euclidean distance, and the data were standardized.
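The effect of standardizing before computing Euclidean distances can be illustrated with a short sketch. It assumes the common z-score form (subtract the column mean, divide by the sample standard deviation); the exact standardization applied by *Statgraphics* is not stated in the text, and the readings below are hypothetical.

```python
import math
import statistics

def standardize(rows):
    """Rescale each column to zero mean and unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]  # assumption: sample std
    return [
        tuple((value - mu) / sd for value, mu, sd in zip(row, means, stdevs))
        for row in rows
    ]

# Hypothetical readings: two variables measured on very different scales.
readings = [(0.10, 1200.0), (0.12, 1500.0), (0.30, 1250.0), (0.32, 1550.0)]
z = standardize(readings)

# Without standardization the second variable dominates the distance.
raw_distance = math.dist(readings[0], readings[2])
standardized_distance = math.dist(z[0], z[2])
print(raw_distance, standardized_distance)
```

After standardization every variable contributes on a comparable scale, so no single instrument's unit of measurement drives the grouping.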

#### **6. Status**

92 Multivariate Analysis in Management, Engineering and the Sciences

This research was carried out during the doctoral studies of the first author (Rosangela Villwock), from 2005 to 2009, in the Graduate Program in Numerical Methods in Engineering of the Federal University of Paraná, supervised by the second author of this text (Maria Teresinha Arns Steiner). The study was part of a project led by the third author (Andrea Sell Dyminski), called "*Analise de Incertezas e Estimação de Valores de Controle para o Sistema de Monitoração Geotécnico-estrutural na Barragem de Itaipu*" (Uncertainty Analysis and Estimation of Control Values for the Geotechnical-Structural Monitoring System of the Itaipu Dam). The whole research process had the collaboration of the fourth author (Anselmo Chaves Neto), who also supervised it.

showed to the technician. However, in cluster 2, three rods of extensometers installed in joint B were observed, and in cluster 3, three rods of extensometers installed in the basaltic rocks B and C and in the lithological contact B/C were observed.

**Figure 7.** Dendrogram showing the formation of the clusters in different types of cuts (Ward's method).

Figure 8 shows the plot of all the extensometer rods during the study period. The lines are colored according to the cluster to which the rods belong (black, blue and yellow for clusters 1, 2 and 3, respectively). The distinction between the clusters is noticeable, but it is not easily recognized without previous knowledge of these three clusters. The task would not be possible if a larger set of data had to be analyzed, hence the importance of this type of analysis.

Cluster 1, which is composed of extensometer rods installed upstream of the dam, clearly shows the effects of summer and winter. Clusters 2 and 3 are separated by their absolute measures. This separation can be explained by the fact that the rods are in different conditions, more superficial in cluster 2 and deeper in cluster 3. Since the readings of the most superficial rods and the readings of the deepest rods are summed, these measures are larger.

Table 2 shows the most important rods for each of the eight factors, that is, the rods dominating each factor. Notice that in table 2 factor 2 is dominated by the rods equip1\_1, equip1\_2, equip4\_1, equip4\_2, equip6\_1, equip6\_2, equip8\_1, equip8\_3, equip21\_1, equip21\_2, equip25\_3, equip26\_2 and equip31\_1. This factor has 10 of the 11 rods that are part

As mentioned before, the aim of this paper is to identify the instruments that are most significant for analyzing the behavior of dams. There are no records of methods that rank dam-monitoring instruments. To achieve this aim, it is necessary to select, cluster and rank the geotechnical-structural instruments of a power plant, in our case the Itaipu Hydroelectric Power Plant, with a view to maximizing the effectiveness and efficiency of the analysis of the readings. Should the readings need to be intensified, this ranking could help define which instruments to choose.

The instruments are chosen with no previous knowledge of their location, features or characteristics. It is therefore possible to consider applying the methodology when making decisions about the automation of additional instruments. Similar approaches can be used in many other cases, since Brazil has hundreds of large civil engineering works that rely on instrumentation systems whose data must be treated appropriately.
