#### 2. Criteria for stability of clustering

Let a sample of objects Χ = {x_i}, i = 1, 2, …, m, x_i ∈ R^n, be given, and let Κ = {K_1, K_2, …, K_l} be a clustering of the sample into l clusters obtained by some method: K_i ⊆ Χ, i = 1, 2, …, l, ∪_{i=1}^{l} K_i = Χ, K_i ∩ K_j = ∅ for i ≠ j. Speaking of clustering, we mean applying a method to a sample without focusing on the method itself. Is the partition Κ produced by this method really a clustering, or is merely some stopping criterion satisfied (for example, an extremum of some functional is reached, or the maximum number of iterations is exhausted)? We will use the following thesis as the main one: if the resulting partition Κ is indeed a clustering, then it must remain the same clustering under any minimal change of the sample Χ. Let x_i be arbitrary, x_i ∈ K_α; then for the sample Χ∖{x_i} the partition Κ*(x_i) = {K*_1, K*_2, …, K*_l}, where K*_j = K_j for j = 1, 2, …, l, j ≠ α, and K*_α = K_α∖{x_i}, must be a clustering for every i = 1, 2, …, m. The "coincidence" of the clusterings Κ = {K_1, K_2, …, K_l} and Κ*(x_i) = {K*_1, K*_2, …, K*_l} will be called identity, the clusterings themselves are called identical, and we write Κ*(x_i) ≈ Κ.
In this case, it is natural to call the partition Κ a stable clustering if the partitions Κ*(x_i) and Κ coincide for all x_i, i = 1, 2, …, m. If some individual Κ*(x_i) is not identical to Κ, we will call Κ a quasi-clustering.

Definition 1. The quality of a quasi-clustering (an unstable clustering) is the quantity Ф(Κ) = |{x_i, i = 1, 2, …, m : Κ*(x_i) ≈ Κ}| / m.

If Ф(Κ) = 1, we speak of a stable clustering Κ, or simply a clustering. Suppose that for some i, i = 1, 2, …, m, the condition Κ*(x_i) ≈ Κ is not satisfied, and let Κ°(x_i) = {K°_1, K°_2, …, K°_l} be the clustering of the sample Χ∖{x_i} obtained by the method with Κ*(x_i) as the initial approximation. Then Κ°(x_i) can differ significantly from Κ*(x_i). As a measure of proximity between the clustering Κ°(x_i) and the partition Κ, we use the value d(Κ°(x_i), Κ) = max_α Σ_{t=1}^{l} |K°_t ∩ K_{α(t)}| / (m − 1), where the maximum is taken over one-to-one correspondences α between the clusters of the two partitions. Note that computing this proximity requires finding a maximum matching in a bipartite graph, for which a polynomial algorithm exists [16]. If Κ°(x_i) does not exist, we set d(Κ°(x_i), Κ) = 0.
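Computing d amounts to the maximum-weight matching between the clusters of the two partitions (the bipartite-graph problem mentioned above); `scipy.optimize.linear_sum_assignment` solves it in polynomial time. A minimal sketch, assuming cluster labels are encoded 0..l−1; the function name is ours:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def proximity(labels_a, labels_b, l, m):
    """d(K_a, K_b): best total overlap between matched clusters of two
    labelings of the reduced sample (x_i deleted), divided by m - 1."""
    overlap = np.zeros((l, l), dtype=int)         # overlap[p, q] = |K_a_p ∩ K_b_q|
    for a, b in zip(labels_a, labels_b):
        overlap[a, b] += 1
    rows, cols = linear_sum_assignment(-overlap)  # maximum-weight matching
    return overlap[rows, cols].sum() / (m - 1)
```

For identical partitions the matched overlaps sum to m − 1, so d = 1.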

Definition 2. The quality Fmin(Κ) of the quasi-clustering Κ is the quantity Fmin(Κ) = min_i d(Κ°(x_i), Κ).

Definition 3. The quality Favr(Κ) of the quasi-clustering Κ is the quantity Favr(Κ) = (1/m) Σ_{i=1}^{m} d(Κ°(x_i), Κ).
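Definitions 1–3 can be evaluated with a direct leave-one-out loop. A sketch under assumptions: the clustering method is available as a callable `cluster(X_reduced, init)` started from Κ*(x_i) as the initial approximation, labels are encoded 0..l−1, and all names are ours:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def d(a, b, l, m):
    # proximity of two labelings of the reduced sample: best total
    # overlap between matched clusters, divided by m - 1
    w = np.zeros((l, l), dtype=int)
    for p, q in zip(a, b):
        w[p, q] += 1
    r, c = linear_sum_assignment(-w)
    return w[r, c].sum() / (m - 1)

def stability_criteria(X, labels, cluster, l):
    """Leave-one-out estimates of Ф(K), F_min(K), F_avr(K)."""
    m = len(X)
    identical, dists = 0, []
    for i in range(m):
        keep = np.arange(m) != i
        init = labels[keep]                       # K*(x_i)
        new = np.asarray(cluster(X[keep], init))  # K°(x_i)
        if np.array_equal(new, init):
            identical += 1
        dists.append(d(new, init, l, m))
    return identical / m, min(dists), sum(dists) / m
```

With a method that reproduces its partition on every reduced sample, all three criteria equal 1.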

For some clustering algorithms there are simple, economical rules for computing Ф(Κ). We present them below (see also [3, 17, 18]).

#### 2.1. Method of minimizing the dispersion criterion

It is known that, in order to minimize the dispersion criterion, it suffices to satisfy the inequalities

$$\frac{n_j}{n_j - 1}\,\|x' - m_j\|^2 - \frac{n_k}{n_k + 1}\,\|x' - m_k\|^2 \le 0 \tag{1}$$

for any clusters K_j and K_k and arbitrary x' ∈ K_j, where n_j = |K_j| and m_j = (1/n_j) Σ_{x_t ∈ K_j} x_t.

We now establish conditions for the identity Κ*(x_i) ≈ Κ of the partitions Κ*(x_i) and Κ. Let x_i ∈ K_j be the deleted object. In view of (1), for Κ*(x_i) ≈ Κ to hold, the inequalities

$$\frac{n_j - 1}{n_j - 2}\,\|x' - m_j\|^2 + \frac{2}{n_j - 2}\,\big(x' - m_j,\; x_i - m_j\big) + \frac{\|x_i - m_j\|^2}{(n_j - 1)(n_j - 2)} - \frac{n_k}{n_k + 1}\,\|x' - m_k\|^2 \le 0$$

must be satisfied for x' ∈ K_j, x' ≠ x_i, and the inequalities

$$\frac{n_k}{n_k - 1}\,\|x' - m_k\|^2 - \frac{n_j - 1}{n_j}\,\|x' - m_j\|^2 - \frac{2}{n_j}\,\big(x' - m_j,\; x_i - m_j\big) - \frac{\|x_i - m_j\|^2}{n_j(n_j - 1)} \le 0$$

must be satisfied for x' ∈ K_k, k ≠ j. Both are obtained by applying (1) to the reduced sample, using the fact that the mean of K_j∖{x_i} equals m_j + (m_j − x_i)/(n_j − 1).
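Both checks, like inequality (1) itself, use only the cluster sizes n_j and the means m_j, so local optimality of a partition under the dispersion criterion can be verified mechanically. A sketch of the check of (1) (the function name, 0..l−1 label encoding, and tolerance are assumptions):

```python
import numpy as np

def is_locally_optimal(X, labels, l, tol=1e-12):
    """Check inequality (1) for every object and every pair of clusters."""
    n = np.array([(labels == j).sum() for j in range(l)])
    means = np.array([X[labels == j].mean(axis=0) for j in range(l)])
    for i, x in enumerate(X):
        j = labels[i]
        if n[j] < 2:
            continue                               # singleton cluster: no move test
        lhs = n[j] / (n[j] - 1) * np.sum((x - means[j]) ** 2)
        for k in range(l):
            if k == j:
                continue
            rhs = n[k] / (n[k] + 1) * np.sum((x - means[k]) ** 2)
            if lhs - rhs > tol:                    # moving x to K_k would help
                return False
    return True
```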

#### 2.2. k-means method



222 Recent Applications in Data Clustering


Let the clustering Κ be obtained by the k-means method, that is, ‖x' − m_j‖ ≤ ‖x' − m_k‖ for all j ≠ k and all x' ∈ K_j. In the case of equality, the object is assigned to the cluster with the lower number. Let x_i ∈ K_j be the deleted object. Then Κ*(x_i) ≈ Κ is satisfied if

$$\|x' - m_j\|^2 + \frac{2}{n_j - 1}\,\big(x' - m_j,\; x_i - m_j\big) + \frac{1}{(n_j - 1)^2}\,\|x_i - m_j\|^2 \le \|x' - m_k\|^2$$

for x' ∈ K_j, x' ≠ x_i, and

$$\|x' - m_k\|^2 \le \|x' - m_j\|^2 + \frac{2}{n_j - 1}\,\big(x' - m_j,\; x_i - m_j\big) + \frac{1}{(n_j - 1)^2}\,\|x_i - m_j\|^2$$

for x' ∈ K_k (the three-term expressions equal ‖x' − m_j*‖², where m_j* is the mean of K_j∖{x_i}).
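These conditions let one test Κ*(x_i) ≈ Κ from the stored sizes and means alone, without re-running k-means on Χ∖{x_i}. A sketch (the function name and 0..l−1 label encoding are assumptions; ties are treated with strict inequalities rather than the lower-number rule):

```python
import numpy as np

def kmeans_identity_after_deletion(X, labels, i, l):
    """Does K*(x_i) ≈ K hold for a k-means partition (Section 2.2)?"""
    j = labels[i]
    nj = (labels == j).sum()
    if nj < 2:
        return False                      # deleting x_i would empty its cluster
    means = np.array([X[labels == c].mean(axis=0) for c in range(l)])
    # mean of K_j after deleting x_i: m_j* = m_j + (m_j - x_i)/(n_j - 1)
    mj_star = means[j] + (means[j] - X[i]) / (nj - 1)
    for t in range(len(X)):
        if t == i:
            continue
        x = X[t]
        dj = np.sum((x - mj_star) ** 2)
        for k in range(l):
            if k == j:
                continue
            dk = np.sum((x - means[k]) ** 2)
            if labels[t] == j and dj > dk:
                return False              # x would leave K_j*
            if labels[t] == k and dk > dj:
                return False              # x would move into K_j*
    return True
```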

#### 2.3. Method of hierarchical agglomeration grouping

We confine ourselves to the case of agglomerative hierarchical grouping. To find the value of the criterion Ф(Κ), one can compute the partition Κ and the partitions Κ°(x_i), i = 1, 2, …, m, and compare Κ with each Κ°(x_i). Here it is possible to save computation of Ф(Κ) by not carrying the clustering through for some of the i. Indeed, let Κ^t(x_i) = {K^t_1, K^t_2, …, K^t_{m−t}} be the clustering of the sample Χ∖{x_i} into m − t clusters at step t, t ≤ m − l, where Κ is the partition obtained by the clustering algorithm on Χ. The main property of hierarchical grouping is that for any k = 1, 2, …, m − t there is j = 1, 2, …, m − t − 1 for which K^t_k ⊆ K^{t+1}_j. Consequently, if at some step t, t ≤ m − l, for some k the condition K^t_k ⊆ K_j fails for all j = 1, 2, …, l, then the condition Κ*(x_i) ≈ Κ cannot be fulfilled, and the process for this i can be stopped.
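The early-termination rule reduces to a nestedness test: as soon as some intermediate cluster fits inside no final cluster, Κ*(x_i) ≈ Κ is impossible for that i. A minimal sketch, assuming clusters are recorded as Python sets after each agglomeration step (names are ours):

```python
def nested_in(partition_steps, K):
    """Early-termination test of Section 2.3: after every agglomeration
    step, each current cluster must lie inside some cluster of the final
    partition K; otherwise the identity is already impossible."""
    for step in partition_steps:
        for cluster in step:
            if not any(cluster <= K_j for K_j in K):
                return False
    return True
```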

#### 2.4. Examples

We give some examples illustrating the stability criteria introduced.

1. Below are results obtained for model samples. The clustering method based on minimization of the dispersion criterion [3] was used. As initial data, we used samples from a mixture of two two-dimensional normal distributions with independent features and various a and σ. Examples are shown in Figures 1–3 (images of the samples in question) and in Tables 1 and 2. Figure 1 represents a sample of 200 objects for which all the criteria Ф(Κ), Fmin(Κ), Favr(Κ) are equal to 1, and the resulting clustering into two clusters is a stable clustering. Here we used distributions with parameters a1 = (0, 0), a2 = (9, 9), and σ1 = σ2 = (3, 3).

Further, with the same parameters a1 and a2, experiments were carried out for σ1 = σ2 = (5, 5).

Then we used distributions with parameters a1 = (0, 0), a2 = (9, 9), σ1 = σ2 = (10, 10), m = 200. This is a case of strongly intersecting distributions. Formally, the clustering method gives a quasi-clustering, approximately corresponding to a partition of the original sample (Figure 3) into two sets by the diagonal from the upper left corner of the picture to the lower right. The criteria values obtained are given in Table 2.

Figure 2. Clustering in a task with parameters a1 = (0, 0), a2 = (9, 9), σ1 = σ2 = (5, 5), m = 200.

Collective Solutions on Sets of Stable Clusterings http://dx.doi.org/10.5772/intechopen.76189 225

Figure 3. Data with parameters a1 = (0, 0), a2 = (9, 9), σ1 = σ2 = (10, 10), m = 200.

Table 1. Values of quasi-clustering criteria.

| Ф(Κ) | Fmin(Κ) | Favr(Κ) |
|------|---------|---------|
| 0.995 | 0.995 | 0.999 |

2. Clustering of the data from [19] and the criteria values Ф(Κ), Fmin(Κ), Favr(Κ). The following data from a classification problem for electromagnetic signals were considered: n = 34, m1 = 225, m2 = 126, l = 2. We give the values of the stability criteria obtained. Figure 4 shows a visualization [3] of the sample. The accuracy of supervised classification methods on these data was about 87% correct answers. However, the clustering of the data turned out to be only a quasi-clustering (Table 3).

Figure 1. Clustering in a task with parameters a1 = (0, 0), a2 = (9, 9), σ1 = σ2 = (3, 3), m = 200.



#### 3. Committee synthesis of ensemble clustering

The problem is as follows. There are N clusterings with the same number of clusters. How does one choose a single clustering among them, or build a new clustering from those available? In the supervised classification problem there is a criterion by which one can choose an algorithm from existing ones, or build a new algorithm with the help of a collective solution of a set of algorithms: the supervised classification error. This direction in classification theory appeared in the early 1970s [20, 21]; then an algebraic approach was created [22], and various correctors appeared. The key point of the algebraic approach is the construction, in the form of special algebraic polynomials over a set of supervised classification algorithms, of a correct (error-free) algorithm; algebraic operations on matrices of "degrees of belonging" of the recognized objects are used. Various types of correctors were also created [22–25], in which the problem of constructing (and applying) the best algorithm is likewise solved in two stages: first the supervised classification algorithms are determined, and then the corrector. This can be, for example, the problem of approximating a given partial Boolean function by some monotone function. In recent decades there have been conferences on multiple classifier systems, and these issues are reflected in the books [10, 21]. How does one choose or create the best clustering using a finite set of given solutions? Here all difficulties stem primarily from the absence of a single generally accepted criterion. Each clustering algorithm finds "source" clusters of objects that are "equivalent" to each other. In this chapter, it is proposed to build a clustering of the initial data whose clusters have a large intersection with the initial clusters.

Let a sample of objects Χ = {x_1, x_2, …, x_m}, x_i ∈ R^n, and l classes for supervised classification be given. In the theory of supervised classification, the following definition of a supervised classification algorithm exists [21]. Let α_ij ∈ {0, 1} be equal to 1 when the object x_i, i = 1, 2, …, m, is classified by the algorithm A_r as x_i ∈ K_j, and 0 otherwise: A_r(Χ) = ‖α_ij‖_{m×l}. Here intersection of classes is allowed. Unlike the supervised classification problem, when clustering a sample we have freedom in the designation of clusters.

Definition 4. The matrices I = ‖α_ij‖_{m×l}, α_ij ∈ {0, 1}, and I′ = ‖α′_ij‖_{m×l}, α′_ij ∈ {0, 1}, are said to be equivalent if they are equal up to a permutation of the columns.
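Equivalence in the sense of Definition 4 can be tested by brute force over the l! column permutations, which is adequate for small l (for large l one would instead match columns via an assignment problem). The function name is an assumption:

```python
from itertools import permutations
import numpy as np

def equivalent(I1, I2):
    """Definition 4: equal up to a column permutation (brute force over l!)."""
    I1, I2 = np.asarray(I1), np.asarray(I2)
    if I1.shape != I2.shape:
        return False
    l = I1.shape[1]
    return any(np.array_equal(I1[:, list(p)], I2) for p in permutations(range(l)))
```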

Clearly, this definition assigns to each matrix a class of equivalent matrices.

Definition 5. A clustering algorithm is an algorithm that maps a sample Χ to a class of equivalent information matrices: A^c(Χ) = Κ(‖α_ij‖_{m×l}).

The number of clusters and the length of the control sample are considered given. This definition emphasizes the fact that in an arbitrary partition of a sample into l clusters we have complete freedom in the numbering of the clusters. In what follows, we always consider matrices of dimension m × l.

Let there be given N clustering algorithms A^c_1, A^c_2, …, A^c_N and their solutions A^c_ν(Χ) = Κ(‖α^ν_ij‖_{m×l}) for the sample Χ. We denote by I_ν = ‖α^ν_ij‖_{m×l} an arbitrary element of the class Κ(‖α^ν_ij‖_{m×l}).

Therefore, we have Ι = Κ(I_1) × Κ(I_2) × … × Κ(I_N), that is, the set Ι = {(I′_1, I′_2, …, I′_N) : I′_ν ∈ Κ(I_ν)}, I′_ν = ‖α′^ν_ij‖_{m×l}.

There are two problems:

1. Construction of the mapping of Ι onto Κ_c, Ι → Κ_c = {Κ(‖c_ij‖_{m×l}), c_ij ∈ {0, 1}} (that is, the construction of some kind of clustering).
2. Finding the optimal element in Κ_c (i.e., finding the best clustering in Κ_c).


Table 2. Values of quasi-clustering criteria: the case of strongly intersecting distributions.

| Ф(Κ) | Fmin(Κ) | Favr(Κ) |
|------|---------|---------|
| 0.770 | 0.995 | 0.998 |

Table 3. Values of the criteria Ф(Κ), Fmin(Κ), Favr(Κ) in the "ionosphere" problem.

| Ф(Κ) | Fmin(Κ) | Favr(Κ) |
|------|---------|---------|
| 0.966 | 0.997 | 0.999 |

Figure 4. Data visualization.



Definition 6. An operator Β(I′_1, I′_2, …, I′_N) = B = ‖b_ij‖_{m×l} is called an adder if b_ij = Σ_{ν=1}^{N} α′^ν_ij.

It is clear that 0 ≤ b_ij ≤ N and b_ij ∈ {0, 1, 2, …, N}.

Definition 7. An operator r is called a threshold decision rule if r(B) = С = ‖c_ij‖_{m×l}, where c_ij = 1 if b_ij ≥ δ_i and c_ij = 0 otherwise, with δ_i ∈ R.
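Definition 7 is a rowwise comparison of B against the thresholds δ_i; a one-line sketch (names assumed):

```python
import numpy as np

def threshold_rule(B, delta):
    """Definition 7: c_ij = 1 if b_ij >= delta_i, else 0 (delta_i per row)."""
    B = np.asarray(B)
    return (B >= np.asarray(delta).reshape(-1, 1)).astype(int)
```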


Definition 8. The committee synthesis of an information matrix С on an element Ĩ′ = (I′_1, I′_2, …, I′_N) is its computation by the formula С = rΒ(Ĩ′), provided that Β is the adder and r is the threshold decision rule.
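A sketch of the pipeline under stated assumptions: each labeling is converted to an information matrix, aligned to the first one by a maximum-overlap column matching (one concrete way to pick the representatives I′_ν; the chapter leaves the choice of Ĩ′ open), summed by the adder Β, and passed through the threshold rule r. A common threshold δ = N/2 gives a majority vote. All names are ours:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def to_matrix(labels, l):
    """Information matrix of a labeling: alpha_ij = 1 iff x_i is in K_j."""
    I = np.zeros((len(labels), l), dtype=int)
    I[np.arange(len(labels)), labels] = 1
    return I

def committee_synthesis(labelings, l, delta):
    """C = rB(I~'): align, add (the adder B), threshold (the rule r)."""
    mats = [to_matrix(lab, l) for lab in labelings]
    ref = mats[0]
    B = ref.copy()
    for I in mats[1:]:
        w = ref.T @ I                      # w[j, k] = overlap of ref column j, I column k
        r, c = linear_sum_assignment(-w)   # maximum-overlap column matching
        B += I[:, c]                       # aligned representative I'_nu
    return (B >= delta).astype(int)        # threshold decision rule
```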

The general scheme of collective synthesis is shown in Figure 5.

We note that the total number of possible values <sup>B</sup> is bounded from above by a quantity ð Þl! <sup>N</sup>. Let s be the operator that performs permutation of columns of matrices m � l with the help of a substitution < j <sup>1</sup>, j2, …, jl >, S ¼ f gs is the set of all operators s. We believe that rs ¼ sr, ∀s ∈S. We continue s∈ S to the n-dimensional case σ ~I <sup>0</sup> � � <sup>¼</sup> s I<sup>0</sup> 1 � �;s I<sup>0</sup> 2 � �;…;s I<sup>0</sup> n � � � � . We denote Σ ¼ f gσ , σ is the extension of s. From the definition of the adder it follows that <sup>σ</sup><sup>Β</sup> <sup>¼</sup> <sup>Β</sup>σ, <sup>∀</sup>σ<sup>∈</sup> <sup>Σ</sup>. Further, <sup>∀</sup>~<sup>I</sup> 0 ∈Ι, ∀σ∈Σ we have rΒ σ ~I <sup>0</sup> � � � � <sup>¼</sup> <sup>r</sup><sup>σ</sup> <sup>Β</sup> <sup>~</sup><sup>I</sup> <sup>0</sup> � � � � <sup>¼</sup> s r<sup>Β</sup> <sup>~</sup><sup>I</sup> <sup>0</sup> � � � � and finally σ ~I <sup>0</sup> � �; <sup>σ</sup>∈<sup>Σ</sup> n o! rΒ s rΒ ~I <sup>0</sup> � � � � ; <sup>s</sup>∈<sup>S</sup> n o <sup>¼</sup> <sup>Κ</sup> <sup>r</sup><sup>Β</sup> <sup>~</sup><sup>I</sup> <sup>0</sup> � � � � <sup>¼</sup> <sup>Κ</sup> <sup>k</sup>cijk<sup>m</sup>�<sup>l</sup> � � . Therefore, the product rΒ defines the desired mapping and specifies some ensemble clustering. It is necessary to determine the optimal element from Κc, find it and ~I 0 .

$$\mathbf{I} \xrightarrow{r\mathbf{B}} \mathbb{K}\_{\circ} \ A\_{\widetilde{I}}^{\circ}(\mathsf{X}) = \mathsf{K}\left(r\mathsf{B}\left(\widetilde{I}\right)\right).$$

Figure 5. Scheme of committee synthesis.

We introduce definitions of potentially best and worst-case solutions. As the "ideal" of the collective solution, we will consider the case when all algorithms give us essentially the same partitions or coverings.

Definition 9. A numerical matrix <sup>k</sup>bijk<sup>m</sup>�<sup>l</sup> is called contrasting if bij <sup>∈</sup> f g <sup>0</sup>; <sup>N</sup> . A numeric matrix <sup>k</sup>bijk<sup>m</sup>�<sup>l</sup> is called blurred if bij <sup>¼</sup> <sup>δ</sup><sup>i</sup> <sup>∈</sup> <sup>R</sup>.

As the distance between two numerical matrices, we consider the function

$$\rho\left(\boldsymbol{B}^1, \boldsymbol{B}^2\right) = \sum\_{i=1}^m \sum\_{j=1}^l \left| b\_{ij}^1 - b\_{ij}^2 \right|.$$

Denote by Μ the set of all contrast matrices, and by M~ the set of all blurred matrices. We introduce definitions for estimating the quality of matrices.

#### Definition 10.

,

<sup>0</sup> � �

;…;s I<sup>0</sup> n

� � � �

<sup>¼</sup> <sup>r</sup><sup>σ</sup> <sup>Β</sup> <sup>~</sup><sup>I</sup> <sup>0</sup> � � � �

<sup>¼</sup> <sup>Κ</sup> <sup>k</sup>cijk<sup>m</sup>�<sup>l</sup> � � , provided that Β is

. We denote

and

<sup>¼</sup> s r<sup>Β</sup> <sup>~</sup><sup>I</sup> <sup>0</sup> � � � �

. Therefore, the prod-

Definition 7. An operator <sup>r</sup> is called a threshold decision rule, if r Bð Þ¼ <sup>С</sup> ¼ kсijk<sup>m</sup>�<sup>l</sup>

Definition 8. By the committee synthesis of an information matrix С on an element

We note that the total number of possible values <sup>B</sup> is bounded from above by a quantity ð Þl! <sup>N</sup>. Let s be the operator that performs permutation of columns of matrices m � l with the help of a

Σ ¼ f gσ , σ is the extension of s. From the definition of the adder it follows that

uct rΒ defines the desired mapping and specifies some ensemble clustering. It is necessary to

<sup>¼</sup> <sup>Κ</sup> <sup>r</sup><sup>Β</sup> <sup>~</sup><sup>I</sup> <sup>0</sup> � � � �

0ð Þ¼ <sup>Χ</sup> <sup>Κ</sup> <sup>r</sup><sup>Β</sup> <sup>~</sup><sup>I</sup>

0 .

<sup>0</sup> � � � �

.

∈Ι, ∀σ∈Σ we have rΒ σ ~I

; s∈S

<sup>1</sup>, j2, …, jl >, S ¼ f gs is the set of all operators s. We believe that rs ¼ sr, ∀s ∈S.

<sup>¼</sup> s I<sup>0</sup> 1 � � ;s I<sup>0</sup> 2 � �

<sup>0</sup> � � � �

<sup>0</sup> � �

let us call it a computation by the formula <sup>С</sup> <sup>¼</sup> <sup>r</sup><sup>Β</sup> <sup>~</sup><sup>I</sup>

cij <sup>¼</sup> <sup>1</sup>, bij <sup>≥</sup> <sup>δ</sup>i,

� �

substitution < j

finally σ ~I

<sup>σ</sup><sup>Β</sup> <sup>¼</sup> <sup>Β</sup>σ, <sup>∀</sup>σ<sup>∈</sup> <sup>Σ</sup>. Further, <sup>∀</sup>~<sup>I</sup>

; σ∈Σ n o

Figure 5. Scheme of committee synthesis.

<sup>0</sup> � �

�

~I 0 ¼ I 0 1; I 0 <sup>2</sup>;…; I 0 N

0, otherwise,

228 Recent Applications in Data Clustering

where δ<sup>i</sup> ∈ R.

The general scheme of collective synthesis is shown in Figure 5.

the adder and r is the threshold decision rule.

We continue s∈ S to the n-dimensional case σ ~I

! rΒ

determine the optimal element from Κc, find it and ~I

0

s rΒ ~I <sup>0</sup> � � � �

n o

Ι! <sup>r</sup><sup>Β</sup> <sup>Κ</sup>c, <sup>A</sup><sup>c</sup> ~I

$$\Phi(B) = \rho(B, \mathbf{M}) \underset{B}{\to} \min. \tag{2}$$

#### Definition 11.

$$
\tilde{\Phi}(B) = \rho(B, \tilde{\mathbf{M}}) \underset{B}{\to} \max. \tag{3}
$$

The matrix B̃ = ||b̃_ij||_{m×l} with b̃_ij = N/2 is called the mean blurred matrix, and we set Μ̃′ = {B̃}.

#### Definition 12.

$$
\tilde{\Phi}'(B) = \rho(B, \tilde{B}) \underset{B}{\to} \max \tag{4}
$$

We note that the optima according to criteria (2) and (3) do not have to coincide. The sets Μ and Μ̃ intersect.

Figure 6 illustrates the sets of contrasting and blurred matrices. Arrows indicate some elements of these sets.

Theorem 1. The sets of optimal solutions by criteria Eqs. (2) and (4) coincide.

Let us show that Φ(B) + Φ̃′(B) = Nml/2 for any B. We write

$$\tilde{\Phi}'(B) = \sum_{i=1}^{m}\sum_{j=1}^{l}\tilde{\alpha}_{ij}, \quad \tilde{\alpha}_{ij} = \left|b_{ij} - \frac{N}{2}\right|, \qquad \Phi(B) = \sum_{i=1}^{m}\sum_{j=1}^{l}\alpha^{*}_{ij}, \quad \alpha^{*}_{ij} = \min\left(b_{ij}, N - b_{ij}\right).$$

If b_ij ≥ N/2, then α̃_ij = b_ij − N/2, α*_ij = N − b_ij, and α̃_ij + α*_ij = N/2. If b_ij < N/2, then α̃_ij = N/2 − b_ij, α*_ij = b_ij, and again α̃_ij + α*_ij = N/2.


Figure 6. The sets of contrasting (Μ) and blurred (Μ̃) matrices, and the set of matrices {B}.

Summing over all pairs of indices i, j, we get that Φ(B) + Φ̃′(B) = Nml/2.
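This identity behind Theorem 1 is easy to check numerically. Below is a minimal sketch (the helper names `phi` and `phi_mean` are illustrative, not from the text):

```python
import random

def phi(B, N):
    """Distance of B to the set M of contrasting matrices, i.e. criterion (2)."""
    return sum(min(b, N - b) for row in B for b in row)

def phi_mean(B, N):
    """Distance of B to the mean blurred matrix with entries N/2, i.e. criterion (4)."""
    return sum(abs(b - N / 2) for row in B for b in row)

# random estimate matrix with integer entries b_ij in [0, N]
random.seed(0)
N, m, l = 7, 5, 3
B = [[random.randint(0, N) for _ in range(l)] for _ in range(m)]

# Phi(B) + Phi~'(B) = N*m*l/2, so minimizing (2) is the same as maximizing (4)
assert phi(B, N) + phi_mean(B, N) == N * m * l / 2
```

Since the sum of the two criteria is the constant Nml/2 for every B, their optimizers necessarily coincide, which is exactly the statement of Theorem 1.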

We consider the problem of finding optimal ensemble clusterings for criterion (2). It is clear that

$$\Phi(B) = \sum_{i=1}^{m}\sum_{j=1}^{l}\min\left(b_{ij},\, N - b_{ij}\right).$$

We introduce the notations M = {1, 2, …, m}, X_j = {i | b_ij ≥ N/2, i = 1, 2, …, m}, Y_j = M∖X_j, j = 1, 2, …, l. Let π_ν = ⟨μ^ν_1, μ^ν_2, …, μ^ν_l⟩, ν = 1, 2, …, N, be some permutation of the set π_0 = ⟨1, 2, …, l⟩. A set of permutations π = ⟨π_1, π_2, …, π_N⟩ uniquely determines the matrix of estimates

$$B' = \left\| b'_{ij} \right\|_{m \times l}, \qquad b'_{ij} = b_{ij}(\pi) = \sum_{\nu=1}^{N} \alpha'^{\,\nu}_{ij}.$$

We will further assume that the "initial" matrix ||α^ν_ij||_{m×l} of the algorithm A^c_ν corresponds to the permutation π_0, and that ||α′^ν_ij||_{m×l} is the matrix of the algorithm A^c_ν corresponding to some permutation π_ν. Then α′^ν_ij = α^ν_{iμ^ν_j}.

Consider

$$\tilde{\Delta}_{\nu} = \sum_{j=1}^{l}\left(\sum_{i\in X_j}\left(1 - \alpha^{\nu}_{ij}\right) + \sum_{i\in Y_j}\alpha^{\nu}_{ij}\right), \qquad \tilde{\Delta}'_{\nu} = \sum_{j=1}^{l}\left(\sum_{i\in X_j}\left(1 - \alpha'^{\,\nu}_{ij}\right) + \sum_{i\in Y_j}\alpha'^{\,\nu}_{ij}\right).$$

Then

$$\Delta_{\nu} = \tilde{\Delta}'_{\nu} - \tilde{\Delta}_{\nu} = \sum_{j=1}^{l}\left(\sum_{i\in X_j}\left(\alpha^{\nu}_{ij} - \alpha'^{\,\nu}_{ij}\right) + \sum_{i\in Y_j}\left(\alpha'^{\,\nu}_{ij} - \alpha^{\nu}_{ij}\right)\right).$$

We convert this expression.

Figure 7. Sets X_j, Y_j, j = 1, 2, …, l are changed.

The identity

$$\sum_{j=1}^{l}\left(\sum_{i \in X_j} \alpha^{\nu}_{ij} + \sum_{i \in Y_j} \alpha^{\nu}_{ij}\right) = \sum_{j=1}^{l}\left(\sum_{i \in X_j} \alpha^{\nu}_{i\mu^{\nu}_{j}} + \sum_{i \in Y_j} \alpha^{\nu}_{i\mu^{\nu}_{j}}\right)$$

is valid. We get

$$\Delta_{\nu} = \sum_{j=1}^{l}\left(\sum_{i \in X_j}\left(\alpha^{\nu}_{ij} - \alpha^{\nu}_{i\mu^{\nu}_{j}}\right) + \sum_{i \in Y_j}\left(\alpha^{\nu}_{i\mu^{\nu}_{j}} - \alpha^{\nu}_{ij}\right)\right) = 2\sum_{j=1}^{l}\sum_{i \in X_j}\left(\alpha^{\nu}_{ij} - \alpha^{\nu}_{i\mu^{\nu}_{j}}\right) = 2\sum_{j=1}^{l}\sum_{i \in X_j}\alpha^{\nu}_{ij} - 2\sum_{j=1}^{l}\sum_{i \in X_j}\alpha^{\nu}_{i\mu^{\nu}_{j}}.$$

Thus, minimizing Δ_ν is equivalent to maximizing the second sum in this expression. After applying the permutations π = ⟨π_1, π_2, …, π_N⟩, the sets X_j, Y_j, j = 1, 2, …, l, change. We introduce the notations M1_j = X_j∖(Y′_j∖Y_j), M2_j = Y′_j∖Y_j, M3_j = Y_j∖(X′_j∖X_j), M4_j = X′_j∖X_j.

Figure 7 schematically shows the changes in the sets X_j, Y_j, j = 1, 2, …, l.
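The Δ_ν identity derived above can be checked numerically. A minimal sketch (the matrices and sets below are randomly generated test data, not from the text; the identity holds for any 0/1 membership matrix, any partition of M into X_j and Y_j, and any column permutation μ):

```python
import random

# Check that Delta_nu computed from its definition equals the closed form
# 2 * sum_j sum_{i in X_j} (alpha[i][j] - alpha[i][mu[j]]).
random.seed(1)
m, l = 8, 4
alpha = [[random.randint(0, 1) for _ in range(l)] for _ in range(m)]
X = [set(random.sample(range(m), 3)) for _ in range(l)]
Y = [set(range(m)) - Xj for Xj in X]    # Y_j is the complement of X_j in M
mu = [2, 0, 3, 1]                       # some column permutation

lhs = sum(
    sum(alpha[i][j] - alpha[i][mu[j]] for i in X[j])
    + sum(alpha[i][mu[j]] - alpha[i][j] for i in Y[j])
    for j in range(l)
)
rhs = 2 * sum(alpha[i][j] - alpha[i][mu[j]] for j in range(l) for i in X[j])
assert lhs == rhs
```

The equality follows from the identity above: the total sum of entries over X_j ∪ Y_j = M is invariant under a column permutation, so the Y_j terms mirror the X_j terms.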

#### Theorem 2.


$$\Delta\Phi = \Phi\left(B'\right) - \Phi(B) \le \sum_{\nu=1}^{N}\Delta_{\nu} + \sum_{j=1}^{l}\left(\left|M_{2j}\right|\begin{cases}-2, & N\ \text{even},\\-1, & N\ \text{odd},\end{cases} + \left|M_{4j}\right|\begin{cases}0, & N\ \text{even},\\-1, & N\ \text{odd}\end{cases}\right).$$

The proof is given in [12, 13]. Theorem 2 is the basis for creating an effective algorithm for minimizing Φ.

Figure 8. All possible variants of the sums Σ_{j=1}^{l} Σ_{i∈X_j} α^ν_{iμ^ν_j} for all admissible j and i.

Since the second sum is always nonpositive, we obtain an upper bound. We consider the problem of minimizing the function Δ_ν. We write out all possible values of the function Σ_{j=1}^{l} Σ_{i∈X_j} α^ν_{iμ^ν_j} in the form of a table in Figure 8. Then the minimization of this function reduces to finding a maximum matching in a bipartite graph, for which we can use the polynomial Hungarian algorithm [16].
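To illustrate the matching step, here is a minimal sketch in which brute force over column permutations stands in for the Hungarian algorithm (feasible only for small l; the function name `best_column_permutation` is illustrative):

```python
from itertools import permutations

def best_column_permutation(alpha, X):
    """Find the column permutation mu maximizing sum_j sum_{i in X_j} alpha[i][mu[j]].

    alpha : m x l 0/1 matrix of one base clustering (alpha[i][j] = 1 iff
            object i is assigned to cluster j); X : list of l sets X_j.
    Brute force over all l! permutations; the Hungarian algorithm solves the
    same assignment problem in polynomial time for large l.
    """
    l = len(alpha[0])
    # cost[j][k]: contribution of relabeling column k as column j
    cost = [[sum(alpha[i][k] for i in X[j]) for k in range(l)] for j in range(l)]
    best = max(permutations(range(l)),
               key=lambda mu: sum(cost[j][mu[j]] for j in range(l)))
    return list(best)

# toy example: 4 objects, 2 clusters; the algorithm's labels are "swapped"
alpha = [[0, 1], [0, 1], [1, 0], [1, 0]]
X = [{0, 1}, {2, 3}]   # ensemble: objects 0,1 lean to cluster 0; 2,3 to cluster 1
print(best_column_permutation(alpha, X))  # -> [1, 0]: swap labels to match
```

The `cost` table here plays the role of the Figure 8 table: entry (j, k) counts how many objects of X_j the base clustering puts into its cluster k.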

5. Man-machine (video-logical) clustering method

In the problems of ensemble clustering synthesis considered earlier, we did not take into account the number of initial clustering algorithms, their quality, or their proximity. The ensemble clustering was built to reflect only the opinion of the collective of decisions that we used. "Internal" indices [9] reflect a person's ideas about clustering. One can construct examples of data for which known internal criteria lead to degenerate solutions.

At the same time, a person has the ability to cluster visual sets on a plane without using any proximity functions, criteria, or indices. The following idea was realized. A person can personally cluster projections of sets of points from R^n into R^2. Having made such clusterings under different projections, we can construct, generally speaking, various N clusterings, which we submit as input to the construction of the collective solution. The person himself "does not see" the objects in R^n but can exactly solve the clustering tasks on the plane. Thus, here we use N precise solutions built from various partial information about the data. Consider this video-logical method on one model example.

A sample of two normal distributions with independent features was considered. The first feature of the first distribution (200 objects) had zero expectation and unit standard deviation; the first feature of the second distribution (200 objects) had these values equal to 5. All the other 49 features of all objects had а_i = 5, σ_i = 5, i = 2, 3, …, 50. That is, the two sets had equal distributions over 49 features and one informative feature. Clustering of the entire sample by minimizing dispersion is shown in Figure 9. Black and gray points in the sample visualization represent the objects of the first and second clusters. Here the informative character of the first feature is lost.

Figure 9. Clustering of a sample of model objects by the method of minimizing variance.

The program of the video-logical approach worked as follows. With the help of a single heuristic approach, all C_n^2 projections are automatically ordered by descending value of a criterion of the presence of two clusters. Next, we as experts consider some of the projections and with the help of the mouse select two clusters in each of them. Figure 10 shows two such examples. Note that the first feature was present in all the projections; it was used "manually" as defining the area of dense location of objects. Then 10 "manual" clusterings went to the input of the program for the committee synthesis of the collective solution. Note that only two objects were erroneously clustered.
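A sample of this kind can be generated along the following lines (a sketch: the unit standard deviation for the first feature of the first class is an assumption where the text is ambiguous):

```python
import random

random.seed(42)

def make_object(a1, s1):
    # first feature is the informative one; the remaining 49 features
    # have mean 5 and standard deviation 5 for both classes
    return [random.gauss(a1, s1)] + [random.gauss(5, 5) for _ in range(49)]

class1 = [make_object(0, 1) for _ in range(200)]  # assumed: a = 0, sigma = 1
class2 = [make_object(5, 5) for _ in range(200)]  # a = 5, sigma = 5 (as in the text)
sample = class1 + class2
print(len(sample), len(sample[0]))  # -> 400 50
```

On such data, only the first coordinate separates the classes, which is why a criterion that aggregates all 50 features (e.g., variance minimization) loses the informative feature, while a 2-D projection containing it remains visually separable.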

It is clear that min_{π_ν} Δ_ν ≤ 0. Now we can propose the following heuristic algorithm of steepest descent.

Algorithm.

1. We calculate X_j, j = 1, 2, …, l.

2. We find Δ*_ν = min_{π_ν} Δ_ν for each ν.

If Σ_{ν=1}^N Δ*_ν < 0, then apply the found permutations π_ν = ⟨μ^ν_1, μ^ν_2, …, μ^ν_l⟩, ν = 1, 2, …, N, and go to step 1.

If Σ_{ν=1}^N Δ*_ν = 0, then the algorithm ENDs.

NOTE. We note that this algorithm does not necessarily find even a local minimum of the criterion Φ(B). Nevertheless, it is very fast: its complexity at each iteration is estimated as O(l^5 mN).
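The descent loop above can be sketched compactly as follows (function and variable names are illustrative; brute-force permutation search replaces the Hungarian step, so this version only suits small l):

```python
from itertools import permutations

def committee_descent(alphas):
    """Steepest-descent relabeling over N base clusterings.

    alphas[nu][i][j] is the 0/1 membership matrix of base clustering nu
    (m objects, l clusters). Repeatedly relabels each clustering's columns
    to better agree with the current estimate matrix B, until no single
    relabeling improves the agreement (i.e. sum of Delta*_nu is 0).
    """
    N, m, l = len(alphas), len(alphas[0]), len(alphas[0][0])
    while True:
        # estimate matrix B and the sets X_j = {i : b_ij >= N/2}
        B = [[sum(a[i][j] for a in alphas) for j in range(l)] for i in range(m)]
        X = [{i for i in range(m) if B[i][j] >= N / 2} for j in range(l)]
        improved = False
        for nu, a in enumerate(alphas):
            gain = lambda mu: sum(a[i][mu[j]] for j in range(l) for i in X[j])
            base = gain(tuple(range(l)))
            mu = max(permutations(range(l)), key=gain)
            if gain(mu) > base:  # Delta_nu < 0: relabel clustering nu
                alphas[nu] = [[a[i][mu[j]] for j in range(l)] for i in range(m)]
                improved = True
        if not improved:
            return B             # sum of Delta*_nu is 0: stop
```

For example, feeding in three base clusterings of four objects, two aligned and one with swapped labels, relabels the third and returns a fully contrasting estimate matrix.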
