1. Introduction

There are many different approaches to clustering multidimensional data: those based on the optimization of internal criteria (indices) [1, 2], hierarchical clustering [3], centroid-based clustering [4], density-based clustering [5], distribution-based clustering [6], and many others. Well-known books and papers on clustering include [7–10].

This section is devoted to one approach to the construction of stable clusterings and to the processing of their sets. A natural criterion is considered that is applicable to any clustering method. In [11], various criteria (indices) are proposed; optimizing each of them produces a clustering that reflects a particular answer to the question "what is a clustering?" In this chapter, we use a criterion based on stability. If we have really obtained a clustering, that is, a solution for the whole sample, then the partitioning should not

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

change with a small change in the data. Criteria are introduced for the quality of the obtained partition. If the criterion value is less than one, then the partition is unstable. Suppose that we obtain N clusterings of the same data. How can a new ensemble clustering be created on the basis of these N partitions? Previously, a committee method for building ensemble clusterings was proposed [12–15]. Let there be N results of cluster analysis of the same data into l clusters. The committee method of building an ensemble clustering makes it possible to construct l clusters, each of which is the intersection of "many" initial clusters. In other words, we find l clusters whose objects are "equivalent" to each other according to several principles. As the initial N clusterings, one can take stable ones. Finally, we consider a video-logical approach to building the initial N coarse clusterings.
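The leave-one-out stability idea described here — compare the clustering of the whole sample with the clustering obtained after deleting a single object — can be sketched in a few lines. This is only an illustration: the largest-gap clusterer below stands in for a real algorithm, and the partition comparison is taken to be the Rand index (fraction of object pairs on which two partitions agree); all names are hypothetical.

```python
from itertools import combinations

def cluster(ids, x):
    """Stand-in clustering: sort 1-D points and cut at the largest gap,
    producing two clusters.  Any real algorithm could be plugged in."""
    order = sorted(ids, key=lambda i: x[i])
    gaps = [(x[order[r + 1]] - x[order[r]], r) for r in range(len(order) - 1)]
    _, cut = max(gaps)
    return {i: (0 if r <= cut else 1) for r, i in enumerate(order)}

def rand_similarity(la, lb, ids):
    """Fraction of object pairs on which two labelings agree about
    'same cluster vs. different clusters' (the Rand index)."""
    pairs = list(combinations(ids, 2))
    agree = sum((la[p] == la[q]) == (lb[p] == lb[q]) for p, q in pairs)
    return agree / len(pairs)

# Toy sample: two well-separated 1-D groups.
x = {0: 0.0, 1: 0.1, 2: 0.2, 3: 5.0, 4: 5.1, 5: 5.2}
full = cluster(list(x), x)          # clustering of the whole sample

# Average similarity between the full clustering and each
# leave-one-out clustering (restricted to the surviving objects).
favr = sum(
    rand_similarity(cluster([j for j in x if j != i], x), full,
                    [j for j in x if j != i])
    for i in x
) / len(x)
print(favr)   # 1.0 -> every deletion leaves the partition unchanged: stable
```

A value below one would signal that deleting some single object already changes the partition, that is, instability in the sense used above.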

Definition 3. The quality F_avr(Κ) of the quasi-clustering Κ will be called the quantity

F_avr(Κ) = Σ_{i=1}^{m} d(Κ∘(x_i), Κ) / m.

For some clustering algorithms, there are simple economical rules for computing Ф(Κ). Let us present them (see also [3, 17, 18]).

2.1. Method of minimizing the dispersion criterion

It is known that, in order to minimize the dispersion criterion, it suffices to satisfy the inequalities

n_j/(n_j − 1) · ‖x̄ − m_j‖² − n_k/(n_k + 1) · ‖x̄ − m_k‖² ≤ 0   (1)

for any clusters K_j and K_k and arbitrary x̄ ∈ K_j, where n_j = |K_j| and m_j = (1/n_j) Σ_{x_t ∈ K_j} x_t.

We establish the conditions for the identity Κ*(x_i) ≈ Κ of the partitions Κ*(x_i) and Κ. In the case x̄ ∈ K_j [considering Eq. (1)], for the condition Κ*(x_i) ≈ Κ to be satisfied, the inequalities

(n_j − 1)/(n_j − 2) · (‖x̄ − m_j‖² + 2/(n_j − 1) · ⟨x̄ − m_j, x_i − m_j⟩ + 1/(n_j − 1)² · ‖x_i − m_j‖²) − n_k/(n_k + 1) · ‖x̄ − m_k‖² ≤ 0

must be satisfied for x̄ ∈ K_j, x̄ ≠ x_i; here the bracketed expression equals ‖x̄ − m_j′‖², where m_j′ is the mean of K_j∖{x_i}. In the case x̄ ∈ K_k, the inequalities

n_k/(n_k − 1) · ‖x̄ − m_k‖² − (n_j − 1)/n_j · ‖x̄ − m_j‖² − 2/n_j · ⟨x̄ − m_j, x_i − m_j⟩ − 1/(n_j (n_j − 1)) · ‖x_i − m_j‖² ≤ 0

must be satisfied.
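Inequality (1) has a direct interpretation: moving a single object x̄ from K_j to K_k changes the within-cluster sum of squares by n_k/(n_k + 1)·‖x̄ − m_k‖² − n_j/(n_j − 1)·‖x̄ − m_j‖², so (1) says that no single move can improve the criterion. A minimal sketch of the check, assuming every cluster holds at least two objects (function names are illustrative):

```python
def mean(cluster):
    """Componentwise mean of a list of points (tuples)."""
    n = len(cluster)
    return [sum(p[t] for p in cluster) / n for t in range(len(cluster[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def satisfies_eq1(clusters, eps=1e-12):
    """Check inequality (1) for every object and every pair of clusters:
    n_j/(n_j-1) * ||x - m_j||^2 - n_k/(n_k+1) * ||x - m_k||^2 <= 0,
    i.e., no single-object move decreases the dispersion criterion."""
    means = [mean(c) for c in clusters]
    for j, Kj in enumerate(clusters):
        nj = len(Kj)
        for x in Kj:
            for k, Kk in enumerate(clusters):
                if k == j:
                    continue
                nk = len(Kk)
                lhs = (nj / (nj - 1) * sq_dist(x, means[j])
                       - nk / (nk + 1) * sq_dist(x, means[k]))
                if lhs > eps:
                    return False
    return True

clusters = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
print(satisfies_eq1(clusters))   # True: no single move improves the criterion
```

Running the same check on a partition that mixes the two groups returns False, since moving a misplaced object would decrease the criterion.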

Collective Solutions on Sets of Stable Clusterings http://dx.doi.org/10.5772/intechopen.76189

2.2. k-means method

Let the clustering Κ be obtained by the k-means method, that is, ‖x̄ − m_j‖ ≤ ‖x̄ − m_k‖, ∀k ≠ j, ∀x̄ ∈ K_j. In the case of equality, the object is considered to belong to the cluster with the lower number. Then, Κ*(x_i) ≈ Κ is satisfied if

‖x̄ − m_j‖² + 2/(n_j − 1) · ⟨x̄ − m_j, x_i − m_j⟩ + 1/(n_j − 1)² · ‖x_i − m_j‖² ≤ ‖x̄ − m_k‖² under x̄ ∈ K_j, x̄ ≠ x_i,

and

‖x̄ − m_k‖² ≤ ‖x̄ − m_j‖² + 2/(n_j − 1) · ⟨x̄ − m_j, x_i − m_j⟩ + 1/(n_j − 1)² · ‖x_i − m_j‖² under x̄ ∈ K_k.

In both inequalities, the expression with the correction terms is ‖x̄ − m_j′‖², the squared distance to the mean of K_j∖{x_i}.

2.3. Method of hierarchical agglomeration grouping

We confine ourselves to the case of an agglomerative hierarchical grouping. To find the value of the criterion Ф(Κ), one can calculate the partitioning Κ and the partitions Κ∘(x_i), i = 1, 2, …, m, and compare Κ with each Κ∘(x_i), i = 1, 2, …, m. Here it is possible to save in the calculation of Ф(Κ) by not carrying the clustering through for some of the i. Let Κ^t(x_i) = {K^t_1, K^t_2, …, K^t_{m−t}} be the clustering of the sample X∖{x_i} into m − t clusters, t ≤ m − l, and let Κ = {K_1, K_2, …, K_l} be the partition obtained by the clustering algorithm on X. The main property of the hierarchical grouping is that for any k = 1, 2, …, m − t there is j = 1, 2, …, m − t − 1 for which K^t_k ⊆ K^{t+1}_j. In this case, if at some step t, t ≤ m − l, for some k the condition K^t_k ⊆ K_j does not hold for any j = 1, 2, …, l, then the condition Κ*(x_i) ≈ Κ will not be fulfilled. Indeed, K^t_k ⊆ K^{t+1}_{j′} for some j′, so every cluster of a later step that contains K^t_k also fails to fit inside a single cluster of Κ, and the grouping for this x_i can be stopped at step t.
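The early-termination rule for hierarchical grouping — while grouping X∖{x_i}, stop as soon as some intermediate cluster fits inside no final cluster — can be sketched as follows. The naive nearest-centroid agglomeration below is only a stand-in for the actual grouping method, and all names are illustrative.

```python
def agglomerate(ids, x):
    """Naive agglomerative grouping of 1-D points: repeatedly merge the
    two clusters with the closest means, yielding each intermediate
    partition."""
    clusters = [{i} for i in ids]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ma = sum(x[i] for i in clusters[a]) / len(clusters[a])
                mb = sum(x[i] for i in clusters[b]) / len(clusters[b])
                if best is None or abs(ma - mb) < best[0]:
                    best = (abs(ma - mb), a, b)
        _, a, b = best
        clusters[a] = clusters[a] | clusters[b]
        del clusters[b]
        yield [set(c) for c in clusters]

def early_reject(ids_wo_i, x, final):
    """Return True as soon as some intermediate cluster is contained in
    no final cluster -- then the leave-one-out partition cannot match
    the full-sample partition, and the grouping can be stopped early."""
    for step in agglomerate(ids_wo_i, x):
        if any(not any(c <= Kj for Kj in final) for c in step):
            return True
        if len(step) == len(final):
            return False   # reached l clusters with every cluster nested
    return False

x = {0: 0.0, 1: 0.1, 2: 5.0, 3: 5.1}
# Final partition of the full sample; delete object 0 and regroup the rest.
print(early_reject([1, 2, 3], x, [{0, 1}, {2, 3}]))   # False: nesting holds
```

If instead the final partition were, say, [{0, 1, 2}, {3}], the intermediate cluster {2, 3} would straddle two final clusters and the check would return True at that step.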
