Since the second sum is always non-positive, we have an upper bound. We consider the problem of minimizing the function Δ<sub>ν</sub>. We write out all possible variants of the function in the form of a table in Figure 8. Finding the minimum of this function then reduces to finding a maximum matching of a bipartite graph, for which we can use the polynomial Hungarian algorithm [16]. It is clear that min<sub>π<sub>ν</sub></sub> Δ<sub>ν</sub> ≤ 0. Now we can propose the following heuristic algorithm of steepest descent.

Algorithm.

1. We calculate X<sub>j</sub>, j = 1, 2, …, l.
2. We find Δ<sup>∗</sup><sub>ν</sub> = min<sub>π<sub>ν</sub></sub> Δ<sub>ν</sub> for each ν.
3. If Σ<sup>N</sup><sub>ν=1</sub> Δ<sup>∗</sup><sub>ν</sub> = 0, then END of the algorithm.
4. If Σ<sup>N</sup><sub>ν=1</sub> Δ<sup>∗</sup><sub>ν</sub> < 0, then apply the found permutations π<sub>ν</sub> = ⟨μ<sup>ν</sup><sub>1</sub>, μ<sup>ν</sup><sub>2</sub>, …, μ<sup>ν</sup><sub>l</sub>⟩, ν = 1, 2, …, N, and go to step 1.

NOTE. We note that our algorithm does not even find a local minimum of the criterion Φ(B). Nevertheless, this algorithm is very fast; its complexity at each iteration is estimated as O(l<sup>5</sup>mN).

4. The algorithm of collective k-means

The results of clustering a sample of m objects into l clusters by N algorithms are obtained, which we can write in the form of a binary matrix ‖α<sup>ν</sup><sub>ij</sub>‖, ν = 1, 2, …, N, i = 1, 2, …, m, j = 1, 2, …, l. We assume that the cluster numbers in each algorithm are fixed. Then any horizontal layer number i of this three-dimensional matrix denotes the results of clustering the object x<sub>i</sub>. As an ensemble clustering of the sample Χ, we can take the result of clustering the "new" descriptions—the layers of the original matrix ‖α<sup>ν</sup><sub>ij</sub>‖, i = 1, 2, …, m. As the method of clustering, we take the method of minimizing the dispersion criterion

Σ<sup>l</sup><sub>j=1</sub> Σ<sub>i∈X<sub>j</sub></sub> Σ<sup>N</sup><sub>ν=1</sub> Σ<sup>l</sup><sub>μ=1</sub> (α<sup>ν</sup><sub>iμ</sub> − α<sup>∗ν</sup><sub>jμ</sub>)<sup>2</sup> → min,

where the coordinates of the cluster centers are the sample means α<sup>∗ν</sup><sub>jμ</sub> = (1/|X<sub>j</sub>|) Σ<sub>i∈X<sub>j</sub></sub> α<sup>ν</sup><sub>iμ</sub>. Note that this method makes it possible to calculate ensemble clusterings Κ = ⟨K<sup>∗</sup><sub>1</sub>; K<sup>∗</sup><sub>2</sub>; …; K<sup>∗</sup><sub>l</sub>⟩ such that the heuristic clusterings of the objects of one cluster of the collective solution are close to each other in the Euclidean metric. The committee synthesis of collective decisions provides more interpretable solutions: indeed, if K<sup>ν</sup> = ⟨K<sup>ν</sup><sub>1</sub>; K<sup>ν</sup><sub>2</sub>; …; K<sup>ν</sup><sub>l</sub>⟩, ν = 1, 2, …, N, are the separate solutions of heuristic clustering algorithms, then a cluster of the collective solution will be the "intersection" of some of the original clusters K<sup>1</sup><sub>i1</sub>, K<sup>2</sup><sub>i2</sub>, …, K<sup>N</sup><sub>iN</sub>.

5. Man-machine (video-logical) clustering method

In the ensemble clustering problems considered earlier, we did not take into account the number of initial clustering algorithms, their quality, or their proximity to one another. The ensemble clustering reflected only the "opinions" of the collective of solutions that we used. "Internal" indices [9] formalize a person's ideas about clustering, but one can devise data examples for which the known internal criteria lead to degenerate solutions.

At the same time, a person is able to cluster visual sets of points on a plane without using any proximity functions, criteria, or indices. The following idea was realized. A person can manually cluster projections of point sets from R<sup>n</sup> onto R<sup>2</sup>. Performing such clusterings for different projections, we construct, generally speaking, N different clusterings, which we submit as input to the construction of the collective solution. The person "does not see" the objects in R<sup>n</sup> but can solve the clustering tasks on the plane exactly. Thus, here we use N precise solutions built from different partial information about the data. We consider this video-logical method on one model example.
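The construction of a collective solution from N clusterings mentioned above can be illustrated with a small sketch of the collective k-means idea: each object is re-described by the concatenation of its one-hot cluster-membership rows, and ordinary k-means is run on these binary descriptions. Plain Lloyd iterations with a deterministic start stand in for the chapter's dispersion-criterion minimization; the toy data and function names are illustrative.

```python
def one_hot(label, l):
    v = [0.0] * l
    v[label] = 1.0
    return v

def collective_kmeans(clusterings, l, iters=50):
    """Ensemble clustering sketch: object i is re-described by the
    concatenation of its N one-hot membership rows (the i-th layer of
    the binary matrix), then k-means with l centers is run on these
    descriptions.  Plain Lloyd iterations stand in for the chapter's
    dispersion-criterion minimization."""
    m = len(clusterings[0])
    X = [sum((one_hot(cl[i], l) for cl in clusterings), []) for i in range(m)]
    # deterministic start: the first l distinct descriptions become centers
    centers = []
    for x in X:
        if x not in centers:
            centers.append(list(x))
        if len(centers) == l:
            break
    assign = [0] * m
    for _ in range(iters):
        for i, x in enumerate(X):
            assign[i] = min(range(l),
                            key=lambda j: sum((a - b) ** 2
                                              for a, b in zip(x, centers[j])))
        for j in range(l):
            members = [X[i] for i in range(m) if assign[i] == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

base = [[0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1, 1],
        [1, 1, 1, 0, 0, 0]]  # three clusterings of six objects, l = 2
print(collective_kmeans(base, 2))  # -> [0, 0, 0, 1, 1, 1]
```

The third base clustering uses swapped labels, yet the collective solution recovers the underlying split: one-hot descriptions make the Euclidean geometry of each layer independent of how its clusters are numbered.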

A sample from two normal distributions with independent features was considered. The first feature of the first distribution (200 objects) had zero expectation, while for the first feature of the second distribution (200 objects) the expectation and standard deviation were both equal to 5. All the other 49 features for all objects had a<sup>i</sup> = 5, σ<sup>i</sup> = 5, i = 2, 3, …, 50. That is, the two sets had equal distributions in 49 features and one informative feature. Clustering of the entire sample by minimizing dispersion is shown in Figure 9; black and gray points in the sample visualization represent the objects of the first and second clusters. Here the informativeness of the first feature is lost.
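A sketch of generating such a model sample with the standard library. The chapter does not state the standard deviation of the first feature of the first distribution, so the value 1.0 below is an assumption, and make_sample is a hypothetical helper name. The 49 uninformative features explain why dispersion minimization fails here: they contribute far more to squared Euclidean distances than the single informative feature does.

```python
import random

def make_sample(seed=0, m=200, n=50):
    """Model sample from the text: two classes of m objects in R^n.
    Feature 1: class one ~ N(0, sigma1), class two ~ N(5, 5);
    features 2..n: N(5, 5) for every object (uninformative).
    sigma1 is not fully specified in the text; 1.0 is an assumption."""
    rng = random.Random(seed)
    cls1 = [[rng.gauss(0, 1.0)] + [rng.gauss(5, 5) for _ in range(n - 1)]
            for _ in range(m)]
    cls2 = [[rng.gauss(5, 5)] + [rng.gauss(5, 5) for _ in range(n - 1)]
            for _ in range(m)]
    return cls1, cls2

cls1, cls2 = make_sample()
m1 = sum(x[0] for x in cls1) / len(cls1)
m2 = sum(x[0] for x in cls2) / len(cls2)
# only feature 1 separates the classes; features 2..50 are pure noise
print(round(m1, 2), round(m2, 2))
```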

The program of the video-logical approach worked as follows. With the help of a simple heuristic, all C<sub>n</sub><sup>2</sup> projections are automatically ordered by descending value of a criterion for the presence of two clusters. Next, we as experts examine some of the projections and, with the mouse, select two clusters in each of them. Figure 10 shows two such examples. Note that the first feature was present in all the projections considered; it was used "manually" as defining the regions of dense location of objects. Then 10 "manual" clusterings were passed to the program input for the committee synthesis of the collective solution. Note that only two objects were clustered erroneously.
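The committee synthesis step that receives the 10 "manual" clusterings can be approximated by a simple sketch: align the cluster numbering of every clustering to a reference by maximizing label agreement (brute force over the l! permutations here; for the matching itself the chapter uses the polynomial Hungarian algorithm [16]), then assign each object by majority vote. This voting rule is a simplified stand-in for the criterion-based committee synthesis of [12, 13].

```python
from itertools import permutations
from collections import Counter

def align(reference, labels, l):
    """Relabel `labels` (values 0..l-1) to agree as much as possible
    with `reference`.  Brute force over all l! permutations; for large l
    the optimal matching is found by the Hungarian algorithm [16]."""
    overlap = [[0] * l for _ in range(l)]
    for r, c in zip(reference, labels):
        overlap[r][c] += 1          # objects with reference label r, label c
    best = max(permutations(range(l)),
               key=lambda p: sum(overlap[p[b]][b] for b in range(l)))
    return [best[c] for c in labels]

def committee(clusterings, l):
    """Simplified committee synthesis: align every clustering to the
    first one, then assign each object by majority vote."""
    ref = clusterings[0]
    aligned = [ref] + [align(ref, c, l) for c in clusterings[1:]]
    return [Counter(c[i] for c in aligned).most_common(1)[0][0]
            for i in range(len(ref))]

# three clusterings of four objects, l = 2; the second has swapped labels
votes = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 1]]
print(committee(votes, 2))  # -> [0, 0, 1, 1]
```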

Figure 9. Clustering of a sample of model objects by the method of minimizing variance.
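The automatic first step of the program, ordering all C<sub>n</sub><sup>2</sup> two-feature projections by decreasing evidence of two clusters, can be sketched as follows. The chapter does not specify the ordering criterion, so a crude one-dimensional separation ratio stands in for it, and both function names are illustrative.

```python
def split_score(values):
    """Crude 1-D evidence of two clusters: best ratio of the squared gap
    between part means to the pooled within-part variance, over all
    splits of the sorted values (a stand-in criterion; the chapter does
    not specify the one actually used)."""
    xs = sorted(values)
    n = len(xs)
    best = 0.0
    for k in range(2, n - 1):       # each part keeps at least 2 points
        left, right = xs[:k], xs[k:]
        ml, mr = sum(left) / k, sum(right) / (n - k)
        var = (sum((x - ml) ** 2 for x in left)
               + sum((x - mr) ** 2 for x in right)) / n
        if var > 0:
            best = max(best, (mr - ml) ** 2 / var)
    return best

def rank_projections(data, n):
    """Order all C(n,2) feature pairs by descending two-cluster evidence."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    def score(p):
        return max(split_score([x[p[0]] for x in data]),
                   split_score([x[p[1]] for x in data]))
    return sorted(pairs, key=score, reverse=True)

# six objects, three features; only feature 0 separates into two groups
data = [[0.0, 5.0, 7.0], [0.1, 5.0, 7.0], [0.2, 5.0, 7.0],
        [5.0, 5.0, 7.0], [5.1, 5.0, 7.0], [5.2, 5.0, 7.0]]
print(rank_projections(data, 3))  # the pair (1, 2) comes last
```

Pairs containing the bimodal feature 0 rank first, mirroring the observation above that the informative feature appeared in all selected projections.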

Collective Solutions on Sets of Stable Clusterings http://dx.doi.org/10.5772/intechopen.76189

[4] Lloyd S. Least squares quantization in PCM. IEEE Transactions on Information Theory. 1982;28(2):129-137. DOI: 10.1109/TIT.1982.1056489

[5] Kriegel H, Kröger P, Sander J, Zimek A. Density-based clustering. WIREs Data Mining and Knowledge Discovery. 2011;1(3):231-240. DOI: 10.1002/widm.30

[6] Dempster A, Laird N, Rubin D. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977;39(1):1-38. JSTOR 2984875. MR 0501537

[7] Jain A, Dubes R. Algorithms for Clustering Data. Englewood Cliffs: Prentice-Hall, Inc.; 1988

[8] Kaufman L, Rousseeuw P. Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley; 2009

[9] Aggarwal C. Data Mining: The Textbook. Yorktown Heights/New York: IBM T.J. Watson Research Center; 2015. 771 p. DOI: 10.1007/978-3-319-14142-8

[10] Kuncheva L. Combining Pattern Classifiers: Methods and Algorithms. Hoboken: Wiley; 2004. DOI: 10.1002/9781118914564

[11] Desgraupes B. Clustering Indices. University Paris Ouest, Lab Modal'X; 2013

[12] Ryazanov V. Committee synthesis of algorithms for recognition and classification. Journal of Computational Mathematics and Mathematical Physics. 1981;21(6):1533-1543. DOI: 10.1016/0041-5553(81)90161-0

[13] Ryazanov V. On the synthesis of classification algorithms on finite sets of classification algorithms (taxonomy). Journal of Computational Mathematics and Mathematical Physics. 1982;22(2):429-440. DOI: 10.1016/0041-5553(82)90049-0

[14] Ryazanov V. One approach for classification (taxonomy) problem solution by sets of heuristic algorithms. In: Proceedings of the 9th Scandinavian Conference on Image Analysis; 6-9 June 1995; Uppsala; 1995(2). pp. 997-1002

[15] Biryukov A, Shmakov A, Ryazanov V. Solving the problems of cluster analysis by collectives of algorithms. Journal of Computational Mathematics and Mathematical Physics. 2008;48(1):176-192. DOI: 10.1134/S0965542508010132

[16] Kuhn H. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly. 1955;2:83-97. DOI: 10.1002/nav.3800020109

[17] Ryazanov V. Estimations of clustering quality via evaluation of its stability. In: Bayro-Corrochano E, Hancock E, editors. CIARP 2014. LNCS. Vol. 8827; 2014. pp. 432-439. DOI: 10.1007/978-3-319-12568-8_53

[18] Ryazanov V. About estimation of quality of clustering results via its stability. Intelligent Data Analysis. 2016;20:S5-S15. DOI: 10.3233/IDA-160842

[19] Sigillito V, Wing S, Hutton L, Baker K. Classification of radar returns from the ionosphere using neural networks. Johns Hopkins APL Technical Digest. 1989;10:262-266

Figure 10. Allocation of clusters by mouse on the (1, 4) and (1, 6) feature pairs.
