**1. Introduction**

Cluster analysis [1] is a robust tool for exploring the underlying structure of data and grouping similar objects into so-called clusters. Cluster analysis has found applications in many fields, ranging from core data mining tasks [2] such as scientific data exploration, spatial database applications, web analysis, marketing, medical diagnostics, and computational biology, to statistical data analysis in areas including machine learning, pattern recognition [3], image analysis [4], information retrieval [5], and bioinformatics [6]. Several clustering algorithms are related to neural networks; the most popular are K-means, the self-organizing map (SOM), neural gas (NG), and growing neural gas (GNG) [7, 8].

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The NG network adapts the reference vectors (prototype vectors) *w*<sub>*i*</sub> without any fixed topological arrangement within the network. As a single-layered soft competitive learning neural network, NG not only adapts the winner vector for a given input vector, but also updates the remaining reference vectors according to their nearness to the input vector, using a soft-max updating rule [21]. The main advantages of the NG network [22] are: (1) lower distortion error than other clustering algorithms (K-means, maximum-entropy and SOM), (2) faster convergence due to low distortion errors, and (3) performing a stochastic gradient descent on a well-defined energy surface.

The NG algorithm is characterized by the dependence of the updating strength for the *c* reference vectors on their position ranking (*i*<sub>0</sub>, *i*<sub>1</sub>, …, *i*<sub>*N*−1</sub>). If the input vector is denoted by *x*, the ranking is defined so that *w*<sub>*i*0</sub> is closest to *x*, *w*<sub>*i*1</sub> is second closest to *x*, and *w*<sub>*ik*</sub>, for *k* = 1, 2, …, *N* − 1, is the reference vector for which there exist *k* vectors *w*<sub>*j*</sub> with ‖*x* − *w*<sub>*j*</sub>‖ &lt; ‖*x* − *w*<sub>*ik*</sub>‖. The updating step for adjusting *w*<sub>*i*</sub> according to a Hebb-like learning rule is given by:

∆*w*<sub>*i*</sub> = *ε*(*t*) · *h*<sub>*λ*</sub>(*k*<sub>*i*</sub>(*x*, *w*)) · (*x* − *w*<sub>*i*</sub>), *i* = 1, 2, …, *c* (1)

where:

- *ε*(*t*) ∈ [0, 1] is the learning rate (step size) that characterizes the overall extent of the update. It decreases with the iteration step *t* as *ε*(*t*) = *ε*<sub>*i*</sub> · (*ε*<sub>*f*</sub>/*ε*<sub>*i*</sub>)<sup>*t*/Max_iter</sup>, where Max_iter and *t* denote the maximum number of iterations and the current iteration step, respectively.
- *k*<sub>*i*</sub>(*x*, *w*) is the ranking index associated with each weight *w*<sub>*i*</sub>.
- *h*<sub>*λ*</sub>(*k*<sub>*i*</sub>(*x*, *w*)) ∈ [0, 1] is a deterministic function, with some regularity condition imposed on it, that weights each *w*<sub>*i*</sub> according to its position ranking. For *h*<sub>*λ*</sub>(*k*) ∈ [0, 1], the exponential form exp(−*k*/*λ*) was proposed [22], as it gives better results than other options such as the Gaussian function.
- *λ* determines the number of reference vectors that significantly change their positions in the updating steps; it usually decreases with the iteration step *t* as *λ*(*t*) = *λ*<sub>*i*</sub> · (*λ*<sub>*f*</sub>/*λ*<sub>*i*</sub>)<sup>*t*/Max_iter</sup>.

The NG algorithm is closely related to fuzzy clustering methods [23]. NG uses the membership value *h*<sub>*λ*</sub>(*k*<sub>*i*</sub>(*x*, *w*))/*C*(*λ*) to assign each input vector *x* to all the reference vectors *w*<sub>*i*</sub> (*i* = 1, 2, …, *c*), instead of the memberships *u*<sub>*ij*</sub> (2 ≤ *i* ≤ *c*, 1 ≤ *j* ≤ *N*) used in the FCM algorithm. The algorithm is based on minimizing a cost function by iterative methods, essentially the gradient descent method and Newton's method. The NG cost function to optimize [22] is therefore:

*E*<sub>ng</sub> = (1/(2*C*(*λ*))) ∑<sub>*i*=1</sub><sup>*c*</sup> ∫ *P*(*x*) *h*<sub>*λ*</sub>(*k*<sub>*i*</sub>(*x*, *w*)) ‖*x* − *w*<sub>*i*</sub>‖<sup>2</sup> d*x* (2)

where the integral is taken over the input space and *C*(*λ*) is a normalization constant.

*Performance Assessment of Unsupervised Clustering Algorithms Combined MDL Index*, http://dx.doi.org/10.5772/intechopen.74506
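To make the update rule of Eq. (1) concrete, the following is a minimal pure-Python sketch of one NG training loop, using the exponential neighborhood function exp(−*k*/*λ*) and the decay schedules for *ε*(*t*) and *λ*(*t*) described above. The dataset, random seed, and parameter values (`eps_i`, `lam_i`, etc.) are illustrative assumptions, not values taken from this chapter.

```python
import math
import random

def neural_gas(data, c=3, max_iter=200,
               eps_i=0.5, eps_f=0.01, lam_i=2.0, lam_f=0.1):
    """Sketch of the neural gas update rule (Eq. 1).

    data : list of points (tuples of floats)
    c    : number of reference (prototype) vectors
    The schedules follow p(t) = p_i * (p_f / p_i) ** (t / max_iter)
    for both the learning rate eps and the neighborhood range lambda.
    """
    random.seed(0)  # fixed seed so the sketch is reproducible
    w = [list(random.choice(data)) for _ in range(c)]  # init prototypes

    for t in range(max_iter):
        x = random.choice(data)                        # present an input x
        eps = eps_i * (eps_f / eps_i) ** (t / max_iter)
        lam = lam_i * (lam_f / lam_i) ** (t / max_iter)

        # rank prototypes by distance to x: this yields k_i(x, w)
        order = sorted(range(c),
                       key=lambda i: sum((x[d] - w[i][d]) ** 2
                                         for d in range(len(x))))
        for k, i in enumerate(order):
            h = math.exp(-k / lam)                     # h_lambda(k) = exp(-k/lam)
            for d in range(len(x)):                    # dw = eps * h * (x - w)
                w[i][d] += eps * h * (x[d] - w[i][d])
    return w
```

On a toy dataset with two well-separated groups of points, the two prototypes returned by this sketch typically settle near the two group centers, illustrating the soft-to-hard transition as *λ*(*t*) decays.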

The goal of this work is to present a comparison among the neural gas (NG), growing neural gas (GNG), and robust growing neural gas (RGNG) approaches, which are related to neural networks, and to design a new simulation tool for education and scientific research with unsupervised learning methods. Because these algorithms can be difficult to grasp from the literature alone, the three techniques are presented through a simple graphical user interface (GUI) model. Alziarjawey et al. [9] introduced an application of a Matlab GUI in the medical field, using the ECG signal for heart rate monitoring and PQRST detection. They introduced another application by developing a GUI-based software package consisting of two modules that use several important methods from linear algebra [10]. Aljobouri et al. [11] designed an educational tool for biosignal processing and medical imaging using a GUI package. The user-friendly package described in this work can be used easily by choosing any method, changing the predefined parameters of each algorithm, and comparing the results; hence, it requires no programming knowledge. The interested reader may find more technical details in our previous reports and publications [12, 13].

The current study is organized as follows: Section 2 presents the unsupervised clustering algorithms. Case studies are described in Section 3. Sections 4 and 5 present the experimental implementation on the synthetic dataset and the design of the clustering package, respectively. Finally, Section 6 concludes the chapter and outlines future work.
