learning, pattern recognition [3], image analysis [4], information retrieval [5], and bioinformatics [6]. There are different algorithms related to neural networks; the most popular are k-means, the self-organizing map (SOM), neural gas (NG), and growing neural gas (GNG) [7, 8].

The goal of this work is to present a comparison among the neural gas (NG), growing neural gas (GNG), and robust growing neural gas (RGNG) approaches, which are related to neural networks, as well as to design a new simulation tool for education and scientific research using unsupervised learning methods. Because these algorithms are difficult to follow from the literature alone, the three techniques are presented through a simple graphical user interface (GUI) model. Alziarjawey et al. [9] introduced an application of a Matlab GUI in the medical field, using the ECG signal for heart rate monitoring and PQRST detection. They introduced another application by developing a GUI-based software package consisting of two modules that use many important methods derived from linear algebra [10]. Aljobouri et al. [11] designed an educational tool for biosignal processing and medical imaging using a GUI package. The user-friendly package described in this work can be used easily by choosing any method, changing the predefined parameters of each algorithm, and comparing the results; hence, it can be used without any programming knowledge. The interested reader may find more technical details in our previous reports and publications [12, 13]. The current study is organized as follows: Section 2 presents the unsupervised clustering algorithms. Case studies are described in Section 3. Sections 4 and 5 present the experimental implementation on the synthetic dataset and the clustering package design, respectively. Finally, Section 6 concludes the paper and introduces future work.

**2. Unsupervised clustering algorithms**

In this section, a review of the NG, GNG, and RGNG algorithms is presented. Because of the length and complexity of these algorithms, flowcharts are designed for the three algorithms in this work, alongside their mathematical models, in order to make them more understandable and the related code easier to write.

**2.1. NG algorithm**

The NG network algorithm is a simple artificial neural network algorithm for finding optimal data representations based on reference vectors (prototype vectors). It was first introduced in 1991 [14] and is based on Kohonen's SOM [15]. The algorithm was called "neural gas" because of the dynamics of the reference vectors during the adaptation process, which spread themselves like a gas through the data space. Unlike other methods, which use a distance such as the Euclidean distance directly, NG proposes a new way of computing the influence of distance: prototypes nearer to the input are adapted more strongly, but the adaptation depends on the distance rank rather than directly on the distance itself.

NG has been successfully applied to clustering [16], speech recognition [17], image processing [18], vector quantization, pattern recognition, and topology representation [19, 20], especially for problems involving vector quantization or data compression.

NG adapts the reference vectors (prototype vectors) *wi* without any fixed topological arrangement within the network. As a single-layered soft competitive learning neural network, NG not only adapts the winner vector for a specific input vector, but also updates the remaining reference vectors according to their nearness to the input vector, using a soft-max updating rule [21]. The main advantages of the NG network [22] are: (1) lower distortion error than other clustering algorithms (k-means, maximum-entropy clustering, and SOM); (2) fast convergence to low distortion errors; (3) it obeys a stochastic gradient descent on an explicit energy surface.

The NG algorithm is characterized by the dependence of the updating strength of the *c* reference vectors on their position ranking. If the input vector is denoted by $x$, the position ranking $(w_{i_0}, w_{i_1}, \dots, w_{i_{N-1}})$ of the reference vectors is defined such that:

$w_{i_0}$ is the closest to $x$,


$w_{i_1}$ is the second closest to $x$,

$w_{i_k}$ is the reference vector for which there exist $k$ vectors $w_j$ with $\|x - w_j\| < \|x - w_{i_k}\|$, for $k = 1, 2, \dots, N-1$.

$k_i(x, w)$ denotes the ranking index associated with each weight vector $w_i$.

The updating step for adjusting $w_i$ according to a Hebb-like learning rule is given by:

$$
\Delta w_i = \varepsilon(t)\, h_\lambda\big(k_i(x, w)\big)\,(x - w_i), \qquad i = 1, 2, \dots, c \tag{1}
$$

where:

*h*(·, ·): a deterministic function with some regularity conditions imposed on it.

*ε*(*t*) ∈ [0, 1]: the learning rate (step size), which characterizes the overall extent of the update; it decays as $\varepsilon(t) = \varepsilon_i \cdot (\varepsilon_f/\varepsilon_i)^{t/\text{Max\_iter}}$, where Max\_iter and *t* denote the maximum number of iterations and the current iteration step, respectively.

$h_\lambda(k_i(x, w)) \in [0, 1]$: the neighborhood function, which weights how strongly each $w_i$ is adapted for the given input.

For $h_\lambda(k) \in [0, 1]$, the exponential form $\exp(-k/\lambda)$ was proposed [22], as it gave the best overall results compared with other options such as the Gaussian function.

*λ*: determines the number of reference vectors that significantly change their positions in the updating steps; it usually decreases with the iteration step *t* as $\lambda(t) = \lambda_i \cdot (\lambda_f/\lambda_i)^{t/\text{Max\_iter}}$.
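To make the update rule concrete, the following minimal Python/NumPy sketch implements one NG adaptation step; the function name `ng_update_step` and the array layout are our own illustrative choices, not part of the original formulation:

```python
import numpy as np

def ng_update_step(x, w, eps_t, lambda_t):
    """One NG adaptation step, Eq. (1), for a single input vector x.

    x: input vector, shape (d,)
    w: reference (prototype) vectors, float array of shape (c, d)
    eps_t: current learning rate eps(t)
    lambda_t: current neighborhood range lambda(t)
    """
    # Rank each prototype by its distance to x: k_i is the number of
    # prototypes w_j that are closer to x than w_i (0 for the winner).
    dists = np.linalg.norm(w - x, axis=1)
    ranks = np.argsort(np.argsort(dists))
    # Soft-max update of Eq. (1): delta_w_i = eps(t) * h_lambda(k_i) * (x - w_i)
    h = np.exp(-ranks / lambda_t)
    w += eps_t * h[:, None] * (x - w)
    return w
```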

The NG algorithm is closely related in structure to fuzzy clustering methods [23]: NG uses the fuzzy assignment value $h_\lambda(k_i(x, w)) / C(\lambda)$ to assign each input vector $x$ to all the reference vectors $w_i$ ($i = 1, 2, \dots, c$), instead of the memberships $u_{ij}$ ($1 \le i \le c$, $1 \le j \le N$) used in the FCM algorithm. The algorithm is based on minimizing a cost function iteratively, using standard optimization methods, essentially the gradient descent method and Newton's method. The NG cost function to be optimized [22] is:

$$E_{\text{ng}} = \frac{1}{2C(\lambda)} \sum_{i=1}^{c} \int P(x)\, h_\lambda\big(k_i(x, w)\big)\, \|x - w_i\|^2 \, dx \tag{2}$$

with

$$C(\lambda) = \sum_{i=1}^{c} h_\lambda\big(k_i(x, w)\big) = \sum_{k=0}^{c-1} h_\lambda(k) \tag{3}$$

Martinetz et al. [22, 26] introduced this cost function and proved that the updating in the Hebb-like learning rule can be derived by a stochastic gradient descent on this function. By starting with a large value of *λ* and reducing it slowly, a good reference vector can be obtained.

Owing to its sequential learning scheme and its use of the neighborhood cooperation rule, NG is less sensitive to initialization than other clustering algorithms such as k-means and FCM.

Before running the NG algorithm, some parameters have to be defined:

*N*: maximal number of neurons

*εi*, *εf*: initial and final step sizes

*λi*, *λf*: initial and final decay constants

*Ti*, *Tf*: initial and final life-times

*t*max = Max\_iter: maximal number of iterations
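Putting these parameters together, a simplified end-to-end NG training loop might look like the sketch below (assuming NumPy; the default parameter values are illustrative only, and the life-time parameters *Ti*, *Tf* are omitted because they concern only the edge-aging extension):

```python
import numpy as np

def train_ng(X, c=10, eps_i=0.5, eps_f=0.005, lam_i=10.0, lam_f=0.01,
             max_iter=10000, seed=0):
    """Simplified NG training loop (illustrative sketch, not the
    chapter's reference implementation)."""
    rng = np.random.default_rng(seed)
    # Initialize c reference vectors by sampling from the data set.
    w = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    for t in range(max_iter):
        frac = t / max_iter
        # Exponentially decaying schedules eps(t) and lambda(t).
        eps_t = eps_i * (eps_f / eps_i) ** frac
        lam_t = lam_i * (lam_f / lam_i) ** frac
        # Draw a random input and apply the rank-based update of Eq. (1).
        x = X[rng.integers(len(X))]
        ranks = np.argsort(np.argsort(np.linalg.norm(w - x, axis=1)))
        w += eps_t * np.exp(-ranks / lam_t)[:, None] * (x - w)
    return w
```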

**Figure 1** shows the flowchart of the NG algorithm. Although the NG model has many advantages, as mentioned earlier, it also has some limitations: it depends on decaying parameters that change over time, and it is incapable of finding a network size and structure automatically and of continuing to learn. Hence, based on the NG algorithm, the GNG algorithm was introduced by Fritzke [24, 25]. It has an advantage over NG through its ability to modify the network topology by removing edges via their age variable. Moreover, during the growth process associated with the neighborhood updating rule, there is no need for the neighborhood sorting step [24, 25]. GNG is able to find a network size and structure automatically and to continue learning, adding units and connections, until a performance criterion is fulfilled.

**Figure 1.** The flowchart of the NG algorithm.

**2.2. GNG algorithm**

In the GNG algorithm, Fritzke [24, 27] proposed changing the number of units (mostly increasing it) during the learning of a SOM-like network with a variable topological structure [24, 25]. This growth mechanism is combined with the topology formation rule of competitive Hebbian learning (CHL) [26] and the growing mechanism inherited from the earlier proposed growing cell structures [27] to form a new model.


The GNG algorithm needs only constant parameters, and the number of prototypes does not have to be set in advance. The main idea behind GNG is to start with a minimal network size and successively insert new neurons and connections into a growing structure, using vector quantization, until the desired characteristics of the model are fulfilled (e.g., net size, time limit, a predefined number of inserted neurons, or some quality or performance measure). To determine where to insert new units, local error measures are gathered during the adaptation process.


Each new unit is inserted near the unit that has accumulated the highest error, and a connection between the winner and the second nearest neuron is formed using the competitive Hebbian learning algorithm.
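A rough sketch of this insertion step is given below (following the description above; the adjacency representation, the variable names, and the error-decay factor `alpha` are our own assumptions rather than the chapter's notation):

```python
import numpy as np

def insert_node(w, error, edges, alpha=0.5):
    """Insert a new GNG unit near the unit with the highest error.

    w: list of prototype vectors (NumPy arrays)
    error: list of accumulated error values, one per unit
    edges: dict mapping frozenset({i, j}) -> age of the edge
    """
    # q: unit with the largest accumulated error.
    q = int(np.argmax(error))
    # f: topological neighbor of q with the largest accumulated error.
    neighbors = [j for e in edges if q in e for j in e if j != q]
    f = max(neighbors, key=lambda j: error[j])
    # Insert the new unit r halfway between q and f.
    r = len(w)
    w.append(0.5 * (w[q] + w[f]))
    # Replace the edge (q, f) by the edges (q, r) and (r, f).
    del edges[frozenset((q, f))]
    edges[frozenset((q, r))] = 0
    edges[frozenset((r, f))] = 0
    # Decrease the errors of q and f and initialize the error of r.
    error[q] *= alpha
    error[f] *= alpha
    error.append(error[q])
    return r
```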

Before running the GNG algorithm, some parameters have to be defined:

*N*: maximal number of neurons

*εb*, *εn*: constant learning rates for the winner and its topological neighbors, respectively


Max\_iter: maximal number of iterations

Each reference vector *wi*, *i* = 1, 2, …, *c*, has a set of edges emanating from it, defined to connect it with its direct topological neighbors. The GNG algorithm starts by initializing a few prototype vectors (usually two), $W = \{w_1, w_2\}$, whose reference vectors are chosen randomly. New prototype vectors are then successively inserted. The learning rates $\varepsilon_b$, $\varepsilon_n$ are used in the training procedure, and the connection set $C \subset W \times W$ is initialized to the empty set: $C = \emptyset$.

The pre-specified maximum number of prototypes (neurons) to which the network may grow is set as pre\_numnode, and the maximum number of training epochs during each growth stage is set as Max\_iter; at the end of each growth stage, a new node is inserted at the prototype with the largest local accumulated error measure. The data set used for training is $X = \{x_1, x_2, \dots, x_N\}$. Then, the initial training epoch number is set as *m* = 0 and the iteration step in training epoch *m* is set as *t* = 0.
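For illustration, this initialization could be sketched as follows (a minimal sketch under our own assumptions; the containers mirror those used in the insertion sketch above):

```python
import numpy as np

def init_gng(X, seed=0):
    """Initialize GNG with two randomly chosen prototypes and C = {}."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=2, replace=False)
    w = [X[i].astype(float).copy() for i in idx]  # W = {w1, w2}
    edges = {}                                    # connection set C = empty
    error = [0.0, 0.0]                            # accumulated local errors
    return w, edges, error
```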

**Figure 2** presents the flowchart of the GNG algorithm. The figure shows that nonfunctional prototypes, which do not win over long time intervals, can be detected by tracing the changes of an age variable associated with each edge. Hence, the GNG algorithm has an advantage over the NG algorithm through its ability to modify the network topology by removing edges that have not been refreshed for a time interval α\_max, together with the resulting nonfunctional prototypes. In the GNG algorithm, the growth process associated with the neighborhood updating rule is somewhat similar to the neighborhood-decreasing procedure in NG; however, unlike the NG algorithm, there is no need for a neighborhood sorting step.
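The edge-aging mechanism described above can be sketched as follows (our own minimal rendering; `s1` and `s2` denote the indices of the nearest and second-nearest units for the current input, and `alpha_max` is the maximal edge age):

```python
def age_and_prune(edges, s1, s2, alpha_max=50):
    """Age, refresh, and prune GNG edges for one input presentation.

    edges: dict mapping frozenset({i, j}) -> age of the edge
    """
    # Increment the age of all edges emanating from the winner s1.
    for e in edges:
        if s1 in e:
            edges[e] += 1
    # Create or refresh (age 0) the edge between winner and runner-up,
    # as prescribed by competitive Hebbian learning.
    edges[frozenset((s1, s2))] = 0
    # Remove edges not refreshed within alpha_max steps; units left
    # without any edges are then nonfunctional prototypes and could
    # be removed as well.
    for e in list(edges):
        if edges[e] > alpha_max:
            del edges[e]
```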
