λ: iteration number

*α*: reduction of the error counter when a new neuron is inserted

*β*: value that reduces the overall value of the error counter at every iteration step

Max\_iter: maximal number of iterations

The GNG algorithm starts by initializing a few prototype vectors (usually two), *W* = {*w*<sub>1</sub>, *w*<sub>2</sub>}, whose reference vectors are chosen randomly, and a connection set *C*, *C* ⊂ *W* × *W*, initialized to the empty set: *C* = ∅. Each reference vector *w<sub>i</sub>*, *i* = 1, 2, …, *c*, has a set of edges emanating from it and is defined to connect with its direct topological neighbors. New prototype vectors are successively inserted at the position with the largest local accumulated error measure. The learning rates *ε<sub>b</sub>* and *ε<sub>n</sub>* are used in the training procedure. The data set used for training is *X* = {*x*<sub>1</sub>, *x*<sub>2</sub>, …, *x<sub>N</sub>*}. Then, the initial training epoch number is set as *m* = 0, and the iteration step within training epoch *m* is set as *t* = 0. The pre-specified maximum number of prototypes (neurons) the network may grow to is set as pre\_numnode, and the maximum predefined training epoch Max\_iter is set for each growth stage.

**Figure 2** presents the flowchart of the GNG algorithm. This figure shows that nonfunctional prototypes, which do not win over long time intervals, can be detected by tracing the changes of an age variable associated with each edge. Hence, the GNG algorithm has an advantage over the NG algorithm through its ability to modify the network topology by removing edges whose age variable has not been refreshed for a time interval α\_max, together with the resulting nonfunctional prototypes. In the GNG algorithm, the growth process and the associated neighborhood updating rule are somewhat similar to the neighborhood-decreasing procedure in NG. However, unlike the NG algorithm, there is no need for the neighborhood sorting step.

**Figure 2.** The flowchart of the GNG algorithm.

Performance Assessment of Unsupervised Clustering Algorithms Combined MDL Index
http://dx.doi.org/10.5772/intechopen.74506

Recent Applications in Data Clustering

**2.3. RGNG algorithm**

Any robust algorithm should have the following features [28]:

**1.** It should achieve good precision for the given model.

**2.** The given model may deviate slightly from the assumptions made, but these deviations should weaken the performance only by a small degree.

**3.** The presence of large deviations from the model assumptions should not cause disaster.

If classical clustering methods are to be used as prototype-based clustering algorithms, the major robustness problems are sensitivity to initialization, sensitivity to the order of the input vectors, and the existence of many outliers, even when each method performs well with respect to condition 1. Due to the growth scheme associated with the GNG algorithm, the algorithm also faces the "dead nodes" problem: inappropriate initializations can lead to prototypes that never win throughout the training process.
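The GNG procedure just described (two random prototypes; winner and neighbor updates; edge aging with pruning at α\_max; insertion of a new node near the largest accumulated error every λ steps) can be sketched as follows. This is a minimal illustrative sketch, not the chapter's implementation; the function name `gng` and all default parameter values are assumptions.

```python
import numpy as np

def gng(X, pre_numnode=10, max_iter=5, eps_b=0.2, eps_n=0.006,
        alpha_max=50, alpha=0.5, beta=0.995, lam=100, seed=0):
    """Minimal GNG sketch: returns the learned prototype vectors (one per row)."""
    rng = np.random.default_rng(seed)
    W = [X[rng.integers(len(X))].astype(float) for _ in range(2)]  # two random prototypes
    E = [0.0, 0.0]               # accumulated error per prototype
    age = {}                     # edge (frozenset of two node ids) -> age
    it = 0
    for _ in range(max_iter):
        for x in X[rng.permutation(len(X))]:
            it += 1
            d = [float(np.linalg.norm(x - w)) for w in W]
            s1, s2 = (int(i) for i in np.argsort(d)[:2])  # winner and second winner
            E[s1] += d[s1] ** 2
            W[s1] += eps_b * (x - W[s1])          # pull winner toward the input
            for e in list(age):
                if s1 in e:
                    age[e] += 1                   # age all edges of the winner
                    (j,) = e - {s1}
                    W[j] += eps_n * (x - W[j])    # pull topological neighbors
            age[frozenset((s1, s2))] = 0          # refresh/create winner-runner edge
            age = {e: a for e, a in age.items() if a <= alpha_max}  # prune old edges
            if it % lam == 0 and len(W) < pre_numnode:
                q = max(range(len(W)), key=lambda i: E[i])  # largest accumulated error
                nbrs = [next(iter(e - {q})) for e in age if q in e]
                f = max(nbrs, key=lambda j: E[j]) if nbrs else q
                W.append((W[q] + W[f]) / 2.0)     # insert new node between q and f
                E[q] *= alpha
                if f != q:
                    E[f] *= alpha
                E.append(E[q])
                new = len(W) - 1
                age[frozenset((q, new))] = 0
                if f != q:
                    age[frozenset((f, new))] = 0
                    age.pop(frozenset((q, f)), None)
            E = [beta * e for e in E]             # decay every error counter
    return np.vstack(W)
```

Feeding two well-separated blobs into this sketch grows the network from two prototypes up to the pre\_numnode limit, one insertion every λ inputs.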

Even with initialization-insensitive clustering methods, good clustering results may not be obtained if the order of the input sequence is not chosen properly. In addition to the sensitivity to initialization and to the order of the input vectors, there is another problem attributable to the existence of many outliers: the GNG network may fail to differentiate outliers from inliers through the original prototype updating rule when a data set contains many outliers. These outliers can be regarded as input vectors that differ from the data points belonging to the ordinary clusters (the inliers).

To address these limitations of the GNG algorithm, a novel robust clustering algorithm was proposed [29] within the GNG framework, namely the robust growing neural gas (RGNG) network. RGNG inherits the structure of the original GNG algorithm and achieves better robustness by incorporating several robust strategies, such as an outlier-resistant scheme, adaptive modulation of the learning rates, and a cluster repulsion method.
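The exact outlier-resistant update rule of RGNG is given in [29]. As a generic, hypothetical illustration of the idea only, the winner's learning step can be damped for inputs that lie far from it, so that remote outliers barely drag the prototype:

```python
import numpy as np

def robust_winner_update(w, x, eps_b=0.1, sigma=1.0):
    """Illustrative outlier-resistant update (NOT the exact RGNG rule from [29]):
    the learning step is weighted down as the input gets farther from the winner,
    so remote outliers have almost no pull on the prototype."""
    r = np.linalg.norm(x - w)
    weight = np.exp(-(r / sigma) ** 2)   # ~1 for nearby inliers, ~0 for outliers
    return w + eps_b * weight * (x - w)
```

An inlier close to the winner moves it almost as in plain GNG, while a far-away outlier leaves it essentially unchanged.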

Therefore, compared to the GNG network, the RGNG network is insensitive to initialization, to the ordering of the input sequence, and to the presence of outliers, and it can determine the optimal number of clusters. The minimum description length (MDL) value is used in RGNG as the clustering validity index [30, 31]: the optimal number of clusters and the corresponding center positions are those yielding the smallest MDL value. The optimal number of clusters is thus determined automatically by searching for the extreme value of the MDL measure during the network growing process.
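The exact MDL formulation used by RGNG is given in [30, 31]. The following is only a toy two-part code-length score (the names `toy_mdl`, `bits_per_param`, and the residual quantization `eps` are assumptions for illustration) showing why the description length attains a minimum at a sensible number of prototypes: more centers cost model bits, fewer centers cost residual bits.

```python
import numpy as np

def toy_mdl(X, C, bits_per_param=16.0, eps=1e-3):
    """Toy two-part MDL score: bits to encode the prototypes (model cost) plus
    bits to encode each point as (nearest-prototype index, quantized residual).
    A simplified stand-in for the MDL index of [30, 31], not the actual formula."""
    C = np.atleast_2d(C)
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    resid = d.min(axis=1)                            # distance to nearest prototype
    model_bits = C.size * bits_per_param             # cost of the model parameters
    index_bits = len(X) * np.log2(len(C)) if len(C) > 1 else 0.0
    resid_bits = np.sum(np.log2(1.0 + resid / eps))  # coarse residual code length
    return model_bits + index_bits + resid_bits
```

On three well-separated clusters, the score for the three true centers is smaller than the score for a single global center, so a search over the growing network would stop at three prototypes.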

Before running the RGNG algorithm, some parameters have to be defined:

*N*: maximal number of neurons

*ε<sub>b</sub>*: learning rate of the winner

*ε<sub>n</sub>*: learning rate of its topological neighbors

*ε<sub>bi</sub>*, *ε<sub>bf</sub>*, *ε<sub>ni</sub>*, *ε<sub>nf</sub>*: initial and final values of *ε<sub>b</sub>* and *ε<sub>n</sub>*

*α*<sub>max</sub>: maximal age of a connection

*β*: mobility of the winner's neighborhood toward the input vector

*k*, *η*: parameters used to determine the MDL value

Max\_iter: maximal number of iterations

The maximum number of nodes the network may grow to is set as pre\_numnode, and the maximum training epoch Max\_iter is set for each growth stage. The initial training epoch number is set as *m* = 0, and the iteration step within training epoch *m* is set as *t* = 0. Hence, the total iteration step iter during each growing stage is iter = *m*·*N* + *t*, where *N* is the actual number of neurons. The data set used for training is *X* = {*x*<sub>1</sub>, *x*<sub>2</sub>, …, *x<sub>N</sub>*}.

**Figure 3** presents the flowchart of the RGNG algorithm.

**Figure 3.** The flowchart of the RGNG algorithm.

