Recent Applications in Data Clustering

The average MDL values during the growth stages are plotted against the number of clusters (prototypes). **Figure 7** shows the curves for the NG and GNG techniques combined with the MDL criterion, as well as for the RGNG approach, on a synthetic dataset for different numbers of neurons, selected randomly as *N* = 7, 10, and 12. Each technique detected the cluster number corresponding to its smallest MDL value.

**Figure 7.** MDL values versus the number of clusters for the NG, GNG, and RGNG techniques on synthetic data, for: (a) *N* = 7; (b) *N* = 10; (c) *N* = 12.

On average, RGNG recorded the smallest MDL value compared with NG and GNG combined with the MDL principle. For example, in **Figure 7(b)**, the smallest MDL value, 2.65, is obtained by running RGNG when *N* is equal to 4, whereas for the same *N* = 4 a higher MDL value of 2.77 is recorded by running NG and GNG. From the presented figures, it is concluded that the proposed RGNG approach is insensitive to different initializations and to the presence of outliers and can successfully find the actual number of clusters.

**5. Prototype-based clustering package**

The techniques introduced in this work are implemented in a simple software package that allows users to interact with the clustering techniques and output data easily [13]. **Figure 8** shows the main window, with the most important features, of the designed prototype-based clustering software package.

**Figure 8.** Main window of the prototype-based clustering software package.

**1.** *Selection data*: The user can select any one of the synthetic 2D datasets from the pop-up menu. Ring data, a 2D synthetic dataset, is selected as an example in **Figure 8**.

**2.** *Load data*: The selected data are loaded, and all information related to them ("Dimension," "Name," and "Type of Data") appears in the "info" window. The dimension of the selected "Ring" data is 400x2 double. The selected data are plotted on Sketch1 inside the main clustering window of **Figure 8**.

**Figure 9** shows some of the 2D synthetic datasets used in this work. The information related to each plot is shown in the "info" window to its left.

**Figure 9.** Different datasets with their information: (a) snail data; (b) screw data; (c) ring data; (d) Set5 data.

**3.** *Selection technique*: The user can select one of the clustering techniques: NG, GNG, or RGNG. In **Figure 8**, the RGNG technique is selected as an example for training with the Ring data and *N* = 18, which is chosen randomly.

Before clicking on the "Apply NG," "Apply GNG," or "Apply RGNG" button, the training parameters related to each technique must be defined. As explained in Section 3, the training parameters must be set carefully within the limited range. The number of neurons (*N*) as well as the other parameters related to the selected technique must be defined. Another example, using the RGNG technique with the Set3 dataset, is shown in **Figure 10**. The RGNG training parameters are set to typical values from the literature: $\varepsilon_{bi} = 0.1$, $\varepsilon_{bf} = 0.01$, $\varepsilon_{ni} = 0.005$, $\varepsilon_{nf} = 0.0005$, $\alpha_{\max} = 100$, $k = 1.3$, $\eta = 1 \times 10^{-4}$; the number of neurons (*N*) is chosen randomly as 14. When the algorithm's training is started, the program sketches the output of the running technique in Sketch1. In Sketch1, the Set3 data are shown together with solid red circles, which represent the actual cluster centers.
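The RGNG parameter values quoted above can be collected in one place, as in the following minimal Python sketch (the package itself is MATLAB-based, so this is only an illustration, not the package's code). The class name `RGNGParams`, the field names, and the helper `annealed_rate` are hypothetical. The comments give the usual reading of these symbols in the GNG literature, which is an assumption here, and the geometric decay from the initial to the final learning rate is likewise an assumed schedule, borrowed from the standard neural gas formulation rather than taken from this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RGNGParams:
    """Parameter values quoted in the text for the Set3 example."""
    eps_bi: float = 0.1      # initial learning rate of the winner (assumed meaning)
    eps_bf: float = 0.01     # final learning rate of the winner (assumed meaning)
    eps_ni: float = 0.005    # initial learning rate of the neighbors (assumed meaning)
    eps_nf: float = 0.0005   # final learning rate of the neighbors (assumed meaning)
    alpha_max: int = 100     # maximum edge age (assumed meaning)
    k: float = 1.3           # role as defined in Section 3
    eta: float = 1e-4        # role as defined in Section 3
    n_neurons: int = 14      # N, chosen randomly in the text


def annealed_rate(initial, final, step, total_steps):
    """Assumed schedule: geometric interpolation from initial to final."""
    fraction = step / total_steps
    return initial * (final / initial) ** fraction


params = RGNGParams()
for step in (0, 50, 100):
    # Winner learning rate at the start, middle, and end of a 100-step run.
    print(step, annealed_rate(params.eps_bi, params.eps_bf, step, 100))
```

Keeping the values in a frozen dataclass makes it harder to change a parameter by accident between runs, which matters because the text stresses that the parameters must stay within a limited range.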

**4.** *MDL plot*: This panel plots MDL values versus the number of neurons (*N*) for RGNG and for GNG and NG combined with the MDL criterion. It includes three main controls: "No. of neurons (N)," "Technique selection for MDL value," and "Apply MDL versus N," as shown in **Figure 11**.

After defining the number of neurons (*N*), one, two, or all three of the training techniques must be selected for comparing the MDL results. The "Technique selection for MDL value" pop-up menu offers seven selections: each technique alone, any two of them, or all three, for easy comparison. After clicking the "Apply MDL versus N" button, the MDL values are plotted against the number of neurons (*N*) in Sketch2.
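The count of seven selections follows from taking every non-empty subset of the three techniques (3 single techniques + 3 pairs + 1 triple). A short Python sketch makes this explicit; the function name `menu_selections` is hypothetical, and the label format joined with "&" is modeled on the "RGNG & GNG & NG" entry mentioned in the text, so the other labels are an assumption about how the menu is worded.

```python
from itertools import combinations

TECHNIQUES = ["RGNG", "GNG", "NG"]


def menu_selections(techniques):
    """Return every non-empty subset of techniques as a menu label."""
    labels = []
    for size in range(1, len(techniques) + 1):
        for subset in combinations(techniques, size):
            labels.append(" & ".join(subset))
    return labels


selections = menu_selections(TECHNIQUES)
print(len(selections))  # 7
for label in selections:
    print(label)
```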

**Figure 11** shows an example of the MDL plot with *N* = 16 and "RGNG & GNG & NG" chosen to compare the results of the three techniques in Sketch2. To make the comparison easy, the MDL values of the three techniques are plotted in the same figure.
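Reading such a comparison plot amounts to locating the lowest point of each MDL curve. The Python sketch below shows that step under stated assumptions: the function name `best_cluster_count` is hypothetical, and every number in `curves` is a made-up placeholder chosen only to exercise the code, not a value read from Figure 11 or produced by the package.

```python
def best_cluster_count(mdl_by_count):
    """Return (count, mdl) for the smallest MDL value on one curve."""
    count = min(mdl_by_count, key=mdl_by_count.get)
    return count, mdl_by_count[count]


# Placeholder curves: number of prototypes -> MDL value (illustrative only).
curves = {
    "RGNG": {2: 3.4, 3: 3.0, 4: 2.6, 5: 2.8},
    "GNG": {2: 3.5, 3: 3.1, 4: 2.9, 5: 3.0},
    "NG": {2: 3.6, 3: 3.2, 4: 3.0, 5: 3.1},
}

for technique, curve in curves.items():
    count, value = best_cluster_count(curve)
    print(f"{technique}: minimum MDL {value} at {count} prototypes")
```

With real output from the package, the same function could be applied to each technique's curve to compare the selected cluster numbers side by side.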


**Figure 10.** RGNG clustering with Set3 data (*N* = 14).

**Figure 11.** Comparison of MDL values for *N* = 16.


Performance Assessment of Unsupervised Clustering Algorithms Combined MDL Index

http://dx.doi.org/10.5772/intechopen.74506


#### **6. Conclusions**


A simple, user-friendly software package is designed and implemented as an automatic clustering model for any dataset, for use as part of a neural network course. The NG, GNG, and RGNG algorithms are run in the same package using a MATLAB-based graphical user interface (GUI) tool. This visual tool lets students and researchers visualize the desired results using plots obtained with the click of a few buttons. The performance of these algorithms on 2D synthetic datasets is reported with respect to statistical estimations to explain the meaning of the output results. These results show that RGNG is better than NG and GNG in terms of insensitivity to initialization and to the presence of outliers. RGNG enhances GNG to be more robust toward noisy input data by using the MDL criterion. Hence, in comparison with NG and GNG, RGNG solves the problem of finding the optimal number of clusters.

For future research, other unsupervised or supervised clustering algorithms may be used in the laboratory experiments. Another direction is to extend the comparison among the three clustering algorithms to real multimodal datasets in medical applications. The package results could also be shared on websites using ASP.NET, which would let users run the application without installing MATLAB or any special program, using just a Web browser.

[6] Ressom H, Wang D, Natarajan P. Adaptive double self-organizing maps for clustering gene expression profiles. Neural Networks. 2003;**16**(5-6):633-640

[7] Duda RO, Hart PE, Stork DG. Pattern Classification. New York: Wiley-Interscience; 2000

[8] Ripley BD. Pattern Recognition and Neural Networks. New York: Cambridge University Press; 1996

[9] Alziarjawey HA, Cankaya I. Heart rate monitoring and PQRST detection based on graphical user interface with Matlab. IJIEE. 2015;**5**:311-316

[10] Alziarjawey HAJ, Çamdalı Ü, Çankaya I, Aljobouri H. Design graphical user interface of linear algebra system package by using MATLAB. IJRITCC. 2016;**4**:428-433

[11] AlJobouri HK, Alziarjawey HA, Cankaya I. Biosignal Processing, Medical Imaging and fMRI (BSPMI) Software Package Based on MATLAB GUI for Education and Research. 2015;**1**:2380-8128

[12] Aljobouri HK, Çankaya I, Karal O. From biomedical signal processing techniques to fMRI Parcellation. Biosciences Biotechnology Research Asia. 2015;**12**(2):1115-1138

[13] AlJobouri HK, Jaber HA, Çankaya I. Performance evaluation of prototype-based clustering algorithms combined MDL index. Computer Applications in Engineering Education, Wiley Inc. 2017;**25**(4):642-654

[14] Martinetz T, Schulten K. A "Neural Gas" Network Learns Topologies. Artificial Neural Networks. Elsevier; 1991. pp. 397-402

[15] Kohonen T. Self-Organizing Maps. 3rd ed. Berlin: Springer; 2001

[16] Fernando C, Max C. Modification of the growing neural gas algorithm for cluster analysis. Progress in Pattern Recognition, Image Analysis and Applications, Lecture Notes in Computer Science, Springer. 2007;**4756**:684-693

[17] Curatelli F, Mayora-Iberra O. Competitive learning methods for efficient vector quantizations in a speech recognition environment. MICAI 2000: Advances in Artificial Intelligence, Lecture Notes in Computer Science, Springer. 2000;**1793**:108-114

[18] Anastassia A, Alexandra P, José GR, Kenneth R. Automatic landmarking of 2D medical shapes using the growing neural gas network. Computer Vision for Biomedical Image Applications, Lecture Notes in Computer Science, Springer. 2005;**3765**:210-219

[19] Atukorale AS, Downs T, Suganthan PN. Boosting the HONG network. Neurocomputing. 2003;**51**:75-86

[20] Winter M, Metta G, Sandini G. Neural-gas for function approximation: A heuristic for minimizing the local estimation error. Proceeding of International Joint Conference on Neural Network (IJCNN), Italy. 2000:535-538

[21] Haykin S. Neural Networks: A Comprehensive Foundation. 2nd ed. Englewood Cliffs, NJ: Prentice-Hall; 1998
