**5. Experimental results and discussion**

**Figure 1.** *Dendrogram, based on the user-defined number of clusters.*

To test how the matching-based algorithm performs on a real-world dataset, we applied it to the soybean disease dataset [23], one of the standard test datasets used in the machine learning community. It has often been used to test conceptual clustering algorithms. We chose this dataset because it is widely known and because all its attributes can be treated as categorical without additional discretization. The soybean dataset has 47 observations, each described by 35 attributes. Each observation is labeled with one of four diseases: Diaporthe Stem Canker, Charcoal Rot, Rhizoctonia Root Rot, and Phytophthora Rot. These labels are used as indicators of the effectiveness of the algorithm.

After applying the MBC algorithm to the soybean disease dataset, we obtained 18 distinct clusters.


However, as the table above shows, every cluster except one belongs entirely to one of the four disease groups. In other words, we have only one possible misclassification. As already mentioned, a specific number of clusters may nevertheless be required. In that case, one can use the dendrogram (**Figure 2**).
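The check described above, whether each cluster falls entirely within one disease group, can be sketched as follows. This is a minimal illustration: the function name `mixed_clusters` and the toy labels are hypothetical and do not reproduce the soybean results.

```python
from collections import Counter, defaultdict


def mixed_clusters(cluster_ids, diseases):
    """Return {cluster_id: Counter of disease labels} for every cluster
    that contains observations from more than one disease group."""
    by_cluster = defaultdict(Counter)
    for cluster_id, disease in zip(cluster_ids, diseases):
        by_cluster[cluster_id][disease] += 1
    return {c: counts for c, counts in by_cluster.items() if len(counts) > 1}


# Hypothetical toy assignment (not the soybean results): cluster 2 is mixed.
cluster_ids = [0, 0, 1, 1, 2, 2, 2]
diseases = ["D1", "D1", "D2", "D2", "D3", "D3", "D4"]

mixed = mixed_clusters(cluster_ids, diseases)
# Observations outside the majority disease of their cluster.
possible_misclassifications = sum(
    sum(counts.values()) - max(counts.values()) for counts in mixed.values()
)
print(len(mixed), possible_misclassifications)
```

A clustering in which `mixed` is empty is fully consistent with the disease labels, regardless of how many clusters it contains.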

*Perspective Chapter: Matching-Based Clustering Algorithm for Categorical Data DOI: http://dx.doi.org/10.5772/intechopen.109548*

**Figure 2.** *Dendrogram for the soybean disease dataset after applying the MBC algorithm.*

Furthermore, we can compare the performance of the algorithm with K-modes [35]. In that paper, K-modes was also applied to the soybean disease dataset. The author emphasized that K-modes depends on the order of the data and that the user must also supply the number of clusters. MBC has neither of these limitations, so applying it may yield a more accurate outcome. However, we still need to consider how to reduce the number of clusters that MBC produces.
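The order dependence of K-modes noted above can be illustrated with a small sketch. It is a simplified batch variant written for illustration, not the implementation from [35]: it assumes the initial modes are the first k distinct records and that ties are broken in favor of the lower cluster index. The records are hypothetical toy values, not soybean data.

```python
from collections import Counter


def mismatches(a, b):
    """Simple matching dissimilarity: number of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value per attribute; ties go to the value seen first."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, k, max_iter=20):
    """Simplified batch K-modes (assumed variant); returns one label per record."""
    # Assumption: the initial modes are the first k distinct records.
    modes = list(dict.fromkeys(records))[:k]
    labels = None
    for _ in range(max_iter):
        # Assign each record to the nearest mode; ties go to the lower index.
        new_labels = [
            min(range(len(modes)), key=lambda j: mismatches(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute the mode of every non-empty cluster.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels


def as_partition(records, labels):
    """Order-independent view of a clustering: sorted clusters of sorted records."""
    clusters = {}
    for r, lab in zip(records, labels):
        clusters.setdefault(lab, []).append(r)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy records; the three record types differ pairwise on both attributes.
X, Y, Z = ("brown", "absent"), ("tan", "present"), ("none", "severe")
order_a = [X, X, Y, Y, Z]
order_b = [X, Z, Y, Y, X]  # same records, different order

part_a = as_partition(order_a, k_modes(order_a, k=2))
part_b = as_partition(order_b, k_modes(order_b, k=2))
print(part_a == part_b)  # False: the two orders give different partitions
```

In this toy case the first ordering groups the single `Z` record with the `X` records, while the second leaves `Z` alone and merges the `X` and `Y` records. The user also has to pass `k` explicitly.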
