
been developed. However, segmentation performance is still not satisfactory, especially for noisy images, which makes the development of segmentation algorithms capable of handling noisy images an active area of research. Current segmentation methods can be classified into thresholding, region detection, edge detection, probabilistic and artificial neural-network classification, and clustering [1–3]. Hard and fuzzy clustering methods are among the most widely used since clustering requires no training examples [4–24]. Hard C-means (HCM), also called the K-means clustering algorithm, is an unsupervised approach in which the data are partitioned based on the locations of and distances between the data points [4–6]. K-means partitions the data into C clusters so that data points within each cluster are as close as possible while data points in different clusters are as far apart as possible. The HCM clustering algorithm offers a crisp segmentation in which each data point belongs to only one cluster; it therefore does not take into consideration fine details of the data infrastructure such as hybridization or mixing. Compared with HCM, the fuzzy C-means (FCM) algorithm is able to provide a soft segmentation by incorporating a degree of belonging described by a membership function [7, 8]. However, one disadvantage of the standard FCM is that it does not incorporate any spatial or local information in the image context, making it very sensitive to additive noise and other imaging artifacts. To handle this problem, different techniques have been developed that involve spatial or local data information to enhance and regularize the performance of the standard FCM algorithm [9–13]. Local membership information has also been employed to generate a parameter that weights or modifies the membership function, giving more weight to a pixel's membership when the pixels in its immediate neighborhood belong to the same cluster [14]. The HCM algorithm has also been fuzzified by involving membership entropy optimization [15–17].

In this chapter, the HCM clustering algorithm is modified by incorporating local spatial data and a Kullback-Leibler (KL) membership divergence [18–22]. The local data information is incorporated via an additional weighted HCM function in which the smoothed image data are used for the distance computation. The KL membership divergence minimizes the information distance between the membership function of each pixel and the locally smoothed membership in the pixel's vicinity, thereby providing both regularization and fuzzification. The chapter is organized as follows. In Section 2, the clustering problem formulation is overviewed. In Section 3, the HCM clustering algorithm is described. In Section 4, several FCM-related clustering algorithms are explained. In Sections 5 and 6, the proposed local membership KL divergence-based FCM (LMKLFCM) and local data and membership KL divergence-based FCM (LDMKLFCM) clustering algorithms are discussed. In Section 7, simulation results for clustering and segmentation of synthetic and real-world images are presented. Finally, Section 8 presents the conclusion.




2. Problem formulation

The objective is to cluster a set of observed data $\{\mathbf{x}_n;\ n = 1, 2, \dots, N\}$, where each data point is an $M$-dimensional real vector called the feature or pattern vector, i.e., $\mathbf{x}_n \in \mathbb{R}^{1 \times M}$. For gray-scale image data, $\{\mathbf{x}_n;\ n = 1, 2, \dots, N\}$ is a row-wise concatenation of a 2-D image.
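As an illustration of this formulation, the following is a minimal NumPy sketch of turning a gray-scale image into the pattern set $\{\mathbf{x}_n\}$; the synthetic image and the choice of a single intensity feature (i.e., $M = 1$) are assumptions made only for the example.

```python
import numpy as np

# Synthetic stand-in for a gray-scale image (values are illustrative only).
image = np.random.randint(0, 256, size=(64, 64)).astype(float)

# Row-wise concatenation of the 2-D image into N pattern vectors x_n,
# each an M-dimensional feature vector; here M = 1 (pixel intensity).
N = image.size
X = image.reshape(N, 1)   # shape (N, M) with M = 1

print(X.shape)            # (4096, 1)
```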


3. Hard C-means (HCM)

In the hard C-means (HCM) algorithm, also called the K-means algorithm, the objective is to minimize the following function [4–6, 15]:

$$J_{\mathrm{HCM}} = \sum_{i=1}^{C} \sum_{n=1}^{N} u_{in}\, d_{in} \tag{1}$$

where $d_{in} = \left\| \mathbf{x}_n - \mathbf{v}_i \right\|^2$ is the square of the Euclidean distance between the $n$th pixel feature $\mathbf{x}_n$ of the image under segmentation and $\mathbf{v}_i \in V = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_C\}$, called the center of the $i$th cluster, given by

$$\mathbf{v}_{i} = \frac{\sum_{\mathbf{x}_{n} \in \mu_{i}} \mathbf{x}_{n}}{N_{i}}, \quad i = 1, 2, \dots, C. \tag{2}$$

where $\mu_i$ is the $i$th cluster label and $N_i$ is the number of patterns in cluster $i$. In (2), it is clear that the pattern $\mathbf{x}_n$ belongs to only one cluster, which means that the membership function $u_{in} \in \{0, 1\}$ is given by [15]

$$u_{kn} = \begin{cases} 1, & k = \arg\min_{i}(d_{in}) \\ 0, & \text{otherwise} \end{cases} \tag{3}$$
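As a small numerical illustration of (3), the sketch below builds the crisp membership matrix from a hypothetical distance matrix $d_{in}$; the values are invented purely for the example.

```python
import numpy as np

# Hypothetical squared distances d_in for C = 2 clusters and N = 4 patterns.
d = np.array([[1.0, 0.2, 9.0, 4.0],
              [5.0, 3.0, 0.5, 0.1]])

# Eq. (3): u_kn = 1 for k = argmin_i d_in, and 0 otherwise.
k = np.argmin(d, axis=0)               # winning cluster index per pattern
U = np.zeros_like(d)
U[k, np.arange(d.shape[1])] = 1.0

print(U)
# [[1. 1. 0. 0.]
#  [0. 0. 1. 1.]]
```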

From (3), it is obvious that HCM provides a crisp membership function $u_{in} \in \{0, 1\}$ or {False, True}. Thus, the HCM algorithm does not take into account fine details of the data infrastructure, such as hybridization or mixing, which are important in data clustering and decision making. The algorithm is implemented by an iterative procedure as summarized in Table 1.

Given $\mathbf{x}_n$, $n = 1, 2, \dots, N$: initialize $\mathbf{v}_i^{0}$, $i = 1, 2, \dots, C$; set $t = 0$.

1. For $n = 1, 2, \dots, N$ compute:
2. $d_{in} = \left\| \mathbf{x}_n - \mathbf{v}_i^{t} \right\|^2$, $i = 1, 2, \dots, C$.
3. $k = \arg\min_{i}(d_{in})$; set $u_{kn} = 1$ and $u_{in} = 0$ for $i = 1, 2, \dots, C$, $i \neq k$ (HCM); or $u_{in} = 1 \Big/ \sum_{j=1}^{C} \left( d_{in}/d_{jn} \right)^{1/(m-1)}$ (FCM).
4. Update $\mathbf{v}_i^{t+1} = \sum_{n} u_{in}\, \mathbf{x}_n \Big/ \sum_{n} u_{in}$, $i = 1, 2, \dots, C$.
5. If $\left\| V^{t+1} - V^{t} \right\|^2 > \varepsilon$, the change is not yet negligible: set $t = t + 1$ and repeat steps 1–5 until convergence.

Table 1. Pseudo code of the HCM (FCM) algorithms.
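Following the pseudocode of Table 1, the block below is a compact Python/NumPy sketch of the procedure; the HCM branch uses the crisp assignment of (3) and the optional FCM branch uses the fuzzy membership of step 3 with fuzzifier $m$. The function name `hcm_fcm`, the initialization by randomly chosen patterns, and the toy data are assumptions for illustration, not part of the chapter.

```python
import numpy as np

def hcm_fcm(X, C, m=None, eps=1e-6, max_iter=100, seed=0):
    """Sketch of the Table 1 iteration.

    X   : (N, M) array of pattern vectors x_n.
    C   : number of clusters.
    m   : None for crisp HCM; a fuzzifier > 1 to use the FCM membership of step 3.
    eps : threshold on the squared change of the centers (step 5).
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    V = X[rng.choice(N, C, replace=False)]            # initial centers v_i^0

    for _ in range(max_iter):
        # Step 2: d_in = ||x_n - v_i||^2 for all pairs, shape (C, N).
        d = np.sum((X[None, :, :] - V[:, None, :]) ** 2, axis=2)

        if m is None:
            # Step 3 (HCM): crisp membership via argmin, Eq. (3).
            k = np.argmin(d, axis=0)
            U = np.zeros((C, N))
            U[k, np.arange(N)] = 1.0
        else:
            # Step 3 (FCM): u_in = 1 / sum_j (d_in / d_jn)^(1/(m-1)).
            d = np.fmax(d, 1e-12)                     # guard against zero distances
            U = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (1.0 / (m - 1.0)), axis=1)

        # Step 4: v_i = sum_n u_in x_n / sum_n u_in, as in Eq. (2).
        V_new = (U @ X) / np.fmax(np.sum(U, axis=1, keepdims=True), 1e-12)

        # Step 5: stop once the change in the centers is negligible.
        if np.sum((V_new - V) ** 2) <= eps:
            V = V_new
            break
        V = V_new

    return V, U

# Usage sketch: a toy one-dimensional "image" with two intensity levels.
image = np.concatenate([np.full(50, 40.0), np.full(50, 200.0)]) + np.random.randn(100)
X = image.reshape(-1, 1)
V, U = hcm_fcm(X, C=2)        # crisp HCM; pass m=2 to use the FCM membership
print(np.sort(V.ravel()))     # roughly [40, 200]
```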

