Data Clustering for Fuzzyfier Value Derivation

*JaeHyuk Cho*

#### **Abstract**

The fuzzifier value *m* is a significant factor in the accuracy of fuzzy clustering. This chapter therefore introduces several clustering methods and defines the values that are important for clustering. To adaptively calculate an appropriate fuzzifier value for interval type-2 fuzzy C-means, two fuzzifier values *m1* and *m2* are derived by extracting information from individual data points using a histogram scheme. Most of the clustering methods in this chapter determine *m1* and *m2* automatically, whereas these values previously had to be found through repeated experiments. To derive valid fuzzifier values more efficiently, we also introduce interval type-2 possibilistic fuzzy C-means (IT2PFCM), an advanced fuzzy clustering method for classifying a fixed pattern. In the efficient IT2PFCM method, a proper fuzzifier value for each data point is obtained from an algorithm that combines histogram analysis and Gaussian curve fitting. Using the information extracted from these fuzzifier values, two modified fuzzifier values *m1* and *m2* are determined and then used to calculate the new membership values. Determining these updated values not only improves the clustering accuracy for measured sensor data but also requires no additional procedure such as data labeling. The method is also efficient for monitoring numerous sensors and for managing and verifying sensor data obtained in real time, for example in smart cities.

**Keywords:** fuzzifier value determination, sensor data clustering, fuzzy C-means, histogram approach, interval type-2 PFCM

#### **1. Introduction**

In the majority of cases, fuzzy clustering algorithms have proved to be better than hard clustering at discriminating similar structures [1], at handling datasets in dimensional spaces [2], and at handling unlabeled data with outliers [3]. Fuzzy C-means has been shown to offer better solutions in machine learning and image processing than hard clustering methods such as Ward's clustering and the k-means algorithm [4–9]. In the comparison reported in [10], fuzzy C-means reached 66% accuracy while the Gustafson-Kessel algorithm scored 70%. Fuzzy C-means is one of the most widely applied and modified techniques in pattern recognition [11], even though the sensitivity of its outcome to the prototypes and to the optimization process is regarded as a weak point [12–14].

Classification algorithms are generally subject to various sources of uncertainty that should be appropriately managed. Fuzzy clustering can be used with datasets where the variables have a high level of overlap. Membership functions are therefore represented as fuzzy sets, which can be Type-I, Type-II, or Intuitionistic.

#### *Fuzzy Systems - Theory and Applications*

Data may be generated by a distribution or collected from various sources. Because the Euclidean distance leads to spherical clusters, which suits most cases, it is a top choice for many applications and is the measure used in most clustering algorithms to decide new centers [15].
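To make the roles of the Euclidean distance and of the fuzzifier *m* concrete, the following is a minimal sketch of one update step of standard (type-1) fuzzy C-means. It is not the IT2PFCM method of this chapter. The function names `fcm_memberships` and `fcm_centers` are hypothetical, chosen only for illustration, and the update rules are the textbook ones, assumed here rather than taken from the chapter.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Textbook FCM membership update with Euclidean distance.

    points, centers: lists of coordinate tuples of equal dimension.
    m: fuzzifier (must be > 1); larger m gives softer memberships.
    Returns u, where u[i][j] is the membership of point i in cluster j.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, v) for v in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with a center: share full membership
            # among the coinciding centers to avoid division by zero.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [
            1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
            for d_j in dists
        ]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Textbook FCM center update: mean of the points weighted by u**m."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for j in range(n_clusters):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers
```

For instance, with centers at (0, 0) and (5, 0) and *m* = 2, the point (1, 0) receives membership 16/17 in the first cluster; raising *m* to 3 softens this to 0.8. This illustrates why the choice of fuzzifier affects the clustering result.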
