#### **2. Basic notions**


**Figure 1(a)** shows the case where a small *m* value is set for two clusters with different volumes. Because the region with fuzzy membership values extends toward the bulky *C2* cluster, the *C1* cluster is allotted many relatively unnecessary patterns. **Figure 1(b)** shows the case where a large *m* value is set. This appears to perform well, since similar membership values are assigned, but the center of the *C1* cluster tends to drift toward the *C2* cluster. **Figure 1(c)** shows the fuzzy area according to the interval type-2 *m* value. By forming the fuzzy area according to the values of *m1* and *m2*, using the characteristics of the interval type-2 membership set, uncertainty can be reduced and a fuzzy area appropriate to each cluster's volume can be formed.

As presented above, several methods have been suggested for deciding the lower and upper boundary values of the fuzzifier extracted from particular data. The following describes the PFCM membership function used to decide the fuzzifier value's range. The membership value of the k-th data point for cluster *i* is presented in Eq. (1), where *dik/dij* is the ratio of the Euclidean distances between the data point and the cluster centers.

#### **Figure 1.**

*Fuzzy area between clusters according to m. (a) The case where a small m value is set, (b) the case where a large m value is set, (c) an instance of an appropriate fuzzy area using interval type-2.*

*Data Clustering for Fuzzyfier Value Derivation DOI: http://dx.doi.org/10.5772/intechopen.96385*

$$u\_{ik} = \frac{1}{\sum\_{j=1}^{c} \left(d\_{ik}/d\_{ij}\right)^{2/(m-1)}}\tag{1}$$

The neighboring membership values are computed using the membership value of Eq. (1) in order to decide the fuzzifier value's range. Summarizing this with an expression that includes the fuzzifier value gives Eq. (2), which provides the lower and upper boundary values of the fuzzy constant, where *C* is the number of clusters and *m* is the fuzzifier value.

$$1 + \frac{C - 1}{C} \cdot \frac{2}{\delta} \cdot |\Delta| \le m \le \frac{2 \log d}{\log \left(\frac{\delta}{1 - \delta} \cdot \frac{1}{C - 1}\right)} + 1, \text{ where } \Delta = \frac{d\_i - d\_i^\*}{d\_i^\*} \text{ and } \delta \text{ is a threshold} \tag{2}$$
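To make Eq. (1) concrete, the sketch below evaluates the membership matrix for all patterns at once in plain NumPy; the points and centers are hypothetical toy values, not data from this chapter.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Eq. (1): u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))."""
    # distances d_ik between every point k and every cluster center i
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    exponent = 2.0 / (m - 1.0)
    # ratio tensor (d_ik / d_ij) over all cluster pairs, summed per cluster
    return 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** exponent, axis=2)

X = np.array([[0.0, 0.0], [1.0, 0.1], [4.0, 4.0]])
centers = np.array([[0.5, 0.0], [4.0, 4.0]])
U = fcm_memberships(X, centers, m=2.0)  # shape (n_points, n_clusters)
```

Each row of `U` sums to one by construction, which is the constraint the FCM membership formulation enforces.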

#### **3. Conventional fuzzy clustering algorithm**

#### **3.1 Fuzzy C-means (FCM)**

FCM includes the concept of a fuzzifier *m*, which is used to determine the membership value of data *Xk* in a specific cluster with respect to the cluster prototype. Specifically, the FCM formulation consists of the cluster center *vi* and the membership value of data *Xk*, with *k = 1, 2, ..., n* and *i = 1, 2, ..., c*, where *n* indicates the number of patterns and *c* the number of clusters. FCM requires knowledge of the desired number of clusters in advance. The membership value is determined by the relative distance between the pattern *Xk* and the cluster center *Vi*. However, the main weaknesses of FCM are its noise sensitivity and its constrained memberships. The weighting exponent *m* is known to have a strong effect on the clustering performance of the FCM algorithm [16].
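A minimal FCM loop, alternating the center and membership updates described above, might look like the following sketch; the data and parameters are hypothetical, chosen only to illustrate the alternation.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, seed=0, eps=1e-12):
    """Minimal FCM: alternate center and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # memberships sum to 1 per pattern
    for _ in range(n_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]   # centers: membership-weighted means
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + eps
        U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
    return U, V

# two tight, well-separated groups of points
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 4.0)])
U, V = fcm(X, c=2)
```

On this toy data the two centers converge near (0, 0) and (4, 4), in some order.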

#### **3.2 PCM**

In order to solve the problems of the FCM method, PCM uses a parameter estimated from the dataset itself. PCM applies the possibilistic approach, which means that the membership value of a point in a class represents the typicality of the point in the class, i.e., the possibility of data *Xk* belonging to the class with cluster prototype *Vi*, where *k* = 1, 2, ..., n and *i* = 1, 2, ..., c. Noise points are then comparatively less typical, and by using typicality the PCM algorithm significantly reduces noise sensitivity [17, 18]. However, the PCM algorithm has the problem that the clustering outcome is sensitive to the initial parameter values [19].
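The typicality computation commonly used in PCM can be sketched as follows; the form t_ik = 1/(1 + (d_ik²/γ_i)^(1/(m−1))) and the γ value here are assumptions for illustration, not quantities defined in this chapter.

```python
import numpy as np

def pcm_typicality(X, centers, gamma, m=2.0):
    """Common PCM typicality: t_ik = 1 / (1 + (d_ik^2 / gamma_i)^(1/(m-1)))."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return 1.0 / (1.0 + (d2 / gamma[None, :]) ** (1.0 / (m - 1.0)))

centers = np.array([[0.0, 0.0]])
gamma = np.array([1.0])          # hypothetical bandwidth estimated from the data
T = pcm_typicality(np.array([[0.1, 0.0], [10.0, 10.0]]), centers, gamma)
```

The far-away point receives near-zero typicality, which is exactly how PCM reduces noise sensitivity.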

#### **3.3 PFCM**

The PFCM algorithm is a hybrid of the PCM and FCM algorithms [20]. Although the typicality constraint (sum = 1) was relaxed, the constraints on the membership values were preserved, so the PFCM algorithm generates both memberships and possibilities and solves the noise sensitivity problem seen in FCM [21]. PFCM is based on the fuzzifier value *m*, which determines the membership value, and it also uses constants to define the relative importance of fuzzy membership and typicality values in the objective function. PFCM utilizes more parameters to determine the optimal clustering solution, which increases the degrees of freedom and thus yields better results than the studies mentioned above. However, when considering fuzzy sets and additional parameters in such algorithms, we face potential uncertainty in these parameters. In this paper, we describe the uncertainty of the fuzzifier value *m* and of the bandwidth parameter, and generate the FOU for both by considering the fuzzifier interval, i.e. the interval between *m1* and *m2*, and the bandwidth interval. Existing studies measured the optimal range via the upper and lower bounds of the fuzzifier value through multiple iterations [22]. Such work is ongoing, but the same fuzzifier range cannot be applied to all data [23].

#### **3.4 Type-1 fuzzy set (T1FS)**

Type-1 fuzzy logic was first introduced by Zadeh (1965). Fuzzy logic systems based on type-1 fuzzy sets (T1FSs) have demonstrated their capabilities in many applications, especially for the control of complex nonlinear systems that are difficult to model analytically [24, 25]. Since a type-1 fuzzy logic system uses crisp, precise type-1 fuzzy sets, T1FSs can be used to model user behavior under certain conditions. Type-1 fuzzy sets deal with uncertainty using precise membership functions that users believe capture the uncertainty [26–30]. Once a type-1 membership function is selected, however, all uncertainty disappears, because the membership function is completely precise. The type-2 fuzzy set concept was presented by Zadeh as an extension of the ordinary fuzzy set concept, i.e. the type-1 fuzzy set [31]. All fuzzy sets are characterized by membership functions. A type-1 fuzzy set is characterized by a two-dimensional membership function, and its membership grade is a crisp number in [0, 1]. The comparison of membership functions and the uncertainty extracted from the results of conventional fuzzy clustering algorithms is shown below [32].


#### **4. Advanced fuzzy clustering algorithm**

Fuzzy c-means (FCM) is an unsupervised clustering algorithm in which unlabeled data *X = {x1, x2, ..., xN}* is grouped according to fuzzy membership values [33, 34]. Since analyzing and handling uncertainty is a very important issue in data analysis and computer vision, FCM is widely used in these fields. Several IT2 approaches for pattern recognition algorithms have been reported successfully [35–41]. Type-1 fuzzy sets cannot handle such uncertainties; therefore, type-2 fuzzy sets were defined to represent the uncertainties associated with type-1 fuzzy sets. As shown in **Figure 2**, the type-reduction process in IT2 FSs requires a relatively large amount of computation, as type-2 fuzzy


**Figure 2.** *(a) Cluster position uncertainty for T1 FCM, (b) IT2 FCM, (c) QT2 FCM, (d) GT2 FCM algorithms.*

methods increase the computational complexity due to the numerous combinations of embedded T2 FSs. Methods for reducing this computational complexity have been proposed; the increased cost of T2 FSs can be an acceptable price when it delivers improved performance over the satisfactory results obtained with T1 FSs. In [42], it was suggested that two fuzzifier values *m* be used, and a centroid type-reduction algorithm for the center update was incorporated into an interval type-2 (IT2) fuzzy approach to FCM clustering. IT2 FCM was suggested to resolve the complications of FCM for clusters with different volumes and numbers of patterns. Moreover, it was suggested that various uncertainties are linked with clustering algorithms such as FCM and PCM [43]. This motivates extending the success of IT2 FSs to T1 FS algorithms.

#### **4.1 Type-2 fuzzy set (T2 FS)**

Due to their potential to model various uncertainties, type-2 fuzzy sets (T2 FSs) have received increased research interest [44]. A type-2 fuzzy set is characterized by a three-dimensional membership function, and the membership grade of each of its elements is itself a fuzzy set in [0, 1]. The extra third dimension provides additional degrees of freedom to convey more information about the represented term. Type-2 fuzzy sets are valuable in situations where the exact membership function of a fuzzy set is difficult to determine; this helps to incorporate uncertainty [45].

The computational complexity of a type-2 fuzzy set is higher than that of a type-1 fuzzy set. However, the results obtained with type-2 fuzzy sets are much better than those obtained with type-1 fuzzy sets. Therefore, if type-2 fuzzy sets can significantly improve performance (depending on the application), their increased computational complexity can be an affordable price to pay [46].

#### **4.2 Type-2 FCM (T2-FCM)**

Type-2 FCM (T2-FCM) generates type-2 memberships by extending a scalar membership degree to a T1 FS. When the secondary fuzzy set is restricted to a triangular membership function, T2-FCM extends the scalar membership *uij* to a triangular secondary membership function [47, 48].

#### **4.3 General type-2 FCM**

The GT2 FCM algorithm accepts a linguistic description of the fuzzifier value, expressed as a T1 fuzzy set with upper and lower values [49]. The linguistic fuzzifier value is denoted as a T1 fuzzy set of *m*. **Figure 3** shows two examples of encoding the linguistic notion of an appropriate fuzzifier value for the GT2 FCM algorithm using three linguistic terms.

#### **4.4 Interval type 2 fuzzy sets (IT2 FSs)**

In order to model the uncertainty associated with a type-1 fuzzy set using an interval type-2 fuzzy set, the primary membership *Jx′* of a sample point *x′* can be represented by a membership interval in which all secondary grades of the primary memberships equal one [18, 50].

**Figure 3(a)** presents an instance of an interval type-2 fuzzy set, where the gray shaded region indicates the FOU. In the figure, the membership value for a sample *x′* is represented by the interval between the upper membership $\overline{\mu}\_{\tilde{A}}(x')$ and the lower membership $\underline{\mu}\_{\tilde{A}}(x')$. Therefore, each *x′* has a primary membership interval as

$$J\_{x'} = \left[\underline{\mu}\_{\tilde{A}}(x'), \overline{\mu}\_{\tilde{A}}(x')\right] \tag{3}$$

**Figure 3(b)** shows the vertical slice at *x′*, where the secondary grade for the primary membership of each *x′* equals one, in accordance with the property of interval type-2 fuzzy sets. This interval is defined as the FOU. An interval type-2 fuzzy set *Ã* can be expressed as

$$\tilde{A} = \left\{ \left( (x, u), \mu\_{\tilde{A}}(x, u) \right) \middle| \forall x \in X, \forall u \in J\_x \subseteq [0, 1], \ \mu\_{\tilde{A}}(x, u) = 1 \right\} \tag{4}$$

#### **Figure 3.**

*Two possible linguistic representations of the fuzzifier m using T1 fuzzy sets. (a) Membership value for a sample x′; (b) vertical slice at x′.*

#### **4.5 Interval type-2 FCM (IT2-FCM)**

In fuzzy clustering algorithms such as FCM, the fuzzifier value *m* plays a significant role in determining clustering uncertainty [50]. However, it is generally difficult to determine the value of *m* properly. IT2-FCM regards the fuzzifier value as an interval [*m1, m2*] and solves two optimization problems [51].

First, an interval type 2 FCM is used to obtain a rough estimate of which data points belong to which cluster.

The objective function is minimized with respect to *uij* to provide the upper and lower membership values, as given in Eqs. (5) and (6).

$$\overline{u}\_{j}(x\_i) = \begin{cases} \frac{1}{\sum\_{k=1}^{c} \left(d\_{ij}/d\_{ik}\right)^{2/(m\_1-1)}}, & \text{if } \frac{1}{\sum\_{k=1}^{c} \left(d\_{ij}/d\_{ik}\right)} < \frac{1}{c} \\\\ \frac{1}{\sum\_{k=1}^{c} \left(d\_{ij}/d\_{ik}\right)^{2/(m\_2-1)}}, & \text{otherwise} \end{cases} \tag{5}$$

$$\underline{u}\_{j}(x\_i) = \begin{cases} \frac{1}{\sum\_{k=1}^{c} \left(d\_{ij}/d\_{ik}\right)^{2/(m\_1-1)}}, & \text{if } \frac{1}{\sum\_{k=1}^{c} \left(d\_{ij}/d\_{ik}\right)} \ge \frac{1}{c} \\\\ \frac{1}{\sum\_{k=1}^{c} \left(d\_{ij}/d\_{ik}\right)^{2/(m\_2-1)}}, & \text{otherwise} \end{cases} \tag{6}$$
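Under the chapter's convention *m1* ≥ *m2* (Section 4.5), Eqs. (5) and (6) can be sketched as follows; the branch is selected per pattern and cluster by comparing 1/Σ(d_ij/d_ik) with 1/c, and the toy data are hypothetical.

```python
import numpy as np

def it2_memberships(X, centers, m1, m2, eps=1e-12):
    """Eqs. (5)-(6): interval memberships from the two fuzzifiers m1 >= m2."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    c = centers.shape[0]

    def u(m):  # standard FCM membership for fuzzifier m
        return 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)

    u1, u2 = u(m1), u(m2)
    cond = 1.0 / np.sum(d[:, :, None] / d[:, None, :], axis=2) < 1.0 / c
    upper = np.where(cond, u1, u2)   # Eq. (5): m1 branch when below 1/c
    lower = np.where(cond, u2, u1)   # Eq. (6): m1 branch when at or above 1/c
    return upper, lower

X = np.array([[0.2, 0.0], [1.0, 0.0], [1.8, 0.0]])
centers = np.array([[0.0, 0.0], [2.0, 0.0]])
up, lo = it2_memberships(X, centers, m1=5.0, m2=2.0)
```

With *m1* ≥ *m2*, the branch selection guarantees `up >= lo` elementwise, which is what makes the pair an interval membership.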

After the cluster prototypes are calculated, type reduction and then classification are performed. Qiu et al. (2014) proposed this complete interval type-2 FCM method, in which a histogram of the clusters in each class and each individual dimension is acquired from the labeled clusters. This histogram is smoothed by a moving-window mean (using a triangular window in our case), and curve fitting of the smoothed histogram yields the membership function. Histogram values greater than the membership value are assigned to the upper-membership histogram, and values less than the membership value are kept as the lower-membership histogram. Curve fitting is carried out separately on the upper and lower histograms to supply the upper and lower membership values [52]. These membership values are used to estimate the fuzzifier values *m1* and *m2*. Fixed-point iteration is a method of expressing a transcendental equation *f(x) = 0* in the form *x = g(x)* and then solving this expression iteratively for *x* through the iterative relationship

$$x\_{i+1} = g(x\_i), \quad i = 0, 1, 2, \ldots \tag{7}$$

with *x0* being some initial guess. Rewriting Eqs. (5) and (6) in the form of Eq. (7) and dropping the upper and lower bars gives

$$\mu\_j = \frac{1}{\sum\_{k=1}^{c} \left( d\_{ij} / d\_{ik} \right)^{2/(m-1)}} \tag{8}$$

$$\Rightarrow \frac{1}{\mu\_j} = \sum\_{k=1}^{c} \left( d\_{ij} / d\_{ik} \right)^{2/(m-1)}$$

Taking the log of both sides, Eq. (8) can be rewritten as

$$\log\left(\frac{1}{u\_j}\right) = \log\left(\sum\_{k=1}^c \left(d\_{ij}/d\_{ik}\right)^{2/(m-1)}\right) \tag{9}$$

$$\text{using the identity } \log\left(a+c\right) = \log a + \log\left(1 + \frac{c}{a}\right)$$

Extending this logarithmic identity to the sum of *N* elements,

$$\Rightarrow \log\left(a\_0 + \sum\_{k=1}^{N} a\_k\right) = \log a\_0 + \log\left(1 + \sum\_{k=1}^{N} \frac{a\_k}{a\_0}\right) \tag{10}$$

$$\log\left(\frac{1}{u\_j}\right) = \frac{2}{m-1}\log\left(\frac{d\_{ij}}{d\_{i1}}\right) + \log\left(1 + \sum\_{k=2}^{c} \left(\frac{d\_{i1}}{d\_{ik}}\right)^{2/(m\_{old}-1)}\right) \tag{11}$$

Rearranging Eq. (11) and expressing it in terms of *m*, gives us Eq. (12).

$$\chi = \frac{\log\left(\frac{1}{u\_j}\right) - \log\left(1 + \sum\_{k=2}^{c} \left(\frac{d\_{i1}}{d\_{ik}}\right)^{2/(m\_{old} - 1)}\right)}{\log\left(\frac{d\_{ij}}{d\_{i1}}\right)} \tag{12}$$

$$m\_{jnew} = 1 + \frac{2}{\chi} \tag{13}$$

So, Eq. (13) gives *m1jnew* and *m2jnew*, where *m1jnew* ≥ *m2jnew*. Eqs. (12) and (13) are used to calculate the fuzzifier value of each data point. In some cases, the fuzzifier value of particular data shows relatively large variation. Here, an upper (*mupper*) and a lower (*mlower*) fuzzifier bound are necessary, obtained using Eq. (2). If a certain data point has a fuzzifier value below the lower bound, its fuzzifier value is set to *mlower*, and if it exceeds the upper bound, its fuzzifier value is set to *mupper*. In the end, the mean of these fuzzifiers is taken to obtain the final fuzzifier values *m1* and *m2*.
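The fixed-point step of Eqs. (12) and (13) can be sketched for a single pattern as follows; the residual sum uses the ratio d_i1/d_ik pulled out via the identity of Eq. (10), and the reference cluster (index 0 here) must differ from cluster *j*. The distances are hypothetical.

```python
import numpy as np

def fuzzifier_update(u_j, d, j, m_old):
    """One fixed-point step of Eqs. (12)-(13) for a single pattern.

    u_j : membership of the pattern in cluster j
    d   : distances from the pattern to all c clusters; d[0] is d_i1,
          the reference term pulled out of the logarithm (j must not be 0)
    """
    rest = np.sum((d[0] / d[1:]) ** (2.0 / (m_old - 1.0)))  # k = 2 .. c
    chi = (np.log(1.0 / u_j) - np.log1p(rest)) / np.log(d[j] / d[0])
    return 1.0 + 2.0 / chi

# self-consistency check: if u_j is the true FCM membership under m, the
# update returns m back (hypothetical two-cluster distances)
d = np.array([1.0, 2.0])
m = 3.0
u = 1.0 / np.sum((d[1] / d) ** (2.0 / (m - 1.0)))  # membership in cluster j = 1
print(fuzzifier_update(u, d, j=1, m_old=m))  # → 3.0
```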

#### **4.6 Multiple kernels PFCM algorithm**

Typically, the kernel method uses a spatial conversion function to convert input data from the input feature space to a kernel feature space [53]. The purpose of transforming the input feature space into a kernel feature space is to make it easy to separate overlapping data that have nonlinear boundary surfaces in the input space. If the data in the input space are *Xi*, *i* = 1, ..., *N*, the data converted to the kernel feature space through the mapping function are represented by Φ(*Xj*), *j* = 1, ..., *N*. As with general PFCM, the goal of kernel PFCM is to minimize the following objective function.

$$J^{\Phi} = \sum\_{k=1}^{n} \sum\_{i=1}^{c} \left( a u\_{ik}^{m} + b t\_{ik}^{\eta} \right) d\_{ik}^{2} + \sum\_{i=1}^{c} \gamma\_i \sum\_{k=1}^{n} \left( 1 - t\_{ik} \right)^{\eta} \tag{13}$$

For a kernel *K*, the distance *dij* between the pattern *xi* in the input space and the cluster prototype *vj* in the kernel feature space is expressed by the kernel function as Eq. (14).

$$\begin{aligned} d\_{ij}^2 &= \left\| \Phi(x\_i) - \Phi(v\_j) \right\|^{2} \\\\ &= \Phi(x\_i)\Phi(x\_i) + \Phi(v\_j)\Phi(v\_j) - 2\Phi(x\_i)\Phi(v\_j) \\\\ &= K(x\_i, x\_i) + K(v\_j, v\_j) - 2K(x\_i, v\_j) \end{aligned} \tag{14}$$
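Eq. (14) is the usual kernel trick; a minimal sketch with a Gaussian kernel follows (the bandwidth and points are hypothetical).

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def kernel_distance_sq(x, v, kernel):
    """Eq. (14): squared feature-space distance via the kernel trick."""
    return kernel(x, x) + kernel(v, v) - 2.0 * kernel(x, v)

x = np.array([0.0, 0.0])
v = np.array([3.0, 4.0])
d2 = kernel_distance_sq(x, v, gaussian_kernel)
```

For a Gaussian kernel K(x, x) = 1, so the squared feature-space distance reduces to 2 − 2K(x, v) and lies in [0, 2).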

Commonly, a new Gaussian multi-kernel *k̃* is constructed from Gaussian kernels by assuming a multi-kernel with *S* component kernels, as follows [54].


$$\tilde{k}^{(i)}(x\_j, v\_i) = \sum\_{l=1}^{S} \frac{w\_{il}}{\sigma\_l} \exp\left(-\frac{\|x\_j - v\_i\|^2}{2\sigma\_l^2}\right) \Big/ \sum\_{t=1}^{S} \frac{w\_{it}}{\sigma\_t} \tag{15}$$

Following [55], as in FCM-MK, a normalized kernel is defined to relate the weights to the cluster prototypes, resolutions and membership values. With this optimization, the following PFCM objective function should be minimized. By minimizing the objective function, the cluster prototype *vi*, the resolution-specific weight *wil*, and the membership value *uij* are obtained.

$$f\_{m,\eta}(U, T, V; X) = 2\sum\_{k=1}^{n}\sum\_{i=1}^{c}\left(au\_{ik}^m + bt\_{ik}^{\eta}\right)\left(1 - \sum\_{l=1}^{S} \frac{w\_{il}}{\sigma\_l}\exp\left(-\frac{\|x\_k - v\_i\|^2}{2\sigma\_l^2}\right)\Big/\sum\_{t=1}^{S}\frac{w\_{it}}{\sigma\_t}\right) + \sum\_{i=1}^{c}\gamma\_i\sum\_{k=1}^{n}(1-t\_{ik})^{\eta} \tag{16}$$

Here, *ρ* is the learning-rate parameter of the gradient descent method. Finally, using type reduction and hard partitioning, clustering is performed as described for the interval type-2 PFCM [56].

#### **4.7 Interval type-2 fuzzy c-regression clustering**

Let the regression function be represented by Eq. (17)

$$y\_i = f^z(x\_i, a\_j) = a\_1^z x\_{1i} + a\_2^z x\_{2i} + \cdots + a\_M^z x\_{Mi} + b\_0^z \tag{17}$$

where *xi* = [*x1i, x2i, ..., xMi*] represents a data point, i = 1, ..., n indexes the data, j = 1, ..., c indexes the clusters (or rules), q = 1, ..., M indexes the variables in each regression, and z = 1, ..., r indexes the regression functions. The regression coefficients are denoted by *aj*. We use the weighted least squares (WLS) method to calculate the regression coefficients *aj*; in this approach, the membership grades of the partition matrix *P* serve as the weights. In Eq. (18), *Xi* is a data point matrix of inputs and *y* is a data point matrix of outputs.

$$x\_i = \begin{bmatrix} x\_{1,i} \\\\ x\_{2,i} \\\\ \vdots \\\\ x\_{M,i} \end{bmatrix}^{T}, \quad y = \begin{bmatrix} y\_1 \\\\ y\_2 \\\\ \vdots \\\\ y\_n \end{bmatrix}^{T}, \quad P\_j = \begin{bmatrix} u\_j(x\_1) & 0 & \dots & 0 \\\\ 0 & u\_j(x\_2) & \dots & 0 \\\\ \vdots & \vdots & \ddots & \vdots \\\\ 0 & 0 & \dots & u\_j(x\_n) \end{bmatrix} \tag{18}$$

$$a\_j = \left[X^{T} P\_j X\right]^{-1} X^{T} P\_j y$$

The partition matrix *P*, which is the first requirement for computing the regression coefficients, is acquired through a Gaussian mixture distribution. We consider two fuzzifiers (weighting exponents) *m1* and *m2* to cast the problem into the IT2F setting. However, unlike that approach, which is based on FCM, our model is based on FCRM. These two fuzzifiers divide the objective function into two separate functions. The aim is to minimize the total error; Eq. (19) shows the two objective functions. It should be mentioned that the following proof is an extended and modified version of the type-1 case presented in [57].

$$\begin{cases} J\_{m\_1}(U, v) = \sum\_{i=1}^{n} \sum\_{j=1}^{C} u\_j(x\_i)^{m\_1} E\_{ji}(a\_j) \\\\ J\_{m\_2}(U, v) = \sum\_{i=1}^{n} \sum\_{j=1}^{C} u\_j(x\_i)^{m\_2} E\_{ji}(a\_j) \end{cases} \tag{19}$$

As in type-1 FCRM, *Eji* is the error term, which indicates the distance between the actual output and the estimated regression equation, as presented in Eq. (20).

$$E\_{ji}(a\_j) = \left(\mathcal{y}\_i - f\_j(\mathbf{x}\_i, a\_j)\right)^2 \tag{20}$$

Eq. (21) represents the Lagrangians of the objective functions of the IT2 FCRM model. We extend the type-1 NFCRM algorithm to interval type-2 NFCRM.

$$\begin{cases} L\_1(\boldsymbol{\lambda}\_1, \boldsymbol{u}\_j) = \sum\_{i=1}^n \sum\_{j=1}^C u\_j(\mathbf{x}\_i)^{m\_1} E\_{ji}(\boldsymbol{a}\_j) - \lambda\_1 \left(\sum\_{j=1}^c u\_j - \mathbf{1}\right) \\\\ L\_2(\boldsymbol{\lambda}\_2, \boldsymbol{u}\_j) = \sum\_{i=1}^n \sum\_{j=1}^C u\_j(\mathbf{x}\_i)^{m\_2} E\_{ji}(\boldsymbol{a}\_j) - \lambda\_2 \left(\sum\_{j=1}^c u\_j - \mathbf{1}\right) \end{cases} \tag{21}$$

The partial derivatives of Eq. (21) with respect to *uj* are set to 0 in Eqs. (22) and (23) to minimize the objective functions.

$$\begin{cases} \frac{\partial L\_1}{\partial u\_1(x\_i)} = m\_1 u\_1(x\_i)^{m\_1-1} E\_{1i}(a\_1) - \lambda\_1 = 0 \\\\ \vdots \\\\ \frac{\partial L\_1}{\partial u\_C(x\_i)} = m\_1 u\_C(x\_i)^{m\_1-1} E\_{Ci}(a\_C) - \lambda\_1 = 0 \end{cases} \tag{22}$$

$$\begin{cases} \frac{\partial L\_2}{\partial u\_1(x\_i)} = m\_2 u\_1(x\_i)^{m\_2-1} E\_{1i}(a\_1) - \lambda\_2 = 0 \\\\ \vdots \\\\ \frac{\partial L\_2}{\partial u\_C(x\_i)} = m\_2 u\_C(x\_i)^{m\_2-1} E\_{Ci}(a\_C) - \lambda\_2 = 0 \end{cases} \tag{23}$$

Next, the partial derivatives with respect to *λ1* and *λ2* are set to zero.

$$\frac{\partial L\_1}{\partial \lambda\_1} = -\left(\sum\_{j=1}^c u\_j(\mathbf{x}\_i) - \mathbf{1}\right) = \mathbf{0} \tag{24}$$
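Combining Eq. (22) with the constraint recovered in Eq. (24) eliminates *λ1* and yields the closed-form membership update, a standard FCRM step stated here for completeness (the *m2* case is analogous):

$$u\_j(x\_i) = \frac{1}{\sum\_{k=1}^{C} \left(\frac{E\_{ji}(a\_j)}{E\_{ki}(a\_k)}\right)^{1/(m\_1 - 1)}}$$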


To adapt KPCM to IT2 KPCM, three steps are included: initialization, calculation of the upper and lower membership (or typicality) values with two different fuzzifiers, and updating of the prototype locations through type reduction and defuzzification for the data patterns. In the proposed approach, using IT2 FSs, our contribution lies in the development of a prototype update process that can solve the cluster-matching problem caused by KPCM. Cluster matching usually arises in pattern sets containing clusters that are relatively close to each other. By definition, a type-reduced type-1 fuzzy set can be obtained as a combination of the centroid intervals estimated from the embedded fuzzy sets; this is the standard method for obtaining the type-reduced set from an IT2 FS. However, this approach is avoided because of its huge computational requirements, which involve a large number of embedded fuzzy sets. Therefore, we consider the KM algorithm as an alternative type-reduction method. Since KM is an iterative algorithm that estimates both ends of an interval, the left (right) interval endpoint *vL* (*vR*) can be found without using all of the embedded fuzzy sets.

From the kernel SFCM algorithm in **Figure 4**, the kernel distance

$$\left\|\Phi(x\_k) - v\_i\right\|^2 \tag{25}$$

can be derived using the kernel trick as

$$\left\|\Phi(x\_k) - v\_i\right\|^2 = K(x\_k, x\_k) - 2\frac{\sum\_{j=1}^{N} u\_{ij}^m K(x\_k, x\_j)}{\sum\_{j=1}^{N} u\_{ij}^m} + \frac{\sum\_{j=1}^{N}\sum\_{l=1}^{N} u\_{ij}^m u\_{il}^m K(x\_j, x\_l)}{\left(\sum\_{j=1}^{N} u\_{ij}^m\right)^2} \tag{26}$$

The inverse mapping of the prototypes is also needed to approximate the prototype expressions *vi* in the feature space. The objective function can be written as

$$V(\hat{v}\_i, v\_i) = \sum\_{i=1}^{C} ||\Phi(\hat{v}\_i) - v\_i|| = \sum\_{i=1}^{C} \left(\Phi(\hat{v}\_i)^T \Phi(\hat{v}\_i) - 2\Phi(\hat{v}\_i)v\_i + v\_i^T v\_i\right) \tag{27}$$

#### **Figure 4.**

*FOU representation for our proposed IT2 KPCM approach with* m1 *= 2,* m2 *= 5 and variance = 0.5; (a) FOU of cluster 1 (b) FOU of cluster 2 [58].*

Then, the final location for *v̂i* in the KPCM algorithm becomes

$$\hat{v}\_i = \frac{\sum\_{k=1}^{N} u\_{ik}^{m} K(x\_k, \hat{v}\_i)\, x\_k}{\sum\_{k=1}^{N} u\_{ik}^{m} K(x\_k, \hat{v}\_i)} \tag{28}$$

The left (right) interval of the centroids can be found by employing the KM algorithm on the ascending order of a pattern set and its associated interval memberships. The result of the KM algorithm can be expressed as,

$$v\_i = 1/[v\_L, v\_R] \tag{29}$$

After the left endpoint *vL* and right endpoint *vR* of the interval set are calculated, defuzzification is used to obtain the crisp centers, defined as the midpoint between *vL* and *vR*. We can now compute the defuzzified output, a crisp value of the prototypes, using the expression

$$v\_i = \frac{\sum\_{v \in f\_{Y\_i}} (u(v))v}{\sum\_{v \in f\_{Y\_i}} (u(v))} = \frac{v\_L + v\_R}{2} \tag{30}$$

Hard partitioning is used to classify test patterns using the prototypes resulting from the procedure above. Euclidean distance is used to hard-partition the patterns, because the prototypes lie in the feature space; each pattern is assigned to the cluster prototype at minimum Euclidean distance. Experimental results presented in the following sections demonstrate the validity of the proposed IT2 approach to KPCM clustering.
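The type reduction and defuzzification just described can be sketched as follows. Instead of the iterative KM procedure, this toy version enumerates every switch point, which yields the same interval endpoints for small N; the patterns and interval memberships are hypothetical.

```python
import numpy as np

def km_endpoints(x, u_lower, u_upper):
    """Left/right endpoints of the interval centroid (cf. Eqs. (29)-(30)).

    x must be sorted ascending; every switch point k is enumerated,
    which is equivalent to the iterative KM result for small N.
    """
    n = len(x)
    # left endpoint: upper memberships on the left of the switch, lower on the right
    left = [np.sum(w * x) / np.sum(w)
            for k in range(n + 1)
            for w in [np.concatenate([u_upper[:k], u_lower[k:]])]]
    # right endpoint: lower memberships on the left, upper on the right
    right = [np.sum(w * x) / np.sum(w)
             for k in range(n + 1)
             for w in [np.concatenate([u_lower[:k], u_upper[k:]])]]
    return min(left), max(right)

x = np.array([0.0, 1.0, 2.0])
u_low = np.array([0.2, 0.2, 0.2])
u_up = np.array([0.8, 0.8, 0.8])
v_l, v_r = km_endpoints(x, u_low, u_up)
center = (v_l + v_r) / 2.0  # defuzzified crisp center, as in Eq. (30)
```

For this symmetric toy case the interval is centered on 1.0, so the defuzzified center is 1.0.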

#### **4.8 Interval type-2 possibilistic fuzzy C-means (IT2PFCM)**

In order to address the uncertainty in the fuzzifier value *m* of the general PFCM algorithm, the multiple-kernel PFCM algorithm is extended to the interval type-2 fuzzy set. Given *N* data points, a set *W* of resolution-specific weights, a partition matrix *U*, *C* clusters, a set *V* of cluster prototypes and *S* kernels, the cluster prototypes can be obtained by minimizing the Gaussian kernel objective function as follows.

$$w\_{il}^{(new)} = w\_{il}^{(old)} - \rho \frac{\partial J}{\partial w\_{il}} \tag{31}$$

$$d\_{ij}^2 = 2 - 2\sum\_{l=1}^{S} \frac{w\_{il}}{\sigma\_l} \exp\left(-\frac{\|x\_j - v\_i\|^2}{2\sigma\_l^2}\right) \Big/ \sum\_{t=1}^{S} \frac{w\_{it}}{\sigma\_t} \tag{32}$$

Where,

$$w\_i = 2 - 2\sum\_{l=1}^{S} \frac{w\_{il}}{\sigma\_l} \exp\left(-\frac{\|x\_j - v\_i\|^2}{2\sigma\_l^2}\right) \Big/ \sum\_{t=1}^{S} \frac{w\_{it}}{\sigma\_t} \tag{33}$$

The cluster prototype is calculated to optimize the objective function for the center *vi* of each cluster [23].


Where,

$$\overline{K}^{(i)}(x\_j, v\_i) = \sum\_{l=1}^{S} \frac{w\_{il}}{\sigma\_l^3} \exp\left(-\frac{\|x\_j - v\_i\|^2}{2\sigma\_l^2}\right) \Big/ \sum\_{t=1}^{S} \frac{w\_{it}}{\sigma\_t} \tag{34}$$

The optimized membership values, i.e. the smallest and largest membership value for each pattern under the interval type-2 fuzzy set, are used to calculate the crisp value *vi*. In order to compute *vR* and *vL*, determination of the upper or lower bound of the fuzzifier is essential. This is organized as follows [59].

$$J(U, V, W) = 2\sum\_{i=1}^{C} \sum\_{j=1}^{N} u\_{ij}^{m} d\_{ij}^{2} \tag{35}$$

Using the final *vR* and *vL*, the crisp center value is obtained from defuzzification as follows.

$$\text{For } v\_R: \quad u\_{ij} = \begin{cases} \underline{u}\_{ij}, & j \le k \\\\ \overline{u}\_{ij}, & j > k \end{cases} \tag{36}$$

Using the cluster prototype *vi* obtained through the optimization function and the membership value *uij*, the resolution-specific weight *wil* is recomputed as follows.

$$\frac{\partial J}{\partial w\_{il}} = -2 \sum\_{j=1}^{N} \frac{u\_{ij}^{m}}{\sum\_{t=1}^{S} \frac{w\_{it}}{\sigma\_t}} \left( K(x\_j, v\_i) - \overline{K}^{(i)}(x\_j, v\_i) \right) \tag{37}$$

Where

$$v\_{iR} = \frac{\sum\_{j=1}^{N} u\_{ij}^{m} \overline{K}^{(i)}(x\_j, v\_i)\, x\_j}{\sum\_{j=1}^{N} u\_{ij}^{m} \overline{K}^{(i)}(x\_j, v\_i)} \tag{38}$$

To define the interval type-2 fuzzy set and calculate the uncertainty of the membership, the input data, as the primary fuzzy set, must be assigned to the interval type-2 fuzzy set. Eventually, the upper and lower membership functions are created from the primary membership functions.

After calculating the upper and lower memberships for each cluster, we need to update the new center values. The membership is obtained from the type-2 fuzzy set; however, the center value is a crisp value, so it cannot be calculated directly by the above method. Therefore, in order to compute the center value, type reduction to a type-1 fuzzy set is performed, and defuzzification is then applied to convert the type-1 value to a crisp value.

#### **5. Heuristic method: histogram analysis**

The goal of the heuristic method is to extract information from the data and then adaptively calculate the fuzzifier value. In this approach, a heuristic type-1 membership function appropriate for the given dataset is used. The parameters defining the upper and lower memberships are decided according to the following rules. First, given that the membership values are determined, the IT2 PFCM algorithm

**Figure 5.**

*FOU obtained for each class and dimension; updated fuzzifier values m1 and m2 are obtained. (a) Class 1, dimension 1; (b) class 2, dimension 1.*

calculates roughly which cluster each data point belongs to, and then obtains a histogram based on the classified clusters. The histogram from IT2 PFCM is made gentler and smoother through the membership function obtained by curve fitting of that histogram. Curve fitting is applied separately to the upper and lower histograms to obtain the upper and lower membership values. To arrive at the IT2 FS, determination of the FOU is necessary, which is generally the set of membership values of the T2 FS. Given that, histogram values greater than the membership value are allocated to the upper-membership histogram, while the opposite case is handled analogously. **Figure 5** shows the histograms and FOU determined by class-wise and dimension-wise calculation. To find *X* satisfying *f(X) = 0*, it can be expressed as *X = g(X)* using fixed-point iteration, where

$$X\_{i+1} = g(X\_i), \quad i = 0, 1, \ldots, N \tag{39}$$

The membership function of Eq. (8) can be written in the fixed-point form of Eq. (39) as follows.

$$u\_k = \frac{1}{\sum\_{j=1}^{c} \left(\frac{d\_{ki}}{d\_{ji}}\right)^{\frac{2}{m-1}}} \tag{40}$$

where the fuzzifier value *m* is the fuzzy parameter that determines the degree of clustering fuzziness. The values *m1* and *m2* are then applied in the algorithm to calculate updated clusters, and this routine is repeated iteratively. The detailed algorithm is as follows:

1. Set the initial fuzzifier values *m1* and *m2*.

2. Apply *m1* and *m2* to interval type-2 FCM and obtain the membership of the data.

3. Generate a histogram of each cluster from the memberships.

4. Curve-fit the histograms to get the primary memberships.

10. The algorithm is iteratively performed using the updated *m1* and *m2*.

The upper membership function (UMF) histogram and lower membership function (LMF) histogram are drawn in **Figure 5**. A new membership function is obtained from the Gaussian curve fitting method.

Simply taking the log of both sides of Eq. (40), it can be expressed as follows:

$$\log\left(\frac{1}{u\_k}\right) = \frac{2}{m-1}\log\left(\frac{d\_{ki}}{d\_{1i}}\right) + \log\left(1 + \sum\_{j=2}^{c}\left(\frac{d\_{1i}}{d\_{ji}}\right)^{\frac{2}{m\_{old}-1}}\right) \tag{41}$$

Rearranging Eq. (41) and expressing it in terms of *m* gives Eqs. (42) and (43).

$$\chi = \frac{\log\left(\frac{1}{u\_k}\right) - \log\left(1 + \sum\_{j=2}^{c}\left(\frac{d\_{1i}}{d\_{ji}}\right)^{2/(m\_{old}-1)}\right)}{\log\left(\frac{d\_{ki}}{d\_{1i}}\right)} \tag{42}$$

$$m\_{jnew} = \mathbf{1} + \frac{2}{\mathbf{y}}\tag{43}$$

As in the above process, the membership value *ui* ∈ {*ui*(*Xk*)} is used to obtain *mjnew*. Eqs. (42) and (43) are applied to each clustered data point to update the values *m1inew* and *m2inew*; averaging the fuzzifier values as in Eq. (44), the new fuzzifier values *m1* and *m2* are finally calculated as follows.

$$m\_1 = \left(\sum\_{i=1}^N m\_{1i}\right) / N, m\_2 = \left(\sum\_{i=1}^N m\_{2i}\right) / N \tag{44}$$
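The clipping to [*mlower*, *mupper*] described in Section 4.5, followed by the averaging of Eq. (44), can be sketched as follows; the per-pattern fuzzifier values and bounds are hypothetical.

```python
import numpy as np

def final_fuzzifiers(m1_per_point, m2_per_point, m_lower, m_upper):
    """Eq. (44) plus bound clipping: per-pattern fuzzifier values outside
    [m_lower, m_upper] are set to the nearest bound, then averaged."""
    m1 = float(np.clip(m1_per_point, m_lower, m_upper).mean())
    m2 = float(np.clip(m2_per_point, m_lower, m_upper).mean())
    return m1, m2

m1s = np.array([2.5, 3.0, 9.0])   # hypothetical per-pattern values; 9.0 is clipped to 5.0
m2s = np.array([1.5, 2.0, 2.5])
print(final_fuzzifiers(m1s, m2s, m_lower=1.1, m_upper=5.0))  # → (3.5, 2.0)
```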
