226 Fuzzy Inference System – Theory and Applications

**3.1.1.2 Undersampling techniques**

A popular undersampling method is the Condensed Nearest Neighbour (CNN) rule (Hart, 1968). CNN is used to find a consistent subset of samples: a subset *C* of *S* is consistent with *S* if a one-nearest-neighbour classifier built on *C* correctly classifies the instances in *S*. Fawcett and Provost (Fawcett & Provost, 1997) propose an algorithm to extract a subset *C* from *S* and use it as an undersampling method. First, one example belonging to the majority class is randomly extracted and put in *C* together with all the examples belonging to the minority class. Then a 1-NN classifier built on *C* is used to classify the examples in *S*. If an example in *S* is misclassified, it is moved to *C*. The main aim of this method is to delete the examples belonging to the majority class which are distant from the decision border.

Another undersampling approach is the so-called *Tomek links* (Tomek, 1976). Let *xi* and *xj* be two examples belonging to different classes and *d(xi, xj)* their distance. The pair *(xi, xj)* is called a Tomek link if there is no example *xk* such that *d(xi, xk) < d(xi, xj)* or *d(xj, xk) < d(xi, xj)*. If two examples form a Tomek link, then either one of them is noise or both are borderline. When Tomek links are used for undersampling, only the samples belonging to the majority class are removed. Kubat and Matwin (Kubat & Matwin, 1997) propose a method called One-Sided Selection (OSS) which uses both Tomek links and CNN. Tomek links are used as an undersampling technique removing noisy and borderline samples of the majority class; borderline samples are considered *unsafe*, since noise can make them fall on the wrong side of the decision border. CNN is used to delete the samples of the majority class which are distant from the decision border. The remaining samples, including the *safe* samples of the majority class and all samples belonging to the minority class, are used for learning.

**3.1.2 Internal methods**

Internal methods deal with variations of a learning algorithm in order to make it less sensitive to class imbalance. Two common methods, Boosting and Cost-Sensitive learning, are used in this area.

Boosting is a method used to improve the accuracy of weak classifiers. The most famous boosting algorithm is AdaBoost (Freund & Schapire, 1997). It is based on the fusion of a set of weak learners, i.e. classifiers which perform better than random guessing on a classification task. During the learning phase, weak learners are trained and included in the strong learner, and the contribution of each added learner is weighted on the basis of its performance. At the end, all the weighted learners contribute to classifying unlabelled samples. This approach is suitable for imbalanced datasets because the samples of the minority class are more likely to be misclassified and therefore receive higher weights during the iterations. In the literature, several approaches using boosting techniques for imbalanced datasets have been proposed (Guo & Viktor, 2004; Leskovec & Shawe-Taylor, 2003) and results confirm the effectiveness of the method.

Another effective approach is cost-sensitive learning, in which a cost is associated with misclassifying samples; the cost matrix is a numerical representation of the penalty of classifying samples from one class as another. A correct classification incurs no penalty.

**3.2 Fuzzy based approaches**

SVMs are widely used in classification tasks with imbalanced datasets (Boser et al., 1992). The capabilities of SVMs and the effect of imbalance on them have been widely discussed in (Akbani et al., 2004; Japkowicz & Stephen, 2002). SVM is a widely used machine learning method which has been applied to many real-world problems with satisfactory results. It works effectively with balanced datasets but provides suboptimal classification models on imbalanced ones; several studies demonstrate this conclusion (Veropoulos et al., 1999; Akbani et al., 2004; Wu & Chang, 2003; Wu & Chang, 2005; Raskutti & Kowalczyk, 2004; Imam et al., 2006; Zou et al., 2008; Lin et al., 2009; Kang & Cho, 2006; Liu et al., 2006; Haibo & Garcia, 2009). SVM is biased toward the majority class and provides poor results on the minority class.

A limitation of the SVM approach is that it is sensitive to outliers and noise, because it treats all training samples uniformly. To overcome this problem, a Fuzzy SVM (FSVM) has been proposed (Lin & Wang, 2002) as a variant of the traditional SVM algorithm. FSVM associates a fuzzy membership value (weight) with each training sample to express its degree of importance within its class. These weights are then incorporated into the SVM learning algorithm in order to reduce the effect of outliers and noise when finding the separating hyperplane.
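The weighted soft-margin idea can be sketched with scikit-learn's per-sample weights. This is an illustrative approximation, not Lin & Wang's exact formulation: the membership heuristic (linear decay with distance from the class mean) and all names are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# two Gaussian classes plus one mislabelled outlier in the positive class
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2)), [[0.0, 0.5]]])
y = np.array([0] * 20 + [1] * 21)

# fuzzy membership: decay linearly with distance from the class mean
w = np.empty(len(y))
for c in (0, 1):
    idx = y == c
    d = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
    w[idx] = 1.0 - d / (d.max() + 1e-6)  # far from the centre -> small weight

# the outlier gets a near-zero weight, so it barely affects the hyperplane
clf = SVC(kernel="linear", C=1.0).fit(X, y, sample_weight=w)
```

With uniform weights the outlier would pull the separating hyperplane toward the negative cluster; down-weighting it restores a boundary close to the one learned from clean data.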

An extension of this approach is due to Wang et al. (Wang et al., 2005), who introduce two membership values for each training sample, defining its membership degree in the positive and in the negative class. This idea has been taken up again by Hao et al. (Hao et al., 2007) on the basis of the notion of *vague sets*.

Spyrou et al. (Spyrou et al., 2005) propose another kind of fuzzy SVM approach which uses a particular kernel function built from fuzzy basis functions. Other works combine fuzzy theory with SVM by assigning a membership value to the outputs of the algorithm. For example, Xie et al. (Xie et al., 2005) define a membership degree for the output class through the decision value generated by the SVM algorithm, while Inoue and Abe (Inoue & Abe, 2001) use the fuzzy output decision for multiclass classification. Finally, Mill and Inoue (Mill & Inoue, 2003) propose an approach which generates the fuzzy membership values for the output classes through the strengths of the support vectors.
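For the output side, a common generic construction (a sketch in this spirit, not necessarily the exact formulation of any of the works above; `gamma` is an assumed steepness parameter) squashes the SVM decision value through a sigmoid to obtain class membership degrees:

```python
import numpy as np
from sklearn.svm import SVC

def fuzzy_svm_output(clf, X, gamma=1.0):
    """Map the signed SVM decision value to fuzzy memberships of the two classes."""
    d = clf.decision_function(X)
    mu_pos = 1.0 / (1.0 + np.exp(-gamma * d))  # membership in the positive class
    return np.column_stack([1.0 - mu_pos, mu_pos])

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = SVC(kernel="linear").fit(X, y)
mu = fuzzy_svm_output(clf, X)  # rows sum to 1; samples far from the boundary approach 0 or 1
```

Samples near the decision boundary receive memberships close to 0.5 in both classes, which exposes the classifier's uncertainty instead of hiding it behind a hard label.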

Fuzzy Inference System for Data Processing in Industrial Applications 229


Another line of FSVM research regards the extraction of fuzzy rules from the trained SVM model, and many works have been proposed (Chiang & Hao, 2004; Chen & Wang, 2003; Chaves et al., 2005; Castro et al., 2007).

The above approaches demonstrate that FSVM outperforms the traditional SVM algorithm by mitigating the frequent problem of outliers and noise. However, the FSVM technique, like the SVM method, can be sensitive to the class imbalance problem. Batuwita and Palade (Batuwita & Palade, 2010) propose a novel method which uses FSVM for class imbalance learning (CIL). This approach, called FSVM-CIL, is able to classify with satisfactory accuracy, solving both the class imbalance and the outlier/noise problem. FSVM-CIL has been extended by Lakshmanan et al. (Lakshmanan et al., 2011) from binary to multi-class classification problems.

The general implementation scheme of the proposed approach is represented in figure 2.

Fig. 2. Implementation diagram

The FSVM-CIL method assigns a membership value to each training example in order to suppress the effect of class imbalance and, at the same time, to reflect the within-class importance of different training samples, suppressing the effect of outliers and noise. Let *ki+* and *ki-* be the membership values of the positive-class sample *xi+* and of the negative-class sample *xi-* in their own classes, respectively. They are calculated as follows:


$$k\_{i}^{+} = f(x\_{i}^{+}) \cdot k^{+} \qquad\qquad k\_{i}^{-} = f(x\_{i}^{-}) \cdot k^{-}\tag{15}$$

where *k+* and *k-* are values reflecting the class imbalance, with *k+ > k-*, and *f(xi)* is defined in terms of the distance *di* between *xi* and its class centre.

Samples near the class centre are considered important because they carry more information, so their *f(xi)* value is high. In contrast, samples lying far away from the centre are treated as outliers or noise, and their *f(xi)* value is low. In the FSVM-CIL approach the authors use two separate decaying functions of the distance to define *f(xi)*: a linearly decaying function *flin(xi)* and an exponentially decaying function *fexp(xi)*. The two functions are defined as follows:

$$f\_{\mathrm{lin}}(x\_i) = 1 - \frac{d\_i}{\max\{d\_i\} + \alpha}\tag{16}$$

where *α* is a small positive value introduced to prevent *flin(xi)* from becoming zero.

$$f\_{\exp}(x\_i) = \frac{2}{1 + \exp(\beta d\_i)}\tag{17}$$

where *β*, which takes values in the range [0, 1], determines the steepness of the decay, and *di* is the Euclidean distance between *xi* and its own class centre.
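Equations (15)-(17) can be sketched as follows. Function and parameter names are illustrative assumptions, with the positive (minority) class encoded as 1 and `k_pos > k_neg`, as the method requires:

```python
import numpy as np

def fsvm_cil_memberships(X, y, k_pos=1.0, k_neg=0.5, decay="lin", alpha=1e-6, beta=1.0):
    """Membership k_i = f(x_i) * k^{+/-}, with f a decaying function of the
    Euclidean distance d_i between x_i and its own class centre."""
    m = np.empty(len(y), dtype=float)
    for label, k_class in ((1, k_pos), (0, k_neg)):
        idx = np.where(y == label)[0]
        centre = X[idx].mean(axis=0)                  # class centre
        d = np.linalg.norm(X[idx] - centre, axis=1)   # distances d_i
        if decay == "lin":
            f = 1.0 - d / (d.max() + alpha)           # eq. (16)
        else:
            f = 2.0 / (1.0 + np.exp(beta * d))        # eq. (17)
        m[idx] = f * k_class                          # eq. (15)
    return m

X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [5.0, 0.0], [6.0, 0.0], [7.0, 0.0]])
y = np.array([1, 1, 1, 0, 0, 0])
m = fsvm_cil_memberships(X, y)  # samples at the class centres get the largest values
```

The resulting vector can be passed as per-sample weights to a soft-margin SVM: class imbalance is handled by the *k+/k-* scaling, and outliers are handled by the distance decay.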

The performance measure used in this approach is the geometric mean *GM* of sensitivity and specificity, given in equation (18).

$$GM = \sqrt{\text{SE} \ast \text{SP}} \tag{18}$$

where *SE* (sensitivity) and *SP* (specificity) are the proportions of positive and negative samples, respectively, that are correctly classified.
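Equation (18) amounts to the following (the label convention, 1 = positive and 0 = negative, is an assumption):

```python
import numpy as np

def geometric_mean_score(y_true, y_pred):
    """GM = sqrt(SE * SP): SE is the fraction of positives classified correctly,
    SP the fraction of negatives classified correctly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    se = np.mean(y_pred[y_true == 1] == 1)  # sensitivity
    sp = np.mean(y_pred[y_true == 0] == 0)  # specificity
    return float(np.sqrt(se * sp))

# 3 of 4 positives and 4 of 5 negatives correct: sqrt(0.75 * 0.8)
gm = geometric_mean_score([1, 1, 1, 1, 0, 0, 0, 0, 0],
                          [1, 1, 1, 0, 0, 0, 0, 0, 1])
```

Unlike plain accuracy, GM collapses to zero when either class is entirely misclassified, which is why it is preferred for imbalanced data.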

Three datasets from the UCI repository (Blake & Merz, 1998) have been used to demonstrate the effectiveness of the FSVM-CIL approach, which has been compared against SVM and FSVM, considering both the linear and the exponential decaying function. Table 4 describes the datasets, while Table 5 reports the results obtained by the tested approaches.


| | **# Positive Samples** | **# Negative Samples** | **IR** |
|---|---|---|---|
| **Ecoli** | 77 | 259 | 0.297 |
| **Pima Indians** | 268 | 500 | 0.536 |
| **Page Blocks** | 115 | 5358 | 0.021 |

Table 4. Dataset descriptions. IR represents the imbalance ratio, i.e. the ratio between the positive and the negative class sizes.

As already mentioned, the SVM classifier favours the majority class over the minority class, obtaining a high value of SP and a low value of SE. This result confirms that the SVM classifier is sensitive to the imbalance problem. Concerning FSVM, the results show that the best setting depends on the dataset: for the Ecoli and Pima datasets the choice of the decaying function is irrelevant, while on the Page Blocks dataset the exponential decaying function provides better results than the linear one. Finally, the table shows that for the FSVM-CIL approach the exponential decaying function provides the best results regardless of the dataset.



| | | **SVM (%)** | **FSVM (lin) (%)** | **FSVM (exp) (%)** | **FSVM-CIL (lin) (%)** | **FSVM-CIL (exp) (%)** |
|---|---|---|---|---|---|---|
| **ECOLI** | SE | 78.67 | 78.18 | 78.64 | 90.02 | 92.45 |
| | SP | 93.12 | 93.60 | 92.97 | 86.52 | 88.19 |
| | GM | 88.59 | 88.53 | 85.50 | 89.08 | 90.64 |
| **PIMA** | SE | 55.04 | 54.55 | 55.60 | 66.94 | 69.10 |
| | SP | 89.50 | 87.5 | 82.50 | 76.14 | 76.35 |
| | GM | 70.18 | 69.78 | 71.33 | 71.56 | 72.74 |
| **PAGE** | SE | 58.26 | 61.74 | 67.06 | 93.81 | 93.14 |
| | SP | 99.54 | 91.48 | 99.01 | 95.21 | 95.36 |
| | GM | 76.47 | 79.15 | 81.62 | 94.51 | 95.05 |

Table 5. Classification results and comparison between several methods

Jesus et al. (Jesus et al., 2006) study the performance of Fuzzy Rule Based Classification Systems (FRBCS) on imbalanced datasets (Chi et al., 1996). The authors analyze the synergy of linguistic FRBCS with several pre-processing techniques, such as undersampling, oversampling and hybrid models. A FRBCS is composed of a Knowledge Base (KB) and a Fuzzy Reasoning Method (FRM). The FRM uses the information in the KB to determine the class of each sample presented to the system. The KB is composed of two elements: the Data Base (DB), which contains the definitions of the fuzzy sets associated with the linguistic terms, and the Rule Base (RB), which contains a set of classification rules.

The FRM is an inference procedure that uses the information in the KB to predict the class of an unclassified sample. In a classification task the FRM includes four operations:

1. Compute the compatibility degree of the data with the antecedent of each rule.
2. Compute the association degree of the data with the consequent class of each rule; this step consists in aggregating the compatibility degree with the certainty degree of the rule for the related class.
3. Set the association degree of the data with the several classes.
4. Classify by applying a decision function *F* on the association degrees of the data with the classes.


The FRBCS proposed in (Chi et al., 1996) is an extension of the well known Wang & Mendel method (Wang & Mendel, 1992) to classification tasks. It finds the relationship between the variables of the problem and establishes an association between the feature space and the class space. The main operations are as follows:

	- First, the domain of variation of each feature *Xi* is determined and the fuzzy partitions are computed; this step establishes the linguistic partitions.
	- Then, for each example a fuzzy rule is generated through several steps: computing the matching degree of the example with the several fuzzy regions; assigning the example to the fuzzy region with the highest membership degree; creating a rule whose antecedent is the selected fuzzy region and whose consequent is the class label of the example; and computing the certainty degree of the rule.
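A minimal sketch of this learning scheme together with the FRM inference described above. Uniform triangular partitions, the product t-norm and winning-rule reasoning are illustrative choices, and all names are assumptions:

```python
import numpy as np

def tri_memberships(x, n_sets, lo, hi):
    """Membership of scalar x in n_sets uniformly spaced triangular fuzzy sets."""
    centres = np.linspace(lo, hi, n_sets)
    width = centres[1] - centres[0]
    return np.clip(1.0 - np.abs(x - centres) / width, 0.0, 1.0)

class ChiFRBCS:
    """Minimal Chi-style fuzzy rule-based classifier (illustrative sketch)."""
    def __init__(self, n_sets=3):
        self.n_sets = n_sets

    def fit(self, X, y):
        self.lo, self.hi = X.min(axis=0), X.max(axis=0)
        support = {}  # antecedent (tuple of region indices) -> {class: summed degree}
        for xi, yi in zip(X, y):
            mus = [tri_memberships(v, self.n_sets, l, h)
                   for v, l, h in zip(xi, self.lo, self.hi)]
            ante = tuple(int(m.argmax()) for m in mus)       # best region per feature
            degree = float(np.prod([m.max() for m in mus]))  # matching degree
            support.setdefault(ante, {}).setdefault(yi, 0.0)
            support[ante][yi] += degree
        # keep, per antecedent, the majority class and its certainty degree
        self.rules = []
        for ante, votes in support.items():
            cls = max(votes, key=votes.get)
            certainty = votes[cls] / sum(votes.values())
            self.rules.append((ante, cls, certainty))
        return self

    def predict_one(self, xi):
        mus = [tri_memberships(v, self.n_sets, l, h)
               for v, l, h in zip(xi, self.lo, self.hi)]
        best_cls, best_assoc = None, -1.0
        for ante, cls, certainty in self.rules:
            compat = float(np.prod([m[a] for m, a in zip(mus, ante)]))
            assoc = compat * certainty  # association degree (compatibility x certainty)
            if assoc > best_assoc:
                best_cls, best_assoc = cls, assoc
        return best_cls
```

A usage example: fitting on four points of two classes and classifying nearby points reproduces the expected labels, since each example generates a rule anchored at its strongest fuzzy region.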



In order to demonstrate the effectiveness of the proposed approach, the authors considered several datasets from the UCI repository with different degrees of imbalance.

The study is divided into three parts: an analysis of the use of pre-processing for imbalanced problems (e.g. SMOTE, random oversampling, random undersampling), a study of the effect of the FRM, and an analysis of the influence of the granularity of the linguistic partitions in combination with the inference method.
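The re-sampling pre-processing mentioned above can be sketched as follows (random over/under-sampling only; SMOTE, which synthesises new minority samples by interpolating between neighbours, is omitted; names are illustrative):

```python
import numpy as np

def random_resample(X, y, minority=1, strategy="over", seed=0):
    """Balance a binary dataset by duplicating minority samples ("over")
    or discarding majority samples ("under")."""
    rng = np.random.default_rng(seed)
    min_idx = np.where(y == minority)[0]
    maj_idx = np.where(y != minority)[0]
    if strategy == "over":
        extra = rng.choice(min_idx, size=len(maj_idx) - len(min_idx), replace=True)
        keep = np.concatenate([maj_idx, min_idx, extra])
    else:
        kept_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
        keep = np.concatenate([kept_maj, min_idx])
    return X[keep], y[keep]

X = np.arange(10).reshape(-1, 1)
y = np.array([0] * 8 + [1] * 2)                    # imbalance ratio 4:1
Xo, yo = random_resample(X, y, strategy="over")    # 8 vs 8 after duplication
Xu, yu = random_resample(X, y, strategy="under")   # 2 vs 2 after discarding
```

Either variant yields a balanced training set; over-sampling keeps all the information at the cost of duplicated points, while under-sampling shrinks the dataset and may discard useful majority examples.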

Results show that in all considered cases the presence of the pre-processing phase improves the behaviour of the learning algorithm.

The main conclusion is that the FRCM algorithm outperforms the other analyzed methods, obtaining the best results when a re-sampling operation is added before applying the FRCM technique; moreover, the authors found that FRBCSs perform well against the C4.5 decision tree in the context of highly imbalanced datasets.

An alternative for the imbalance problem is the Complementary Learning Fuzzy Neural Network (CLFNN) proposed in (Tan et al., 2007). The use of fuzzy logic makes it possible to tolerate uncertainty in the data, reducing the effect of data imbalance. The main advantage of CLFNN is that it does not require data pre-processing and hence makes no a priori assumption on the data and does not alter the data distribution. This method exploits a neuro-fuzzy system based on complementary learning theory (Tan et al., 2005). Complementary learning is a mechanism observed in the human brain: different brain areas, segregated and mutually exclusive, are recruited to recognize different objects (Gauthier, 2000). When an object is seen, the relevant areas are activated while the irrelevant areas are inhibited; this mechanism is called lateral inhibition. Generally, complementary learning has the following characteristics:

- Feature extraction from positive and negative examples.
- Separation of positive and negative information.
- Development of lateral inhibition.


According to these concepts, the approaches present in the literature can be divided into three groups:

1. Positive/negative systems, where the system builds information on the basis of the target class only.
2. Neutral learning systems, where the notion of positive and negative does not exist.
3. Complementary learning systems, where the system creates knowledge on the basis of the positive and negative classes, considering the relation between positive and negative samples.


CLFNN is defined as a 9-tuple *(X, Y, D, A, R, B, l, s, p)*.



|  |  |  |  |  |
|---|---|---|---|---|
| IR | 3.85 | 0.59 | 0.51 | 0.07 |
| MLP | 0.762 | 0.925 | 0.917 | 0.2 |
| RBF | 0.8 | 0.912 | 0.912 | 0.2 |
| C4.5 | 0.763 | 0.907 | 0.906 | 0.2 |
| SVM | 0.79 | 0.93 | 0.93 | 0.2 |
| LDA | 0.90 | 0.931 | 0.918 | 0.2 |
| CLFNN | 0.91 | 0.96 | 0.96 | 0.37 |

Fuzzy systems offer several advantages, among which the possibility to formalise and simulate the expertise of an operator in process control and tuning. Moreover, the fuzzy approach provides a simple answer for processes which are not easily modelled. Finally, fuzzy systems are flexible and nowadays can easily be implemented and exploited also for real-time applications. This is the main reason why fuzzy theory is widely adopted, with satisfactory results, in many industrial applications dealing with processes that are difficult to model.

In particular, in this chapter, applications of FIS to industrial data processing have been presented and discussed, with a particular emphasis on the detection of rare patterns or events. Rare patterns are typically much more difficult to identify than common objects, and data mining algorithms often have difficulty dealing with them. There are two kinds of "rarity": rare cases and rare classes. Rare cases, commonly known as outliers, are anomalous samples, i.e. observations that deviate significantly from the rest of the data. Outliers may be due to sensor noise, process disturbances, human errors and instrument degradation. On the other hand, rare classes, or more generally class imbalance, occur when, in a classification problem, there are more samples of some classes than of others.

This chapter provides a preliminary review of classical outlier detection methods and then illustrates novel detection methods based on Fuzzy Inference Systems. Moreover, the class imbalance problem is described and traditional techniques are treated, followed by fuzzy-based approaches to this problem.

Results demonstrate how, in real-world applications, fuzzy theory can effectively provide an answer to these problems.
