**1. Introduction**

214 Fuzzy Inference System – Theory and Applications


In recent years, Fuzzy Inference Systems (FIS) have been used in several industrial applications in the fields of automatic control, data classification, decision analysis, expert systems, time series prediction, and pattern recognition.

The widespread use of FIS in the industrial field is mainly due to the nature of real data, which are often incomplete, noisy and inconsistent, and to the complexity of many processes, for which mathematical models can be impractical or even impossible to apply because too little is known about the mechanisms ruling the phenomena under consideration. Fuzzy theory is applicable to many complex systems, and the linguistic formulation of its rule base provides a suitable and intuitive tool to formalise the relationships between input and output variables.

Real-world databases frequently contain anomalous data (often called *outliers*), which are due to several causes, such as erroneous measurements or anomalous process conditions. Outlier elimination is a necessary step, for instance, when building a training database for tuning a model of the process under consideration in standard operating conditions. On the other hand, in many applications, such as medical diagnosis, network intrusion or fraud detection, rare events are more interesting than the common samples. The rarity of certain patterns, combined with their low separability from the rest of the data, makes their identification difficult. This is the case, for instance, in classification problems where the patterns are not equally distributed among the classes (the so-called *imbalanced dataset* (Vannucci et al., 2011)). In many real problems, such as document filtering and fraud detection, a binary classification problem must be faced, where the data belonging to the "most interesting" class are far less frequent than the data belonging to the second class, which corresponds to normal situations. The main problem with imbalanced datasets is that standard learners are biased towards the common samples and tend to reduce the overall error rate without taking the data distribution into account.
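The bias towards the common samples can be illustrated with a small numerical sketch. The dataset below is hypothetical (990 normal samples and 10 rare ones, not taken from any application discussed in this chapter), and the helper names `accuracy` and `recall` are chosen only for illustration. A trivial classifier that always predicts the frequent class achieves a very low error rate while missing every rare sample.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of the samples of the rare (positive) class that are detected."""
    detected = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return detected / total


# Hypothetical imbalanced dataset: 990 normal samples (0) and 10 rare ones (1).
y_true = [0] * 990 + [1] * 10

# A degenerate learner that always predicts the frequent class.
y_pred = [0] * len(y_true)

print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```

The error rate is only 1%, yet no rare sample is identified, which is why the overall error rate alone is a poor criterion for imbalanced datasets.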

In this chapter, a brief preliminary review of traditional outlier detection techniques and of classification algorithms suitable for imbalanced datasets is presented. Moreover, some recent practical applications of FIS that are capable of outperforming the widely adopted traditional methods for the detection of rare data are presented and discussed.

Fuzzy Inference System for Data Processing in Industrial Applications 217

neighbors of *x*. It is evident that *MinPts* is an important parameter of the proposed algorithm. Papadimitriou et al. (Papadimitriou et al., 2003) propose LOCI (Local Correlation Integral), which uses statistical values derived from the data to solve the problem of choosing values for *MinPts*.

**2.3 Clustering-based methods**

Clustering-based methods perform a preliminary clustering operation on the whole dataset and then classify as outliers the data which are not located in any cluster.

Fuzzy C-means algorithm (FCM) is a method of clustering developed by Dunn in 1973 (Dunn, 1973) and improved by Bezdek in 1981 (Bezdek, 1981). This approach is based on the notion of fuzzy c-partition introduced by Ruspini (Ruspini, 1969). Let *X={x1, x2, ... xn}* be a set of data where each sample *xh (h=1, 2, ... n)* is a vector with dimensionality *p*. Let *Ucn* be the set of real *c×n* matrices, where *c* is an integer which can assume values between 2 and *n*. The fuzzy C-partition space for *X* is the following set:

$$M_{cn} = \left\{ U \in U_{cn} \,;\; u_{ih} \in [0, 1] \,;\; \sum_{i=1}^{c} u_{ih} = 1 \,;\; 0 < \sum_{h=1}^{n} u_{ih} < n \right\} \qquad (1)$$

where *uih* is the degree of membership of *xh* in cluster *i* (*1≤i≤c*). The objective of the FCM approach is to provide an optimal fuzzy C-partition minimizing the following function:

$$J_m(U, V) = \sum_{h=1}^{n} \sum_{i=1}^{c} (u_{ih})^{m} \, \left\| x_h - v_i \right\|^{2} \qquad (2)$$

where *V=(v1, v2,... vc)* is the matrix of cluster centres, ║.║ is the Euclidean norm and *m* is a weighting exponent (*m>1*).

Many clustering-based outlier detection approaches have recently been developed. For instance, Jang et al. (Jang et al., 2001) proposed an outlier-finding process called OFP, based on the k-means algorithm. This approach considers small clusters as outliers. Yu et al. (Yu et al., 2002) proposed an outlier detection method called FindOut, which identifies outliers by removing clusters from the original data. Moreover, He et al. (He et al., 2003) introduced the notion of cluster-based local outlier and an outlier detection method (FindCBLOF), which exploits a cluster-based LOF in order to quantify the outlierness of each sample. Finally, Jang et al. (Jang et al., 2005) proposed a novel method in order to improve the efficiency of the FindCBLOF approach.

**2.4 Statistical-based methods**

Statistical-based methods fit the initial dataset with a standard distribution. Outliers are defined with respect to this probability distribution, under the assumption that the data distribution is known a priori. The main limit of this approach lies in the fact that, for many applications, such prior knowledge is not always available and the cost of fitting the data with a standard distribution can be considerable. A widely used method belonging to the distribution-based approaches has been proposed by Grubbs (Grubbs, 1969). This test is efficient if the data can be approximated by a Gaussian distribution. The Grubbs test calculates the following statistic:

$$Z = \frac{\left| \mathrm{mean} - x \right|}{\sigma} \qquad (3)$$
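For the Grubbs test of Section 2.4, a minimal sketch can make the computation concrete. It assumes that the statistic of equation (3) is the absolute deviation of a sample from the mean of the data, divided by the sample standard deviation. The function names `grubbs_scores` and `flag_outliers`, the example readings and the threshold value are illustrative assumptions and are not taken from the text; in a formal test the threshold would be derived from a critical value that depends on the number of samples and on the chosen significance level.

```python
import statistics


def grubbs_scores(data):
    """Return |mean - x| / s for every sample x, with s the sample standard deviation."""
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 in the denominator)
    return [abs(mean - x) / s for x in data]


def flag_outliers(data, threshold):
    """Indices of the samples whose score exceeds a user-chosen threshold (assumption)."""
    return [i for i, z in enumerate(grubbs_scores(data)) if z > threshold]


# Hypothetical sensor readings with one anomalous value at the end.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 14.5]
print(flag_outliers(readings, threshold=2.0))  # [6]
```

Since a single extreme value also inflates the mean and the standard deviation, the score of an outlier is bounded for small datasets, which is one reason why the threshold has to be chosen with the sample size in mind.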
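For the FCM approach of Section 2.3, the following sketch evaluates the objective function of equation (2) and minimises it by alternating two updates. The text does not state the update rules, so the sketch assumes the standard alternating optimisation usually associated with FCM: centres are computed as membership-weighted means, and memberships are recomputed from the relative distances to the centres. All function names, the random initialisation, the fixed number of iterations and the example data are hypothetical choices made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def objective(X, U, V, m):
    """J_m(U, V) = sum over h and i of u_ih^m * ||x_h - v_i||^2 (equation (2))."""
    return sum(U[i][h] ** m * sq_dist(X[h], V[i])
               for i in range(len(V)) for h in range(len(X)))


def update_centres(X, U, m):
    """Each centre is the mean of the samples weighted by u_ih^m (assumed standard rule)."""
    p = len(X[0])
    V = []
    for row in U:
        w = [u ** m for u in row]
        total = sum(w)
        V.append([sum(w[h] * X[h][d] for h in range(len(X))) / total for d in range(p)])
    return V


def update_memberships(X, V, m):
    """u_ih = 1 / sum_k (d_ih / d_kh)^(2 / (m - 1)) (assumed standard rule)."""
    c, n = len(V), len(X)
    U = [[0.0] * n for _ in range(c)]
    exponent = 1.0 / (m - 1.0)  # applied to squared distances
    for h in range(n):
        d2 = [sq_dist(X[h], V[i]) for i in range(c)]
        zeros = [i for i in range(c) if d2[i] == 0.0]
        if zeros:  # the sample coincides with one or more centres
            for i in zeros:
                U[i][h] = 1.0 / len(zeros)
            continue
        for i in range(c):
            U[i][h] = 1.0 / sum((d2[i] / d2[k]) ** exponent for k in range(c))
    return U


def fcm(X, c, m=2.0, iters=50, seed=0):
    """Run a fixed number of alternating updates from random memberships."""
    rng = random.Random(seed)
    U = [[rng.random() + 1e-3 for _ in X] for _ in range(c)]
    for h in range(len(X)):  # normalise so that each column of U sums to 1
        col = sum(U[i][h] for i in range(c))
        for i in range(c):
            U[i][h] /= col
    for _ in range(iters):
        V = update_centres(X, U, m)
        U = update_memberships(X, V, m)
    return U, V


# Hypothetical two-dimensional data forming two well separated groups.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.0], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
U, V = fcm(X, c=2)
labels = [max(range(2), key=lambda i: U[i][h]) for h in range(len(X))]
print(labels)
print(objective(X, U, V, 2.0))
```

The resulting matrix `U` satisfies the constraints of equation (1): every membership lies in [0, 1], the memberships of each sample sum to 1 over the clusters, and no cluster is empty or absorbs the whole dataset.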
