232 Fuzzy Inference System – Theory and Applications

which represent respectively the positive and negative output.

includes two complementary learning functions: *s* and *p*.

structure of data is suitable to reflect the data *D*.

knowledge data.

relative outputs.

In CLFNN the behaviour is characterized by a linguistic term derived from the fuzzy set *A*. *A* is a mapping of *X* to the linguistic model of CLFNN; each set *A* identifies a set of locations in the input space characterized by their membership functions. CLFNN is less subject to the data imbalance problem because it builds its knowledge from the positive and negative classes separately, so that the influence of each class on the other is minimized. Moreover, the CLFNN inference system uses lateral inhibition, which improves the system performance when dealing with imbalanced datasets.

In order to demonstrate the efficiency of the CLFNN method, four imbalanced datasets are used: Single Photon Emission Computed Tomography (SPECT) (Blake & Merz, 1998), Wisconsin Breast Cancer Diagnosis (WBCD) (Blake & Merz, 1998), Fine Needle Aspiration (FNA) (Cross & Harrison, 2006) and Thermogram (THERM) (Ng & Fok, 2003).

Each dataset is divided into a training set, a testing set and a validation set, maintaining the class distribution. The averaged performance of CLFNN, calculated as the F-Measure over three cross-validation sets, is compared with other popular methods: Multilayer Perceptron (MLP) (Rosenblatt, 1958), Radial Basis Function (RBF) (Powell, 1985), Linear Discriminant Analysis (LDA) (McLachlan, 2004), Decision tree C4.5 (Brodley & Utgoff, 1995) and Support Vector Machine (SVM).

Table 6 describes the datasets and reports the averaged results obtained over the different cross-validations with the several approaches. The acronym IR indicates the imbalance ratio, i.e. the ratio between the number of positive samples and the number of negative samples.

Table 6 shows that CLFNN provides better results than the other approaches. On the thermogram dataset none of the systems gives satisfactory results because of its very high imbalance ratio but, also in this case, CLFNN outperforms the other approaches. This work confirms that CLFNN provides more consistent results over different data distributions, becoming a promising tool for handling imbalanced datasets.

**4. Conclusions**

Fuzzy systems offer several advantages, among which is the possibility of formalising and simulating the expertise of an operator in process control and tuning. Moreover, the fuzzy approach provides a simple answer for processes that are not easily modelled. Finally, fuzzy systems are flexible and can nowadays be easily implemented and exploited also for real-time applications. These are the main reasons why fuzzy theory is widely adopted, with satisfactory results, in many industrial applications dealing with processes that are difficult to model.

In particular, this chapter has presented and discussed applications of FIS to industrial data processing, with particular emphasis on the detection of rare patterns or events. Rare patterns are typically much more difficult to identify than common objects, and data mining algorithms often have difficulty dealing with them. There are two kinds of "rarity": rare cases and rare classes. Rare cases, commonly known as outliers, are anomalous samples, i.e. observations that deviate significantly from the rest of the data. Outliers may be due to sensor noise, process disturbances, human errors and instrument degradation. Rare classes, or more generally class imbalance, occur when, in a classification problem, there are more samples of some classes than of others.
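As an illustration of the two kinds of rarity, the following minimal Python sketch flags rare cases with a simple z-score rule and quantifies rare classes through the imbalance ratio (positive over negative samples) and the F-Measure. The function names, the threshold of 3 and the toy data are assumptions made for this example; the z-score rule is only a generic stand-in for classical outlier tests and is not the fuzzy method discussed in the chapter.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Indices of samples whose absolute z-score exceeds `threshold` (assumed rule)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all samples are identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def imbalance_ratio(labels, positive=1):
    """Number of positive samples divided by the number of negative samples."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    if n_neg == 0:
        raise ValueError("no negative samples")
    return n_pos / n_neg


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no positive sample was recovered
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 20 regular readings and one anomalous reading.
readings = [10.0] * 20 + [100.0]
print("outlier indices:", zscore_outliers(readings))

# Toy labels (hypothetical): 5 positive and 95 negative samples.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
print("imbalance ratio:", round(imbalance_ratio(y_true), 4))
print("accuracy:", accuracy, "F-measure:", f_measure(y_true, always_negative))
```

On the toy labels, a classifier that always predicts the negative class reaches an accuracy of 0.95 but an F-Measure of 0, which suggests why accuracy alone can be misleading on imbalanced data.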

This chapter provides a preliminary review of classical outlier detection methods and then illustrates novel detection methods based on Fuzzy Inference Systems. Moreover, the class imbalance problem is described and the traditional techniques for handling it are reviewed. Finally, some recent fuzzy-based approaches are considered.

The results demonstrate that, in real-world applications, fuzzy theory can provide an effective tool that outperforms other traditional techniques.

Fuzzy Inference System for Data Processing in Industrial Applications 235

**5. References**

Akbani, R., Kwek, S. & Japkowicz, N. (2004). Applying support vector machines to imbalanced datasets. In *Proceedings of 15th European Conference on Machine Learning*, Pisa, Italy, September 20-24, 2004.

Alejo, R., Garcia, V., Sotoca, J.M., Mollineda, R.A. & Sanchez, J.S. (2006). Improving the classification accuracy of RBF and MLP neural networks trained with imbalanced samples. In *E. Corchado et al. (Eds), IDEAL 2006*, pp. 464-471.

Batuwita, R. & Palade, V. (2010). FSVM-CIL: Fuzzy Support Vector Machines for Class Imbalance Learning. *IEEE Transactions on Fuzzy Systems*, Vol. 18, N° 3, June 2010.

Berthold, M.R. & Huber, K.P. (1995). From radial to rectangular basis functions: a new approach for rule learning from large datasets. *Technical report*, University of Karlsruhe, 1995.

Bezdek, J.C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms, *Plenum Press*, New York.

Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science.

Boser, B.E., Guyon, I.M. & Vapnik, V.N. (1992). A training algorithm for optimal margin classifiers. In *D. Haussler, editor, 5th Annual ACM Workshop on COLT*, pp. 144-152, Pittsburgh, PA, 1992. ACM Press.

Breunig, M.M., Kriegel, H.P. & Ng, R.T. (2000). LOF: Identifying density-based local outliers. *Proceedings of ACM Conference*, pp. 93-104.

Brodley, C.E. & Utgoff, P.E. (1995). Multivariate decision trees. *Machine Learning*, 19, pp. 45-77.

Bunkhumpornpat, C., Sinapiromsaran, K. & Lursinsap, C. (2009). Safe-Level-SMOTE: Safe Level Synthetic Oversampling technique for handling the class imbalanced problem. In *Proceedings PAKDD 2009*, Springer LNAI 5476, pp. 475-482.

Castillo, M.D. & Serrano, J.I. (2004). A multistrategy approach for digital text categorization from imbalanced documents. *SIGKDD Explor. Newsl.*, Vol. 6, N° 1, pp. 70-79.

Castro, J., Hidalgo, L., Mantas, C. & Puche, J. (2007). Extraction of fuzzy rules from support vector machine. *Fuzzy Sets System*, Vol. 158, pp. 2057-2077.

Cateni, S., Colla, V. & Vannucci, M. (2009). A fuzzy system for combining different outliers detection methods, *Proceedings of the 25th International Conference on Artificial Intelligence and Applications*, Innsbruck, Austria, 16-18 February 2009.

Chan, P.K. & Stolfo, S.J. (1998). Toward scalable learning with non-uniform class and cost distribution: a case study in credit fraud detection. *Knowledge Discovery and Data Mining*, pp. 164-168.

Chaves, A., Vellasco, M. & Tanscheit, R. (2005). Fuzzy rule extraction from support vector machines, *5th Int. Conf. Hybrid Intell. System*, Rio de Janeiro, Brazil.

Chawla, N.V., Hall, L.O., Bowyer, K.W. & Kegelmeyer, W.P. (2002). SMOTE: Synthetic Minority Oversampling TEchnique. *Journal of Artificial Intelligence Research*, 16, pp. 321-357.

Chawla, N.V. (2003). C4.5 and imbalanced data sets: investigating the effect of sampling method, probabilistic estimate, and decision tree structure. *Workshop on learning from imbalanced dataset II, ICML*, Washington DC, 2003.

Chi, Z., Yan, H. & Pham, T. (1996). Fuzzy algorithms with applications to image processing and pattern recognition. *World Scientific*.

Chiang, J. & Hao, P. (2004). Support vector learning mechanism for fuzzy rule-based modeling: a new approach. *IEEE Trans. Fuzzy System*, vol 12, no 6, pp. 1-12.

Chen, Y. & Wang, J. (2003). Support vector learning for fuzzy rule based classification systems, *IEEE Trans. Fuzzy System*, Vol. 11, no 6, pp. 716-728.

Cross, S. & Harrison, R.F. (2006). Fine needle aspirate of breast lesions dataset. *Artificial Neural Networks in Medicine World Map*.

De Rouin, E., Brown, J., Fausett, L. & Schneider, M. (1991). Neural network training on unequally represented classes. In *Intelligent engineering systems through artificial neural networks*, pp. 135-141, ASME Press, NY.

Dixon, W.J. (1953). Processing data for outliers. *Biometrics*, 9, pp. 74-89, 1953.

Dunn, J.C. (1973). A fuzzy relative of the ISODATA Process and its use in detecting compact well-separated clusters, *Journal of Cybernetics*, 3, pp. 32-57, 1973.

Dunn, J.C. (1974). Some recent investigations of a new fuzzy partition algorithm and its application to pattern classification problems, *Journal of Cybernetics*, 4, pp. 1-15, 1974.

Fawcett, T. & Provost, F.J. (1997). Adaptive fraud detection. *Data Mining and Knowledge Discovery*, Vol. 1, N° 3, pp. 291-316.

Gao, J., Cheng, H. & Tan, P.N. (2006). Semi-supervised outlier detection. *Proceedings of the 2006 ACM Symposium on Applied Computing*, ACM Press, 2006, pp. 635-636.

Gauthier, I., Skudlarski, P., Gore, J.C. & Anderson, A.W. (2000). Expertise for cars and birds recruits brain areas involved in face recognition. *Nature Neuroscience*, Vol. 3 (2), pp. 191-197.

Gibbons, R.D. (1994). Statistical Methods for Groundwater Monitoring, *John Wiley & Sons*, New York.

Grubbs, F.E. (1969). Procedures for detecting outlying observations in samples, *Technometrics*, 11, pp. 1-21.

Guo, H. & Viktor, H. (2004). Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach. *ACM SIGKDD Explorations Newsletter, Special issue on learning from imbalanced datasets*, Volume 6, Issue 1 (June 2004).

Haibo, H. & Garcia, E. (2009). Learning from Imbalanced Data. *IEEE Trans. Knowl. Data Eng.*, Vol 21, no 9, pp. 1263-1284.

Han, H., Wang, W. & Mao, B. (2005). Borderline-SMOTE: A New Oversampling Method in Imbalanced Data Sets Learning. *Proceedings of ICIC*, Springer LNCS 3644, pp. 878-887.

Hao, Y., Chi, X. & Yan, D. (2007). Fuzzy Support Vector Machine based on vague sets for credit assessment. *Proc. 4th Int. Conf. Fuzzy System Knowl. Discov.*, Changsha, China, vol. 1, pp. 603-607.

Hart, P. (1968). The condensed nearest neighbor rule. *IEEE Trans. Inform. Theory*, 14, pp. 515-516.

Hawkins, D. (1980). Identification of outliers, Chapman and Hall, London, 1980.

He, Z., Xu, X. & Deng, S. (2003). Discovering cluster-based local outliers, *Pattern Recognition Letters*, pp. 1651-1660.

Hodge, V.J. (2004). A survey of outlier detection methodologies, *Kluwer Academic Publishers*, Netherlands, January 2004.

Hong, X., Chen, S. & Harris, C.J. (2007). A kernel based two class classifier for imbalanced datasets. *IEEE Transactions on Neural Networks*, vol 18 (1), pp. 28-41.

Hu, Q. & Yu, D. (2005). An improved clustering algorithm for information granulation, *Proceedings of 2nd International Conference on Fuzzy Systems and Knowledge Discovery* (FSKD'05), vol 3613, LNCS, Springer-Verlag, Berlin Heidelberg, Changsha, China, 2005, pp. 494-504.

Imam, T., Ting, K. & Kamruzzaman, J. (2006). z-SVM: An SVM for improved classification of imbalanced data. In *Proceedings of Aust. Joint Conf. AI*, Hobart, Australia, pp. 264-273.

Inoue, T. & Abe, S. (2001). Support Vector machine for pattern classification. *Proc. Int. Conf. Neural Network*, Washington, D.C., pp. 1449-1457.

Jang, M.F., Tseng, S.S. & Su, C.M. (2001). Two-phase clustering process for outliers detection. *Pattern Recognition Letters*, pp. 691-700.

Jang, S., Li, Q., Wang, H. & Zhao, Y. (2005). A two-stage outlier detection method. *MINI-MICRO SYSTEMS*, pp. 1237-1240.

Japkowicz, N. & Shaju, S. (2002). The class imbalance problem: a systematic study. *Intelligent Data Analysis*, Volume 6, Issue 5 (October 2002), pp. 429-449.

Jesus, M.J., Fernández, A., Garcia, S. & Herrera, F. (2006). A first study of the use of fuzzy rule based classification systems for Problems with Imbalanced Data Sets. *Symposium on Fuzzy Systems in Computer Science (FSCS06)*, Magdeburg, Germany, pp. 63-72 (2006).

Jimenez-Marquez, S.A., Lacroix, L. & Thibault, J. (2002). Statistical data validation methods for large cheese plant database. *J. Dairy Sci.*, 85(9), Sep 2002, pp. 2081-2097.

Kang, P. & Cho, S. (2006). EUS SVMs: Ensemble of under sampled SVMs for data imbalance problems. *Proc. of 13th International Conference Neural Inf. Process.*, Hong Kong, pp. 837-846.

Knorr, E.M. & Ng, R. (2003). Algorithms for Mining Distance-based Outliers in Large Datasets, *Proceedings of VLDB*, pp. 392-403, 2003.

Kubat, M. & Matwin, S. (1997). Addressing the curse of imbalanced training sets: one sided selection. *Proceedings of the Fourteenth International Conference on Machine Learning*, pp. 179-186. Nashville, Tennessee, Morgan Kaufmann.

Kubat, M., Holte, R.C. & Matwin, S. (1998). Machine learning for the detection of oil spill in satellite radar images, *Machine Learning*, Vol. 30, N° 2-3, pp. 195-215.

Leskovec, J. & Shawe-Taylor, J. (2003). Linear programming boosting for uneven datasets. In *Proc. of the 20th International Conference on Machine Learning (ICML-2003)*, Washington DC, 2003.

Li, P., Chan, K.L. & Fang, W. (2006). Hybrid kernel machine ensemble for imbalanced data sets. In *Proc. of the 18th Int. Conference on Pattern Recognition (ICPR'06)*, 2006.

Li, B., Fang, L. & Guo, L. (2007). A novel data mining method for network anomaly detection based on transductive scheme, *Advances in Neural Networks*, LNCS, Vol. 4491, Springer Berlin, 2007, pp. 1286-1292.

Lin, C.F. & Wang, D. (2002). Fuzzy support vector machine, *IEEE Trans. Neural Network*, vol 13, n 2, pp. 464-471.

Lin, Z., Hao, Z., Yang, X. & Liu, X. (2009). Several SVM ensemble methods integrated with under-sampling for imbalanced data learning. In *Advanced Data Mining and Application*, Berlin, Germany, Springer Verlag, pp. 536-544.

Ling, C.X. & Li, C. (1998). Data Mining for direct marketing: problems and solutions. *Knowledge Discovery and Data Mining*, pp. 73-79.

Lingras, P. & West, C. (2004). Interval set clustering of web users with rough k-means, *Journal of Intelligent Information System*, Vol. 23, N° 1, July 2004, pp. 5-16.

Liu, Y., An, A. & Huang, X. (2006). Boosting prediction accuracy of imbalanced dataset with SVM ensembles. *Proc. 10th Pac.-Asia Conf. Adv. Knowl. Discov. Data Mining*, Singapore, pp. 107-118.

Liu, A. & Ghosh, J. (2007). Generative Oversampling for Mining Imbalanced Datasets. In *DMIN*, pp. 66-72.

Mahalanobis, P.C. (1936). On the generalized distance in statistics. *Proceedings of the National Institute of Science of India*, pp. 49-55.

Maloof, M. (2003). Learning when data sets are imbalanced and when costs are unequal and unknown. *ICML*, 2003.

Mamdani, E.H. (1974). Application of fuzzy algorithms for control of simple dynamic plant. *Proceedings of the IEEE Control and Science*, 121, pp. 298-313, 1974.

Matsumoto, S., Kamei, Y., Monden, A. & Matsumoto, K. (2007). Comparison of outlier detection methods in fault-proneness models. *Proceedings of the 1st International Symposium on Empirical Software Engineering and Measurement* (ESEM2007), pp. 461-463.

McCarthy, K., Zabar, B. & Weiss, G. (2005). Does cost sensitive learning beat sampling for classifying rare classes? *UBDM'05*, New York, NY, USA, ACM Press, pp. 69-77.

McLachlan, G.J. (2004). Discriminant Analysis and Statistical Pattern Recognition. *Wiley Interscience*.

Mill, J. & Inoue, A. (2003). An application of fuzzy support vector. *Proc. 22nd Int. Conf. North Amer. Fuzzy Inf. Process. Soc.*, Chicago, IL, pp. 302-306.

Ng, E.Y.K. & Fok, S.C. (2003). A framework for early discovery of breast tumor using thermography with artificial neural network. *The Breast Journal*, Vol. 9, 4, 2003, pp. 341-343.

Papadimitriou, S., Kitagawa, H., Gibbons, P. & Faloutsos, C. (2003). LOCI: Fast Outlier Detection using the Local Correlation Integral. *Proceedings of the International Conference of Data Engineering*, 2003.

Pawlak, Z. (1982). Rough Sets. *International Journal of Computer and Information Sciences*, 11, pp. 341-356.

Pawlak, Z. (1991). Rough Sets: Theoretical aspects of reasoning about data. Dordrecht: Kluwer Academic Publisher.

Pazzani, M., Merz, C., Murphy, P., Ali, K., Hume, T. & Brunk, C. (1994). Reducing misclassification cost. In *Proceedings of the 11th Intl. Conference on Machine Learning*, pp. 217-225, 1994.

Phua, C., Alahakoon, D. & Lee, V. (2004). Minority report in fraud detection: Classification of skewed data, *SIGKDD Explor. Newsl.*, Vol. 6, N° 1, pp. 50-59, 2004.

Powell, M.J.D. (1985). Radial Basis Functions for multivariable interpolation: a review. *IMA Conference on Algorithms for the Approximation of Functions on Data*, pp. 143-167, RMCS Shrivenham, England.

Ramaswamy, S., Rastogi, R. & Shim, K. (2000). Efficient Algorithms for Mining Outliers from large datasets, *Proceedings of International Conference of Management of Data (SIGMOD'00)*, 2000, pp. 427-438.

Raskutti, B. & Kowalczyk, A. (2004). Extreme re-balancing for SVMs: a case study. *SIGKDD Exploration*, Vol 6, N° 1, pp. 60-69.

Ray, S. & Turi, H. (1999). Determination of number of clusters in k-means clustering and application in colour image segmentation. *Proceedings of 4th International Conference in Pattern Recognition and Digital Techniques* (ICAPRDT'99), Calcutta, India, 27-29 December 1999, pp. 137-143.

Rosenblatt, F. (1958). The Perceptron: a probabilistic model for information storage and organization in the brain. *Psychological Review*, 65, pp. 386-408.

Ruspini, E. (1969). A new approach to clustering. *Information and Control*, N° 15, pp. 22-32, 1969.

Sam, T.R. & Lawrence, K.S. (2000). Nonlinear Dimensionality Reduction by Locally Linear Embedding. *Science*, 290, pp. 2323-2326.

Soler, V. & Prim, M. (2007). Rectangular basis functions applied to imbalanced datasets. *Lecture Notes in Computer Science*, Vol. 4668/2007, Springer.

Spyrou, E., Stamou, G., Avrithis, Y. & Kollias, S. (2005). Fuzzy Support Vector Machine for image classification fusing MPEG-7 visual descriptors. *Proc. 2nd Eur. Workshop Integr. Knowl. Semantics Dig. Media Technol.*, London, UK, pp. 25-30.

Tan, T.Z., Ng, S.G. & Quek, C. (2005). Ovarian cancer diagnosis by hippocampus and neocortex-inspired learning memory structures. *Neural Networks*, Vol 18, 5-6, pp. 818-825.

Tan, T.Z., Ng, S.G. & Quek, C. (2007). Complementary Learning Fuzzy Neural Network: An approach to Imbalanced Dataset. *Proceedings of International Joint Conference on Neural Networks*, Orlando, Florida, USA, August 12-17.

Tomek, I. (1976). Two modifications of CNN. *IEEE Transactions System Man Comm.*, 6, pp. 769-772.

Turney, P.D. (2000). Learning algorithms for keyphrase extraction. *Information Retrieval*, vol. 2, n° 4, pp. 303-336.

Vannucci, M., Colla, V., Cateni, S. & Sgarbi, M. (2011). Artificial intelligence techniques for unbalanced datasets in real world classification tasks. *Computational Modeling and Simulation of Intellect: current state and future perspectives*, pp. 551-565.

Veropoulos, K., Campbell, C. & Cristianini, N. (1999). Controlling the sensitivity of support vector machines, in *Proc. Int. Joint Conf. Artificial Intelligence*, Stockholm, Sweden, pp. 55-60.

Visa, S. & Ralescu, A. (2005). Issues in mining imbalanced datasets - a review paper. *Proceedings of the Sixteenth Midwest Artificial Intelligence and Cognitive Science Conference*, pp. 67-73.

Xie, Z., Hu, Q. & Yu, D. (2005). Fuzzy Support Vector Machine for classification. *Proc. Int. Conf. Adv. Natural Comput.*, Changsha, China, pp. 1190-1197.

Xu, Z. & Liu, S. (2009). Rough based Semi-Supervised Outlier Detection. *Sixth International Conference on Fuzzy System and Knowledge Discovery*, pp. 520-524.

Wang, L.X. & Mendel, J.M. (1992). Generating fuzzy rules for learning from examples. *IEEE Transactions on Systems, Man and Cybernetics*, Vol 35, No 2, pp. 353-361.

Wang, J., Xu, M., Wang, H. & Zhang, J. (2006). Classification of Imbalanced Data by using the SMOTE algorithm and Locally Linear Embedding. *Proceedings of 8th International Conference on Signal Processing*, IEEE, Vol. 3, pp. 16-20, 2006.

Wang, Y., Wang, S. & Lai, K. (2005). A new fuzzy support vector machine to evaluate credit risk. *IEEE Trans. Fuzzy Syst.*, Vol 13, no 6, pp. 820-831.

Weiss, G.M. & Provost, F. (2003). Learning when training data are costly: the effect of class distribution on tree induction. *Journal of Artificial Intelligence Research*, 19, pp. 315-354.

Wu, G. & Chang, E. (2003). Class boundary alignment for imbalanced dataset learning. *Proceedings of International Conference Data Mining*, Workshop Learning Imbalanced Datasets II, Washington, D.C.

Wu, G. & Chang, E. (2004). KBA: Kernel Boundary Alignment considering imbalanced data distribution. *IEEE Trans. Knowl. Data Eng.*, Vol 17, N° 6, pp. 786-795.

Xie, J. & Qiu, Z. (2007). The effect of imbalanced datasets on LDA: a theoretical and empirical analysis. *Pattern Recognition*, vol. 40, pp. 557-562.

Xue, Z., Shang, Y. & Feng, S. (2010). Semi-supervised outlier detection based on fuzzy rough C-means clustering, *Mathematics and Computers in Simulation*, 80, pp. 2011-2021, 2010.

Yousri, N.A., Ismail, M.A. & Kamel, M.S. (2007). Fuzzy outlier analysis: a combined clustering-outlier detection approach, *IEEE SMC 2007*.

Yu, D., Sheikholeslami, G. & Zhang, A. (2003). Findout: finding out outliers in large datasets. *Knowledge and Information Systems*, pp. 387-412.

Zadrozny, B., Langford, J. & Abe, N. (2003). Cost sensitive learning by cost proportionate example weighting. In *ICDM'03 Proceedings of the Third IEEE International Conference on Data Mining*, 2003.

Zhang, D., Gatica-Perez, D., Bengio, S. & McCowan, I. (2005). Semi supervised adapted HMMs for unusual event detection, *IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)*, IEEE Press, June 2005, Vol 1, pp. 611-618.

Zou, S., Huang, Y., Wang, Y., Wang, J. & Zou, C. (2008). SVM learning from imbalanced data by GA sampling for protein domain prediction. *Proc. of 9th Int. Conf. Young Comput. Sci.*, Hunan, China, 2008, pp. 982-987.

**Section 4**

**Application to Image Processing and Cognition Problems**
