Table 6. Datasets description and averaged results.

| | SPECT | WBCD | FNA | THERM |
|---|---|---|---|---|
| Positive Samples | 212 | 212 | 235 | 5 |
| Negative Samples | 55 | 357 | 457 | 65 |


In CLFNN, the behaviour is characterized by linguistic terms derived from fuzzy sets. A fuzzy set *A* is a mapping from the input space *X* to the linguistic model of CLFNN; each set *A* identifies a set of locations in the input space, characterized by their membership functions. CLFNN is less subject to the data imbalance problem because it builds its knowledge from the positive and negative classes separately, so that the influence of each class on the other is minimized. Moreover, the CLFNN inference system uses lateral inhibition, which improves system performance on imbalanced datasets.
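The idea of a fuzzy set as a mapping from the input space to membership degrees can be illustrated with a minimal sketch. The Gaussian shape, the three linguistic terms and their centres and widths are assumptions chosen for illustration only; the text does not specify the membership functions that CLFNN actually derives.

```python
import math

def gaussian_membership(x, centre, width):
    """Degree in [0, 1] to which x belongs to a fuzzy set centred at `centre`.

    The Gaussian shape is an illustrative assumption, not the CLFNN definition.
    """
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))

# Hypothetical linguistic terms for one input feature scaled to [0, 1]:
# each term is a (centre, width) pair.
TERMS = {"low": (0.0, 0.2), "medium": (0.5, 0.2), "high": (1.0, 0.2)}

def fuzzify(x):
    """Map a crisp input value to its membership degree in every term."""
    return {name: gaussian_membership(x, c, w) for name, (c, w) in TERMS.items()}

memberships = fuzzify(0.1)
# The term with the highest degree is the linguistic label that best describes x.
best_term = max(memberships, key=memberships.get)
```

A value close to 0 receives its highest degree in the "low" set and a much smaller degree in "high", which is the sense in which each set identifies a region of the input space.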

In order to demonstrate the efficiency of the CLFNN method, four imbalanced datasets are used: Single Photon Emission Computed Tomography (SPECT) (Blake & Merz, 1998), Wisconsin Breast Cancer Diagnosis (WBCD) (Blake & Merz, 1998), Fine Needle Aspiration (FNA) (Cross & Harrison, 2006) and Thermogram (THERM) (Ng & Fok, 2003).

Each dataset is divided into a training set, a testing set and a validation set, maintaining the class distribution. The averaged performance of CLFNN, calculated as the F-Measure over three cross-validation sets, is compared with other popular methods: Multilayer Perceptron (MLP) (Rosenblatt, 1958), Radial Basis Function (RBF) (Powell, 1985), Linear Discriminant Analysis (LDA) (McLachlan, 2004), Decision Tree C4.5 (Brodley & Utgoff) and Support Vector Machine (SVM).
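The evaluation protocol described above can be sketched as follows. This is a minimal illustration, not the authors' code: the 60/20/20 split fractions, the random seed and the function names are assumptions, since the text states only that the class distribution is maintained and that the F-Measure is used.

```python
import random

def f_measure(y_true, y_pred, positive=1):
    """F-Measure: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into training, testing and validation sets.

    Each class is shuffled and divided separately, so every subset keeps
    approximately the class distribution of the full dataset.
    The fractions are hypothetical; the source does not give them.
    """
    rng = random.Random(seed)
    train, test, val = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_test = round(fractions[1] * len(idx))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:n_train + n_test])
        val.extend(idx[n_train + n_test:])
    return train, test, val

# Toy imbalanced label vector: 10 positive and 40 negative samples.
labels = [1] * 10 + [0] * 40
train_idx, test_idx, val_idx = stratified_split(labels)
```

Averaging `f_measure` over several such splits gives a single score per method. The F-Measure is a natural choice here because, unlike plain accuracy, it is not inflated by correctly labelling the majority class.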

Table 6 describes the datasets and reports the averaged results obtained over the different cross-validation sets with each approach. The acronym IR denotes the imbalance ratio, i.e. the ratio between the number of positive samples and the number of negative samples.
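As a worked example of the IR definition, the ratio can be computed from the sample counts listed for Table 6. The sketch assumes that the counts 212, 212, 235 and 5 are the positive samples and 55, 357, 457 and 65 the negative samples of SPECT, WBCD, FNA and THERM respectively; the function name is hypothetical.

```python
def imbalance_ratio(n_positive, n_negative):
    """IR as defined in the text: positive samples divided by negative samples."""
    return n_positive / n_negative

# (positive, negative) counts per dataset, under the assumption stated above.
COUNTS = {
    "SPECT": (212, 55),
    "WBCD": (212, 357),
    "FNA": (235, 457),
    "THERM": (5, 65),
}

ratios = {name: imbalance_ratio(pos, neg) for name, (pos, neg) in COUNTS.items()}

# The dataset whose ratio is smallest has the scarcest positive class.
most_skewed = min(ratios, key=ratios.get)
```

Under this definition a ratio far from 1 in either direction signals imbalance: THERM has the smallest ratio, with only 5 positive samples against 65 negative ones.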

Table 6 shows that CLFNN provides better results than the other approaches. On the thermogram dataset, none of the systems gives satisfactory results because of its very high imbalance ratio but, even in this case, CLFNN outperforms the other approaches. This work confirms that CLFNN provides more consistent results across different data distributions, making it a promising tool for handling imbalanced datasets.


