**5. Classification and similarity matching**

For the classification of breast masses as either normal or abnormal (two-class separation) or as normal, benign, or malignant (three-class study), we used support vector machine (SVM) and extreme learning machine (ELM) classifiers, chosen for their excellent generalization performance and minimal need for human intervention. The SVM carries out classification between two classes by determining a hyperplane in feature space that is based on the most informative points of the training set [47]. The ELM, on the other hand, is a learning algorithm for single-hidden-layer feed-forward networks (SLFNs) [48]. It first randomly assigns the weights and biases of the hidden nodes and then analytically determines the output weights using the least-squares method. Owing to this random selection of hidden-node weights and biases, the ELM reduces the learning time considerably and can also achieve superior generalization performance [49].
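The ELM training procedure just described (random hidden weights and biases, output weights solved by least squares) can be sketched as follows. This is a minimal NumPy illustration with illustrative function names, not the authors' implementation; a tanh (tangent-sigmoid) activation is assumed:

```python
import numpy as np

def elm_train(X, T, L=700, seed=0):
    """Train a basic ELM: random hidden layer, least-squares output weights.

    X: (n_samples, n_features) inputs; T: (n_samples, n_classes) one-hot targets.
    L: number of hidden neurons.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], L))   # random input-to-hidden weights
    b = rng.standard_normal(L)                 # random hidden biases
    H = np.tanh(X @ W + b)                     # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T               # analytic least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Predict class indices for inputs X with a trained ELM."""
    H = np.tanh(X @ W + b)
    return np.argmax(H @ beta, axis=1)
```

Because only `beta` is learned (via a pseudoinverse) while `W` and `b` stay random, training reduces to a single linear solve, which is where the ELM's speed advantage comes from.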

For similarity matching, it is challenging to find a single feature representation that compares images accurately for all types of queries. Feature descriptors at different levels of image representation take diverse forms and may be complementary in nature. To compute the similarity between a query image and the database, the difference between the feature vector of the queried mass (or ROI) and the feature vectors of the reference images (or ROIs) is calculated. Current CAD schemes using CBIR approaches typically rely on a k-nearest-neighbor search, which retrieves the k reference ROIs most similar to the queried ROI. The smaller the difference ("distance"), the higher the computed "similarity" level between the two compared ROIs. The retrieval result of a CBIR algorithm therefore depends on the effectiveness of the distance metric used to measure the "similarity" level among the selected images.
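The k-nearest-neighbor retrieval step can be sketched as follows, assuming a plain Euclidean distance on a single feature vector per ROI; the function name is illustrative:

```python
import numpy as np

def retrieve_top_k(query_feat, ref_feats, k=5):
    """Return indices of the k reference ROIs most similar to the query ROI.

    query_feat: (d,) feature vector of the queried ROI.
    ref_feats:  (n_refs, d) feature vectors of the reference ROIs.
    Smaller Euclidean distance means higher similarity, so the k smallest
    distances give the k most similar references.
    """
    dists = np.linalg.norm(ref_feats - query_feat, axis=1)
    return np.argsort(dists)[:k]
```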

In this work, a fusion-based linear combination scheme (Eq. (7)) over the similarity measures of the different features, with pre-determined weights, is used. The similarity between a query image *I*<sub>q</sub> and a target image *I*<sub>j</sub> is defined as:

$$\text{Sim}(I_q, I_j) = \sum_{F} \alpha^F \, S^F(I_q, I_j) = \sum_{F} \alpha^F \, S\!\left(f_q^F, f_j^F\right) \tag{7}$$

where *F* ∈ {NSCT, HOG, Shape, Mass, GLCM}, *S*<sup>F</sup>(*I*<sub>q</sub>, *I*<sub>j</sub>) is the Euclidean similarity matching function in the individual feature space *F*, *f*<sub>q</sub><sup>F</sup> and *f*<sub>j</sub><sup>F</sup> are the corresponding feature vectors, and α<sup>F</sup> are weights (determined experimentally) for the different image representation schemes.

The overall performance of each classifier is validated by 10-fold CV in all evaluation indices. The performance of the ELM classifier depends on the number of neurons in the hidden layer, L, which was determined as L = 700 by trials in increments of 100 within the range 100–1000. Both the training and testing errors decreased as L increased to around 700; beyond that, the training and testing performance did not improve and remained almost fixed, as shown in **Figure 6**. We also tested different activation functions, such as sigmoid, tangent sigmoid, sine, and radial basis, and the tangent sigmoid was found to be optimal.

**Figure 6.** Determination of the number of hidden layer nodes L, with L = 700 selected.

A Decision Support System (DSS) for Breast Cancer Detection Based on Invariant Feature…

http://dx.doi.org/10.5772/intechopen.81119

**6.2. Performance evaluation**

For the performance evaluation of the proposed classification approaches in different feature spaces, we computed the accuracy, sensitivity (true positive rate), and specificity (true negative rate) from each of the confusion matrices. The sensitivity measures the percentage of positive instances that were predicted as positive, while the specificity measures the percentage of negative instances that were predicted as negative. The retrieval effectiveness is measured with precision-recall (PR) graphs, which are commonly used in the information retrieval domain. In the experiments, each image in the testing dataset served as a query image. A retrieved image is considered a correct match if it belongs to the same category as the query image. The performances on the two image categories (e.g., normal and abnormal) and the three image categories (e.g., benign, malignant, normal) are compared based on the PR graphs.

Finally, for both classification and retrieval evaluation, different combinations of concatenated feature vectors are utilized, as shown in **Table 1**. For example, the *f*<sub>1</sub> feature set consists of all five different features (Shape, Mass, GLCM, NSCT, and eig(Hess)HOG), whereas the *f*<sub>6</sub> feature set consists of the eig(Hess)HOG feature only.
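The fused similarity of Eq. (7) can be sketched as below. The chapter does not spell out the exact form of the Euclidean similarity function *S*, so the common monotone mapping 1/(1 + d), with d the Euclidean distance, is assumed here; the function name and dict-based interface are illustrative:

```python
import numpy as np

def fused_similarity(query_feats, target_feats, weights):
    """Eq. (7): weighted linear combination of per-feature-space similarities.

    query_feats / target_feats: dicts mapping a feature-space name (e.g.
    'NSCT', 'HOG', 'Shape', 'Mass', 'GLCM') to that space's feature vector.
    weights: dict of experimentally chosen alpha^F values, one per space.
    Per-space similarity is taken as 1 / (1 + Euclidean distance), an
    assumed mapping: smaller distance -> higher similarity.
    """
    sim = 0.0
    for F, alpha in weights.items():
        d = np.linalg.norm(np.asarray(query_feats[F]) - np.asarray(target_feats[F]))
        sim += alpha * (1.0 / (1.0 + d))
    return sim
```

With weights that sum to 1, the fused score stays in (0, 1], and a query compared against itself attains the maximum value of 1.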
