layer feed-forward neural network (SLFNs) learning algorithm [48]. It first randomly assigns weights and biases for the hidden nodes, and then analytically determines the output weights using the least-squares method. Because the hidden-node weights and biases are chosen randomly, the ELM can reduce the learning time considerably and can also achieve superior generalization performance [49].

For similarity matching, it is challenging to find a single feature representation that compares images accurately for all types of queries. Feature descriptors at different levels of image representation take diverse forms and may be complementary in nature. To compute the similarity between the query image and the database, the difference between the feature vector of the queried mass (or ROI) and the feature vectors of the reference images (or ROIs) is calculated. Current CAD schemes using CBIR approaches typically employ a k-nearest-neighbor search, which retrieves the k reference ROIs most similar to the queried ROI. The smaller the difference ("distance"), the higher the computed "similarity" between the two compared ROIs. The retrieval quality of the CBIR algorithm therefore depends on how effectively the distance metric measures the "similarity" among the selected images. In this work, a fusion-based linear combination scheme (Eq. (7)) over the similarity measures of the different features is used with pre-determined weights. The similarity between a query image $I_q$ and a target image $I_j$ is described as:

$$\mathrm{Sim}(I_q, I_j) = \sum_{F} \alpha_F \, S_F(I_q, I_j) = \sum_{F} \alpha_F \, S\!\left(f_q^F, f_j^F\right) \tag{7}$$

where $F \in \{\text{NSCT}, \text{HOG}, \text{Shape}, \text{Mass}, \text{GLCM}\}$, $S_F(I_q, I_j)$ is the Euclidean similarity-matching function in the individual feature space $F$, and the $\alpha_F$ are weights (determined experimentally) for the different image representation schemes.

22 Medical Imaging and Image-Guided Interventions

### **6. Result evaluation**

#### **6.1. Experiment design**

To evaluate the effectiveness of the proposed classification and retrieval-based decision support system, experiments were performed on digitized mammograms taken from the Digital Database for Screening Mammography (DDSM), a collaboratively maintained public dataset at the University of South Florida [23]. The DDSM has been widely used as a benchmark in numerous articles in the mammography area because it is free of charge and contains a diverse set of cases. The database holds approximately 2500 cases, where each case includes two image views (CC and MLO) of each breast (right and left). The image sizes vary from 1024 × 300 pixels to 1024 × 800 pixels. In total, the DDSM offers more than 9000 images, from which we selected 5880 images for the experiments and result evaluation.

To experiment with the classification systems, the entire collection of mammograms was divided so that 40% of the images form the training set and the remaining 60% the test set, and 10-fold cross-validation (CV) was used in the experimental design; the 10-fold CV ensures that the reported classifier performance is consistent across all evaluation indices. The SVM learning approach was examined with the Gaussian radial basis function (GRBF) kernel (r = 2, C = 100). The performance of the ELM classifier depends on the number of neurons in the hidden layer, L, which was set to L = 700 by trials in increments of 100 over the range 100–1000: both the training and testing errors decreased as L grew to around 700, after which the training and testing performance stopped improving and remained almost fixed, as shown in **Figure 6**. We also tested different activation functions (sigmoid, tangent sigmoid, sine, and radial basis), and the tangent sigmoid was found to be optimal.

#### **6.2. Performance evaluation**

For the performance evaluation of the proposed classification approaches in the different feature spaces, we computed the accuracy, sensitivity (true positive rate), and specificity (true negative rate) from each confusion matrix. Sensitivity measures the percentage of positive instances that are predicted as positive, while specificity measures the percentage of negative instances that are predicted as negative. Retrieval effectiveness is measured with precision-recall (PR) graphs, which are commonly used in the information retrieval domain. In the experiments, each image in the test dataset serves as a query image, and a retrieved image is considered a correct match if it belongs to the same category as the query image. The performances for the two image categories (normal and abnormal) and for the three image categories (benign, malignant, and normal) are compared on the basis of the PR graphs.
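As a concrete reference, the evaluation measures above follow directly from the confusion-matrix counts; the helper names below are illustrative, not taken from the chapter:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (TPR), and specificity (TNR) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # fraction of positive instances predicted positive
    specificity = tn / (tn + fp)   # fraction of negative instances predicted negative
    return accuracy, sensitivity, specificity

def precision_recall(tp, fp, fn):
    """One point on the PR curve for a given retrieval cutoff."""
    precision = tp / (tp + fp)     # retrieved images that match the query's category
    recall = tp / (tp + fn)        # relevant images that were actually retrieved
    return precision, recall
```

The same counts drive both views of performance: the classification metrics summarize one fixed decision threshold, while sweeping the retrieval cutoff traces out the PR graph.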

Finally, for both the classification and retrieval evaluations, different combinations of concatenated feature vectors are used, as shown in **Table 1**. For example, the *f*1 feature set consists of all five features (Shape, Mass, GLCM, NSCT, and eig(Hess)HOG), whereas the *f*6 feature set consists of the eig(Hess)HOG feature only.
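On the retrieval side, the fused similarity of Eq. (7) combined with a k-nearest-neighbor search over these feature spaces can be sketched as below. The distance-to-similarity mapping `1/(1+d)` and the weight values are assumptions of this sketch, since the chapter states only that the weights were determined experimentally:

```python
import numpy as np

FEATURES = ["NSCT", "HOG", "Shape", "Mass", "GLCM"]

def fused_similarity(query, target, alpha):
    """Eq. (7): Sim(Iq, Ij) = sum_F alpha_F * S(f_q^F, f_j^F)."""
    sim = 0.0
    for F in FEATURES:
        d = np.linalg.norm(query[F] - target[F])  # Euclidean distance in feature space F
        sim += alpha[F] * (1.0 / (1.0 + d))       # map distance to a similarity (one possible choice)
    return sim

def knn_retrieve(query, database, alpha, k=5):
    """Return the indices of the k reference ROIs most similar to the query ROI."""
    scored = [(fused_similarity(query, ref, alpha), idx)
              for idx, ref in enumerate(database)]
    scored.sort(reverse=True)                     # highest fused similarity first
    return [idx for _, idx in scored[:k]]
```

Each image is represented here as a dict of per-feature vectors, so a smaller feature-space distance raises the fused score in proportion to that feature's weight.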

**Figure 6.** Determination of the number of hidden-layer nodes L (selected value L = 700).
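For context on Figure 6, an ELM of the kind described, with random hidden weights and biases, tangent-sigmoid activation, and an analytic least-squares solve for the output weights, can be sketched as follows; the data shapes and the plain pseudo-inverse solve are assumptions of this sketch, not details from the chapter:

```python
import numpy as np

def elm_train(X, T, L=700, seed=0):
    """Train an ELM: random hidden layer of L nodes, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, size=(X.shape[1], L))  # random input weights (never trained)
    b = rng.uniform(-1, 1, size=L)                # random hidden biases
    H = np.tanh(X @ W + b)                        # tangent-sigmoid hidden activations
    beta = np.linalg.pinv(H) @ T                  # analytic least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Forward pass through the fixed hidden layer and learned output weights."""
    return np.tanh(X @ W + b) @ beta
```

Sweeping `L` over 100–1000 and plotting the training and testing errors of this model is the experiment summarized by Figure 6.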
