**1. Introduction**

Breast cancer is considered a major health problem: it is the second most common cause of cancer death among women, after lung cancer, in both developed and developing countries [1, 2]. According to estimates by the American Cancer Society (ACS), there were 231,840 new cases of invasive breast cancer and 40,290 breast cancer deaths among U.S. women in 2015 [3]. Although breast cancer mortality has declined among women of all ages during the last two decades as a result of improved treatment, earlier detection, and greater awareness, the incidence rate has increased significantly over the same period [2, 3].

A Decision Support System (DSS) for Breast Cancer Detection Based on Invariant Feature…

http://dx.doi.org/10.5772/intechopen.81119

Detecting and diagnosing a breast tumor at an early stage offers the best opportunity to increase the chances of survival. Women aged 40 or older are therefore advised to have mammograms regularly. However, this recommendation generates a very large number of mammograms that need to be processed. In addition, the interpretation of mammograms depends largely on the experience of the radiologist, and tumors are easily overlooked in the early stages of breast cancer because the clinical indications vary in appearance [4]. Screening mammograms is also a repetitive task that causes fatigue and eye strain: of every thousand cases analyzed by a radiologist, only 3–4 are cancerous, so an abnormality may be overlooked [5]. It has been observed that between 60 and 90% of the biopsies performed on cases that radiologists suspect to be cancerous are later found to be benign, which exposes the biopsied women to needless fear and anxiety [6].

To support radiologists in visually screening mammograms and to help avoid misdiagnosis, computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems have been proposed for analyzing digital mammograms, enabled by rapid advances in digital imaging, computer vision, pattern recognition, and machine learning [7–9]. CADe systems highlight, or cue, the locations of suspicious micro-calcification clusters and masses, while CADx systems classify masses as malignant or benign. Many methods using a wide variety of algorithms have been proposed in the literature to assist radiologists in interpreting mammograms accurately, both for detecting suspicious micro-calcification clusters and breast masses, which are often hidden in dense breast tissue, and for classifying lesions as benign or malignant [10–15]. Previous studies have shown that using CAD improves radiologists' efficiency in searching for and detecting micro-calcification clusters and helps them detect more cancers associated with malignant micro-calcifications [10]. However, current CAD has little or no impact in helping radiologists detect more of the subtle cancers associated with mass-like abnormalities, because mass detection performance is relatively low: masses vary widely in shape and size and are often indistinguishable from surrounding tissue [15].

Also, most research efforts in this domain have focused on the problem of cancer detection, in which the likelihood of malignancy is computed using feature extraction and classification schemes [15–19]. These systems are non-interactive, and the prediction is merely a cue for the radiologist with no explanation of the reasoning behind it (the "black-box" approach), while the final decision regarding the likely presence of a malignant mass is left entirely to the radiologist. Hence, the clinical benefit of current commercially available CAD systems is still under debate and evaluation.

In the last several years, CAD schemes that use a content-based image retrieval (CBIR) approach to search for clinically relevant and visually similar mammograms (or regions) depicting suspicious lesions have also attracted research interest [20–22]. CBIR-based CAD schemes have the potential to provide radiologists with a "visual aid" and to increase their confidence in accepting CAD-cued results during decision making. Furthermore, CBIR would be a useful aid in training students, residents, and less experienced radiologists, since it would let them view images of lesions that appear similar but may have different pathology, and help them see how the pattern in the current case resembles a pattern in cases previously proven to be non-cancerous, thereby improving specificity. Although preliminary studies suggest that CBIR might improve radiologists' performance and/or increase their confidence in decision making, the technology is still at an early stage of development and lacks benchmark evaluation on ground-truth datasets. Despite great research interest and the significant progress made in the last several years, CBIR approaches for breast cancer detection that are routinely accepted and applied in clinical practice are not yet a reality.

12 Medical Imaging and Image-Guided Interventions

To address the limitations of current CAD in detecting mass-like abnormalities, and motivated by the ongoing success of CBIR in providing clinical decision support for medical images of different modalities, we propose an integrated and interactive retrieval system. The system responds to image-based visual queries of an automatically segmented suspicious mass region by displaying mammograms of relevant masses from past cases that are similar to the queried region, and by predicting the image category (e.g., malignant, benign, or normal). The performance and reliability of such a CBIR system depend on a number of factors, including the optimization of lesion segmentation, feature extraction, classification, and similarity measures, the relationship between clinical relevance and visual similarity, and the quality and size of the database.
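The query-and-predict behavior described above can be illustrated with a minimal sketch. It assumes that each past case is already represented by a fixed-length feature vector and a category label; the function name `retrieve_similar`, the Euclidean distance, and the majority vote are illustrative assumptions rather than the similarity measure or classifier used in this chapter.

```python
from collections import Counter

import numpy as np


def retrieve_similar(query, database, labels, k=3):
    """Return the indices of the k most similar stored cases and a voted label.

    query    : 1-D feature vector of the queried mass region
    database : 2-D array, one row of features per past case
    labels   : category of each past case (e.g. "benign", "malignant", "normal")
    """
    # Euclidean distance from the query to every stored feature vector
    # (an assumed similarity measure, chosen only for illustration).
    distances = np.linalg.norm(database - query, axis=1)
    # Indices of the k smallest distances, closest case first.
    nearest = np.argsort(distances, kind="stable")[:k]
    # Predict the category by majority vote over the retrieved cases.
    votes = Counter(labels[i] for i in nearest)
    predicted = votes.most_common(1)[0][0]
    return nearest, predicted


if __name__ == "__main__":
    # Toy two-dimensional features; real features would come from the masses.
    database = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1], [0.0, 0.2]])
    labels = ["benign", "benign", "malignant", "malignant", "normal"]
    nearest, predicted = retrieve_similar(np.array([0.02, 0.0]), database, labels)
    print(nearest, predicted)
```

Returning the retrieved indices alongside the predicted category mirrors the interactive idea in the text: the radiologist sees the similar past cases that support the suggested category instead of receiving only a label.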

The most challenging problem in this task is separating the mass from the background and extracting discriminative local features of clinical importance. A Graph-Based Visual Saliency (GBVS) method [23] is used to segment the regions of interest (ROIs) as breast masses. To extract invariant features from the masses, the Non-Subsampled Contourlet Transform (NSCT) [24] is used because of its greater image representation capability compared with wavelets and the contourlet transform. Furthermore, to distinguish normal from abnormal tissue, eig(Hess) HOG features [25], which are based on computing the eigenvalues of the Hessian matrix within a histogram of oriented gradients, are extracted from the masses in mammographic images together with several geometric features. HOG is known in the literature as a keypoint descriptor that expresses the local statistics of the gradient orientations around a keypoint [46]. The HOG feature can express object appearance because the histogramming provides translational invariance and the gradient orientations are robust to lighting changes. It is also useful for the classification and retrieval of textured breast masses with different shapes.
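As a rough illustration of the two ingredients named above, the following sketch computes a magnitude-weighted histogram of gradient orientations for an image patch and the per-pixel eigenvalues of its Hessian matrix. It is not the eig(Hess) HOG descriptor of [25], whose details are not given in this section; the function names, the 9-bin unsigned histogram, and the finite-difference derivatives are assumptions made only for the example.

```python
import numpy as np


def orientation_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0 to 180 degrees)."""
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    # Fold orientations into [0, 180) so opposite directions share a bin.
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0), weights=magnitude)
    total = hist.sum()
    # Normalize so the descriptor does not depend on overall contrast.
    return hist / total if total > 0 else hist


def hessian_eigenvalues(patch):
    """Per-pixel eigenvalues of the 2x2 Hessian, estimated by finite differences."""
    gy, gx = np.gradient(patch.astype(float))
    gyy, _ = np.gradient(gy)
    gxy, gxx = np.gradient(gx)
    # Closed-form eigenvalues of the symmetric matrix [[gxx, gxy], [gxy, gyy]].
    mean = 0.5 * (gxx + gyy)
    root = np.sqrt(0.25 * (gxx - gyy) ** 2 + gxy ** 2)
    return mean + root, mean - root


if __name__ == "__main__":
    cols = np.tile(np.arange(9.0), (9, 1))   # intensity grows from left to right
    print(orientation_histogram(cols))        # all weight falls in the first bin
    lam1, lam2 = hessian_eigenvalues(cols ** 2)
    print(lam1[4, 4], lam2[4, 4])             # curvature along x only
```

Because the histogram discards where inside the patch each gradient occurs, and because adding a constant brightness offset leaves the gradients unchanged, the sketch shows in miniature the translational and lighting robustness that the paragraph attributes to HOG.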

For classification, a two-class separation (normal versus abnormal) and a three-class study (normal, benign, and malignant cases) are carried out on the individual and combined input feature spaces using Support Vector Machines (SVM) and Extreme Learning Machines (ELM) with 10-fold cross-validation. For retrieval, performance is evaluated and compared across different feature spaces on the benchmark DDSM dataset [26] using precision and recall curves obtained by comparing the query and retrieved images. The system performance is also compared with other state-of-the-art algorithms, and the experimental results indicate that the framework achieves a noticeable increase in recognition rates.

**Figure 1** shows the dataflow diagram of the proposed integrated decision support system, which comprises image pre-processing, mass segmentation, feature extraction, classification, and retrieval.

**Figure 1.** Dataflow diagram of the integrated decision support system (DSS).

The rest of the paper is organized as follows. Section 2 describes the related work, with particular attention to CAD-based CBIR systems for mammographic mass retrieval. Mass detection based on marker-controlled watershed segmentation and feature extraction are described in Sections 3 and 4, respectively. Our classification and similarity matching methods are described in Section 5, and the experimental results are discussed in Section 6. The last section presents the conclusions.

and that was one of the earliest studies on CBIR for mammograms. Linear discriminant analysis, logistic regression, and the Mahalanobis distance were used to evaluate the features for classifying the masses. Kinoshita et al. [22] used breast density to retrieve images from a mammogram dataset available at the Clinical Hospital of the University of São Paulo at Ribeirão Preto, Brazil. Shape, texture features, moments, the Radon transform, and histograms were used to describe breast masses, and the Kohonen self-organizing map (SOM) neural network was used for image retrieval. Wang et al. [27] used histograms to characterize breast masses in a mammogram database at the Medical Center of Pittsburgh in order to evaluate breast masses automatically. They obtained a correct classification rate of 71% using a neural network. Muramatsu et al. [28] proposed a psychophysical similarity measure based on neural networks for evaluating similar images with mammographic masses. The major drawback is that a large amount of data is required to train an artificial neural network (ANN). Oliveira et al. proposed a CBIR system called MammoSys; the novelty of this study is a two-dimensional principal component analysis (2DPCA) method [29] for describing mass texture, which also performs dimensionality reduction. Wei et al. [30] proposed an adaptive SVM classification scheme assisted by content-based image retrieval to improve classification accuracy in computer-aided diagnosis of breast cancer. A CBIR scheme is proposed in [31] that uses SVMs capable of optimally exploiting the distribution of input samples in the feature space on the basis of the BI-RADS classifications of masses made by radiologists. In an article by Zhang [20], a number of CBIR-based CAD schemes for mammograms were compared and their performance was assessed; it was concluded that much research work is needed before CBIR-based CAD schemes can be accepted in clinical practice.

**3. Breast mass detection**

The most challenging aspect of developing any CAD-based system for mammograms is segmenting the suspicious masses, which are often hidden in dense breast tissue. Since a cancerous region might typically be represented by locally oriented patterns, segmenting it accurately is an important first step for the effective performance of the subsequent feature extraction, similarity matching, and classification steps of a CAD system, as shown in **Figure 1**. A large number of segmentation methods have been proposed in the literature for the detection of breast masses, such as adaptive region growth [32], a multi-layer topographic region growth algorithm [33], active contour (snake) modeling [34], a level set algorithm [35], and dynamic programming [36]. However, because benchmark evaluation and testing datasets for comparing performance are limited, it has so far been difficult to identify the most robust and effective method in this domain.

**3.1. Visual saliency based segmentation**

The breast anatomy has a complicated structure because of the presence of pectoral muscles and the different mass density. Although it is easy to analyze breast tissues without getting
