the following sequence: extracting teacher images, training the rejection model, and testing the model. Credence can be given to the MCWS algorithm in surmounting the challenges associated with the Chan-Vese algorithm. The Chan-Vese algorithm can be made more autonomous and converge faster by using a good initialization generated by MCWS.

**Table 3.** Quantitative analysis.

| Method | FP rate | SP | AC | NRM | OVERLAP |
| --- | --- | --- | --- | --- | --- |
| Without the rejection model | 0.196 | 0.803 | 0.803 | 0.099 | 0.800 |
| With the rejection model (7 × 7) | 0.058 | 0.941 | 0.938 | 0.031 | 0.933 |
| With the rejection model (9 × 9) | 0.051 | 0.948 | 0.944 | 0.028 | 0.940 |
| With the rejection model (11 × 11) | 0.044 | 0.955 | 0.950 | 0.025 | 0.946 |
| With the rejection model (13 × 13) | 0.040 | 0.959 | 0.954 | 0.023 | 0.950 |

Nevertheless, the reliance of mammogram segmentation on the divergence and convergence of the pixel intensity values is the limiting factor of this algorithm. It tends to segment outlier components as part of the contour, which increases the FP rate of the selected contour pixels. To overcome this issue, the SVM rejection model is geared toward reducing the FP rate. A t-test was performed to compare the means of two samples, namely the accuracy before and after applying the rejection model with the best window size (13 × 13). The hypothesized mean difference was set to 0 (the null hypothesis), that is, the rejection model was assumed to make no difference to the result, and alpha was set to 0.05. If the P value is less than alpha, the null hypothesis is rejected and the means of the two samples are taken to differ. The t-test shows that the improvement of the proposed method is statistically significant (P = 0.00001 < 0.05). Furthermore, the proposed rejection model also showed a lower standard deviation (0.0001), which indicates stable performance. In general, the proposed method offers an alternative decision-making ability and can assist the medical expert by giving a second opinion for more precise nodule detection. Hence, it reduces the FP rate that causes over-segmentation.

**4. Computer aided diagnosis for pathology**

This section focuses on the histopathological grading step in breast cancer diagnosis, in which a tissue sample is graded by examining a biopsy slide that must first undergo a preparation step.

#### **4.1. Tissue preparation**

A breast tissue biopsy is a piece of tumorous tissue taken from the breast to investigate the occurrence of cancer. After the biopsy is extracted, it is placed in a fixative to prevent it from decaying. The tissue is then sectioned into very thin slices (e.g., 2–15 μm) using a microtome, and the slices are arranged on a glass slide before being stained. The tissue is stained with specific dyes to reveal the tissue components (e.g., lumen, nuclei, cytoplasm, and stroma), which helps the pathologist view the individual components more clearly. This procedure is known as cell marking. Pathologists use different staining methods depending on the diagnostic process at hand; among the common stains, the Hematoxylin and Eosin (H&E) combination is the most popular for diagnosis and grading. After staining, the pathologist evaluates the tissue slide either under a microscope, as in UKMMC, or through a digital scanner that produces digital pathology images. In UKMMC, an Olympus BX50 microscope is used for the diagnosis [16]. This microscope has a camera to capture images of the region of interest (ROI). The following subsections give a brief overview of the image acquisition devices and the image acquisition work flow used to create the prostate and breast cancer data sets required for this study.

#### **4.2. Image acquisition devices**

In this study, prostate histological images were captured from tissue slides. All the images were viewed using an Olympus BX50 microscope (Olympus Corporation, Japan) and captured using a DP72 digital camera (Olympus Corporation) with cellSens Life Science imaging software, version 1.6 (Olympus Corporation) [16]. The intensity of the illumination source and the sensitivity of the camera were kept constant. The microscopes were adjusted manually to form clear magnified images, and the cameras were controlled through desktop computers to capture color digital images. Before image acquisition, the pathologists in UKMMC selected the ROIs under the microscope. However, this manual selection requires substantial time and effort from the pathologists and, more importantly, a subjective choice of ROIs could introduce bias into the database and harm the generalizability of the developed CAD system.

#### **4.3. Image acquisition work flow**

Prior to acquiring the images, the microscope components, such as the light condenser, diffusing screen, and objective lens, were cleaned to remove any dust in the light path that might degrade the clarity of the acquired image. The focal plane was adjusted manually and readjusted before every new image was taken. A light condenser was used to increase the light intensity for high-resolution image acquisition. To acquire an image of an ROI, the pathologist in UKMMC first reviewed the tissue section at low magnification (e.g., 1× or 4×) to place the ROI at the center of the field of view [16]. Fine tuning at higher magnification (40×) is usually needed to ensure that the selected ROI contains a region with a typical Gleason pattern. The focal plane was then adjusted to produce a sharp image, and the light intensity was tuned so that the largest pixel value was slightly lower than the upper limit of the pixel's dynamic range. When all these adjustments were satisfactory, a still image was captured and saved onto the desktop computer as a color RGB digital image in TIFF format. This process was repeated for all images captured for the breast pathologists.
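The exposure criterion described in this work flow (largest pixel value slightly below the upper limit of the dynamic range) can be checked programmatically. The following is a minimal sketch, assuming the captured image is available as an integer NumPy array; the function name `exposure_ok` and the 5% headroom are illustrative assumptions and are not taken from the acquisition protocol.

```python
import numpy as np


def exposure_ok(image, headroom=0.05):
    """Check that the brightest pixel is close to, but below, the upper limit.

    `headroom` is a hypothetical tolerance: the fraction of the dynamic range
    below the upper limit in which the maximum pixel value is expected to fall.
    """
    if not np.issubdtype(image.dtype, np.integer):
        raise TypeError("expected an integer image such as uint8 or uint16")
    upper = np.iinfo(image.dtype).max      # e.g., 255 for 8-bit images
    peak = int(image.max())                # brightest value over all channels
    # Bright enough to use the dynamic range, but not clipped at the limit.
    return upper * (1.0 - headroom) <= peak < upper
```

An image whose maximum equals the upper limit is treated as saturated, and an image whose maximum is far below the limit is treated as underexposed; both return `False`.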

#### **4.4. Self-collected data set from UKMMC**

This data set contains self-collected breast tissue region images stained with H&E and captured from tissue slides of needle biopsies taken from 32 breast carcinoma cases. The tissue regions were digitized at 40× magnification, yielding high-resolution images (4140 × 3096 pixels) in TIFF format. The diagnosis assigned to each region image is based on the Bloom–Richardson grading system [16]. Each image was annotated as low grade (Grade 1) or high grade (Grade 3) by three expert pathologists from the HUKM center [16]. The total number of collected images is 100. These can be classified into 56 low-grade cases and 54 high-grade cases. **Figure 6** shows some sample images taken from this data set.

#### **4.5. Ensemble learning of tissue components for histopathology image grading**

This section explains the ensemble framework that we used for the classification of breast cancer and Gleason grading using the tissue components of H&E histopathological region images. This project builds on our previous work [16]. The framework combines the ensemble learning approach from machine learning with the medical tissue components (lumen, nuclei, cytoplasm, and stroma), both of which are semantically meaningful to pathologists. The framework extracts a set of textural features for each tissue component, which creates four independent sub data sets, and the diversity among these data sets is then used to build an ensemble that can classify and grade breast cancer. Our framework consists of five phases: segmentation of the four tissue components, feature extraction, feature selection, base classification, and ensemble fusion, as shown in **Figure 5**.

The typical CAD system for breast cancer grading extracts features directly from the histopathological images and then trains a single classifier on these features to classify unknown patterns (e.g., images). Unlike this typical CAD, our project uses the concept of ensemble learning (**Figures 5** and **6**).

**Figure 5.** Ensemble framework for breast tissue image diagnosis and grading.

**Figure 6.** Two types of tissue classes of interest for the breast grading problem: (a) Grade 1 (low grade) tissue and (b) Grade 3 (high grade) tissue.

Due to the diversity of the tissue components, four different training data sets are created, one for each tissue component (lumen, nuclei, cytoplasm, and stroma). This diversity is utilized in ensemble learning to improve prostate diagnosis and grading. In this project, the ensemble framework consists of four base SVM (RBF) classifiers, where each base classifier is a specialist trained on the selected features of a particular tissue component. The decision function of the SVM (RBF) with the top selected features (Ω) in the training model is defined in Eq. (7):

$$f(\mathbf{x}_{\Omega}) = \text{sgn}\left\{ \langle \mathbf{w}, \Phi(\mathbf{x}_{\Omega}) \rangle + b \right\} = \text{sgn}\left\{ \sum_{i=1}^{n} \alpha_i y_i\, k(\mathbf{x}_{\Omega,i}, \mathbf{x}_{\Omega}) + b \right\},\tag{7}$$

where $\mathbf{x}_{\Omega}$ is the test sample with only the Ω corresponding features, $\mathbf{x}_{\Omega,i}$ is the $i$th sample in the training set ($i = 1, 2, \ldots, n$) with only the Ω features, $y_i \in \{1, 0\}$ (low grade or high grade) is the class label of the training sample $\mathbf{x}_{\Omega,i}$, $\alpha_i$ and $b$ are the coefficients and bias learned during training, and $k$ is the kernel function used to calculate the inner product between $\Phi(\mathbf{x}_{\Omega,i})$ and $\Phi(\mathbf{x}_{\Omega})$ in the space transformed by the nonlinear mapping Φ. The product rule (Eq. (8)) is used to combine the prediction outputs of all four base classifiers into the final decision of the proposed ensemble framework. The product rule is preferred in an ensemble when the posterior probabilities of the single classifiers are correctly estimated [16]. The final prediction $\text{class}(\mathbf{x})$ for the test image $\mathbf{x}$ based on the product rule is computed using Eq. (8):

$$\text{class}(\mathbf{x}) = \arg\max_{i \in \{1, \ldots, c\}} \prod_{t=1}^{T} p_t^{i}(\mathbf{x}),\tag{8}$$

where $p_t^{i}(\mathbf{x})$ is the posterior probability of class $i$ estimated by base classifier $t$, $c$ is the number of classes, and $T = 4$ is the number of base classifiers.
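The decision function of Eq. (7) and the product rule of Eq. (8) can be illustrated with a short NumPy sketch. This is a minimal sketch under stated assumptions, not the chapter's LibSVM implementation: the function names (`rbf_kernel`, `svm_decision`, `product_rule`), the +1/−1 label coding used inside the sign function, and all numeric values are illustrative.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def svm_decision(x, support_vectors, alphas, labels, bias, gamma):
    """Eq. (7): sign of sum_i alpha_i * y_i * k(x_i, x) + b.

    Labels are coded as +1 / -1 here (an assumption of this sketch)
    so that the sign of the score is meaningful.
    """
    score = bias
    for sv, alpha, y in zip(support_vectors, alphas, labels):
        score += alpha * y * rbf_kernel(sv, x, gamma)
    return 1 if score >= 0 else -1


def product_rule(posteriors):
    """Eq. (8): multiply class posteriors over classifiers and take the argmax.

    `posteriors` has shape (n_classifiers, n_classes).
    """
    fused = np.prod(np.asarray(posteriors, dtype=float), axis=0)
    return int(np.argmax(fused)), fused


# Toy posteriors from four hypothetical base classifiers
# (lumen, nuclei, cytoplasm, stroma); columns are [low grade, high grade].
toy_posteriors = [
    [0.60, 0.40],
    [0.30, 0.70],
    [0.20, 0.80],
    [0.55, 0.45],
]
predicted_class, fused_scores = product_rule(toy_posteriors)
```

In this toy input, two classifiers favor each grade, so a simple majority vote would tie. The product rule still selects class index 1, because the two confident classifiers dominate the product. This also shows why the rule depends on well-estimated posteriors: a single near-zero probability can veto a class.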

#### **4.6. Results and evaluation**

In the ensemble framework, the feature selection and classification stages are executed 50 times for each classification task. In each run, the data set of each base classifier (i.e., tissue component) is randomly divided into 50% training and 50% testing after normalization, as per [16]. In each run of the ensemble framework, a similar number of selected features is used with all base classifiers. The base classifiers use the SVM with a radial basis function (RBF) kernel, while SVM-RFE uses the linear SVM. To deploy the RBF kernel, one needs to set appropriate values for the cost penalty, *C*, and the kernel parameter, *γ*. Grid search is one of the most common methods for identifying suitable values of *C* and *γ* [1, 16]. The SVM is implemented with the LibSVM toolbox [1, 16], and *C* and *γ* are estimated using a grid search with internal threefold cross-validation on the training data set only, over the range $2^{-20}$ to $2^{20}$. This data set addresses the low vs. high grade classification task, which is the most well-known task in state-of-the-art breast cancer analyses [1]. The results obtained on this data set are shown in **Table 4**. As shown in **Table 4**, the proposed ensemble framework can effectively classify low vs. high grade breast images. The AUC for low vs. high grade reached an average of 90.7%, which was greater than that of both the naïve and the typical CAD. Moreover, the proposed method was far superior to the structure-based method: with the proposed ensemble CAD, classification performance in terms of AUC improved by 15% compared with the structure-based method. The results in **Figure 7** show that the ensemble framework was more accurate (90.8%) than each individual tissue component in the low vs. high grade task on breast histopathology images. This framework has also been validated using prostate and colon data sets. The results showed that the ensemble framework can be used with other types of histopathology images if the main tissue components are visible in the image [7].

**Figure 7.** Single vs. ensemble classification results for low vs. high grade.

**Table 4.** The performance of the proposed ensemble framework on breast histopathology images data set.

**5. Discussion and conclusion**

This chapter discusses how machine learning, particularly SVM, can improve the detection and diagnosis of breast cancer. SVM is currently one of the most powerful machine learning techniques for modeling the human understanding of classifying data. It can find the relationships in the data and segregate them accordingly. Using pixel values in mammogram images, SVM helps to improve the mass detection and segmentation of the Chan-Vese algorithm by correctly classifying the false positive pixels. As a result, a sharper mass is detected with a better estimate of its shape and size. Hence, the radiologist can give a better diagnosis and biopsy location. Then, images of the cell structure or tissue texture from the biopsy sample are examined by the pathologist. These pathology slides are analyzed under the pathologist's sharp eyes to locate and identify any abnormal pattern of tissue texture or architecture. The process is tiring and subject to the pathologist's experience in interpreting the tissue condition; thus, inter-observer and intra-observer variations exist. However, the proposed SVM algorithm can identify the different tissue components and model the pattern of relationships between these components spatially and statistically. The model is then used to grade any new pathology slide according to the modified Bloom-Richardson grading, based on what the SVMs have learned from previous examples. The technique helps radiologists and pathologists reduce their workload by automating decision making, especially for common and mundane cases, so that they have more time to spend on special or rare cases. The learning curve for young apprentice can
