166 Advanced Applications for Artificial Neural Networks

2.1.1. Database: mammogram images

The development of CAD/CADx systems involves their generation and validation using mammograms obtained from clinical studies or from public databases. However, at the global level, few public databases are available to the scientific community. As mentioned before, there are currently no public databases of BC in Latin America or Mexico for conducting this kind of study. Therefore, in the first stage, different public mammography databases were used for developing and validating digital image processing algorithms capable of selecting ROIs from mammograms and extracting the image features used to train a GRANN capable of diagnosing BC as an aid for radiologists.

Figure 2. First stage of research. Mammographic feature extraction and artificial neural network training.

Currently, there are only three public databases to conduct this type of research. The first one is the Digital Database for Screening Mammography (DDSM), which has a total of 2620 cases distributed in 695 normal, 1011 benign, and 914 malignant, and includes the two standardized views, CC and MLO. Another available database is the Mammographic Image Analysis Society - Digital Mammogram Data Base (mini-MIAS), which has 322 cases; however, it only has the MLO view. DDSM is from North America, whereas mini-MIAS is from the United Kingdom.

A newly created database is the Breast Cancer Digital Repository (BCDR) from Europe. The creation of BCDR is supported by the IMED Project (Development of Algorithms for Medical Image Analysis), which was carried out by INEGI, FMUP-CHSJ University of Porto, Portugal, and CIEMAT, Spain, from 2009 to 2013. This database has 724 patients (723 women and 1 man), aged between 27 and 92 years.

In the first stage of this research, the mini-MIAS, DDSM, and BCDR databases were used to develop and validate a biomarker and an artificial neural network approach with incremental learning and, building on both, to design a general-scope CADx methodology. However, it is important to highlight that these databases are formed by patients with ethnic characteristics typical of their region, which makes it difficult to transfer the knowledge to other countries with their own features.

Moreover, as has been shown by the scientific community, BC varies widely between different etiologies, which suggests that systems created for one population may not work for a different one.

2.1.2. Preprocessing: artifact removal and segmentation process

A mammogram image can be considered a representation of the X-ray radiation density that reflects the tissue of the breast. A risk factor for BC is recognized when a white region appears on the mammogram image, which indicates a high tissue density that may be considered abnormal. A breast abnormality is commonly called an ROI. In DIP, the segmentation of breast abnormalities on mammograms is a crucial step in CAD systems. It is a difficult task, since these medical images have low intensity contrast, which makes it hard to identify the edges of a suspicious mass.

In the methodology used in this research, only lateral mammographic images taken from the BCDR database were used. Every selected image contains a lesion classified as benign or malignant. The mammographic images of the BCDR database are available in two forms: digitized films (photographic films) and digital images taken directly from the X-ray system (digital mammography). Film images require digital image processing algorithms to eliminate artifacts such as red pixels, as well as noise such as the labels used by radiologists to identify the left or right breast and the patient identification information. Conversely, digital mammography images only require algorithms for removing red pixels.

The film approach enhances the digital mammography images by boosting the high frequencies and eliminating noise and unwanted artifacts in the ROI. As shown in Figure 3, a computer tool was designed to automate the preprocessing of film digital mammographic images (FDMIs). All FDMIs are processed to eliminate image artifacts such as background, noise, and image labels. A common threshold was applied to the FDMIs to produce a logical image containing the breast region together with other regions corresponding to the labels and artifacts on the mammography.

Figure 3. Preprocessing of a mammographic image. (a) Original image, (b) breast region, highlighting labels and background noise, (c) clean breast binary image, and (d) mammography image cleaned.

Using the automated computer tool, after the logical image is created, all small regions (fewer than 10,000 pixels) are eliminated, since they are considered unnecessary in FDMIs. The logical image (mask) is then applied to the original image to obtain an image of the breast without artifacts and labels. Figure 3 shows this preprocessing method for removing noise and labels from a digital mammography image.
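The cleaning step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' tool: the function names `label_regions` and `clean_film_image`, the `threshold` parameter (the chapter does not give its value), and the use of 4-connectivity are all hypothetical choices.

```python
from collections import deque

import numpy as np


def label_regions(mask):
    """Label 4-connected white regions of a boolean mask (breadth-first search)."""
    labels = np.zeros(mask.shape, dtype=int)
    rows, cols = mask.shape
    current = 0
    for r0, c0 in zip(*np.nonzero(mask)):
        if labels[r0, c0]:
            continue  # pixel already belongs to a labeled region
        current += 1
        labels[r0, c0] = current
        queue = deque([(r0, c0)])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < rows and 0 <= nc < cols
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = current
                    queue.append((nr, nc))
    return labels, current


def clean_film_image(gray, threshold, min_area=10_000):
    """Threshold the image, drop regions below min_area pixels, apply the mask."""
    mask = gray > threshold              # assumed thresholding rule
    labels, count = label_regions(mask)
    areas = np.bincount(labels.ravel(), minlength=count + 1)
    keep = areas >= min_area
    keep[0] = False                      # label 0 is the background
    clean_mask = keep[labels]
    return clean_mask, np.where(clean_mask, gray, 0)
```

On a full-resolution mammogram, a library routine for connected-component labeling would be much faster than this pure-Python search; the sketch only makes the logic of the step explicit.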

Converting a grayscale image to a binary (logical) image is a common task in digital image processing, and there are many methods for calculating the threshold value. As shown in Figure 4, in this research the logical image was obtained by setting the value of every nonzero pixel to 1. To create a logical image that contains the ROI and the pectoral muscle, the gray tones were converted to white.

To remove the pectoral region from the logical image, as shown in Figure 4, the white region connected to the border of the binary image was eliminated. The remaining white region therefore represents the ROI detected in the mammography image. With a cleaned image, the next step is the segmentation process.
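As a rough illustration of the binarization and border-clearing steps, the following Python sketch sets every nonzero pixel to white and then removes any white region that touches the image border. The function name `clear_border` and the choice of 4-connectivity are assumptions made for this example; the chapter does not describe the implementation.

```python
from collections import deque

import numpy as np


def clear_border(mask):
    """Remove every 4-connected white region that touches the image border."""
    rows, cols = mask.shape
    result = mask.copy()
    # Seed the search with all white pixels lying on the four borders.
    border = np.zeros_like(mask, dtype=bool)
    border[0, :] = border[-1, :] = border[:, 0] = border[:, -1] = True
    queue = deque(zip(*np.nonzero(mask & border)))
    for r, c in queue:
        result[r, c] = False
    # Spread inward, erasing white pixels connected to the seeds.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and result[nr, nc]:
                result[nr, nc] = False
                queue.append((nr, nc))
    return result


# Example: a region touching the border is removed, an interior one is kept.
image = np.zeros((10, 10), dtype=np.uint8)
image[0:3, 0:4] = 90     # stands in for the pectoral region
image[5:8, 5:8] = 180    # stands in for the candidate ROI
binary = image != 0      # every nonzero pixel becomes 1 (white)
roi_mask = clear_border(binary)
```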

The digital mammographic approach, on the other hand, works as follows. To calculate the descriptors, the ROI of the lesion is manually segmented by the expert radiologist, as shown in Figure 5. At the preprocessing stage, the image is prepared for segmentation: the pixels in red are turned black. Using the black pixels, the ROI is separated from the rest of the breast image to produce the segmentation shown in Figure 6.
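Turning the red annotation pixels black could look like the sketch below. The criterion for "red" is not given in the chapter, so the thresholds `min_red` and `max_other` and the function name `red_to_black` are hypothetical values chosen only for illustration.

```python
import numpy as np


def red_to_black(rgb, min_red=200, max_other=80):
    """Return a copy with red-marked pixels set to black, plus the red mask."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    # Assumed rule: strong red channel, weak green and blue channels.
    red = (r >= min_red) & (g <= max_other) & (b <= max_other)
    out = rgb.copy()
    out[red] = 0
    return out, red


# Example: a gray RGB image with a single red annotation pixel.
rgb = np.full((4, 4, 3), 128, dtype=np.uint8)
rgb[1, 1] = (255, 0, 0)
blackened, red_mask = red_to_black(rgb)
```

Because JPEG compression can blur the annotation colour, the thresholds would likely need tuning on real images.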

For the segmentation of the ROI, a binary (logical) image is created with a very high binarization threshold, so that low gray levels become white. This approach keeps most of the gray pixels of the image so that few ROI pixels are lost. Afterward, the white logical region that is connected to the edge of the mammographic image is removed. Some white pixels belonging to the contour of the breast are discarded when the small-area regions of the image are removed. Finally, the white region with the largest number of pixels is extracted and considered to be the neoplasia.

Figure 4. Pectoral region removing process.

Figure 5. Preprocessing stage of digital mammogram approach.

Figure 6. Binary mask and ROI in tones of grays.

Next, a binary mask is created from the ROI obtained in the segmentation stage. Combining the mask with the complete grayscale image yields the ROI in shades of gray, as shown in Figure 6.
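The selection of the largest white region and the recovery of its gray levels can be sketched as follows. The example assumes that the white regions have already been labeled with integers (0 for background) by any connected-component routine; the function name `largest_region_roi` is hypothetical.

```python
import numpy as np


def largest_region_roi(labels, gray):
    """Keep the labeled region with the most pixels and return it in gray levels."""
    areas = np.bincount(labels.ravel())
    areas[0] = 0                         # ignore the background label
    if areas.max() == 0:
        raise ValueError("no labeled region found")
    mask = labels == areas.argmax()      # binary mask of the largest region
    return mask, np.where(mask, gray, 0)


# Example: two labeled regions of 4 and 12 pixels on a uniform gray image.
labels = np.zeros((6, 6), dtype=int)
labels[0:2, 0:2] = 1
labels[3:6, 2:6] = 2
gray = np.full((6, 6), 77, dtype=np.uint8)
mask, roi = largest_region_roi(labels, gray)
```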

The next step in the operation of regular CAD systems is the feature extraction of the ROI. Feature extraction can be defined as the process of inferring and quantifying the parameters that characterize the object being studied, and it contributes to the analysis of the ROI. It is possible to quantify the shape, texture, size, border, and other tissue parameters that can contribute to the diagnosis and detection of a cancer risk factor. As shown in Figure 7, in this work, shape, intensity, and texture features were extracted in order to create a biomarker for BCD using a CADx system based on AI technology.
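To make the idea of quantifying an ROI concrete, the sketch below computes a few generic descriptors in Python. The chapter names only the feature categories (shape, intensity, texture), so the specific descriptors chosen here (area, boundary-pixel perimeter, mean, standard deviation, histogram entropy) and the function name `describe_roi` are assumptions for illustration, not the features actually used to build the biomarker.

```python
import numpy as np


def describe_roi(gray, mask):
    """Compute a few illustrative shape, intensity, and texture descriptors."""
    mask = mask.astype(bool)
    pixels = gray[mask].astype(float)
    # Shape: area, and perimeter as the number of ROI pixels that have
    # at least one background 4-neighbor.
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    area = int(mask.sum())
    perimeter = int((mask & ~interior).sum())
    # Intensity: first-order statistics of the gray levels inside the ROI.
    mean = float(pixels.mean())
    std = float(pixels.std())
    # Texture: Shannon entropy of the 8-bit gray-level histogram.
    hist = np.bincount(gray[mask].astype(np.uint8), minlength=256)
    p = hist[hist > 0] / hist.sum()
    entropy = float(-(p * np.log2(p)).sum())
    return {"area": area, "perimeter": perimeter,
            "mean": mean, "std": std, "entropy": entropy}


# Example: a uniform 4 x 4 square ROI inside a 6 x 6 image.
gray = np.full((6, 6), 100, dtype=np.uint8)
mask = np.zeros((6, 6), dtype=bool)
mask[1:5, 1:5] = True
features = describe_roi(gray, mask)
```

A vector of such values, one per ROI, is the kind of input that could be passed to a classifier such as an ANN.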

The image features of all Digital Image Mammography (DIM) images of the BCDR database were extracted and used to build a biomarker to train an ANN. The BCDR digital images are available in RGB and gray level, digitized in JPEG format with a depth of 24 and 8 bits per pixel, respectively, and a resolution of 3328 × 4084 pixels. The RGB mammograms show the section marked in red by a radiologist to delimit the anomaly found.

Figure 7. Image features extracted.

The segmentation process uses the section marked in red in the RGB mediolateral oblique view mammograms to obtain the ROI. In the RGB mammograms, all the red pixels and all the pixels outside the original red region were eliminated. Finally, the remaining pixels in the gray-level mammogram were used to obtain the ROI used for the feature calculation.
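One way to keep only the pixels enclosed by the red outline is to flood-fill the image from its border and discard everything the fill reaches. The Python sketch below assumes the red outline is already available as a closed boolean contour; the function name `region_inside_contour` is hypothetical, and this is an illustration rather than the authors' implementation.

```python
from collections import deque

import numpy as np


def region_inside_contour(contour):
    """Return a boolean mask of the pixels enclosed by a closed contour."""
    contour = np.asarray(contour, dtype=bool)
    rows, cols = contour.shape
    free = ~contour
    # Start from the non-contour pixels on the image border.
    border = np.zeros_like(free)
    border[0, :] = border[-1, :] = border[:, 0] = border[:, -1] = True
    outside = free & border
    queue = deque(zip(*np.nonzero(outside)))
    # Flood-fill (4-connectivity) everything reachable without crossing the contour.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and free[nr, nc] and not outside[nr, nc]):
                outside[nr, nc] = True
                queue.append((nr, nc))
    # What is neither contour nor outside is the enclosed region.
    return free & ~outside


# Example: a square outline enclosing a 3 x 3 interior.
contour = np.zeros((7, 7), dtype=bool)
contour[1, 1:6] = contour[5, 1:6] = True
contour[1:6, 1] = contour[1:6, 5] = True
inside = region_inside_contour(contour)
gray = np.full((7, 7), 50, dtype=np.uint8)
roi = np.where(inside, gray, 0)   # gray-level ROI, zero elsewhere
```

If the radiologist's outline is not fully closed, the fill leaks into the interior and the returned mask is empty, so a real tool would need to close small gaps first.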

Using a custom-designed automated computer tool, a total of 361 images (36 with malignant and 325 with benign abnormalities) from 239 patients were segmented in order to obtain the ROIs. This information was used as the input data for training and testing a GRANN. The tool saves considerable time in the preprocessing stage of mammography analysis, and it will be used in the second and third stages of the research to analyze the mammograms of Mexican patients in order to create a Mexican biomarker and a CADx system for BCD.

Feature extraction from digital images such as digital mammograms, as shown in Figure 7, is a way to represent an element or an ROI in an image, much like a fingerprint. These features are used in many research areas, such as machine learning, pattern recognition, image processing, and disease diagnosis in medical science. Feature extraction is a crucial task before classifying an ROI or a pattern in an image.
