2.1.1. Database: mammogram images

The development of CAD/CADx systems requires mammograms, obtained either from clinical studies or from public databases, to build and validate them. However, few public databases are available worldwide to the scientific community. As mentioned before, there are currently no public BC databases in Latin America or Mexico for conducting this kind of study. Therefore, in the first stage, different public mammography databases were used to develop and validate digital image processing algorithms capable of selecting ROIs from mammograms and extracting image features, which were then used to train a GRANN capable of diagnosing BC as an aid for radiologists.

Currently, there are three main public databases for this type of research. The first is the Digital Database for Screening Mammography (DDSM), which contains 2620 cases (695 normal, 1011 benign and 914 malignant) and includes the two standard views, craniocaudal (CC) and mediolateral oblique (MLO). The second is the Mammographic Image Analysis Society Digital Mammogram Database (mini-MIAS), which contains 322 mammograms, all in the MLO view only. DDSM was created in the United States and mini-MIAS in the United Kingdom.
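As a quick consistency check on the DDSM class counts, the three classes should add up to the 2620 reported cases; this holds with 695 normal, 1011 benign and 914 malignant cases. The short sketch below verifies the total and computes each class's share, which is useful when deciding whether classes need rebalancing before training a classifier. The variable names are illustrative only.

```python
# DDSM case counts per class (normal + benign + malignant = 2620).
ddsm_counts = {"normal": 695, "benign": 1011, "malignant": 914}

total = sum(ddsm_counts.values())
assert total == 2620  # must match the total number of DDSM cases

# Share of each class in the whole database.
proportions = {label: n / total for label, n in ddsm_counts.items()}

for label, p in proportions.items():
    print(f"{label:>9}: {p:.1%}")
```

The three classes are roughly balanced, with normal cases the smallest group.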

The third, and most recent, is the Breast Cancer Digital Repository (BCDR) from Europe. The creation of BCDR was supported by the IMED project (Development of Algorithms for Medical Image Analysis), carried out from 2009 to 2013 by INEGI and FMUP-CHSJ (University of Porto, Portugal) and CIEMAT (Spain). This database includes 724 patients (723 women and 1 man) aged between 27 and 92 years.

In the first stage of this research, the mini-MIAS, DDSM, and BCDR databases were used to develop and validate a biomarker and an artificial neural network approach with incremental learning and, with both, to design a general-purpose CADx methodology. However, it is important to highlight that these databases consist of patients with the ethnic characteristics typical of their regions, which may limit how well the resulting models transfer to populations in other countries with different characteristics.

Moreover, as the scientific community has shown, BC varies widely across etiologies, so systems developed for one population may not perform as intended in a different one. Differences in diet, customs, and lifestyle widen this gap further. For this reason, biomarkers and CAD systems intended for the Mexican population must be adapted to its characteristics. In the second and third stages of this research, the designed methodology will be focused and refined for use with Mexican patients.

In this work, the results obtained with the BCDR database are presented. For each mammogram, BCDR provides the following information: the patient's sex (male or female); a segmentation of the mammogram, in which the ROI containing the lesion identified by the radiologist is outlined in red pixels; the patient ID; the patient's age; breast density according to the Breast Imaging Reporting and Data System (BI-RADS) standard, expressed as the percentage of glandular and fibrous tissue; breast location, i.e., the breast in which the ROI with the lesion is located; the mammographic finding, i.e., the type of lesion identified by the expert reader; the biopsy result, i.e., the anatomopathological findings of the biopsy; the categorization of the definitive diagnosis; the BI-RADS classification of the lesion; and, finally, intensity and shape descriptors of the ROI. It is important to note, however, that these descriptors were not used to train the neural network in this research. Instead, a set of algorithms was designed to extract image descriptors from the mammogram ROIs, as described in a later section.
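Because BCDR marks each lesion ROI with a red outline, a natural first step is to locate those red pixels and crop the enclosed region before computing descriptors. The sketch below is a minimal, hypothetical illustration, not the algorithm used in this work: it assumes the segmentation is available as an RGB overlay of the same size as the grayscale mammogram, that the outline is (near) pure red within a tolerance `tol`, and it uses plain Python lists so it stays self-contained. The helper names `is_marker_red`, `roi_bounding_box`, `crop_gray` and `mean_std` are invented for illustration, and the mean and standard deviation are shown only as examples of simple intensity descriptors.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), 8-bit channels

def is_marker_red(p: Pixel, tol: int = 30) -> bool:
    """True if the pixel is close to pure red (assumed ROI marker colour)."""
    r, g, b = p
    return r >= 255 - tol and g <= tol and b <= tol

def roi_bounding_box(overlay: List[List[Pixel]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (row_min, row_max, col_min, col_max) enclosing every red
    marker pixel, or None if the overlay contains no marker."""
    rows = [i for i, row in enumerate(overlay) for p in row if is_marker_red(p)]
    cols = [j for row in overlay for j, p in enumerate(row) if is_marker_red(p)]
    if not rows:
        return None
    return min(rows), max(rows), min(cols), max(cols)

def crop_gray(gray: List[List[int]], box: Tuple[int, int, int, int]) -> List[List[int]]:
    """Crop the grayscale mammogram to the (inclusive) bounding box."""
    r0, r1, c0, c1 = box
    return [row[c0:c1 + 1] for row in gray[r0:r1 + 1]]

def mean_std(values: List[int]) -> Tuple[float, float]:
    """Mean and population standard deviation of a non-empty list."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, var ** 0.5

if __name__ == "__main__":
    # Toy 5x5 overlay: a red square outline on a black background.
    black, red = (0, 0, 0), (255, 0, 0)
    overlay = [[black] * 5 for _ in range(5)]
    for k in range(1, 4):
        for i, j in [(1, k), (3, k), (k, 1), (k, 3)]:
            overlay[i][j] = red
    gray = [[10 * (i + j) for j in range(5)] for i in range(5)]
    box = roi_bounding_box(overlay)
    roi = crop_gray(gray, box)
    print(box, mean_std([v for row in roi for v in row]))
```

For full-size mammograms, an array library such as NumPy would be the practical choice; the bounding box is also only a coarse ROI, and filling the outline to obtain an exact lesion mask would be a natural refinement.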
