**1. Introduction**

Breast cancer is one of the most dangerous and common reproductive cancers, and it affects mostly women. The oldest documented cases of breast cancer were recorded in Egypt in 3000 BC [1]. A breast tumor is an abnormal growth of tissue in the breast; it may present as a lump, nipple discharge, or a change in skin texture around the nipple. Cancers consist of abnormal cells that divide uncontrollably and are able to invade other tissues. Cancer cells can spread to other parts of the body through the blood and lymphatic systems [1]. Breast cancer is the leading cause of death among middle-aged and older women [1]. According to cancer statistics, it is the second most common cause of cancer death among women, surpassed only by lung cancer [1]. About 1 in 36 women (roughly 3%) dies of breast cancer [2]. It has become a major health issue over the past 50 years, and its incidence has increased in recent years [1]. In Malaysia, breast cancer is the most frequent type of cancer among women: it accounts for about 26% of cancers affecting women (more than 4400 women), and around 40% of the women diagnosed with breast cancer in Malaysia have died (IARC). Hence, reaching the right decision from a correct diagnosis is crucial.


Machine Learning Methods for Breast Cancer Diagnostic http://dx.doi.org/10.5772/intechopen.79446 59

ML comprises a broad class of statistical analysis algorithms that iteratively improve in response to training data to build models for autonomous prediction. In other words, the performance of a computer program improves automatically with experience [9]. The aim of an ML algorithm is to develop a mathematical model that fits the data. There are two main types of learning: supervised and unsupervised. A supervised learning algorithm requires labeled data for training. For example, when training on a set of medical images to identify a specific breast tumor type, the label would be the tumor's pathologic result or genomic information. These labels, also known as ground truth, can be as specific or as general as needed to answer the question. The ML algorithm is exposed to enough labeled data to arrive at a model designed to answer the question of interest. Because a large number of well-labeled images is required to train models, curating these data sets is often laborious and expensive [10]. In unsupervised ML, unlabeled data are exposed to the algorithm with the goal of generating labels that meaningfully organize the data. This is typically done by identifying useful clusters of data with similar characteristics along one or more dimensions. Compared with supervised techniques, unsupervised learning sometimes requires much larger training data sets. Unsupervised learning is useful for identifying meaningful cluster labels that can then be used in supervised training to develop a useful ML algorithm. This blend of supervised and unsupervised learning is known as semi-supervised learning.

ML algorithms analyze a data set to extract a data-driven model, prediction rule, or decision rule. Generally, for an ML system to behave intelligently without human intervention, it learns or extracts knowledge, such as rules or patterns, from a collection of input data or past experience. The first step is therefore for the system to acquire features from the data. Elaboration of features is well explained in
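The supervised, unsupervised, and blended settings described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in this chapter: the two-dimensional feature vectors, the labels, the nearest-centroid classifier, the k-means clustering routine, and all function names are hypothetical choices made only to keep the example small and dependency-free.

```python
from statistics import mean

# Hypothetical 2-D feature vectors (e.g. two measurements per lesion).
# All values and labels are invented for illustration only.
LABELED = [
    ((1.0, 1.2), "benign"),
    ((0.8, 1.0), "benign"),
    ((1.1, 0.9), "benign"),
    ((3.0, 3.2), "malignant"),
    ((3.3, 2.9), "malignant"),
    ((2.9, 3.1), "malignant"),
]


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(mean(axis) for axis in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_nearest_centroid(samples):
    """Supervised: use the ground-truth labels to build one centroid per class."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(model, key=lambda label: sq_dist(model[label], features))


def kmeans(points, initial_centers, iterations=10):
    """Unsupervised: group points into clusters without ever seeing labels."""
    centers = list(initial_centers)
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest current center.
        assignment = [
            min(range(len(centers)), key=lambda i: sq_dist(centers[i], p))
            for p in points
        ]
        # Move each center to the mean of its members (keep it if empty).
        for i in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centers[i] = centroid(members)
    return assignment, centers


# Supervised learning: train on labeled data, then classify a new sample.
model = fit_nearest_centroid(LABELED)
supervised_prediction = predict(model, (3.1, 3.0))

# Unsupervised learning: the same features with the labels stripped off.
unlabeled = [features for features, _ in LABELED]
clusters, centers = kmeans(
    unlabeled, initial_centers=[unlabeled[0], unlabeled[-1]]
)

# Blend: treat the cluster ids as generated labels and train the
# supervised model on them.
pseudo_labeled = [(p, f"cluster_{c}") for p, c in zip(unlabeled, clusters)]
pseudo_model = fit_nearest_centroid(pseudo_labeled)
blended_prediction = predict(pseudo_model, (3.1, 3.0))

print(supervised_prediction, clusters, blended_prediction)
```

The supervised model needs the ground-truth labels up front, whereas the clustering step only receives the feature vectors and produces group ids that still have to be interpreted by a person. Real image data would first require a feature-extraction step, which this sketch assumes has already been done.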

**2. Machine learning**

The advent of personalized medicine has increased the workload and complexity that doctors face in cancer diagnosis. Radiology and pathology are the key players in cancer diagnosis decisions: the results of the radiology diagnosis are submitted to pathology for further diagnosis. Pathology and radiology form the core of cancer diagnosis, yet based on our observation at the hospital we studied, under the current process of diagnostic medicine the communication between them remains on paper, with each department writing its own report on the same patient. This scenario parallels what James et al. [3] highlighted in their paper. The workflows of both specialties remain ad hoc and occur in separate "silos," with no direct linkage between their case accessioning and/or reporting systems, even when both departments belong to the same host institution. Since both radiologists' and pathologists' data are essential for correct diagnoses and appropriate patient management and treatment decisions, the isolation of radiology and pathology workflows can be detrimental to the quality and outcomes of patient care. These detrimental effects underscore the need for pathology and radiology workflow integration and for systems that facilitate the synthesis of all data produced by both specialties. With the enormous technological advances currently occurring in both fields, the opportunity has emerged to develop an integrated diagnostic reporting system that supports both specialties and therefore improves the overall quality of patient care. In this chapter, we focus on breast cancer diagnosis using data collected from UKMMC, for which breast radio-pathological correlation is essential. The topics covered include radio-pathological correlation and recent advances such as machine learning applied to modalities such as mammography and histopathology.

As a standard, current diagnostic screening consists of mammography to identify suspicious regions of the breast, followed by a biopsy of potentially cancerous areas. A breast biopsy is a diagnostic procedure that can determine whether the suspicious area is malignant or benign [4–6]. Although the criteria for diagnostic categories in radiology and pathology are well established, manual detection and grading, respectively, are tedious and subjective processes and thus suffer from inter-observer and intra-observer variation. Early detection via mammography increases breast cancer treatment options and the survival rate. However, mammography is not perfect. Detection of suspicious abnormalities is a repetitive and fatiguing task. For every thousand cases analyzed by a radiologist, only three to four are cancerous, and thus an abnormality may be overlooked. As a result, radiologists fail to detect 10–30% of cancers. Approximately two thirds of these false-negative results are due to missed lesions that are evident retrospectively. Because the appearance of malignant and benign abnormalities overlaps considerably, mammography has a positive predictive value (PPV) of less than 35%, where the PPV is defined as the percentage of lesions subjected to biopsy that were found to be cancer. Thus, a high proportion of biopsies are performed on benign lesions. Avoiding benign biopsies would spare women anxiety, discomfort, and expense [7]. As mentioned earlier, the advent of personalized medicine makes the process more complex. In addition, the emergence of Fourth Industrial Revolution (4IR) technology has allowed huge amounts of data to be captured, which adds to the complexity of the radiology and pathology workload. To address these challenges, many researchers are leveraging artificial intelligence to improve medical diagnostics. Machine learning (ML) is a subdiscipline of artificial intelligence (AI) that explores the study and design of algorithms that can learn from data [8].
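The PPV definition given above, the share of biopsied lesions that turn out to be cancer, amounts to a simple ratio. The short sketch below computes it; the function name and the biopsy counts are hypothetical and chosen only so that the result falls below the 35% figure quoted in the text.

```python
def positive_predictive_value(malignant_biopsies, benign_biopsies):
    """PPV = cancers found / all lesions subjected to biopsy."""
    total_biopsies = malignant_biopsies + benign_biopsies
    if total_biopsies == 0:
        raise ValueError("PPV is undefined when no lesions were biopsied")
    return malignant_biopsies / total_biopsies


# Hypothetical counts for illustration only (not data from this chapter).
malignant = 30   # biopsied lesions confirmed as cancer
benign = 70      # biopsied lesions that turned out to be benign

ppv = positive_predictive_value(malignant, benign)
benign_share = 1 - ppv  # fraction of biopsies performed on benign lesions

print(f"PPV = {ppv:.0%}, benign biopsies = {benign_share:.0%}")
```

With counts like these, most biopsies are performed on benign lesions, which is the burden the paragraph above describes.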
