### **2.1 Data collection**

The first phase involved collecting data from Global Hospital, Perumbakkam, Chennai. Over a span of 3 weeks, images were collected from the biopsies of 3 patients. The cancerous images obtained during this phase fall into three types: well differentiated, moderately differentiated and poorly differentiated. The total number of images collected is 687; the breakdown is given in **Table 1**.

Sample images from the collected dataset are shown in **Figures 1**–**4**.

**Figure 1.** *Non-cancerous image.*

**Figure 2.** *Well-differentiated cancer.*

**Figure 3.** *Moderately differentiated cancer.*

**Figure 4.** *Poorly differentiated cancer.*

### **2.2 Color normalization**

The features of the nuclei include texture, size and roundness. Applying a stain to these biopsies causes the nuclei to be highlighted because they absorb the stain. However, the color difference between the nuclei and the surrounding tissue may still be small and hard to distinguish visually. Hence, color normalization is performed to highlight the nuclei, which makes it easier to extract features from them.

*Classification of Hepatocellular Carcinoma Using Machine Learning DOI: http://dx.doi.org/10.5772/intechopen.99841*

**Figure 5.** *Normalized non-cancerous image.*

**Figure 6.** *(a), (b) Normalized cancerous images.*

The normalization method [3] applies only to images stained with hematoxylin and eosin (H&E). Normalized images are shown in **Figures 5** and **6**.
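
The method of [3] is not reproduced here. As a rough illustration of what color normalization does to a stained image, the sketch below uses a simpler, generic technique instead: each RGB channel is converted to optical density (following the Beer-Lambert relation between absorbed stain and transmitted light), and the per-channel mean and standard deviation of a source image are matched to those of a chosen reference image. The function names `to_optical_density`, `from_optical_density` and `match_statistics` are hypothetical, and the background intensity of 255 is an assumption for 8-bit images.

```python
import numpy as np


def to_optical_density(rgb, background=255.0):
    """Convert 8-bit RGB intensities to optical density (Beer-Lambert)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Clip to at least 1 so the logarithm stays finite for black pixels.
    return -np.log(np.clip(rgb, 1.0, background) / background)


def from_optical_density(od, background=255.0):
    """Convert optical density back to 8-bit RGB intensities."""
    rgb = background * np.exp(-od)
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)


def match_statistics(source_rgb, reference_rgb, eps=1e-8):
    """Match per-channel optical-density mean and std of source to reference.

    Illustrative only: this is a generic statistics-matching scheme,
    not the normalization method cited as [3] in the text.
    Both inputs are H x W x 3 arrays of 8-bit RGB values.
    """
    src = to_optical_density(source_rgb)
    ref = to_optical_density(reference_rgb)

    # Statistics are taken over all pixels, separately for each channel.
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    ref_mean, ref_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))

    # Standardize the source, then rescale to the reference statistics.
    normalized = (src - src_mean) / (src_std + eps) * ref_std + ref_mean

    # Optical density cannot be negative for a non-emitting sample.
    normalized = np.clip(normalized, 0.0, None)
    return from_optical_density(normalized)
```

In use, one well-stained image would be picked as the reference and every other image passed through `match_statistics(image, reference)`, so that stain intensity varies less from slide to slide before nuclear features are extracted.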
