98 Hematology - Latest Research and Clinical Advances

As a result, histology and cytology are major diagnostic tools; however, their current prognostic potential is limited, as the majority of genetic events do not have known, defining morphological characteristics [5, 10]. Thanks to emerging computer technologies, a pathologist's qualitative decision can be supported by an automated quantitative decision tool. Morphometrics of the pathological slides can both provide new diagnostic information not visible to the naked eye and improve the prognostic ability of histological-cytological analyses [15, 16].

3. Digital cytology analysis

Statistical analysis of cells and cellular features can guide a pathologist's diagnosis of leukemia. The number of RBCs, WBCs, and platelets, the proportion of specific immature and mature cells, and more detailed morphological features recognized by automated WSI can all help direct a diagnosis. Digital image analysis of BM biopsy has been applied to study relapse in AML [17, 18]. The focus of most prior studies has been recognition of the acute leukemia FAB subtypes or differentiation of acute and chronic leukemia from the PB and BM smears. While PB and BM smears can differ in the type and maturation of their cells, the quantitative process to recognize the cell types and extract morphometric information is similar. We describe it below:

Following steps a pathologist would take, computer-based digital pathology aims to detect, localize, and recognize the specific cell type under study (Figure 2). An acquired image is preprocessed through image enhancement steps; then, cells are detected, and cell boundaries are traced out using segmentation algorithms for morphometric analysis.

Processing steps to automate the analysis of leukemia smear images are shown in Figure 2. A whole slide image (WSI) of a Wright-Giemsa-stained bone marrow smear is shown in Figure 2a. This is a typical smear image annotated by a pathologist. The Wright-Giemsa-stained image reveals RBCs as smaller pink cells without nuclei (either distributed as single cells or clustered) and a few WBCs of different sizes with dark purple nuclei. Notably, image acquisition techniques, staining methods, and digitization protocols can differ in each laboratory. Furthermore, environmental effects can introduce artifacts and degrade the quality of the image. Image preprocessing steps can improve the image quality and correct for differences in protocols and in illumination. Methods of image correction techniques include illumination

Figure 2. Image processing pipeline (a–e: Image from Carlos Bueso-Ramos, MD, PhD, MD Anderson Cancer Center).

4. Quantification of cytology using machine learning

To computationally classify tissue types from smear images, identified cells and tissues in the images have to be transformed into a vector of features. Conventional machine learning algorithms typically utilize a domain-specific approach to classify cell and tissue types based on a series of handcrafted features. These algorithms extract metrics from images based on a human engineering process that requires domain knowledge [38, 39].
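As a minimal sketch of what turning a segmented cell into a vector of features can look like, the Python snippet below computes per-channel mean and standard deviation for the pixels of one region. The function name `intensity_features` and the pixel values are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
from statistics import mean, pstdev


def intensity_features(pixels):
    """Return [mean_R, mean_G, mean_B, std_R, std_G, std_B] for (R, G, B) pixels."""
    # Regroup the pixel list into three sequences: all R, all G, all B values.
    channels = list(zip(*pixels))
    means = [mean(channel) for channel in channels]
    # Population standard deviation: the region is treated as the full population.
    stds = [pstdev(channel) for channel in channels]
    return means + stds


# Made-up pixels standing in for a segmented, purple-stained nucleus.
nucleus_pixels = [(90, 40, 130), (100, 50, 140), (110, 60, 150)]
feature_vector = intensity_features(nucleus_pixels)
```

Each cell then contributes one fixed-length vector, which is the form a conventional classifier expects as input.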

Features of the smear sample can be extracted from an individual cell in the image or across the entire slide. Once a WBC is segmented within the image, features are extracted either from the whole WBC or separately from the nucleus and cytoplasm. The major discriminating cellular characteristics to classify WBCs are (a) geometric features such as shape (e.g., roundness) and size (e.g., nucleus-cytoplasm size ratio); (b) color features; (c) texture features such as density, granularity, and Fourier descriptors for texture quantification calculated by the two-dimensional Fourier transform; and (d) irregularity or boundary roughness measured by fractal dimension [10, 23, 33, 35, 40–49]. Although the analysis at the single cell level provides useful information, it is not sufficient for the diagnosis of a very heterogeneous disorder such as leukemia. In addition to single cell data, characteristics of multicellular groups need to be studied [1]. New studies have extended cell-based morphometric analysis to distinguish major leukemia types and subtypes (Table 1).
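The geometric features in (a) can be illustrated with a short Python sketch. It assumes the segmentation step has already produced binary masks (nested lists of 0 and 1) for the whole cell and for its nucleus. Roundness is taken here as 4πA/P², one common definition, and the perimeter is approximated by counting exposed pixel edges; both are simplifying assumptions rather than the definitions used in the cited studies, and the helper names and toy masks are hypothetical.

```python
import math


def area(mask):
    """Number of foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


def perimeter(mask):
    """Count pixel edges that face background or the image border."""
    height, width = len(mask), len(mask[0])
    edges = 0
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < height and 0 <= cc < width)
                if outside or not mask[rr][cc]:
                    edges += 1
    return edges


def roundness(mask):
    """4*pi*A / P^2; equals 1 for an ideal circle and drops for irregular shapes."""
    p = perimeter(mask)
    return 4 * math.pi * area(mask) / (p * p)


def nc_ratio(cell_mask, nucleus_mask):
    """Nucleus area divided by cytoplasm area (cell area minus nucleus area)."""
    nucleus = area(nucleus_mask)
    cytoplasm = area(cell_mask) - nucleus
    return nucleus / cytoplasm


def block(size, top, left, side):
    """Toy mask: a filled square of the given side inside a size x size image."""
    return [[1 if top <= r < top + side and left <= c < left + side else 0
             for c in range(size)] for r in range(size)]


cell_mask = block(6, 1, 1, 4)     # 4 x 4 "cell"
nucleus_mask = block(6, 2, 2, 2)  # 2 x 2 "nucleus" inside it
shape_features = [area(cell_mask), perimeter(cell_mask),
                  roundness(cell_mask), nc_ratio(cell_mask, nucleus_mask)]
```

Because the edge-counting perimeter overestimates the boundary length of curved shapes, roundness values from this sketch are only comparable with each other, not with values computed by other perimeter estimators.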

The common characteristics in these studies are general steps of the image processing pipeline: preprocessing, segmentation, feature engineering, and supervised classification (Table 1). They discriminate cancerous vs. healthy tissue, AML vs. ALL, CL vs. AL, or AML and ALL subtypes. The main differences across the various studies are the choice of the specific engineered features and the choice of the classification method as illustrated below.
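The final step of this pipeline, supervised classification, can be sketched with a k-nearest-neighbor classifier, one of the methods listed in Table 1. The Python snippet below is a bare-bones illustration under stated assumptions: each cell is described by two features (roundness and nucleus-cytoplasm ratio), and the training values and labels are invented for the example rather than measured from any data set.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    # Pair every training vector's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Invented (roundness, nucleus-cytoplasm ratio) pairs, for illustration only.
train_x = [(0.90, 0.30), (0.85, 0.35), (0.88, 0.25),
           (0.60, 0.80), (0.55, 0.90), (0.65, 0.85)]
train_y = ["healthy", "healthy", "healthy",
           "leukemic", "leukemic", "leukemic"]

prediction = knn_predict(train_x, train_y, query=(0.62, 0.82))
```

In practice, features on different scales would usually be normalized before computing distances, and performance would be assessed on held-out data; both steps are omitted here for brevity.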

Most of the digital pathology studies of leukemia analyze PB. A healthy blood smear is distinguished from a leukemic smear if one or more immature cells are present. This can be determined from the nucleus structure or from whole cell characteristics. Discriminating features that classify healthy tissue, AML and ALL in the PB are extracted from the cell nucleus. BM is more heterogeneous than PB, and features of BM images are extracted from the whole cells or separately from the nuclei and the cytoplasm. Commonly used features include texture-based metrics and morphology. Texture is based on the spatial variation of the gray-level pixel intensities, which can be characterized by their homogeneity, energy, and correlation, among other metrics represented in the gray-level co-occurrence matrix (GLCM). Shape is based on geometrical parameters such as area, perimeter, compactness, minor axis, major axis, eccentricity, form factor, elongation, and solidity. Fractal or Hausdorff dimension (HD) represents the nucleus boundary roughness (Jacob and Mundackal) [50].

Reference | Leukemia type | Extracted features | Classification
Reta et al. 2015 [58] | BM (Bone marrow): AML vs ALL, M2 vs M3 vs M5 vs L1 vs L2 | Cell + nucleus + cytoplasm: Shape: Area, Perimeter, Circularity, Width, Length, Elongation, Major axis, Minor axis, Eccentricity, Extent, Equivalent diameter, Euler number, Convex area; Size ratio: Nucleus/Cytoplasm area, Nucleus/Cell area and perimeter; Color/Pixel intensity statistics: Mode, Mean, Standard deviation, Variance, Sum; Texture: Homogeneity, Contrast, Correlation, Energy, Entropy; 10 Eigenvalues (PCA) of R, G, B channel of RGB image and of gray image | k-nearest neighbor (kNN), Random Forest (RF), Simple Logistic (SL), Support Vector Machine (SVM), Random Committee (RC)
Kazemi et al. 2016 [57] | PB (Peripheral blood): AML vs Healthy, M2 vs M3 vs M4 vs M5 vs (M1+M6+M7) | Nucleus: Shape: Area, Perimeter, Elongation, Major and Minor axis, Solidity, Eccentricity, Form Factor, Compactness; Size ratio: Nucleus/Cytoplasm; Color: Mean, Standard deviation, Variance; Texture: Energy, Entropy, Contrast, Correlation, Homogeneity; Fractal: Hausdorff dimension (HD); Nucleus boundary irregularity | Support Vector Machine (SVM)
Madhukar et al. 2012 [54] | PB: AML vs Healthy | Nucleus: Texture: GLCM: Contrast, Homogeneity, Energy, Entropy, Correlation; Image slide: Fractal: HD | Support Vector Machine (SVM)
Agaian et al. 2014 [53] | PB: AML vs Healthy | Nucleus: Shape: Area, Perimeter, Compactness, Minor and Major Axis, Eccentricity, Form Factor, Elongation, Solidity; Color: Standard deviation, Mean, Energy; Texture: GLCM: Homogeneity, Contrast, Correlation; Fractal: HD; Image slide: Fractal: HD | Support Vector Machine (SVM)
Vaghela et al. 2016 [62] | PB: CLL vs CML | Nucleus: Roundness + Count | Not applied
Jacob et al. 2016 [50] | PB: AML vs ALL vs Healthy | Nucleus: Shape: Area, Perimeter, Compactness, Minor and Major Axis, Eccentricity, Form Factor, Elongation, Solidity; Texture: GLCM: Homogeneity, Energy, Contrast, Correlation; Fractal: HD | Support Vector Machine (SVM)
Supardi et al. 2012 [55] | PB: ALL vs AML | Cell: Shape: Size: Area, Radius, Perimeter; Second order central moment; Color: std and mean variance of Red, Green, Blue and intensities of RGB color | k-Nearest-Neighbor (kNN)
Gumble et al. 2017 [51] | PB: ALL vs Healthy | Nucleus & cytoplasm (Binary gray): Shape/Texture: Area, Total White Cells, Total Black Pixels, Perimeter, Eccentricity, Solidity, Form Factor, Bounding Box | k-Nearest-Neighbor (kNN)
Harun et al. 2011 [56] | PB: AML vs ALL | Cell & cytoplasm & nucleus: Shape: Area, Nucleus/Cytoplasm Size Ratio; Cell & Nucleus: Shape: Perimeter | Hybrid Multilayer Perceptron Neural Network (HMLP NN)
Escalante et al. 2012 [59] | BM: ALL vs AML, L1 vs L2, M2 vs (M3+M5), M3 vs (M2+M5), M5 vs (M2+M3), M1 vs M3 vs M5, L1 vs L2 vs M1 vs vs M3 vs M6 | Cell & nucleus: Shape: Area, Perimeter, Circularity, Width, Height, Elongation, Major and Minor Axis, Eccentricity, Extension, Diameter, Euler number, Convex number, Solidity; Pixel intensity statistics: Mode, Mean, Standard deviation, Variance, IOD, avg. IOD; Texture: Entropy, Contrast, Correlation, Energy, Homogeneity; Eigenvalues (PCA) - R, G, B, grayscale; Cytoplasm: Shape: Area; Pixel intensity statistics: Mode, Mean, Standard deviation, Variance; Eigenvalues (PCA) - R, G, B, grayscale | Ensemble Particle Swarm Model Selection (EPSMS)
Mohapatra et al. 2014 [52] | PB: ALL vs Healthy | Nucleus & cytoplasm & cell: Shape: Area; Nucleus & Cytoplasm: Color: mean intensity of R, G, B and Hue, Saturation, Lightness components; Texture: Wavelet | Ensemble of Classifiers (EOC), Naive Bayesian (NB), K-nearest neighbor (KNN), Multilayer Perceptron (MLP NN), Radial Basis Functional Network

Table 1. Leukemia subtype classification.

Quantitative-Morphological and Cytological Analyses in Leukemia

http://dx.doi.org/10.5772/intechopen.73675

4.1. Examples of digital pathology for leukemia

To provide examples of digital pathology's impact in leukemia classification, we summarize here a few of the recent studies. In one study, ALL cells were distinguished from healthy PB cells using shape and texture features extracted from the nucleus and cytoplasm (Gumble and Rode). These features included area, total white blood cells, total black pixels, perimeter, eccentricity, solidity, form factor, and bounding box parameters [51]. In another study, Mohapatra et al. added color and the Fourier descriptor as a cell-based nuclear feature to the shape, fractal, and texture parameters to distinguish ALL from healthy lymphoblasts/lymphocytes [52].

What do these features literally mean? In the Mohapatra et al. study, color features of a cell were calculated from the mean intensity of the nucleus color components in RGB or HSV color space and from a grayscale intensity map. In the case of RGB images, the mean intensity of the red, green, and blue channels and, in the case of HSV images, the mean intensity of the hue, saturation, and lightness components were computed. The same color features were calculated for the cytoplasm. The Fourier descriptors were the mean, variance, skewness, and kurtosis of the texture in the frequency domain. The fractal/HD of the nucleus boundary roughness was considered, as were the variance, skewness, and kurtosis computed between the cell's center and each contour point. Texture features from the cytoplasm included wavelet coefficients and metrics derived from the GLCM including contrast, correlation, energy, homogeneity, and entropy values. The area was calculated for the nucleus, cytoplasm, and the whole cell [52].

In addition to determining leukemia from cell-based features, AML can be distinguished from healthy tissue by extracting whole tissue/slide-based features as illustrated in two other studies (Madhukar et al., Agaian et al.) [53, 54].

Furthermore, AML can also be distinguished from ALL by comparing cellular features in patient smears, as shown by Jacob and Mundackal [50], Supardi et al. [55], and Harun et al. [56]. Jacob et al. and Supardi et al. used cellular metrics based on texture, shape, and Hausdorff dimension, while Harun et al. classified the two leukemias by cell and nuclear perimeters, areas of the cytoplasm and whole cells, and nucleus-cytoplasm ratio [56].

More specifically, AML and ALL subtypes have been discriminated based on cell-based features in three different studies. To classify AML subtypes, Kazemi et al. predicted five AML groups (M2, M3, M4, M5, and the remaining subtypes M0, M1, M6, and M7 considered as one group) based on handcrafted morphological features from blood microscopic images. The features used were extracted from cells' nuclei: irregularity, Hausdorff dimension, shape, color, and texture features complemented by the nucleus-cytoplasm ratio. The same set of features allowed more accurate discrimination of healthy tissue vs. AML tissue than AML tissue vs. ALL tissue [57]. Reta et al. performed a similar analysis which discriminated L1, L2, M2, M3, and M5 subtypes in ALL and AML based on cellular features, with nucleus features proving to be the most discriminative [58]. An earlier study (Escalante et al.) was also able to discriminate
