4.2. Classification methods

4.2.1. Traditional machine learning methods

multiple leukemia tissue types from the BM: ALL vs. AML, L1 vs. L2, M2 vs. M3 + M5, M3 vs. M2 + M5, M5 vs. M2 + M3, M1 vs. M3 vs. M5, and L1 vs. L2 vs. M1 vs. M3 vs. M5 [59]. However, in the latter study, there was no significant difference in model performance using features extracted from the nucleus and cytoplasm vs. the whole cell.

This contradicts other studies that suggest classification based on subcellular morphometry improves AML [60] and ALL [61] subtype recognition. In particular, these groups found that color and shape information in the cytoplasmic holes, which indicate vacuoles, and color and shape information on the nucleus, which indicate nucleoli, can reveal the presence of Auer rods, discriminating AML from ALL, where Auer rods are absent [61].

In addition to the large number of publications characterizing acute forms of leukemia, studies (Vaghela et al.) have suggested that measurements of WBC roundness and counts can discriminate chronic myeloid from chronic lymphoid leukemia [62].

The support vector machine (SVM) is a common classification method in leukemia studies (Jacob et al. [50]; Agaian et al. [53]; Kazemi et al. [57]; Madhukar et al. [54]). However, other methods (Supardi et al., Gumble and Rode) have been applied with success to classify AML and ALL histology and cytology images, including k-nearest neighbor classifiers [51, 55], a hybrid multilayer neural network (HMLNN) (Harun et al.) [56], and an ensemble particle swarm model selection method (EPSMS) (Escalante et al.) [59]. Alternatively, Kumar et al. suggested using a shallow neural network (NN) classifier after the AML slide is processed using wavelet transformation [37]. Other groups (Mohapatra et al.; Reta et al.; Escalante et al.) have compared multiple classifiers on leukemia image datasets and found that depending on the target, different classification methods appear to be the optimal solution [52, 58, 59].

When a small amount of data is available, conventional feature engineering-based machine learning algorithms provide fairly accurate predictions [39]. The accuracy of the proposed feature engineering-based models depends on the leukemia database studied, the number and quality of the images, and the image acquisition mode; these require different data preprocessing steps. These methods are mainly based on supervised classification of leukemia subtypes. When a classifier is trained on a set of quantitative morphological features from a labeled dataset, it has been able to predict the four major leukemia types or the FAB classes on a test set. For cases with an insufficient number of training samples, Kasmin et al. proposed reinforcement learning to classify ALL, AML, CLL, and CML from the geometrical, texture, color, and statistical parameters of PB cell nuclei [63].

Although these previous studies found new morphological features from the digitized leukemia patient histology slides, and were successfully able to identify the major leukemia types and M0–M7 and L1–L3 subtypes, morphological features from the leukemia cells were not correlated with non-morphological information such as genetic mutations and clinical data. The morphological classification methods currently are not sufficient to recognize the majority

4.2.2. Deep learning methods

104 Hematology - Latest Research and Clinical Advances

Although engineered feature-based conventional machine learning algorithms provide fairly accurate predictions, they do not reach the capability of human perception. The feature engineering process requires defining a carefully chosen set of features. This is a laborious process, and the feature parameters are very sensitive to the specific training set from where they were extracted. Due to this rigidity, a conventional machine learning algorithm likely could not be applied to a second dataset without parameter tweaking. To overcome these limitations, deep learning algorithms trained on large amounts of data can extract generalized features to perform human-level pattern recognition [64, 65].
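For contrast with the learned features discussed in this section, the feature engineering scheme described above can be illustrated with a minimal sketch. It assumes two hand-chosen features (nuclear roundness and nucleus-to-cell area ratio) and a k-nearest neighbor classifier. The measurements, the labels "class_A" and "class_B", and all function names are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
import math
from collections import Counter


def roundness(area, perimeter):
    """Circularity 4*pi*A / P^2: 1.0 for a perfect circle, lower for irregular shapes."""
    return 4.0 * math.pi * area / (perimeter ** 2)


def engineer_features(cell):
    """Map the raw measurements of one cell to a hand-chosen feature vector."""
    return (
        roundness(cell["nucleus_area"], cell["nucleus_perimeter"]),
        cell["nucleus_area"] / cell["cell_area"],  # nucleus-to-cell area ratio
    )


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled measurements (arbitrary units), not real patient data.
labeled_cells = [
    ({"nucleus_area": 80, "nucleus_perimeter": 32, "cell_area": 100}, "class_A"),
    ({"nucleus_area": 75, "nucleus_perimeter": 31, "cell_area": 100}, "class_A"),
    ({"nucleus_area": 85, "nucleus_perimeter": 33, "cell_area": 105}, "class_A"),
    ({"nucleus_area": 50, "nucleus_perimeter": 40, "cell_area": 120}, "class_B"),
    ({"nucleus_area": 55, "nucleus_perimeter": 42, "cell_area": 125}, "class_B"),
    ({"nucleus_area": 45, "nucleus_perimeter": 38, "cell_area": 110}, "class_B"),
]

# "Training" a k-NN classifier only stores the engineered feature vectors.
train = [(engineer_features(cell), label) for cell, label in labeled_cells]

# Classify two unseen (hypothetical) cells.
round_cell = {"nucleus_area": 78, "nucleus_perimeter": 32, "cell_area": 100}
irregular_cell = {"nucleus_area": 52, "nucleus_perimeter": 41, "cell_area": 118}
pred_round = knn_predict(train, engineer_features(round_cell))
pred_irregular = knn_predict(train, engineer_features(irregular_cell))
print(pred_round, pred_irregular)
```

The sketch also shows the rigidity noted above: the two features and the value of k are fixed by hand, so a dataset acquired with a different stain or magnification would likely require new features or rescaled measurements.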

When a large amount of data is available, a deep learning approach can be applied to identify morphological features in leukemia. Deep learning can self-discover new, hierarchical features in images (feature learning), allowing better pattern recognition for classification. These features are identified without prior human knowledge, and the learning approach is called "domain-agnostic," where the computational system alone is able to distinguish distinct tissue types in any type of cancer. Today, with the increasing computing capacity of modern computers and the availability of big data storage, huge amounts of data can be extracted and analyzed to identify key features for classification. This has enabled deep learning methods to outperform previous conventional machine learning approaches and to achieve higher accuracy [39, 66].

Deep learning is an extension of conventional artificial neural networks in which, instead of a single-layered network, a multilayered connected network processes input data and generates output. The network design depends on the input dataset and the classification target. For pattern classification problems, convolutional neural networks (CNNs) are an ideally suited network design. The network learns from the example images fed to it and extracts hierarchical features automatically, layer by layer (e.g., from low-level features like edges to higher-level features such as the cell, tissue, and then organ), without expert human intervention while retaining highly expressive power (Figure 3) [65–67].

emerging nonstandard methods to improve and personalize leukemia classification can be expensive and time-consuming. Digital pathology is emerging as a powerful, inexpensive tool to enhance biopsy- and smear-based decisions.

Quantitative-Morphological and Cytological Analyses in Leukemia

http://dx.doi.org/10.5772/intechopen.73675

This review discussed how computational cytology can help improve leukemia diagnosis by enhancing pathologists' smear-based decisions with automated, biologically meaningful pattern recognition. Techniques summarized in this review extract quantitative imaging features from stained bone marrow and peripheral blood smear samples to detect and classify leukemia. To identify morphological features, conventional machine learning approaches have been broadly applied to classify leukemia types and subtypes based on feature engineering. However, to acquire a new set of morphological features in leukemia, a deep learning approach would provide higher accuracy.

For most of the cases reviewed in this chapter, the image processing pipeline implements a supervised classification scheme, where the morphometric features are extracted from a set of labeled data (ALL vs. AML, FAB, M1, etc.) and then are validated on a test dataset. In future studies, supervised morphological analysis can be complemented with unsupervised classification schemes such as unbiased clustering. This approach could reveal whether entirely new classification schemes should be implemented for ALL or AML, independent of known acute or chronic leukemia subtype morphological classification. It also could potentially reveal common underlying genetic or proteomic patterns.

Emerging omics analysis methods are determining protein expression signatures for leukemia patients; however, these new processes can be time- and labor-intensive. To determine genetic information and protein signature membership rapidly and without the time delay required for proteomic-based signature assignment, advances in digital pathology offer potentially exciting, inexpensive, rapid alternatives. If morphological surrogates that reliably correlate with clinical, genetic, or proteomic features, either individually or in combinatorial patterns, can be identified directly from histology images, then this could significantly speed up leukemia diagnosis, reduce the cost of the diagnostic workup, optimize the assignment of patients to a particular therapy, and potentially uncover new pathways for drug targeting.

Cell metrics can be predefined manually, and often metrics are those known to be pertinent to leukemia cells. These algorithms, which together are employed as part of a "feature engineering process," extract metrics from images based on features of cells (e.g., size or nucleus shape). Using a supervised classification approach, the metrics are extracted from predefined leukemia subtypes. As an example, a set of quantitative morphological features defining a leukemia subtype are trained on a labeled dataset according to the FAB morphological classes, and the resulting developed classifier is then used to predict the leukemia subtypes on a test set.

In the unsupervised classification approach, new clusters of leukemia subtypes are created from the engineered features. Contrary to the feature engineering process, learning algorithms self-discover features representative of leukemia cell types (feature learning) where features are learned from annotated (supervised) or unannotated (unsupervised) data (Figure 2).

Figure 3. Feature learning for classification.

The input of the CNN is a series of images, cropped from the whole slide image, and the images are processed in batches. For WBC classification, one cropped image contains one whole cell. In contrast to the cell-based analysis, for tissue classification, the images are slide-based, so the features are learned directly from the spatial pattern. The image size and the number of images fed to the network should be chosen carefully, and the variety of images should represent the variability of the tissue type. Grayscale images are two-dimensional: width and height. Color images have a third dimension, depth, representing the RGB color channels [38, 65, 67, 68].
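The image dimensions and the batch-wise input described above can be made concrete with a minimal sketch. It assumes images stored as nested Python lists, square non-overlapping patches, and a fixed batch size. The helper names (make_image, shape, crop_patches, batches) and the 64 x 96 "slide" are hypothetical; a real whole slide image is far larger and would normally be handled with an array library.

```python
def make_image(height, width, channels=None):
    """Zero-filled image as nested lists: H x W (grayscale) or H x W x C (color)."""
    if channels is None:
        return [[0 for _ in range(width)] for _ in range(height)]
    return [[[0] * channels for _ in range(width)] for _ in range(height)]


def shape(image):
    """Dimensions of a non-empty, rectangular nested-list image."""
    dims = []
    while isinstance(image, list):
        dims.append(len(image))
        image = image[0]
    return tuple(dims)


def crop_patches(image, patch_size):
    """Cut non-overlapping square patches; partial patches at the borders are dropped."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            rows = image[top:top + patch_size]
            patches.append([row[left:left + patch_size] for row in rows])
    return patches


def batches(items, batch_size):
    """Yield consecutive groups of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


gray = make_image(4, 6)        # two dimensions: height and width
color = make_image(4, 6, 3)    # third dimension: the RGB channels
print(shape(gray), shape(color))

slide = make_image(64, 96, 3)  # toy stand-in for a whole slide image
patches = crop_patches(slide, 32)
batch_sizes = [len(batch) for batch in batches(patches, 4)]
print(len(patches), shape(patches[0]), batch_sizes)
```

Dropping partial border patches is one possible design choice; padding the slide or using overlapping patches are alternatives, and the choice affects how many images the network sees.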

Once the set of images is defined and labeled, feature maps are created by sliding a series of filters representing shapes, textures, or colors over the input image (convolution), thus identifying local dependencies. The filters representing the features are learned during the training process through backpropagation and a gradient descent algorithm. After convolution, an activation function introduces nonlinearity into the otherwise linear convolution output, which allows the network to model nonlinear relationships and improves model accuracy. The output of the convolutional layer is then down-sampled (pooling), which reduces the size of the feature map and can help limit overfitting. This sequence is repeated as many times as necessary according to the hierarchical complexity of the image. The last feature map is then flattened into a one-dimensional vector to feed a fully connected layer for neural network (NN) classification. The NN classification process can be replaced by a different classification scheme such as an SVM or random forest [38, 65, 67, 68].
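The sequence of operations described above (convolution, activation, pooling, flattening, fully connected classification) can be sketched as a single forward pass. This is a minimal illustration under stated assumptions: one grayscale input, one hand-set 3 x 3 edge filter, ReLU as the activation, 2 x 2 max pooling, and a two-class softmax layer with arbitrary weights. In a trained CNN the filter and the weights would be learned by backpropagation, which is omitted here; all names and values are hypothetical.

```python
import math


def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) to build a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Activation: keep positive responses, set negative ones to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Down-sample by taking the maximum of each non-overlapping size x size block."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def flatten(fmap):
    """Turn a two-dimensional feature map into a one-dimensional vector."""
    return [v for row in fmap for v in row]


def dense_softmax(vector, weights, biases):
    """Fully connected layer followed by softmax class probabilities."""
    logits = [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, biases)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 6 x 6 grayscale image: dark left half, bright right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-set vertical edge filter (a trained network would learn its filters).
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = convolve2d(image, kernel)  # 4 x 4 map, strongest at the edge
pooled = max_pool(relu(feature_map))     # 2 x 2 map after activation and pooling
vector = flatten(pooled)                 # four values feed the classifier

# Arbitrary illustrative weights for two hypothetical classes.
weights = [[0.1, 0.1, 0.1, 0.1], [-0.1, -0.1, -0.1, -0.1]]
biases = [0.0, 0.0]
probs = dense_softmax(vector, weights, biases)
print(feature_map[0], pooled, probs)
```

Replacing dense_softmax with another classifier applied to the flattened vector corresponds to the SVM or random forest variant mentioned above.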

Convolutional neural networks are ideally suited for pattern recognition and medical image analysis. In fact, CNNs have been successfully applied to feature learning to detect and diagnose a number of different cancers, including leukemia. Deep learning methods have been used for white blood cell detection and classification [68], lymphocyte detection [38], and lymphoma subtype classification [38], in which three subtypes were identified: chronic lymphocytic leukemia (CLL), follicular lymphoma (FL), and mantle cell lymphoma (MCL). They also have been applied to the analysis of ALL cellular images to classify ALL subtype histopathology [67, 69].

Although the current research in pattern recognition is dominated by the supervised deep learning approach, the unsupervised approach is expected to provide breakthrough results in the near future, and extensive research is currently ongoing to optimize these algorithms [65, 66].
