#### **4.2 PCA analysis of quantitative parameters obtained from bright-field and phase images of cervical cell nuclei**

MATLAB-based software was designed to compute the morphological parameters listed in **Table 2** for each cell nucleus imaged in bright-field as well as quantitative phase mode. The cell nucleus measurement data therefore consisted of an (N × p) matrix with N = 48,006

*Cytopathology Using High Resolution Digital Holographic Microscopy DOI: http://dx.doi.org/10.5772/intechopen.96459*

**Figure 6.**

*Illustration of (a), (b): ASC-US cells and (c), (d): ASC-H cells, the three columns show the bright-field image, selected nucleus ROI and the phase image of the ROI.*

and p = 20. Note that one parameter in the measurement set is the N/C ratio, which was assigned one of three labels (low = 1, mid = 2, high = 3) based on visual inspection of nuclei. This was done so that cells of interest that appeared in clusters, for which determining the cytoplasm boundary was difficult, could still be used in the analysis. All the other parameters were measured over the nucleus region in an automated fashion. For the present analysis, the cells were nominally labeled as superficial, intermediate and abnormal (including LSIL, HSIL, ASC-US, ASC-H, SCC) by practicing cyto-pathologists (S. R. M., M. S., K. A., S. S.). Since the number of abnormal cells was much smaller (1.6%) than the number of normal (superficial and intermediate) cells, a truncated data-set with 450 randomly selected cells from each of the three types (superficial, intermediate and abnormal), as per the prior labeling, was used to train the PCA. Denoting by *A* the truncated data matrix with 1350 rows and 20 columns (representing the measurements), with each column in standard form (zero mean and standard deviation 1), the PCA solves the eigenvalue problem:

$$A^T A u_k = \mu_k u_k. \tag{8}$$

Here the superscript "T" stands for the transpose of the matrix *A*. The eigenvectors $u_k$ are mutually orthogonal and are called the principal vectors. All the

**Figure 7.**

*Illustration of rare abnormal cell types: (a) inflamed, (b) reactive, (c) moon-type, (d) virus-infected and (e) koilocytotic; the three columns show the bright-field image, selected nucleus ROI and the phase image of the ROI.*

cell data corresponding to the 48006 cells was then projected onto the PCA vectors. The plot of the first two PCA components for all the cells is shown in **Figure 8**. The color coding of black, blue, and red corresponds to cells that were labeled separately by cyto-pathologists as superficial, intermediate and abnormal (all classes) respectively, based on the bright-field images of the nuclei. Typical bright-field and phase images of cells from the three different regions of the PCA plot are also shown for illustration. The PCA plot based on bright-field and phase information separates most of the cells into three classes, despite some overlap between adjacent classes. In particular, almost all the cells labeled as abnormal fall in the bottom right corner of the PCA plot. We further examine


**Figure 8.**

*Data corresponding to 48006 cells projected onto the first two PCA vectors. The color coding of black, blue, and red corresponds to cells that were labeled by cyto-pathologists as superficial, intermediate and abnormal (all classes) respectively.*

four cells labeled as 1, 2, 3, 4 on the PCA plot that showed unexpected classification when phase parameters were used. Cells 1 and 2 were labeled abnormal by the pathologist but lie well within the intermediate region. Similarly, cells 3 and 4 were labeled as intermediate but lie well within the abnormal (red points) class. For a closer examination of this anomaly, we show bright-field and phase images of these cells in **Figure 9**. A re-examination of these cell images by pathologists suggested the following. Cell 1 is koilocytotic (abnormal), but it appears to have a dried-up cytoplasm, leading to low phase values in the nucleus. Cell 2 is actually very similar to intermediate cells in general, but the pathologists labeled it as abnormal because the other nuclei on that particular sample slide were comparatively smaller. The parameters associated with cell 3 are similar to those of abnormal cells, but it is a rare example of an enlarged intermediate cell. Finally, cell 4 has folded cytoplasm, leading to higher phase values, although the cell may be considered to be of the intermediate class. Re-examination of these and other similar anomalies reveals that cervical cell classification has some aspects beyond simple numerical measurements performed on cell images (either in phase or bright-field modes) that need to be taken into account by any automated cell classification methodology. Issues like

**Figure 9.**

*Examples of intermediate and abnormal cells falling well within the abnormal and intermediate regions of the PCA plot in Figure 8.*

folding of cell cytoplasm can, for example, be minimized with the LBC preparation methodology. PCA was used here because a plot such as **Figure 8** can be generated in an essentially unsupervised manner; however, it is certainly not the best classification methodology available today. In the future we hope to test cell classification using more advanced machine learning methods applied to this data.
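The training and projection steps described in this section (a balanced training subset, column standardization, the eigenvalue problem of Eq. (8), and projection of all cells) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the MATLAB software used in the study. The synthetic data, the class sizes, the function names `fit_pca` and `project`, and the choice to standardize the full data set with the training-set mean and standard deviation are all assumptions made for the example.

```python
import numpy as np

def fit_pca(train):
    """Fit PCA on a (rows x p) training matrix via the eigenvalue problem A^T A u = mu u."""
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)  # assumption: sample standard deviation
    A = (train - mean) / std         # each column: zero mean, unit standard deviation
    mu, U = np.linalg.eigh(A.T @ A)  # A^T A is symmetric, so eigh applies
    order = np.argsort(mu)[::-1]     # sort eigenpairs by decreasing eigenvalue
    return mean, std, mu[order], U[:, order]

def project(data, mean, std, U, n_components=2):
    """Project measurements onto the leading principal vectors."""
    return ((data - mean) / std) @ U[:, :n_components]

# Synthetic stand-in for the measurement matrix (hypothetical values, not study data).
rng = np.random.default_rng(0)
p = 20
sizes = (3000, 2000, 500)             # superficial, intermediate, abnormal (made up)
labels = np.repeat([0, 1, 2], sizes)
data = rng.normal(size=(labels.size, p)) + 0.5 * labels[:, None]

# Balanced training subset: 450 randomly selected cells per class, 1350 rows in total.
n_per_class = 450
train_idx = np.concatenate([
    rng.choice(np.flatnonzero(labels == c), n_per_class, replace=False)
    for c in range(3)
])

mean, std, mu, U = fit_pca(data[train_idx])
scores = project(data, mean, std, U)  # first two PCA components for every cell
```

Plotting the two columns of `scores` against each other, colored by `labels`, would give a scatter plot of the same kind as **Figure 8**.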

We further performed a leave-one-out analysis of the PCA for the cell data to determine which of the 20 measurements influenced the PCA scores the most [33]. If the PCA eigenvectors are arranged as columns of a matrix *U*, the scores *Z* for the data corresponding to the principal components may be expressed as:

$$Z = AU. \tag{9}$$

The plot in **Figure 8** thus corresponds to the first two columns of the score matrix *Z*. For the leave-one-out analysis, the PCA was repeated with only 19 parameters, leaving out one measured parameter at a time. The data matrix, the eigenvector matrix and the score matrix corresponding to the case where the $j$-th measurement ($j = 1, 2, 3, \ldots, 20$) is left out may be denoted by $A^{(-j)}$, $U^{(-j)}$ and $Z^{(-j)}$ respectively. The importance of the $j$-th measurement is judged by the Procrustes distance $D_j$ between the first $M = 2$ columns of the score matrices $Z$ and $Z^{(-j)}$. A parameter is judged to influence the PCA more strongly if its Procrustes distance $D_j$ is higher. The top five morphological parameters in order of importance are shown in **Table 4**. The relative Procrustes distances


**Table 4.**

*Relative importance of numerical parameters using leave-one-out analysis applied to PCA.*


are calculated by dividing all the distances $D_j$ ($j = 1, 2, 3, \ldots, 20$) by the maximum among them. We find that among the top five parameters that influenced the PCA scores the most, three were derived from the phase images while two were derived from the bright-field images. It may be noted from **Table 4** that two phase-based parameters (moment of inertia and optical volume) influence the PCA more than the commonly used N/C ratio criterion. We therefore believe that quantitative phase may prove to be an important future imaging modality in addition to the commonly used bright-field microscopy for cervical cell classification.
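The leave-one-out procedure described above can be sketched as follows. This is a minimal illustration in Python with NumPy under stated assumptions, not the procedure of [33] as implemented in the study. In particular, the Procrustes distance is taken here to be the residual sum of squares after centering both score matrices, scaling them to unit Frobenius norm, and applying the best orthogonal alignment; the exact normalization used by the authors is not stated in the text. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def pca_scores(A, n_components=2):
    """Scores Z = A U restricted to the leading principal vectors of A^T A."""
    mu, U = np.linalg.eigh(A.T @ A)
    order = np.argsort(mu)[::-1][:n_components]
    return A @ U[:, order]

def procrustes_distance(X, Y):
    """Residual after optimal translation, scaling and rotation/reflection (assumed form)."""
    X0 = X - X.mean(axis=0)
    Y0 = Y - Y.mean(axis=0)
    X0 = X0 / np.linalg.norm(X0)  # unit Frobenius norm
    Y0 = Y0 / np.linalg.norm(Y0)
    s = np.linalg.svd(X0.T @ Y0, compute_uv=False)
    return max(0.0, 1.0 - s.sum() ** 2)  # clamp tiny negative rounding errors

def leave_one_out_importance(A, n_components=2):
    """Relative Procrustes distances D_j / max(D_j) for each left-out column j."""
    Z = pca_scores(A, n_components)
    D = np.array([
        procrustes_distance(Z, pca_scores(np.delete(A, j, axis=1), n_components))
        for j in range(A.shape[1])
    ])
    return D / D.max()

# Synthetic standardized data matrix with 1350 rows and 20 columns (hypothetical values).
rng = np.random.default_rng(1)
raw = rng.normal(size=(1350, 20))
raw[:, :5] += 2.0 * rng.normal(size=(1350, 1))  # a shared factor in the first five columns
A = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

relative_D = leave_one_out_importance(A)
ranking = np.argsort(relative_D)[::-1]  # column indices, most influential first
```

The first five entries of `ranking` play the role of the top five parameters reported in **Table 4**. Because the distance is invariant to rotation and reflection of the score plane, sign flips of the eigenvectors between runs do not affect the result.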

#### **4.3 Consistency of phase parameters**

Quantitative phase is not a standard clinical methodology for cell classification; however, as we showed in Section 4.2, it may become an important modality to consider for future clinical use. It is therefore important to understand whether the phase parameters for cervical nuclei are consistent across different subjects from the same clinical site, across age groups of subjects, and across clinical sites with different sample preparation methodologies. Since our leave-one-out PCA analysis suggested that optical volume and moment of inertia are the most important phase parameters, as explained in the previous section, we have plotted a few hundred randomly selected normal cells (superficial and intermediate) with respect to these phase parameters in **Figure 10**. In **Figure 10(a)** we show the plot for 200 normal cells each for five different patients from a single clinical site. **Figure 10(b)** shows the same plot for 200 cells each from three different clinical sites with different sample preparation protocols. In **Figure 10(c)** we show the plot for 500 normal cells each for two different age groups (below and above 30 years). From these plots we observe that the normal cells from the different categories above show highly overlapping distributions for the most important phase parameters. We believe that this observation

**Figure 10.**

*Verification of consistency of the two most important phase parameters (moment of inertia and optical volume) decided based on the leave-one-out analysis; (a) plot of 200 cells each for 5 patients from the same clinical site (AIIMS), (b) plot of 200 cells each from three clinical sites with different sample preparation protocols, (c) plot of 500 cells each for patients below and above 30 years of age. The numerical values of moment of inertia and optical volume are normalized to standard form (zero mean and standard deviation one).*

**Figure 11.**

*Illustration of bright-field and phase imaging of unstained cervical cells: (a), (d): Bright-field images of unstained cells, (b), (e): Nucleus ROI selected from the bright-field image, (c), (f): Phase map of the unstained nuclei.*

is very important for standardization and usage of quantitative phase imaging methodology in future clinical practice.

#### **4.4 Observations on quantitative phase imaging of unstained cervical cells**

In this section we briefly describe an interesting possibility: quantitative phase imaging of unstained cervical cell samples, with two typical images of normal cells shown in **Figure 11**. For unstained cervical cell samples, the Pap smear was prepared using the conventional method and cells were fixed with ethyl alcohol. While the staining protocols used for Pap-smear samples are the gold standard for diagnosis by cyto-pathologists, we note here that, compared to stained cells, the phase signal observed from nuclei of unstained unprocessed cell samples is almost three times higher in magnitude. While interpretation of images of the unstained cells and their phase may require one to go through a learning process, using unstained cell samples for diagnostic practice may offer an attractive alternative, as the cell sample preparation will not involve any wet-lab processing or recurring costs associated with reagents.

### **5. Conclusions**

In conclusion, we have reported an image-based study of cervical cells at various stages using bright-field as well as quantitative phase microscopy. Over 48000 cells have been imaged individually. The phase images of the cell nuclei were reconstructed using an optimization approach that provided the same resolution as the bright-field images. This image data-set may be valuable for future application development using advanced machine learning methods. The visual inspection of images shows interesting features in the phase images as the cells evolve from intermediate to superficial stages, with distinct features associated with abnormal cells. This finding based on visual inspection is confirmed in the PCA analysis of the

morphological parameters of cells derived from both the bright-field and phase images of cell nuclei. A leave-one-out analysis applied to the PCA scores suggests that, apart from the N/C ratio that has been used for identifying abnormal cells for decades, the other two parameters that influence the PCA the most are the optical volume and the moment of inertia of the nucleus, both of which are derived from phase images. A consistency study suggests that the phase parameters associated with normal cells show highly overlapping distributions for multiple patients from the same clinical site, for three clinical sites with different sample preparation protocols, and for patients in two age groups. The consistency of phase parameter distributions for these cases further suggests that phase is a robust modality that can certainly be used in a standardized manner in clinical practice. We believe that quantitative phase may become an important imaging modality in addition to the bright-field imaging that is solely used in current clinical practice. While this study has been performed for cervical cells, we believe that our conclusions regarding the importance of quantitative phase may have wider applicability.
