### **3. Intact cell MALDI TOF MS**

MALDI TOF MS uses laser energy to desorb and ionize analyte molecules from a mixture co-crystallized with a matrix, and then separates the resulting ions according to their mass-to-charge (*m/z*) ratio. The organic matrix enhances energy transfer to the analyte and preserves the structure of the ionized molecules (e.g. peptides, proteins or other biomolecules), allowing their precise structural analysis and identification. In cell biology, MALDI TOF MS is one of the preferred methods for proteomic analysis of a broad range of samples, such as purified or fractionated extracts of cells or tissues. MS-based proteomics uses protein fragmentation for identification and for generating a list of unique peptide or protein signatures across a wide range of *m/z* values [34]. However, the methodological complexity and the character of the data output may limit the use of traditional proteomics in routine quality control of stem cell cultures, even when it is coupled with transcriptomics or (meta)genomics.

Even when intact (whole) cells are used as the analyte, MALDI TOF MS can generate rich spectra without prior cell lysis, fractionation or protein extraction. The mass spectra contain signals for small proteins and peptides and for a variety of other low-mass molecules, including metabolites. Analysis of specific spectral (peak) signatures has been successfully introduced into clinical microbiology, where MALDI TOF MS enables rapid discrimination, or "biotyping", of bacterial species without complex sample processing [35, 36]. In general, the same approach, i.e. using relevant spectral patterns as inputs for further processing and analysis [33], can be used to discriminate cancer cells [37, 38] or abnormal stem cells in long-term cultures, even in a high-throughput setup [39, 40]. Intact Cell MALDI TOF MS was used to identify spectral signatures of glial cells and to classify them as astrocytes, microglia or oligodendrocytes [41]. Principal component analysis then revealed informative peaks for deeper spatial analysis by mass spectrometry imaging in whole brain sections. Similarly, mass spectra have been shown to contain sufficient information to reveal the immunophenotype and activation state of immune cells [42–45] or to classify distinct mammalian cell lines [46, 47]. Moreover, MS can reveal changes in molecular phenotype that occur within cell lines and sublines of common genetic origin. Such an approach was used recently by Povey et al., who demonstrated discrimination of neuroblastoma cell lines sensitive to chemotherapy [48], and by Cadoni et al., who classified ovarian cancer cells as sensitive or resistant to cisplatin based on phospholipid patterns generated by MS [37].

#### **3.1 Intact cell MALDI TOF MS of hESCs**

The first step of preanalytical sample processing is enzymatic or manual harvesting of hESCs under visual microscopic control. Next, cell clusters are enzymatically disaggregated and washed in an isotonic buffer (e.g. phosphate-buffered saline, PBS) to remove residual culture medium and additives. PBS has been reported not to interfere significantly with MALDI TOF MS [49]. However, we observed that it may induce random quenching of ionization and decrease peak intensity. We have therefore added a wash with a fully MS-compatible buffer, such as ammonium acetate [41] or ammonium bicarbonate (ABC) [33], to our protocol to remove traces of PBS and improve mass spectra quality. After the cell number is assessed, cells are resuspended in 150 mM ABC to the desired concentration. Dry cell pellets can be cryostored (at −80°C or lower) with no significant impairment of mass spectra quality.

The MS protocol for hESC biotyping (fingerprinting) follows the established proteomic or microbiological workflow. Depending on the cell type, instrument type and matrix composition, we typically use 1,000-25,000 cells per measurement in routine analysis. The cell number can, however, be reduced to several hundred in an optimized experimental design. Cells can be placed directly onto a steel target plate or onto transparent indium tin oxide (ITO)-coated glass slides. ITO-coated glass slides enable correlative microscopic analysis in parallel with MS. In addition, they can be used as a substrate for the culture of adherent cells [50].
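The arithmetic behind resuspending counted cells to a desired concentration can be sketched as follows. This is a minimal illustration, not part of the published protocol: the function name `resuspension_volume_ul` and the 1 µL spot volume are assumptions chosen only for the example.

```python
def resuspension_volume_ul(total_cells, cells_per_spot, spot_volume_ul=1.0):
    """Buffer volume (in µL) that yields `cells_per_spot` cells in each deposited spot.

    `spot_volume_ul` is a hypothetical deposition volume, not a value from the protocol.
    """
    if total_cells <= 0 or cells_per_spot <= 0 or spot_volume_ul <= 0:
        raise ValueError("all quantities must be positive")
    cells_per_ul = cells_per_spot / spot_volume_ul  # target concentration
    return total_cells / cells_per_ul


# 2 million counted cells, 5,000 cells in each 1 µL spot
print(resuspension_volume_ul(2_000_000, 5_000))  # 400.0
```

The same helper covers the whole routine range quoted above by changing `cells_per_spot` between 1,000 and 25,000.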

Sinapinic acid (SA) or α-cyano-4-hydroxycinnamic acid (CHCA), acidified with trifluoroacetic acid, are predominantly used as the matrix. SA and CHCA generate uniformly sized crystals in which cells can be embedded regularly (**Figure 1B**). Although other matrices, such as 2,5-dihydroxybenzoic acid (DHB) or 2-mercaptobenzothiazole (MBT), can also provide informative output, they form long, needle-like crystals that are distributed unevenly over the target spot and are therefore more suitable for solubilized samples.

Routinely, we analyze samples in linear positive mode in the *m/z* range of 2,000-20,000 (2-20 kDa for singly charged ions), using the usual range of laser energy. Some of the dominant peaks, which have already been partially identified [41, 47, 51], are also regularly observed in hESCs. They correspond to modified histones, thymosin and, presumably, ribosomal or other small structural proteins, and can provide an immediate verification of mass spectrum quality.

Processing of the mass spectrum prior to statistical analysis includes reduction of the raw data matrix, smoothing of the spectrum, alignment of peaks, baseline subtraction and, finally, peak detection. An average spectrum is then calculated from technical replicates and used to generate a final dataset of *m/z* values with assigned intensities in mV or relative arbitrary units [33, 52, 53].
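The processing steps listed above can be sketched on synthetic data. This is a simplified illustration under stated assumptions, not the pipeline of the cited tools: smoothing is a plain moving average, the baseline is a rolling minimum, data reduction and peak alignment are omitted, and all function names, window sizes and the toy spectrum are hypothetical.

```python
import math


def smooth(intensities, window=5):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(intensities)):
        chunk = intensities[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def subtract_baseline(intensities, window=21):
    """Subtract a rolling-minimum baseline estimate and clip at zero."""
    half = window // 2
    out = []
    for i, y in enumerate(intensities):
        base = min(intensities[max(0, i - half): i + half + 1])
        out.append(max(0.0, y - base))
    return out


def average_spectra(replicates):
    """Point-wise mean of replicate spectra sampled on the same m/z axis."""
    return [sum(vals) / len(vals) for vals in zip(*replicates)]


def detect_peaks(mz, intensities, min_intensity):
    """Return (m/z, intensity) pairs for local maxima above a threshold."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y >= min_intensity and y > intensities[i - 1] and y >= intensities[i + 1]:
            peaks.append((mz[i], y))
    return peaks


# Synthetic example: constant offset of 5 plus two Gaussian peaks (toy data only).
mz_axis = [2000 + 10 * i for i in range(50)]


def toy_spectrum(scale):
    return [
        5.0 + scale * (100 * math.exp(-((i - 15) / 2) ** 2)
                       + 60 * math.exp(-((i - 35) / 2) ** 2))
        for i in range(50)
    ]


replicates = [toy_spectrum(s) for s in (1.0, 0.9, 1.1)]   # technical replicates
processed = [subtract_baseline(smooth(r)) for r in replicates]
mean_spectrum = average_spectra(processed)
peaks = detect_peaks(mz_axis, mean_spectrum, min_intensity=10.0)
print([m for m, _ in peaks])  # [2150, 2350]
```

The resulting list of (*m/z*, intensity) pairs is one row of the final dataset; real spectra would normally be processed with dedicated software and tuned parameters.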

### **4. Data analysis**

#### **4.1 Mass spectrum as a biomarker**

A mass spectrum recorded over a wide range of *m/z* values contains hundreds of charged molecular entities, which together form a spectral profile, or "fingerprint", that can be uniquely assigned to a specific cell type, phenotype or state. However, MALDI TOF mass spectra generated from ionized molecules desorbed from intact cells are complex and depend strongly on the experimental conditions and on sources of preanalytical error, such as matrix choice, hardware setup and even operator skill. Despite this technical variability, individual mass spectra assembled into a correctly processed dataset may serve as input data for sophisticated mathematical analysis. Once the unwanted inconsistency has been reduced, informative patterns in the mass spectra can be identified. Finally, the processed spectral dataset can be organized as a two-dimensional array of cases and intensities of selected peaks. Before statistical analysis is applied to the spectral dataset, a preliminary examination of data quality is required. Such rigorous control of data quality includes verification of reproducibility, meticulous calibration and elimination of apparent technical errors or outliers.


*Intact Cell Mass Spectrometry for Embryonic Stem Cell Biotyping. DOI: http://dx.doi.org/10.5772/intechopen.95074*

*Mass Spectrometry in Life Sciences and Clinical Laboratory*


Mass spectra of complex biological samples usually contain numerous peaks with rather low intensities and a low signal-to-noise ratio. Peak detection and recognition therefore depend on precise calibration. Where appropriate, we recommend using clusters of isotopically pure (monoisotopic) elements, such as gold nanoparticles (gold clusters) or black and red phosphorus, as calibration standards [54, 55], alongside commercially available peptide standards. Monoisotopic calibrants provide well-defined peaks that correspond accurately to the predicted mass, allowing proper peak alignment. Moreover, they do not suffer from the isotopic envelope that occasionally complicates peak assignment for high-mass peptides or proteins.
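Because gold has a single stable isotope, the predicted positions of gold cluster peaks follow directly from the atomic mass. The sketch below lists singly charged Au<sub>n</sub><sup>+</sup> clusters falling inside a chosen *m/z* window. It only illustrates the arithmetic: the function names are hypothetical, and which clusters are actually observed depends on the instrument and sample preparation described in [54, 55].

```python
AU_MASS = 196.966569       # mass of 197Au, the only stable gold isotope, in Da
ELECTRON_MASS = 0.000549   # Da


def gold_cluster_mz(n, charge=1):
    """Predicted m/z of a positively charged Au_n cluster ion."""
    return (n * AU_MASS - charge * ELECTRON_MASS) / charge


def calibrants_in_range(low, high):
    """(n, m/z) pairs of singly charged Au_n+ clusters within [low, high]."""
    result = []
    n = 1
    while gold_cluster_mz(n) <= high:
        if gold_cluster_mz(n) >= low:
            result.append((n, gold_cluster_mz(n)))
        n += 1
    return result


clusters = calibrants_in_range(2000, 20000)
print(len(clusters), clusters[0][0], clusters[-1][0])  # 91 11 101
```

The predicted peaks are spaced by one gold atom (about 197 Da), which gives a dense and evenly distributed set of reference points across the 2,000-20,000 window.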

To evaluate the similarity of mass spectra from technical replicates or experimental cohorts, mathematical approaches used in proteomics or metabolomics can be applied. Correlation analysis (e.g. Pearson's correlation, Spearman's correlation, Kendall rank correlation or cosine correlation) provides a quantitative output that globally evaluates the similarity of mass spectra [56].
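Two of the measures named above, Pearson's correlation and cosine correlation, can be computed directly from peak-intensity vectors. The following is a minimal sketch; the function names and the toy intensity vectors are hypothetical and assume that both spectra are sampled at the same *m/z* positions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def cosine_similarity(x, y):
    """Cosine of the angle between two intensity vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


reference = [10.0, 0.0, 5.0, 1.0]     # toy peak intensities
replicate = [20.0, 0.0, 10.0, 2.0]    # same pattern at twice the intensity
other_type = [1.0, 8.0, 0.0, 6.0]     # different pattern

print(pearson(reference, replicate), cosine_similarity(reference, replicate))
print(pearson(reference, other_type), cosine_similarity(reference, other_type))
```

Both measures are insensitive to overall intensity scaling, so a replicate that differs only in signal strength scores close to 1, whereas a spectrum with a different peak pattern scores much lower.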

Another relevant factor that can interfere with the outputs of statistical analysis is the presence of outliers, either in the dataset (a case) or within a mass spectrum (a peak intensity). Despite precise laboratory work, outliers are inevitable and are probably associated with stochastic MALDI effects, as already described for bacterial Intact Cell MS [57]. One classical procedure for revealing outliers in the data is provided by factor analysis and involves carefully following the rank of the data matrix by computing its eigenvalues. The number of non-zero eigenvalues (the rank of the matrix), visualized in a scree plot, immediately gives crucial information about the sample, such as the number of recognizable data groups in the mass spectra (e.g. cell types or experimental conditions). In the case of two data groups, the rank should equal two. The presence of outliers is thus indicated by an increased rank [58, 59].
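The rank argument can be illustrated on idealized, noise-free data. The sketch below computes the eigenvalues of the uncentered case-by-case matrix with a basic Jacobi rotation routine and counts those above a relative tolerance. It is an illustration under assumptions, not the procedure of [58, 59]: the function names, the tolerance and the toy spectra are hypothetical, and with real noisy spectra every eigenvalue is non-zero, so the cut-off has to be read from the scree plot.

```python
import math


def gram_matrix(spectra):
    """Case-by-case matrix of dot products (data matrix times its transpose)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in spectra] for r1 in spectra]


def symmetric_eigenvalues(matrix, sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        total = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
        if off <= 1e-24 * total:          # off-diagonal part is negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):        # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):        # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def effective_rank(spectra, rel_tol=1e-8):
    """Number of eigenvalues larger than rel_tol times the largest one."""
    eigenvalues = symmetric_eigenvalues(gram_matrix(spectra))
    return sum(1 for v in eigenvalues if v > rel_tol * eigenvalues[0])


type_a = [10.0, 0.0, 5.0, 1.0]            # toy peak pattern of group A
type_b = [1.0, 8.0, 0.0, 6.0]             # toy peak pattern of group B
two_groups = [type_a, [2.0 * v for v in type_a], type_b, [0.5 * v for v in type_b]]
with_outlier = two_groups + [[0.0, 0.0, 0.0, 9.0]]

print(effective_rank(two_groups))    # 2
print(effective_rank(with_outlier))  # 3
```

The sorted eigenvalues returned by `symmetric_eigenvalues` are the values that would be drawn in the scree plot; the extra significant eigenvalue in the second dataset corresponds to the outlying case.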

Multivariate evaluation and validation can identify relevant groups or classes within the spectral data. Principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) are commonly used. PCA reduces the dimensionality of the spectral data and defines new vectors, the principal components, which maximize the variance. It also enables visual inspection of the recognized groups of samples. PCA is an unsupervised method with minimal bias, and it performs best when the intra-group variability is significantly lower than the inter-group variability. It is a well-established tool for processing complex spectral data, e.g. in proteomics or microbiology [60, 61]. PLS-DA uses a different mathematical model than PCA to distinguish groups. It is a supervised discriminant analysis that includes the group information in the algorithm. PLS-DA can provide excellent discrimination; however, it has an inherent tendency to over-fit the data and to identify clusters even in a uniform spectral dataset. Validation on independent data is therefore recommended [62]. The workflow of data processing in routine analysis of hESCs is summarized in **Figure 2**.
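How PCA separates groups when the inter-group variability dominates can be sketched with the first principal component alone. The example below obtains it by power iteration on the covariance matrix. It is a simplified illustration: the function names and the toy intensities are hypothetical, only one component is extracted, and in practice a numerical library would normally be used.

```python
import math


def center(data):
    """Subtract the column means (mean-centering of each peak)."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[v - m for v, m in zip(row, means)] for row in data]


def first_principal_component(data, iterations=200):
    """Return (loading vector, scores) of PC1 obtained by power iteration."""
    x = center(data)
    n, d = len(x), len(x[0])
    cov = [[sum(row[i] * row[j] for row in x) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Start from the centered case with the largest norm (lies in the data span).
    start = max(x, key=lambda row: sum(v * v for v in row))
    norm = math.sqrt(sum(v * v for v in start))
    v = [c / norm for c in start]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break
        v = [c / norm for c in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return v, scores


# Toy peak intensities: three cases per group, three peaks per case.
group_1 = [[10.0, 1.0, 5.0], [9.0, 2.0, 5.5], [11.0, 1.5, 4.5]]
group_2 = [[1.0, 8.0, 6.0], [2.0, 9.0, 6.5], [1.5, 7.0, 5.5]]

loadings, scores = first_principal_component(group_1 + group_2)
print([round(s, 2) for s in scores])
```

The scores of the two groups fall on opposite sides of zero along PC1 (the sign of the component itself is arbitrary), which is the separation that a PCA score plot such as the one in Figure 2 displays.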

**Figure 2.**

*(A) Example of raw mass spectrum generated from intact cells by MALDI TOF MS, (B) data processing workflow, (C) processed mass spectrum, (D) spectral dataset consisting of individual cases (ID1-IDn) and assigned values of peak intensities at defined* m/z*, (E) heat map graphically visualizing the dataset containing spectral data of hESCs and four differentiation stages (DIFA-D), (F) example of the output matrix of PCA with recalculated coordinates, (G) scree plot visualizing the number of significant factors contributing to the variability in the dataset, (H) PCA-based visualization of the differentiation trajectory of hESCs progressing towards endodermal phenotype through the four differentiation stages (DIFA-D). Adapted with permission from [63].*

#### **4.2 Classification by machine learning**

Artificial neural networks (ANNs) are non-linear mathematical models that resemble the neural architecture of the brain and possess "learning" and "generalization" abilities. For this reason, ANNs belong to a group of artificial intelligence methods with a wide spectrum of complex applications, ranging from purely scientific to industrial or clinical. ANNs utilize diverse types of input data, which are processed in the context of previous training on a defined sample database to produce a relevant output [64]. The unique chemical fingerprints generated by intact cell mass spectrometry allow an ANN to classify samples even without preceding identification of the relevant peaks. Successful application of ANNs, or of any other machine-learning algorithm, requires building a database of spectral patterns specific for individual cell types, phenotypes or states. This has been achieved in clinical microbiology; in eukaryote biology, however, the complexity of cellular composition and cell plasticity in general represents a major issue. Nevertheless, for "in-house" databases of well-defined cell models and conditions of their handling and analysis, Intact Cell MALDI TOF MS coupled with ANNs is a powerful and robust approach that can be easily adapted to any specific application (**Figure 3**).

**Figure 3.**

*Architecture of the representative artificial neural network used for prediction of hESC phenotype (output) using peak intensities arranged in a defined spectral matrix (input). Adapted with permission from [63].*
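The idea of training a network on a spectral matrix and predicting a phenotype from peak intensities can be sketched with a very small feed-forward network. This is a toy illustration, not the architecture shown in Figure 3 or used in [63]: the class name `TinyANN`, the layer sizes, the learning rate and the normalized intensities are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyANN:
    """Feed-forward network: inputs -> one hidden layer -> single sigmoid output."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(ws, x)) + b)
                  for ws, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict(self, x):
        """Predicted probability of class 1 for one vector of peak intensities."""
        return self._forward(x)[1]

    def loss(self, samples, labels):
        """Mean cross-entropy over a labelled dataset."""
        eps = 1e-12
        total = 0.0
        for x, y in zip(samples, labels):
            p = self.predict(x)
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total / len(samples)

    def train(self, samples, labels, epochs=2000, lr=0.5):
        """Plain stochastic gradient descent with backpropagation."""
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                hidden, out = self._forward(x)
                d_out = out - y  # cross-entropy gradient at the output unit
                for j, h in enumerate(hidden):
                    d_hidden = d_out * self.w2[j] * h * (1 - h)
                    self.w2[j] -= lr * d_out * h
                    for i, v in enumerate(x):
                        self.w1[j][i] -= lr * d_hidden * v
                    self.b1[j] -= lr * d_hidden
                self.b2 -= lr * d_out


# Toy spectral matrix: rows are cases, columns are normalized peak intensities.
class_1 = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.0], [0.8, 0.1, 0.2]]
class_0 = [[0.1, 0.8, 1.0], [0.2, 0.9, 0.8], [0.0, 0.7, 0.9]]
samples = class_1 + class_0
labels = [1, 1, 1, 0, 0, 0]

net = TinyANN(n_inputs=3, n_hidden=2)
loss_before = net.loss(samples, labels)
net.train(samples, labels)
loss_after = net.loss(samples, labels)
print(loss_before, loss_after)
```

A real application would hold out independent spectra for validation, for the same reason that validation on independent data is recommended for supervised methods in general.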

### **5. Applications of intact cell mass spectrometry in quality control of embryonic stem cell cultures**

Monitoring of clinical-grade stem cells during manipulation, banking or quality control with appropriate tools is an essential prerequisite for their application. We hypothesized that different cell and tissue types, or their different states, may vary in the levels of numerous small molecules, metabolites, peptides and proteins. An unambiguous and unbiased chemical fingerprint obtained by MS can thus reflect such differences with high sensitivity. In addition, spectral patterns can serve as a highly informative input for subsequent statistical analysis and classification. To test this hypothesis, we used a mouse model of primary hyperoxaluria I, a congenital disorder that affects the enzymatic machinery of glyoxylate metabolism. Primary hyperoxaluria I causes oxalate deposits to localize in the liver and kidneys, and ultimately leads to hepatorenal failure and extrarenal manifestation of the disease. Alterations of chemical composition within the tissue microenvironment of hyperoxaluric animals can be translated into specific patterns in mass spectra. A dataset composed of peaks and their corresponding intensities, obtained from diseased and healthy animals, was used as an input for cluster and classification analysis and for machine learning (ANN) prediction. Spectral patterns clearly distinguished samples from healthy and hyperoxaluric animals and, in parallel, the ANN correctly predicted the category based solely on the mass spectrum fingerprint [65].

In general, the same approach can be used for rapid discrimination of cells occurring in stem cell cultures. Mass spectra from pure populations of mESCs, hESCs and mouse embryonic fibroblasts (MEFs) contain enough information to distinguish the cell types by cluster analysis. Interestingly, these spectral profiles are not lost even in mixed populations of two cell types, such as in cross-contaminated cell cultures. Therefore, they can serve as a basis for quantitative estimation of the individual cell types in the mixture. To model such a scenario, a broad panel of binary suspension mixtures containing hESCs and MEFs or hESCs and mESCs in defined ratios was prepared. Mass spectra were recorded and processed, and the spectral patterns were assigned to known quantities of cells in suspension. The resulting dataset then represented a calibration data matrix, suitable for quantitative

