**3. Preparation of the sample for bottom-up proteomics**

Protein extraction and the subsequent preparation for LC–MS analysis represent one of the key steps in proteomics (**Figure 2**). Although numerous protocols have been reported, most focus on preparation from large amounts of material (i.e. micrograms to milligrams), which limits their utility for patient clinical samples. Notably, protein extraction from FFPE-preserved tissues requires removal of formaldehyde-induced cross-links, which is usually carried out by heating samples in a buffered solution at an elevated temperature (95°C or 100°C). The most common buffers used for protein extraction are ammonium bicarbonate, tris(hydroxymethyl)aminomethane (Tris) and radioimmunoprecipitation assay (RIPA) buffer. Detergents such as sodium dodecyl sulfate (SDS), sodium deoxycholate (SDC), RapiGest SF surfactant™ (Waters) and PPS Silent Surfactant™ (Expedeon) are routinely added to the buffer to improve protein solubilization and thus enhance protein extraction. In addition to optimizing the extraction buffer, many studies have also optimized other parameters, such as the incubation time of the extraction and/or the addition of various proteases, to improve protein coverage during subsequent LC–MS/MS analysis.

#### **Figure 2.**

*Overview of sample preparation for bottom-up proteomic analysis by tandem mass spectrometry. A) Sample lysis: proteins are extracted from the biological matrix in lysis buffer. Mechanical disintegration or sonication is used to homogenize rigid structures within samples, which are common in mammalian tissue. B) Protein digestion: proteins are proteolytically digested into peptides, usually by the protease trypsin. C) Peptide fractionation: optionally, the complexity of the peptide sample is decreased by adding fractionation steps orthogonal to the methods used in the next step. D) Mass spectrometry analysis: desalted peptide samples are dissolved in an appropriate buffer and introduced into a tandem mass spectrometer. Most often, reversed-phase liquid chromatography separation is used in this final step to enable sequential introduction of peptides into the tandem mass spectrometer.*

#### **3.1 Detergents**

Traditional detergents and chaotropes such as SDS and urea have been widely used for protein solubilization; however, they are well known to inhibit digestion at higher concentrations and are incompatible with the reversed-phase liquid chromatography (RPLC) separation used to introduce samples for MS analysis. Their concentration must therefore be kept low at the time of proteolysis to preserve the effectiveness of the proteases used for protein digestion, but low concentrations often lead to incomplete protein solubilization and denaturation. Residual detergent in the sample may also interfere with later instrumental analysis, so different purification methods have been developed for detergent removal to improve the LC–MS outcome. The choice of the most effective procedure depends on the physicochemical properties of the detergent. Such procedures include detergent removal on the basis of size exclusion (i.e. molecular weight cut-off filters) or the use of spin columns containing appropriate detergent-removal resins. Moreover, heating of the sample in urea buffers often leads to covalent modification of proteins via carbamylation, which may affect peptide retention time during RPLC separation and, if not accounted for, will interfere with identification. To circumvent the problems caused by mass spectrometry-incompatible detergents, significant effort has gone into the development of reagents that avoid these complications. To this end, acid-labile detergents such as RapiGest SF surfactant™ (Waters) and PPS Silent Surfactant™ (Expedeon) were developed, which can be easily removed after proteolysis by simple measures such as decreasing the pH. Similarly, the MS-compatible surfactant ProteaseMAX™ (Promega) enhances tryptic, chymotryptic and LysC digestion and then degrades during the course of the digestion reaction. Another compound, Invitrosol™ (Thermo Fisher Scientific), is a homogeneous surfactant that does not impact tryptic digestion and elutes during RPLC in three peaks well separated from where peptides elute [6].
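The requirement to keep the chaotrope or detergent concentration low at the time of proteolysis is, in practice, often met by diluting the lysate before adding the protease. The sketch below only illustrates the underlying arithmetic (c1·V1 = c2·V2). The function name and the example concentrations (8 M urea diluted to 1 M) are illustrative assumptions, not values taken from this chapter; the tolerated concentration depends on the protease and the protocol.

```python
def diluent_volume(c_initial, c_target, v_initial):
    """Volume of diluent to add so that the final concentration is c_target.

    Derived from c_initial * v_initial = c_target * (v_initial + v_added).
    Concentrations may be in any unit (M, % w/v) as long as both match;
    the result is returned in the unit of v_initial.
    """
    if not 0 < c_target <= c_initial:
        raise ValueError("target must be positive and not exceed the initial concentration")
    return v_initial * (c_initial / c_target - 1.0)


if __name__ == "__main__":
    # Hypothetical example: 50 uL of lysate in 8 M urea, diluted to 1 M
    # (assumed tolerance) before the protease is added.
    added = diluent_volume(c_initial=8.0, c_target=1.0, v_initial=50.0)
    print(f"add {added:.0f} uL of digestion buffer")
```

Dilution increases the digestion volume, which is one reason why removable or degradable surfactants are attractive for small samples.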

#### **3.2 Sample digestion**

Classical bottom-up proteomic sample preparation aims to turn protein extracts into peptides via a process of protein cleavage or digestion with proteases. Notably, proteins extracted from biological material tend to keep their native tertiary structure mostly held by non-covalent interactions of amino acid side groups [7].


*Trends in Sample Preparation for Proteome Analysis DOI: http://dx.doi.org/10.5772/intechopen.95962*

It is thus essential to disrupt the tertiary structure and linearize the protein sequence to ease the access of proteases to cleavage sites. Protein tertiary structure is frequently disrupted by chaotropic and denaturing reagents. Disulfide bonding, a covalent bond between cysteine side chains also termed an S-S bridge, contributes to tertiary structure as well. Disulfide bonds are most often broken with reducing agents, leaving free sulfhydryl groups and allowing the protein to unfold more fully. Dithiothreitol (DTT), tris(2-carboxyethyl)phosphine (TCEP), tris(3-hydroxypropyl)phosphine (THPP) and 2-mercaptoethanol (2-ME) are the most commonly used reducing agents. Sulfur-containing reagents such as 2-ME and DTT break the S-S bridge by thiol-disulfide exchange, while phosphorus-containing reagents form a phosphine oxide as a result of disulfide bond reduction [8]. Reduction is commonly followed by alkylation of the free sulfhydryl groups to prevent disulfide bonds from re-forming. In this chemistry, a free sulfhydryl group performs a nucleophilic attack on the alpha carbon of an alkylating reagent, creating a covalent bond between the alkyl group and cysteine. A wide palette of alkylating reagents may be used, but in proteomic sample preparation the most common are iodoacetamide, iodoacetic acid, N-ethylmaleimide (NEM) and S-methyl methanethiosulfonate. Covalent modification of a free sulfhydryl group leaves a mass tag on each cysteine, which must be treated as a mass shift on cysteine when interpreting peptide tandem mass spectra. Alkylated proteins are then processed by proteolytic cleavage into shorter segments (peptides), which are readily detected in a bottom-up experiment by LC–MS/MS analysis. As mentioned above, peptides may be produced by enzymatic methods but also by chemical methods, either of which can be specific or unspecific (**Table 1**). In both cases, a variety of protocols are available to digest proteins into peptides for mass spectrometry-based proteomic analysis.

Bottom-up proteomics frequently relies on proteolytic enzymes that digest a protein at specific sites. Predictable digestion rules for a given protease make the database search faster, computationally less demanding and more accurate. Trypsin is the most common protease in bottom-up proteomics; it cleaves peptide bonds at the C-terminus of arginine and lysine when these are not followed by proline [16]. Notably, maintaining a temperature of 37°C and a pH between 7 and 8, with Ca2+ ions present in the digestion buffer, is important for the reaction to proceed efficiently [17]. The enzyme-to-substrate ratio is also important; for trypsin it is often between 1:20 and 1:100 (w:w). LysC endoproteinase, which is isolated from *Lysobacter enzymogenes*, is often combined with trypsin to provide cleavage at the C-terminus of lysine. Such combinations of multiple enzymes are used to enhance peptide sequence coverage by producing overlapping peptides. Chymotrypsin and pepsin produce the peptides most orthogonal to those of trypsin. Chymotrypsin is a serine protease that cleaves peptide bonds at the C-terminus of amino acids with large hydrophobic side chains, such as phenylalanine, tryptophan, tyrosine and leucine. Chymotrypsin performs best at a 1:50 (w:w) enzyme-to-substrate ratio, at basic pH and at a temperature around 37°C. Chymotrypsin is also activated and stabilized by Ca2+ ions, so it is beneficial to use digestion buffers containing calcium ions (e.g. CaCl2) [18]. Pepsin is an endopeptidase secreted by gastric chief cells as an inactive precursor, pepsinogen, which is activated by cleavage of an N-terminal pro-segment under acidic conditions. The optimal enzymatic activity of pepsin is achieved at pH 1.5–2.5 and 37°C. Pepsin cleaves at the C-terminus of phenylalanine and leucine, and rarely after histidine and lysine unless they are adjacent to leucine or phenylalanine. Pepsin is frequently used for on-column protein digestion in hydrogen-deuterium exchange (HDX) experiments, but an application in off-line pressure-assisted protein digestion has also been reported [19].
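The trypsin cleavage rule and the cysteine mass shift described above are exactly what a database search engine applies when it generates candidate peptides. The following is a minimal sketch, not the implementation of any particular search engine: it performs an in silico tryptic digest (cleavage after Lys/Arg unless followed by Pro) and computes monoisotopic peptide masses with a fixed carbamidomethyl modification on cysteine, which is the tag left by iodoacetamide. The function names and the example sequence are hypothetical; the residue masses are standard monoisotopic values rounded to five decimals.

```python
import re

# Standard monoisotopic residue masses in Da (rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # one H2O per peptide (free N- and C-terminus)
CARBAMIDOMETHYL = 57.02146  # mass added to Cys by iodoacetamide


def tryptic_digest(sequence, missed_cleavages=0):
    """Cleave after K or R unless the next residue is P.

    Returns fully cleaved peptides plus, optionally, peptides that
    span up to `missed_cleavages` uncleaved sites.
    """
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def monoisotopic_mass(peptide, cys_shift=CARBAMIDOMETHYL):
    """Neutral monoisotopic mass with a fixed mass shift on every Cys."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return mass + cys_shift * peptide.count("C")


if __name__ == "__main__":
    protein = "MKRPCAKGR"  # hypothetical sequence for illustration
    for pep in tryptic_digest(protein, missed_cleavages=1):
        print(f"{pep:10s} {monoisotopic_mass(pep):10.4f}")
```

Setting `cys_shift` to the mass added by a different alkylating reagent, or to zero for non-alkylated samples, changes every cysteine-containing mass, which is why the alkylation chemistry has to be declared to the search software.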


*Mass Spectrometry in Life Sciences and Clinical Laboratory*



**Table 1.** *Proteases used for proteolytic digestion of protein extracts retrieved from biological material such as tissue, body fluids or cell extract. Table 1 presents the enzyme class, pH and temperature optimum, inorganic ion cofactor and specificity of protease. In addition a representative application and literature source is given.*


| Protease | Class | pH range/ion | t [°C] | Cleavage specificity | Example application | Reference |
|---|---|---|---|---|---|---|
| Trypsin | Serine | 7–8 / Ca2+ | 37 | Arg, Lys (C-term) | Primary central nervous system lymphoma | [9] |
| LysC | Serine | 8.5 | 37 | Lys (C-term) | Whole liver SDS lysates | [10] |
| LysN | Metalloproteinase | 7–9 / Zn2+ | Thermostable | Lys (N-term) | HEK 293 cells | [11] |
| Chymotrypsin | Serine | 8 / Ca2+ | 37 | Hydrophobic AAs (C-term) | Cerebrospinal fluid (CSF) | [12] |
| Pepsin | Aspartic | 1.5–2.5 | 37 | Preferentially Phe, Leu (C-term) | Human liver tissue | [13] |
| Thermolysin | Metalloproteinase | 5.0–8.5 / Zn2+ | 65–85 | Ala, Met, Ile, Leu, Val, Phe (N-term) | Human liver tissue | [13] |
| AspN | Metalloproteinase | 6.5–8.0 / Zn2+ | 40 | Asp (N-term) | Brain and liver tissue from C57BL/6 mouse | [14] |
| GluC | Serine | 4.0, 7.8 | 37 | Glu, Asp (C-term) | Cerebrospinal fluid (CSF), brain and liver tissue from C57BL/6 mouse | [12, 14] |
| ArgC | Cysteine | 7.2–8.0 / Ca2+ | 37 | Arg, Lys (C-term) | Cerebrospinal fluid (CSF), brain and liver tissue from C57BL/6 mouse | [12, 14] |
| CNBr | Chemical | - | - | Met (C-term) | Extracellular matrix of human mammary and liver tissue | [13, 15] |

GluC, ArgC, LysN and AspN are also popular proteases in bottom-up proteomics, as their different substrate specificities predictably produce peptides that are complementary or orthogonal to those of trypsin. GluC is a serine protease isolated from *Staphylococcus aureus* whose specificity depends on the composition of the digestion buffer. For example, proteolysis in phosphate buffers leads to cleavage at the C-terminus of glutamic acid and aspartic acid, whereas only cleavage at the C-terminus of glutamic acid is catalysed in ammonium acetate (pH 4.0) and ammonium bicarbonate (pH 7.8) buffers [20]. GluC performs optimally at pH 4.0 or pH 7.8 and 37°C, and it is stable under denaturing conditions. ArgC, isolated from *Clostridium histolyticum*, is a cysteine endopeptidase that cleaves at the C-terminus of arginine and sometimes at the C-terminus of lysine. Its pH optimum is 7.6, and Ca2+ ions enhance its activity. ArgC digestion has recently been considered an alternative to conventional trypsin digestion, as it cleaves at the C-terminus of arginine. LysN is a metalloprotease that cleaves at the N-terminus of lysine; it is resistant to denaturation, allowing digests to proceed even at temperatures higher than those mentioned above. AspN is a selective metalloproteinase isolated from *Flavobacterium meningosepticum* that requires zinc for its catalytic activity [21]. Its endopeptidase activity is specific to the N-terminus of aspartic acid or cysteic acid. To maintain optimal enzymatic activity, it is recommended to include ZnSO4 in the digestion solution, buffered at pH 6.5–8.0, at a temperature of 40°C. Combining AspN with trypsin digestion improves data quality and increases protein coverage [22]. WaLP and MaLP are lesser-known proteases that cleave at aliphatic amino acids, which makes them popular for membrane proteomic applications. Meyer et al. demonstrated that combining data from trypsin, LysC, WaLP and MaLP digestion increases membrane proteome coverage by 101% compared to the coverage achieved by trypsin digestion alone [23].
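The benefit of combining complementary proteases can be illustrated with a small in silico experiment. The sketch below encodes simplified cleavage rules for several of the proteases discussed above as zero-width regular expressions and estimates sequence coverage as the fraction of residues that fall in peptides of a detectable length. The rule set ignores the exceptions mentioned in the text (for example, occasional ArgC cleavage at lysine), and the length window of 7 to 30 residues and the example sequence are assumptions chosen for illustration only.

```python
import re

# Simplified cleavage rules; each pattern matches the position of the cut.
RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # C-terminal of Lys/Arg, not before Pro
    "LysC":    r"(?<=K)",          # C-terminal of Lys
    "LysN":    r"(?=K)",           # N-terminal of Lys
    "ArgC":    r"(?<=R)",          # C-terminal of Arg
    "GluC":    r"(?<=E)",          # C-terminal of Glu (bicarbonate or acetate buffer)
    "AspN":    r"(?=D)",           # N-terminal of Asp
}


def digest(sequence, protease):
    """Return (start, end) spans of the fully cleaved peptides."""
    cuts = {m.start() for m in re.finditer(RULES[protease], sequence)}
    bounds = sorted(cuts | {0, len(sequence)})
    return list(zip(bounds, bounds[1:]))


def coverage(sequence, proteases, min_len=7, max_len=30):
    """Fraction of residues covered by peptides within the length window.

    The window is an assumed stand-in for what is typically identifiable
    by LC-MS/MS; it is not a value taken from the chapter.
    """
    covered = [False] * len(sequence)
    for protease in proteases:
        for start, end in digest(sequence, protease):
            if min_len <= end - start <= max_len:
                covered[start:end] = [True] * (end - start)
    return sum(covered) / len(sequence)


if __name__ == "__main__":
    protein = "AAAAAAAAKGGKDAAAAAAAEK"  # hypothetical sequence
    for combo in (["trypsin"], ["GluC"], ["trypsin", "GluC"]):
        print(combo, round(coverage(protein, combo), 3))
```

In this toy sequence the short tryptic peptide falls below the length window, and only the GluC digest recovers those residues, which mirrors the rationale for multi-protease workflows.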

Digestion with broad-specificity proteases is less common in bottom-up sample preparation; nevertheless, it is used to digest rigid protein structures that resist digestion by common proteases. Proteinase K is one such serine endopeptidase, isolated from the fungus *Engyodontium album*, that cleaves protein sequences with broad specificity and, like others discussed above, requires Ca2+ ions for activity. Generally, it cleaves at the C-terminus of aromatic or aliphatic amino acids and is able to digest proteins in their native state or in the presence of detergents such as SDS and Triton X-100, but it works best at alkaline pH 7.5–12.0 and 37°C. It is most frequently used to remove proteins during nucleic acid purification, but it is also suitable for some proteomic applications such as non-specific digestion of membrane proteins, protease footprinting or prion digestion. As the name implies, thermolysin is a thermostable metalloproteinase, isolated from *Bacillus thermoproteolyticus*. Thermolysin requires zinc and calcium ions for proteolytic activity and remains active at temperatures of 65–85°C and between pH 5.0 and 8.5. It cleaves at the N-terminus of alanine, methionine, isoleucine, leucine, valine and phenylalanine and is often used to digest proteins that resist proteolysis by conventional proteases [24]. Papain and elastase have endopeptidase activity and broad specificity; although available, they are rarely used in bottom-up sample preparation. Elastase is a serine endopeptidase that cleaves at the C-terminus of amino acids with small hydrophobic side chains, such as glycine, valine, isoleucine and leucine. Papain, by contrast, is a cysteine endopeptidase that cleaves at the C-terminus of arginine and lysine when preceded by a hydrophobic amino acid but not followed by valine. Subtilisin is a serine endopeptidase isolated from soil bacteria (e.g. *Bacillus licheniformis*) that is known to cleave peptide bonds non-specifically with a preference for large uncharged amino acids, although amino acids with basic side chains can be accepted in an alternate binding mode [25]. Subtilisin remains active and stable under denaturing and alkaline conditions ranging from pH 8 to 12. Ca2+ ions stabilize the subtilisin structure, so it is essential to include CaCl2 in the digestion buffer. The use of subtilisin in bottom-up proteomics is quite limited owing to its wide range of specificity; nevertheless, it has been reported that it can be used to reveal previously hidden areas of the proteome [26].

Cathepsins form a large group of proteases with endopeptidase activity. Their use in proteomics is infrequent, but some uses have been reported. Cathepsin L is a cysteine protease located in lysosomes; it is physiologically involved in tissue remodeling and in disease processes such as cancer metastasis. Cathepsin L is catalytically active at pH 3.0–6.5 in the presence of thiol compounds [27]. Digestion with cathepsin L has been reported in research on histone N-termini. Cathepsin C is an N-terminal dipeptidase physiologically involved in the activation of serine proteases and inflammatory cells [28]. Its use in proteomic sample preparation is limited, as its cleavage is unspecific. Nevertheless, it could serve as a potent tool for generating peptides orthogonal to those of conventional proteases.

Thrombin is a serine protease that is proteolytically activated from an inactive prothrombin precursor during the clotting process. It is highly specific towards the Leu-Val-Pro-Arg-Gly-Ser motif. Therefore, it is most often used to cleave a linker containing this sequence motif that has been inserted into a recombinant fusion protein construct. There is a wide palette of this type of protein tag removal endopeptidase, namely Factor Xa cleaving the Ile-Glu-Gly-Arg motif, enteropeptidase cleaving the Asp-Asp-Asp-Asp-Lys motif, TEV protease cleaving the Glu-Asn-Leu-Tyr-Phe-Gln-Gly motif, rhinovirus 3C protease cleaving the Leu-Glu-Val-Leu-Phe-Gln-Gly-Pro motif, and several others [29]. Protein tag removal proteases are not discussed further, as they fall outside the scope of this chapter.
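Because tag-removal proteases recognise a defined motif rather than a single residue, locating their sites in a construct is a plain substring search. The sketch below uses the thrombin, enteropeptidase, TEV and rhinovirus 3C motifs listed above in one-letter code. The position of the scissile bond inside each motif (after Arg for thrombin, after Lys for enteropeptidase, between Gln and Gly for TEV and 3C) is not stated in this chapter and is added here from general knowledge; the construct sequence and function names are hypothetical.

```python
# Protease -> (recognition motif, offset of the cut within the motif).
# Offsets are an addition to the text: LVPR|GS, DDDDK|, ENLYFQ|G, LEVLFQ|GP.
TAG_PROTEASES = {
    "thrombin":        ("LVPRGS", 4),
    "enteropeptidase": ("DDDDK", 5),
    "TEV":             ("ENLYFQG", 6),
    "rhinovirus 3C":   ("LEVLFQGP", 6),
}


def find_tag_sites(sequence):
    """Return (protease, cut_position) pairs sorted by position."""
    sites = []
    for name, (motif, offset) in TAG_PROTEASES.items():
        start = sequence.find(motif)
        while start != -1:
            sites.append((name, start + offset))
            start = sequence.find(motif, start + 1)
    return sorted(sites, key=lambda site: site[1])


def cleave(sequence, protease):
    """Split the sequence at every site recognised by one protease."""
    cuts = [pos for name, pos in find_tag_sites(sequence) if name == protease]
    bounds = [0, *cuts, len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical construct: His-tag, TEV site, spacer, thrombin site, payload.
    construct = "MHHHHHHENLYFQGSAKTLVPRGSMK"
    print(find_tag_sites(construct))
    print(cleave(construct, "TEV"))
```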

Finally, it should be noted that reproducible protein cleavage can also be achieved in non-enzymatic reactions mediated by chemical reagents. The chemical reagents most frequently used to cleave peptide bonds are dilute acids, such as hydrochloric acid, formic acid and acetic acid, and other reagents such as cyanogen bromide (CNBr), hydroxylamine and 2-nitro-5-thiocyanobenzoate (NTCB) [30]. Exposure of proteins to dilute acids results in kinetically favored cleavage of peptide bonds at aspartic acid and, with time, at other residues as well, while CNBr cleaves at the less abundant methionine [31]. NTCB is specific towards cysteine, while hydroxylamine cleaves peptide bonds between asparagine and glycine. Generally, chemically mediated cleavage targets peptide bonds of less common amino acids, producing long peptides that are useful in middle-down proteomics [30].

### **4. Technologies for analysis of limited sample amounts**

Given that there is no technology to amplify proteins, as can be done for nucleic acids with the polymerase chain reaction, proteomics has historically faced limitations in the amount of starting material required for success. Traditional proteomics approaches to sample preparation, such as filter-aided sample preparation (FASP), in-gel digestion and in-solution digestion, typically require at least several micrograms of protein, which can be difficult to obtain from representative clinical samples that are by default limited in availability. Therefore, traditional proteome analysis has generally produced knowledge of the underlying biology that reflects averages over mixtures of the different cell types present in tissue.

As proteomics and the requisite mass spectrometry instrumentation have evolved, microscale proteomic pipelines that decrease the amount of protein required to sub-microgram levels have become available. Microscale proteomics pipelines rely on modifications of traditional proteomics pipelines frequently

**19**

**Figure 3.**

*Trends in Sample Preparation for Proteome Analysis DOI: http://dx.doi.org/10.5772/intechopen.95962*

difficult across laboratories worldwide.

accompanied with cell sorting, laser capture tissue microdissection (LCM) or single cell extraction methods. Microdevices such as nano-capillary columns, microfluidic chips, miniaturised ESI introduction interfaces and miniaturised enzyme reactors are often required [32]. Introducing microscale proteomics provides a clearer picture of reality as it substantially increases sensitivity, spatial proteome resolution and leads to better understanding of how protein networks coincide on microscopic level. Despite obvious benefits, microscale proteomics still requires special instrumentation making implementation of these protocols for the moment some what

One promising recent technology is nanoPOTS (nanodroplet processing in one pot for trace samples) (**Figure 3A**). The nanoPOTS platform is intended for processing small cell populations in nanoliter volumes. Downscaling the processing volume substantially reduces surface-associated sample losses. The final step of nanoPOTS is solid phase extraction (SPE), which concentrates and desalts the sample and efficiently introduces it into the nanoLC fluidics. Recently, a modification of nanoPOTS termed microPOTS was reported as a more readily adoptable variant that does not require a robotic platform [33]. NanoPOTS has been reported to identify >3000 proteins from 10 cultured mammalian cells, while microPOTS has been reported to reproducibly identify up to 1200 and 1800 proteins from 25 HeLa cells and a 50 µm square of mouse liver tissue, respectively [33]. Several nanoPOTS modifications have been reported since it was introduced. For example, Zhu et al. claim that a combination of nanoPOTS with fluorescence-activated cell sorting (FACS) could detect 670 protein groups from a single mammalian cell [34]. Later, a combination of nanoPOTS, nanoLC separation operated at 20 nL/min and an Orbitrap Eclipse Tribrid mass spectrometer led to a further slight increase in sensitivity, identifying ~1000 protein groups from a single HeLa cell [35]. Its extraordinarily low sample requirements make nanoPOTS well suited to LC–MS/MS tissue imaging. Spatially resolved proteomic maps of a mouse blastocyst embedding into placenta have been produced using a combination of nanoPOTS and LCM. The nanoPOTS-LCM combination produced quantitative tissue images for >2000 proteins with 100-μm spatial resolution, which substantially outperformed classical protein imaging mass spectrometry (IMS) [36]. The versatility of nanoPOTS is documented in several publications reporting results from thin sections of pancreas, liver and brain tissue as well as from plant samples.

#### **Figure 3.**

*Modern approaches to proteomic sample preparation from limited material. (A) NanoPOTS: a sample preparation protocol that uses an automated robotic platform operating with nanoliter volumes. The sample is processed on a nano-well patterned slide. Sample preparation is based on the principles of classical in-solution protein digestion. The protein digest is then transferred into an SPE cartridge, where peptides are desalted and concentrated. The peptides are then separated and analysed by mass spectrometry. (B) SCoPE-MS: a single-cell proteome analysis platform. A carrier proteome is used to overcome sample losses due to peptide adsorption to surfaces. TMT labelling distinguishes the carrier and analysed proteomes. It can also serve for relative quantification of the compared proteomes (SCoPE-MS2). The presence and quantity of a protein in the investigated sample are determined from reporter ion intensity.*

Sub-microgram detection limits have also been reached by introducing a carrier proteome, which decreases adsorptive losses of the proteome of interest, in combination with TMT labelling (**Figure 3B**). The carrier proteome spike-in helps the method known as Single Cell ProtEomics by Mass Spectrometry (SCoPE-MS) to overcome extensive losses due to adsorption of proteins to surfaces (e.g. LC columns), while TMT labelling distinguishes the carrier from the analysed proteomes. Moreover, TMT labels enable relative protein quantitation of multiple samples or conditions in a single LC–MS run. The SCoPE-MS approach has enabled detection of >1000 proteins from a single mouse embryonic stem cell [37]. Specht et al. further exploited the quantitative potential of TMT labels and claimed to reproducibly quantitate >1000 proteins in a SCoPE-MS experiment investigating the heterogeneity of differentiating monocytes [38].
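The idea of reading relative quantities from TMT reporter ions while setting the carrier channel aside can be sketched as follows. This is a simplified illustration and not the published SCoPE-MS data processing: the channel labels, the intensities and the function name are hypothetical, and real pipelines additionally handle isotopic impurities, normalisation across peptides and missing values.

```python
def relative_quantities(reporter_intensities, carrier_channel):
    """Scale non-carrier reporter ion intensities so that they sum to 1.

    `reporter_intensities` maps a TMT channel label to the reporter ion
    intensity measured for one peptide; the carrier channel is excluded
    because it only boosts the signal and is not a sample of interest.
    """
    samples = {
        channel: intensity
        for channel, intensity in reporter_intensities.items()
        if channel != carrier_channel
    }
    total = sum(samples.values())
    if total == 0:
        # No reporter signal in any sample channel.
        return {channel: 0.0 for channel in samples}
    return {channel: intensity / total for channel, intensity in samples.items()}


if __name__ == "__main__":
    # Made-up intensities: channel "126" holds the carrier proteome,
    # the remaining channels hold individual cells.
    peptide = {"126": 20000.0, "127N": 120.0, "127C": 80.0, "128N": 200.0}
    print(relative_quantities(peptide, carrier_channel="126"))
    # {'127N': 0.3, '127C': 0.2, '128N': 0.5}
```

In this toy case the carrier dominates the total signal, which is what makes the peptide identifiable, while the comparison between cells relies only on the much weaker sample channels.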

On-column digestion in an immobilised enzyme reactor (IMER) reduces sample requirements down to the sub-microgram level, especially when combined with a miniaturised column diameter. Nanostructured materials such as nanoporous materials, nanoparticles, nanofibers and nanotubes have proved successful as supports in IMER nanobiocatalysis, because they stabilise the enzyme and increase the apparent enzyme activity per unit mass of the immobilisation host [39]. Several sub-microgram proteomic setups combining IMER with downstream microfluidic platforms have been reported [40–42].

The microfluidic platform termed open tubular lab-on-column combines LysC and trypsin digestion on a 20 µm inner diameter (ID) column with an on-line nanoLC–MS/MS system. It benefits from the very narrow ID of the capillary and IMER column, which prevents excessive peptide dilution and adsorption to the fluidics. The authors detected the biomarker Axin 1 in 10 ng of HCT15 colon cancer cells [40]. Huang et al. characterised 348 proteins from 25 mouse blastocysts on a platform termed SNaPP, which couples enzymatic digestion on a 150 µm ID IMER to nanofluidics [41]. Naldi et al. coupled an SCX column-based IMER proteomic reactor to a nano-proteomic platform capable of protein capture, reduction, alkylation, digestion and first-dimension SCX peptide pre-separation followed by LC–MS/MS. These authors claim that the platform performs with as little as 200 ng of protein starting material [42]. Moreover, the integrated Proteome Analysis Device (iPAD) couples a 10-port valve, a digestion loop and an SPE trap column in a microfluidic setup intended for micro sample preparation prior to mass spectrometry. The authors claim that the iPAD approach is capable of identifying 813 proteins in approximately 100 Dukes' type C colorectal adenocarcinoma cells [43].
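Why a narrow column ID limits peptide dilution can be illustrated with the usual rule of thumb that, for the same injected amount and the same linear flow velocity, the peak concentration reaching the detector scales with the inverse square of the column ID. The sketch below applies this approximation; the 75 µm reference column is an assumed, illustrative value and is not taken from the cited studies.

```python
def concentration_gain(reference_id_um, narrow_id_um):
    """Approximate gain in peak concentration when moving to a narrower column.

    Assumes equal injected amount, equal linear velocity and comparable
    column efficiency, so that dilution scales with the cross-sectional
    area of the column, i.e. with the square of the inner diameter.
    """
    if reference_id_um <= 0 or narrow_id_um <= 0:
        raise ValueError("inner diameters must be positive")
    return (reference_id_um / narrow_id_um) ** 2


if __name__ == "__main__":
    # Hypothetical comparison: 75 µm reference column versus a 20 µm column.
    gain = concentration_gain(reference_id_um=75.0, narrow_id_um=20.0)
    print(f"approximate concentration gain: {gain:.1f}x")
```

Under these assumptions the narrower column delivers the same peptide amount at roughly 14 times the concentration, which indicates why miniaturised formats are attractive for sub-microgram samples.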

Capillary electrophoresis (CE) is an efficient and sensitive separation technique that reliably resolves proteins and peptides. Historically, it has been less robust than nanoLC, but this has recently begun to change. Specifically, the introduction of CE-ESI interfaces that do not cause excessive peptide dilution has made CE-MS applicable in microproteomics [44]. Several reports describe proteomic pipelines coupling CE to MS. An ultrasensitive electrokinetically pumped nanospray ionization source coupled with CE identified 283 proteins from 80 ng of MCF7 breast cancer cells. Moreover, the detection limit for angiotensin II spiked into a bovine serum albumin digest was 2 attomole per injection [45]. Although animal proteomics does not fall within the scope of this chapter, it is worth mentioning that CE-MS allowed analysis of as little as 50 ng of *Xenopus laevis* eggs in a single protein extract. The authors of this study used a linear polyacrylamide coating and a sulfonate-silica hybrid strong cation exchange monolith for SPE followed by CE-MS [46]. Combining SPE with CE in a 2D manner is a promising direction for the future development of microscale CE-MS proteomics.

### **5. Conclusions and future perspectives**

Proteomic approaches for identifying clinically relevant proteins are widely used in scientific research. Sample preparation is considered one of the key steps of the analysis, and a variety of protocols have therefore been used to minimize variability and to obtain the best sensitivity and protein recovery from the material.

Technologies that could be applied in a medical context and potentially used for screening patient samples have been developed at an increasing pace in recent years. This technological evolution has also yielded platforms for proteome screening of limited cell numbers, and some technologies have clearly demonstrated success at the single-cell level. Cellular heterogeneity arises during tumour development and can confound analysis. Therefore, advancing the tools for profiling cellular subpopulations or regions of tumours has great potential to provide novel insight into the mechanisms of tumour growth. Moreover, integrating these tools with machine learning algorithms to discover and map molecules that manifest pathological development will likely lead to a better understanding of the mechanisms of oncogenesis and potentially uncover therapeutic targets.

### **Acknowledgements**

This work was supported by the International Centre for Cancer Vaccine Science, carried out within the International Research Agendas program of the Foundation for Polish Science, co-financed by the European Union under the European Regional Development Fund. The University of Victoria-Genome BC Proteomics Centre is grateful to Genome Canada and Genome British Columbia for financial support through Genomics Technology Platforms (GTP) funding for operations and technology development (264PRO).

