
Two-Dimensional Gel Electrophoresis as an Information Base for Human Proteome

http://dx.doi.org/10.5772/intechopen.75125

Electrophoresis - Life Sciences Practical Applications

expression of a single gene. Each proteoform is a chemically clearly defined molecule. These molecules differ because of genetic variation, alternatively spliced RNA transcripts and posttranslational modifications [5, 7]. Accordingly, the term protein refers to its coding gene and therefore becomes an umbrella term for all the proteoforms/protein species derived from it [6]. Sometimes the term "proteoform" is also used to describe structural variants of proteins [8]. However, this makes the terminology of proteomics even more complicated and confusing, because even within the abovementioned definition the field of proteoforms is very broad and could encompass many billions of components [9–12]. For instance, all combinations of the 30 known modifications of histone H3 alone could theoretically produce more than 1 billion proteoforms [10, 13]. Because of this variety, the huge range of concentrations (7–8 orders of magnitude in blood plasma) and dynamic changes during the life cycle, the identification, quantitation and database organization of proteoforms are a serious challenge. Nevertheless, there is evident progress in this area. So far, the main workhorse in proteomics has been bottom-up mass spectrometry, but the top-down approach is becoming pre-eminent today [14]. Top-down proteomics implies that mass spectrometry is applied at the proteoform level, which preserves the intramolecular complexity during analysis and makes it possible to acquire information that might be overlooked in bottom-up shotgun workflows [14, 15]. But top-down proteomics cannot be just a one-step procedure. Several approaches based on protein separation are also involved in proteoform analysis, and among them two-dimensional gel electrophoresis (2DE) occupies a special place. Accordingly, different schemes could be used to establish a basis for a comprehensive knowledge base for protein/proteoform inventory.

### 2. The principles of protein separation by 2DE

Application of different electrophoretic separation methods in two-dimensional combinations has a long history [16–23]. Continuous improvement of electrophoretic techniques and the development of new support media for protein separation gave progressively better resolution. Finally, the best combination was chosen and optimized by O'Farrell in 1975 [24]. The method is based on separation under completely denaturing conditions according to two independent parameters (pI and Mw). Isoelectric focusing (IEF), which separates proteins according to differences in their isoelectric points (pI), is used as the first dimension. High resolution according to protein size or mass (Mw) in the second dimension is achieved by SDS polyacrylamide gel electrophoresis (SDS-PAGE) [23]. The method was quickly accepted by the scientific community as the most powerful approach for separating complex protein samples [25, 26]. Besides its high resolution, an important feature of this method is that all stages of separation are performed under denaturing conditions. As a result, in complex mixtures such as cell extracts, all proteins or proteoforms are separated according to the basic parameters (pI and Mw) of their primary chemical structure (Figure 1). This is very important, as it allows not only separating proteoforms but also experimentally determining their pI and Mw. These parameters can also be calculated from information about the chemical structure of the proteoforms. Accordingly, two-dimensional separation of proteoforms can be performed not only experimentally but also virtually [27–29] (Figure 2).

Figure 1. 2DE map of HepG2 proteins. Spots with proteins identified by MALDI TOF-MS are annotated. Sections of the 2D gel selected for subsequent LC ESI-MS/MS analysis are shown. Adapted from Naryzhny et al. [28].

### 3. Information produced by 2DE gels

High-resolution separation of proteoforms is the main function of 2DE, but additional steps are needed to obtain useful information. Protein identification is the next essential step. Immunological methods, being the most specific approaches, were mainly used for this purpose. They are still successfully used in 2DE-based proteomics for protein identification (Western blot) (Figure 3).

Figure 2. A virtual 2DE map of proteins coded by human chromosome 18.

Figure 3. Multiple spots of PCNA (proteoforms) can be detected by 2DE. After 2DE separation and transfer to an Immobilon-P membrane, immunostaining was performed using the PCNA-specific antibody PC10. A, M, B, T and D denote the acidic, main, basic, probable trimer and probable dimer forms of PCNA, respectively. U1 and U2 are ubiquitinated forms of the PCNA monomer. H is a cluster of PCNA hydrolysis products produced by proteasome action [30]. (A) 2DE (first dimension: pH 4–7, 13 cm; second dimension: 10% SDS-PAGE, 13 cm). (B) 2DE (first dimension: pH 4–5, 18 cm; second dimension: 10% SDS-PAGE, 13 cm). Adapted from Naryzhny [31].

To study protein-protein interactions, the so-called Far-Western blot is used (Figure 4) [32, 33]. After 2DE separation, the proteins are transferred to a membrane and treated with a series of buffers whose washes allow the "prey" proteins to denature and renature [32–34]. The membrane is then blocked and probed with a purified "bait" protein, that is, the protein whose binding partners ("preys") are to be detected. If the bait binds a prey protein, the bait is detected on the spot where that prey is located. The bait protein can be detected either by labeling it or with antibodies [32, 33]. This technique is very informative, but it is also laborious and not convenient for large-scale analysis. The situation improved radically when mass spectrometry became a central element of proteomic analysis [35, 36].

The MS approach to protein identification, which searches for the best match between the masses of peptides produced by specific hydrolysis and the peptide masses calculated from theoretical cleavage of proteins, was developed simultaneously by several groups [37–41]. In this approach, the peptide mass sets are acquired by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). The approach was named peptide mass fingerprinting (PMF). For a long time, PMF was the most powerful technique for high-throughput protein identification.

Using more sophisticated MS instruments (especially ESI LC–MS/MS), it was revealed that, depending on the gel resolution, 2DE spots often contain more than a single protein, especially in the case of mammalian cells [29, 42]. Accordingly, protein quantitation became ambiguous, as spot densitometry cannot be used for accurate proteoform quantitation. Fortunately, special MS-based quantitative approaches can be applied in this case. In addition, because only the proteins detected as stained spots are analyzed, much information is missed. This problem can be solved only by analyzing all parts of the gel [28, 43]. The general scheme of this approach is shown in Figure 1. The main steps are as follows:

1. 2DE separation.

2. Staining the gel with Coomassie R350.

3. Scanning the resulting image (2DE map).


4. Analysis and calibration of the 2DE map with Image Master 2DE Platinum (GE Healthcare, Pittsburgh, PA, USA). Calibration was performed according to the positions of several major, previously identified protein spots: cytoplasmic actin (ACTB\_HUMAN, pI 5.29/Mw 42,052), 78 kDa glucose-regulated protein (GRP78\_HUMAN, pI 5.07/Mw 72,333), tropomyosin alpha-3 chain (TPM3\_HUMAN, pI 4.68/Mw 32,950), stathmin 1 (STMN\_HUMAN, pI 5.76/Mw 17,292) and alpha-enolase (ENOA\_HUMAN, pI 7.01/Mw 47,481).

5. Division of the gel into 96 sections (numbered 1–12 along the Mw dimension and A–H along the pI dimension). Based on the calibration, each section is assigned pI/Mw coordinates.

6. Digestion of each section with trypsin according to the sample preparation protocol for ESI LC–MS/MS.

7. Analysis of the tryptic peptides obtained from each 2DE section with an Orbitrap Q-Exactive mass spectrometer.

8. Protein identification and relative quantification using Mascot 2.4.1 and the exponentially modified protein abundance index (emPAI).
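Steps 4 and 5 (calibrating the map against reference spots, then dividing it into 96 sections with pI/Mw coordinates) can be illustrated with a minimal sketch. It is not how Image Master performs calibration; it assumes that pI varies linearly with horizontal position (as on a linear IPG strip) and that log10(Mw) varies linearly with vertical migration in SDS-PAGE, and it fits both by ordinary least squares. The reference pI/Mw values are those listed in step 4, but the 800 × 800 px image geometry and the spot pixel positions are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RefSpot:
    """A previously identified reference spot (positions in image pixels)."""
    name: str
    x: float   # horizontal position along the pI axis (hypothetical pixels)
    y: float   # vertical position along the Mw axis (hypothetical pixels)
    pi: float
    mw: float


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = a * xs + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


class GelCalibration:
    """Assumes pI is linear in x and log10(Mw) is linear in y."""

    def __init__(self, refs):
        self.pi_a, self.pi_b = fit_line([r.x for r in refs], [r.pi for r in refs])
        self.lmw_a, self.lmw_b = fit_line(
            [r.y for r in refs], [math.log10(r.mw) for r in refs])

    def pi_at(self, x):
        return self.pi_a * x + self.pi_b

    def mw_at(self, y):
        return 10 ** (self.lmw_a * y + self.lmw_b)


def section_label(x, y, width, height, n_pi=8, n_mw=12):
    """Map a pixel to a section label: letters A-H along pI, numbers 1-12 along Mw."""
    col = min(int(x / width * n_pi), n_pi - 1)
    row = min(int(y / height * n_mw), n_mw - 1)
    return f"{chr(ord('A') + col)}{row + 1}"


def section_coordinates(label, cal, width, height, n_pi=8, n_mw=12):
    """pI/Mw assigned to a section, taken at the section's centre."""
    col = ord(label[0]) - ord("A")
    row = int(label[1:]) - 1
    x_centre = (col + 0.5) * width / n_pi
    y_centre = (row + 0.5) * height / n_mw
    return cal.pi_at(x_centre), cal.mw_at(y_centre)


# Reference pI/Mw values as listed in step 4.
REFERENCES = [("ACTB_HUMAN", 5.29, 42052), ("GRP78_HUMAN", 5.07, 72333),
              ("TPM3_HUMAN", 4.68, 32950), ("STMN_HUMAN", 5.76, 17292),
              ("ENOA_HUMAN", 7.01, 47481)]


def hypothetical_position(pi, mw):
    """Made-up geometry of an 800 x 800 px image (pH 4-8, 10-100 kDa)."""
    return 200.0 * (pi - 4.0), 800.0 * (5.0 - math.log10(mw))


refs = [RefSpot(name, *hypothetical_position(pi, mw), pi, mw)
        for name, pi, mw in REFERENCES]
cal = GelCalibration(refs)
print(section_label(400, 400, 800, 800))  # E7
print(section_coordinates("E7", cal, 800, 800))
```

With real data, the pixel positions would come from the scanned 2DE map. A nonlinear IPG strip would need a piecewise rather than a single linear pI fit.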

Figure 4. Identification of PCNA-binding proteins in normal and cancer cells by 2DE-Far-Western blotting. Protein extracts from HMEC (A and B) or MDA-MB468 (C and D) were separated by 2DE, transferred onto a membrane, renatured and incubated with PCNA. PCNA-binding proteins were then detected with the PC10 monoclonal anti-PCNA antibody. (B and D) Proteins separated by 2DE were stained with Coomassie R350, protein spots corresponding to signals on the membrane (A and C) were identified, cut out, and the identity of the proteins was determined by mass spectrometry. Adapted from Naryzhny et al. [32].
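The peptide mass fingerprinting idea described earlier in this section can be shown in a few lines: measured peptide masses are matched against masses calculated from theoretical cleavage of database proteins. This is a simplified illustration, not Mascot's or any published scoring scheme. It assumes that trypsin cleaves after K/R except before P, uses unmodified monoisotopic residue masses (no carbamidomethylation or other modifications), and ranks candidates only by the number of matched peaks. The protein names and sequences in the usage example are made up.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_digest(seq, missed_cleavages=0):
    """Cleave after K/R unless the next residue is P (simplified trypsin rule)."""
    sites = [i + 1 for i, aa in enumerate(seq[:-1])
             if aa in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


def peptide_mh(peptide):
    """Singly protonated monoisotopic peptide mass [M+H]+."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def pmf_rank(observed_mh, database, tol_ppm=50.0, missed_cleavages=1):
    """Rank proteins by how many observed peaks match their in silico peptides."""
    scores = []
    for name, seq in database.items():
        theoretical = [peptide_mh(p) for p in tryptic_digest(seq, missed_cleavages)]
        matched = sum(
            1 for m in observed_mh
            if any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical))
        scores.append((name, matched))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Usage with made-up sequences and simulated "observed" peaks.
db = {"PROT1": "AAKPGGRSSK", "PROT2": "GGGGR"}
peaks = [peptide_mh("SSK"), peptide_mh("AAKPGGR")]
print(tryptic_digest("AAKPGGRSSK"))  # ['AAKPGGR', 'SSK']
print(pmf_rank(peaks, db))           # [('PROT1', 2), ('PROT2', 0)]
```

Real search engines add probability-based scoring, modifications and mass-range limits. When a spot contains several proteins, which is common in mammalian samples, a single match count becomes ambiguous.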



In total, up to 500 unique proteins were identified in each section of the 2DE gel after separation of proteins from a glioblastoma cell extract (Figure 1). All proteins detected in the same section were assigned the pI/Mw coordinates of that section.

Accordingly, the same protein detected in different sections was considered to represent different proteoforms. About 20,000 proteoforms encoded by ~4000 genes were identified using this approach [28, 43]. Additionally, 3D graphs representing the gene-centric expression of proteoforms can be generated (Figure 5), in which the proteoform profile of each gene can be observed.

Figure 5. Examples of 3D graphs showing the distribution of proteoforms between different sections of the 2DE map. The semiquantitative (emPAI-based) distribution of the same protein (gene) across the different gel sections was plotted. Proteins that are potential biomarkers are shown. Reproduced with permission from Naryzhny et al. [43].

Because only 96 sections from a small gel (8 cm × 8 cm) were analyzed, the results could be improved significantly by increasing the gel size, the sample loading and the number of sections. This approach also solves the problem of staining sensitivity. However, some issues still need to be tackled. One of them is resolution, which drops considerably when the gel is cut into large sections. Ideally, the sections should be close in size to the smallest spots. In that case, however, the number of samples increases dramatically (to around 1000). Because each sample requires a certain processing time by ESI LC–MS/MS (usually at least 0.5 h), this amounts to 1000 × 0.5 h ≈ 500 h or more, that is, about 1 month of continuous Orbitrap operation to analyze the samples from a single 2DE gel.

Another issue is proteoform quantitation, for which several options are available. The exponentially modified protein abundance index (emPAI) [28, 43, 44] is not ideal, since it gives only a relative and not very accurate estimate of protein content. Another option is MaxQuant, which has recently become possibly the most frequently used platform for mass spectrometry (MS)-based proteomics data analysis [45]. Since its release in 2008, it has been improved substantially [45, 46]. Selected reaction monitoring (SRM) and isotope-coded affinity tags (ICAT) are other excellent examples of the power of MS technology in protein quantitation [47–49]. Further approaches have been described and reviewed elsewhere [50, 51].
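The bookkeeping described above can be sketched as follows. Each (gene, section) pair is counted as one proteoform, its abundance is estimated by emPAI, and the instrument time needed for one LC–MS/MS run per section is estimated. The sketch uses the usual emPAI definition, 10^(N_observed/N_observable) − 1, where N_observed is the number of observed peptides and N_observable is the number of theoretically observable peptides. The input rows are made up, and the rule of keeping the larger value for duplicated rows is an assumption of this sketch.

```python
from collections import defaultdict


def empai(n_observed, n_observable):
    """emPAI = 10**(N_observed / N_observable) - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


def proteoform_profiles(identifications):
    """Group identifications into gene -> {section: emPAI}.

    Each (gene, section) pair counts as one proteoform; if a pair appears
    twice (hypothetical duplicate rows), the higher emPAI is kept.
    """
    profiles = defaultdict(dict)
    for gene, section, n_obs, n_observable in identifications:
        value = empai(n_obs, n_observable)
        profiles[gene][section] = max(value, profiles[gene].get(section, 0.0))
    return dict(profiles)


def ms_days(n_samples, hours_per_sample):
    """Continuous instrument time, in days, for one LC-MS/MS run per sample."""
    return n_samples * hours_per_sample / 24


# Made-up rows: (gene, section, observed peptides, observable peptides)
rows = [("PCNA", "C7", 6, 12), ("PCNA", "D7", 2, 12), ("ACTB", "D4", 15, 20)]
profiles = proteoform_profiles(rows)
n_proteoforms = sum(len(sections) for sections in profiles.values())
print(len(profiles), n_proteoforms)  # 2 3
print(round(ms_days(1000, 0.5), 1))  # 20.8
```

The per-gene dictionaries correspond to the section-by-section profiles plotted in Figure 5. Because emPAI depends only on peptide counts, the values are semiquantitative, which matches the limitation noted above.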

### 4. Databases based on 2DE gels

The overall approach has been used to generate annotated 2DE gel databases for many cell types. The first 2DE database, SWISS-2DPAGE, was launched in 1993 and is maintained by the Central Clinical Chemistry Laboratory of the Geneva University Hospital and the Swiss Institute of Bioinformatics (SIB). This database is now part of World-2DPAGE (http://world-2dpage.expasy.org/list/), a dynamic portal for simultaneously querying gel-based proteomics databases worldwide. Together, these databases contain over 250 maps for 23 species, totaling nearly 40,000 identified spots, which makes this the largest gel-based proteomics dataset accessible from a single interface. In this portal, a 2DE map can be selected and displayed for inspection. The database can be queried by keywords (protein description, protein name, gene name, species, author, full text, protein spot serial number) or graphically by clicking on a spot. Each spot is linked to a page containing the corresponding gene (protein) information and identification details. The page also lists other spots in different maps where the product of the same gene is detected. All these spots are highlighted in the maps, and the calculated parameters [isoelectric point (pI) and molecular weight (Mw)] are displayed. Cross-references make it possible to obtain more information from other 2DE databases and from UniProtKB. UniProtKB, a comprehensive protein sequence knowledge base, has two sections: UniProtKB/Swiss-Prot, which is manually curated, and UniProtKB/TrEMBL, which contains computer-annotated entries. UniProtKB/Swiss-Prot entries provide users with cross-links to about 100 external databases and with access to additional information or tools [52].

### 5. Collection of experimental and theoretical data on the platform of 2DE gel

Proteomes can be retrieved from experimental data or generated by available programs that calculate theoretical protein parameters [27, 53, 54]. Because 2DE separates polypeptides according to their basic parameters (pI/Mw), this separation principle makes it possible to organize crosstalk between experimental and theoretical data. In a simple form, this approach was realized in the abovementioned 2DE databases. However, the underlying assumption that each spot contains only one protein works well only with simple proteomes (for instance, mycoplasma), where the number of proteoforms is much smaller than in mammalian cells [11, 12, 55, 56]. A European proteome database focused on pathogenic microorganisms was also launched. Now called "The Proteome 2D-PAGE Database," it contains 13,893 identified spots and 3245 mass peak lists in 57 reference maps, representing experiments from 26 different organisms and strains. The database provides protein information such as ORF name, predicted isoelectric point (pI) and molecular weight (Mw), several protein identifiers, identification method and sequence coverage [57]. This database also contains information about mammalian cells; therefore, in the case of these cells, multiple proteins can be attributed to the same spot.

Figure 6. Relationship between the theoretically (in silico) determined pI (left) or Mw (right) of proteins (canonical forms) and the experimentally detected pI/Mw of proteoforms. Top: HepG2 cells; bottom: human depleted plasma. Adapted from Naryzhny et al. [29].
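The theoretical side of this crosstalk can be sketched by computing pI and Mw from a protein sequence and placing each protein on a virtual 2DE map. Mw is taken as the sum of average residue masses plus one water. pI is found by bisection on the Henderson-Hasselbalch net charge. The pKa set below is an assumption (one common textbook-style set). Dedicated tools such as ExPASy Compute pI/Mw use different, position-dependent pK values, so their results will differ somewhat. The sketch also ignores posttranslational modifications, which is one reason experimentally observed proteoforms can deviate from the canonical in silico values. The toy sequences are made up.

```python
# Average residue masses (Da); a chain's mass is their sum plus one water.
AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01528

# Assumed terminal and side-chain pKa values (one common textbook-style set).
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def average_mass(seq):
    """Average Mw of an unmodified polypeptide chain."""
    return sum(AVG[aa] for aa in seq) + WATER_AVG


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of an unmodified chain at a given pH."""
    pos = 1 / (1 + 10 ** (ph - PKA_POS["Nterm"]))
    pos += sum(seq.count(aa) / (1 + 10 ** (ph - PKA_POS[aa])) for aa in "KRH")
    neg = 1 / (1 + 10 ** (PKA_NEG["Cterm"] - ph))
    neg += sum(seq.count(aa) / (1 + 10 ** (PKA_NEG[aa] - ph)) for aa in "DECY")
    return pos - neg


def isoelectric_point(seq, tol=1e-6):
    """Bisection for the pH of zero net charge (charge decreases with pH)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def virtual_2de(proteome):
    """Map name -> sequence to name -> (pI, Mw) points of a virtual 2DE map."""
    return {name: (isoelectric_point(s), average_mass(s))
            for name, s in proteome.items()}


# Made-up toy sequences, for illustration only.
spots = virtual_2de({"acidic_toy": "DDDDGG", "basic_toy": "KKKKGG"})
for name, (pi, mw) in spots.items():
    print(f"{name}: pI {pi:.2f}, Mw {mw:.1f}")
```

Plotting such in silico points against the pI/Mw assigned to gel sections gives the kind of comparison shown in Figure 6.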
