98 Electrophoresis - Life Sciences Practical Applications

be observed. Considering that only 96 sections from a small gel (8 cm × 8 cm) were taken for analysis, the situation can be improved significantly by increasing the gel size, the sample loading, and the number of sections. This approach also solves the problem of staining sensitivity. However, some issues still need to be tackled. One of them is resolution, which drops considerably if the gel is cut into large sections. Ideally, the size of the sections should be close to the size of the smallest spots. In this case, however, the number of samples increases dramatically (to around 1000). Because processing by ESI LC–MS/MS usually takes at least 0.5 h per sample, about 1 month of continuous work on the Orbitrap would be needed to analyze the samples from a single 2DE gel. Another issue is proteoform quantitation, for which several options are available. The exponentially modified protein abundance index (emPAI) [28, 43, 44] is not ideal, since it gives only a relative and not very accurate estimate of the protein content. Another option is MaxQuant, which has recently become possibly the most frequently used platform for mass-spectrometry (MS)-based proteomics data analysis [45]. Since its release in 2008, it has been improved substantially [45, 46]. Selected reaction monitoring (SRM) and isotope-coded affinity tag (ICAT) are other excellent examples of the power of MS technology in protein quantitation [47–49]. There are also more approaches that are described and reviewed [50, 51].

4. Databases based on 2DE gels

The overall approach has been used to generate annotated 2DE gel databases for many cell types. The first 2DE database, SWISS-2DPAGE, was launched in 1993 and is maintained by the Central Clinical Chemistry Laboratory of the Geneva University Hospital and the Swiss Institute of Bioinformatics (SIB). This database is now part of World-2DPAGE (http://world-2dpage.expasy.org/list/), a dynamic portal for simultaneously querying gel-based proteomics databases worldwide. Together, these databases comprise over 250 maps for 23 species, totaling nearly 40,000 identified spots, which makes this the biggest gel-based proteomics dataset accessible from a single interface. Here, a 2DE map can be selected and displayed for inspection. The database can be queried by keywords (protein description, protein name, gene name, species, author, full text, protein spot serial number) or graphically by clicking on a spot. Each spot is linked to a page containing the corresponding gene (protein) information and identification details. Information is also displayed about spots in other maps where a product of the same gene is detected. All these spots are highlighted in the maps, and the calculated parameters [isoelectric point (pI) and molecular weight (Mw)] are displayed. Cross-references make it possible to obtain more information from different 2DE databases and from UniProtKB. UniProtKB, a comprehensive protein sequence knowledge base, has two sections: UniProtKB/Swiss-Prot, which is manually curated, and UniProtKB/TrEMBL, which contains computer-annotated entries. UniProtKB/Swiss-Prot entries provide users with cross-links to about 100 external databases and with access to additional information or tools [52].

5. Collection of experimental and theoretical data on the platform of 2DE gel

The proteomes can be retrieved from experimental data or generated by available programs that calculate theoretical protein parameters [27, 53, 54]. Because polypeptides are separated in 2DE according to their basic parameters (pI/Mw), experimental and theoretical data can be linked through these same parameters. In a simple form, this approach was realized in the abovementioned 2DE databases. The main idea behind it, that each spot contains only one protein, works well only with simple proteomes (for instance, mycoplasma), where the number of proteoforms is not as large as in mammalian cells [11, 12, 55, 56]. A European proteome database focused on pathogenic microorganisms was launched. Now, under the name "The Proteome 2D-PAGE Database," it contains 13,893 identified spots and 3245 mass peak lists in 57 reference maps representing experiments from 26 different organisms and strains. The database provides protein information such as ORF name, predicted isoelectric point (pI) and molecular weight (Mw), several protein identifiers, identification method, sequence coverage, and so on [57]. This database contains information about mammalian cells as well; for these cells, multiple proteins can therefore be attributed to the same spot.
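As an illustration of how theoretical parameters of this kind can be calculated from a sequence, the following minimal Python sketch may help. It is not the method of the programs cited in [27, 53, 54]. The function names are hypothetical, the residue masses are commonly tabulated average values, and the pKa set is an assumption chosen for illustration (published pKa sets differ, and the predicted pI shifts with the choice). Mw is taken as the sum of residue masses plus one water molecule, and pI as the pH at which the computed net charge is zero, found by bisection.

```python
# Average residue masses in Da (commonly tabulated values, approximate).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per chain (free N- and C-termini)

# ASSUMED pKa values, for illustration only; published sets differ.
PKA_POSITIVE = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_term": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def theoretical_mw(sequence):
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def net_charge(sequence, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    counts_pos = {"N_term": 1, **{aa: sequence.count(aa) for aa in "KRH"}}
    counts_neg = {"C_term": 1, **{aa: sequence.count(aa) for aa in "DECY"}}
    positive = sum(n / (1 + 10 ** (ph - PKA_POSITIVE[g]))
                   for g, n in counts_pos.items())
    negative = sum(n / (1 + 10 ** (PKA_NEGATIVE[g] - ph))
                   for g, n in counts_neg.items())
    return positive - negative


def theoretical_pi(sequence, tol=1e-4):
    """pH at which the net charge is zero, located by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2
```

Bisection is sufficient here because the net charge decreases monotonically with pH. Values computed this way describe only the unmodified canonical sequence, so they give the reference position of a protein on the gel rather than the positions of its modified proteoforms.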

Figure 6. Relationship between theoretically (in silico) determined pI (left) or Mw (right) of proteins (canonical forms) and experimentally detected pI/Mw of proteoforms. Top – HepG2 cells, bottom – human depleted plasma. Adapted from Naryzhny et al. [29].
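The comparison shown in Figure 6 amounts to computing, for each proteoform, how far its experimental pI and Mw lie from the values predicted for the canonical form. A minimal Python sketch of that calculation follows. The function name `describe_shift` and both tolerance values are hypothetical choices made for illustration, not parameters taken from the cited work.

```python
def describe_shift(theoretical_pi, experimental_pi,
                   theoretical_mw, experimental_mw,
                   pi_tol=0.2, mw_rel_tol=0.1):
    """Label how a proteoform deviates from its canonical (theoretical) form.

    pi_tol (pH units) and mw_rel_tol (fraction of theoretical Mw) are
    ASSUMED tolerances within which a proteoform counts as "as predicted".
    """
    d_pi = experimental_pi - theoretical_pi
    d_mw_rel = (experimental_mw - theoretical_mw) / theoretical_mw

    if abs(d_pi) <= pi_tol:
        pi_label = "pI as predicted"
    elif d_pi < 0:
        pi_label = "more acidic than predicted"
    else:
        pi_label = "more basic than predicted"

    if abs(d_mw_rel) <= mw_rel_tol:
        mw_label = "Mw as predicted"
    elif d_mw_rel < 0:
        mw_label = "lighter than predicted"
    else:
        mw_label = "heavier than predicted"

    return {"delta_pi": d_pi, "delta_mw_rel": d_mw_rel,
            "pi": pi_label, "mw": mw_label}
```

The Mw difference is expressed relative to the theoretical value so that the same tolerance can be applied to small and large polypeptides.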

An attempt to manage theoretical proteomes and link them with experimental, 2DE-based data was made in another online database, DynaProt 2D [58]. It was developed for dynamic access to proteomes and 2DE gels. Here, a 2DE gel could serve both as a reference map and as a tool for navigating the database [58]. A complete theoretical proteome integrated into a 2DE database could provide a powerful tool for simply linking newly identified spots to the appropriate theoretical data already available [58]. However, this idea was tried for only one organism, Lactococcus lactis, and its implementation was discontinued.
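The linking step described above, attaching a newly identified spot to theoretical data, can be sketched as a candidate lookup in the pI/Mw plane. The Python example below is only an illustration and does not reproduce the DynaProt 2D implementation. The record layout, the function `candidates_for_spot`, the tolerances, and the example entries with their pI/Mw values are all hypothetical.

```python
from typing import NamedTuple


class TheoreticalEntry(NamedTuple):
    accession: str   # hypothetical identifier
    pi: float        # theoretical isoelectric point
    mw_kda: float    # theoretical molecular weight, kDa


def candidates_for_spot(spot_pi, spot_mw_kda, proteome,
                        pi_tol=0.5, mw_rel_tol=0.2):
    """Return theoretical entries lying near a spot, closest first.

    pi_tol and mw_rel_tol are ASSUMED search windows, not published values.
    """
    hits = []
    for entry in proteome:
        d_pi = abs(entry.pi - spot_pi)
        d_mw = abs(entry.mw_kda - spot_mw_kda) / entry.mw_kda
        if d_pi <= pi_tol and d_mw <= mw_rel_tol:
            # Scale each difference by its tolerance so both axes are comparable.
            score = (d_pi / pi_tol) ** 2 + (d_mw / mw_rel_tol) ** 2
            hits.append((score, entry))
    return [entry for _, entry in sorted(hits, key=lambda h: h[0])]


# Made-up entries, for demonstration only.
proteome = [
    TheoreticalEntry("PROT_A", 5.2, 42.0),
    TheoreticalEntry("PROT_B", 5.4, 45.0),
    TheoreticalEntry("PROT_C", 8.9, 42.5),
]
matches = candidates_for_spot(5.3, 43.0, proteome)
print([m.accession for m in matches])  # ['PROT_A', 'PROT_B']
```

The function returns a ranked list rather than a single best hit because, as noted above, a spot from a complex proteome may correspond to several proteins. Identification by mass spectrometry is still needed to decide among the candidates.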


Two-Dimensional Gel Electrophoresis as an Information Base for Human Proteome

http://dx.doi.org/10.5772/intechopen.75125


If we plot the experimentally measured physicochemical parameters of proteoforms (pI or Mw) against the theoretical ones, a general view of the proteome in terms of proteoform diversity is revealed (Figure 6). Here, the dots are distributed along the diagonal of the graph if the experimentally detected parameters match or are close to the theoretical values. Otherwise, the dots lie above or below the diagonal, and the bigger the difference between the theoretical and experimental parameters, the bigger the deviation of the dot from the diagonal. In particular, in the case of pI, the location of a proteoform dot above the diagonal shows that the experimental pI of this proteoform is smaller than the theoretical one. This is usually caused by adding negative charges (phosphorylation) or removing positive charges (acetylation). The opposite situation, with increased pI (dots below the diagonal), is observed when PTMs remove negative charges from proteins (esterification). In the case of Mw, the location of a dot above the diagonal means that the theoretical mass of the polypeptide is bigger than the experimentally observed one. This can occur if the polypeptide is truncated or proteolytically processed. Dots below the diagonal can arise for several reasons. One is technical issues of the 2DE procedure (protein polymerization, aggregation, or precipitation). Glycosylation can also strongly increase the mass of a polypeptide. This situation can be analyzed in more detail by representing all proteoforms corresponding to the same gene on a separate chart (Figure 7).

Figure 7. Examples of the variety of proteoforms in HepG2 cells (top) and human plasma (bottom). The arrow shows the predicted location of the master polypeptide (polypeptide coded by the canonical sequence). Adapted from Naryzhny et al. [29].

6. Conclusion

The continuing evolution of detection techniques (mostly mass spectrometry) and their use in combination with optimal protein separation techniques will finally allow us to reach the main aim of HUPO: an image of the whole human proteome. Combining a classic proteomics method for protein separation such as 2DE with bottom-up mass spectrometry (shotgun analysis of peptides by ESI LC-MS/MS) is an efficient approach for increasing the productivity of tandem mass spectrometry. Additionally, this union of top-down and bottom-up approaches allows a very convenient visual representation (profiling) of information about diverse proteoforms. As 2DE maps are a convenient and effective way to represent information about a proteome and to navigate around all its proteoforms, they will allow the construction of a knowledge base for an inventory of all human protein species/proteoforms that is visually attractive, clear, easy to search, and easy to perceive.

Particularly, the development of chromosome-centric interactive virtual 2DE maps of proteins coded by specific genes, in combination with experimental 2DE protein maps, will allow C-HPP to be executed more effectively and the number of proteoforms to be estimated more accurately, and could be a basis for the knowledge base of human proteins. The development of such an inventory will be based on existing databases like http://world-2dpage.expasy.org/, https://www.nextprot.org/, http://www.uniprot.org/ and http://atlas.topdownproteomics.org.

Acknowledgements

The study was performed within the framework of the Program of Fundamental Research of State Academies of Sciences for 2013-2020. Mass spectrometry measurements were performed using the equipment of the "Human Proteome" Core Facilities of the Institute of Biomedical Chemistry (Russia), which is supported by the Ministry of Education and Science of the Russian Federation (agreement 14.621.21.0017, unique project ID RFMEFI62117X0017).
