**2. Technologies used in proteomic studies.**

In recent years, proteomics, one of the dimensions of the post-genome era [2], has emerged to complement the information obtained from genome sequencing and transcriptome analysis. It offers a set of highly powerful techniques for separation and identification of proteins in biological samples, allowing better understanding of the networks of cellular operation and regulation by representing the link between the genotype and the phenotype of an organism.

A Comprehensive Survey of International Soybean Research - Genetics, Physiology, Agronomy and Nitrogen Relationships

For the aforementioned reasons, proteomic analysis is now one of the most efficient means for functional study of the genes and genomes of complex organisms [3]. It has generated new data, as well as validated, complemented and even corrected information obtained through other approaches, thus contributing to better understanding of plant biology.

**Figure 1.** Pathways in which gene and protein expression may be regulated or modified in transcription or in post-translation [13].

Proteomics involves the study of the entire set of proteins expressed by the genome of a cell, or only those that are expressed differentially under specific conditions. It also addresses the set of protein isoforms and post-translational modifications, the interactions among them, and the structural description of molecules and their complexes.

Two-dimensional electrophoresis and mass spectrometry are the core technologies of proteomics, although new methodologies are being applied to plants for specific studies [4,5,6]. Among the most recent proteomic techniques are Difference Gel Electrophoresis (DIGE) and Multidimensional Protein Identification Technology (MudPIT), used in separation of proteins from a complex mixture. Other methods, such as Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC), Isotope-Coded Affinity Tag (ICAT) and Isobaric Tag for Relative and Absolute Quantitation (iTRAQ), are based on labeling with isotopes for quantification of molecules by mass spectrometry.

In spite of the recent nature of research in this area, diverse studies with soybeans using proteomic tools are being performed throughout the world, showing this to be a promising area for selection of genotypes for genetic breeding programs [7,8]. Moreover, the study of plant responses to infections from pathogens has supplied significant data for understanding the signaling process that triggers the defense response in plants [9]. Additionally, there are

Execution of a proteomic study involves the integration of many technologies that span the fields of molecular biology, biochemistry, physiology, statistics and bioinformatics, among other areas. The key steps in this type of study are separation of complex mixtures of proteins and their identification.

Separation is performed through the use of electrophoresis, a term created by Michaelis in 1909. The first electrophoresis of proteins (Figure 2) was performed in 1937. Alfenas (1998) [14] explains that electrophoresis aims at separation of molecules in terms of their electrical charges, their molecular weights and their conformations, in porous supports and appropriate buffers, under the influence of a continuous electrical field. Molecules with a preponderance of negative charges migrate in the electrical field to the positive pole (anode), and molecules with an excess of positive charges migrate to the negative pole (cathode). The preponderant charge of a protein molecule depends on its amino acid composition.

Many of the technologies currently used in proteomics were developed long before the beginning of proteomics, as is the case of electrophoresis. Nevertheless, it was the advance in protein sequencing technology by means of mass spectrometry that allowed the emergence and development of the field [15].

Proteomic studies may be performed by means of techniques such as two-dimensional polyacrylamide gel electrophoresis (2D PAGE) followed by mass spectrometry (MS) (Figure 3) or, more recently, by the combination of ionization and chromatographic methods, among others, which further increase detection sensitivity. Nevertheless, the point of departure is still the display of a large number of proteins from a cell line or organism in two-dimensional polyacrylamide gels [16,17,18].

**Figure 2.** Polyacrylamide-gel electrophoresis (SDS-PAGE) used in proteome analysis [19].

### **2.1. Two-dimensional polyacrylamide gel electrophoresis (2D PAGE).**

Two-dimensional polyacrylamide gel electrophoresis is an analytical method capable of separating hundreds of proteins in a single analytical run. The gel, with the sample already applied, is submitted to an electrical field for separation in two dimensions. In the first dimension, separation occurs through isoelectric focusing, in which the proteins are physically separated according to their respective isoelectric points on a polyacrylamide strip with a continuous, known pH gradient (IPG, immobilized pH gradient) submitted to increasing voltage. In the second dimension, the focused proteins are submitted to polyacrylamide gel electrophoresis in the presence of SDS (SDS-PAGE) for separation according to their molecular masses (Figure 4). Thus, the technique separates proteins by both charge and mass.

The result of two-dimensional electrophoresis is a profile of spot distribution formed by single proteins or simple mixtures of proteins [21]. Each spot visualized in the gel may be considered as an orthogonal coordinate of a protein that migrated specifically in accordance with its isoelectric point (x axis) and its molecular mass (y axis), as shown in Figure 4.
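The idea of a spot as a coordinate can be illustrated with a minimal sketch that estimates where a protein of known sequence would be expected to appear on the gel. The pKa values below are approximate textbook-style figures chosen for illustration (published pKa sets differ), the function names are hypothetical, and the estimate ignores post-translational modifications and conformational effects, so it should be read as a rough guide rather than a prediction of real migration.

```python
# Approximate pKa values for ionizable groups (assumed for illustration;
# published pKa sets differ slightly from one another).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0, "N_term": 9.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_term": 3.1}

# Average residue masses in Da (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def _group_count(sequence, group):
    """Each chain has one N-terminus and one C-terminus; side chains are counted."""
    return 1 if group.endswith("_term") else sequence.count(group)


def net_charge(sequence, ph):
    """Net charge at a given pH using the Henderson-Hasselbalch relation."""
    charge = 0.0
    for group, pka in POSITIVE_PKA.items():
        charge += _group_count(sequence, group) / (1 + 10 ** (ph - pka))
    for group, pka in NEGATIVE_PKA.items():
        charge -= _group_count(sequence, group) / (1 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence, tol=1e-4):
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


def average_mass(sequence):
    """Average molecular mass in Da of the unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[r] for r in sequence) + WATER


def spot_coordinate(sequence):
    """Expected (pI, mass in kDa) position: x axis and y axis of the 2D gel."""
    return isoelectric_point(sequence), average_mass(sequence) / 1000.0
```

A call such as `spot_coordinate("MAGKSLLR")` returns the pair used as the x and y position; comparing it with the observed position of a spot is one way to spot candidate isoforms or modified forms.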

The next step consists of staining the gel with silver, Coomassie blue, fluorescence, radioactive labeling or specific markers for phosphoproteins and glycoproteins, among others. This allows visualization of the protein expression pattern and photodocumentation of the gel (Figure 5). After that, sectioning and digestion of selected spots of the gel are carried out and, finally, proteins of interest are identified by mass spectrometry integrated with a bioinformatics tool.

**Figure 3.** Stages of plant proteomics, using the interface between two-dimensional electrophoresis (2D-PAGE) and mass spectrometry [20].

**Figure 4.** Two-dimensional electrophoresis 2D-PAGE used in analysis of proteomes [19].

Two-dimensional electrophoresis gels reflect the protein expression pattern of the biological sample analyzed and allow detection of variation of even a single amino acid between two isoforms, or of covalent modifications in the same protein, thanks to the change in the position of the spot.

It is important to highlight that each sample, depending on its nature, requires a specific type of processing for extraction and focusing. The user should therefore check related publications beforehand for the protocols and methodologies that best suit the experimental needs.

Some limitations are associated with two-dimensional electrophoresis, such as low reproducibility and limited scope for automation. Nevertheless, reproducibility may be increased by defining optimal conditions for the electrophoresis, while automation of the process is only possible for the analysis of gels. Gel analysis software detects the spots, measures their volumes and identifies those expressed differentially, inferring a relative quantification of the expression of each protein in comparison to the same spot in another gel [22]. Thus, by a process of subtraction, the differences among samples are revealed, for example the presence, absence or intensity of the proteins. The proteins of interest may then be characterized on the basis of the isoelectric point and apparent molecular weight determined from the two-dimensional gels [23].
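The comparison that gel analysis software performs can be sketched as follows. This is a simplified illustration, not the algorithm of any particular package: it assumes spots have already been detected and matched between gels, normalizes each spot volume by the total volume of its gel, and uses a hypothetical two-fold threshold to call a difference.

```python
def normalize(volumes):
    """Express each spot volume as a fraction of the total volume on its gel."""
    total = sum(volumes.values())
    return {spot: volume / total for spot, volume in volumes.items()}


def compare_gels(gel_a, gel_b, fold_threshold=2.0):
    """Classify matched spots by presence, absence or relative intensity.

    gel_a and gel_b map a spot identifier to its measured volume.
    The fold threshold is an assumed cut-off chosen for illustration.
    """
    norm_a, norm_b = normalize(gel_a), normalize(gel_b)
    report = {}
    for spot in sorted(set(norm_a) | set(norm_b)):
        a = norm_a.get(spot, 0.0)
        b = norm_b.get(spot, 0.0)
        if a == 0.0:
            report[spot] = "only in B"
        elif b == 0.0:
            report[spot] = "only in A"
        else:
            ratio = b / a
            if ratio >= fold_threshold:
                report[spot] = "up in B"
            elif ratio <= 1.0 / fold_threshold:
                report[spot] = "down in B"
            else:
                report[spot] = "unchanged"
    return report
```

In practice replicate gels and a statistical test would be used before calling a spot differential; the threshold alone is only a first filter.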

**Figure 5.** Proteins extracted and separated by two-dimensional (2D) gel electrophoresis and stained with Coomassie blue [24].

### **2.2. Differential in gel electrophoresis (DIGE).**

An efficient procedure for reducing gel-to-gel variation is differential in-gel electrophoresis, or DIGE (Figure 6), which allows analysis of up to three proteomes in a single gel. Each gel carries one internal standard common to all the gels and two different samples, each labeled with a distinct fluorophore (CyDye) [25]. In this way, when the gel is scanned for a given fluorophore, only the proteins labeled with it are visualized. In addition, this labeling has a broad dynamic range of detection and greater sensitivity than silver staining of gels, allowing quantitative proteomic studies to be performed with greater precision, accuracy and sensitivity [26].
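The role of the internal standard can be shown with a short sketch. It assumes that, for one matched spot, the volume of the sample channel and of the internal standard channel have been measured on each gel; the dictionary keys and function names are hypothetical. Dividing by the standard removes differences in loading or imaging between gels before the samples are compared.

```python
import math


def standardized_abundance(sample_volume, standard_volume):
    """Spot volume of a sample relative to the internal standard on the same gel."""
    if standard_volume <= 0:
        raise ValueError("internal standard volume must be positive")
    return sample_volume / standard_volume


def log2_fold_change(spot_on_gel_a, spot_on_gel_b):
    """log2 ratio (B over A) of one matched spot measured on two different gels."""
    a = standardized_abundance(spot_on_gel_a["sample"], spot_on_gel_a["standard"])
    b = standardized_abundance(spot_on_gel_b["sample"], spot_on_gel_b["standard"])
    return math.log2(b / a)
```

For example, if the second gel was imaged twice as brightly, the raw sample volumes would suggest a four-fold change, while the standardized values give the two-fold change that reflects the samples themselves.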

### **2.3. Liquid chromatography**

Another way to separate proteins is liquid chromatography. The sample, for example a mixture of peptides generated by proteolytic digestion of a protein extract, passes through a first separation by liquid chromatography, and the enriched peptide fractions are collected and applied to the spectrometer. As complete automation is the main target of methods for large-scale analyses, gel-free separation methods were developed based on reverse-phase liquid chromatography coupled to tandem mass spectrometry (LC/MS/MS). Figure 7 shows the sequence of operations and equipment involved in a typical LC/MS/MS analysis.

Greater automation is possible with multidimensional liquid chromatography, which exploits different characteristics of the proteins in columns with distinct properties or in a single two-phase column [29]. The fraction eluted from the first column is introduced directly into the second column, which may be directly connected to the mass spectrometer. This technique, called MudPIT, belongs to the context of shotgun proteomics, in which greater resolution of proteomes is possible, facilitating identification of the less abundant proteins that are frequently lost when gels are used [30].

**Figure 6.** Differential in gel electrophoresis technique or DIGE [27].

**Figure 7.** Protein identification with chromatographic separation (LC/MS/MS) [28].

### **2.4. Protein identification methods.**

After separation of proteins, the next stage consists of their characterization and identification using mass spectrometry, a technique in which the ratio between the mass and the charge (m/z) of ionized molecules in the gas phase is measured. In general, a mass spectrometer consists of an ionization source, a mass analyzer, a detector and a data acquisition system.
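The relation between the mass of a molecule and the m/z value that the instrument reports can be made concrete with a minimal sketch. It assumes ionization by protonation in positive mode (each charge is one added proton), which is common for peptides but is an assumption here, and the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def mz_from_mass(neutral_mass, charge):
    """m/z of a molecule that has gained `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz, charge):
    """Neutral mass recovered from an observed m/z and its charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)
```

The same peptide therefore appears at different m/z values depending on its charge state, which is why the charge must be assigned before masses are compared.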

The great variety of spectrometers found on the market is the result of different combinations of types of ionization sources and mass analyzers, which provide different levels of sensitivity and accuracy in the results. At the ionization source, the molecules are ionized and transferred to the gas phase. In the mass analyzer, the ions formed are separated in accordance with their m/z ratios and later detected, usually by an electron multiplier [31].

With the development of ever more specialized equipment for proteins, mass spectrometry has become a revolutionary tool in modern protein chemistry. This technology has allowed identification of proteins by a methodology called peptide mass fingerprinting. Rocha et al. (2003) [3] state that this methodology is based on digestion of the protein to be identified with a proteolytic enzyme, for example trypsin, producing fragments called peptides. The masses of these peptides, which are determined with high accuracy (0.1 to 0.5 Da) by mass spectrometry, form a kind of fingerprint of the protein.

Special software compares the peptide mass fingerprint of the protein to be identified with those theoretically generated for all the protein sequences present in the databases. If the sequence of the protein in question is in the database, it will be identified immediately [32].
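The matching step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the scoring scheme of any real search engine: trypsin is taken to cleave after lysine (K) or arginine (R) unless the next residue is proline, no missed cleavages or modifications are considered, observed values are treated as neutral monoisotopic masses, and candidates are ranked simply by the number of matched peptides. The database in the usage note is hypothetical.

```python
# Monoisotopic residue masses in Da (amino acid minus one water).
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(sequence):
    """Cut after K or R, except when the following residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(MONOISOTOPIC[r] for r in peptide) + WATER


def fingerprint(sequence):
    """Theoretical peptide mass fingerprint of a protein sequence."""
    return [peptide_mass(p) for p in tryptic_peptides(sequence)]


def rank_candidates(observed_masses, database, tolerance=0.5):
    """Rank database entries by how many observed masses they explain.

    `tolerance` is in Da; 0.5 Da is the upper bound quoted in the text.
    """
    scores = []
    for name, sequence in database.items():
        theoretical = fingerprint(sequence)
        hits = sum(
            1 for mass in observed_masses
            if any(abs(mass - t) <= tolerance for t in theoretical)
        )
        scores.append((name, hits))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

With a toy database such as `{"protA": "MAGKSLLR", "protB": "GGGKAAAR"}` and the masses measured for the peptides of the first entry, `rank_candidates` places `protA` first. Real tools add probabilistic scoring, because with large databases a few masses can match by chance.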

### **2.5. Relative protein quantification**

Large-scale protein quantification methods make it possible to estimate relative expression by means of labeling with radioactive isotopes, fluorescent dyes and light/heavy stable isotopes, allowing the same protein to be quantified in a relative way among differently labeled samples. Some of the most used stable isotope labeling approaches are ICAT (isotope-coded affinity tag), iTRAQ (isobaric tags for relative and absolute quantitation) and H₂¹⁸O.

ICAT consists of the addition of a tag that binds to cysteine residues and carries a linker with either eight hydrogen atoms or eight deuterium atoms. One sample is labeled with the tag containing hydrogen and the other with the tag containing deuterium. After digestion of the proteins, the resulting peptides are identified by mass spectrometry. The same peptide labeled in the two samples appears as a pair of peaks with distinct m/z owing to the bonded isotope, and the ratio between the areas of the two peaks is a relative measure of the expression of that protein. According to Yi & Goodlett (2003) [33], the main problems associated with this technique are the need for the presence of cysteine residues, the high cost of the reagents and the greater time necessary for sequencing.
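The pairing of light and heavy peaks can be sketched as below. The spacing follows from replacing eight hydrogen atoms by deuterium (about 8.05 Da per labeled cysteine, divided by the charge); the peak list format, tolerance and function name are assumptions made for illustration, and real spectra would first need peak detection and charge assignment.

```python
# Mass gained when eight hydrogens are replaced by eight deuteriums (Da).
HEAVY_LIGHT_SHIFT = 8 * (2.014102 - 1.007825)


def pair_icat_peaks(peaks, charge=1, labeled_cysteines=1, tolerance=0.02):
    """Find light/heavy peak pairs and their heavy-to-light area ratios.

    peaks: list of (m/z, peak area) tuples from one spectrum.
    Returns a list of (light m/z, heavy m/z, area ratio).
    """
    spacing = labeled_cysteines * HEAVY_LIGHT_SHIFT / charge
    ordered = sorted(peaks)
    pairs = []
    for i, (light_mz, light_area) in enumerate(ordered):
        for heavy_mz, heavy_area in ordered[i + 1:]:
            if abs((heavy_mz - light_mz) - spacing) <= tolerance:
                pairs.append((light_mz, heavy_mz, heavy_area / light_area))
    return pairs
```

A ratio of 2.0 for a pair would suggest that the peptide, and by inference its protein, is about twice as abundant in the sample carrying the deuterated tag.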

The iTRAQ technique also uses labeling with tags and identification by mass spectrometry. The tags bond to the free amino groups at the N-terminus of all peptides and on the side chains of internal lysine residues. They differ in the reporter group they carry, which may have 114, 115, 116 or 117 Da, thus allowing quantification of proteins in up to four samples at the same time. Relative quantification is carried out in a similar way to ICAT, but high cost has restricted its use [34].
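A minimal sketch of four-channel quantification is shown below. It assumes that the reporter groups are read as ions near m/z 114 to 117 in a fragmentation spectrum, uses nominal reporter values with a generous tolerance, and expresses each channel relative to a reference channel; the spectrum format and function names are hypothetical.

```python
REPORTER_MZ = (114, 115, 116, 117)  # nominal values quoted in the text


def reporter_intensities(spectrum, tolerance=0.25):
    """Sum the intensity found near each reporter m/z.

    spectrum: list of (m/z, intensity) tuples from one fragmentation spectrum.
    """
    totals = {reporter: 0.0 for reporter in REPORTER_MZ}
    for mz, intensity in spectrum:
        for reporter in REPORTER_MZ:
            if abs(mz - reporter) <= tolerance:
                totals[reporter] += intensity
    return totals


def relative_to_reference(intensities, reference=114):
    """Express every channel as a ratio to the chosen reference channel."""
    reference_value = intensities[reference]
    if reference_value == 0:
        raise ValueError("reference channel has no signal")
    return {reporter: value / reference_value
            for reporter, value in intensities.items()}
```

Each labeled sample is assigned one reporter channel, so the four ratios give the relative abundance of the peptide across the four samples in a single measurement.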

The aforementioned techniques require specific and expensive reagents. Nevertheless, the same goal may be achieved with a simpler labeling method in which the proteins are labeled with one or two atoms of ¹⁸O. These are incorporated at the carboxyl terminus by simply supplying a solution with H₂O for one sample and a solution with H₂¹⁸O for the other sample. The relative abundance of the peptide pairs, which differ by 2 Da for each ¹⁸O atom incorporated, is then estimated [35].
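The spacing between the labeled and unlabeled forms follows from the isotope masses. As a worked check, using the standard monoisotopic masses of ¹⁶O and ¹⁸O (values not given in the text), a peptide with charge $z$ that carries $n$ atoms of ¹⁸O is shifted by:

$$
\Delta m = n\,\bigl(m_{^{18}\mathrm{O}} - m_{^{16}\mathrm{O}}\bigr) = n\,(17.99916 - 15.99491)\ \mathrm{Da} \approx 2.004\,n\ \mathrm{Da},
\qquad
\Delta(m/z) = \frac{\Delta m}{z}
$$

So one incorporated atom gives a spacing of about 2 Da and two atoms about 4 Da for a singly charged peptide.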

Another quantification technique is stable isotope labeling by amino acids in cell culture (SILAC), which, together with mass spectrometry and bioinformatics resources, has proven to be well suited to proteomic studies. It detects differences in the abundance of proteins among cell cultures by means of isotopic labeling of proteins. Labeling with stable isotopes is obtained by supplying isotopically enriched amino acids to one cell culture and natural amino acids to the culture to be compared (Figure 8).

### **2.6. Analysis of post-translational modifications (PTMs).**

Another area of great interest in plant proteomics is the characterization of post-translational modifications (PTMs), which are essential for proteins to play their roles in varied cell events and which produce different proteins from the same gene.

These modifications occur at specific sites in the proteins [37], changing their physical, chemical and biological properties [38]. They may occur by means of cleavages or by the addition of a chemical group to one or more amino acids [39]. The main goals of PTM studies in proteomics are identifying the proteins that have them, mapping the sites where these modifications occur, quantifying their occurrence at the different sites and characterizing cooperative PTMs [40].

**Figure 8.** General outline of the SILAC technique [36].

The fact that covalent modifications change the molecular mass of a protein makes it possible for these modifications and the amino acids that carry them to be identified by mass spectrometry; more than 300 different types of PTMs have been identified to date with the aid of this technique. Nevertheless, according to Mann and Jensen (2003) [41], mass spectrometry has reduced power to resolve PTMs because they occur at low stoichiometric levels. This problem may be addressed by adopting fractionation methods prior to sequencing that enrich the sample for proteins carrying a certain type of PTM. Large-scale enrichment of modified proteins is generally carried out by means of affinity chromatography.
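The principle that a covalent modification shows up as a characteristic mass difference can be sketched as follows. The table lists the monoisotopic mass shifts of three common modifications as an illustrative subset, and the tolerance and function name are assumptions; the sketch only proposes candidates from a single mass difference and does not locate the modified residue.

```python
# Monoisotopic mass added by some common modifications (Da); illustrative subset.
PTM_MASS_SHIFT = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
}


def candidate_ptms(theoretical_mass, observed_mass, tolerance=0.01):
    """List modifications whose mass shift explains the observed difference."""
    delta = observed_mass - theoretical_mass
    return [
        name for name, shift in PTM_MASS_SHIFT.items()
        if abs(delta - shift) <= tolerance
    ]
```

A peptide observed about 79.966 Da above its unmodified mass would return phosphorylation as a candidate, which fragmentation data would then need to confirm and assign to a specific residue.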

One example is the IMAC system (immobilized metal affinity chromatography), used for isolation of phosphorylated proteins. Fe(III) ions are bound to the matrix to isolate proteins that have phosphorylated residues, since the Fe(III) ion interacts reversibly with the phosphate group of the modified peptide and keeps it attached to the column [41].

In contrast to reversible PTMs, more permanent PTMs such as glycosylation do not suffer from low stoichiometry, but the addition of carbohydrates hinders the proteolytic digestion necessary for identification by mass spectrometry [21]. In addition, when the modified peptide is fragmented for sequencing, it loses sugar residues, impeding the identification of the modified amino acids. To resolve this problem, digestion of the proteins is performed so as to remove the sugar residues and produce a change at the modified site that makes it identifiable [42].

Electrophoresis gels may also be used to enrich samples for PTMs, as is done for detection of phosphorylation and glycosylation with commercially available kits. The modified proteins, specifically labeled in the gel, are visualized and excised for identification by mass spectrometry. One important aspect of the use of gels for identification of PTMs is the possibility of visualizing the spots that carry the PTM and are differentially expressed among samples.
