**2. Proteomics: isolation, identification and classification**

teomes, characterization of various protein classes, including membrane and hydrophobic proteins that are recalcitrant to isolation and analysis, is still inaccessible [3].

Food allergy can be a serious nutritional problem in children and adults. Any protein-containing food has the potential to elicit an allergic reaction in the human population. IgE-mediated reactions are the most prevalent allergic reactions to food. These responses occur after the release of chemical mediators from mast cells and basophils as a result of interactions between food proteins and specific IgE molecules on the surface of these cells. Eight foods or food groups have been identified as the most frequent sources of human food allergens and account for over 90% of the documented food allergies worldwide: milk, eggs, fish, crustaceans, wheat, peanuts, tree nuts and soy [4]. Despite their well-documented allergenicity, soy derivatives are increasingly used in a variety of food products because of their health benefits. Soybean has also been one of the target crops selected for genetic modification (GM). For example, the artificial introduction of 5-enolpyruvylshikimate-3-phosphate synthase into soybean creates an alternative pathway that is insensitive to glyphosate (a potent herbicide), thus increasing overall crop yield. One of the major concerns regarding the safety of GM foods is the potential allergenicity of the resulting products, namely the possible altered or *de novo* expression of endogenous allergens after genetic manipulation. This concern justifies careful plant characterization [5]. Proteomics is a powerful approach that allows rapid and reliable protein identification. It can also provide information about post-translational modifications, sub-cellular localization, levels of protein expression and protein-protein interactions. Despite the importance of soybean and the availability of powerful tools for the analysis of proteins from sub-cellular organelles, and specifically for the identification of allergens, only a limited number of reports have been published to date.

Soybean is an important source of protein for human and animal nutrition, as well as a major source of vegetable oil. Although soybean is adapted to grow in a range of climatic conditions, adverse environmental and biological factors still affect its growth, development, and global production. For instance, drought reduces the yield of soybean by about 40%, affecting all stages of plant development from germination to flowering and reducing the quality of the seeds [6]. Several other abiotic stresses, such as flooding, high temperature, irradiation, or the presence of pollutants in the air and soil, have detrimental effects on the growth and productivity of soybean. Along with morphological and physiological studies on the responses of plants to stress conditions, several molecular mechanisms, from gene transcription to translation as well as metabolites, have been investigated. Recent advances in the field of proteomics have created an opportunity for dissecting quantitative traits in a more meaningful way. Proteomics can be used to investigate the molecular mechanisms of plant responses to stress and provides a path toward increasing the efficiency of indirect selection for inherited traits. In soybean, a comprehensive functional genomics analysis is yet to be performed; therefore, proteomics approaches are a powerful tool for analyzing the functions of the complete set of proteins, including those involved in stress protection.

A Comprehensive Survey of International Soybean Research - Genetics, Physiology, Agronomy and Nitrogen Relationships

In plant proteomics, the plant species, tissue, organ, cell organelle, and the nature of the desired proteins all affect the techniques that can be used for protein extraction. Furthermore, the extraction process becomes more tedious when the protein is present inside vacuoles, rigid cell walls, or membrane plastids. An ideal protein extraction method completely solubilizes the total proteins of a given sample while minimizing post-extraction artifact formation and proteolytic degradation and removing non-proteinaceous contaminants. To date, mainly the proteomes of *Arabidopsis* and rice have been studied, while less attention has been paid to other plants, including soybean. Soybean has high levels of phenolic compounds, proteolytic and oxidative enzymes, terpenes, organic acids, and carbohydrates, which make protein extraction very tedious. It also contains large quantities of secondary metabolites, *viz.* flavone glycosides (kaempferol and quercetin glycosides), as well as lipids. These compounds impede high-quality protein extraction and, in turn, high-resolution protein separation by two-dimensional gel electrophoresis (2-DE).

In classical proteome analyses, proteins are initially separated by 2-DE, with isoelectric focusing (IEF) as the first dimension and sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) as the second dimension. Greater resolution in protein separation has been achieved by introducing immobilized pH gradients (IPGs) for the first dimension. Methodological advances in 2-DE have led to the introduction of two-dimensional fluorescence difference gel electrophoresis (2D-DIGE), which has been used for the comparative analysis of the proteome of soybean subjected to abiotic and biotic stresses [7]. The separated proteins can subsequently be identified by sequencing or by mass spectrometry. Since the introduction of mass spectrometry into protein chemistry, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) and liquid chromatography/tandem mass spectrometry (LC-MS/MS) have become the methods of choice for high-throughput identification of proteins. An alternative technique, known variously as 'gel-free proteomics', 'shotgun proteomics', or 'LC-MS/MS-based proteomics', can also be used in high-throughput protein analysis. This approach is based on LC separation of complex peptide mixtures coupled with tandem mass spectrometric analysis. Multidimensional protein identification technology (MudPIT), which usually combines separation on strong cation-exchange and reverse-phase columns with MS/MS analysis, enables efficient separation of complex peptide mixtures. The gel-free technique has the advantage of being able to identify low-abundance proteins, proteins with extreme molecular weights or p*I* values, and hydrophobic proteins that cannot be identified by gel-based techniques. A combination of gel-based and gel-free proteomics has been used to identify soybean plasma membrane proteins under abiotic stress, *viz.* flooding, osmotic and salinity stress. Methods for protein identification are not usually organism specific, and they can be applied to a wide range of living organisms in addition to soybean. Identification of proteins is normally performed by using a database search engine such as MASCOT or SEQUEST.
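Because the first dimension of 2-DE separates proteins by isoelectric point, it can help to estimate a theoretical p*I* from the amino acid sequence before choosing an IPG range. The following is a minimal sketch, not the procedure used in the cited studies: it applies the Henderson-Hasselbalch equation to each ionizable group and finds the pH of zero net charge by bisection. The pKa values and the function names `net_charge` and `estimate_pi` are assumptions chosen for illustration; published pKa sets differ, and estimates for real proteins shift accordingly.

```python
# Illustrative pKa values (assumed; several alternative sets are in common use).
PKA_POSITIVE = {"N_TERM": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_TERM": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Approximate net charge of an unmodified peptide at the given pH."""
    charge = 0.0
    # Basic groups carry +1 when protonated.
    for group, pka in PKA_POSITIVE.items():
        count = 1 if group == "N_TERM" else sequence.count(group)
        charge += count / (1.0 + 10 ** (ph - pka))
    # Acidic groups carry -1 when deprotonated.
    for group, pka in PKA_NEGATIVE.items():
        count = 1 if group == "C_TERM" else sequence.count(group)
        charge -= count / (1.0 + 10 ** (pka - ph))
    return charge


def estimate_pi(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH at which the net charge is zero (bisection on pH 0-14)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        # Net charge falls monotonically as pH rises.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    for peptide in ("DDDEEE", "KKKRRR", "ACDEFGHIKLMNPQRSTVWY"):
        print(peptide, round(estimate_pi(peptide), 2))
```

An estimate of this kind only indicates whether a protein is likely to focus within a given strip, for example a pH 4 to 7 IPG; post-translational modifications shift the observed p*I*.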

Soybean has an estimated genome size of 1115 Mbp, which is significantly larger than those of other crops, such as rice (490 Mbp) or sorghum (818 Mbp). Sequencing of 1100 Mbp of the total soybean genome predicted the presence of 46,430 protein-encoding genes, 70% more than in *Arabidopsis* [8]. The soybean genome database, which contains 75,778 sequences and 25,431,846 residues, was constructed on the basis of the Soybean Genome Project, DOE Joint Genome Institute; this database is available at http://www.phytozome.net. Although the genome sequence information is almost complete, no high-quality genome assembly is available because the results from the computational gene-modeling algorithm are imperfect. In addition, duplications in the genome of soybean result in nearly 75% of the genes being present as multiple copies, which further complicates the analysis. The soybean proteome database (http://proteome.dc.affrc.go.jp/Soybean) provides valuable information, including 2-DE maps and functional analysis of soybean proteins. However, the presence of a considerable number of proteins with unknown functions highlights the limitations of bioinformatics prediction tools and the need for further functional analyses.

Cellular proteomics helps to identify changes in protein expression under different growing conditions and treatments. The analytical methodology for the separation and identification of large numbers of proteins should be reliable and verifiable. The proteome map of mature dry soybean seeds has been prepared by employing robotic automation at the steps subsequent to 2-DE, and the UniGene database was used for protein identification. Total protein from mature dry soybean (*Glycine max* cv. Jefferson) seed was isolated and separated by 2D-PAGE, using 13 cm IPG strips in the first dimension followed by SDS-PAGE. Protein spots were analyzed using Phoretix 2D-Advanced software. Excised protein spots were arrayed into 96-well plates and transferred to a Multiprobe II EX liquid handling station for subsequent destaining, tryptic digestion and peptide extraction. MALDI-TOF MS was operated in the positive ion delayed extraction reflector mode. Peptide spectra were submitted to the MS-Fit program of Protein Prospector. Assignments from UniGene contigs were subsequently searched against the NCBI non-redundant database using the BLASTP search algorithm to determine similarity matches [9].
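The tryptic digestion and MALDI-TOF workflow above rests on peptide mass fingerprinting: the measured peptide masses are compared with masses predicted from candidate sequences. The sketch below illustrates that principle only and is not the MS-Fit algorithm. It assumes complete cleavage after K or R (except before P), unmodified residues, singly protonated ions as expected in positive-ion MALDI, and a hypothetical tolerance of 50 ppm; the function names are illustrative.

```python
import re

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # charge carrier for [M+H]+


def tryptic_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mh_plus(peptide: str) -> float:
    """Monoisotopic m/z of the singly protonated peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed_mz: list[float], sequence: str,
                  tol_ppm: float = 50.0) -> int:
    """Count observed peaks that lie within tol_ppm of a predicted peptide."""
    theoretical = [mh_plus(p) for p in tryptic_digest(sequence)]
    hits = 0
    for mz in observed_mz:
        if any(abs(mz - t) / t * 1e6 <= tol_ppm for t in theoretical):
            hits += 1
    return hits


if __name__ == "__main__":
    candidate = "MKWVTFISLLLLFSSAYSRGVFRR"  # arbitrary example sequence
    for peptide in tryptic_digest(candidate):
        print(peptide, round(mh_plus(peptide), 4))
```

Real search tools additionally consider missed cleavages, modifications such as methionine oxidation, and statistical scoring across the whole database, so a raw match count like this one is only a first approximation.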

Trichloroacetic acid (TCA)/acetone-based and phenol-based buffers are most frequently used in protein extraction from plants. A comprehensive proteomic study was performed on nine organs from soybean plants in various developmental stages by using three different methods for protein extraction and solubilization. The results showed that the use of an alkaline phosphatase buffer followed by TCA/acetone precipitation caused horizontal streaking in 2-DE, whereas the use of a Mg/NP-40 buffer followed by extraction with alkaline phenol and methanol/ammonium acetate produced high-quality proteome maps with well-separated spots, high spot intensities, and high numbers of separate protein spots in 2-DE gels [10, 11]. In organelle proteomics, particularly membrane proteomics, a different extraction procedure is required that involves modifications to dissolve hydrophobic proteins and additional purification steps. Furthermore, when studying protein-protein interactions, it is necessary to extract protein complexes by using buffers with little or no detergent so that the proteins remain in their native states. Despite the importance of seed filling in the synthesis of storage reserves for germination, systematic proteomic analysis of this phase in legumes is yet to be carried out.

Total seed proteins of soybean (cv. Maverick) were isolated at different stages of seed filling (14, 21, 28, 35 and 42 days after flowering) and separated by 2D-PAGE. IPG strips of pH 3 to 10 were used initially, and the range was then narrowed to pH 4 to 7 to obtain high-resolution proteome maps. A total of 488 and 679 protein spots were detected on the pH 4 to 7 and pH 3 to 10 gels, respectively. Each of the 679 protein spots was excised from reference gels for identification by MALDI-TOF MS, and a total of 422 (62%) were identified. One unique protein was often represented by more than one spot on the 2D-PAGE gel, most likely due to post-translational modifications or genetic isoforms. Taking this redundancy into account, the 422 identified spots represented 216 unique proteins. A total of 82 proteins were associated with metabolism (the largest functional class), and the second largest functional class comprised 52 spots assigned to the seed storage proteins *β*-conglycinin and glycinin. During seed filling, metabolism-related proteins were down-regulated overall and storage-related proteins were up-regulated, suggesting that metabolic activity is curtailed as seeds approach maturity. The abundance of proteins related to metabolite transport, disease and defense, energy production, cell growth and division, signal transduction, protein synthesis and secondary metabolism did not vary significantly. Furthermore, 13 sucrose-binding proteins were mapped to the same UniGene accession number, suggesting the importance of sucrose as a signaling molecule in seed and embryo development. A total of 92 unknown proteins could not be functionally classified and were therefore grouped into five expression profiles [12].
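The spot counts reported for this seed-filling study can be checked with simple arithmetic. The short sketch below uses only the figures quoted above (679 excised spots, 422 identified, 216 unique proteins); the function name `spot_summary` is an illustrative assumption.

```python
def spot_summary(spots_excised: int, spots_identified: int,
                 unique_proteins: int) -> dict[str, float]:
    """Identification rate and average spot redundancy for a 2-DE experiment."""
    return {
        # Fraction of excised spots that yielded an identification.
        "identification_rate": spots_identified / spots_excised,
        # Average number of identified spots per non-redundant protein.
        "spots_per_unique_protein": spots_identified / unique_proteins,
    }


summary = spot_summary(spots_excised=679, spots_identified=422,
                       unique_proteins=216)
print(f"identification rate: {summary['identification_rate']:.1%}")
print(f"spots per unique protein: {summary['spots_per_unique_protein']:.2f}")
```

The ratio of roughly two identified spots per unique protein is consistent with the observation that a single protein often appears as several spots because of post-translational modifications or isoforms.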

