**Meet the editor**

Dr. Hon-Chiu Eastwood Leung received his Ph.D. in microbiology and molecular genetics from the University of Texas, Houston, in 1997. He completed his postdoctoral training at Baylor College of Medicine, studying genomic and proteomic profiles of pediatric medulloblastoma and osteosarcoma. Dr. Leung was briefly a research scientist at Ciphergen Biosystems Inc. and was then recruited to the Genome Institute of Singapore to lead the clinical proteomics section. In 2006 he returned to the USA, where he served as director of the genomics and proteomics core laboratory of Texas Children's Hospital and director of genomics profiling at Baylor College of Medicine. At present, he is the director of the mass spectrometry-proteomics core facility of Baylor College of Medicine.

### Contents

#### **Preface**

#### **Part 1 Proteomics – Historical Review**

Chapter 1 **Strategies for Protein Separation**  Fernanda Salvato, Mayra Costa da Cruz Gallo de Carvalho and Aline de Lima Leite

Chapter 2 **Evolution of Proteomic Methods for Analysis of Complex Biological Samples – Implications for Personalized Medicine**  Amanda Nouwens and Stephen Mahler

#### **Part 2 Sample Preparation**

Chapter 3 **Proteomic Analyses of Cells Isolated by Laser Microdissection**  Valentina Fiorilli, Vincent P. Klink and Raffaella Balestrini

Chapter 4 **A Critical Review of Trypsin Digestion for LC-MS Based Proteomics**  Hanne Kolsrud Hustoft, Helle Malerod, Steven Ray Wilson, Leon Reubsaet, Elsa Lundanes and Tyge Greibrokk

Chapter 5 **Simple and Rapid Proteomic Analysis by Protease-Immobilized Microreactors**  Hiroshi Yamaguchi, Masaya Miyazaki and Hideaki Maeda

Chapter 6 **Labeling Methods in Mass Spectrometry Based Quantitative Proteomics**  Karen A. Sap and Jeroen A. A. Demmers

#### **Part 3 2D Gel Electrophoresis and Databases**

Chapter 7 **Preparation of Protein Samples for 2-DE from Different Cotton Tissues**  Chengjian Xie, Xiaowen Wang, Anping Sui and Xingyong Yang

Chapter 8 **2D-PAGE Database for Studies on Energetic Metabolism of the Denitrifying Bacterium** *Paracoccus denitrificans*  Pavel Bouchal, Robert Stein, Zbyněk Zdráhal, Peter R. Jungblut and Igor Kučera

#### **Part 4 Subproteomes Analyses**

Chapter 9 **Targeted High-Throughput Glycoproteomics for Glyco-Biomarker Discovery**  Eunju Choi and Michelle M. Hill

Chapter 10 **Recent Advances in Glycosylation Modifications in the Context of Therapeutic Glycoproteins**  Xiaotian Zhong and Will Somers

Chapter 11 **Detection of Protein Phosphorylation by Open-Sandwich Immunoassay**  Yuki Ohmuro-Matsuyama, Masaki Inagaki and Hiroshi Ueda

Chapter 12 **Phosphoproteomics: Detection, Identification and Importance of Protein Phosphorylation**  Min Jia, Kah Wai Lin and Serhiy Souchelnytskyi

Chapter 13 **Proteome Kinetics: Coupling the Administration of Stable Isotopes with Mass Spectrometry-Based Analyses**  Stephen F. Previs, Haihong Zhou, Sheng-Ping Wang, Kithsiri Herath, Douglas G. Johns, Thomas P. Roddy, Takhar Kasumov and Brian K. Hubbard

Chapter 14 **Dynamics of Protein Complexes Tracked by Quantitative Proteomics**  Séverine Boulon

Chapter 15 **Proteomics Analysis of Kinetically Stable Proteins**  Ke Xia, Marta Manning, Songjie Zhang and Wilfredo Colón

Chapter 16 **Vinyl Sulfone: A Multi-Purpose Function in Proteomics**  F. Javier Lopez-Jaramillo, Fernando Hernandez-Mateo and Francisco Santoyo-Gonzalez

Chapter 17 **Gel-Free Proteome Analysis Isotopic Labelling Vs. Label-Free Approaches for Quantitative Proteomics**  Baptiste Leroy, Nicolas Houyoux, Sabine Matallana-Surget and Ruddy Wattiez

Chapter 18 **Quantitative Proteomics Using iTRAQ Labeling and Mass Spectrometry**  H. R. Fuller and G. E. Morris

Chapter 19 **Functional Proteomics: Mapping Lipid-Protein Interactomes**  Clive D'Santos and Aurélia E. Lewis

Chapter 20 **Protein Thiol Modification and Thiol Proteomics**  Yingxian Li, Xiaogang Wang and Qi Li

#### **Part 5 Structural Proteomics**

Chapter 21 **The Utility of Mass Spectrometry Based Structural Proteomics in Biopharmaceutical Biologics Development**  Parminder Kaur and Mark R. Chance

#### **Part 6 Bioinformatics Tools**

Chapter 22 **nwCompare and AutoCompare Softwares for Proteomics and Transcriptomics Data Mining – Application to the Exploration of Gene Expression Profiles of Aggressive Lymphomas**  Fréderic Pont, Marie Tosolini, Bernard Ycart and Jean-Jacques Fournié

Chapter 23 **Application of Bioinformatics Tools in Gel-Based Proteomics**  Kah Wai Lin, Min Jia and Serhiy Souchelnytskyi


### Preface

Proteomics has come of age. Two-dimensional gel electrophoresis (2D gel) was first used in the 1970s. With the advent of the human genome sequence and associated bioinformatics tools, and with advances in protein separation and mass spectrometry in the 1990s and 2000s, the field of proteomics blossomed. The past decade has witnessed the development of clinical proteomics studies and the fine-tuning of proteomics studies to focus on the subset of the proteome related to a biological phenotype. The rigorous bioinformatics standards developed along the way have helped make data more compatible across platforms.

The goal of this book is to provide insights for a broad audience, ranging from those who are new to the field of proteomics to those who are more experienced. The book covers a historical overview of proteomics, sample preparation, 2D gel separation of complex proteomes and associated databases, structural proteomics, and subproteome analyses with a focus on i) the glycoproteome, ii) the phosphoproteome, iii) protein:protein interactions, and iv) subproteomes captured by specific chemical groups. The data generated in proteomics only make sense with the corresponding development of bioinformatics tools, so the chapters in the bioinformatics section also touch on pathway generation from proteomics data.

Finally, I would like to thank the authors for sharing their experience in these chapters, and to express my gratitude to Martina Durovic for her wonderful administrative assistance. Last but not least, I would like to thank the staff for taking the final steps to bring this publication to reality.

**Hon-Chiu Eastwood Leung, Ph.D.** 

Assistant Professor, Baylor College of Medicine, Houston, Texas, USA

## **Part 1**

**Proteomics – Historical Review** 

**1** 

### **Strategies for Protein Separation**

Fernanda Salvato¹, Mayra Costa da Cruz Gallo de Carvalho² and Aline de Lima Leite³

*¹Universidade de São Paulo, Escola Superior de Agricultura Luiz de Queiroz*

*²Empresa Brasileira de Pesquisa Agropecuária (EMBRAPA)*

*³Universidade de São Paulo, Faculdade de Odontologia de Bauru*

*Brazil*

#### **1. Introduction**

The proteome of a cell or tissue depends on cellular and environmental conditions; it is a dynamic system subject to large variations. Proteomics emerged to study these changes in protein diversity and abundance, providing techniques dedicated to the global, simultaneous characterization of all proteins. The expectation is that this information will produce new insights into the biological function of proteins in different physiological states of a cell or tissue.

The dynamic and complex nature of the proteome results from many post-translational modifications, molecular interactions, and protein variants arising from alternative mRNA splicing. As a result, the number of modified and unmodified proteins found in any biological system is much larger than the number of genes (Anderson et al., 2004), which helps explain why mRNA expression may not correlate with protein content (Rogers et al., 2008). In addition, proteins are not all expressed at the same or similar levels. For example, the enzyme Rubisco comprises 30–50% of the leaf proteome (Feller et al., 2008), which is a major obstacle to the proteomic assessment of low-abundance proteins; in fact, the majority of proteins are present at low abundance. To overcome these challenges, the proteome must be fractionated for effective detection and quantification by mass spectrometry (MS). Consequently, the analysis of proteins on a large or small scale depends on separation methods.

The ultimate goal in proteomics is to resolve all individual proteins in the cell, so protein separation methods directly affect the reliability of the results, although it is quite difficult to find a single method that accommodates the full diversity of proteins equally well. Such methods are based on the physical or chemical properties of proteins, such as their mass or net charge.

Combining sequential methods that exploit different properties can provide high-resolution analysis of very complex protein mixtures, and current analytical strategies reach different levels of resolution depending on the platform used. Two-dimensional gel electrophoresis (2DGE) and multidimensional liquid chromatography (MDLC) are the two methods that dominate the separation steps in proteomics. The differences between the two strategies relate mainly to sensitivity, automation, and high-throughput capability. This chapter discusses the principles and limitations of these techniques.

#### **2. 2D-PAGE: Principles, advantages and limitations**

Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) was developed by Patrick H. O'Farrell, who successfully combined two known electrophoresis methods, isoelectric focusing (IEF) and sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), with the objective of resolving more complex proteomes (O'Farrell, 1975). The strength of the idea was that proteins could now be separated by two unrelated properties, giving a uniform distribution of spots throughout the gel. Surprisingly, the paper "High Resolution Two-dimensional Electrophoresis of Proteins" was at first rejected by the Journal of Biological Chemistry (JBC) because of its "speculative character", as pointed out in a commemorative issue of the JBC (2006), but by then the power of 2D gel electrophoresis in resolving proteomes had already spread rapidly. Although O'Farrell's combination had an immediate impact on protein separation, its commercial application in proteomics became possible only after a technical modification made 2D gel electrophoresis reproducible. In the mid-1980s, commercial strips with immobilized pH gradients (IPG strips) and instruments for IEF were introduced to the 2D-PAGE system (Bjellqvist et al., 1982), and since then 2D-PAGE has assumed a central role in proteomics. Together, 2D-PAGE and mass spectrometric techniques have enabled the characterization of thousands of proteins in single gels.

#### **2.1 Principles of 2D-PAGE**

To separate proteins, two-dimensional electrophoresis sequentially exploits two unrelated physical properties. In the first dimension, proteins are separated by their migration in an immobilized pH gradient. In the second dimension, proteins that happened to reach the same position after the first separation are separated in a polyacrylamide gel according to their molecular weight, which gives the technique greater resolving power than one-dimensional electrophoresis. Proteins differing by as little as 0.1 isoelectric point (pI) unit and 1 kDa in molecular weight (MW) can be separated (Figeys, 2005). The spots visualized in a second-dimension gel are single proteins or simple mixtures of proteins, depending on factors that influence the resolution of the technique. To improve resolution, proteins should be completely denatured, reduced, dissociated from protein complexes and solubilized so that macromolecular interactions are disrupted (Chevalier, 2010).
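The benefit of a second, unrelated separation property can be illustrated with a small sketch. It treats the quoted limits (0.1 pI unit, 1 kDa) as hard thresholds, which is a simplifying assumption; the `Protein` class, the function names and the example values are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass

# Resolution limits quoted in the text (Figeys, 2005), used here as
# hard cut-offs purely for illustration.
PI_RESOLUTION = 0.1       # pI units
MW_RESOLUTION_KDA = 1.0   # kDa


@dataclass(frozen=True)
class Protein:
    name: str      # hypothetical label
    pi: float      # isoelectric point
    mw_kda: float  # molecular weight in kDa


def resolved_in_1d(a: Protein, b: Protein) -> bool:
    """One-dimensional SDS-PAGE: only the molecular-weight difference counts."""
    return abs(a.mw_kda - b.mw_kda) >= MW_RESOLUTION_KDA


def resolved_in_2d(a: Protein, b: Protein) -> bool:
    """2D-PAGE: a sufficient difference in either dimension separates the spots."""
    return abs(a.pi - b.pi) >= PI_RESOLUTION or resolved_in_1d(a, b)
```

Under these assumptions, two proteins of nearly identical mass that co-migrate in one-dimensional SDS-PAGE are still reported as separable in 2D whenever their pI values differ by at least 0.1 unit.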

In the 2D-PAGE protocol, preparation of protein samples is a fundamental stage that determines the efficiency of electrophoresis. Samples are usually solubilized in buffers containing chaotropic agents (urea and/or thiourea), nonionic or zwitterionic detergents (Triton X-100 or CHAPS, respectively), a reducing agent (DTT) and protease and phosphatase inhibitors. The chaotropic agents disrupt non-covalent macromolecular interactions, including hydrophobic interactions; the surfactants (CHAPS and Triton X-100) act synergistically with the chaotropes, preventing the adsorption or aggregation of hydrophobic proteins whose hydrophobic domains are exposed by the action of thiourea; the reducing agent reduces protein disulfides, breaking intra- and intermolecular bonds; and the protease and phosphatase inhibitors prevent modifications of the proteome. To improve protease and phosphatase inhibition, diluted TCA or TCA-acetone can also be used in the solubilization process. Another important aspect of solubilization is to avoid salt accumulation, by means of dialysis or precipitation. Salts can migrate through the pH gradient in the IPG strip and accumulate at its ends, inducing water accumulation and a reduction in electric current, which interferes with the focusing process.

The first dimension in 2D-PAGE, isoelectric focusing (IEF), is performed in acrylamide gel strips with an immobilized pH gradient (IPG strips). The gel in the strip is formed by polymerizing acrylamide together with acrylamide monomers that carry buffering groups, named immobilines. Immobilines with different pKa values are added to the acrylamide mixture and, after polymerization, are immobilized in the strip, generating the pH gradient; this is why the strips used in IEF are called immobilized pH gradient or IPG strips. IPG strips are commercially available in many pH ranges, such as 6–9, 6–11 or 7–10. They are sold dried and must be rehydrated before use. The rehydration solution must contain a commercial mixture of carrier ampholytes, with molecules covering all pIs (isoelectric points) in the pH range of the strip, together with the solubilized protein sample to be separated. Ampholytes are good buffering agents near their pIs and help the proteins in the mixture to migrate in the gel.

Isoelectric focusing, like electrophoresis in general, is based on the migration of charged biomolecules in an electric field. A protein mixture separates in a pH gradient because proteins are amphoteric molecules, so their ionizable groups carry a net positive or negative charge depending on the pH of the medium. When an electric field is applied, a protein migrates through the gel as long as its net charge is positive or negative; when it reaches the position where its net charge is zero (the isoelectric point, pI), migration ceases and the protein is focused. Positively charged proteins, i.e., those located in a region of the strip where the pH is lower than their pI, migrate towards the negative pole (cathode) until they reach their pI. Conversely, negatively charged proteins, i.e., those located in a region where the pH is higher than their pI, migrate towards the positive pole (anode) until they reach their pI. The focusing process can last from 12 to 20 hours.
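The relationship between local pH, net charge and pI can be sketched numerically. The example below is a simplified model, not a method from this chapter: it sums Henderson-Hasselbalch terms over the ionizable groups of a sequence and finds the pH of zero net charge by bisection. The pKa values are approximate textbook-style assumptions, and the function names are hypothetical.

```python
# Approximate pKa values for ionizable groups (assumed, textbook-style values).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0, "N_term": 9.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_term": 2.0}


def _group_count(sequence: str, group: str) -> int:
    """Each chain has one N- and one C-terminus; residues are counted in the sequence."""
    return 1 if group.endswith("_term") else sequence.count(group)


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of the protein at a given pH (Henderson-Hasselbalch model)."""
    charge = 0.0
    for group, pka in POSITIVE_PKA.items():
        charge += _group_count(sequence, group) / (1 + 10 ** (ph - pka))
    for group, pka in NEGATIVE_PKA.items():
        charge -= _group_count(sequence, group) / (1 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Bisection on pH: net charge decreases monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(sequence, mid) > 0:
            lo = mid   # still positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def migration_direction(sequence: str, local_ph: float) -> str:
    """Direction of travel along the pH gradient at the protein's current position."""
    q = net_charge(sequence, local_ph)
    if abs(q) < 1e-3:
        return "focused"
    return "towards higher pH" if q > 0 else "towards lower pH"
```

In this model a protein sitting at a pH below its pI carries a positive net charge and moves towards higher pH, one above its pI moves towards lower pH, and both stop where the net charge vanishes.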

After IEF, the strip containing the focused proteins must be equilibrated in a solution of the anionic detergent sodium dodecyl sulfate (SDS), which denatures the proteins and forms negatively charged protein/SDS complexes. The amount of SDS bound to a protein should be directly proportional to its mass, so proteins fully complexed with SDS migrate in the polyacrylamide gel (SDS-PAGE) according to their molecular weight alone. Other reagents in the equilibration solution include Tris-HCl buffer, urea, glycerol, DTT, iodoacetamide and bromophenol blue. The second dimension is performed by placing the IPG strip on top of, and in direct contact with, a gel cast between two spaced glass plates. An electric current is applied and the proteins migrate from the strip into the second-dimension gel, where they are resolved according to their molecular weight. The second-dimension gel can be discontinuous, with an upper stacking gel of 6% acrylamide and a lower resolving gel of 12 to 15% acrylamide, or in some cases homogeneous, with 13% acrylamide (Görg et al., 2000). The second dimension can be run vertically or horizontally, but only vertical systems allow multiple gels to be run simultaneously. Gels are usually run at 1 or 2 W for the first hour, followed by 15 mA/gel overnight with temperature regulation (10°C to 18°C) (Chevalier, 2010).
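Because SDS-complexed proteins migrate according to molecular weight, the weight of an unknown spot is commonly estimated from marker proteins run on the same gel. The sketch below assumes the usual approximately linear relationship between log10(MW) and relative mobility (Rf); this calibration is not described in the chapter, and the marker values and function names are hypothetical.

```python
import math


def fit_calibration(markers):
    """Least-squares fit of log10(MW) = slope * Rf + intercept.

    markers: list of (relative_mobility, mw_kda) pairs from a marker lane.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mw_kda(rf, slope, intercept):
    """Estimate the molecular weight (kDa) of a spot from its relative mobility."""
    return 10 ** (slope * rf + intercept)


# Hypothetical marker ladder: (relative mobility, MW in kDa). Illustrative values only.
MARKERS = [(0.10, 100.0), (0.30, 63.1), (0.50, 39.8), (0.70, 25.1), (0.90, 15.8)]

slope, intercept = fit_calibration(MARKERS)
```

The fitted slope is negative, since smaller proteins travel further through the resolving gel. The linear approximation holds only over a limited mass range for a given acrylamide percentage, so estimates near the ends of the ladder should be treated with caution.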

To visualize the spots in the gel dyes visible to naked eye or fluorescent dyes can be used. In both cases, are necessary to fix the gel after the run, using an acid (phosphoric acid or acetic acid) or an alcoholic (ethanol or methanol) solution depending on the chosen dyeing protocol (Görg et al., 2000). Among the non-fluorescent dyes are Coomassie Brilliant Blue, Colloidal Coomassie Blue and silver nitrate which detect spots respectively with minimal protein of 50,

Strategies for Protein Separation 7


10 and 0.5 ng (Patton, 2002; Smejkal, 2004). The fluorescent dyes most commonly used are the SYPRO dyes, Flamingo and Deep Purple. All three are sensitive enough to detect spots containing as little as 1 ng of protein (Patton, 2002); however, because of their high cost, they are used less often.

Once stained, gels are scanned and the gel images are analyzed with dedicated software. It is always recommended to reproduce the proteome of the sample in at least three gels, representing identical technical replicates. The software searches for a representative spot profile among all replicates and, if desired, compares this profile with others obtained previously. To perform the comparison, marker spots are normally designated in the gel and the positions of all other spots are determined using these markers as reference. It is also possible to estimate the volume of spots of interest and assign them relative quantification values when the objective is to compare differentially expressed proteins. With these tools, a proteomic map can be assembled for a given sample and information on differential protein expression can be obtained. Available software includes Image Master, Progenesis, PDQuest, Samespots, the Melanie package from the Swiss Institute of Bioinformatics, the Phoretix 2D software from Phoretix and Gellab II from Scanalytics.
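The replicate-based quantification described above can be illustrated with a minimal Python sketch. It is not the algorithm of any of the software packages listed: the spot names, the volumes and the choice to normalize each gel by its total spot volume are assumptions made only for illustration.

```python
from statistics import mean, stdev

def relative_volumes(gel):
    """Express each spot volume as a percentage of the total volume in its gel."""
    total = sum(gel.values())
    return {spot: 100.0 * vol / total for spot, vol in gel.items()}

def average_profile(replicates):
    """Average the relative volume of each spot found in all replicate gels."""
    normalized = [relative_volumes(g) for g in replicates]
    shared = set.intersection(*(set(g) for g in normalized))
    profile = {}
    for spot in sorted(shared):
        values = [g[spot] for g in normalized]
        avg = mean(values)
        # Coefficient of variation as a simple measure of gel-to-gel variation.
        cv = 100.0 * stdev(values) / avg if len(values) > 1 else 0.0
        profile[spot] = {"mean": avg, "cv_percent": cv}
    return profile

# Hypothetical raw spot volumes from three technical replicates (arbitrary units).
replicates = [
    {"spot_A": 200.0, "spot_B": 300.0, "spot_C": 500.0},
    {"spot_A": 220.0, "spot_B": 280.0, "spot_C": 500.0},
    {"spot_A": 180.0, "spot_B": 320.0, "spot_C": 500.0},
]
profile = average_profile(replicates)
for spot, stats in profile.items():
    print(spot, round(stats["mean"], 2), round(stats["cv_percent"], 2))
```

Only spots detected in every replicate are kept here, which mirrors the idea of a representative profile; how real software handles missing spots is outside the scope of this sketch.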

#### **2.2 Why use 2D-PAGE**

2D-PAGE alone cannot be used to identify proteins directly by visualizing the resulting spots in the gel, even when proteomic maps and sequence information are available for the tested sample. This is because there is considerable variation between the proteomes of two identical samples, which begins with protein extraction and solubilization and ends in the acrylamide gel electrophoresis. Thus, identifying the proteins in the spots depends on a sequencing step performed by mass spectrometry (MS or MS/MS). This workflow is widely adopted in proteomics laboratories because it is easy to carry out, can be implemented regardless of a laboratory's infrastructure, and offers enough resolving power to detect hundreds of proteins in one gel. In addition, the 2D-PAGE system is unique in allowing the protein profile of a sample to be visualized, which permits immediate comparison with other profiles, isolation of spots of interest for further study, and enrichment of labeled or specially stained proteins. All these characteristics account for its widespread application in proteome characterization. Many high-throughput "gel-free" strategies for protein separation are available today, but 2D-PAGE remains an important tool in the different workflows proposed for protein studies.

To date, proteomic maps are assembled exclusively by two-dimensional electrophoresis. Gel images are digitized and deposited in databases, which enables the *in silico* comparison of different profiles and the selection of spots of interest. The Japanese database of rice proteomic maps, for example, contains more than 13,000 characterized spots from different tissues and developmental stages (http://gene64.dna.affrc.go.jp/RPD/). In humans, a large number of studies report the generation of proteomic maps aimed at identifying proteins that serve as biomarkers of reproductive dysfunction and tumor development (Guo et al., 2010; Klein-Scory et al., 2010).

Another important contribution of 2D-PAGE to proteomics is the identification and relative quantification of proteins differentially expressed between samples, i.e., differential-display proteomics. Until 1997, this task was not easy because of proteome variation between identical samples and the gel-to-gel variation frequently observed in repeated runs of the same sample. At that time, a large number of gels had to be run to reach the required reproducibility in the "average gel" before samples could be compared. With the development of difference gel electrophoresis (DIGE) (Unlu et al., 1997), the reproducibility problem of 2D-PAGE was bypassed. DIGE is a modification of the conventional 2D-PAGE protocol that makes it possible to analyze three different samples in a single gel, giving the electrophoresis system a "multiplex" character. Samples are pre-labeled with fluorescent dyes such as Cy2, Cy3 and Cy5, pooled and separated in a single run. Therefore, in addition to solving the 2D-PAGE reproducibility problem, DIGE allows direct quantification of spots from different samples resolved in the same gel and is much more sensitive because of the fluorescent labeling, which raises the dynamic range of the gel by up to 1,000 times (Chevalier et al., 2010).

The third sample used in DIGE is an internal standard composed of equal aliquots of each experimental sample. The internal standard and samples 01 and 02 are labeled, pooled and resolved in the same gel, which avoids variation introduced during sample handling. The internal standard is normally labeled with Cy2 and the other samples with Cy3 and Cy5. Each protein is quantified from the Cy3:Cy2 and Cy5:Cy2 signal ratios. These ratios are then normalized across all the gels in a large experiment, using the Cy2 signals to normalize each protein under survey separately (Lilley & Friedman, 2004).
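The ratio calculation described above can be sketched in a few lines of Python. This is a simplified illustration, not the procedure of any commercial DIGE software: the spot name and signal values are hypothetical, and the further normalization steps that real packages apply are omitted.

```python
import math

def standardized_ratios(gel):
    """Return per-spot Cy3:Cy2 and Cy5:Cy2 ratios for one DIGE gel."""
    ratios = {}
    for spot, signal in gel.items():
        cy2 = signal["Cy2"]
        if cy2 <= 0:
            continue  # no internal-standard signal: the spot cannot be normalized
        ratios[spot] = {"Cy3": signal["Cy3"] / cy2, "Cy5": signal["Cy5"] / cy2}
    return ratios

# Hypothetical spot signals; gel_2 has half the overall signal of gel_1.
gel_1 = {"spot_1": {"Cy2": 1000.0, "Cy3": 2000.0, "Cy5": 500.0}}
gel_2 = {"spot_1": {"Cy2": 500.0, "Cy3": 1000.0, "Cy5": 250.0}}

r1 = standardized_ratios(gel_1)
r2 = standardized_ratios(gel_2)

# Log2 fold change between the Cy3-labeled and Cy5-labeled samples on gel 1.
log2_fc = math.log2(r1["spot_1"]["Cy3"] / r1["spot_1"]["Cy5"])
print(r1, r2, log2_fc)
```

Because every sample signal is divided by the Cy2 signal of the same spot on the same gel, the two hypothetical gels give identical ratios even though their absolute intensities differ, which is the point of running the internal standard on every gel.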

A problem in DIGE lies in the hydrophobicity of the cyanine dyes, which label proteins by reacting largely with surface-exposed lysines and thereby remove multiple charges from the protein. This decreases the solubility of the labeled protein and in some cases may lead to protein precipitation prior to gel electrophoresis. To address this problem, minimal labeling is generally employed in DIGE: only 1-5% of the total lysines in a given protein are labeled, which avoids precipitation. As an alternative to minimal labeling, saturation labeling can be performed with Cy3 and Cy5 dyes that react with free cysteines in a protein (Shaw et al., 2003). This strategy circumvents the sensitivity problem of minimal labeling but limits the proteome analysis to proteins that contain free cysteine residues (Chevalier, 2010).

2D-PAGE can also be very useful for identifying post-translational modifications (PTMs). Affinity chromatography is usually used to enrich samples for a specific PTM, but the visual character of 2D-PAGE enables the direct selection of spots that differ in a specific PTM. Proteins resolved in the gel can, for example, be specifically stained to detect phosphorylation or glycosylation; after visual selection, the proteins are excised and identified by mass spectrometry.

#### **2.3 Limitations of 2D-PAGE**

Technical characteristics related to gel reproducibility, and others that can prevent or affect protein resolution, are considered the main limitations of the 2D-PAGE technique. However, gel reproducibility in 2D-PAGE is also strongly influenced by the biology of the sample itself, which cannot be considered a limitation of the system *per se*. Some limitations associated with 2D-PAGE are outlined and discussed below.

#### **2.3.1 Reproducibility**

Available protein extraction protocols can be applied to various types of biological samples, but their efficiency varies considerably with the biological characteristics of the sample. It is much simpler to reproduce the proteome of samples with a single cell type, such as a cell culture, than of samples containing many distinct cell types or cells at different developmental stages, such as an onion root. The cell type also poses challenges for protein extraction and solubilization, with higher reproducibility for animal cells than for plant cells, which are covered by cell walls and are rich in membranous compartments (plastids). Besides these intrinsic factors associated with sample biology, the dynamic nature of the proteome is an important source of variation between experimental replicates of the same sample, especially when the objective is relative protein quantification. The proteome can be rapidly modified by degradative pathways or by any of the hundreds of existing post-translational modifications. Furthermore, small differences in genetic background between sample replicates can introduce relevant variations in the proteome.

Another factor that can substantially reduce gel reproducibility in 2D-PAGE is gel-to-gel variation, which begins in sample preparation and extends to the focusing process and SDS-PAGE. Gel-to-gel fluctuations are present even in simultaneous runs under exactly the same experimental conditions. To minimize or eliminate this effect, two alternatives are available: build an average gel from at least three replicates or use a multiplex run system (DIGE).

It is also important to emphasize that the reproducibility problem of 2D-PAGE can restrict the applicability of the proteomic map databases now being generated when no sequence information is available for the spots of interest.

#### **2.3.2 Resolution**

#### **2.3.2.1 Proteins with high molecular weight**

Proteins with a molecular weight above 250 kDa cannot be resolved in polyacrylamide gels. To resolve them, the preferred approach is to use an agarose gel in combination with isoelectric focusing (Yokoyama et al., 2009).

#### **2.3.2.2 Low abundance or rare proteins**

Low-abundance proteins take part in cellular activities of high interest, such as signal reception, regulation of gene activity and signal transduction cascades. In 2D gels, the detection of these proteins is masked by abundant proteins which, depending on the sample, can be present at concentrations up to 12 orders of magnitude higher. This is the case, for example, of albumin in plasma samples. One possible strategy to avoid this problem is to deplete abundant proteins by methods such as affinity chromatography (Greenough et al., 2004). This is routinely done for plasma samples but is not yet possible for many other systems. In plant cells, the abundant enzyme ribulose bisphosphate carboxylase/oxygenase (RuBisCo) masks low-abundance proteins, and the strategy currently used to mitigate this effect is to reduce sample complexity by using IPG strips with overlapping narrow pH ranges (Görg et al., 2000).

#### **2.3.2.3 Hydrophobic proteins and membrane proteins**

In 1998, an important paper (Wilkins et al., 1998) demonstrated that hydrophobic proteins were almost absent from 2D gels prepared with urea as the only chaotropic agent in the protein solubilization solution. This finding was very valuable because hydrophobic proteins include the proteins present in cellular membranes and represent around 30% of the total proteome (Molloy, 2000). After this observation, protein solubilization began to be carried out with a combination of a higher concentration of urea and a lower concentration of thiourea, a chaotropic agent much more efficient than urea. The elevated urea concentration was necessary to solubilize thiourea, which was used at a lower concentration because at higher concentrations it can interfere with protein focusing (Molloy, 2000). This modification of the original solubilization protocol solubilized hydrophobic proteins more efficiently, but the urea-thiourea combination still cannot keep these proteins soluble in the aqueous environment required for IEF. Other variations of the solubilization protocol combining urea with various nonionic or zwitterionic detergents were proposed, but all resulted in the additional identification of only a few membrane protein spots (reviewed by Rabilloud et al., 2008). It became clear that membrane proteins, mainly the highly hydrophobic ones (with multiple transmembrane domains), could not be solubilized under IEF-compatible conditions (reviewed by Tan et al., 2008). Gel systems intended to resolve membrane proteins should use strong detergents to solubilize them, together with agents that add charges to the proteins and thus prevent their aggregation. Such gel-based systems (blue native PAGE or BN-PAGE, clear native PAGE or CN-PAGE, benzyldimethyl-n-hexadecylammonium chloride or BAC, and SDS/SDS or dSDS-PAGE) exclude IEF, which severely impairs gel resolution (reviewed by Tan et al., 2008). The resulting spots are generally composed of a mixture of proteins that carry different post-translational modifications and/or of complexes of membrane proteins and associated soluble proteins (Rabilloud et al., 2008). Other strategies to detect membrane proteins are available, including gel-free systems, sample pre-fractionation by subcellular fractionation or affinity purification, and avidin–biotin technology (Elia, 2008).

However, there is still a great need for protocols that allow high-resolution detection of membrane proteins and the simultaneous detection of membrane and soluble proteins. This is especially important when studying disease responses or physiological phenomena, because membrane proteins play key roles in normal development, participating in cellular recognition and signal transduction. Identifying altered membrane proteins could lead to the discovery of novel biomarkers for disease diagnosis (Adam et al., 2002; Jang & Hanash, 2003) and of targets for therapeutic approaches (Bianco et al., 2006).

#### **2.3.2.4 Basic proteins**

Basic proteins represent approximately one third to one half of the total cellular proteome. Among them are ribosomal proteins and nucleases, which have a pI above 10 and are therefore poorly resolved in the pH ranges commonly available for alkaline proteins (pH 6–9, 6–11 or 7–10). This limitation of 2D-PAGE began to be overcome with the commercialization of IPG strips covering pH ranges of 3–12, 6–12 and 9–12, which are successfully used to resolve strongly alkaline proteins with a pI above 11 (Drews et al., 2004; Görg et al., 1997).
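As a small illustration of why the extended pH ranges matter, the Python sketch below lists which of the strip ranges mentioned above nominally cover a given pI, narrowest first. The function name and the "narrowest range first" ordering are assumptions for illustration only. In practice, resolution near the ends of a strip is limited, so simple coverage does not guarantee good focusing.

```python
# pH ranges of the IPG strips mentioned in the text.
IPG_RANGES = [(6.0, 9.0), (6.0, 11.0), (7.0, 10.0), (3.0, 12.0), (6.0, 12.0), (9.0, 12.0)]

def strips_covering(pi, ranges=IPG_RANGES):
    """Return the strip ranges that include the given pI, narrowest range first."""
    covering = [(low, high) for low, high in ranges if low <= pi <= high]
    return sorted(covering, key=lambda r: r[1] - r[0])

# A strongly alkaline protein (hypothetical pI of 11.5) is only covered by
# the ranges that extend to pH 12.
print(strips_covering(11.5))
# A protein with a pI above every range is not covered at all.
print(strips_covering(12.5))
```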

#### **3. Principles of liquid chromatography in proteomics**

Chromatographic separation methods have been applied in different laboratories around the world to decipher the many complex problems in industry and science, involving, for example, amino acids and proteins, nucleic acids, carbohydrates, drugs, pesticides, etc. This method separates the components of a mixture by the distribution of these components into two phases, where an immiscible stationary phase remains fixed while the other moves through it. The sample components more strongly connected to the stationary phase move

Strategies for Protein Separation 11

6000 psi, with no pulse output; flow rate ranging from 0.1 to 10 mL/min; constant solvent flow (with no variations greater than 0.5%); and corrosion-resistant components. There are two main pumps for HPLC: reciprocal pumps, which are employed in 90% of HPLC systems, consist of a small chamber in which the solvent is pumped by an oscillatory movement of the piston controlled by a motor. Because of this, the flow is not continuous, requiring a shock pulse. Syringe pumps consist of a large chamber equipped with a plunger that is activated by a screw mechanism. The rotation of the screw provides a continuous movement of the mobile phase that is free from pulsations from 0.1 to 5 mL/min. The most common injection system is sampling loops, which allow the introduction of samples up to 7000 psi with excellent precision. These loops can be manual or automated (Oliver, 1991;

In a chromatograph, there are two types of columns: a guard column and separation column. The guard column has a length of 2 to 5 cm and is placed between the injector and separation column, allowing it to retain possible solids that can block the filters of the column and, in some cases, retain materials that can precipitate chemical reactions in the stationary phase. The separation columns (stationary phase) are the heart of a chromatograph, since they are responsible for the separation of the components present in the sample. They consist of a tube of inert material, usually stainless steel, and uniform internal diameter (i.d.), capable of resisting high pressures. They can be classified according

**Column designation Internal Diameter [mm]** 

Table 1. Classification of liquid chromatography according to internal diameter (ID) of

Silica is the most common stationary phase in HPLC because of advantages such as resistance to high pressures and physicochemical properties. Despite these advantages, silica has two limitations: the first restricts its use in a pH range of 2 to 8 because at pH below 2, bonds of Si-O-Si, which compose the silica and are responsible for maintaining the organic groups immobilized on the silica surface, become more susceptible to hydrolysis. On the other hand, at pH above 8, the hydroxyl groups (OH-) can easily react with the residual silanols, promoting silica dissolution that result in low efficiency and peak enlargement. The second limitation refers to the presence of residual silanol groups that can result on the asymmetry of the peak when basic samples are analyzed (Neue, 1997; Oliver,

The particle structures are classified as porous, non-porous, and pellicular. The porous particles are most often used for HPLC, since it allows for greater surface area for interactions between the stationary phase and the analyte. Non-porous particles allow faster chromatography without losing efficiency because there is no diffusion of the analyte inside

Conventional HPLC 3 – 5 Narrow-bore HPLC 2 Micro LC 0.5 – 1 Capillary LC 0.1 – 0.5 Nano LC 0.01 – 0.1 Open tubular LC 0.005 – 0.05

Meyer, 2010).

to i.d (Saito et al., 2004).

columns (Saito et al., 2004)

1991; Meyer, 2010).

very slowly in mobile phase flow, while those linked more weakly to the stationary phase move more quickly. This process results in differential migration of these components.

The main criteria for classification of chromatographic separation are related to the separation mechanism involved and the different types of stages used. Thus, the physical form of the system classifies the general technique as planar or column chromatography. In the former, the stationary phase is prepared on a flat surface, while in the latter, the stationary phase is arranged in a cylinder. In Gas Chromatography (GC), the mobile phase is an inert gas that does not contribute to the separation process, whereas in Liquid Chromatography (LC), the mobile phase is a liquid that can interact with the solutes, so their composition is very important in the separation process. Supercritical Fluid Chromatography (SFC) utilizes a substance with temperature and pressure higher than the critical temperature (Tc) and critical pressure (Pc) proper to fluids, with the advantage of having lower viscosity than the liquid while maintaining the properties of interaction with the solutes (Skoog et al., 2006).

The LC techniques may be further divided into classic liquid chromatography (LC) and High-Performance Liquid Chromatography (HPLC). LC utilizes glass columns at atmospheric pressure, and the flow rate is due to gravitational forces. HPLC is the automation of LC under conditions that provide for enhanced separations during a shorter time. It utilizes a metal column and the mobile phase flow rate is due to a high-pressure pump, which increases the efficiency achieved in the separation of compounds, thus making HPLC one of the main techniques used in the separation of proteins and peptides from a wide variety of synthetic or biological sources.

The HPLC equipment comprises a mobile-phase reservoir, which holds the solvents used to achieve selectivity; a pumping system; a sample injector; columns; and detectors (Figure 1). The pumping system is required to drive the mobile phase through the column and overcome the resistance exerted by the packed particles. The major requirements for an efficient pumping system include the ability to generate pressures up to 6000 psi with pulse-free output; a flow rate ranging from 0.1 to 10 mL/min; constant solvent flow (with variations no greater than 0.5%); and corrosion-resistant components. There are two main types of pumps for HPLC. Reciprocating pumps, which are employed in 90% of HPLC systems, consist of a small chamber in which the solvent is pumped by the back-and-forth movement of a motor-driven piston. Because of this, the flow is not continuous, and a pulse damper is required. Syringe pumps consist of a large chamber equipped with a plunger that is activated by a screw mechanism. The rotation of the screw provides a continuous, pulsation-free movement of the mobile phase at 0.1 to 5 mL/min. The most common injection system is the sampling loop, which allows the introduction of samples at pressures up to 7000 psi with excellent precision. These loops can be manual or automated (Oliver, 1991; Meyer, 2010).

Fig. 1. Scheme of an HPLC system.

In a chromatograph, there are two types of columns: a guard column and a separation column. The guard column is 2 to 5 cm long and is placed between the injector and the separation column, where it retains solids that could block the column filters and, in some cases, materials that could trigger chemical reactions in the stationary phase. The separation columns (stationary phase) are the heart of a chromatograph, since they are responsible for separating the components present in the sample. They consist of a tube of inert material, usually stainless steel, with a uniform internal diameter (i.d.), capable of resisting high pressures. They can be classified according to i.d. (Saito et al., 2004).


Table 1. Classification of liquid chromatography according to internal diameter (i.d.) of columns (Saito et al., 2004).
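The classification in Table 1 can be expressed as a small lookup. The following is only an illustrative sketch: the ranges are transcribed from Table 1, the unit (mm) is an assumption, and the name `classify_column` is hypothetical. Because adjacent ranges in the table share boundary values or overlap, the function returns every category that matches rather than a single one.

```python
# Ranges transcribed from Table 1; the unit (mm) is an assumption.
ID_CLASSES_MM = [
    ("Conventional HPLC", 3.0, 5.0),
    ("Narrow-bore HPLC", 2.0, 2.0),
    ("Micro LC", 0.5, 1.0),
    ("Capillary LC", 0.1, 0.5),
    ("Nano LC", 0.01, 0.1),
    ("Open tubular LC", 0.005, 0.05),
]


def classify_column(id_mm):
    """Return every Table 1 category whose i.d. range contains id_mm."""
    return [name for name, low, high in ID_CLASSES_MM if low <= id_mm <= high]


if __name__ == "__main__":
    for diameter in (4.6, 0.5, 0.075):
        print(diameter, "mm ->", classify_column(diameter))
```

A diameter that falls between two listed ranges returns an empty list, which reflects the gaps in the table rather than an error.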

Silica is the most common stationary phase in HPLC because of advantages such as resistance to high pressures and favorable physicochemical properties. Despite these advantages, silica has two limitations. The first restricts its use to a pH range of 2 to 8: at pH below 2, the Si-O-Si bonds, which compose the silica and keep the organic groups immobilized on its surface, become more susceptible to hydrolysis, while at pH above 8, hydroxide ions (OH⁻) readily react with the residual silanols, promoting silica dissolution that results in low efficiency and peak broadening. The second limitation is the presence of residual silanol groups, which can cause peak asymmetry when basic samples are analyzed (Neue, 1997; Oliver, 1991; Meyer, 2010).

The particle structures are classified as porous, non-porous, and pellicular. Porous particles are most often used for HPLC, since they provide a greater surface area for interactions between the stationary phase and the analyte. Non-porous particles allow faster chromatography without losing efficiency because there is no diffusion of the analyte inside the particles. However, to maintain the sample capacity, it is necessary to use particles with diameters of 1 to 2 µm, as the capacity is 50 times less than that of porous particles. Pellicular particles consist of a solid nucleus coated with a thin layer (13 µm) of the stationary phase, and they have good efficiency when analyzing macromolecules because of their fast mass transfer kinetics. The particle shape may be regular (spherical), irregular, or monolithic. Columns packed with spherical particles have a higher resistance to high pressures and good efficiency. Columns packed with irregular particles can have efficiency comparable to that of regular particles; however, they lack mechanical stability and can result in higher pressures in the system (Meyer, 2010). Recently, monolithic stationary phases have been introduced in HPLC. They are single pieces of porous silica or of a highly cross-linked porous polymer such as polyacrylamide. The skeletons of monoliths contain macropores with diameters of approximately 2 µm and mesopores with diameters of approximately 13 nm. Because of these characteristics, they allow higher flow rates without increasing the pressure and offer great chemical stability and high permeability (Neue, 1997; Meyer, 2010).

In HPLC, there are different ways to detect the compounds eluting from the column. The ideal detector is linear, selective and non-destructive and has adequate sensitivity, good stability and reproducibility, and a short response time. However, no detector has all of these features, so the choice of detector should be based on the objective of the analysis as well as the type of sample to be analyzed. Liquid chromatographic detectors are basically of two types. Bulk property detectors respond to mobile-phase properties, such as refractive index, dielectric constant, or density. In contrast, solute property detectors respond to properties of solutes, such as UV absorbance, fluorescence, or diffusion current, which are not present in the mobile phase (Skoog et al., 2006; Meyer, 2010). Table 2 shows the major detectors used in HPLC.


| Type of detector | Limit of detection | Commercially available |
|---|---|---|
| Absorbance | 10 pg | Yes |
| Conductivity | 100 pg – 1 ng | Yes |
| Electrochemical | 100 pg | Yes |
| Element selective | 1 ng | No |
| Fluorescence | 10 fg | Yes |
| FTIR | 1 µg | Yes |
| Light scattering | 1 µg | Yes |
| Mass spectrometers | < 1 pg | Yes |
| Optical activity | 1 ng | No |
| Photoionization | < 1 pg | No |
| Refractive index | 1 ng | Yes |

Source: Skoog et al., 2006

Table 2. The most common detectors used in HPLC.

High-performance liquid chromatography supplanted gas chromatography because it is more versatile; it is not limited to volatile and thermally stable samples, thus allowing a wide choice of mobile and stationary phases. Because the mobile phase carries the solutes through the stationary phase, the correct choice of mobile phase is extremely important in the separation process, as it can completely change the selectivity of separations. The solvents used must be compatible with the stationary phase and the detector and must have a high capacity to dissolve the sample. The elution mode can be isocratic or gradient. In isocratic elution, the separation employs a single solvent or a solvent mixture whose composition remains constant with time. Gradient elution, in contrast, uses two or more solvent systems that differ significantly in polarity. In this case, once elution has begun, the ratio of the solvents varies with time, and separation efficiency is greatly enhanced (Skoog et al., 2006).
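The difference between isocratic and gradient elution can be illustrated by computing the mobile-phase composition as a function of time. The sketch below assumes a two-solvent system (A and B) and a gradient made of linear segments; the function name `percent_b`, the breakpoints, and the percentages are hypothetical values chosen for illustration, not settings taken from the text.

```python
def percent_b(t_min, program):
    """Return %B at time t_min for a program of (time_min, %B) breakpoints.

    Breakpoints must be sorted by strictly increasing time. The composition
    is linearly interpolated between breakpoints and held constant outside
    the programmed range.
    """
    if t_min <= program[0][0]:
        return program[0][1]
    if t_min >= program[-1][0]:
        return program[-1][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


# Hypothetical programs: constant 30% B versus a 5% to 65% B ramp in 30 min.
ISOCRATIC = [(0.0, 30.0), (30.0, 30.0)]
GRADIENT = [(0.0, 5.0), (30.0, 65.0)]

if __name__ == "__main__":
    for t in (0, 15, 30):
        print(t, "min:", percent_b(t, ISOCRATIC), "% B (isocratic),",
              percent_b(t, GRADIENT), "% B (gradient)")
```

In the isocratic program the composition never changes, whereas in the gradient program the fraction of solvent B rises steadily, which is what lets strongly retained solutes elute within a reasonable time.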

The major separation modes that are used to separate most compounds are normal-phase chromatography (NP), reverse-phase chromatography (RP), size-exclusion chromatography (SEC), ion-exchange chromatography (IEC), and affinity chromatography (AC).

In **normal-phase chromatography**, the stationary phase is polar while the mobile phase is non-polar. Analytes are retained through the interaction of their polar functional groups with the polar groups on the surface of the stationary-phase particles, and they elute from the column in order of increasing polarity: the least polar compound first, followed by compounds of progressively higher polarity (Figure 2). This method is widely used to separate analytes with low to intermediate polarity (Skoog et al., 2006).

Fig. 2. Diagram of normal-phase chromatography separation. The stationary phase is polar and retains the polar molecule (blue) most strongly. The relatively non-polar molecules (red circles) are quickly eluted by the mobile phase, a non-polar solvent. An increase in mobile phase polarity will move polar molecules through the column.

**Reverse-phase liquid chromatography** has become a powerful tool widely used in the analysis and purification of biomolecules because of the high resolution provided by the technique. It is considered a very versatile technique because it can be used for non-polar, polar, ionizable, and ionic molecules. In RP-HPLC, the separation principle is based on the hydrophobic interaction between the analytes and non-polar groups bound to the stationary phase. Silica is the most common material used for column packing; it consists mainly of silicon dioxide (SiO2) and has octadecyl (hydrocarbons having 18 carbon atoms) and octyl (hydrocarbons having 8 carbon atoms) groups chemically bound to its surface. The mobile phase is usually a mixture of water and a water-miscible organic solvent (methanol, acetonitrile). The analytes adsorbed on the hydrophobic surface remain bound until a higher concentration of the organic solvent promotes their desorption (Figure 3). More hydrophobic analytes elute later than hydrophilic analytes (Skoog et al., 2006; GE Healthcare, 2006).

Fig. 3. Diagram of reverse-phase chromatography separation. The stationary phase is non-polar and retains the non-polar molecule (red) most strongly. The relatively polar molecules (blue circles) are quickly eluted by the mobile phase, a polar solvent. A decrease in mobile phase polarity will move non-polar molecules through the column.

**Affinity chromatography** is the most specific chromatographic method. The separation is based on specific biochemical interactions such as enzyme-inhibitor, antigen-antibody or hormone-carrier. The stationary phase involves an inert matrix coupled with an *affinity ligand* specific for a binding site on the target molecule. The substance to be purified is specifically and reversibly adsorbed to a ligand, immobilized by a covalent bond to a chromatographic matrix. The samples are loaded in an affinity column containing the specific ligand, and the analyte of interest is adsorbed from the sample, while the molecules which have no affinity for the ligand pass through the column (Figure 4). Recovery of molecules of interest can be achieved by changing experimental conditions such as pH values, temperature, or ionic strength or by adding a stronger ligand to the mobile phase. For success in affinity chromatography, some important points have to be considered, such as finding a ligand specific enough and determining the ideal conditions for safe binding between analyte and ligand, as well as the ideal conditions for the retention and elution of the molecules involved (Skoog et al., 2006; GE Healthcare, 2007; Hage, 1999).

Fig. 4. Affinity chromatography column. The sample is loaded under ideal binding conditions. The target molecules bind specifically to the affinity ligands, while all other sample components are not adsorbed.

**Ion-exchange chromatography (IEC)** is based on the charge properties of the molecules. A stationary phase matrix made of a porous and inert material contains charged groups that interact with analyte ions of opposite charge. If these groups are acidic in nature, they interact with positively charged analytes and are called cation exchangers; if they are basic in nature, they interact with negatively charged molecules and are called anion exchangers. According to the matrix material, exchangers can be classified as organic (most common) or inorganic, and as natural or synthetic. The charged groups bound to the matrix are classified as strong ion exchangers, which are completely ionized over a wide pH range, or weak ion exchangers, which are ionized only within a narrow pH range. Thus, weak exchangers offer more flexibility in selectivity than do strong ion exchangers, although strong ion exchangers are used for initial development and optimization because their binding capacity does not change with pH. For the separation, the column is equilibrated with a start buffer, and an analyte carrying the opposite charge then binds to the ionic groups of the matrix, whereas uncharged molecules, or those with the same charge as the ionic groups, are not retained. The adsorbed analyte of interest can be eluted by a gradient of ionic strength, of pH, or of both in the mobile phase. The mechanism of action of an ion exchanger is shown in Figure 5 (Oliver, 1991; GE Healthcare, 2004; Meyer, 2010).

**Size-exclusion chromatography (SEC)** is a preparative and non-destructive analytical technique that, unlike other methods, is not based on interactions between molecules and the stationary phase, but on the size of the molecules (Figure 6). The column is packed with an inert material containing pores of controlled size, such that small molecules can enter most of the pores and are therefore retained the longest, while larger molecules cannot penetrate the pores and are retained for a shorter time. According to the mobile phase used, SEC is classified as *gel filtration chromatography* or *gel permeation chromatography*. Gel filtration chromatography (GFC) uses an aqueous mobile phase, which may contain organic modifiers or salts to change the ionic strength, or buffer solutions to change the pH. Gel permeation chromatography (GPC) is used to separate high polymers, and it has become a prominent and widely used method for estimating molecular-weight distributions. Unlike GFC, GPC uses organic mobile phases such as tetrahydrofuran (THF), toluene, chloroform, dichloromethane, or dimethylformamide (Oliver, 1991; Meyer, 2004; GE Healthcare, 2010).
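The size-based elution order described above can be sketched in a few lines. This is a deliberately idealized illustration: it assumes that molecular mass is an adequate proxy for molecular size and ignores molecular shape, the exclusion limit of the packing, and any interaction with the matrix. The function name `sec_elution_order` and the example masses are hypothetical.

```python
def sec_elution_order(masses_kda):
    """Return analyte names in idealized SEC elution order.

    Larger molecules are excluded from the pores and elute first, so the
    names are sorted by decreasing molecular mass (used here as a stand-in
    for molecular size).
    """
    return sorted(masses_kda, key=masses_kda.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical analytes and masses, for illustration only.
    sample = {"small": 12.0, "large": 660.0, "medium": 66.0}
    print(sec_elution_order(sample))
```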


Fig. 5. The principle of IEC separation. The mobile phase contains negatively charged ions (red circles) that bind to the positively charged stationary phase. The sample, containing a mixture of positively and negatively charged groups, flows through the column. Analytes carrying a negative charge are able to displace the mobile phase ions and bind to the stationary phase, while the positively charged groups (yellow) are eluted. The sample bound to the stationary phase can be eluted by increasing the concentration of a similarly charged species. The analyte that binds weakly (green) to the stationary phase will be eluted by buffer with salt ions at lower concentration. The analyte that binds strongly (blue) to the stationary phase will be eluted by buffer with salt ions at higher concentration.

Fig. 6. Molecules smaller than the pores become trapped in the matrix. Those of larger molecular weight are not trapped and flow through the column. Thus, larger molecules elute first, while smaller molecules are held longer inside the pores and elute last.


#### **3.1 Liquid chromatography coupled to MS**

In proteomics, LC can be performed downstream of 2D gels to fractionate peptides from excised spots before MS analysis, upstream of 2D gels to prefractionate the protein sample, or instead of 2D gels as the main separation method in a multidimensional protein identification technology (MudPIT) experiment (item 4). Two strategies for protein identification and characterization by MS are currently employed in proteomics. In the bottom-up strategy, purified proteins or complex protein mixtures are subjected to enzymatic digestion, and the peptide products are analyzed by MS (Andersen et al., 1996; Pandey & Mann, 2000) (Figure 7). In top-down proteomics, intact proteins or large protein fragments are subjected to fragmentation during MS analysis (Kelleher, 2004; Siuti & Kelleher, 2007). The major problem in bottom-up proteomics is that so many peptides are generated for subsequent mass analysis that full protein sequence coverage cannot be obtained, and protein inference can be a problem: different proteins can share some peptides, so identification through bottom-up proteomics can be ambiguous. For these reasons, many efforts have focused on separation methods such as multidimensional chromatography to improve sensitivity and resolution. Top-down proteomics, on the other hand, permits high sequence coverage, thus overcoming one of the most important challenges of the bottom-up strategy; however, it is a newer approach with several instrument limitations and will benefit from advances in MS hardware.

Fig. 7. Schematic representation of separation methods in bottom-up proteomics approaches (adapted from Agrawal et al., 2008).
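The protein inference problem mentioned for the bottom-up strategy can be made concrete with a small in silico digestion. The sketch below is illustrative only. It assumes trypsin as the enzyme (the text says only "enzymatic digestion") and applies the commonly used simplified rule of cleaving after K or R unless the next residue is P, with no missed cleavages. The function names and the two protein sequences are hypothetical.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P.

    Simplified rule, no missed cleavages. Requires Python 3.7+ because
    re.split is used with a zero-width pattern.
    """
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


def peptide_to_proteins(proteins):
    """Map each peptide to the set of protein names that can produce it."""
    mapping = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            mapping[peptide].add(name)
    return dict(mapping)


if __name__ == "__main__":
    # Hypothetical sequences chosen so that one peptide is shared.
    proteins = {"protein_1": "MKTAYRGGK", "protein_2": "MSLKTAYRAAK"}
    for peptide, sources in sorted(peptide_to_proteins(proteins).items()):
        label = "shared" if len(sources) > 1 else "unique"
        print(peptide, sorted(sources), label)
```

A peptide that maps to more than one protein cannot, on its own, tell the two proteins apart, which is why identifications based only on shared peptides are ambiguous.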


correlated properties. Other factors to be considered include the fact that the first column should have a larger loading capacity and have solvent compatibility with the subsequent columns. Thus, this compatibility between dimensions is essential in online systems. The last step of chromatography immediately before MS analysis should be compatible with electrospray ionization (ESI). For this purpose, the reverse-phase (RP) chromatography is frequently used, as it can desalt the samples and it is completely compatible with ESI, thus

Several proteomic studies using different multidimensional configurations have been done along these years (reviewed by Gao et al., 2010) Here we point out some examples of the

MUDPIT was first introduced by Yates and co-workers (Link et al., 1999; Wolters et al., 2001) using a biphasic column sequentially packed with SCX particles and then with C18 particles for peptide separation prior to MS analysis. Other works reported the use of an SCX column and an RP column connected to perform an online SCX-RP-MS/MS analysis (Tram et al., 2008; Gilar et al., 2005). Besides their orthogonality, SCX and RP columns show

In the SCX-RP configuration, peptides from a complex mixture are acidified and applied to an SCX column, in which the elution steps can be done using a salt gradient. Then, a fraction of peptides are absorbed in the RP column, and after washing away salts and buffers, peptides are eluted from the RP column into the mass spectrometer using a gradient of an organic solvent. Finally, the RP is reequilibrated and new fraction of peptides from the SCX

Several studies came out to improve the application of the MudPIT technique. One of the limitations of SCX-RPLC is related to the use of salt gradient SCX separation, resulting in limited resolution and peak capacity for peptide separation. Extensive salt usage could also cause ion suppression, thus reducing MS performance. Second, by using coupled SCX-RP columns, separation on both dimensions has to be performed at the same flow rate, which may sacrifice MS detection sensitivity for low-abundance components. As an alternative approach, offline SCX-RP separation has been used to identify more than 1200 proteins from zebrafish liver (Wang et al., 2007).However, the implementation of offline configurations is not always the best option, as extensive offline sample handling increases the overall

Dai et al. (2005) reported an integrated column composed of SCX and RP where peptides were fractionated by a pH step gradient. The exclusion of salt removal steps could lead to fast 2D-LC separation and MS/MS detection. Some years later, the same research group developed a SCX and SAX combined pre-fraction strategy called Yin-Yang multidimensional chromatography (Dai et al., 2007). Peptides eluted from a SCX column at pH 2.5 were then separated by a SAX column using a pH gradient solution. Subsequently, SAX fractions were analyzed by RPLC-MS/MS. This approach revealed proteins from a

The use of RP columns in both dimensions of a MudPIT analysis exhibited high-throughput, automatability, and performance comparable with that of SCX-RP. Despite all successful

providing effective resolution and facilitating MS detection.

**4.1 Ion-exchange and reverse-phase chromatography (IEX-RPLC)** 

mobile phase compatibility between each other and with MS analysis.

column is eluted to be absorbed in the RP column, and the process repeats.

analysis time and causes sample loss and sensitivity reduction.

**4.2 Reverse-phase-reverse-phase chromatography (RP-RPLC)** 

broad range of different pH values.

most applicable approaches.

So-called "shotgun proteomics" is a typical bottom-up approach that uses HPLC coupled to tandem mass spectrometry (MS/MS) to identify proteins on a large scale. This platform has been facilitated by the use of MudPIT, discussed below. Because bottom-up proteomics is the most mature and most widely used approach for protein identification, the following section focuses on chromatographic platforms for peptide separation.

#### **4. Multidimensional liquid chromatography in proteomics**

As previously mentioned, bottom-up proteomics presents challenges that remain to be overcome, such as sample complexity and large differences in protein concentration. These problems are far from being solved by single chromatographic or electrophoretic methods. Two main approaches have been developed to address them: methods that separate abundant proteins from lower-abundance proteins, and multidimensional separation methods that maximize the fractionation of peptides, thus increasing the proteome coverage analyzed by MS.

Abundant proteins can be a serious problem when analyzing complex samples from certain tissues, in which a few classes of proteins account for a high percentage (around 50% or more) of the total. For example, approximately 50% of the protein in leaves is Rubisco (Feller et al., 2008), and the great majority of the human blood serum proteome consists of albumin, fibrinogen, immunoglobulins, transferrin, haptoglobin, and lipoproteins (Burtis & Ashwood, 2001; Turner & Hulme, 1970). If these abundant proteins are not removed from the sample, their peptides will mask the peptides from low-abundance proteins during proteomic analysis. Immunoaffinity separation or gel-based fractionation can be good options for removing abundant proteins from complex samples, although immunoaffinity separation can also remove low-abundance proteins that interact to any degree with the abundant ones (Granger et al., 2005).

Sequential chromatographic separations that exploit different chemical or physical properties have improved the assessment of classes of proteins that are difficult to handle in gel electrophoresis. Multidimensional liquid chromatography (MDLC) can reach the same resolution as 2D gel electrophoresis, with added advantages such as automation, better sensitivity, and increased proteome coverage. MDLC was first described by Giddings (1984) as a technique that combines two or more types of LC to increase peak capacity and selectivity, contributing to a better fractionation of the peptides that enter the mass spectrometer.

Peak capacity is the number of peaks that can be resolved in a given time. To increase peak capacity, two or more orthogonal separation dimensions must be combined. In other words, the properties that govern the separation in the first dimension do not affect the separation in the subsequent dimensions; the process thus reduces sample complexity and improves resolving power, fractionating more components in a given time.
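As a worked illustration of why orthogonality matters, a commonly used idealization is that the peak capacity of a fully orthogonal two-dimensional separation approaches the product of the peak capacities of the individual dimensions. The sketch below rests on that assumption. The peak capacities and the `coverage` factor (the fraction of the 2D separation space actually used) are assumed values chosen only for illustration, and scaling by a single coverage factor is a deliberate simplification.

```python
def ideal_2d_peak_capacity(n_first, n_second):
    """Upper bound for two fully orthogonal dimensions: the product rule."""
    return n_first * n_second


def practical_2d_peak_capacity(n_first, n_second, coverage):
    """Scale the ideal value by the fraction of separation space used (0 to 1).

    'coverage' is a hypothetical, simplified stand-in for imperfect
    orthogonality between the two dimensions.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return ideal_2d_peak_capacity(n_first, n_second) * coverage


# Assumed example values, not measurements.
n_scx = 20    # first dimension, e.g. a coarse ion-exchange fractionation
n_rp = 100    # second dimension, e.g. a reverse-phase gradient

print("1D RP only:        ", n_rp)
print("ideal 2D (SCX x RP):", ideal_2d_peak_capacity(n_scx, n_rp))
print("2D at 60% coverage: ", practical_2d_peak_capacity(n_scx, n_rp, 0.6))
```

Even with incomplete use of the separation space, the two-dimensional estimate remains far above that of the single reverse-phase dimension, which is the motivation for combining dimensions with uncorrelated selectivity.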

Many LC combinations have been reported, using chromatographic methods such as strong cation exchange (SCX), strong anion exchange (SAX), size-exclusion (SEC), and reverse-phase chromatography (Zhang et al., 2007; Hynek et al., 2006; Moritz et al., 2005). Several factors are essential when considering an MDLC approach. The most important is the choice of chromatographic column types that together give a satisfactory MDLC peak capacity. To separate a large variety of peptides in high-performance chromatography, the columns used in each dimension must separate on the basis of uncorrelated properties. In addition, the first column should have a larger loading capacity and solvent compatibility with the subsequent columns; this compatibility between dimensions is essential in online systems. The last chromatographic step immediately before MS analysis should be compatible with electrospray ionization (ESI). For this purpose, reverse-phase (RP) chromatography is frequently used, as it desalts the sample and is fully compatible with ESI, thus providing effective resolution and facilitating MS detection.

Several proteomic studies using different multidimensional configurations have been carried out in recent years (reviewed by Gao et al., 2010). Here we point out some examples of the most applicable approaches.

#### **4.1 Ion-exchange and reverse-phase chromatography (IEX-RPLC)**

MudPIT was first introduced by Yates and co-workers (Link et al., 1999; Wolters et al., 2001), who used a biphasic column packed sequentially with SCX particles and then with C18 particles for peptide separation prior to MS analysis. Other studies reported the use of an SCX column connected to an RP column to perform online SCX-RP-MS/MS analysis (Tram et al., 2008; Gilar et al., 2005). Besides their orthogonality, SCX and RP columns show mobile phase compatibility with each other and with MS analysis.

In the SCX-RP configuration, peptides from a complex mixture are acidified and applied to an SCX column, from which they are eluted stepwise using a salt gradient. A fraction of peptides is then adsorbed onto the RP column and, after salts and buffers have been washed away, the peptides are eluted from the RP column into the mass spectrometer using a gradient of organic solvent. Finally, the RP column is re-equilibrated, a new fraction of peptides is eluted from the SCX column and adsorbed onto the RP column, and the process repeats.
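The repeating online SCX-RP cycle described above can be sketched as a simple method schedule. This is only an illustrative outline: the salt concentrations, the step wording, and the `build_mudpit_schedule` helper are hypothetical and do not correspond to any published method or instrument software.

```python
def build_mudpit_schedule(salt_steps_mM):
    """Return (cycle number, step description) pairs for an online SCX-RP run.

    Each salt step releases one peptide fraction from the SCX phase; the
    fraction is then desalted, separated on the RP phase, and the RP phase
    is re-equilibrated before the next salt step.
    """
    schedule = []
    for cycle, salt in enumerate(salt_steps_mM, start=1):
        schedule.append((cycle, f"elute SCX fraction with {salt} mM salt onto RP"))
        schedule.append((cycle, "wash RP to remove salts and buffers"))
        schedule.append((cycle, "elute RP into MS with organic-solvent gradient"))
        schedule.append((cycle, "re-equilibrate RP"))
    return schedule


# Assumed, illustrative salt steps of increasing concentration.
for cycle, step in build_mudpit_schedule([25, 50, 100, 250, 500]):
    print(f"cycle {cycle}: {step}")
```

Because every salt step triggers a full RP gradient, the total run time grows linearly with the number of SCX fractions.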

Several studies have sought to improve the application of the MudPIT technique. SCX-RPLC has two main limitations. First, the salt-gradient SCX separation gives limited resolution and peak capacity for peptide separation, and extensive salt usage can also cause ion suppression, thus reducing MS performance. Second, with coupled SCX-RP columns, separation in both dimensions has to be performed at the same flow rate, which may sacrifice MS detection sensitivity for low-abundance components. As an alternative approach, offline SCX-RP separation has been used to identify more than 1200 proteins from zebrafish liver (Wang et al., 2007). However, offline configurations are not always the best option, as extensive offline sample handling increases the overall analysis time and causes sample loss and reduced sensitivity.

Dai et al. (2005) reported an integrated column composed of SCX and RP phases in which peptides were fractionated by a pH step gradient. Eliminating the salt removal steps could lead to faster 2D-LC separation and MS/MS detection. Some years later, the same research group developed a combined SCX and SAX prefractionation strategy called Yin-Yang multidimensional chromatography (Dai et al., 2007). Peptides eluted from an SCX column at pH 2.5 were then separated on a SAX column using a pH gradient. Subsequently, the SAX fractions were analyzed by RPLC-MS/MS. This approach revealed proteins from a broad range of different pH values.

#### **4.2 Reverse-phase-reverse-phase chromatography (RP-RPLC)**

The use of RP columns in both dimensions of a MudPIT analysis has shown high throughput, automatability, and performance comparable with that of SCX-RP. Despite all the successful studies based on SCX-RP, these approaches frequently encounter samples of higher complexity than their resolving power can accommodate. Accordingly, new strategies have been developed, including RP-RPLC. Gilar et al. (2005) investigated the use of a pH gradient in an RP-RP platform. The RP-RP approach provides higher peak capacity in both dimensions, and the pH gradient has the most significant impact on the selectivity of this platform. The pH gradient modulates peptide hydrophobicity during the elution of peptide fractions.

Another factor that has increased the sensitivity of this approach is the integration of a high-pressure system in RP-RP platforms. This was first implemented in the Agilent 1100 2D liquid chromatographic system and more recently in the 2D nanoACQUITY system from Waters, frequently used with a pH gradient and two columns of C18 particles. Zhou et al. (2010) showed that RP-RP fractionation outperforms SCX-RP. It was also demonstrated that combining RP-RP systems with the nanoflow format had a positive impact on the efficiency of electrospray ionization prior to MS/MS analysis.

#### **4.3 Affinity chromatography**

Another type of chromatography widely used in proteomics is affinity-based. The main use of this technique is to enrich peptides or proteins bearing post-translational modifications (PTMs) before MS analysis. Because of their low stoichiometry and dynamic modification patterns, PTM-bearing species have to be enriched from complex mixtures prior to MS analysis. Affinity chromatographic columns can be used online or offline with RP columns and the mass spectrometer.

#### **4.4 Enrichment of phosphopeptides or phosphoproteins**

In phosphoproteomics, the interest is in the identification and quantification of phosphorylated proteins or peptides. To this end, several selective enrichment methods are applied to increase the content of phosphopeptides or phosphoproteins in complex mixtures, thus preventing the suppression of their ion signals by unphosphorylated molecules.

Immobilized metal affinity chromatography (IMAC) is one of the most popular techniques employed in phosphoproteomics. IMAC uses immobilized metal cations (Fe³⁺, Ga³⁺, or Al³⁺) that interact with the negatively charged phosphate groups; subsequent elution of the bound molecules yields the enriched fraction. Tsai et al. (2011) described a pH/acid-controlled IMAC protocol for phosphopeptide purification with high specificity and lower sample loss. They characterized over 2,360 nondegenerate phosphopeptides and 2,747 phosphorylation sites in the H1299 lung cancer cell line, showing that a low-pH buffer increases the specificity of IMAC for phosphopeptides. Buffer composition was directly associated with the specificity and selectivity of the IMAC technique (Tsai et al., 2008; Jensen et al., 2007).

Another widely used enrichment technique is metal oxide affinity chromatography (MOAC). MOAC works on a principle similar to that of IMAC but uses metal oxides or hydroxides, such as titanium dioxide (TiO₂), zirconium dioxide (ZrO₂), or aluminum hydroxide (Al(OH)₃). Metal oxides tend to have higher selectivity for phosphopeptides, making it easier to trap them on the column (Zhou et al., 2007). In recent years, TiO₂ has emerged as the main MOAC-based phosphopeptide enrichment method (Pinkse et al., 2004). When peptides are loaded in the presence of DHB (2,5-dihydroxybenzoic acid), non-specific binding is reduced, thus increasing the selectivity of TiO₂ (Larsen et al., 2005).


#### **4.5 Enrichment of glycopeptides or glycoproteins**

Glycosylation is the most complex PTM present in eukaryotes. As with other PTMs, glycopeptides are often very minor constituents of the peptides derived from proteolytic digestion. Therefore, enrichment of glycoproteins or glycopeptides is essential in a glycoproteomics study (Ito et al., 2009). The lectin affinity approach is the most common tool for glycoprotein/glycopeptide enrichment. Lectins are sugar-binding proteins that are highly specific for sugar moieties. For example, concanavalin A (ConA) is a mannose-binding lectin widely used to study N-glycosylated proteins, whereas the lectin from *Vicia villosa* (VVL) preferentially binds alpha- or beta-linked terminal GalNAc. Because each lectin has its own sugar specificity, serial lectin affinity columns (SLAC) have been developed to reduce the complexity of proteolytic digests by more than one order of magnitude. SLAC can be used to study O-glycosylated proteins, which are difficult to access because the lectins available for them are not specific enough. Jacalin, for instance, is relatively specific for O-glycosylation but also selects mannose N-glycans. This problem can be overcome by first using a ConA affinity column to remove mannose-containing glycopeptides and then using a Jacalin column. In this serial configuration, O-glycosylated peptides can be accessed (Durham & Regnier, 2006).
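The logic of the serial ConA-then-Jacalin configuration can be shown with a toy model. The sketch below is a simplified illustration only: the peptide records and glycan labels are invented, and the binding rules encode nothing more than the qualitative description above (ConA retains mannose-containing N-glycans; Jacalin retains O-glycans but also mannose N-glycans).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peptide:
    name: str
    glycan: str  # toy labels: "none", "N-mannose", or "O-linked"


def cona_binds(peptide):
    """Toy rule: ConA retains mannose-containing N-glycopeptides."""
    return peptide.glycan == "N-mannose"


def jacalin_binds(peptide):
    """Toy rule: Jacalin retains O-glycopeptides and also mannose N-glycans."""
    return peptide.glycan in {"O-linked", "N-mannose"}


def jacalin_alone(peptides):
    """Names of the peptides retained by a single Jacalin column."""
    return [p.name for p in peptides if jacalin_binds(p)]


def cona_then_jacalin(peptides):
    """ConA flow-through is passed to Jacalin; return what Jacalin retains."""
    flow_through = [p for p in peptides if not cona_binds(p)]
    return [p.name for p in flow_through if jacalin_binds(p)]


# Hypothetical digest containing the three kinds of peptide.
digest = [
    Peptide("pep1", "none"),
    Peptide("pep2", "N-mannose"),
    Peptide("pep3", "O-linked"),
    Peptide("pep4", "N-mannose"),
    Peptide("pep5", "O-linked"),
]

print("Jacalin alone:    ", jacalin_alone(digest))
print("ConA then Jacalin:", cona_then_jacalin(digest))
```

In this toy digest the single Jacalin column retains a mixture of N- and O-glycopeptides, while the serial arrangement leaves only the O-linked ones.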

#### **4.6 LC-MS repeatability and reproducibility**

First of all, it is important to understand the difference between repeatability and reproducibility. Repeatability is the variation in measurements made on the same sample, on the same instrument, and by the same operator (Bland & Altman, 1986). Reproducibility is the variation observed for an analytical technique when the operator, instrumentation, time, or location is changed (McNaught & Wilkinson, 1997). In proteomics, it is therefore possible to calculate the run-to-run variation in results to estimate the repeatability of the analytical technique, or to estimate the reproducibility between different laboratories using completely different instruments. These measurements are imperative for the clinical utility of biomarker candidates and should be expected of any such study (Baggerly et al., 2005).

During proteomic analysis there are many potential contributors to variability that can compromise the repeatability and reproducibility of the approach. The variation can begin with sample collection, specimen processing techniques, storage, and instrument performance (Hsieh et al., 2006; Banks et al., 2005; Pilny et al., 2006). Every step in the proteome analysis can be a source of variation. Proteomic analysis by LC-MS begins with the digestion of complex mixtures, followed by peptide fractionation in LC systems. MS/MS scans are then acquired, and spectral matches are obtained by bioinformatic analysis. The complexity of these steps leads to variation in peptide and protein identification and quantification. Minor differences in LC can shift peptide elution times (Prakash et al., 2006) or change which peptides are selected for MS/MS fragmentation (Liu et al., 2004).
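One simple way to quantify run-to-run variation in identifications is the overlap between the peptide sets of technical replicates, together with the way the cumulative number of identifications grows as replicates are added. The sketch below is a minimal illustration: the choice of the Jaccard index as the overlap measure is an assumption made here, not a metric prescribed by the studies cited, and the peptide identifiers are invented.

```python
from itertools import combinations


def jaccard(run_a, run_b):
    """Shared identifications divided by all identifications in either run."""
    a, b = set(run_a), set(run_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_overlap(runs):
    """Average Jaccard overlap over every pair of replicate runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("at least two runs are required")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def cumulative_identifications(runs):
    """Number of distinct identifications after each successive replicate."""
    seen = set()
    totals = []
    for run in runs:
        seen |= set(run)
        totals.append(len(seen))
    return totals


# Hypothetical peptide identifications from three technical replicates.
replicates = [
    {"pepA", "pepB", "pepC", "pepD"},
    {"pepA", "pepB", "pepC", "pepE"},
    {"pepA", "pepB", "pepD", "pepE"},
]

print("mean pairwise overlap:", mean_pairwise_overlap(replicates))
print("cumulative identifications:", cumulative_identifications(replicates))
```

Applying the same functions to runs acquired on different instruments or in different laboratories gives a reproducibility estimate instead of a repeatability estimate.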

In this context, the use of MudPIT also can introduce more variation. Delmotte et al. (2009) showed that an introduction of a separation dimension decreases the repeatability by approximately 25% upon 1D or 2D chromatographic separations. Slebos et al. (2008) reported superior reproducibility in isoelectric focusing (IEF) compared to SCX separations. They found that IEF more quickly reached maximal detection within three replicate analysis. In contrast, the SCX required six replicates. In this study, approximately 90% of all peptide identifications are found in a single fraction. In contrast, SCX is characterized by spread of peptides into adjacent fractions. Peptides at lower abundance or those generating lower signal intensity are more likely to be selected for MS/MS if they appear in multiple

Strategies for Protein Separation 23

better accuracy in protein inference, mainly in bottom-up proteomics. In parallel, top-down proteomics is developing rapidly, with significant progress in MDLC systems for protein

Finally, the achievement of better representation, resolution, sensitivity, and automation has been developed with the application of LC methods coupled to MS, but they are far from being considered as the perfect strategy and are subject to improvements. The use of 2D gels or LC as the central separation method or both in combination is debatable, depending on the study objective. However, the constant advancement of these platforms is resulting in

Adam, B.L.; Qu, Y.; Davis, J.W.; Ward, M.D.; Clements, M.A.; Cazares, L.H.; Semmes, O.J.;

Schellhammer, P.F.; Yasui, Y.; Feng, Z. & Wright, G.L.Jr. (2002). Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. *Cancer Research*, Vol.62, No.13, (July 2002), pp.3609–3614, ISSN 1541-7786

Agrawal, G.K.; Hajduch, M.; Graham, K.; Thelen, J.J. (2008). In-Depth Investigation of the Soybean Seed-Filling Proteome and Comparison with a Parallel Study of Rapeseed. *Plant Physiology*, Vol. 148, pp. 504–518

Albrethsen, J.; Bogebo, R.; Olsen, J.; Raskov, H.; Gammeltoft, S. (2006). Preanalytical and analytical variation of surface-enhanced laser desorption-ionization time-of-flight mass spectrometry of human serum. *Clinical Chemistry and Laboratory Medicine*, Vol.44, No.10, pp. 1243-52

Andersen, J. S., Svensson, B., and Roepstorff, P. (1996). Electrospray ionization and matrix-assisted laser desorption/ionization mass spectrometry: Powerful analytical tools in recombinant protein chemistry. *Nature Biotechnology*, Vol.14, pp. 449–457

Anderson, N.L.; Polanski, M.; Pieper, R.; Gatlin, T.; Tirumalai, R.S.; Conrads, T.P.; Veenstra, T.D.; Adkins, J.N.; Pounds, J.G.; Fagan, R. & Lobley, A. (2004). The Human Plasma Proteome: A Non-Redundant List Developed by Combination of Four Separate Sources, *Mol. Cell Proteomics*, Vol. 3, pp. 311–326

Baggerly, K. A.; Morris, J. S.; Edmonson, S. R.; Coombes, K. R. (2005). Signal in noise: evaluating reported reproducibility of serum proteomic tests for ovarian cancer. *Journal of the National Cancer Institute*, Vol.97, No.4, pp. 307-9

Banks, R. E.; Stanley, A. J.; Cairns, D. A.; Barrett, J. H.; Clarke, P.; Thompson, D.; Selby, P. J. (2005). Influences of blood sample processing on low-molecular-weight proteome identified by surface-enhanced laser desorption/ionization mass spectrometry. *Clinical Chemistry*, Vol. 51, No.9, pp. 1637-49

Bianco, C.; Strizzi, L.; Mancino, M.; Rehman, A.; Hamada, S.; Watanabe, K.; De Luca, A.; Jones, B.; Balogh, G.; Russo, J.; Mailo, D.; Palaia, R.; D'Aiuto, G.; Botti, G.; Perrone, F.; Salomon, D.S. & Normanno, N. (2006). Identification of cripto-1 as a novel serologic marker for breast and colon cancer. *Clinical Cancer Research*, Vol.12, No.17, (September 2006), pp.5158-64, ISSN 1557-3265

Bjellqvist, B.; Ek, K.; Righetti, P.G.; Gianazza, E.; Görg, A.; Westermeier, R. & Postel, W. (1982). Isoelectric-focusing in immobilized pH gradients - principle, methodology and some applications. *Journal of Biochemical and Biophysical Methods*, Vol.6, No.4, (September 1982), pp.317-339, ISSN 0165-022X


fractions. SCX thus offers more chances of detecting peptides, a consequence of its greater sensitivity in peptide identification. Greater sensitivity, however, leads to much higher run-to-run variability in peptide detection at each fractionation step, demanding a greater number of technical replicates, which compromises overall throughput. Several groups have evaluated the number of replicates necessary to observe a particular percentage of the proteins in a sample (Liu et al., 2004; Slebos et al., 2008; Kislinger et al., 2005).

On the other hand, Tabb et al. (2010) concluded that standardized platforms yield a high degree of repeatability and reproducibility (around 70-80%) in protein identifications, an indication that LC-MS/MS platforms can generate consistent protein identifications. They also showed that instrumentation can add variation: the high-resolution Orbitraps outperformed the lower-resolution LTQs in both repeatability and reproducibility. Finally, reproducibility between different instruments of the same type was several percent lower than the repeatability of technical replicates on a single instrument.
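Repeatability figures such as the 70-80% quoted above are typically expressed as the overlap between the identification lists of replicate runs. The following is a minimal sketch of one such measure, assuming overlap is defined as shared identifications divided by the union of both lists; this definition, the function names, and the protein identifiers are illustrative assumptions, not the exact metric of the cited study.

```python
from itertools import combinations


def overlap_percent(ids_a, ids_b):
    """Percent of identifications shared by two runs (shared / union * 100)."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)


def mean_pairwise_overlap(runs):
    """Average the overlap over every pair of replicate runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("at least two runs are required")
    return sum(overlap_percent(a, b) for a, b in pairs) / len(pairs)


# Hypothetical protein identifiers from three technical replicates.
replicates = [
    {"P1", "P2", "P3", "P4"},
    {"P1", "P2", "P3", "P5"},
    {"P1", "P2", "P4", "P5"},
]

print(f"mean pairwise overlap: {mean_pairwise_overlap(replicates):.1f}%")
```

Running the same calculation on replicates from a single instrument and on runs from different instruments of the same type would give the two quantities the text distinguishes as repeatability and reproducibility.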

Standardization of methodologies and system configurations, together with appropriate instrument tuning and maintenance, would lower the noise from biological, chemical, or instrumental sources between LC-MS/MS analyses. A comprehensive understanding of the components that affect reproducibility and repeatability helps in choosing the experimental design best suited to the reliability desired in proteomics studies.

#### **5. Concluding remarks**

Protein separation methods are vital for the characterization of proteomes. All of them exploit one or more general properties of proteins, and they directly determine the effectiveness of any proteomic analysis. Traditionally, 2D gels have been the most frequently employed protein separation tool, capable of separating up to 10,000 components. However, 2D gels have limitations, as discussed above, that prevent efficient automation. LC methods coupled to MS analysis have emerged as a promising strategy to achieve better automation and sensitivity. Combining orthogonal chromatographic columns increases peak capacity and permits a large dynamic range to be measured more efficiently in complex samples. Affinity chromatography, in turn, allows the selection of a subgroup of proteins (for example, proteins with certain PTMs), directing the study toward specific biological pathways and increasing the yield of relevant information.

Although its resolution has improved, LC fractionation still raises several concerns: the amount of time required, reproducibility, and protein inference from peptides. A single RP analysis can take 1.5 to 2 hours, so 15 fractions (taken from an IEX separation) of a complex mixture require at least 22.5 hours. With biological replicates (10 individuals) and technical replicates (3 injections) in an experiment comparing control and affected groups, the analysis could reach about 56 days (15 fractions × 1.5 h × 10 × 3 × 2 = 1,350 h). The required time increases instrument and maintenance costs and limits experimental throughput. The second concern involves reproducibility during fractionation, which can depend on the number of dimensions employed and on whether they are coupled inline to the mass spectrometer.

Another point that has received considerable attention is the development of data analysis pipelines to process LC data from multiple peptide fractionations effectively. The challenge is to reach better accuracy in protein inference, mainly in bottom-up proteomics. In parallel, top-down proteomics is developing rapidly, with significant progress in MDLC systems for protein separation, and will probably warrant more attention in the near future.
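The instrument-time estimate given in these concluding remarks can be retraced with a short calculation. The sketch below assumes the shortest run time (1.5 h), two conditions (control and affected), and an instrument running 24 hours a day with no downtime; the function and parameter names are illustrative.

```python
def instrument_time_hours(run_hours, fractions, biological_replicates,
                          technical_replicates, conditions):
    """Total LC-MS/MS acquisition time when every fraction of every
    replicate in every condition is injected as a separate run."""
    runs = fractions * biological_replicates * technical_replicates * conditions
    return run_hours * runs


# One sample: 15 IEX fractions, each analysed in a 1.5 h RP run.
per_sample = instrument_time_hours(1.5, 15, 1, 1, 1)

# Full experiment: 10 individuals x 3 injections x 2 conditions.
total = instrument_time_hours(1.5, 15, 10, 3, 2)
days = total / 24  # assumes uninterrupted operation

print(per_sample, total, days)
```

Under the same assumptions, using the upper bound of 2 hours per run raises the total from 1,350 to 1,800 hours, which shows how sensitive throughput is to the length of a single gradient.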

Finally, the application of LC methods coupled to MS has brought better representation, resolution, sensitivity, and automation, but these methods are far from a perfect strategy and remain subject to improvement. Whether to use 2D gels, LC, or both in combination as the central separation method is debatable and depends on the study objective. Nevertheless, the constant advancement of these platforms is resulting in successful proteomics studies.

#### **6. References**



Bland, J. M.; Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. *The Lancet*, Vol. 1, No. 8476, pp. 307–310

Burtis, C.A. & Ashwood E. R. (Eds.). (2001). *Tietz Fundamentals of Clinical Chemistry*, 5th Edition, WB Saunders, ISBN 0-7216-8634-6, Philadelphia

Chevalier, F. (2010). Highlights on the capacities of "Gel-based" Proteomics. In: *Proteome Science*, Vol.8, No.23, 29/07/2011, Available from: <http://www.proteomesci.com/content/8/1/23>

Dai, J.; Jin, W.H.; Sheng, Q.H.; Shieh, C.H.; Wu, J.R. & Zeng, R. (2007). Protein phosphorylation and expression profiling by Yin-yang multidimensional liquid chromatography (Yin-yang MDLC) mass spectrometry. *Journal of Proteome Research*, Vol.6, pp. 250-62

Dai, J.; Shieh, C.H.; Sheng, Q.; Zhou, H. & Zeng, R. (2005). Proteomic analysis with integrated multiple dimensional liquid chromatography/mass spectrometry based on elution of ion exchange column using pH steps, *Analytical Chemistry*, Vol.77, pp. 5793–5799

Delmotte, N.; Lasaosa, M.; Tholey, A.; Heinzle, E.; van Dorsselaer, A.; Huber, C. G. (2009). Repeatability of peptide identifications in shotgun proteome analysis employing off-line two-dimensional chromatographic separations and ion-trap MS, *Journal of Separation Science*, Vol.32, No.8, pp. 1156–1164

Drews, O.; Reil, G.; Parlar, H. & Görg, A. (2004). Setting up standards and a reference map for the alkaline proteome of the Gram-positive bacterium *Lactococcus lactis*. *Proteomics*, Vol.4, No.5, (May 2004), pp.1293-1304, ISSN 1615-9853

Durham, M. & Regnier F. E. (2006). Targeted glycoproteomics: serial lectin affinity chromatography in the selection of O-glycosylation sites on proteins from the human blood proteome. *Journal of Chromatography A*, Vol. 1132, pp. 165-73

Elia, G. (2008). Biotinylation reagents for the study of cell surface proteins. *Proteomics*, Vol. 8, pp. 4012–4024

Feller, U.; Anders, I. & Mae, T. (2008). Rubiscolytics: fate of Rubisco after its enzymatic function in a cell is terminated. *Journal of Experimental Botany*, Vol.59, pp. 1615–1624

Figeys, D. (2005). Proteomics: The Basic Overview. In: *Industrial Proteomics: Applications for Biotechnology and Pharmaceuticals*, edited by Daniel Figeys, pp.1-62, John Wiley & Sons Incorporated, ISBN 0471703753, San Francisco, USA

Gao, M.; Qi, D.; Zhang, P.; Deng, C. & Zhang, X. (2010). Development of multidimensional liquid chromatography and application in proteomic analysis, *Expert Review of Proteomics*, Vol.7, No.5, pp.665-678

GE Healthcare. (2004). *Ion Exchange Chromatography & Chromatofocusing*, Amersham Biosciences Limited, 11-0004-21, Edition AA, Sweden.

GE Healthcare. (2006). *Hydrophobic Interaction and Reversed Phase Chromatography, Principles and Methods*, GE imagination at work, 11-0012-69 AA, Sweden.

GE Healthcare. (2007). *Affinity Chromatography, Principles & Methods*, GE imagination at work, 18-1022-29 AE, Sweden.

GE Healthcare. (2010). *Gel Filtration, Principles and Methods*, GE imagination at work, 18-1022-18 AK, Sweden.

Giddings, J.C. (1984). Two-Dimensional Separations: Concept and Promise, *Anal. Chem.*, Vol.56, pp.1258A–1270A

Gilar, M.; Olivova, P.; Daly, A. E. & Gebler, J. C. (2005). Two-dimensional separation of peptides using RP-RP-HPLC system with different pH in first and second separation dimensions, *Journal of Separation Science*, Vol.28, pp. 1694–1703

Görg, A.; Obermaier, C.; Boguth, G.; Csordas, A.; Diaz, J.J. & Madjar, J.J. (1997). Very alkaline immobilized pH gradients for two-dimensional electrophoresis of ribosomal and nuclear proteins. *Electrophoresis*, Vol.18, No.3-4, (March-April 1997), pp.328-337, ISSN 1522-2683

Görg, A.; Obermaier, K.; Boguth, G.; Harder, A.; Scheibe, B.; Wildgruder, R. & Weiss, W. (2000). The current state of two-dimensional electrophoresis with immobilized pH gradients. *Electrophoresis*, Vol.21, No.6, (April 2000), pp.1037-1053, ISSN 1522-2683

Granger, J.; Siddiqui, S.; Copeland, S. & Remick D. (2005). Albumin depletion of human plasma also removes low abundance proteins including the cytokines. *Proteomics*, Vol.5, pp. 4713–4718

Greenough, C.; Jenkins, R.E.; Kitteringham, N.R.; Pirmohamed, M.; Park, B.K. & Pennington, S.R. (2004). A method for the rapid depletion of albumin and immunoglobulin from human plasma. *Proteomics*, Vol.4, No.10, (October 2004), pp.3107-3111, ISSN 1615-9853

Guo, X.; Zhao, C.; Wang, F.; Zhu, Y.; Cui, Y.; Zhou, Z.; Huo, R. & Sha, J. (2010). Investigation of human testis protein heterogeneity using 2-dimensional electrophoresis. *Journal of Andrology*, Vol.31, No.4, pp.419-429, (July-August 2010), ISSN 0196-3635

Hsieh, S. Y.; Chen, R. K.; Pan, Y. H.; Lee, H. L. (2006). Systematical evaluation of the effects of sample collection procedures on low-molecular-weight serum/plasma proteome profiling. *Proteomics*, Vol. 6, No.10, pp.3189-98

Hynek, R.; Svensson, B.; Jensen, O.N.; Barkholt, V. & Finnie, C. (2006). Enrichment and identification of integral membrane proteins from barley aleurone layers by reversed-phase chromatography, SDS-PAGE and LC-MS/MS, *Journal Proteome Res.*, Vol.5, pp. 3105–3113

Ito, S.; Hayama, K. & Hirabayashi, J. (2009). Enrichment strategies for glycopeptides, *Methods in Molecular Biology*, Vol.534, pp.195-203

Jang, J.H. & Hanash, S. (2003). Profiling of the cell surface proteome. *Proteomics*, Vol.3, No.10, (October 2003), pp.1947-1954, ISSN 1615-9853

JBC Centennial 1905–2005 100 Years of Biochemistry and Molecular Biology. (2006). The Development of Two-dimensional Electrophoresis by Patrick H. O'Farrell. *The Journal of Biological Chemistry*, Vol.281, No.32, (August 2006), pp. e26, ISSN 1083-351X

Jensen S.S. & Larsen M.R. (2007). Evaluation of the impact of some experimental procedures on different phosphopeptide enrichment techniques. *Rapid Commun. Mass Spectrom.*, Vol. 21, pp. 3635–45

Klein-Scory, S.; Kübler, S.; Diehl, H.; Eilert-Micus, C.; Reinacher-Schick, A.; Stühler, K.; Warscheid, B.; Meyer, H.E.; Schmiegel, W. & Schwarte-Waldhoff, I. (2010). Immunoscreening of the extracellular proteome of colorectal cancer cells. *BMC Cancer*, Vol.10, No.70, pp.2-19, (February 2010), ISSN 1471-2407

Larsen M.R.; Thingholm T.E.; Jensen O.N.; Roepstorff P. & Jorgensen TJ. (2005). Highly selective enrichment of phosphorylated peptides from peptide mixtures using titanium dioxide microcolumns. *Mol. Cell Proteomics*, Vol. 4, pp. 873–86


Lilley, K.S. & Friedman, D.B. (2004). All about DIGE: quantification technology for differential-display 2D-gel proteomics. *Expert Review of Proteomics*, Vol.1, No.4, (December 2004), pp.401-409, ISSN 1478-9450

Link, A.J.; Eng, J.; Schieltz, D.M.; Carmack, E.; Mize, G.J.; Morris, D.R.; Garvik, B.M. & Yates, J.R. III (1999). Direct analysis of protein complexes using mass spectrometry, *Nature Biotechnology*, Vol.17, pp. 676–682

Liu, H.; Sadygov, R. G.; Yates, J. R., III. (2004). A model for random sampling and estimation of relative protein abundance in shotgun proteomics. *Analytical Chemistry*, Vol.76, No.14, pp.4193–4201

Kelleher, N. L. (2004). Top-Down Proteomics, *Analytical Chemistry*, Vol.76, No.11, pp.197A–203A

Kislinger, T.; Gramolini, A. O.; MacLennan, D. H.; Emili, A. (2005). Multidimensional protein identification technology (MudPIT): technical overview of a profiling method optimized for the comprehensive proteomic investigation of normal and diseased heart tissue. *Journal of the American Society for Mass Spectrometry*, Vol. 16, No.8, pp. 1207–1220

McNaught, A. D.; Wilkinson, A. (1997). *Compendium of Chemical Terminology (the "Gold Book")*, 2.02 ed.; Blackwell Scientific Publications: Oxford, ISBN 0-86542-684-8, North Carolina, United States of America

Meyer, V.R. (2010). *Practical High-Performance Liquid Chromatography (5th)*, Wiley, ISBN 978-0-470-68218-0, United Kingdom

Molloy, M.P. (2000). Two-dimensional electrophoresis of membrane proteins using immobilized pH gradients. *Analytical Biochemistry*, Vol.280, No.1, (April 2000), pp.1-10, ISSN 1096-03

Moritz, R.L.; Clippingdale, A.B.; Kapp, E.A.; Eddes, J.S.; Ji, H.; Gilbert, S.; Connolly, L.M. & Simpson, R.J. (2005). Application of 2-D free-flow electrophoresis/RP-HPLC for proteomic analysis of human plasma depleted of multi-high abundance proteins, *Proteomics*, Vol.5, pp. 3402–3413

Neue, U. D. (1997). *HPLC Columns, Theory, Technology and Practice*, Wiley, ISBN 978-0-471-19037-0, New York

O'Farrell, P. (1975). High Resolution Two-Dimensional Electrophoresis of Proteins. *The Journal of Biological Chemistry*, Vol.250, No.10, (May 1975), pp.4007–4021, ISSN 1083-351X

Oliver, R.W.A. (1991). *HPLC of macromolecules: a practical approach*, IRL press, ISBN 0199630208, England.

Pandey, A., and Mann, M. (2000). Proteomics to study genes and genomes. *Nature*, Vol.405, pp.837–846

Patton, W.F. (2002). Detection technologies in proteome analysis. *Journal of Chromatography B-Analytical Technologies in the Biomedical and Life Sciences*, Vol.771, No.1-2, (May 2002), pp.3-31, ISSN 1873-376X

Pilny, R.; Bouchal, P.; Borilova, S.; Ceskova, P.; Zaloudik, J.; Vyzula, R.; Vojtesek, B.; Valik, D. (2006). Surface-enhanced laser desorption ionization/time-of-flight mass spectrometry reveals significant artifacts in serum obtained from clot activator-containing collection devices. *Clinical Chemistry*, Vol.52, No.11, pp.2115-6

Pinkse M.W.; Uitto P.M.; Hilhorst M.J.; Ooms B. & Heck A.J. (2004). Selective isolation at the femtomole level of phosphopeptides from proteolytic digests using 2D-NanoLC-ESI-MS/MS and titanium oxide precolumns. *Anal. Chem.*, Vol.76, pp. 3935–43

Prakash, A.; Mallick, P.; Whiteaker, J.; Zhang, H.; Paulovich, A.; Flory, M.; Lee, H.; Aebersold, R.; Schwikowski, B. (2006). Signal maps for mass spectrometry-based comparative proteomics, *Mol. Cell. Proteomics*, Vol.5, No.3, pp.423–432

Rabilloud, T.; Chevallet, M.; Luche, S. & Lelong, C. (2008). Fully denaturing two-dimensional electrophoresis of membrane proteins: A critical update. *Proteomics*, Vol. 8, pp. 3965–3973

Rogers, S.; Girolami, M.; Kolch, W.; Waters, K.M.; Liu, T.; Thrall, B. & Wiley, H.S. (2008). Investigating the correspondence between transcriptomic and proteomic expression profiles using coupled cluster models. *Bioinformatics*, Vol.24, pp. 2894–2900

Saito, Y.; Jinno, K.; Greibrokk, T. (2004). Capillary columns in liquid chromatography: between conventional columns and microchips, *Journal of Separation Science*, Vol. 27, No.17-18, (December 2004), pp. 1379-1390, ISSN 1615-9306.

Santoni, V.; Molloy, M. & Rabilloud, T. (2000). Membrane proteins and proteomics: Un amour impossible? *Electrophoresis*, Vol.21, No.6, (April 2000), pp.1054-1070, ISSN 1522-2683

Shaw, J.; Rowlinson, R.; Nickson, J.; Stone, T.; Sweet, A.; Williams, K. & Tonge, R. (2003). Evaluation of saturation labelling two-dimensional difference gel electrophoresis fluorescent dyes. *Proteomics*, Vol.3, No.7, (July 2003), pp.1181-1195, ISSN 1615-9853

Siuti, N.; Kelleher, N.L. (2007). Decoding protein modifications using top-down mass spectrometry, *Nature Methods*, Vol.4, pp.817-821

Slebos, R. J.; Brock, J. W.; Winters, N. F.; Stuart, S. R.; Martinez, M. A.; Li, M.; Chambers, M. C.; Zimmerman, L. J.; Ham, A. J.; Tabb, D. L.; Liebler, D. C. (2008). Evaluation of strong cation exchange versus isoelectric focusing of peptides for multidimensional liquid chromatography-tandem mass spectrometry. *J. Proteome Res.*, Vol.7, No.12, pp.5286–5294

Skoog, D.A.; Holler, J.F.; Crouch S.R. (2006). Liquid Chromatography, In: *Principles of Instrumental Analysis (6th)*, Skoog, D.A.; Holler, J.F.; Crouch, Thomson Brooks/Cole (David Harris), pp. (816-855), ISBN 9780495012016, Canada.

Smejkal, G.B. (2004). The Coomassie chronicles: past, present and future perspectives in polyacrylamide gel staining. *Expert Review of Proteomics*, Vol.1, No.4, (December 2004), pp.381-387, ISSN 1478-9450

Tan, S.; Tan, H.T. & Chung, M.C.M. (2008). Membrane proteins and membrane proteomics. *Proteomics*, Vol. 8, pp. 3924–3932

Tabb, D. L.; Veja-Montoto, L.; Rudnick, P.A.; Variyath, A. M. et al. (2010). Repeatability and Reproducibility in Proteomic Identifications by Liquid Chromatography-Tandem Mass Spectrometry, *Journal of Proteome Research*, Vol.9, pp. 761–776

Tran, B. Q.; Loftheim, H.; Reubsaet, L.; Lundanes, E. & Greibrokk, T. (2008). On-Line multitasking analytical proteomics: How to separate, reduce, alkylate and digest whole proteins in an on-Line multidimensional chromatography system coupled to MS, *Journal of Separation Science*, Vol.31, pp. 2913–2923




Tsai, C.F.; Wang, Y.T.; Chen, Y.R.; Lai, C.Y.; Lin, P.Y.; et al. (2008). Immobilized metal affinity chromatography revisited: pH/acid control toward high selectivity in phosphoproteomics. *Journal Proteome Res.*, Vol. 7, pp. 4058–69

Tsai, C.F.; Wang, Y.T.; Lin, P.Y. & Chen, Y.J. (2011). Phosphoproteomics by Highly Selective IMAC Protocol, *Neuroproteomics*, Series Neuromethods, Vol.57, pp.181-196, Pub. Date: May-25-2011

Turner, M. W. & Hulme, B. (1970). *The Plasma Proteins: An Introduction*, Pitman Medical & Scientific Publishing Co., Ltd., London

Ünlü, M.; Morgan, M.E. & Minden, J.S. (1997). Difference gel electrophoresis: a single gel method for detecting changes in protein extracts. *Electrophoresis*, Vol.18, No.11, pp.2071-2077, (October 1997), ISSN 1522-2683

Wang, N.; MacKenzie, L.; De Souza, A. G.; Zhong, H. Y.; Goss, G.; Li, L. (2007). Proteome profile of cytosolic component of zebrafish liver generated by LC-ESI MS/MS combined with trypsin digestion and microwave-assisted acid hydrolysis, *Journal of Proteome Research*, Vol.6, pp. 263–272

Wilkins, M.R.; Gasteiger, E.; Sanchez, J.C.; Bairoch, A. & Hochstrasser, D.F. (1998). Two-dimensional gel electrophoresis for proteome projects: the effects of protein hydrophobicity and copy number. *Electrophoresis*, Vol.19, No.8-9, (June 1998), pp.1501-1505, ISSN 1522-2683

Wolters, D.A.; Washburn, M.P. & Yates, J.R.III (2001). An automated multidimensional protein identification technology for shotgun proteomics, *Analytical Chemistry*, Vol.73, pp. 5683–5690

Yokoyama, R.; Iwafune, Y.; Kawasaki, H. & Hirano, H. (2009). Isoelectric focusing of high-molecular-weight protein complex under native conditions using agarose gel. *Analytical Biochemistry*, Vol.387, No.1, (April 2009), pp.60-63, ISSN 1096-0309

Zhang, J.; Xu, X.; Gao, M.; Yang, P. & Zhang, X. (2007). Comparison of 2-D LC and 3-D LC with post- and pre-tryptic-digestion SEC fractionation for proteome analysis of normal human liver tissue, *Proteomics*, Vol.7, pp. 500–512

Zhou, F.; Cardoza, J.D.; Ficarro, S.B.; Adelmant, G.O.; Lazaro, J.B.; Marto, J.A. (2010). Online Nanoflow RP-RP-MS Reveals Dynamics of Multicomponent Ku Complex in Response to DNA Damage, *Journal of Proteome Research*, Vol. 9, No.12, pp. 6242-55

Zhou, H. J.; Tian, R. J.; Ye, M. L.; Xu, S. Y.; Feng, S.; Pan, C. S.; Jiang, X. G.; Li, X. & Zou, H. F. (2007). Highly specific enrichment of phosphopeptides by zirconium dioxide nanoparticles for phosphoproteome analysis. *Electrophoresis*, Vol.28, No.13, pp. 2201-2215

### **Evolution of Proteomic Methods for Analysis of Complex Biological Samples – Implications for Personalized Medicine**

Amanda Nouwens and Stephen Mahler *The University of Queensland, Australia*

#### **1. Introduction**

We are on the threshold of a paradigm shift for proteomics, moving from largely a qualitative discipline, to now being capable of quantification of a protein within a complex sample at great sensitivity. The potential application of such advanced proteomic technology is enormous as we will be able to detect and quantify low levels of expressed proteins in complex samples, and so move comparative proteomics to a new level.

The evolving practice of personalized medicine will depend on devising new techniques and methodologies that allow the detection and quantification of proteins implicated in the diseased state. Perhaps more than 5000 genes are linked to disease states, and complex networks of interactions among these expressed genes ultimately lead to those states. The myriad single nucleotide polymorphisms (SNPs) contribute to such phenomena: an individual's SNP profile plays a major role in susceptibility to disease and in adverse reactions to drugs, for example. Coupled with mutations that occur throughout life, the complex "disease state" proteome will contain mutant proteins at low levels that need to be identified and quantified, so that therapeutic intervention based on rational scientific hypotheses can be investigated.

Plasma and serum contain an unknown number of proteins, with amounts ranging from pg/L to g/L levels (i.e. a very high dynamic range). One of the major problems faced by proteomic studies of plasma or serum, or indeed of any complex protein sample, is that a relatively small number of abundant proteins accounts for the great majority of the protein content of the sample. The upshot is that the proteins of interest, which may have regulatory functions, are masked by these abundant proteins, and non-targeted methods of proteomic analysis are biased toward the top end of the abundance scale. Methods for quantifying low abundance proteins have evolved rapidly, concomitant with the evolution of powerful mass spectrometers of increasing sensitivity. The use of antibodies for targeting peptides prior to mass spectrometry analysis is becoming prominent as a means of partitioning low abundance peptides away from peptides in the bulk sample.

This review will provide a broad overview of the evolution of proteomic methods to analyse biological samples, including Differential In-Gel Electrophoresis (DIGE), Isotope-Coded Affinity Tag (ICAT), Isobaric tags for relative and absolute quantification (iTRAQ), Stable isotope labeling with amino acids in cell culture (SILAC), Unique ion signature Mass

Evolution of Proteomic Methods for Analysis of

specific physiological state, or diseased state.

**2.2 Personalized medicine for cancer** 

Identification of new tumour targets

Development of multi-drug resistance (MDR)

 Drug potency, due to inadequate concentration at the cell surface. Non-selective nature of cytotoxic agents and a low therapeutic index.

alternative splicing events.

**2.1 Significance** 

adverse effect.

Complex Biological Samples – Implications for Personalized Medicine 31

reflect protein levels, nor account for post-translational modifications on proteins or

Spectrometry, Selected Reaction Monitoring (SRM)-based targeted mass spectrometry and Stable Isotope Standards and Capture by Anti-Peptide Antibodies (SISCAPA). Examples of the application of these methods to the identification of proteins involved in a variety of disease states and their implication for personalized medicine will be provided.

#### **2. Overview of personalized medicine**

Personalized medicine is medicine designed around the genotype, or more specifically the single nucleotide polymorphism (SNP) profile, of individuals. It facilitates the selection of treatments best matched to the individual and disease phenotype (Marko-Varga et al., 2007). What are the main factors that contribute to genotype diversity? SNPs are the overriding factor: they reflect past mutation and occur wherever a single nucleotide differs when two sequences are compared. It is the spread of SNPs within genomes that contributes to our individuality, with an estimated 93% of genes containing an SNP (Chakravati, 2001). The individual "fingerprint" of SNPs reflects differences in susceptibility to disease and our varying response to drugs. Pharmacogenetics is the study of how these differences in genotype are manifested in inter-individual variation in response to drugs. The convergence of traditional pharmacogenetics with the relatively new discipline of human genomics has resulted in the evolution of pharmacogenomics (Weinshilbaum et al., 2004). The Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB) catalogs genes involved in modulating the response to drugs. The Pharmacogenomics Research Network (PGRN) is a collaboration of scientists studying the effect of genes on responsiveness to a wide variety of medicines (Altman, 2007). The PGRN is linked to the PharmGKB integrative database, which contains genetic and clinical information on participants in studies (http://www.pharmgkb.org). Thus the integration and availability of data at the genomic and transcriptomic levels are well developed and are a valuable resource for researchers involved in the development of personalized medicine.
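The idea that an SNP occurs wherever a single nucleotide differs between two compared sequences can be illustrated with a minimal sketch. The function name `snp_positions` and the two short sequences are hypothetical, and the sketch assumes the sequences are already aligned and of equal length; real variant calling works on population data and is considerably more involved.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in A, base in B) for every single-base mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (position, base_a, base_b)
        for position, (base_a, base_b) in enumerate(
            zip(seq_a.upper(), seq_b.upper()), start=1
        )
        if base_a != base_b
    ]


# Hypothetical aligned fragments from a reference and an individual.
reference = "ATGCCGTAAGCT"
individual = "ATGCTGTAAGCC"

print(snp_positions(reference, individual))
# → [(5, 'C', 'T'), (12, 'T', 'C')]
```

Each reported position is only a candidate SNP; whether it is a true polymorphism depends on its frequency in the population.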

The incorporation of proteomics into the further development of personalized medicine is a more recent phenomenon, and has given rise to the area of pharmacoproteomics, which in essence studies how the proteome changes in response to a drug and is a logical extension of pharmacogenetics and pharmacogenomics (Jain, 2004). While pharmacogenetics and pharmacogenomics provide information at the level of the genome and transcriptome, pharmacoproteomics yields information on function (i.e. at the translational level), although it should be pointed out that small nuclear (sn) RNAs have relatively recently been shown to contribute to the regulation of cellular processes and thus have a functional role (Mercer et al., 2009; Mattick, 2009). To this end, proteomics and proteomic profiling of individuals' serum and tissues are becoming increasingly important in patient diagnosis and assessment and, together with pharmacogenetics and pharmacogenomics, will provide a more complete picture of the status of an individual, particularly at the functional level.

Identification of disease states can be based on genomic analyses. For example, identification of mutations in the breast cancer genes BRCA1 and BRCA2 can be used in the diagnosis of breast cancer (Miki et al., 1994; Wooster et al., 1995). However, DNA alone does not necessarily reflect the physiological state of functioning cells, and thus analysis of gene products, both RNA and protein, is required. RNA expression analysis is easier to perform than protein analysis, but transcript levels in the cell do not always reflect protein levels, nor do they account for post-translational modifications of proteins or alternative splicing events.

#### **2.1 Significance**

There are a number of ways in which proteomics may be utilized in personalized medicine and, more broadly, in drug discovery, research and development. In its simplest application, proteomics has contributed to the discovery of disease biomarkers, clinical entities that define and/or predict normal and pathogenic states (Krejsa et al., 2006). Furthermore, the clinical response to treatment can be monitored through proteome profiling of relevant biomarkers. The clinical use of a biomarker is contingent on whether it has been validated, which ultimately depends on its clinical reliability and utility. Combinations of validated biomarkers can indicate surrogate endpoints that predict clinical outcomes. On a more global protein expression level, comparative proteomics can generate patterns of protein expression, or expression profiles, which may be utilized to define a specific physiological or diseased state.

One area where proteomics can yield information not obtainable by other means is the identification and localization of proteins in various cellular compartments and in the extracellular space. The paradigm that proteins have fixed locations within cells has recently proven to be simplistic; proteins can have diverse functions depending on their cellular location. The identification of a protein outside of its known functional zone in cellular preparations was once thought to be due to rupture of cells or cell organelles and leakage of the protein into other fractions. However, it is now known that proteins translocate between intracellular and extracellular compartments (Butler et al., 2009). This has enormous implications for drug targeting, as the presence of a target in multiple locations may complicate therapy. For example, chaperone proteins, including HSP 10, 70 and 90, have now been shown to exist in extracellular locations, whereas it was once thought that they were located exclusively inside the cell, where they aid protein folding and carry out chaperone function. Inhibitors of HSP90α are in clinical trials for the treatment of cancer (Banerji, 2009); however, inhibition of the extracellular wound-healing function of HSP90α could be an undesired adverse effect.

#### **2.2 Personalized medicine for cancer**

Due to the great diversity of cancer types, and individual variation within specific tumours, cancer perhaps shows the greatest potential for the development of personalized therapy. Cancer accounts for about 13% of all deaths annually worldwide, and is a major cause of morbidity and mortality (Krause-Heuer et al., 2009). Notwithstanding the emergence of new chemotherapeutic drugs and novel therapies for cancer, significant challenges remain in understanding tumourigenesis and tumour cell biology, and in developing new, effective strategies for cancer treatment (Mozafari et al., 2009). Some of these challenges include:


The development of therapeutic monoclonal antibodies (mAbs) has shown promise in the treatment of cancer, amongst other indications. At present there are around 30 approved therapeutic mAbs, predominantly for treating cancer and diseases associated with inflammation (Walsh, 2010). Anti-cancer antibodies are designed to target tumour cell surface antigens and to elicit an immune response on binding to the tumour. Most commonly this response is antibody-dependent cellular cytotoxicity (ADCC), in which activated natural killer cells are recruited to attack the tumour. Currently approved anti-cancer therapeutic mAbs targeting the tumour cell surface are specific for antigens including EGFR, HER2/*neu*, CD20, CD52 and CD33. Other targets receiving much attention are the mucins, principally MUC1 (there is no existing approved antibody for MUC1), IGFR, CEA and the cancer stem cell antigen CD44. The future development of monoclonal antibody (and other) cancer therapies will be contingent on the identification, development and validation of new tumour targets. However, the identification of new tumour biomarkers that reliably and accurately diagnose early stage cancer has not met with great success. As an example, a group of researchers, as part of the Early Detection Research Network (EDRN), tested a group of recently discovered putative biomarkers for ovarian cancer; however, none was superior to CA-125, which has been used extensively for 30 years. Notwithstanding the significant challenges in the discovery and development of clinically useful biomarkers, proteomics will be central to the discovery of novel biomarkers for early detection and diagnosis; some of these biomarkers may be suitable for development as novel drug targets (Pastwal et al., 2007).

Recently the International Cancer Genome Consortium (ICGC) was formed, with a charter to co-ordinate and integrate large-scale cancer genome sequencing projects, focusing on 50 different types of cancer (The International Cancer Genome Consortium, 2010). The expanded studies will investigate around 25,000 specific cancers (biopsy material from individuals). The primary objectives of the consortium were released in April 2008 (http://www.icgc.org/files/ICGC\_April\_29\_2008.pdf). However, these studies will be at the genomic, epigenomic and transcriptomic levels. At the proteomic level, the Human Proteome Organization (HUPO, http://www.hupo.org/) instigated the Human Proteome Project (HPP), a co-ordinated global initiative to map the protein-based molecular architecture of the human body. This initiative will aid in the discovery and cataloging of new tumour-associated antigens and potential targets.

#### **2.3 Evolution of methods of analysis using proteomics**

The term proteomics, encompassing the analysis of, and the tools used to examine, the proteins expressed by a genome, was coined only about 15 years ago (Wilkins et al., 1996). The development of proteomics since then has naturally brought a refinement in techniques and methods for simultaneously detecting, identifying and quantifying proteins in biological samples. Fundamental to any identification or further characterization is the ability to first separate proteins from complex samples. This is particularly important for samples such as blood, where the dynamic range of protein abundance is estimated to be greater than 10^9. Within this vast dynamic range, just 22 proteins account for 99% of the protein content of blood (reviewed in Simpson et al., 2008). With respect to the search for protein biomarkers, the situation is further complicated by the notion that most biomarkers will be low abundance proteins. As protein function and levels of abundance are often altered in disease states, identification of such changes by comparison of healthy and disease samples will allow a greater understanding of the disease, provide new therapeutic targets, and identify markers of disease status. Establishment of what could be defined as the "healthy phenotype" will depend on detailed characterization of the proteomes of healthy and diseased states of cells and tissue. One school of thought suggests that by creating complex "proteomic fingerprints" of healthy and diseased states (and transitions thereof), one may recognize perturbations from the healthy state phenotype before manifestation of the disease state. Therapeutic intervention during the transition-to-disease state may instigate a reversion to the healthy phenotype. Thus a systems biology approach to studying the proteomes of cells in normal and diseased states, and also the network of protein-protein interactions, should enhance the opportunity for attaining this goal.

Proteomics and the tools to identify and quantify proteins have evolved substantially since the field's conception. The availability of genomic information, particularly the complete human genome sequence, has in many ways pushed the bottleneck from the genomic to the proteomic arena. Developments include tools for protein separation, protein identification, quantification and automated processes. The following sections provide a summary of some of the major approaches to protein separation, identification and quantification using both gel-based and gel-free proteomic methods. Separation of proteins can be based on one or more physical or biochemical parameters, including size, pI, sub-cellular location, or other depletion/enrichment strategies, with separation involving two or more 'orthogonal' approaches providing greater separation than a single dimension alone.

#### **2.3.1 Two-dimensional gel electrophoresis**

One of the earliest approaches to protein separation was based on the use of two-dimensional gel electrophoresis (2DGE), in which proteins are first separated by charge in a pH gradient (isoelectric focusing), followed by separation based on size in SDS-PAGE gels (O'Farrell, 1975). The gel is subsequently stained to visualize proteins, with each protein spot volume representative of the abundance of the protein(s) within it. This approach, particularly when combined with other upstream fractionation steps (e.g. sub-cellular), can provide separation and resolution of a large number of proteins (Cordwell et al., 2000). It has been used in various projects aiming to identify differentially expressed proteins (potential biomarkers) by comparing control and diseased samples via gel-analysis software and statistical tests. Using 2DGE, identification of candidate biomarkers has been achieved for a range of diseases including atherothrombotic ischemic stroke (Brea et al., 2008), pancreatic cancer (Park et al., 2011) and breast cancer (Lee et al., 2011), with many studies identifying new potential markers.
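The comparison of control and diseased samples by statistical tests can be sketched as follows. This is an illustrative assumption rather than the procedure of any particular gel-analysis package: it takes already normalized spot volumes for one matched spot across replicate gels, and reports a log2 fold change together with Welch's t statistic. The function names and the spot volumes are hypothetical.

```python
from math import log2, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent groups with unequal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def compare_spot(control, disease):
    """Summarize one matched spot: log2 fold change (disease/control) and Welch's t."""
    return {
        "log2_fold_change": log2(mean(disease) / mean(control)),
        "welch_t": welch_t(disease, control),
    }


# Hypothetical normalized spot volumes from four control and four disease gels.
control_volumes = [100.0, 110.0, 90.0, 105.0]
disease_volumes = [200.0, 220.0, 180.0, 210.0]

result = compare_spot(control_volumes, disease_volumes)
print(result)
```

In a real study the t statistic would be converted to a p-value, and because hundreds of spots are tested at once, a correction for multiple testing would normally be applied before a spot is called differentially expressed.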

#### **2.3.2 Differential In-Gel Electrophoresis (DIGE)**

Improvements to 2DGE include the addition of small fluorescent tags (CyDyes) to protein samples prior to separation, thus allowing multiple samples to be combined in the same physical gel (Tonge et al., 2001). This approach, known as Differential In-Gel Electrophoresis or DIGE, circumvents some of the problems of quantifying proteins across different samples in different gels, as the distinct fluorescent tags on different samples allow the researcher to detect proteins from control and disease samples simultaneously. The inclusion of a pooled internal standard representing a mix of all samples can help circumvent some of the technical difficulties (e.g. gel warping and spot matching) that arise from single-sample gels. An advantage of DIGE is that, owing to the sensitivity of the fluorescent CyDye labels, only small amounts of sample (as little as 10 µg) are required for analysis. However, a caveat is that although protein separation and quantification can be achieved, the identity of the proteins remains unknown unless the protein spots are excised and further analysed, which can be difficult because of the low amounts of protein used in the analysis. Limitations also exist in the dynamic range of protein detection, estimated at 10^4 (reviewed in Rabilloud, 2002). Typically a separate, unlabelled gel loaded with a larger amount of protein must be run and subsequently cross-matched with the original DIGE experiment, as described by Matigian et al., 2010. Despite these difficulties, 2DGE has a proven track record in the separation and identification of proteins, with numerous differentially expressed proteins and potential biomarkers uncovered. 2DGE has been applied to a wide range of sample types, including tissues (e.g. breast, skin, brain) and fluids such as cerebrospinal fluid, serum, urine and tears, targeting diseases such as various cancers, Alzheimer's disease and dementia, cardiovascular diseases, and infections such as HIV, conjunctivitis and toxoplasmosis. Overall, 2DGE with or without fluorescent labeling of samples does provide good separation of proteins, but it is a time-consuming and laborious process, both in the gel procedures and in the analysis of 2D protein spot profiles.
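The role of the pooled internal standard in DIGE can be illustrated with a small sketch. It assumes that, for a given spot, each gel yields one volume per labeled sample plus one volume for the pooled standard, and that dividing by the standard removes gel-to-gel variation. The function name, field names and numbers are hypothetical and chosen only to make the effect visible.

```python
def standardized_abundance(sample_volume: float, standard_volume: float) -> float:
    """Express a spot volume relative to the pooled internal standard on the same gel."""
    if standard_volume <= 0:
        raise ValueError("internal standard volume must be positive")
    return sample_volume / standard_volume


# Hypothetical volumes for one spot on two gels; gel 2 ran with lower overall signal.
gels = [
    {"gel": 1, "control": 1200.0, "disease": 2400.0, "standard": 1500.0},
    {"gel": 2, "control": 800.0, "disease": 1600.0, "standard": 1000.0},
]

normalized = [
    {
        "gel": g["gel"],
        "control": standardized_abundance(g["control"], g["standard"]),
        "disease": standardized_abundance(g["disease"], g["standard"]),
    }
    for g in gels
]

for row in normalized:
    print(row)
```

The raw volumes differ between the two gels, but after division by the internal standard the control and disease values agree across gels, which is what makes cross-gel comparison more reliable.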

#### **2.4 Mass spectrometry-based approaches**

Mass spectrometry (MS) has formed the basis for standard protein identification for many years, typically in a 'bottom-up' approach (in which proteins are digested, usually with trypsin, and the resulting peptides analyzed to determine protein identity), but also in some cases by 'top-down' approaches where intact proteins form the basis of analysis. MS-based approaches hold some advantages over traditional 2DGE methods in that samples can potentially be analyzed and identified simultaneously through methods such as two-dimensional LC-MS/MS, which combines strong cation exchange with subsequent reversed-phase separation of peptides. In comparison to gel-based approaches, MS analyses appear to be more effective at identifying low-abundance proteins, as well as those with extreme physical properties, such as very low or very high molecular weights or pI values, which are often difficult to resolve on gels. MS-based analyses also offer better prospects for automation of separation, analysis and identification of proteins.

With the ability to rapidly identify large numbers of proteins via MS, the emphasis has since shifted to also quantifying the proteins detected. Broadly speaking, approaches to quantifying proteins via mass spectrometry are either label-based or label-free, and each has its own advantages and disadvantages. Label-based methods include Isotope-Coded Affinity Tag (ICAT) and Isobaric Tags for Relative and Absolute Quantitation (ITRAQ), in which multiple samples are labeled, mixed and then analyzed simultaneously via MS. Because the samples are processed together, the run-to-run reproducibility issues that may be encountered with label-free approaches are avoided.

#### **2.4.1 ITRAQ, ICAT and SILAC**

ICAT and ITRAQ differ in their labeling chemistries and sites of attachment. For ICAT, cysteine (cys) residues are targeted, and the labeled peptides are selected via avidin affinity chromatography. Enriching only the cys-containing peptides provides one avenue to quantify samples without the complexity of analyzing all peptides in a sample. However, ICAT becomes problematic for proteins that lack cys residues altogether. Furthermore, as reviewed in Patton et al., 2002, approximately 70% of proteins contain four or fewer cys residues, which limits the usefulness of this approach.
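To make the cysteine limitation concrete, the following minimal sketch performs an in-silico tryptic digest and keeps only the cys-containing peptides that an ICAT workflow could, in principle, recover. The cleavage rule (after K or R, but not before P) is a common simplification, missed cleavages are ignored, and the sequence is an invented toy example rather than a real protein.

```python
def tryptic_peptides(sequence):
    """Split a sequence after K or R, unless the next residue is P.

    Simplified rule for illustration; missed cleavages are not modelled.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # remaining C-terminal peptide
    return peptides


def icat_observable(sequence):
    """Return only the peptides that contain cysteine (C), the ICAT target."""
    return [p for p in tryptic_peptides(sequence) if "C" in p]


toy_protein = "AGCKTPRPEKLLCDRNQ"  # hypothetical sequence, for illustration only
all_peptides = tryptic_peptides(toy_protein)
cys_peptides = icat_observable(toy_protein)
print(all_peptides)
print(cys_peptides)
```

In this toy case only two of the four peptides carry a cysteine, and a protein with no cysteine at all yields an empty list, i.e. it would be invisible to an ICAT experiment.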

ITRAQ labels primary amines, i.e. peptide N-termini and lysine side chains (Ross et al., 2004), thus avoiding the problem of limited cys residues encountered with ICAT. ITRAQ is a multiplexed approach in which the tags are isobaric reagents. This means that up to eight different samples can be labeled with unique tags. The tags differ only in the isotopes used in their synthesis, so labeled peptides from different samples behave identically during LC separation and in MS mode. Only upon fragmentation in MS/MS mode are the isobaric tags distinguishable. The result is that proteins can be identified via MS/MS and, owing to the unique reporter ions released from the ITRAQ tag, quantified at the same time. The initial ITRAQ labels were designed to label up to four different samples, although tags to label and detect up to eight different samples are now available. A limitation of ITRAQ is the difficulty of identifying and quantifying proteins that are expressed in only one sample type, e.g. a protein expressed only in the diseased state. Some technical difficulties, in particular with 8-plex tags, resulting in a reduction in identification efficiency have also been reported (Thingholm et al., 2010).
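As an illustration of how reporter ions translate into relative quantification, the sketch below divides each reporter ion intensity in a single MS/MS spectrum by the intensity of a chosen reference channel. The channel labels are the nominal reporter m/z values commonly quoted for the 4-plex reagent (114 to 117); the intensities and the function name are hypothetical and chosen only for illustration.

```python
def reporter_ratios(intensities, reference_channel):
    """Express each reporter ion intensity relative to a reference channel."""
    reference = intensities[reference_channel]
    if reference <= 0:
        # No signal in the reference sample: the ratio cannot be computed.
        raise ValueError("reference channel has no signal; ratio is undefined")
    return {channel: value / reference for channel, value in intensities.items()}


# Hypothetical 4-plex reporter intensities from one MS/MS spectrum,
# keyed by nominal reporter m/z.
spectrum = {114: 1000.0, 115: 2000.0, 116: 500.0, 117: 1000.0}
ratios = reporter_ratios(spectrum, reference_channel=114)
print(ratios)
```

The explicit error for an empty reference channel mirrors the limitation noted above: when a protein is present in only one sample type, a ratio against a sample in which it is absent is not defined.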

Other label-based strategies such as Stable Isotope Labeling with Amino acids in Cell culture (SILAC) also exist. This method is based on labeling proteins in culture with heavy and light forms of amino acids. The approach is a useful way of comparing two samples, but is limited to cells grown in culture or, in some cases, animal models (Zanivan et al., 2012). Although the isotopes used are stable rather than radioactive, the label must be incorporated metabolically during growth, so the method is not directly applicable to human or clinical samples.

#### **2.4.2 Selected reaction monitoring / multiple reaction monitoring**

Selected reaction monitoring (SRM), also known as multiple reaction monitoring (MRM), provides a targeted approach to quantifying proteins in a sample. It narrows the 'global' approach to quantification by targeting only those proteins of specific interest to the researcher. Typically, MS instruments such as triple quadrupoles are used: a precursor mass representing a peptide (typically tryptic) from the protein of interest is selected and fragmented, and specific product ions unique to that peptide are monitored. Generally, for each protein of interest, a number of precursor ions and their product ions are monitored to ensure specificity. This information is then used to identify and quantify the proteins present. The main limitation of SRM/MRM analyses is that it is essential to know the proteins of interest beforehand, so that appropriate precursor/product ion pairs can be monitored.
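A precursor/product ion pair (a 'transition') can be sketched numerically as follows. The example computes the m/z of a doubly charged precursor and of singly charged y-type product ions from standard monoisotopic residue masses. The function names and the tripeptide are hypothetical, cysteine is treated as unmodified, and real assays would select transitions empirically or from a resource rather than from theoretical masses alone.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the residue sum
PROTON = 1.00728   # mass of the charge-carrying proton


def precursor_mz(peptide, charge=2):
    """m/z of the intact peptide carrying `charge` protons."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


def y_ion_mz(peptide, n):
    """m/z of the singly charged y-ion made of the last n residues."""
    return sum(RESIDUE_MASS[aa] for aa in peptide[-n:]) + WATER + PROTON


def transitions(peptide, charge=2):
    """List (precursor m/z, ion label, product m/z) for y1 .. y(len-1)."""
    q1 = precursor_mz(peptide, charge)
    return [(q1, f"y{n}", y_ion_mz(peptide, n)) for n in range(1, len(peptide))]


for q1, label, q3 in transitions("GAK"):  # hypothetical tripeptide
    print(f"{q1:.4f} -> {label} {q3:.4f}")
```

Monitoring several such pairs per peptide, and several peptides per protein, is what gives the assay the specificity described above.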

Because the identity of the protein(s) of interest must be known beforehand, SRM/MRM is most useful as a downstream technology, applied after discovery-phase experiments have concluded and candidate proteins requiring quantification have been identified. The advantage of SRM/MRM assays is the ability to monitor and quantify numerous potential biomarkers simultaneously in a single analysis, and the approach is thus currently a popular area of investigation. To help with SRM/MRM analyses, a consortium called SRMAtlas has been established (www.mrmatlas.org) to support the quantification of proteins in complex samples by MS. As well as human entries, it contains mouse and yeast information, and it provides both web-interface and command-line tools to search for assays. This readily available information means researchers can potentially circumvent some method development steps, as optimal coordinates for SRM/MRM transitions of numerous proteins are available.

#### **2.4.3 Stable isotope standards and capture by anti-peptide antibodies**

Stable isotope standards and capture by anti-peptide antibodies (SISCAPA) is a method which allows the quantification of peptides from complex digests. Originally described by Anderson et al., 2004, the method utilizes stable-isotope-labeled internal standards for comparison with (unlabelled) peptides that are enriched via anti-peptide antibodies, with subsequent quantification performed by electrospray mass spectrometry. The approach offers increased sensitivity over non-enriched methods, particularly when coupled with SRM/MRM assays. SISCAPA also offers potential in the verification of diagnostic protein panels from large samples, and the bind/elute process is faster than conventional reversed-phase separations. There are also distinct advantages over traditional techniques such as ELISA in development time for biomarker assays (Whiteaker et al., 2009). The main disadvantages of SISCAPA are the need for preselected targets and the cost of producing the internal peptide standards and generating the peptide-binding antibodies. However, the sensitivity of the assay (only low fmol to pmol amounts are required) and the fact that the antibody itself can be recycled mean that ongoing costs can be reduced. Since the original design of SISCAPA, refinements have been developed to reduce the loss of low-abundance peptides, automate processing steps and improve antibody sources, i.e. moving from polyclonal to monoclonal antibodies (Anderson et al., 2009; Schoenherr 2009). As this method is a relatively recent development, no biomarkers have as yet been published as validated with this approach, although proof-of-principle experiments have been performed with established biomarkers such as troponin I (cTnI) (Kuhn et al., 2009), and thus SISCAPA remains a promising tool.
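The quantification step rests on simple isotope-dilution arithmetic: a known amount of the heavy standard is spiked in, and the endogenous (light) peptide is estimated from the ratio of the two peak areas. The sketch below assumes that the light and heavy forms are captured and ionized equally well; the function name and all numbers are hypothetical.

```python
def endogenous_amount_fmol(light_area, heavy_area, heavy_spiked_fmol):
    """Estimate the endogenous peptide amount by isotope dilution.

    Assumes identical antibody recovery and MS response for the light
    (endogenous) and heavy (spiked standard) forms of the peptide.
    """
    if heavy_area <= 0:
        raise ValueError("heavy standard was not detected; cannot quantify")
    return (light_area / heavy_area) * heavy_spiked_fmol


# Hypothetical peak areas and spike amount.
amount = endogenous_amount_fmol(
    light_area=2.5e5, heavy_area=1.0e6, heavy_spiked_fmol=100.0
)
print(amount)
```

Because both forms pass through the same enrichment and elution, losses during sample handling largely cancel in the ratio, which is one reason an internal standard is used rather than an external calibration alone.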

#### **2.4.4 Alternative strategies**

In addition to the above technologies, other strategies have been developed to complement gel and MS approaches to biomarker detection through improved sample preparation. For example, hexapeptide libraries, a form of combinatorial peptide library, offer a way to deplete samples of highly abundant proteins (Guerrier et al., 2008). In this technique, a large collection of hexapeptides (the hexapeptide library) is attached to beads, and the complex protein sample of interest is mixed with the hexapeptide-bead library. The library is highly diverse, so one or more peptides in it would be expected to have affinity for each individual protein in the sample. After the beads are separated from the mixture, the adsorbed proteins are eluted. As each hexapeptide is equally represented within the library, the binding capacity available to any one protein is limited: abundant proteins quickly saturate their ligands and the excess is washed away, whereas low-abundance proteins continue to be captured. The end result is that the abundant proteins are depleted, while proteins of low abundance are concentrated. This approach is particularly useful for biological fluids (serum, saliva, urine etc.), which have a particularly large dynamic range of protein abundance. For example, hexapeptide enrichment of urine uncovered an additional 251 proteins that were not previously known to be present in this fluid (Castagna et al., 2005). Although depletion/enrichment strategies may not, in their own right, uncover biomarkers, their usefulness lies in the ability to mine deeper into the proteome of these highly complex samples so that low-abundance proteins can be identified.
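The equalizing effect can be illustrated with a deliberately crude model in which every protein can occupy at most a fixed bead capacity, so that anything above that capacity is discarded. The capacity value, the protein abundances and the assumption of 20 possible amino acids at each of the six positions are illustrative assumptions, not figures from the cited studies.

```python
import math

# Number of distinct hexapeptides if each of 6 positions can hold any of
# 20 amino acids (an assumption; actual libraries may use a different set).
LIBRARY_DIVERSITY = 20 ** 6


def capture(abundances, capacity):
    """Cap each protein at the bead capacity available to it."""
    return {name: min(amount, capacity) for name, amount in abundances.items()}


def dynamic_range_log10(abundances):
    """Orders of magnitude between the most and least abundant protein."""
    values = list(abundances.values())
    return math.log10(max(values) / min(values))


# Hypothetical abundances in arbitrary units.
sample = {"albumin": 1e9, "transferrin": 1e7, "cytokine": 1e2}
eluted = capture(sample, capacity=1e4)

print(LIBRARY_DIVERSITY)
print(dynamic_range_log10(sample), dynamic_range_log10(eluted))
```

In this toy example the dynamic range shrinks from seven to two orders of magnitude, while the low-abundance protein is retained in full.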

Depletion/enrichment strategies can be problematic if the abundant protein is a carrier for low-abundance molecules, and depletion must therefore be used with caution. For example, it has been shown that the depletion of albumin from human plasma can also remove low-abundance proteins such as cytokines (Granger et al., 2005). More recent studies (Bellei et al., 2010) have also concluded that removal of high-abundance proteins can result in a loss of non-targeted, less abundant proteins. The unintentional and unnoticed loss of low-abundance proteins is clearly a cause for concern in the search for biomarkers of disease.

Other approaches for analysis of samples include Surface Enhanced Laser Desorption Ionisation (SELDI) and Matrix Assisted Laser Desorption Ionisation (MALDI), both of which have been utilised particularly in the examination of body fluids such as serum for biomarker discovery. This is effectively a "protein pattern recognition" approach (reviewed in Zhan & Desiderio, 2010), which compares profiles from control versus disease samples to identify differentially expressed proteins. This approach has been used in particular for the analysis of samples from cancer patients.

#### **2.4.5 Post-translational modifications**


The majority of the above technologies focus on protein expression and differential expression in control vs. disease states. However, greater emphasis will be needed in the future on protein post-translational modifications (PTMs) such as glycosylation and phosphorylation. Already, perturbations in the modification of proteins by the glycan O-linked β-N-acetylglucosamine (O-GlcNAc) have been implicated in a range of diseases, including Alzheimer's disease and diabetes (reviewed in Dias & Hart, 2007). Similarly, differential phosphorylation has been identified in diseased states such as cancer compared to control patients (Semaan et al., 2011).

#### **2.4.6 The Human Proteome Project**

Fundamental to rational design for disease treatment and prevention is an understanding of the genes present and of the expression and function of gene products, including the proteins involved in the disease process. The Human Genome Project (HGP) was established to map all genes encoded in the human genome. Surprisingly, the total number of protein-coding genes, only approximately 20,300, was substantially lower than expected, with the additional complexity presumably arising from splice variants and post-translational modifications. Following on from this ground-breaking work is the recent establishment of the Human Proteome Project (HPP), which aims to map the human proteome (Legrain et al., 2011). At present, approximately one third of the protein-coding genes identified in the human genome have not been detected at the protein level, while for many others basic information such as abundance, sub-cellular localization or function is unknown. Mapping of the proteome will be valuable in understanding human biology, and in downstream applications such as developing diagnostics, prognostics and new therapies to treat diseases. The HPP will take a 'gene-centric' approach, mapping information about proteins back to gene loci. The HPP will aim to address three parts (HUPO Views, 2010):

- Identification and characterization of proteins from every gene.
- Distribution of proteins in all normal tissues and organs.
- Mapping of pathways and protein networks and interactions.

With respect to sample type, bodily fluids that are relatively easy to obtain, such as urine, saliva and tears, as well as those requiring slightly more invasive collection methods, such as serum, plasma and cerebrospinal fluid (CSF), have all been analyzed for a variety of diseases. Because they are easier to obtain, fluids would generally form a better basis than solid tissues for determining personalized signatures and detecting biomarkers. There has been some question over whether blood is the best choice for searching for biomarkers. The rationale for using it has been that specific proteins are secreted by different organs of the body, and that these can represent a biological "fingerprint" of physiological state (reviewed in Simpson et al., 2008).



Using model systems such as mice, researchers have shown changes in the plasma proteome prior to any clinical evidence of breast cancer (Pitteri et al., 2011). A separate study in humans (Li, 2011) also suggests that it may be possible to observe plasma proteome changes prior to diagnosis. Plasma proteomics has also been used in the search for pre-diagnostic markers in other diseases such as coronary heart disease (Prentice et al., 2010). The ability to identify proteome changes before disease manifests phenotypically will potentially improve patient outcomes, particularly for diseases such as cancer where early diagnosis strongly correlates with survival rates.

#### **3. Conclusion**

The ability to define proteomic "signatures" for individuals will vastly enhance the ability of the medical community to diagnose and treat diseases, as well as potentially to identify disease before symptoms appear. Early treatment will in turn prolong life, and may also address healthcare costs through the application of more refined and defined therapies suited to individual patients. The heterogeneity of some diseases such as breast cancer, in which specific proteins, e.g. progesterone receptor, estrogen receptor and HER-2, may or may not be expressed, makes it difficult to treat patients broadly, as a 'one-size-fits-all' approach does not always apply, and emphasizes the need for individualized and personalized medicine. By examining the proteome, it is possible to gain a better understanding of the heterogeneity present in an individual, which can potentially help determine the best choice of therapies as well as indicate disease status and progression. Given the complicating influence of genetic and environmental factors on an individual, personalized medicine strategies will require proteomic data to be complemented with data from other areas and analysis strategies, and this information to be compiled in order to determine diagnostic approaches and tailor therapeutic strategies for the individual.

As yet, despite the excitement of biomarker discovery and the vast number of publications claiming detection of biomarkers for a specific disease, the majority of candidate biomarkers are yet to be validated or used in clinical settings. However, once candidate biomarkers are confirmed, the emphasis will be on high-throughput approaches to expand analyses to greater numbers of samples. Clearly, the proteomic tools needed to detect and characterize samples, particularly in a high-throughput quantitative fashion, are now a reality. Thus, personalized medicine is not far off. We anticipate a new era of therapeutic approaches and more refined medicinal treatments which will be more targeted and precise, not just for the disease but for the individual, based on the establishment of "proteomic fingerprints". In addition to greater confidence in diagnoses, proteome signatures would allow a more individualized and targeted approach to therapy. Potentially, such signatures may also provide better insight into future recurrences of the disease.

Besides the discovery, research and development of new and unique biomarkers, other facets of biomarker research include improving reliability, increasing the speed of detection and reducing the amount of sample needed for analysis. The search for biomarkers is particularly important for diseases such as breast cancer, for which there are no current clinical biomarkers available, and for which mortality is tightly related to disease stage at the time of initial surgery (Bohm et al., 2011). Using proteomics, however, a biomarker signature for non-metastatic breast cancer has been uncovered. Using serum samples and SELDI-TOF and MALDI-TOF/TOF analyses, this study (Bohm et al., 2011) found a combination of 14 biomarkers that can distinguish breast cancer patients from controls with a specificity of 67%. It is unlikely at this stage that this panel or signature will entirely replace imaging diagnostics, but it does have the potential to aid current diagnostic approaches, particularly given that cancer survival rates are greatly improved with early detection, while tumors <5 mm are normally not detected.
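For readers less familiar with the term, specificity is the fraction of disease-free individuals that a test correctly classifies as negative. The short sketch below shows the calculation; the counts are invented solely to reproduce a value of 67% and are not taken from Bohm et al., 2011.

```python
def specificity(true_negatives, false_positives):
    """Fraction of control (disease-free) subjects correctly called negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts: 100 controls, of which 67 are correctly classified.
controls_correct, controls_flagged = 67, 33
spec = specificity(controls_correct, controls_flagged)
print(f"specificity = {spec:.0%}")
```

At this level, roughly one in three healthy controls would be flagged, which is consistent with the view that such a panel would support rather than replace existing diagnostics.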

One of the fundamental problems in assigning 'biomarker' status to a protein found to be differentially expressed in a disease is that the same proteins are often differentially expressed in other diseases. A number of proteins have been implicated across several different diseases, making the notion of a single biomarker that indicates a specific disease more difficult to sustain. For example, serum amyloid A has been proposed as a prognostic marker for melanoma (Findeisen et al., 2009), breast cancer (Schaub et al., 2009) and atherothrombotic ischemic stroke (Brea et al., 2009). Potentially, a suite of biomarkers may be needed to provide greater specificity and confidence in disease diagnosis or prognosis.

The significance of the future development of personalized medicine is far reaching, and will allow/facilitate the following:


If the concept of routine personalized medicine is to become a reality in the future, the development of new proteomic techniques and methodologies will be vital, and will build on current methodologies now available.

#### **4. References**

38 Integrative Proteomics

Using model systems such as mice, researchers have shown changes in the plasma proteome prior to any clinical evidence of breast cancer (Pitteri et al., 2011). A separate study in humans (Li, 2011) also suggests that it may be possible to observe proteome plasma changes prior to diagnosis. Plasma proteomics has also been used in the search for pre-diagnostic markers in other diseases such as coronary heart disease (Prentice et al., 2010). The ability to identify proteome changes prior to manifestation of disease phenotypically, will potentially improve patient outcome, particularly for those diseases such as cancer where early

The ability to define proteomic "signatures"' for individuals will vastly enhance the ability of the medical community to diagnose and treat diseases, as well as potentially identify disease before symptoms appear. Early treatment will in turn prolong life, as well as potentially address healthcare costs through the application of more refined and defined therapies suitable for individual patients. The heterogeneity of some diseases such as breast cancer, in which specific proteins, e.g. progesterone receptor, estrogene receptor and HER-2 may or may not be expressed, make it difficult to broadly treat patients, as a 'one size-fits all' approach does not always apply, and emphasizes the need for individualized and personalized medicine. By examining the proteome, it is possible to gain a better understanding of the heterogeneity present in an individual and potentially can help determine best choice of therapies, as well as indicate disease status and progression. Given the complication of genetic factors and environmental influences on an individual, personalized medicine strategies will require complementation of proteomic data with other areas and strategies for analysis and compile this information to determine diagnostic

As yet, despite the excitement of biomarker discovery, and the vast number of publications claiming detection of biomarkers for a specific disease, the majority of candidate biomarkers are yet to be validated or used in clinical settings. However, once candidate biomarkers are confirmed, the emphasis will be on high-throughput approaches to expand analyses to greater numbers of samples. Clearly the proteomic tools available to detect and characterize samples, particularly in a high-throughput quantitative fashion are now a reality. Thus, personalized medicine is not far off the horizon. We anticipate a new era of therapeutic approaches and more refined medicinal treatments for diseases which will be more targeted and precise, not just for the disease, but for the individual, based on establishment of "proteomic fingerprints". In addition to greater confidence in diagnoses, proteome signatures would allow a more individualized and targeted approach to therapy. Potentially, such signatures may also provide better insight into future recurrences of the

Besides the quest for discovery, research and development of new and unique biomarkers, other facets of biomarker research incorporate aims such as improving reliability, increasing the speed of detection and reducing the amount of sample needed for analysis. However, the search for biomarkers is particularly important for those diseases such as breast cancer, for which there are no current clinical biomarkers available, and for which mortality is tightly related to disease stage in the initial surgery (Bohm et al., 2011). Using proteomics however, a biomarker signature for non-metastatic breast cancer has been uncovered. This study (Bohm et al., 2011) found using serum samples and SELDI-TOF and MALDI-

diagnosis strongly correlates with survival rates.

approaches and tailor therapeutic strategies for the individual.

**3. Conclusion** 

disease.

Altman, R. (2007). PharmGKB: a logical home for knowledge relating genotype to drug response phenotype. *Nature Genetics*, Vol.39, pp. 426, ISSN 1061-4036

Bellei, E.; Bergamini, S.; Monari, E.; Fantoni, L.I.; Cuoghi, A.; Ozben, T. & Tomasi, A. (2010). High-abundance proteins depletion from serum proteomic analysis: concomitant removal of non-targeted proteins. *Amino Acids*, Vol.40, pp. 145-156, ISSN 0939-4451

Bohm, D.; Keller, K.; Wehrwein, N.; Lebrecht, A.; Schmidt, M.; Kolbl, H. & Grus, F.-H. (2011). Serum proteome profiling of primary breast cancer indicates a specific biomarker profile. *Oncology Reports*, Vol.26, pp. 1051-1056, ISSN 1021-335X

Brea, D.; Sobrino, T.; Blanco, M.; Fraga, M.; Agulla, J.; Rodriguez-Yanez, M.; Rodriguez-Gonzalez, R.; Perez de la Ossa, N.; Leira, R.; Forteza, J.; Davalos, A. & Castillo, J. (2009). Usefulness of haptoglobin and serum amyloid A proteins as biomarkers for atherothrombotic ischemic stroke diagnosis confirmation. *Atherosclerosis*, Vol.205, pp. 561-567, ISSN 0021-9150

Buchen, L. (2011). Missing the Mark: Why Is It So Hard to Find a Test to Predict Cancer? *Nature*, Vol.471, pp. 428-432, ISSN 0028-0836

Butler, G.S. & Overall, C.M. (2009). Proteomic Identification of Multitasking Proteins in Unexpected Locations Complicates Drug Targeting. *Nature Reviews Drug Discovery*, Vol.8, pp. 935-948, ISSN 1474-1776

Castagna, A.; Cecconi, D.; Sennels, L.; Rappsilber, J.; Guerrier, L.; Fortis, F.; Boschetti, E.; Lomas, L. & Righetti, P.G. (2005). Exploring the Hidden Human Urinary Proteome via Ligand Library Beads. *Journal of Proteome Research*, Vol.4, pp. 1917-1930, ISSN 1535-3893

Chakravati, A. (2001). To a Future of Genetic Medicine. *Nature*, Vol.409, pp. 822-823, ISSN 0028-0836

Cordwell, S.J.; Nouwens, A.S.; Verrills, N.M.; Basseal, D.J. & Walsh, B.J. (2000). Sub-proteomics based upon protein cellular location and relative solubilities in conjunction with composite two-dimensional electrophoresis gels. *Electrophoresis*, Vol.21, pp. 1094-1103, ISSN 0173-0835

Dias, W.B. & Hart, G.W. (2007). *O*-GlcNAc modification in diabetes and Alzheimer's disease. *Molecular Biosystems*, Vol.3, pp. 766-772, ISSN 1742-206X

Filiou, M.; Turck, C. & Martins-de-Souza, M. (2011). Quantitative proteomics for investigating psychiatric disorders. *Proteomics Clinical Applications*, Vol.5, pp. 38-49, ISSN 1862-8354

Granger, J.; Siddiqui, J.; Copeland, S. & Remick, D. (2005). Albumin depletion of human plasma also removes low abundance proteins including the cytokines. *Proteomics*, Vol.5, pp. 4713-4718, ISSN 1615-9853

Guerrier, L.; Righetti, P.G. & Boschetti, E. (2008). Reduction of Dynamic Protein Concentration Range of Biological Extracts for the Discovery of Low-Abundance Proteins by Means of Hexapeptide Ligand Library. *Nature Protocols*, Vol.3, pp. 883-890, ISSN 1750-2799

HUPO Views (2010). A Gene Centric Human Proteome Project. *Molecular and Cellular Proteomics*, Vol.9, pp. 427-429, ISSN 1535-9476

Jain, R. (2004). Role of Pharmacogenomics in the Development of Personalized Medicine. *Pharmacogenomics*, Vol.5, pp. 331-336, ISSN 1462-2416

Krause-Heuer, A.M.; Grant, M.P.; Orkey, N. & Aldrich-Wright, J.R. (2008). Drug Delivery Devices and Targeting Agents for Platinum(II) Anticancer Complexes. *Australian Journal of Chemistry*, Vol.61, pp. 675-81, ISSN 0004-9425

Krejsa, C.; Rogge, M. & Sadee, W. (2006). Protein therapeutics: new applications for pharmacogenetics. *Nature Reviews Drug Discovery*, Vol.5, pp. 507-521, ISSN 1474-1776

Kuhn, E.; Addona, T.; Keshishian, H.; Burgess, M.; Mani, D.R.; Lee, R.T.; Sabatine, M.S.; Gerszten, R.E. & Carr, S.A. (2009). Developing Multiplexed Assays for Troponin I and Interleukin-33 in Plasma by Peptide Immunoaffinity Enrichment and Targeted Mass Spectrometry. *Clinical Chemistry*, Vol.55, pp. 1108-1117, ISSN 0009-9147

Lee, S.; Terry, D.; Hurst, D.R.; Welch, D.R. & Sang, Q-X.A. (2011). Protein signatures in human MDA-MB-231 breast cancer cells indicating a more invasive phenotype following knockdown of human endometase / matrilysin-2 by siRNA. *Journal of Cancer*, Vol.2, pp. 165-176, ISSN 1837-9664

Legrain, P.; Aebersold, R.; Archakov, A.; Bairoch, A.; Bala, K.; Beretta, L.; Bergeron, J.; Borchers, C.H.; Corthals, G.L.; Costello, C.E.; Deutsch, E.W.; Domon, B.; Hancock, W.; He, F.; Hochstrasser, D.; Marko-Varga, G.; Salekdeh, G.H.; Sechi, S.; Snyder, M.; Srivastava, S.; Uhlén, M.; Wu, C.H.; Yamamoto, T.; Paik, Y.K. & Omenn, G.S. (2011). The human proteome project: current state and future direction. *Molecular and Cellular Proteomics*, Vol.10, M111.009993, ISSN 1535-9476

Li, C.I. (2011). Discovery and Validation of Breast Cancer Early Detection Biomarkers in Preclinical Samples. *Hormones and Cancer*, Vol.2, pp. 125-131, ISSN 1868-8497

Marko-Varga, G.; Ogiwara, A.; Nishimura, T.; Kawamura, T.; Fujii, K.; Kawakami, K.; Kyono, Y.; Tu, H.; Anyoji, H.; Kanazawa, M.; Akimoto, S.; Hirano, T.; Tsuboi, M.; Nishio, K.; Hada, S.; Jiang, H.; Fukuoka, M.; Nakata, K.; Nishiwaki, Y.; Kunito, H.; Peers, I.; Harbron, C.; South, M.; Higenbottam, T.; Nyberg, F.; Kudoh, S. & Kato, H. (2007). Personalized Medicine and Proteomics: Lessons from Non-Small Cell Lung Cancer. *Journal of Proteome Research*, Vol.6, pp. 2925-2935, ISSN 1535-3893

Matigian, N.; Abrahamsen, G.; Sutharsan, R.; Cook, A.; Vitale, A.; Nouwens, A.; Bellette, B.; An, J.; Anderson, M.; Beckhouse, A.; Bennebroek, M.; Cecil, R.; Chalk, A.; Cochrane, J.; Fan, Y.; Féron, F.; McCurdy, R.; McGrath, J.; Murrell, M.; Perry, C.; Raju, J.; Ravishankar, S.; Silburn, P.; Sutherland, G.; Mahler, S.; Mellick, G.; Wood, S.; Sue, C.; Wells, C. & Mackay-Sim, A. (2010). Disease-specific, neurosphere-derived cells as models for brain disorders. *Disease Models and Mechanisms*, Vol.3, pp. 785-798, ISSN 1754-8403

Mattick, J.S. (2009). Deconstructing the dogma: a new view of the evolution and genetic programming of complex organisms. *Annals of the New York Academy of Sciences*, Vol.1178, pp. 29-46, ISSN 0077-8923

Mercer, T.R.; Dinger, M.E.; Sunkin, S.M.; Mehler, M.F. & Mattick, J.S. (2009). Long non-coding RNAs: insights into functions. *Nature Reviews Genetics*, Vol.10, pp. 155-159, ISSN 1471-0056

Miki, Y.; Swensen, J.; Shattuck-Eidens, D.; Futreal, P.A.; Harshman, K.; Tavtigian, S.; Liu, Q.; Cochran, C.; Bennett, L.M.; Ding, W.; et al. (1994). A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. *Science*, Vol.266, pp. 66-71, ISSN 0036-8075

Mozafari, M.R.; Pardakhty, A.; Azarmi, S.; Jazayeri, J.A.; Nokhodchi, A. & Omri, A. (2009). Role of nanocarrier systems in cancer nanotherapy. *Journal of Liposome Research*, Vol.19, pp. 310-21, ISSN 0898-2104

O'Farrell, P.H. (1975). High resolution two-dimensional electrophoresis of proteins. *Journal of Biological Chemistry*, Vol.250, pp. 4007-4021

Park, J.Y.; Kim, S.-A.; Chung, J.W.; Bang, S.; Park, S.W.; Paik, Y.K. & Song, S.Y. (2011). Proteomic analysis of pancreatic juice for the identification of biomarkers of pancreatic cancer. *Journal of Cancer Research and Clinical Oncology*, Vol.137, pp. 1229-1238, ISSN 0171-5216

Pastwal, E.; Somiari, S.; Czyz, M. & Somiari, R. (2007). Proteomics in human cancer research. *Proteomics Clinical Applications*, Vol.1, pp. 4-17, ISSN 1862-8354

Patterson, S.; Van Eyk, J. & Banks, R. (2010). Report from the Wellcome Trust/EBI "Perspectives in Clinical Proteomics" retreat – A Strategy to Implement Next-Generation Proteomic Analyses to the Clinic for Patient Benefit: Pathway to Translation. *Proteomics Clinical Applications*, Vol.4, pp. 883-887, ISSN 1862-8354

Patton, W.F.; Schulenberg, B. & Steinberg, T.H. (2002). Two-dimensional gel electrophoresis: better than a poke in the ICAT? *Current Opinion in Biotechnology*, Vol.13, pp. 321-328, ISSN 0958-1669

Pitteri, S.J.; Kelly-Spratt, K.S.; Gurley, K.E.; Kennedy, J.; Busald Buson, T.; Chin, A.; Wang, H.; Zhang, Q.; Wong, C.-H.; Chodosh, L.A.; Nelson, P.S.; Hanash, S.M. & Kemp, C.J. (2011). Tumor Microenvironment-Derived Proteins Dominate the Plasma Proteome Response during Breast Cancer Induction and Progression. *Cancer Research*, Vol.71, pp. 5090-5100, ISSN 0008-5472

Prentice, R.L.; Paczesny, S.J.; Aragaki, A.; Amon, L.M.; Chen, L.; Pitteri, S.J.; McIntosh, M.; Wang, P.; Buson Busald, T.; Hsia, J.; Jackson, R.D.; Rossouw, J.E.; Manson, J.E.; Johnson, K.; Eaton, C. & Hanash, S.M. (2010). Novel proteins associated with risk for coronary heart disease or stroke among postmenopausal women identified by in-depth plasma proteome profiling. *Genome Medicine*, Vol.2, pp. 48-60, ISSN 1756-994X

Rabilloud, T. (2002). Two-dimensional gel electrophoresis in proteomics: Old, old fashioned, but it still climbs up the mountains. *Proteomics*, Vol.2, pp. 3-10, ISSN 1615-9853

Remily-Wood, E.; Liu, R.; Xiang, Y.; Chen, Y.; Thomas, C.; Rajyaguru, N.; Kaufman, L.; Ochoa, J.; Hazlehurst, L.; Pinilla-Ibarz, J.; Lancet, J.; Zhang, G.; Haura, E.; Shibata, D.; Yeatman, T.; Smalley, K.; Dalton, W.; Huang, E.; Scott, E.; Bloom, G.; Eschrich, S. & Koomen, J. (2011). A database of reaction monitoring mass spectrometry assays for elucidating therapeutic response in cancer. *Proteomics Clinical Applications*, Vol.5, pp. 383-396, ISSN 1862-8354

Ross, P.L.; Huang, Y.N.; Marchese, J.N.; Williamson, B.; Parker, K.; Hattan, S.; Khainovski, N.; Pillai, S.; Dey, S.; Daniels, S.; Purkayastha, S.; Juhasz, P.; Martin, S.; Bartlet-Jones, M.; He, F.; Jacobson, A. & Pappin, D.J. (2004). Multiplexed Protein Quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. *Molecular and Cellular Proteomics*, Vol.3, pp. 1154-1169, ISSN 1535-9476

Schoenherr, R.M.; Zhao, L.; Whiteaker, J.R.; Feng, L.-C.; Li, L.; Liu, L.; Liu, X. & Paulovich, A.G. (2010). Automated Screening of Monoclonal Antibodies for SISCAPA Assays using a Magnetic Bead Processor and Liquid Chromatography-selected Reaction Monitoring-mass Spectrometry. *Journal of Immunological Methods*, Vol.353, pp. 40-61, ISSN 0022-1759

Semaan, S.M.; Wang, X.; Stewart, P.A.; Marshall, A.G. & Sang, Q.X.A. (2011). Differential phosphopeptide expression in a benign breast tissue, and triple-negative primary and metastatic breast cancer tissues from the same African-American woman by LC-LTQ/FT-ICR mass spectrometry. *Biochemical and Biophysical Research Communications*, Vol.412, pp. 127-131, ISSN 0006-291X

Sherman, J.; McKay, M.; Ashman, K. & Molloy, M.P. (2009). Unique Ion Signature Mass Spectrometry, a Deterministic Method to Assign Peptide Identity. *Molecular and Cellular Proteomics*, Vol.8.9, pp. 2051-2062, ISSN 1535-9476

Simpson, R.J.; Berhhard, O.K.; Greening, D.W. & Moritz, R.L. (2008). Proteomics-driven cancer biomarker discovery: looking to the future. *Current Opinion in Chemical Biology*, Vol.12, pp. 72-77, ISSN 1367-5931

The International Cancer Genome Consortium (2010). International Network of Cancer Genome Projects. *Nature*, Vol.464, pp. 993-998, ISSN 0028-0836

Thingholm, T.E.; Palmisano, G.; Kjeldsen, F. & Larsen, M.R. (2010). Undesirable Charge-Enhancement of Isobaric Tagged Phosphopeptides Leads to Reduced Identification Efficiency. *Journal of Proteome Research*, Vol.9, pp. 4045-4052

Tonge, R.; Shaw, J.; Middleton, B.; Rowlinson, R.; Rayner, S.; Young, J.; Pognan, F.; Hawkins, E.; Currie, I. & Davison, M. (2001). Validation and development of fluorescence two-dimensional differential gel electrophoresis proteomics technology. *Proteomics*, Vol.1, pp. 377-396, ISSN 1615-9853

Walsh, G. (2010). Biopharmaceutical Benchmarks. *Nature Biotechnology*, Vol.28, pp. 917-924

Weinshilbaum, R. & Wang, L. (2004). Pharmacogenomics: Bench to Bedside. *Nature Reviews Drug Discovery*, Vol.5, pp. 38-49, ISSN 1474-1776

Whiteaker, J.R.; Zhao, L.; Anderson, L. & Paulovich, A.G. (2010). An Automated and Multiplexed Method for High Throughput Peptide Immunoaffinity Enrichment and Multiple Reaction Monitoring Mass Spectrometry-based Quantification of Protein Biomarkers. *Molecular and Cellular Proteomics*, Vol.9, pp. 184-196, ISSN 1535-9476

Wilkins, M.R.; Pasquali, C.; Appel, R.D.; Ou, K.; Golaz, O.; Sanchez, J.-C.; Yan, J.X.; Gooley, A.A.; Hughes, G.; Humphery-Smith, I.; Williams, K.L. & Hochstrasser, D.F. (1996). From Proteins to Proteomes: Large Scale Protein Identification by Two-Dimensional Electrophoresis and Amino Acid Analysis. *Biotechnology*, Vol.14, pp. 61-65, ISSN 0733-222X

Wooster, R.; Bignell, G.; Lancaster, J.; Swift, S.; Seal, S.; Mangion, J.; Collins, N.; Gregory, S.; Gumbs, C.; Micklem, G.; et al. (1995). Identification of the breast cancer susceptibility gene BRCA2. *Nature*, Vol.375, pp. 789-792, ISSN 0028-0836

Zanivan, S.; Krueger, M. & Mann, M. (2012). In vivo Quantitative Proteomics: The SILAC mouse. *Methods in Molecular Biology*, Vol.757, pp. 435-450, ISSN 1064-3745

Zhan, X. & Desiderio, D.M. (2010). The use of variations in proteomes to predict, prevent, and personalize treatment for clinically nonfunctional pituitary adenomas. *EPMA Journal*, Vol.1, pp. 439-459, ISSN 1878-5077

**Part 2** 

**Sample Preparation** 
