### **7. Alternatives to EMSA**

Several alternatives to EMSA are used to analyze nucleic acid-protein interactions, each with its own advantages and disadvantages when compared to EMSA.

#### **7.1 Footprinting**

Footprinting is essentially a protection assay used to characterize the binding site recognized by a given protein. It relies on the fact that a protein bound to a nucleic acid protects the sequence it occupies from modification. The modification can be chemical or enzymatic; most commonly it is endonuclease cleavage of a radioisotope-labeled nucleic acid previously mixed with the protein(s) of interest. After cleavage, the resulting ladder is resolved on a denaturing polyacrylamide gel and visualized by autoradiography. Gaps in the ladder indicate sites protected by the protein or proteins in the mixture (reviewed by Hampshire et al., 2007). The method was originally developed to characterize sequence selectivity, but it can also be used to estimate binding strength by running the footprinting reaction over a range of protein concentrations. For slow binding reactions, footprinting can additionally be used to assess reaction kinetics by estimating association and dissociation rates. Although footprinting is widely used, other approaches, such as those described below, provide higher throughput.
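The reading of a footprint described above can be illustrated with a small sketch. It assumes that band intensities have already been quantified for each cleavage position in a control lane (no protein) and a lane with protein; the intensity values, the function names and the 0.5 protection threshold are hypothetical choices made only for illustration, not part of any published protocol.

```python
def protected_positions(control, bound, threshold=0.5):
    """Return positions whose band intensity with protein falls below
    `threshold` times the intensity in the protein-free control lane."""
    if len(control) != len(bound):
        raise ValueError("both lanes must cover the same positions")
    protected = []
    for position, (free, with_protein) in enumerate(zip(control, bound)):
        if free > 0 and with_protein / free < threshold:
            protected.append(position)
    return protected


def group_runs(positions):
    """Merge consecutive protected positions into (start, end) footprints."""
    runs = []
    for p in positions:
        if runs and p == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], p)
        else:
            runs.append((p, p))
    return runs


# Hypothetical band intensities (arbitrary units) for ten cleavage positions.
control_lane = [100, 95, 110, 105, 98, 102, 97, 101, 99, 104]
protein_lane = [98, 90, 105, 20, 15, 18, 25, 96, 100, 101]

footprints = group_runs(protected_positions(control_lane, protein_lane))
print(footprints)
```

Each tuple marks the first and last position of a gap in the ladder. Real data would need lane normalization and replicate lanes before any threshold is applied.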

A variant of DNA footprinting is the *in vivo* approach, which detects DNA-protein interactions as they occur in the cell. *In vivo* footprinting also relies on the fact that a bound protein protects the nucleic acid, at its binding site, from cleavage by endonucleases or modification by a chemical agent. The difference is that the DNA is cleaved within the nucleus, after the proteins have bound to chromatin *in vivo*. Both footprints and endonuclease-hypersensitive sites caused by deformations of DNA in chromatin can be detected with this method. It has been coupled with deep sequencing to identify DNase I hypersensitive sites in the genomes of different cell lines, which enabled the precise identification of a large number of specific cis-regulatory protein binding events in a single experiment (Boyle et al., 2011). Accordingly, data obtained by this procedure may be more significant and more representative of true binding events than data obtained by the *in vitro* footprinting described above.

#### **7.2 Nitrocellulose filter binding**

220 Gel Electrophoresis – Advanced Techniques





Nitrocellulose filter binding assays were developed in the late 1960s and early 1970s as a method rapid enough to allow kinetic as well as equilibrium studies of DNA-protein interactions (Riggs et al., 1968 and Riggs et al., 1970, as cited in Helwa & Hoheisel, 2010). The assay is based on the premise that proteins can bind to nitrocellulose without losing the ability to bind DNA. After the binding reaction, the mixture is separated by electrophoresis and then blotted onto a nitrocellulose membrane. Only protein-bound DNA remains on the membrane, because free double-stranded DNA is not retained by nitrocellulose. The amount of DNA on the membrane can be quantified by measuring the label on the nucleic acid. The method has limitations, however: it identifies neither the proteins involved nor the proportion in which they bind DNA. It also provides no information on the DNA sequence with which the protein interacts unless well-defined nucleic acid fragments are used, and it is limited to double-stranded DNA, because single-stranded DNA can bind to nitrocellulose under certain conditions and produce undesirable background.

#### **7.3 Microfluidic mobility shift assay (MMSA)**

The capillary microfluidic mobility shift assay (MMSA) uses fluorescence-based multi-well capillary electrophoresis to characterize protein-nucleic acid interactions. For example, it has been used effectively to characterize RNA-protein binding in a study of the interaction between the human immunodeficiency virus 1 transactivator of transcription and the transactivation-responsive RNA (Fourtounis et al., 2011). The technique requires only nanoliter amounts of sample, which are introduced into microscopic channels and separated by pressure-driven flow and the application of a potential difference. The free molecules and complexes are visualized by LED-induced fluorescence, which eliminates the need for hazardous radiolabeling. Because it can be run in 384-well format, the method is better suited than regular EMSA to high-throughput screening.

Electrophoretic Mobility Shift Assay: Analyzing Protein – Nucleic Acid Interactions 223


#### **7.4 Yeast hybrid systems**

The yeast one-hybrid system is used to identify proteins that bind a given nucleic acid sequence, in contrast to methods suited to identifying the nucleic acid sequences preferentially recognized by a known protein. The protocol is based on a hybrid prey protein fused to a transcription activation domain, which drives the expression of a reporter gene when the prey protein interacts with the DNA bait (reviewed by Deplancke et al., 2004). Depending on the prey protein library, the method allows proteome-scale analysis, but it only detects monomers that bind the target nucleic acid. Although it is an *in vivo* approach, it is performed in yeast (*Saccharomyces cerevisiae*), which may not be the endogenous context, and it is limited to DNA-protein interactions.

RNA-protein interactions can be studied with a yeast three-hybrid system, in which yeast cells express not one but three chimeric molecules that assemble to activate two reporter genes (Kraemer et al., 2000). It is a modification of the yeast two-hybrid system, widely used to identify protein-protein interactions, designed to allow highly sensitive *in vivo* detection of RNA-protein interactions. The yeast three-hybrid system includes: (i) a fusion protein consisting of a DNA-binding protein and an RNA-binding protein; (ii) a hybrid protein consisting of a transcription activation domain and a peptide thought to interact with a particular RNA; and (iii) an RNA intermediate that promotes the interaction of the two hybrid proteins and comprises both the RNA that interacts with the system's RNA-binding protein and the RNA molecule under investigation. The successful interaction of these three components reconstitutes a transcription factor, which then activates the reporter genes (Hook et al., 2005 and Wurster & Maher, 2010).

#### **7.5 ChIP assays**

Chromatin immunoprecipitation (ChIP) is a commonly used method to study DNA-binding proteins *in vivo* and a standard method for identifying transcription factor binding sites and the locations of histone modifications (reviewed by Massie & Mills, 2008). In this method, a cross-linking agent (e.g. formaldehyde) is added to cells to covalently link proteins and chromatin that are in direct contact. The cells are then lysed, and the chromosomal DNA is isolated and fragmented. Specific antibodies are used to immunoprecipitate the targeted proteins together with the cross-linked DNA. The bound nucleic acid is released by reversing the cross-links and then analyzed. Classically, the DNA was characterized by polymerase chain reaction (PCR), which required some prior knowledge of the candidate DNA regions. Nowadays, the protein-bound DNA is more commonly characterized with more powerful tools, either microarrays that represent the genome (ChIP-chip) or high-throughput sequencing (ChIP-seq). Improvements in DNA sequencing technology allow tens of millions of sequence reads, so ChIP-seq offers increased sensitivity and resolution and, unlike ChIP-chip, is not limited to predetermined probe sets. The major strength of ChIP-based approaches is that they capture complexes *in vivo*, so binding can be studied under different cellular conditions and at different time points. However, these approaches also have important limitations. They require high-quality antibodies, which are available for only a limited number of proteins. To circumvent this, epitope-tagged proteins can be used, although this usually means introducing modified genes into the endogenous locus to obtain expression at physiological levels. In addition, ChIP does not distinguish between proteins that bind directly to genomic DNA and those that only interact with other proteins that do bind.

#### **7.6 SELEX**


The Systematic Evolution of Ligands by Exponential Enrichment (SELEX) is a well-established method for selecting, from a random library, the sequences that bind recombinant proteins. The procedure starts with the synthesis of the oligonucleotide library, which is then incubated with the putative interacting protein(s). The sequences that bind are eluted, amplified by PCR and subjected to further rounds of selection under increasingly stringent conditions. This allows the identification of the tightest-binding sequences. It is a widely used approach for obtaining transcription factor binding motifs, as it requires low amounts of purified protein (Matys et al., 2006). The approach becomes very cumbersome when large numbers of nucleic acid-binding proteins are analyzed, because each protein requires multiple rounds of selection. Another limitation is that it is aimed at identifying the best-binding DNA targets *in vitro* and does not allow the characterization of the exact *in vivo* selectivity.
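A toy model can show why repeated rounds of selection and amplification enrich the tightest binders. The sketch below assumes that each sequence class is retained with a fixed probability per round and that PCR restores the pool without bias. The three classes, their starting fractions and their retention probabilities are hypothetical, and the increasing stringency used in practice is deliberately not modeled.

```python
def selex_round(pool, retention):
    """One round: retain each sequence class in proportion to its retention
    probability, then renormalize (unbiased PCR restores the pool size)."""
    selected = {seq: fraction * retention[seq] for seq, fraction in pool.items()}
    total = sum(selected.values())
    return {seq: amount / total for seq, amount in selected.items()}


# Hypothetical per-round retention probabilities for three sequence classes.
retention = {"tight": 0.8, "medium": 0.2, "weak": 0.01}

# Hypothetical starting library: tight binders are rare.
pool = {"tight": 0.001, "medium": 0.009, "weak": 0.99}

history = [pool]
for _ in range(5):
    pool = selex_round(pool, retention)
    history.append(pool)

for round_number, fractions in enumerate(history):
    print(round_number, {seq: round(f, 4) for seq, f in fractions.items()})
```

In this simplified model the rare tight-binding class dominates the pool within a few rounds. The model does not capture amplification bias or non-specific retention, both of which affect real selections.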

#### **7.7 Protein microarray**

A protein microarray allows high-throughput analysis in which labeled nucleic acids are queried against proteins immobilized on a chip (reviewed by Hu et al., 2011). In a functional protein microarray, thousands of purified recombinant proteins can be immobilized at discrete locations on a glass slide, forming a high-density protein matrix that provides a flexible platform for characterizing different protein activities. It is a very versatile method, as it allows semi-quantitative analysis of protein binding to a wide range of molecules (nucleic acids, other proteins, antibodies, lipids, glycans, etc.). In theory, it is feasible to print arrays of all the annotated proteins of a given organism, yielding a whole-proteome microarray. However, this requires the expression and purification of each individual protein, and several conditions need to be optimized to make the proteins suitable for the method. Since the protein is immobilized, it is crucial to ensure that its structural integrity is preserved, especially in the binding domains that are to be studied.

#### **7.8 Nucleic acid microarrays**

Nucleic acid microarrays can also be used for the direct analysis of protein-nucleic acid interactions. In this case it is the nucleic acid that is immobilized, not the protein. Nucleic acid chips are a powerful and versatile tool in biological research. They consist of high-density arrays of oligonucleotides or complementary DNA that can cover a whole genome (reviewed by Stoughton, 2005). For protein-interaction studies, the protein(s) of interest is usually expressed with an epitope tag and purified. The tag serves two purposes: it helps to isolate the protein through affinity purification, and it allows detection by an epitope-specific reporter antibody. After incubation of the protein with the nucleic acid chip, the signal intensities at the individual array spots can be measured.

#### **7.9 Ribonucleoprotein Immunoprecipitation – Microarray (RIP-chip)**

RNA immunoprecipitation and chip hybridization (RIP-chip) is a protocol very similar to ChIP-chip, except that it targets RNA-protein interactions rather than DNA-protein interactions (Keene et al., 2006). RIP-chip consists of microarray profiling of the RNAs obtained from immunoprecipitated RNA-protein complexes. Genome-wide arrays are used to identify the messenger RNAs (mRNAs) present in endogenous messenger ribonucleoprotein complexes, which makes it a valuable tool for identifying the physiological mRNA substrates of RNA-binding proteins. The endogenous complexes are immunoprecipitated from cell lysates, which limits the approach to kinetically stable interactions. Even though it can identify RNA-protein complexes containing heteromultimers, at least one of the proteins has to be known beforehand so that it can serve as the basis of the immunoprecipitation and "fish out" the whole complex.

#### **7.10 Crosslinking and Immunoprecipitation (CLIP) and Photoactivatable-Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation (PAR-CLIP)**

The RIP-chip method described above is limited to studies of very stable RNA-protein complexes; to remedy this problem, another method is available to study RNA-binding proteins. The crosslinking and immunoprecipitation (CLIP) approach uses *in vivo* UV crosslinking before immunoprecipitation of the complexes in order to identify less stable interactions (Ule et al., 2003). After immunoprecipitation, the RNA molecules are separated and cDNA sequencing is carried out. However, the method is not perfect: the commonly used UV 254 nm RNA-protein crosslinking has low efficiency, and it is difficult to distinguish crosslinked RNAs from background non-crosslinked fragments, which can be detected in the sample because of the presence of abundant cellular RNAs.

A more recent approach aims to improve the CLIP method further by using photoreactive ribonucleoside analogs such as 4-thiouridine or 6-thioguanosine (Hafner et al., 2010). In this photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) protocol, the photoreactive nucleosides are incorporated into nascent transcripts within living cells. Irradiation is performed with UV light of 365 nm, which efficiently crosslinks the labeled cellular RNA to its interacting proteins. The labeled RNAs are isolated after co-immunoprecipitation and converted into cDNA for deep sequencing. The precise crosslinking position can be identified from mutations in the sequenced cDNA, which makes it possible to distinguish the crosslinked fragments from background.
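The idea of locating crosslink positions from mutations in the cDNA can be sketched as follows. The example assumes 4-thiouridine labeling, for which the characteristic mutation reported for PAR-CLIP is a T-to-C change in the sequenced cDNA. It also assumes that reads are already aligned to the reference without gaps. The reference sequence, the reads, the function names and the cut-offs are all hypothetical.

```python
from collections import Counter


def t_to_c_counts(reference, reads):
    """Count T-to-C mismatches and read coverage at each reference T.

    `reads` is a list of (start, sequence) pairs aligned without gaps.
    """
    conversions = Counter()
    coverage = Counter()
    for start, sequence in reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if position >= len(reference):
                break
            if reference[position] == "T":
                coverage[position] += 1
                if base == "C":
                    conversions[position] += 1
    return conversions, coverage


def candidate_sites(conversions, coverage, min_coverage=2, min_fraction=0.5):
    """Positions where a sufficient fraction of covering reads show T-to-C."""
    return sorted(
        position
        for position, depth in coverage.items()
        if depth >= min_coverage and conversions[position] / depth >= min_fraction
    )


# Hypothetical reference and aligned reads (cDNA sense).
reference = "GGATTCAGTTACG"
reads = [(0, "GGATCCAG"), (2, "ATCCAGTT"), (3, "TCCAGTTA"), (5, "CAGTTACG")]

conversions, coverage = t_to_c_counts(reference, reads)
sites = candidate_sites(conversions, coverage)
print(sites)
```

Positions with recurrent conversions are candidate crosslink sites. Fragments that carry no conversions at all are more likely to be background.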

#### **7.11 High-Throughput Sequencing – Fluorescent Ligand Interaction Profiling (HiTS-FLIP)**

Very recently, a new method was developed to characterize DNA-protein interactions using second-generation sequencing instruments (Nutiu et al., 2011). It allows high-throughput, quantitative measurement of DNA-protein binding affinity. High-Throughput Sequencing – Fluorescent Ligand Interaction Profiling (HiTS-FLIP) uses the optics of a high-throughput sequencer to visualize *in vitro* binding of a protein to the sequenced DNA in a flow cell. The method was initially applied to a *Saccharomyces cerevisiae* transcription factor. The fluorescently tagged protein was added at different concentrations to a flow cell containing around 88 million DNA clusters, the equivalent of over 160 yeast genomes. Traditional EMSA was used as an independent validation of the dissociation constants obtained, and a high correlation was found between the values obtained with the new method and those obtained by EMSA as reported in the literature. This high-throughput method has the obvious advantage that it can provide hundreds of millions of measurements, but it is limited to DNA-protein interactions and requires expensive equipment. Another advantage is that the sequencing instrument can measure multiple fluorescent wavelengths, so heterodimeric and homodimeric forms can be measured in the same run by using distinct tags on individual proteins.
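As an illustration of how a dissociation constant might be estimated from fluorescence measured at several protein concentrations, the sketch below fits a simple one-site binding model, signal = Bmax · [P] / (Kd + [P]). This model, the grid-search fitting procedure and the synthetic titration data are assumptions made for illustration; they are not taken from the HiTS-FLIP study.

```python
def fit_kd(concentrations, signals, kd_grid):
    """Least-squares fit of signal = bmax * c / (kd + c).

    Scans candidate kd values; for each one the best bmax has a closed form.
    Returns the (kd, bmax) pair with the smallest squared error.
    """
    best = None
    for kd in kd_grid:
        occupancy = [c / (kd + c) for c in concentrations]
        denominator = sum(o * o for o in occupancy)
        bmax = sum(o * s for o, s in zip(occupancy, signals)) / denominator
        sse = sum((s - bmax * o) ** 2 for o, s in zip(occupancy, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]


# Hypothetical noise-free titration (nM) generated with kd = 50 and bmax = 1000.
concentrations = [1, 5, 25, 125, 625]
signals = [1000 * c / (50 + c) for c in concentrations]

# Candidate kd values, log-spaced from 1 nM to 10,000 nM.
kd_grid = [10 ** (exponent / 20) for exponent in range(0, 81)]

kd, bmax = fit_kd(concentrations, signals, kd_grid)
print(f"estimated Kd ~ {kd:.1f} nM, Bmax ~ {bmax:.0f}")
```

The estimate can only be as fine as the grid spacing. With real, noisy data a nonlinear least-squares routine and replicate measurements would be more appropriate.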
