**4. The method**

There are five key steps in a conventional EMSA protocol, each involving variables that can be optimized: (1) preparation of the protein sample; (2) synthesis and labeling of the nucleic acid; (3) the binding reaction; (4) non-denaturing gel electrophoresis; and (5) detection of the outcome. In this section we discuss each step separately, highlighting its key variables and the options available for a given situation. Whenever possible we will also refer to examples in the literature. Figure 2 schematically represents the typical steps of a gel retardation assay that are discussed below.

Fig. 2. A schematic representation of a conventional EMSA protocol. The labeled nucleic acid, simplified as lines with a star representing the label, is mixed with the protein sample, represented by oval shapes, in a binding reaction and then loaded onto a non-denaturing gel. After electrophoresis, the result is detected according to the label on the nucleic acid. On the schematic gel, well (A) contains only the labeled nucleic acid; the free nucleic acid is expected to migrate faster than the bound molecules. Well (B) represents a labeled nucleic acid bound to one small peptide, and well (C) a labeled nucleic acid bound to two larger proteins. The heavier complex (in C) is expected to display the lowest mobility during electrophoresis and therefore to remain closest to the wells at the top of the gel.

**4.1 Preparation of the protein sample**

With regard to the protein sample, EMSA experiments fall into two categories depending on whether the nucleic acid-interacting protein is known or not. To obtain optimal performance, the protein sample should therefore be prepared according to the category the experiment falls into.

When dealing with a putative nucleic acid-binding protein or complex of completely unknown subcellular origin, whole cell extracts must be used. If there is an educated guess about the nature of the protein, it is advisable to isolate nuclear and cytoplasmic proteins separately from crude extracts, which improves the results. In particular, if the binding protein is thought to be nuclear and of low abundance, isolating nuclear extracts avoids the dilution that would occur with whole cell extracts, which could lower the protein's concentration below the detection limit.
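The dilution argument can be made concrete with a back-of-the-envelope calculation. Assuming the same total amount of protein is added to each binding reaction and the target is strictly nuclear and fully recovered, a nuclear extract enriches the target by the ratio of total cellular protein to nuclear protein. The function name and the fractions below are hypothetical, chosen only for illustration; real values depend on the cell type and the fractionation protocol.

```python
def per_ug_abundance(target_fraction_of_cell: float, compartment_fraction: float) -> float:
    """Fraction of the protein in an extract that is the target (hypothetical helper).

    target_fraction_of_cell: target mass / total cellular protein mass.
    compartment_fraction: compartment protein mass / total cellular protein mass
        (1.0 for a whole cell extract).
    Assumes the target is located entirely in, and fully recovered from, the compartment.
    """
    if not 0 < compartment_fraction <= 1:
        raise ValueError("compartment_fraction must be in (0, 1]")
    return target_fraction_of_cell / compartment_fraction


if __name__ == "__main__":
    target = 1e-4         # hypothetical: target is 0.01% of total cell protein
    nuclear_share = 0.2   # hypothetical: nucleus holds 20% of total cell protein
    whole = per_ug_abundance(target, 1.0)
    nuclear = per_ug_abundance(target, nuclear_share)
    print(f"whole cell extract: {whole:.4%} of protein is target")
    print(f"nuclear extract:    {nuclear:.4%} of protein is target "
          f"({nuclear / whole:.1f}-fold enrichment)")
```

With these illustrative numbers the same microgram of nuclear extract carries five times more target protein than whole cell extract, which is the dilution effect described above.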

Cell extracts are easy and relatively fast to obtain, and the methods are commonly derived from the protocol described by Dignam and collaborators almost three decades ago (Dignam et al., 1983). This method isolates nuclear and/or cytoplasmic proteins suitable for subsequent analysis by EMSA. One disadvantage of cell extracts is their crudeness: they generally degrade faster than purer preparations because of the cellular proteases they contain. To limit protein degradation or alteration, the protocol should be performed on ice or at 4 ºC, and protease inhibitors should be added. The integrity of an extract can easily be checked in a control experiment using probes for ubiquitous DNA-binding proteins (Kerr, 1995); if these fail, the extract might be "dead". Despite these disadvantages, cell extracts are needed when the aim is to identify new nucleic acid-binding proteins, or when a complex of different proteins is required to interact with the target nucleic acid, since a single recombinant protein sometimes cannot bind on its own. Tissue samples can also be a source of protein for these assays; the same care should be taken as with whole cell extracts to minimize protease activity.

If the nucleic acid-binding protein is known, it can be expressed and purified as a recombinant protein. Recombinant (heterologous) proteins are commonly expressed in bacteria or in a eukaryotic cell line of interest. The target is generally expressed as a fusion protein carrying a tag to facilitate purification. Common tags, such as glutathione S-transferase (GST), the tandem affinity purification (TAP) tag, maltose-binding protein (MBP) or 6×histidine, are cloned in frame with the protein. A protease cleavage site can sometimes be included between the protein of interest and the tag so that the latter can easily be removed after purification. Although a tag can be very helpful, it may alter the conformation of the recombinant protein and even disrupt its binding ability. On the other hand, a tag can help stabilize the protein terminus to which it is fused. The tag should therefore be chosen carefully, and small peptide tags are usually preferred to minimize their impact on the recombinant protein of interest.
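One way to see why small peptide tags are preferred is to compare how much of a fusion protein's mass each tag contributes. The sketch below uses commonly quoted approximate tag masses (treat them as assumptions, not exact values) and a hypothetical 30 kDa protein of interest; the dictionary and function names are illustrative only.

```python
# Approximate tag masses in kDa (commonly quoted values; assumptions, not exact).
TAG_MASS_KDA = {
    "6xHis": 0.82,   # six histidine residues
    "GST": 26.0,     # glutathione S-transferase
    "MBP": 42.5,     # maltose-binding protein
}


def tag_fraction(protein_kda: float, tag: str) -> float:
    """Fraction of the fusion protein's mass contributed by the tag."""
    tag_kda = TAG_MASS_KDA[tag]
    return tag_kda / (protein_kda + tag_kda)


if __name__ == "__main__":
    protein_kda = 30.0  # hypothetical protein of interest
    for tag, mass in TAG_MASS_KDA.items():
        fused = protein_kda + mass
        print(f"{tag:>6}: fusion ~{fused:.1f} kDa, "
              f"tag = {tag_fraction(protein_kda, tag):.0%} of the mass")
```

In this illustration a His tag adds only a few percent to the fusion, whereas GST or MBP account for roughly half of it or more, which also increases the mass of any complex the fusion forms with the nucleic acid.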

Electrophoretic Mobility Shift Assay: Analyzing Protein – Nucleic Acid Interactions 213

Several systems are available for producing heterologous proteins, of which the bacterium *Escherichia coli* is one of the most widely used. This Gram-negative bacterium remains an attractive host because it grows rapidly and to high density on inexpensive substrates. Its genetics have long been well characterized, and a wide range of cloning vectors and mutant host strains makes it a very versatile system. Typically, the heterologous complementary DNA is cloned into a compatible plasmid, which is then introduced into the bacteria by transformation to achieve a high gene dosage. This does not necessarily guarantee the accumulation of high levels of a full-length, active recombinant protein, but further measures can be taken to improve it. To achieve high-level production in *E. coli*, strong promoters such as the bacteriophage T7 late promoter should be used; the T7 RNA polymerase is then usually provided by the host strain under IPTG (isopropyl-β-D-1-thiogalactopyranoside) induction. In recent years, several strains have been engineered to improve recombinant protein yields by increasing mRNA stability and by improving transcription termination and translational efficiency (reviewed by Baneyx, 1999 and Makino et al., 2011). However, this widely used overexpression system has an important drawback for the study of eukaryotic proteins: bacteria cannot perform the post-translational modifications that these proteins would undergo *in vivo* in eukaryotic cells.
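For the IPTG induction step, the volume of inducer stock to add follows from the dilution relation C₁V₁ = C₂V₂. The stock concentration, culture volume and final concentration below are hypothetical values for illustration, not recommendations from this chapter, and the helper name is invented for the example.

```python
def stock_volume_ml(stock_mM: float, final_mM: float, culture_ml: float) -> float:
    """Volume of stock (mL) to add so the culture reaches final_mM.

    Uses C1*V1 = C2*V2 and neglects the small volume contributed by the stock itself.
    """
    if final_mM >= stock_mM:
        raise ValueError("final concentration must be below the stock concentration")
    return final_mM * culture_ml / stock_mM


if __name__ == "__main__":
    # Hypothetical: 1 M (1000 mM) IPTG stock, 500 mL culture, 0.5 mM final.
    print(stock_volume_ml(1000.0, 0.5, 500.0))  # 0.25 mL
```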

When working with recombinant nucleic acid-binding proteins, the importance of post-translational modifications for the protein's binding ability should be taken into account. A careful review of previous reports may indicate whether modifications need to be introduced before the binding reaction. In some cases, post-translational modifications change the sequence specificity of binding. For example, genotoxic stress induces modifications of the C-terminus of the tumor suppressor protein p53 that modulate its DNA-binding specificity (Apella & Anderson, 2001). If the modifications are crucial, a more biologically relevant host should be considered instead of bacteria. Transient gene expression in mammalian cells, for example in human embryonic kidney cell lines, has become a routine approach. The benefits of producing eukaryotic proteins in mammalian cells are clear: post-translational modifications are likely to be native or near-native, solubility and correct folding are more likely, and proteins are expressed in their proper intracellular compartments. These methods, however, tend to be more expensive, as the cells require more complex growth media, and fewer cloning vectors are available. An alternative approach that overcomes the latter limitation uses baculovirus-infected insect cells. In this method, a recombinant virus is produced either by site-specific transposition of an expression cassette into a shuttle vector or by homologous recombination (reviewed by Jarvis, 2009).

When recombinant proteins are expressed, the heterologous genes sometimes severely compromise the survival of the host cell. Several techniques are available to circumvent this problem for toxic proteins produced in *E. coli* strains. A highly toxic gene can be defined as one that, when introduced into a cell, causes cell death or severe growth and maintenance defects even before expression is induced. The best solution for expressing a highly toxic gene is to allow the host to tolerate it during the growth phase, so that after induction efficient expression ensures rapid and quantitative production of the toxic protein before the cell dies (reviewed by Saida et al., 2006). This can be achieved through different strategies, such as manipulating the gene's transcriptional and translational control elements, for example to suppress basal expression of the toxic protein from leaky inducible promoters. Other options include modifying the coding sequence to produce reversibly inactive forms, controlling the plasmid copy number, selecting less susceptible *E. coli* strains or adding stabilizing sequences.

Cell-free systems are also available for expressing recombinant proteins, including *in vitro* transcription/translation systems such as rabbit reticulocyte lysates, wheat germ extracts or *E. coli* cell-free protein expression systems (reviewed by Endo & Sawasaki, 2006). Here, proteins can be expressed directly from cDNA templates obtained by PCR, avoiding subcloning, which makes the method faster and potentially cheaper. These systems can also express proteins that seriously interfere with cell physiology, such as the toxic proteins mentioned above. On the other hand, they usually give lower yields than, for instance, expression in bacteria.

**4.2 Synthesis and labeling of nucleic acids**

212 Gel Electrophoresis – Advanced Techniques


One of the key advantages of EMSA is its versatility, as it can be performed with nucleic acids of a wide range of structures and sizes. The method can characterize both double- and single-stranded DNA as well as RNA, triplex and quadruplex nucleic acids, and even circular molecules. Probe design and synthesis depend on the application or purpose of the study and are important because they influence detection and therefore the sensitivity of the results. There are two main aspects to consider in this step: the length of the nucleic acid and its labeling.

Unlabeled nucleic acids can be used in a gel retardation assay and detected by post-electrophoretic staining with chromophores or fluorophores that bind nucleic acids, or in the "classical way" with ethidium bromide. However, labeled nucleic acids are usually preferred because they facilitate detection and increase the sensitivity of the method. The most common choice is radioisotope labeling, as it offers the best sensitivity without interfering with the structure of the probe; this high sensitivity makes it ideal for assays with a limited amount of starting material. The radioisotope, usually ³²P, can be incorporated into the nucleic acid during its synthesis by using labeled nucleotides, or afterwards by end labeling with a kinase or a terminal transferase. With a radioactive label, EMSA results can easily be detected by autoradiography. Although radioisotope labeling confers high sensitivity, it involves handling hazardous radioactive material and requires extra safety measures that may not be available in every laboratory. Alternative labels such as fluorophores, biotin or digoxigenin are less sensitive but much safer to handle and more stable (Holden & Tacon, 2011). With these labels, detection is achieved by fluorescence, chemiluminescence or immunodetection. Although radioisotope labeling generally achieves higher sensitivity, some reports show that similar results can be obtained with other labels such as the cyanine dye Cy5 (Ruscher et al., 2000).
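Part of the lower stability of radioactive probes is simple decay: ³²P has a half-life of about 14.3 days, so a labeled probe loses activity week by week and exposure times must be adjusted accordingly. The sketch below applies the standard exponential decay law A(t) = A₀·2^(−t/t½); the probe ages printed and the helper names are illustrative assumptions.

```python
import math

P32_HALF_LIFE_DAYS = 14.3  # approximate half-life of phosphorus-32


def remaining_fraction(days: float, half_life: float = P32_HALF_LIFE_DAYS) -> float:
    """Fraction of the initial activity left after `days` of decay."""
    return 0.5 ** (days / half_life)


def days_until(fraction: float, half_life: float = P32_HALF_LIFE_DAYS) -> float:
    """Days until the activity falls to `fraction` of its initial value."""
    return half_life * math.log(fraction) / math.log(0.5)


if __name__ == "__main__":
    for d in (0, 7, 14.3, 28.6):
        print(f"day {d:>5}: {remaining_fraction(d):.1%} of initial activity")
    print(f"10% of the activity remains after ~{days_until(0.10):.0f} days")
```

Non-radioactive labels such as biotin or digoxigenin do not decay in this way, which is one reason they are described as more stable.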

Although the nucleic acid probe is most commonly labeled, some protocols also label the protein. For example, Adachi and co-workers proposed labeling the thiol groups of cysteine residues with an iodoacetamide derivative (Adachi et al., 2005). They perform a conventional EMSA with a radiolabeled DNA probe and a nuclear protein extract; after detection by autoradiography, the complexes are eluted from the excised gel bands and treated with 5-iodoacetamidofluorescein to label the proteins. The sample is then resolved on a denaturing gel, transferred to a membrane and detected with an anti-fluorescein antibody. This characterizes the proteins in the complex, revealing how many are present and their molecular weights. However, proteins without cysteine residues cannot be detected this way.

The appropriate length of the nucleic acid probe depends on what is being studied. When searching for specific binding sites, small probes can be used to determine which segment the protein interacts with. Short nucleic acids have several advantages: they are easily synthesized and inexpensive to purchase; a short sequence contains fewer non-specific binding sites, which is particularly advantageous when the protein has low sequence specificity; and the electrophoretic resolution between complexes and free nucleic acid is higher, so shorter electrophoresis times can be used. Nevertheless, in a short sequence the binding sites lie close to the molecular ends, which can cause aberrant binding, and it can be difficult to resolve the free nucleic acid from the complexes formed when these have a very high molecular weight.
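The trade-offs of probe length can be illustrated numerically. The sketch below counts the positions a protein footprint can occupy on a duplex probe, how many of those positions lie near a probe end, and the complex-to-free mass ratio. The 10 bp footprint, 5 bp end margin, 40 kDa protein and ~650 Da per base pair are illustrative assumptions, strand orientation is ignored, and mass is only a rough proxy for mobility, which also depends on charge and shape.

```python
DA_PER_BP = 650.0  # approximate average mass of one DNA base pair (assumption)


def footprint_positions(probe_bp: int, footprint_bp: int) -> int:
    """Number of registers a footprint can occupy along the probe."""
    return max(probe_bp - footprint_bp + 1, 0)


def end_proximal_positions(probe_bp: int, footprint_bp: int, margin_bp: int) -> int:
    """Registers whose footprint lies within `margin_bp` of either probe end."""
    n = footprint_positions(probe_bp, footprint_bp)
    return sum(
        1
        for start in range(n)
        # left gap is `start`; right gap is what remains after the footprint
        if start < margin_bp or probe_bp - (start + footprint_bp) < margin_bp
    )


def relative_mass_increase(probe_bp: int, protein_da: float) -> float:
    """Mass of the protein-probe complex divided by the mass of the free probe."""
    probe_da = probe_bp * DA_PER_BP
    return (probe_da + protein_da) / probe_da


if __name__ == "__main__":
    footprint, protein_da = 10, 40_000.0  # hypothetical protein
    for length in (30, 300):
        n = footprint_positions(length, footprint)
        near_end = end_proximal_positions(length, footprint, margin_bp=5)
        ratio = relative_mass_increase(length, protein_da)
        print(f"{length:>3} bp: {n} positions ({near_end} near an end), "
              f"complex/free mass ratio ~{ratio:.2f}")
```

Under these assumptions the 30 bp probe offers far fewer positions for non-specific binding, but almost half of them lie near an end; its mass roughly triples on binding, whereas the 300 bp probe gains only about a fifth, consistent with the smaller mobility shift of longer targets.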


On the other hand, longer nucleic acid targets avoid these problems but have more non-specific binding sites, and the mobility shift is generally smaller, requiring longer electrophoresis times because they migrate more slowly. A compromise must be reached depending on what the EMSA study is trying to achieve.

gels depending mainly on the size of the nucleic acid and the desired resolution. The average pore size is estimated at around 5 to 20 nm in diameter for 10% and 4% acrylamide gels, respectively (Lane et al., 1992). Typically, the higher-concentration gels are used for oligonucleotides and small RNAs, and the lower concentrations for DNA fragments of around 100 bp. A polyacrylamide gradient gel is sometimes preferred over a linear gel, as the gradient in pore size increases the range of molecular weights fractionated in a single run, which is particularly important when the complex is much heavier than the free nucleic acid (Walker, 1994). When complexes of different composition are formed, gradient gels are also more likely to separate those of similar molecular weight.

Agarose gels, on the other hand, have pores of around 70 to 700 nm in diameter (Lane et al., 1992) and are therefore mostly used in assays with larger nucleic acid fragments or when large protein complexes are expected. Overall, polyacrylamide gels offer better resolution for nucleic acid–protein complexes with molecular weights of up to 500,000 Da (Fried, 1989 as cited in Hellman & Fried, 2007).

Regarding electrophoresis buffers, it should be taken into account that the interaction between nucleic acids and proteins has an ionic component. The buffer's ionic strength and pH therefore play a role in complex stability. Although this is a very important factor, to our knowledge it has not been studied thoroughly. A variety of electrophoresis buffers are used; low ionic strength buffers are generally preferred, and they sometimes coincide with the buffer used in the binding reaction. Buffers with a medium salt concentration help stabilize the complexes, generate less heat during electrophoresis and also increase the speed of migration. High salt concentrations not only disrupt the complexes but also interfere with their movement into the gel matrix and lead to significant heating during electrophoresis. Salt concentrations that are too low can also destabilize preformed complexes and separate the strands of a double-stranded DNA template (Kerr, 1995). The most common buffers are TBE (90 mM Tris-borate, 2 mM EDTA, pH 8) and TAE (40 mM Tris-acetate, 1 mM EDTA, pH 8). However, some complexes cannot be detected with these classical buffers. For example, the complexes formed between the phage Mu repressor and its operators have an electrophoresis buffer-dependent stability and require Tris-glycine buffer at pH 9.4 (Alazard et al., 1992 as cited in Lane et al., 1992).

Particularly with agarose gels, it is important to monitor the temperature during electrophoresis to prevent the gel from heating up, which could dissociate the nucleic acid–protein complexes. In some cases the gel may need to be pre-cooled, or electrophoresis may even need to be performed below room temperature, which can be achieved with special refrigeration devices.

**4.5 Detection**

How an EMSA result is detected naturally depends on the labels used, if any. Detection can target the mobility shift between the free and the complexed nucleic acid, or the mobility shift between the free protein and the complexes.

When the nucleic acid component carries no label, the shift in mobility can be detected by staining with molecules that bind nucleic acids. Different products can be used
