**Meet the editor**

Professor of Biochemistry and Molecular Medicine at the Autonomous University of Nuevo Leon (UANL) Medical School, where he also holds the position of Secretary for Science and Technology. Trained as a biochemist at UANL (1979), with a Ph.D. in Biomedical Sciences from the University of Texas Health Science Center Houston (UTHSC, 1982) and a postdoc with Prof. Pierre Chambon in Strasbourg, France (1984). Founder of several programs and centers of excellence dedicated to training and advancing research in molecular biology and genomics. Recipient of many awards and distinctions, most notably the world record for the largest human genes sequenced manually, considered evidence of the feasibility of the Human Genome Project; the Distinguished Ex-Alumnus award from the Graduate School of Biomedical Sciences of UTHSC; and the Scientific Merit Award of his hometown. His career profile has been published by the international scientific journal Nature Medicine, the leading Mexican newspaper El Norte, and the Latin American culture magazine Contenido. Author of 120 peer-reviewed research articles with over 2000 citations, of a book on molecular biology, and of two biotechnology patents, and responsible for several industrial technology transfers. In 2005 he debuted as an entrepreneur, founding a consulting firm in biotechnology and genomics (Innbiogem, SC) and a Laboratory for Molecular Diagnostics and Bioprocess (Vitagénesis, SA).

### Contents

#### **Preface**

#### **Part 1 Technology**

#### **Part 2 Application**

Chapter 7 **Genetic Engineering and Biotechnology of Growth Hormones** Jorge Angel Ascacio-Martínez and Hugo Alberto Barrera-Saldaña

#### **Part 4 Responsibility**

### Preface

In the three decades since genetic engineering was first applied to plants and animals, we have witnessed impressive advances, best illustrated by the fact that almost one-tenth of all cultivated land on our planet is now planted with transgenic crops. And although no transgenic animals are found roaming the prairies, some do live on specialized farms, where they replace the bioreactors of biotech facilities in the production of therapeutic proteins.

While the first efforts of genetic engineering targeted plant resistance to pests and herbicides, some ingenious and provocative applications also started to emerge, such as fruit with a longer shelf life and mice, even pet animals, expressing the green fluorescent protein of the jellyfish. Genetic engineering has proven to be not a threat to mankind but a powerful tool, not only for solving food shortages, especially by reducing losses due to pests and by contributing to the development of inexpensive and safer fertilizers, but also for easing the shortage of sophisticated biologicals from natural sources and for coping with the explosive demand for them in medicine. Good examples are antigens and therapeutics, which are now produced even by cows on modern biotech farms.

At the same time, we are witnessing novel applications of genetic engineering in practically all fields. This book illustrates some of them, such as the thermostabilization of luciferase; engineering of the phenylpropanoid pathway in a species in high demand by the paper industry; more efficient regeneration of transgenic soybean; virus-resistant plants; and a novel approach for rapidly screening, in the test tube, the properties of newly discovered animal growth hormones.

To make the technology more user-friendly and easier to understand, two chapters focus on the basics of expressing transgenes in plants and biotech hosts. They also illustrate state-of-the-art tools (mainly expression vectors) that meet each host's requirements for expressing its own genes.

Finally, there are chapters concerned with safety issues in manipulating plants and viruses and in introducing genetically modified organisms into the environment, and with raising awareness of the great responsibility we now carry to use genetic engineering wisely and in a planet-friendly way.


The book thus contributes chapters on the basics of genetic engineering and on applications of the technology to problems of great importance to both society and industry, and closes by reminding us of the moral responsibility we must always keep in mind: nature is a very fragile equilibrium, and we have already put it at risk. We should always pay attention to the ethical, moral, and environmental consequences of applications that have not been tested enough in the laboratory and in controlled field facilities, to avoid unexpected and unintentional harm to our own and other species as well as to the environment.

#### **Prof. Dr. Hugo A. Barrera-Saldaña**

Professor, Department of Biochemistry and Molecular Medicine, UANL School of Medicine, Monterrey; Director, Vitaxentrum, Monterrey, México

**Part 1** 

**Technology** 

## **Expression of Non-Native Genes in a Surrogate Host Organism**

Dan Close, Tingting Xu, Abby Smartt, Sarah Price, Steven Ripp and Gary Sayler *Center for Environmental Biotechnology, The University of Tennessee, Knoxville USA* 

#### **1. Introduction**

Genetic engineering can be used to improve various metabolic and functional processes within an organism of interest. Often, however, one wishes to endow a specific host organism with additional functionality and/or new phenotypic characteristics. In these circumstances, the principles of genetic engineering can be used to express non-native genes within the host organism, leading to the production of previously unavailable proteins. While this approach has been extremely valuable for basic scientific research and biotechnology over the past 50 years, it has also become clear during this time that a multitude of factors must be considered to properly express exogenous genetic constructs.

Most of these factors arise from differences in how disparate organisms have evolved to replicate, repair, and efficiently express their native genes. As a result, the proper expression of exogenous genes in a surrogate host must be considered in light of the ability of the host's replication and expression machinery to recognize and interact with the gene of interest. In this chapter, primary attention will be given to the differences in gene expression machinery and strategies between prokaryotic and eukaryotic organisms. Factors such as the presence or absence of introns, the functionality of polycistronic expression systems, and differences in how ribosomes interact with the transcript will be considered to explain how these discrepancies can be overcome when expressing a prokaryotic gene in a eukaryotic organism, or vice versa.

There are, of course, additional concerns that apply regardless of how closely related the surrogate host is to the native organism. To prepare investigators for expressing genes in a wide variety of non-native organisms, this chapter considers differences in codon usage bias between the surrogate and native hosts, as well as how discrepancies in overall GC content can affect the efficiency of gene expression and the long-term maintenance of the construct, in light of the mechanisms the host employs to recognize and remove foreign DNA. This will provide a basic understanding of the biochemical mechanisms responsible for genetic replication and expression, and of how they can be exploited to express non-native constructs.
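
To make codon usage bias and GC content concrete, the following Python sketch computes both for an in-frame coding sequence. The function names (`gc_content`, `codon_counts`, `relative_usage`) and the toy sequence are illustrative assumptions rather than part of any published tool; a real comparison between native and surrogate hosts would use codon usage tables derived from each organism's genome.

```python
from collections import Counter

# Standard-code leucine codons (DNA alphabet), used here as an example
# of a family of synonymous codons.
LEUCINE_CODONS = ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"]


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_counts(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def relative_usage(counts: Counter, synonymous: list) -> dict:
    """Fraction of each synonymous codon among all codons for that amino acid."""
    total = sum(counts[c] for c in synonymous)
    return {c: (counts[c] / total if total else 0.0) for c in synonymous}


if __name__ == "__main__":
    toy_cds = "ATGCTGCTGCTACTTTAA"  # toy sequence, not a real gene
    print(f"GC content: {gc_content(toy_cds):.2f}")
    print(relative_usage(codon_counts(toy_cds), LEUCINE_CODONS))
```

A gene whose preferred codons are rare in the surrogate host would show a very different `relative_usage` profile from the host's own highly expressed genes, which is the kind of mismatch codon optimization aims to remove.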

In addition, the presence, location, and function of the major regulatory signals controlling gene expression will be detailed, with an eye towards how they must be modified prior to exogenous expression. Specifically, this section will focus on the presence, location, and composition of common promoter elements, the function and location of the Kozak sequence, and the role of restriction and other regulatory sites as they relate to expression across broad host categories. The potential phenotypic effects of exogenous gene expression will also be discussed, especially the potential for interaction with host metabolism or regulation, and for aggregation of the protein product within the surrogate host. This will provide readers with a basic understanding of how common sequences can be employed to either enhance or temper the expression of a gene of interest within a surrogate host.

Finally, to highlight how these processes must be employed in concert to express non-native genes in a surrogate host organism, the expression of the full bacterial luciferase gene cassette in a human kidney cell host will be presented as a case study. In this unique case, multiple considerations were addressed simultaneously to express, in a eukaryotic surrogate, a series of six genes originally believed to be functional only in prokaryotic organisms. Expression of the full bacterial luciferase gene cassette is the result of more than 20 years of research by various groups, and nicely demonstrates how each of the major topics considered in this chapter was required to produce autonomous bioluminescence from a widely disparate surrogate host. The case study summarizes the considerations introduced in this chapter and gives the reader a clear overview of how these principles can be applied under laboratory-relevant conditions to achieve a specific goal.

#### **2. Mechanisms of gene expression**

Before exogenously expressing a gene in a foreign host organism, it is important to understand the basics of how genes are expressed and maintained. Understanding this innate genetic function makes it easier to understand the modifications that enhance expression of non-native genes. Fortunately, from a basic standpoint, all genes are subject to the same processes whether they are prokaryotic or eukaryotic in origin: replication, transcription, and translation. The primary differences between eukaryotic and prokaryotic gene expression lie in the proteins involved in each of these processes. In the end, however, the objective is the same: to transcribe DNA into messenger RNA (mRNA), translate that mRNA into protein, and have that protein carry out a function. This succession of events has become known as the central dogma of molecular biology (Fig. 1). By understanding the differences in the genetic machinery employed by eukaryotes and prokaryotes, one can better understand why certain modifications must be made when expressing a prokaryotic gene in a eukaryotic host, and vice versa.

Fig. 1. The central dogma of molecular biology shown in schematic form. DNA is transcribed to RNA, and the RNA is then translated into protein. This process is the fundamental platform of our understanding of life. Adapted from (Schreiber, 2005)
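
The two steps of the central dogma can be sketched in a few lines of Python. This is a simplified model under stated assumptions: transcription is treated as copying a DNA template strand (written 5'->3') into a complementary, antiparallel mRNA, and translation starts at the first AUG and stops at the first in-frame stop codon, using the standard genetic code. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, encoded compactly: codons are enumerated with the
# bases in U, C, A, G order (first position varying slowest); '*' marks a stop.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}

# RNA base placed opposite each DNA template base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the mRNA (5'->3') copied from a DNA template strand written 5'->3'.

    The RNA is complementary and antiparallel to the template, so each base
    is complemented and the order is reversed."""
    return "".join(DNA_TO_RNA[b] for b in reversed(template_5_to_3.upper()))


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("TTACCAAGCCAT")  # toy template strand
    print(mrna, translate(mrna))  # AUGGCUUGGUAA MAW
```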

#### **2.1 Replication**


The end goal of the replication process is the same for all organisms, whether eukaryotic or prokaryotic: reproducing genetic information to pass on to the next generation. Replication is an especially important stage for gene expression, not only because it provides a means of passing on genetic information, but also because any errors that occur during this period alter the genetic code and pass that alteration on to future generations. The major differences in replication between prokaryotes and eukaryotes are due to where replication occurs and to the layout of the genome itself. In prokaryotic organisms, the DNA is typically stored as a circular chromosome located in the uncompartmentalized cytoplasm of the cell, whereas in eukaryotic organisms it is packaged into linear chromosomes and stored in the nucleus. The replication of DNA itself, however, proceeds similarly in prokaryotes and eukaryotes. Replication starts at an origin of replication, where the binding of DNA helicase allows the DNA to unwind, exposing both strands so that each can serve as a template (Keck & Berger, 2000; So & Downey, 1992). Once the DNA is unwound, a short RNA primer is laid down on the template, and DNA polymerase extends it by adding complementary nucleotides in the 5' to 3' direction. Because the two strands of DNA are antiparallel, unwinding produces a leading strand and a lagging strand. The leading strand is synthesized continuously in the direction of fork movement and therefore needs only one primer. The lagging strand, however, must be synthesized in the direction opposite to fork movement, which forces replication to occur discontinuously. The lagging strand therefore requires multiple primers, allowing the polymerase to make numerous short DNA fragments, called Okazaki fragments, which are later joined into a continuous strand (Falaschi, 2000; So & Downey, 1992).
As described previously, prokaryotic DNA is housed on a circular chromosome, allowing for bidirectional replication, with termination occurring when the two replication forks meet at a termination sequence (Keck & Berger, 2000). Because eukaryotes have linear chromosomes, however, termination is achieved by reaching the end of the chromosome, where the enzyme telomerase elongates the 3' end of the chromosome so that the template DNA can complete the replication process (Zvereva et al., 2010).
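
The template-directed, semiconservative logic of replication, and the different primer requirements of the leading and lagging strands, can be illustrated with a small Python sketch. It is a toy model: the fixed Okazaki fragment length is an assumed parameter (real fragments vary in size), and the function names are hypothetical.

```python
import math

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the complementary, antiparallel strand, also written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(strand_5_to_3.upper()))


def replicate(top_5_to_3: str):
    """Semiconservative replication of a duplex described by its top strand.

    Each daughter duplex keeps one parental strand and pairs it with a newly
    synthesized complementary strand. Returns two (top, bottom) tuples."""
    parental_bottom = reverse_complement(top_5_to_3)
    daughter_1 = (top_5_to_3, reverse_complement(top_5_to_3))  # old top + new bottom
    daughter_2 = (reverse_complement(parental_bottom), parental_bottom)  # new top + old bottom
    return daughter_1, daughter_2


def primers_needed(strand_length: int, okazaki_length: int) -> tuple[int, int]:
    """Return (leading, lagging) primer counts for one replication fork.

    Simplified model: the leading strand is primed once, and the lagging
    strand is primed once per Okazaki fragment of an assumed fixed length."""
    if strand_length <= 0 or okazaki_length <= 0:
        raise ValueError("lengths must be positive")
    return 1, math.ceil(strand_length / okazaki_length)


if __name__ == "__main__":
    print(replicate("ATGCGTAC"))  # (('ATGCGTAC', 'GTACGCAT'), ('ATGCGTAC', 'GTACGCAT'))
    print(primers_needed(10_000, 1_000))  # (1, 10)
```

Both daughter duplexes come out identical to the parent, which is exactly why replication errors, once incorporated into a new strand, are faithfully passed on.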

#### **2.2 Transcription**

#### **2.2.1 Transcription initiation**

Transcription is the process of creating an mRNA message from a DNA template, and proceeds in three basic steps in both eukaryotic and prokaryotic organisms: initiation, elongation, and termination. One important difference is that prokaryotic genes generally consist of a continuous coding region, whereas eukaryotic genes are often split into coding and non-coding regions called exons and introns, respectively. Exons carry the sequences retained in the mature mRNA, including the protein-coding sequence, whereas introns interrupt the exons with non-coding sequences that are removed after transcription (Watson et al., 2008). The initiation step begins with the binding of an RNA polymerase to a specific DNA sequence upstream of the gene or genes being expressed. This stage varies slightly between prokaryotic and eukaryotic organisms: prokaryotes have only one RNA polymerase, whereas eukaryotes have three. The prokaryotic RNA polymerase uses a subunit called the sigma (σ) factor to recognize a sequence upstream of the transcription start site, called the promoter. This region is composed of, at minimum, two DNA sequences located approximately 35 and 10 base pairs (bp) upstream of where transcription will begin, known as the -35 and -10 elements (Murakami & Darst, 2003). In addition, another DNA element, called the UP element, is sometimes located further upstream within the promoter and strengthens the interaction between the DNA template and the RNA polymerase. Immediately after the RNA polymerase binds, the DNA undergoes a conformational change whereby it unwinds to expose the single template strand required for transcription to proceed to the elongation step. This DNA separation generally occurs between positions -11 and +3 relative to the transcription start site. Although the basic process of transcription initiation is similar in eukaryotes, different enzymes carry out the steps described above.
Unlike prokaryotes, eukaryotic organisms have three RNA polymerase enzymes, called Pol I, Pol II, and Pol III. Of these, Pol II transcribes all protein-coding genes and is therefore the most relevant to the expression of transgenes encoding proteins. And whereas the prokaryotic polymerase relies on a single initiation factor, the σ factor, Pol II works in conjunction with multiple general transcription factors (GTFs). Despite these differences, the polymerase binding process is the same, with initiation factors recognizing specific elements of the promoter and allowing Pol II to bind (Ebright, 2000). In eukaryotes, the most common recognition sites are the TFIIB recognition element (BRE), the TATA box, the initiator, and downstream promoter elements (Boeger et al., 2005). Once bound to the DNA, Pol II and the GTFs allow the DNA to unwind, preparing the way for the elongation step and the beginning of mRNA synthesis.
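
The bacterial promoter architecture described above can be illustrated by a simple scan for the -35 and -10 elements. This Python sketch assumes the E. coli σ70 consensus sequences TTGACA (-35) and TATAAT (-10), an allowed spacer of 15 to 19 bp between them (17 bp is generally considered optimal), and a hypothetical mismatch threshold. Natural promoters rarely match the consensus exactly, so this is a rough screen, not a promoter predictor.

```python
# E. coli sigma70 consensus elements (coding strand, 5'->3').
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_candidates(seq: str, max_mismatches: int = 2,
                             spacer_range: tuple = (15, 19)) -> list:
    """Return (pos_-35, pos_-10, spacer, total_mismatches) tuples, best first.

    max_mismatches is a hypothetical threshold on the combined mismatches of
    both hexamers; spacer_range is the assumed allowed gap between them."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        m35 = mismatches(seq[i:i + 6], MINUS_35)
        if m35 > max_mismatches:
            continue
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            j = i + 6 + spacer  # start of the candidate -10 hexamer
            if j + 6 > len(seq):
                break
            total = m35 + mismatches(seq[j:j + 6], MINUS_10)
            if total <= max_mismatches:
                hits.append((i, j, spacer, total))
    return sorted(hits, key=lambda hit: hit[3])


if __name__ == "__main__":
    toy = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "GGGG"  # synthetic sequence
    print(find_promoter_candidates(toy)[0])  # (2, 25, 17, 0)
```

When choosing a bacterial promoter for a transgene, the same reasoning applies in reverse: the closer the -35 and -10 elements and their spacing are to consensus, the stronger the promoter tends to be.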

#### **2.2.2 Elongation during transcription**

As the elongation step begins, a conformational change allows the RNA polymerase to release the promoter and begin building an mRNA message as it moves along the template. In prokaryotes, as the DNA template enters the polymerase-promoter complex, complementary ribonucleotides are paired with it, producing a short transcript. As this process repeats, the growing RNA can no longer be contained within the polymerase and must leave through a designated exit channel. This causes the σ factor to dissociate from the polymerase and the polymerase to release the promoter, allowing continued elongation of the nascent mRNA. As the mRNA is lengthened one nucleotide at a time by the polymerase moving along the DNA, the DNA unwinds ahead of the polymerase and rewinds behind it, keeping the transcription bubble that forms on the template a constant size. This process is slightly different in eukaryotes, where escaping the promoter requires two steps to disconnect the GTFs from the polymerase and the polymerase from the promoter. The first step is an input of energy derived from the hydrolysis of ATP; without it, an arrest could occur that would terminate the elongation phase and thus stop transcription altogether (Dvir et al., 1996, 2001). The second required step is the phosphorylation of Pol II: as phosphates are added to the polymerase tail, it sheds the associated GTFs and dissociates from the promoter region (Boeger et al., 2005). Once the polymerase is free of the GTFs, elongation factors are able to bind and stimulate the addition of nucleotides to the growing mRNA.

#### **2.2.3 Termination of transcription**


After the complete mRNA has been synthesized, transcription ends in the termination step. As the name suggests, the purpose of this step is to stop the production of mRNA once the template gene has been transcribed. Prokaryotes have two different termination methods, Rho-dependent and Rho-independent. In Rho-dependent termination, Rho binding sequences encoded in the DNA signal the end of elongation and allow the polymerase to dissociate from the DNA. The Rho protein is made up of six identical subunits that have a high affinity for C-rich RNA sequences. It becomes active in transcription termination once the ribosome has slowed translation to a point where Rho can bind to the RNA between the RNA polymerase and the ribosome (Richardson, 2003). The presence of a Rho binding region allows Rho to bind to the RNA after it has exited the polymerase, and the intrinsic ATPase activity of Rho then drives termination of elongation, stopping the production of RNA (Richardson, 2003). Rho-independent terminators do not require Rho. Instead, the DNA template encodes a GC-rich inverted repeat followed by a series of A:T base pairs which, when transcribed, form an RNA hairpin followed by a run of U residues. The formation of this secondary structure terminates RNA production and releases the nascent mRNA from the polymerase (Abe & Aiba, 1996). In eukaryotes, termination again differs from that of prokaryotes, because transcription is coupled to three RNA processing events: capping, splicing, and polyadenylation. As the mRNA exits the polymerase, it is capped through the addition of a methylated guanine to its 5' end (Wahle, 1995). Next, splicing removes the non-coding regions (introns), and finally, the 3' end of the mRNA is cleaved and polyadenylated, allowing the transcript to dissociate from the polymerase and ending transcription.
The major differences in the transcription process between prokaryotes and eukaryotes are summarized in Table 1.


Table 1. Comparison of the transcriptional process in prokaryotes and eukaryotes
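
The sequence features of a Rho-independent terminator can be illustrated with a simple Python search over the DNA coding strand: a GC-rich inverted repeat (which would fold into the RNA hairpin) immediately followed by a run of T residues (the U-run in the RNA). The stem and loop length ranges, the GC threshold, and the T-run length are hypothetical parameters, and the function names are illustrative. Dedicated terminator-prediction tools evaluate hairpin folding energies instead, so this only demonstrates the features described above; Rho-dependent termination is not modeled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return "".join(COMPLEMENT[b] for b in reversed(s))


def find_intrinsic_terminators(seq: str, stem_len=(5, 10), loop_len=(3, 9),
                               min_gc=0.6, min_t_run=4) -> list:
    """Return (stem_start, stem_length, loop_length) for candidate terminators.

    A candidate is a stem of stem_len bases with GC fraction >= min_gc, a loop
    of loop_len bases, the reverse-complementary stem, and then at least
    min_t_run consecutive T's. All thresholds are assumptions."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq)):
        for k in range(stem_len[0], stem_len[1] + 1):
            left = seq[i:i + k]
            if len(left) < k:
                break  # longer stems would also run off the end
            if (left.count("G") + left.count("C")) / k < min_gc:
                continue
            for loop in range(loop_len[0], loop_len[1] + 1):
                r = i + k + loop  # start of the downstream stem half
                right = seq[r:r + k]
                tail = seq[r + k:r + k + min_t_run]
                if len(right) < k or len(tail) < min_t_run:
                    break  # longer loops would also run off the end
                if right == revcomp(left) and tail == "T" * min_t_run:
                    hits.append((i, k, loop))
    return hits


if __name__ == "__main__":
    toy = "AAAA" + "GCCGCC" + "GAAA" + "GGCGGC" + "TTTTTTTT" + "AA"  # synthetic
    print(find_intrinsic_terminators(toy))
```

Overlapping hits describing the same hairpin (for example, a shorter stem with a correspondingly longer loop) are expected from this naive search.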

#### **2.3 Translation**

After transcription has been completed, the mRNA is ready to be translated, a process that uses the mRNA message to produce a string of amino acids, known as a protein. Just as with transcription, there are subtle but important differences in how this is performed in prokaryotes and eukaryotes. In eukaryotes, transcription takes place in the nucleus, whereas translation takes place in the cytoplasm. This means that the mRNA must move across the nuclear membrane to the cytoplasm before translation can occur. Since transcription in prokaryotes occurs in the uncompartmentalized cytoplasm, this step is unnecessary, and translation can begin as soon as the mRNA exits the polymerase during transcription. Whether the process occurs in a prokaryote or a eukaryote, four major components are involved: mRNA, transfer RNA (tRNA), aminoacyl-tRNA synthetases, and ribosomes. The mRNA is read as codons, three-nucleotide units joined end to end to form open reading frames (ORFs). While eukaryotic mRNAs usually carry only one ORF, it is not uncommon for prokaryotic mRNAs to contain two or more ORFs (Watson et al., 2008). These multi-ORF mRNAs are referred to as polycistronic mRNAs and can encode multiple proteins from a single mRNA molecule. tRNAs act as adaptors that allow amino acids to be matched to the mRNA template: each tRNA recognizes a specific codon through its anticodon and delivers the amino acid corresponding to that codon (Kolitz & Lorsch, 2010). The amino acid is attached to its tRNA by an aminoacyl-tRNA synthetase, and the resulting aminoacyl-tRNA then binds to the complementary codon on the mRNA so that the appropriate amino acid can be added to the peptide chain.
The final component of the translational process, the ribosome, brings the mRNA and tRNAs together and catalyzes the formation of peptide bonds, producing the polypeptide chain. Ribosomes are composed of two subunits, the small and large subunits, and contain three tRNA binding sites: the A (aminoacyl) site, the P (peptidyl) site, and the E (exit) site (Ramakrishnan, 2002). These three binding sites work together to allow protein synthesis. As in transcription, these components work together to perform the initiation, elongation, and termination phases of translation.
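
The idea of ORFs, and of polycistronic mRNAs carrying several of them, can be illustrated with a Python sketch that lists every stretch from an AUG to the next in-frame stop codon in all three reading frames. The minimum ORF length and the rule of starting each ORF at the first AUG encountered are assumptions; real prokaryotic gene annotation also weighs ribosome binding sites and other signals, and the function name is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(mrna: str, min_codons: int = 2) -> list:
    """Return sorted (start, end) indices of ORFs in all three frames.

    An ORF runs from an AUG to the next in-frame stop codon; `end` is exclusive
    and includes the stop. min_codons (excluding the stop) is an assumed cutoff.
    AUGs nested inside an ORF are not reported separately."""
    mrna = mrna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(mrna):
            if mrna[i:i + 3] == "AUG":
                j = i
                while j + 3 <= len(mrna) and mrna[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(mrna):
                    break  # no stop codon downstream in this frame
                if (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                i = j + 3  # resume scanning after the stop codon
                continue
            i += 3
    return sorted(orfs)


if __name__ == "__main__":
    # Toy two-cistron mRNA: AUG AAA UAA, a 2-nt spacer, then AUG CCC GGG UAG.
    print(find_orfs("AUGAAAUAAGGAUGCCCGGGUAG"))  # [(0, 9), (11, 23)]
```

The two ORFs in the toy message lie in different reading frames, which is common in bacterial operons; a eukaryotic ribosome scanning from the 5' cap would normally translate only the first.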

#### **2.3.1 Initiation of translation**

The translational initiation stage involves similar steps in prokaryotes and eukaryotes, but each uses different factors to perform them. In prokaryotes, initiation involves recruitment of the ribosome to the mRNA through a ribosome binding site located just upstream of the start codon. This can occur as soon as the nascent mRNA has exited the polymerase, with three translation initiation factors (IF1, IF2, IF3) binding to the A, E, and P sites of the ribosome and directing the placement of the initiator tRNA on the start codon of the mRNA (Ramakrishnan, 2002). After binding, the initiation factor bound to the E site is released, allowing the large ribosomal subunit to join the small subunit and create a 70S initiation complex. This joining triggers the hydrolysis of GTP and the subsequent release of the remaining initiation factors. Once the initiation factors have dissociated, the ribosome/mRNA complex is ready to enter the elongation phase.
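
In bacteria, the ribosome binding site typically contains a Shine-Dalgarno (SD) sequence, a purine-rich motif with the consensus AGGAGG that base-pairs with the 3' end of the 16S rRNA. The Python sketch below finds the hexamer upstream of a given start codon that best matches this consensus and reports its distance from the AUG. The 20-nt search window, the simple mismatch score, and the function name are assumptions made for illustration.

```python
SD_CONSENSUS = "AGGAGG"


def best_sd_match(mrna: str, start_index: int, window: int = 20):
    """Find the upstream hexamer that best matches the Shine-Dalgarno consensus.

    Only hexamers lying entirely within `window` nt upstream of the start codon
    at `start_index` are considered. Returns (mismatches, sd_start, spacing),
    where spacing is the number of nucleotides between the end of the hexamer
    and the start codon, or None if there is no room upstream. Ties go to the
    most upstream hexamer."""
    mrna = mrna.upper()
    best = None
    for p in range(max(0, start_index - window), start_index - len(SD_CONSENSUS) + 1):
        hexamer = mrna[p:p + len(SD_CONSENSUS)]
        mm = sum(a != b for a, b in zip(hexamer, SD_CONSENSUS))
        spacing = start_index - (p + len(SD_CONSENSUS))
        if best is None or mm < best[0]:
            best = (mm, p, spacing)
    return best


if __name__ == "__main__":
    toy = "CCCC" + "AGGAGG" + "CCCCCCC" + "AUG" + "AAA"  # synthetic leader
    print(best_sd_match(toy, toy.find("AUG")))  # (0, 4, 7)
```

Because eukaryotic ribosomes do not use an SD interaction, a prokaryotic ribosome binding site carried into a eukaryotic host contributes little, which is one reason translation signals must be replaced when moving genes between the two groups.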

Due to the intrinsic compartmentalization in eukaryotic organisms, translation is a completely separate event from that of transcription because the nuclear membrane prevents the mRNA from interacting with the ribosome until it is released into the cytoplasm. However, once in the cytoplasm, the 5' methylated guanine cap attached to the eukaryotic mRNA binds to the ribosome and the process begins. The eukaryotic ribosome is similar to its prokaryotic counterpart in that it too has A, P and E binding sites and utilizes initiation factors to achieve correct attachment of associated tRNA (Figure 2). However,

cytoplasm. This means that the previously produced mRNA must move across the nuclear membrane to the cytoplasm before translation can occur. Since the transcriptional process in prokaryotes occurs in the uncompartmentalized cytoplasm, this is an unnecessary step and translation can occur as soon as the mRNA exits the polymerase during transcription. Regardless of if this process occurs in a prokaryote or eukaryote, there are four major components involved: mRNA, transfer RNA (tRNA), aminoacyl-tRNA synthetases, and ribosomes. The mRNA component is composed of codons, three nucleotide long elements, which are joined together end to end to form open reading frames (ORFs). While the genes of eukaryotes usually only have one ORF per mRNA sequence, it is not uncommon for prokaryotes to contain two or more ORFs per mRNA sequence (Watson et al., 2008). These multi-ORF mRNA sequences are referred to as polycistronic mRNAs and can encode multiple proteins from a single sequence of mRNA. In order for the amino acids to recognize and bind to the mRNA template, tRNA is used as a mediator. tRNAs are complementary to specific codons via their anti-codons and, upon recognition of their specified codon, incorporate the corresponding appropriate amino acid for that codon (Kolitz & Lorsch, 2010). Once the corresponding amino acid is bound to the tRNA, the complex is referred to as an aminoacyl-tRNA synthetase, which then binds to the complement mRNA to allow the appropriate amino acid to be added to the peptide chain. The final component of the translational process, the ribosome, is the enzyme responsible for catalyzing the pairing of mRNA and tRNA, leading to the formation of the polypeptide chain. Ribosomes are composed of two individual subunits, the small and large subunits, and contain three binding sites, the A site, the P site and the E site (Ramakrishnan, 2002). These three binding sites work together to allow protein synthesis. 
Similar to the transcriptional process, mRNA, tRNA, aminoacyl-tRNA synthetases and ribosomes work together to perform the initiation, elongation, and termination phases of translation.

#### **2.3.1 Initiation of translation**

The translational initiation stage involves similar steps in prokaryotes and eukaryotes, but each performs these steps using different factors. In prokaryotes, initiation involves recruitment of the ribosome to the mRNA through a ribosome binding site located just upstream of the start codon. This can occur as soon as the nascent mRNA has exited the polymerase. Three translation initiation factors (IF1, IF2, IF3) bind to the ribosome: IF1 occupies the A site, IF3 blocks the E site, and IF2 directs placement of the initiator tRNA on the start codon of the mRNA in the P site (Ramakrishnan, 2002). After this binding, the initiation factor bound to the E site is released, allowing the large ribosomal subunit to join the small subunit and form the 70S initiation complex. This joining triggers GTP hydrolysis and the subsequent release of the remaining initiation factors. Once the initiation factors have dissociated, the ribosome/mRNA complex is ready to enter the elongation phase.

Because of the intrinsic compartmentalization of eukaryotic cells, translation is entirely separate from transcription: the nuclear membrane prevents the mRNA from interacting with the ribosome until the mRNA has been exported to the cytoplasm. Once in the cytoplasm, the 5' methylated guanine cap of the eukaryotic mRNA is recognized by initiation factors that recruit the ribosome, and the process begins. The eukaryotic ribosome resembles its prokaryotic counterpart in that it also has A, P and E sites and uses initiation factors to position the initiator tRNA correctly (Figure 2). However, unlike the prokaryotic ribosome, the small subunit of the eukaryotic ribosome must bind the initiator tRNA before it comes into contact with the mRNA (Watson et al., 2008). With the tRNA bound, the ribosome recognizes the mRNA template and scans along it for an AUG start codon. Once the start codon is identified, the initiator tRNA pairs with it in a step coupled to GTP hydrolysis, which releases the first set of initiation factors and allows a second set to bind (Acker et al., 2009). The large subunit can then join, triggering another GTP hydrolysis event that dissociates the remaining initiation factors and forms the 80S initiation complex. The complete ribosome/mRNA complex is then ready to enter the elongation phase of translation.

Fig. 2. The ribosome is responsible for translating mRNA into protein. Used with permission from (Lafontaine & Tollervey, 2001)
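The two start-site selection strategies can be contrasted with a deliberately simplified Python sketch. For the prokaryotic case it assumes that an AUG is used only if a Shine-Dalgarno-like core (`GGAGG`) ends 4 to 14 nucleotides upstream of it; for the eukaryotic case it applies the simplest form of the scanning model and takes the first AUG from the 5' end. The motif, the spacing window and the function names are assumptions chosen for illustration; real ribosome binding sites vary, and Kozak context and leaky scanning are ignored.

```python
SD_CORE = "GGAGG"  # assumed Shine-Dalgarno core motif (real sites vary)


def prokaryotic_start(mrna: str, min_gap: int = 4, max_gap: int = 14) -> int:
    """Index of the first AUG with an SD-like core min_gap..max_gap nt upstream, else -1."""
    mrna = mrna.upper().replace("T", "U")
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] != "AUG":
            continue
        for gap in range(min_gap, max_gap + 1):
            # gap = number of nucleotides between the end of the SD core and the AUG
            sd_start = i - gap - len(SD_CORE)
            if sd_start >= 0 and mrna[sd_start:sd_start + len(SD_CORE)] == SD_CORE:
                return i
    return -1


def eukaryotic_start(mrna: str) -> int:
    """Index of the first AUG from the 5' end (simplest scanning model), else -1."""
    return mrna.upper().replace("T", "U").find("AUG")


mrna = "AUGCC" + "GGAGG" + "UAAUCU" + "AUGAAA"
print(prokaryotic_start(mrna), eukaryotic_start(mrna))  # prints 16 0
```

On this toy mRNA the prokaryotic rule skips the first AUG because no SD-like motif precedes it, whereas the scanning rule simply takes the first AUG it meets.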

#### **2.3.2 Elongation during translation**

Elongation is where the protein encoded by a specific gene first begins to take form. Before a tRNA can take part in elongation, its aminoacyl-tRNA synthetase attaches the appropriate amino acid to the 3' end of the tRNA through an ester bond. The resulting aminoacyl-tRNA binds to the A site of the ribosome, and the ribosome then forms a peptide bond between the amino acid of the incoming tRNA and the peptide chain attached to the peptidyl-tRNA in the P site. Transfer of the peptide chain converts the aminoacyl-tRNA into a peptidyl-tRNA, which is then translocated from the A site to the P site, while the now-empty (deacylated) tRNA that was previously in the P site moves to the E site and exits the ribosome. Repeating this cycle extends the polypeptide chain that will form the final protein encoded by the gene being expressed. The process is assisted by elongation factors. In prokaryotes there are three elongation factors (EF-Tu, EF-G, and EF-Ts), whereas eukaryotes utilize only two (eEF-1 and eEF-2) (Lavergne et al., 1992; Nilsson & Nissen, 2005; Oldfield & Proud, 1993). The prokaryotic factor EF-Tu and the eukaryotic factor eEF-1 work in a similar fashion, binding aminoacyl-tRNAs and escorting them to the A site of the ribosome (Nilsson & Nissen, 2005; Oldfield & Proud, 1993). Once the aminoacyl-tRNA is in the A site, the peptide chain of the peptidyl-tRNA is transferred to the amino acid on the aminoacyl-tRNA, and the complex is ready to be translocated. Translocation requires EF-G in prokaryotic systems or eEF-2 in eukaryotic systems.
Both factors bind the ribosome once the peptide chain has been transferred to the tRNA in the A site; hydrolysis of their bound GTP drives translocation of the new peptidyl-tRNA from the A site to the P site and moves the deacylated tRNA from the P site to the E site, from which it exits (Nilsson & Nissen, 2005; Riis et al., 1990; Watson et al., 2008). The final prokaryotic elongation factor, EF-Ts, which has no eukaryotic homologue (in eukaryotes the equivalent nucleotide exchange is carried out within the eEF-1 complex), recycles EF-Tu by exchanging its bound GDP for GTP, so that EF-Tu can deliver a new aminoacyl-tRNA to the vacated A site and elongation can continue (Nilsson & Nissen, 2005). This cycle of amino acid addition continues until a stop codon enters the A site.
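The net outcome of the elongation cycle, one amino acid added per codon until a stop codon reaches the A site, can be mimicked with a short Python sketch of the standard genetic code. The table is built from the conventional 64-letter amino acid string (codons ordered UUU, UUC, UUA, UUG, ..., GGG); `translate` is a hypothetical helper that models only the codon-to-residue mapping, not tRNAs, elongation factors or translation errors.

```python
from itertools import product

# Standard genetic code. Codons are enumerated UUU, UUC, UUA, UUG, UCU, ... GGG
# (first base varies slowest); '*' marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str, start: int = 0) -> str:
    """Read codons from `start`, adding one residue per codon until a stop codon."""
    mrna = mrna.upper().replace("T", "U")
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # stop codon in the A site: elongation ends here
            break
        peptide.append(residue)
    return "".join(peptide)


print(translate("AUGGCUUCUUAAGGG"))  # prints MAS
```

Codons after the stop (here `GGG`) are never read, just as the ribosome stops adding residues once termination begins.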

#### **2.3.3 Termination of translation**

Once the coding sequence has been read, elongation must be terminated, ending growth of the polypeptide chain and yielding a complete protein product. Elongation of the polypeptide continues until a stop codon is read from the mRNA template. In both prokaryotes and eukaryotes, three stop codons can be used to end translation: UAG, UGA and UAA. Once a stop codon has been recognized in the A site of the ribosome, a set of release factors (RFs) is recruited to release the synthesized protein. Prokaryotes have two Class I release factors: RF1, which recognizes UAG and UAA, and RF2, which recognizes UGA and UAA. They also have one Class II release factor, RF3, which allows the Class I release factors to dissociate from the ribosome after the protein has detached (Moreira et al., 2002). In contrast, eukaryotes have only one Class I release factor, eRF1, which recognizes all three stop codons, and one Class II release factor, eRF3, which promotes dissociation (Moreira et al., 2002). Regardless of which release factor is used, recognition of the stop codon triggers hydrolysis of the bond linking the completed polypeptide chain to the tRNA in the P site, and the newly synthesized protein and all termination elements are released from the ribosome. A summary of the host protein machinery active during translation is presented in Table 2.
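The stop-codon specificities of the Class I release factors can be summarized as a small lookup, sketched here in Python. The dictionaries simply encode the assignments given in the text (RF1: UAA and UAG; RF2: UAA and UGA; eRF1: all three); the function name `class_i_release_factors` is hypothetical.

```python
# Class I release factor specificities as described in the text.
PROKARYOTIC_CLASS_I = {"RF1": {"UAA", "UAG"}, "RF2": {"UAA", "UGA"}}
EUKARYOTIC_CLASS_I = {"eRF1": {"UAA", "UAG", "UGA"}}


def class_i_release_factors(codon: str, eukaryote: bool) -> list[str]:
    """Return the Class I release factors that recognize `codon` (empty if none)."""
    codon = codon.upper().replace("T", "U")
    factors = EUKARYOTIC_CLASS_I if eukaryote else PROKARYOTIC_CLASS_I
    return sorted(name for name, codons in factors.items() if codon in codons)


for stop in ("UAA", "UAG", "UGA"):
    print(stop, class_i_release_factors(stop, False), class_i_release_factors(stop, True))
```

Only UAA is read by both prokaryotic factors, which is why the two factors together cover all three stop codons.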


Table 2. Host proteins active during translation

| Stage | Prokaryotes | Eukaryotes | Function |
|---|---|---|---|
| Initiation | IF-1 | eIF-1 | Blocks the A site from initiator tRNA |
| | IF-2 | eIF-2 | Binds to initiator tRNA |
| | IF-3 | eIF-3 | Blocks the E site |
| | N/A | eIF-4 | Ribosomal recognition of mRNA |
| | N/A | eIF-5 | Blocks the E site |
| Elongation | EF-Tu | eEF-1 | Binds aminoacyl-tRNA to the A site |
| | EF-G | eEF-2 | Translocation |
| | EF-Ts | N/A | Recycles EF-Tu (exchanges its GDP for GTP) |
| Termination | RF-1 | eRF-1 | RF-1 recognizes the UAA and UAG stop codons; eRF-1 recognizes all three stop codons |
| | RF-2 | N/A | Recognizes the UAA and UGA stop codons |
| | RF-3 | eRF-3 | Releases all translation factors |

#### **3. Considerations for the expression of exogenous DNA**

Although nucleic acids serve as the universal genetic material and the central dogma applies to all organisms, exogenous expression of foreign genes is not as straightforward as delivering the target sequence into host cells and waiting for it to be expressed. This is because the gene expression machinery of each species has evolved to process its own genetic material more efficiently than genetic material from other species, which is especially true when the exogenous genetic material comes from a very distantly related species. Differences between the native and surrogate hosts in genomic characteristics such as GC content and codon usage patterns therefore play an important role in the efficiency of exogenous gene expression. In addition, some organisms have evolved to recognize and remove or silence foreign genetic sequences in order to protect themselves from the deleterious effects of foreign DNA expression. Only by mimicking, circumventing or deactivating these mechanisms does it become possible to express a foreign gene efficiently in a surrogate host. Understanding how these mechanisms work therefore increases the likelihood of developing an effective strategy for exogenous gene expression.

#### **3.1 GC content**

The term GC content refers to the percentage of guanine (G) and cytosine (C) bases in a DNA sequence. It can be used to describe a gene, a chromosome, a genome, or any region of a particular DNA sequence. Different organisms vary significantly in their genomic GC content. For example, *Plasmodium falciparum* has an extremely GC-poor genome, with a GC content of approximately 20%, while *Streptomyces coelicolor* has a GC content as high as 72%. The GC contents of commonly used laboratory organisms are listed in Table 3.


Table 3. GC content varies among common organisms
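GC content as defined above is straightforward to compute. The minimal Python sketch below reports the percentage of G and C among unambiguous bases; ignoring ambiguity codes such as N is an assumption made here rather than a convention stated in the text, and `gc_content` is an illustrative name.

```python
def gc_content(seq: str) -> float:
    """Percentage of G + C among the unambiguous bases (A, C, G, T or U) in `seq`."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGTU")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, T or U")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


print(gc_content("ATGCNN"))  # prints 50.0 (the two Ns are ignored)
```

The same function applies equally to a gene, a chromosome or a whole genome; only the input changes.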

Because G·C base pairs are more thermodynamically stable than A·T base pairs, GC content can affect the formation and stability of both DNA and RNA secondary structures, which are important factors in the regulation of gene expression (Kubo & Imanaka, 1989; Kudla et al., 2009). In bacteria, the Shine-Dalgarno ribosome binding site in the 5' untranslated region of the mRNA lies in a relatively AU-rich context. High AU content and low secondary-structure stability at the 5' end of a coding region have been found to contribute significantly to high translation efficiency in bacteria (Allert et al., 2010; Desmit & Vanduin, 1990). Furthermore, Kudla et al. demonstrated that adding such AU-rich leader sequences to the 5' untranslated region of mRNAs can improve the expression levels of otherwise poorly expressed proteins (Kudla et al., 2009). In a recent systematic study of 340 genomes from bacteria, archaea, fungi, plants, insects, fishes, birds, and mammals, Gu and colleagues found that most organisms, with the exception of birds and mammals, show reduced mRNA stability near the start codon, and that the extent of this reduction correlates with genomic GC content (Gu et al., 2010).

In birds and mammals, however, the genome-wide trend of reduced mRNA stability near the translation initiation site has not been observed, even though their GC content is not significantly different from that of the species in which the trend was observed (Gu et al., 2010). The authors speculate that this difference is due to the isochore-type structure of these genomes. Isochores are long stretches of DNA, typically hundreds of kilobases, whose GC content is relatively homogeneous internally but differs from that of neighboring isochores, so that the genome as a whole shows large-scale variation in GC content (Figure 3) (Bernardi, 1995; Eyre-Walker & Hurst, 2001). It is important to note that, unlike in *E. coli*, high GC content within the coding region usually increases expression in mammalian cells (Kudla et al., 2006). Kudla and colleagues found that GC-rich genes in mammalian cells were transcribed more efficiently than GC-poor versions of the same gene, leading to higher protein production. Consistent with this, the 5' cap region and the Kozak consensus sequence in the 5' untranslated region normally have a GC-rich composition in eukaryotic genes (Kozak, 1987).

Fig. 3. The classic isochore model of genomic GC content, plotting G+C content (approximately 0.35 to 0.60) along a genomic region of about 4,000 kb. Used with permission from (Eyre-Walker & Hurst, 2001)

It is widely accepted that genomic GC content has co-evolved with the gene expression machinery to ensure optimal expression for the fitness of the host (Andersson & Kurland, 1990; Kudla et al., 2009). Consequently, differences in GC content between a target gene, especially at its 5' end, and the expression host can affect the expression level of the foreign gene. The difficulty of expressing *Plasmodium falciparum* genes in *E. coli* has been attributed to their extremely low GC content and to possible degradation of the mRNA by ribonuclease E (McDowall et al., 1994; Plotkin & Kudla, 2011). Plotkin and Kudla have also predicted that more than 40% of human genes would be expressed poorly in *E. coli* without modification, owing to the relatively high GC content at the 5' end of their mRNAs and the resulting low 5' folding energy, that is, stable secondary structure near the start codon (Plotkin & Kudla, 2011).
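A crude way to screen a coding sequence for the GC-rich 5' ends discussed above is to compare the GC fraction of its first few codons with a threshold. The Python sketch below does this; the 48-nucleotide window and the 0.5 threshold are arbitrary illustrative assumptions, and GC fraction is only a rough proxy for 5' folding energy, which is normally estimated with an RNA secondary-structure prediction tool.

```python
def gc_fraction(seq: str) -> float:
    """Fraction (0-1) of G and C in `seq`; 0.0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_rich_start(cds: str, window: int = 48, threshold: float = 0.5) -> bool:
    """Flag a CDS whose first `window` nucleotides exceed `threshold` GC.

    window and threshold are illustrative assumptions, not values from the text.
    """
    return gc_fraction(cds[:window]) > threshold


gc_rich_cds = "ATG" + "GCC" * 20  # toy CDS with a GC-rich 5' end
at_rich_cds = "ATG" + "AAA" * 20  # toy CDS with an AT-rich 5' end
print(gc_rich_start(gc_rich_cds), gc_rich_start(at_rich_cds))  # prints True False
```

In a real screen, a flagged gene would be a candidate for 5' codon changes rather than proof of poor expression.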

#### **3.2 Codon usage bias**


In addition to determining mRNA stability and secondary structure, GC content also shapes another feature of every genome: its codon usage profile. The 20 amino acids commonly found in proteins are encoded by 61 different nucleotide triplets (sense codons); the remaining three of the 64 possible triplets are stop codons. Because there are more sense codons than amino acids, most amino acids can be encoded by several different codons. For example, alanine and serine can be encoded by four and six codons, respectively (Table 4). This degeneracy of the genetic code helps protect DNA sequences from otherwise deleterious mutations: many point mutations convert a codon into a synonymous one and therefore leave the encoded protein unchanged, effectively silencing the mutation. However, the available synonymous codons are not used at equal frequencies across species, across different regions of the same genome, or sometimes even within the same gene (Andersson & Kurland, 1990; Kurland, 1991). Predictably, codon usage profiles differ most between remotely related species, while closely related species are more likely to share similar codon preferences. Although the mechanisms by which an organism develops a specific codon bias have not been completely resolved (Chamary et al., 2006; Hershberg & Petrov, 2008), GC content is thought to be the single most important factor determining which codons are preferred across genomes (Plotkin & Kudla, 2011).


Table 4. Redundancy in the genetic code allows more than one codon to specify a particular amino acid
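The degeneracy summarized in Table 4 can be checked directly from the standard genetic code. The Python sketch below builds the 64-codon table from the conventional amino acid string and counts how many sense codons encode each amino acid; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

# Number of synonymous codons per amino acid ('*' = stop codons, excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(sum(degeneracy.values()), len(degeneracy))  # prints 61 20
print(degeneracy["A"], degeneracy["S"])           # prints 4 6
```

The counts reproduce the numbers in the text: 61 sense codons for 20 amino acids, with four codons for alanine and six for serine.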

Although synonymous codon substitutions were initially believed to be simply fortuitous silent mutations, more recent research has revealed that codon usage patterns can directly affect important cellular processes such as the efficiency of transcription and translation, the accuracy of translation and even protein folding (Angov, 2011; Zhang et al., 2009). It is therefore conceivable that the codon usage pattern of an organism has co-evolved with other cellular machinery to provide optimal gene expression and protein function for the host's genes in their natural environment (Grantham et al., 1981). In prokaryotes, for example, the frequency with which a codon is used correlates positively with the intracellular abundance of its corresponding tRNA (Bulmer, 1987; Dong et al., 1996). It follows that the expression of non-native genes can be hampered when their codon usage differs from that of the host organism. This hypothesis has been supported throughout the long history of exogenous gene expression: the same DNA sequence is often expressed at different efficiencies in different organisms (Gustafsson et al., 2004). A major reason is that foreign sequences often contain codons that are rarely used in the host. The imbalance between the codons in the target gene and the host's pool of charged tRNAs slows translation elongation, lowering translational efficiency and protein expression (Kane, 1995; Kim & Lee, 2006; Rosano & Ceccarelli, 2009). These problems are compounded by any incompatibility between the host translation machinery and mRNA secondary structures arising from the altered GC content that accompanies a different codon usage pattern (Kim & Lee, 2006; Wu et al., 2004).
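One common way to quantify this kind of codon usage mismatch is relative synonymous codon usage (RSCU): the count of a codon divided by the mean count of all codons for the same amino acid, so that 1.0 means no bias. RSCU is not mentioned in the text and is swapped in here for illustration. The Python sketch computes RSCU from a set of host coding sequences and flags codons in a target gene whose host RSCU falls below a threshold; the 0.5 cut-off, the toy sequences and the function names are assumptions.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def split_codons(cds: str) -> list[str]:
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def rscu(host_cds: list[str]) -> dict[str, float]:
    """RSCU = codon count / mean count of its synonymous codons (host sample)."""
    counts = Counter(c for cds in host_cds for c in split_codons(cds))
    values = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:  # amino acid absent from the sample: RSCU undefined
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def rare_codon_positions(target: str, host_rscu: dict[str, float],
                         threshold: float = 0.5) -> list[int]:
    """Codon indices in `target` whose host RSCU is below `threshold`.

    Stop codons and single-codon amino acids (Met, Trp) are skipped.
    """
    return [i for i, c in enumerate(split_codons(target))
            if CODON_TABLE[c] not in ("*", "M", "W")
            and c in host_rscu and host_rscu[c] < threshold]


host = ["GCTGCTGCTGCC"]               # toy host sample: alanine codons only
usage = rscu(host)
print(usage["GCT"], usage["GCG"])      # prints 3.0 0.0
print(rare_codon_positions("GCGGCTGCA", usage))  # prints [0, 2]
```

With a realistic host sample, the flagged positions would be the rare codons that codon optimization targets.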

To overcome these problems, a common strategy for enhancing the expression of non-native genes in a surrogate host is codon optimization. This process replaces rare codons within the DNA sequence so that it closely matches the host's codon usage bias while retaining 100% identity to the original amino acid sequence. Codon optimization also allows predicted mRNA secondary structures arising from changes in GC content to be modified at the same time. This is especially helpful for eliminating structures at the 5' end of coding regions, where they are particularly likely to interfere with downstream protein expression (Wu et al., 2004). *Cis*-acting negative regulatory elements within the coding sequence can also be eliminated to reduce the chance of repression, further improving expression (Graf et al., 2000). Experimentally, codon optimization can be achieved either through multiple rounds of site-directed mutagenesis on directly cloned DNA or by *de novo* resynthesis of the target gene. The former method may be preferred if only a limited number of codons must be changed, whereas the latter has become increasingly practical as improvements in gene synthesis have reduced both the cost and the time required to generate synthetic DNA sequences. In general, codon optimization has been shown to increase expression of a typical mammalian gene five- to fifteen-fold in an *E. coli* host (Burgess-Brown et al., 2008; Gustafsson et al., 2004). Expression of prokaryotic genes in eukaryotic cells can likewise be improved significantly with this method (Patterson et al., 2005; Zolotukhin et al., 1996; Zur Megede et al., 2000).
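The simplest form of codon optimization, often called the "one amino acid, one codon" approach, replaces every codon with the host's most frequently used synonym. The Python sketch below implements only that step and then re-translates the result to confirm 100% identity to the original protein. The host codon counts are invented toy values, the function names are hypothetical, and practical tools also weigh mRNA structure, GC content and unwanted sequence elements, which this sketch ignores.

```python
from itertools import product

AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def preferred_codons(host_counts: dict[str, int]) -> dict[str, str]:
    """Most frequent host codon for each amino acid (ties go to the alphabetically first)."""
    best: dict[str, str] = {}
    for codon in sorted(CODON_TABLE):
        aa = CODON_TABLE[codon]
        if aa not in best or host_counts.get(codon, 0) > host_counts.get(best[aa], 0):
            best[aa] = codon
    return best


def optimize(cds: str, host_counts: dict[str, int]) -> str:
    """Swap each codon for the host-preferred synonym; the protein is unchanged."""
    protein = translate(cds)
    best = preferred_codons(host_counts)
    optimized = "".join(best[aa] for aa in protein)
    if translate(optimized) != protein:  # sanity check: synonymous changes only
        raise RuntimeError("optimization changed the protein sequence")
    return optimized


# Hypothetical host codon counts (toy values, not real codon usage data).
host_counts = {"GCC": 10, "GCT": 2, "AAA": 5, "AAG": 1, "ATG": 3, "TAA": 4, "TGA": 1}
print(optimize("ATGGCTAAGTGA", host_counts))  # prints ATGGCCAAATAA
```

Choosing a single codon per amino acid is easy to reason about, but it can create repetitive sequence and deplete the corresponding tRNA pool, which is one reason more elaborate strategies exist.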

#### **3.3 Mechanisms for removal and silencing of exogenous genes**

For an exogenous gene to be expressed in a non-native host, the foreign DNA must be physically delivered into the host cell and then properly integrated into the gene expression and regulation network of the host. Decades of research in molecular and cellular biotechnology have provided many effective techniques for introducing genetic material into both prokaryotic and eukaryotic hosts. However, once the gene has been transferred into the host cell, it must be recognized and processed by the host cell's replication, transcription and translation machinery before it can be expressed as a functional protein. Moreover, because expression of a foreign gene is often deleterious to host survival under wild-type conditions, many organisms have evolved defense mechanisms that remove or silence foreign DNA in order to protect themselves from this potentially detrimental process. In bacteria, for example, invading foreign DNA can be cleaved by restriction endonucleases that recognize specific nucleotide sequences, a phenomenon referred to as restriction. The host's own DNA is methylated at these sites by methylase enzymes, which prevents its recognition and degradation by the restriction endonucleases and ensures the maintenance and expression of native DNA sequences. This restriction-modification system was first discovered in the 1960s and has since been shown to be common in many bacterial species (Wilson & Murray, 1991). Restriction is not, however, the only defense mechanism that protects the host from expressing foreign genetic material. Gram-negative bacteria can also selectively repress horizontally acquired genes through the histone-like nucleoid structuring (H-NS) protein. This phenomenon, termed xenogeneic silencing, was first reported in 2006 by Navarre, Lucchini, Oshima and colleagues (Lucchini et al., 2006; Navarre et al., 2006; Oshima et al., 2006).

The H-NS protein responsible for xenogeneic silencing belongs to a family of nucleoid-associated proteins that bind AT-rich DNA sequences with low sequence specificity. Because horizontally acquired sequences typically have a lower GC content than the host genome, H-NS preferentially targets them, allowing it to selectively repress the expression of exogenous DNA.
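Both bacterial defenses act on sequence features that can be scanned for computationally. The toy Python sketch below (a) lists occurrences of a restriction site that are not protected by methylation, modelled simply as a set of protected start positions, and (b) flags fixed-size windows whose GC fraction falls well below the host genome average, the kind of AT-rich region H-NS tends to bind. The EcoRI site GAATTC is used only as an example; the window size, margin and function names are assumptions, and real H-NS binding depends on more than GC content.

```python
def unprotected_sites(seq: str, site: str,
                      methylated: frozenset[int] = frozenset()) -> list[int]:
    """Start positions of `site` in `seq` that are not methylated (and would be cut)."""
    seq, site = seq.upper(), site.upper()
    hits = []
    pos = seq.find(site)
    while pos != -1:
        if pos not in methylated:
            hits.append(pos)
        pos = seq.find(site, pos + 1)
    return hits


def at_rich_windows(seq: str, host_gc: float, window: int = 100,
                    margin: float = 0.1) -> list[int]:
    """Start positions of non-overlapping windows whose GC fraction is more than
    `margin` below the host genome's GC fraction `host_gc` (both on a 0-1 scale)."""
    seq = seq.upper()
    flagged = []
    for i in range(0, len(seq) - window + 1, window):
        w = seq[i:i + window]
        if (w.count("G") + w.count("C")) / window < host_gc - margin:
            flagged.append(i)
    return flagged


foreign = "GAATTC" + "AT" * 5 + "GAATTC"
print(unprotected_sites(foreign, "GAATTC"))                  # prints [0, 16]
print(unprotected_sites(foreign, "GAATTC", frozenset({0})))  # prints [16]
print(at_rich_windows("A" * 100 + "GC" * 50, host_gc=0.5))   # prints [0]
```

Marking position 0 as methylated removes it from the list of cleavable sites, which is the protective role methylation plays for the host's own DNA.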

Unlike in prokaryotes, no mechanism for the direct removal of foreign genetic material has yet been proposed to function in eukaryotic organisms. Nonetheless, the expression of exogenous DNA in plants and mammalian cells often suffers from low efficiency due to epigenetic modification, which leads to unstable expression and, in extreme cases, silencing of the transgene over time. Silencing can occur at either the transcriptional or the post-transcriptional level, through changes in the methylation status of the sequence, histone modification, or RNA interference (Pal-Bhadra et al., 2002; Pikaart et al., 1998; Riu et al., 2007). Whatever their mechanism, all of these processes are used by the host to regulate the expression of exogenous genes and protect itself from their deleterious effects. One final concern that cannot yet be controlled is that, because exogenous genes usually integrate at random sites in the host chromosome, transgene expression can depend strongly on the site of insertion. Depending on the location of integration, various position effects and epigenetic events often cause expression levels to vary widely between independently generated transgenic lines or clones. Although in most cases there is no reliable way to control the genomic insertion site of exogenous genes, several elements have been proposed that can help counteract the resulting position effects and achieve sustained transgene expression. These elements are discussed in section 4.4.

#### **4. Regulatory sequences that must be considered for optimal expression**

By developing a comprehensive understanding of the mechanisms underlying gene expression and appreciating how factors such as GC content and codon usage bias influence protein expression in non-native hosts, investigators can begin to develop theoretical guidelines for the rational design of DNA sequences optimally tuned for heterologous expression in their target organism. This approach is especially attractive now that the reduced time and cost of gene synthesis allow complete genes, and even entire expression cassettes, to be produced *de novo*, making it possible simply to design a gene sequence and begin working. However, additional concerns must be addressed before an exogenous gene sequence can be expressed successfully. Besides the coding region, regulatory sequences that are not translated (and in many cases not transcribed) should also be taken into consideration in order to achieve optimal expression. Although they do not appear in the final protein product, these elements are involved in the transcription, translation and long-term maintenance of target genes in the surrogate host, making their optimization just as important as optimization of the coding sequence itself.

#### **4.1 Regulatory elements involved in transcription**

The process leading from a gene to a functional protein starts with transcription by RNA polymerase, so transcription initiation is often an important point of control for exogenous protein expression. Recruitment and binding of the polymerase that transcribes the DNA into mRNA are directed by the promoter sequence. Even though the promoter itself is not transcribed or translated, choosing a promoter that can be efficiently recognized by the host's transcription machinery therefore has a significant impact on the success of the design strategy. Strong promoters are commonly chosen for high-level expression of exogenous genes, often constitutive promoters that normally drive endogenous housekeeping genes in the expression host. For example, the bacteriophage T7 promoter (in strains that supply T7 RNA polymerase), the alcohol dehydrogenase 1 (ADH1) promoter and the human elongation factor 1α (EF1α) promoter are commonly employed for heterologous protein expression in *E. coli*, *S. cerevisiae* and mammalian cells, respectively. Viral regulatory sequences such as the cytomegalovirus immediate early (CMV IE) promoter and the Simian virus 40 (SV40) promoter are also used to drive transgene expression in mammalian cells. It is important to note, however, that while promoter strength can at least partially determine the level of transgene expression, a given promoter can be transcribed at very different rates in different cell lines. For this reason, an appropriate promoter should be selected on a case-by-case basis. Recent studies have systematically compared many of the commonly used promoters in a variety of cell types (Norrman et al., 2010; Qin et al., 2010) (Figure 4), and such comparisons are an excellent source of information when designing constructs with specific expression needs.

Fig. 4. Systematic comparison of different promoters in different mammalian cell types. Originally published in (Qin et al., 2010)

It is also important to remember that promoter sequences, like gene sequences, can be designed *de novo*, and that designing a specific promoter upstream of a gene construct may be beneficial if no suitable native promoter is available. Analysis of a large number of prokaryotic and eukaryotic promoters has revealed that many promoters contain a conserved core sequence that is essential for recognition and binding of RNA polymerase and its cofactors. By incorporating these conserved sequences, it may be possible to design a promoter that tailors expression of a genetic construct to specific needs.

In prokaryotes, this conserved sequence is known as the Pribnow box, which has the six-nucleotide consensus sequence TATAAT (Pribnow, 1975). A second conserved element is often found upstream of the Pribnow box, separated from it by a spacer of approximately 17 bp. This upstream region has the consensus sequence TTGACA and has been shown to be crucial for transcription initiation (Rosenberg & Court, 1979). In eukaryotic organisms, the counterpart of the Pribnow box is the TATA box, with the consensus sequence TATAAA. Besides recruiting the associated transcription machinery, these core promoter elements also define where RNA synthesis starts. In prokaryotes, RNA synthesis usually begins about 10 bp downstream of the Pribnow box, whereas in eukaryotes the first transcribed nucleotide is located approximately 25 bp downstream of the TATA box. Therefore, in addition to choosing an appropriate core promoter sequence, the position of that sequence relative to the coding region should also be carefully considered to ensure complete transcription of the target genes.
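To make the spacing rule concrete, the sketch below scans a DNA sequence for a TTGACA-like element followed, after a spacer, by a TATAAT-like Pribnow box. It is a toy illustration under stated assumptions: the allowed spacer of 15-19 bp around the ~17 bp optimum and the one-mismatch tolerance are illustrative choices, real promoter prediction relies on position weight matrices rather than consensus matching, and the function name is hypothetical.

```python
MINUS_35 = "TTGACA"  # upstream consensus element
MINUS_10 = "TATAAT"  # Pribnow box consensus


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_core_promoter_candidates(seq: str, max_mismatches: int = 1,
                                  spacers: range = range(15, 20)):
    """Return (start_35, start_10, spacer, total_mismatches) tuples, best first.

    start_35 and start_10 are 0-based indices of the two hexamers; the spacer
    is the number of bases between the end of the upstream element and the
    start of the Pribnow box.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(MINUS_35) + 1):
        mm35 = mismatches(seq[i:i + 6], MINUS_35)
        if mm35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer  # candidate start of the Pribnow box
            if j + 6 > len(seq):
                break  # spacers increase, so later ones also run off the end
            mm10 = mismatches(seq[j:j + 6], MINUS_10)
            if mm10 <= max_mismatches:
                hits.append((i, j, spacer, mm35 + mm10))
    return sorted(hits, key=lambda hit: hit[3])


if __name__ == "__main__":
    toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GG"  # hypothetical test sequence
    print(find_core_promoter_candidates(toy, max_mismatches=0))
```

Allowing one mismatch per element reflects the fact that natural promoters rarely match both consensus hexamers perfectly; raising the tolerance quickly produces many spurious hits.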

It is important to note that although this minimal core promoter is essential for transcription, it alone is often not sufficient to drive high-level protein expression. In eukaryotes, DNA elements known as enhancers are often used together with the core promoter to increase gene expression by recruiting additional transcription factors. Enhancers can be located upstream of the core promoter, within introns of the gene it drives, or downstream of the regulated gene (Levine & Tjian, 2003). Although the mechanism of action of most enhancers is still not well understood, some well-studied viral enhancer elements are routinely included in common expression vectors to increase the transcription efficiency of exogenous sequences. For example, the CMV IE enhancer has been shown to improve gene expression 8- to 67-fold in lung epithelial cells when combined with several weak promoters (Yew et al., 1997), and Li and colleagues have further demonstrated that adding an SV40 enhancer to the CMV IE enhancer/promoter, or 3' of the polyadenylation site, can increase exogenous gene expression in mouse muscle cells by up to 20-fold (Li et al., 2001).

#### **4.2 Regulatory elements involved in translation**

Just as a core promoter sequence is required for the initiation of transcription, certain conserved sequences in the 5' untranslated region of the mRNA are essential for the initiation of translation. In prokaryotic organisms, the Shine-Dalgarno sequence on the transcribed mRNA serves this function by acting as the ribosome binding site (RBS). This six-nucleotide consensus sequence, AGGAGG, is complementary to the anti-Shine-Dalgarno sequence located at the 3' end of the 16S rRNA in the ribosome. During the initiation of translation, the ribosome is recruited to the mRNA through complementary base pairing between the RBS and the 16S rRNA. For this reason, the classic RBS is included as a standard element in the Registry of Standard Biological Parts (http://partsregistry.org/). The registry also contains a collection of constitutive prokaryotic RBSs that include the Shine-Dalgarno sequence together with flanking sequences known to affect translation. These sequences are invaluable when designing promoter and gene sequences, as their incorporation is required for efficient expression of the synthetic construct.
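The complementarity described above, and the usual placement of the Shine-Dalgarno sequence a short distance upstream of the start codon, can be sketched in a few lines of Python. The 4-12 nt window between the end of the Shine-Dalgarno hexamer and the start codon, and the one-mismatch tolerance, are illustrative assumptions rather than values from the original text; the function names are hypothetical.

```python
SD_CONSENSUS = "AGGAGG"
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A-U and G-C pairing)."""
    return seq.upper().translate(_RNA_COMPLEMENT)[::-1]


# Sequence that base-pairs with the Shine-Dalgarno consensus, written 5'->3';
# it corresponds to the anti-Shine-Dalgarno core at the 3' end of the 16S rRNA.
ANTI_SD = reverse_complement_rna(SD_CONSENSUS)


def find_rbs(mrna: str, start: int, min_gap: int = 4, max_gap: int = 12,
             max_mismatches: int = 1):
    """Find the best SD-like hexamer upstream of the start codon at index `start`.

    Returns (sd_start, gap, mismatches) or None, where gap is the number of
    nucleotides between the end of the hexamer and the start codon.
    """
    mrna = mrna.upper()
    best = None
    for gap in range(min_gap, max_gap + 1):
        end = start - gap
        begin = end - len(SD_CONSENSUS)
        if begin < 0:
            break  # larger gaps would also start before the 5' end
        mm = sum(a != b for a, b in zip(mrna[begin:end], SD_CONSENSUS))
        if mm <= max_mismatches and (best is None or mm < best[2]):
            best = (begin, gap, mm)
    return best


if __name__ == "__main__":
    mrna = "CC" + "AGGAGG" + "UAAAUCA" + "AUGGCU"  # hypothetical leader + ORF start
    print(ANTI_SD, find_rbs(mrna, start=mrna.find("AUG")))
```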

In eukaryotes, this role is served by the 40S ribosomal subunit, which binds initiation factors that help it scan the mRNA, with the Kozak sequence acting as the main determinant of translation initiation (Kozak, 1986, 1987). Translation most commonly begins at the AUG codon closest to the 5' end of the mRNA; however, this is not always the case. Kozak has demonstrated that the distance from the 5' end, the sequence surrounding the first AUG codon, and its steric relationship with the 40S ribosomal subunit all contribute to determining the actual initiation site. Nevertheless, it has been routinely demonstrated that placing the promoter and a Kozak sequence upstream of the initiating codon increases expression of target gene sequences (Morita et al., 2000).
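Of the positions surrounding the AUG, a purine (A or G) at position -3 and a G at position +4 (numbering the A of AUG as +1) are generally regarded as the most influential parts of the Kozak consensus gccRccAUGG. The sketch below classifies a start-codon context on that basis; the strong/adequate/weak labels follow common usage, but the two-position rule is a simplification and the function name is hypothetical.

```python
def kozak_context(mrna: str, aug: int) -> str:
    """Classify the context of the AUG whose A is at 0-based index `aug`.

    Checks the two key positions of the Kozak consensus gccRccAUGG:
    a purine at -3 and a G at +4 (the A of AUG is +1).
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    if mrna[aug:aug + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    if aug < 3 or aug + 3 >= len(mrna):
        raise ValueError("need 3 nt upstream and 1 nt downstream of the AUG")
    purine_at_minus3 = mrna[aug - 3] in "AG"
    g_at_plus4 = mrna[aug + 3] == "G"
    # Two matching key positions -> strong, one -> adequate, none -> weak.
    return ("weak", "adequate", "strong")[purine_at_minus3 + g_at_plus4]


if __name__ == "__main__":
    for leader in ("GCCACCAUGG", "GCCGCCAUGC", "GCCUCCAUGA"):
        print(leader, kozak_context(leader, leader.find("AUG")))
```

A weak context at the first AUG is one reason scanning ribosomes may initiate further downstream, consistent with the observation above that the first AUG is not always used.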

Besides the optimization of the codon usage pattern in the coding region, additional considerations must be taken into account when expressing prokaryotic genes in eukaryotic hosts or vice versa. Genes cloned directly from the genomic library of a eukaryotic organism usually cannot be expressed successfully in a prokaryotic host due to the presence of intervening, non-coding regions within the sequence. Unlike eukaryotes, prokaryotes lack the RNA splicing mechanisms required to remove these intron sequences and produce a mature mRNA. Therefore, any introns present within the expression construct must be eliminated prior to introduction into the prokaryotic host.
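In practice, introns are usually avoided by cloning the cDNA or by synthesizing the coding sequence directly. When exon coordinates are already known, the in-silico equivalent is simply to join the exons. The sketch below does this and also flags introns that do not follow the canonical GT...AG boundary rule, which most (but not all) nuclear introns obey; the function name and the coordinates in the example are hypothetical.

```python
def splice(genomic: str, exons: list[tuple[int, int]]) -> tuple[str, list[int]]:
    """Join exons (0-based, end-exclusive, in transcript order) into one sequence.

    Returns the spliced sequence and the indices of introns whose ends do not
    match the canonical GT...AG dinucleotides.
    """
    genomic = genomic.upper()
    for s, e in exons:
        if not 0 <= s < e <= len(genomic):
            raise ValueError(f"invalid exon coordinates ({s}, {e})")
    for (_, e1), (s2, _) in zip(exons, exons[1:]):
        if s2 < e1:
            raise ValueError("exons must be sorted and non-overlapping")
    noncanonical = []
    for k, ((_, e1), (s2, _)) in enumerate(zip(exons, exons[1:])):
        intron = genomic[e1:s2]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            noncanonical.append(k)
    return "".join(genomic[s:e] for s, e in exons), noncanonical


if __name__ == "__main__":
    gene = "ATGAAA" + "GTAAGTTTTTCAG" + "CCCTAA"  # exon 1 + intron + exon 2 (hypothetical)
    cds, flagged = splice(gene, [(0, 6), (19, 25)])
    print(cds, flagged)
```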

#### **4.3 Elements for simultaneous expression of multiple genes in eukaryotes**

Conversely, a significant obstacle to the expression of genomically cloned bacterial genes in a eukaryotic host is the host's inability to synthesize several proteins polycistronically from a single mRNA. Unlike in prokaryotes, where translation of multiple adjacent genes transcribed from one promoter is common, translation in eukaryotic cells normally requires a 7-methylguanosine cap (m7G(5')pppN) at the 5' end of the mRNA before the mRNA is recognized by the translation initiation complex at the start of peptide synthesis (Pestova et al., 2001). There are strategies, however, that allow two or more genes to be co-expressed in eukaryotic cells. At the most basic level, each gene can be expressed independently from its own promoter, either by introducing multiple vectors or by introducing a single vector containing multiple promoters. An alternative approach is to express the genes from a polycistronic vector that takes advantage of either IRES (internal ribosome entry site) or 2A elements. Derived from viral sequences, the IRES element allows 5'-cap-independent ribosome binding and translation initiation directly at the start codon of the downstream gene, thus enabling translation of multiple ORFs from a single mRNA (Jackson, 1988; Jang et al., 1988). Although known IRES elements vary in length and sequence, certain secondary structures have been shown to be conserved and important for their function (Baird et al., 2006). The IRES most widely used for expression in mammalian cells is derived from encephalomyocarditis virus (EMCV) (de Felipe, 2002). Like IRES elements, 2A elements are viral sequences that can be used as short linkers to allow translation of two or more genes driven from a single promoter. As the 2A sequence is translated, the nascent peptide interacts with the exit tunnel of the ribosome, causing the ribosome to "skip" formation of the last peptide bond at the C terminus of the 2A sequence. Despite this missing bond, the ribosome continues translation, producing a second, independent protein. To ensure continuous translation, the stop codon of the ORF upstream of the 2A element must be removed or mutated to avoid premature termination.
By using a combination of various IRES and 2A elements, investigators have demonstrated polycistronic expression of five genes simultaneously from a single promoter in mammalian cells (Szymczak & Vignali, 2005), illustrating how they can be used to simulate the polycistronic expression of some bacterial genes.
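Two practical consequences of the 2A mechanism described above can be illustrated with a toy model: the stop codon of every upstream ORF must be removed, and after skipping, the upstream protein keeps the 2A residues while the downstream protein begins with proline. The sketch assumes the conserved C-terminal NPG|P junction of 2A peptides as the only skip signal and uses placeholder sequences; it does not model IRES elements, skipping efficiency or the full-length 2A peptide, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
SKIP_MOTIF = "NPGP"  # conserved 2A C-terminus; the G-P peptide bond is "skipped"


def assemble_2a_cassette(orfs: list[str], linker_2a: str) -> str:
    """Join ORFs into one reading frame, separated by a 2A-encoding linker.

    The stop codon of every ORF except the last is removed so the ribosome
    reads through into the next gene.
    """
    if len(linker_2a) % 3:
        raise ValueError("2A linker must preserve the reading frame")
    parts = []
    for k, orf in enumerate(orfs):
        orf = orf.upper()
        if len(orf) % 3:
            raise ValueError(f"ORF {k} length is not a multiple of 3")
        if k < len(orfs) - 1:
            if orf[-3:] in STOP_CODONS:
                orf = orf[:-3]
            parts.append(orf + linker_2a)
        else:
            parts.append(orf)
    return "".join(parts)


def ribosomal_skip_products(polyprotein: str) -> list[str]:
    """Split a polyprotein at each NPG|P junction, mimicking 2A skipping."""
    products, start = [], 0
    i = polyprotein.find(SKIP_MOTIF)
    while i != -1:
        cut = i + 3  # after the glycine, before the final proline
        products.append(polyprotein[start:cut])
        start = cut
        i = polyprotein.find(SKIP_MOTIF, cut)
    products.append(polyprotein[start:])
    return products


if __name__ == "__main__":
    # Placeholder DNA: "GGG" stands in for a real 2A-encoding sequence.
    print(assemble_2a_cassette(["ATGAAATAA", "ATGCCCTAG"], "GGG"))
    # Placeholder protein with one 2A-like junction.
    print(ribosomal_skip_products("MKVLNPGPMAR"))
```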

#### **4.4 Elements for sustained maintenance and expression**

Integration of exogenous DNA sequences into a host chromosome is usually required for sustained transgene expression in mammalian cells. Because the insertion event is largely random, the expression level of the integrated gene can be greatly affected by the surrounding sequences and chromatin structure. As a consequence, unstable expression and high variability between individual clones are the two major issues associated with transgene expression. In addition, if the exogenous genes are inserted within or close to a required host gene, the health or survival of the host can be compromised. Several DNA elements that prevent these position effects and stabilize transgene expression have been discovered (Table 5). These elements occur naturally in mammalian genomes, where they are crucial for the proper expression of endogenous genes. Locus control regions (LCRs) can enhance transcription of linked genes and also enable copy number-dependent gene expression (Li et al., 2002); however, their large size and tissue-specific nature limit their application across mammalian cell types (Kwaks & Otte, 2006). Insulators, also known as barriers or enhancer-blocking elements, are DNA sequences that can protect genes from transcriptionally inactive heterochromatin or from the action of enhancers and repressors (Recillas-Targa et al., 2004). For example, the best-characterized insulator, cHS4 (chicken β-globin hypersensitive site 4), has been shown to stabilize transgene expression over long periods of time (Pikaart et al., 1998) and to facilitate efficient integration of expressed sequences (Recillas-Targa et al., 2004). Like insulators, STARs (stabilizing and antirepressor elements) are used specifically to block repression.
Another type of DNA sequence, the ubiquitous chromatin opening element (UCOE), is derived from the promoters of ubiquitously expressed genes. These elements have been shown to improve and stabilize transgene expression in a tissue-nonspecific manner, most likely by maintaining an active chromatin structure (Williams et al., 2005). Matrix attachment regions (MARs) mediate the attachment of the chromosome to the nuclear matrix and are also widely incorporated into expression vectors to sustain transgene expression. These elements have also been shown to counteract position-dependent insertion effects and prevent transgene silencing in a variety of cell types and transgenic animals (reviewed in Harraghy et al., 2008).


Table 5. Many different elements can be used to enhance and stabilize transgene expression in mammalian cells. Modified from (Kwaks & Otte, 2006) and (Harraghy et al., 2008)

**Element** **Size** **Increased expression** **Position-independent** **Copy number-dependent** **Cell type-specific** **Stability of expression**

**LCR** 16 kb Unknown Yes Yes Yes

**Insulator** 1.2-2.4 kb Unknown Yes Unknown No Majority Yes

**MAR** ~3 kb Yes Yes No No Majority Yes

**UCOE** 2.5-8 kb Yes Yes No Unknown Yes

**STAR** 0.5-2 kb Yes Yes No Yes Yes

Yes, if powerful enough

20 Genetic Engineering – Basics, New Applications and Responsibilities

#### **5. Mammalian expression of the bacterial luciferase gene cassette: A case study in exogenous expression**

Over the years there have been myriad examples of exogenously expressed genes. A recent example that highlights many of the considerations discussed here is the adaptation of the bacterial luciferase gene cassette to function autonomously in a human cell line. The bacterial luciferase gene cassette, commonly referred to as the *lux* cassette, had been used in prokaryotic systems for almost 20 years before its first successful expression in a eukaryotic cell, and even then almost another decade passed before it was successfully expressed in a human cell line. By following the development of the *lux* system from a strictly bacterial genetic system into a eukaryotic reporter cassette, it is possible to review not only the genetic modifications that are required for exogenous gene expression, but also the thought process that leads researchers to implement these modifications.

#### **5.1 Bacterial luciferase background**

The bacterial luciferase (*lux*) gene cassette is a series of five genes whose protein products work together to produce a luminescent signal at 490 nm, in the blue region of the visible spectrum (Close et al., 2009). Two of the five genes (*luxA* and *luxB*) encode the subunits of the heterodimeric luciferase, while the remaining three genes (*luxC*, *luxD*, and *luxE*) are responsible for producing the long-chain aliphatic aldehyde co-substrate on which the luciferase acts (Meighen, 1991). The other co-substrates, FMNH₂ and O₂, are naturally present within the host and can be directly scavenged by the enzyme. Upon binding of the substrate complex to the luciferase dimer, the complex is oxidized and a photon is released at 490 nm (Figure 5). The turnover of this reaction is extremely slow, with a single cycle taking as long as 20 s at 20°C (Hastings & Nealson, 1977).
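The overall reaction described above is commonly summarized by the following equation, in which RCHO is the long-chain aliphatic aldehyde and RCOOH the corresponding fatty acid; in the complete system, the fatty acid is recycled back to aldehyde by the LuxC, LuxD and LuxE proteins.

$$
\mathrm{FMNH_2} + \mathrm{RCHO} + \mathrm{O_2} \xrightarrow{\text{LuxA/LuxB}} \mathrm{FMN} + \mathrm{RCOOH} + \mathrm{H_2O} + h\nu \quad (\lambda \approx 490\ \mathrm{nm})
$$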

Fig. 5. The bioluminescent reaction catalyzed by the bacterial luciferase gene cassette. Reproduced with permission from (Close et al., 2009)

While these genes are widely distributed among prokaryotic organisms, the bioluminescent system they encode is quite distinct from those commonly found in eukaryotes, such as the firefly or *Renilla* luciferase systems. Unlike these eukaryotic systems, the *lux* system is organized as a single operon, with all of the genes required for bioluminescence driven from a single promoter. In addition, its prokaryotic origin means that it is optimized to function in a cellular background that lacks extensive compartmentalization. It is therefore not surprising that extensive genetic modifications were required before it could be expressed successfully in the distantly related human cellular background. These modifications present an interesting case study of the considerations that must be made when expressing any gene in a non-native host organism.

#### **5.2 Initial attempts at exogenous expression**

The first attempts to express the *lux* system outside of bacteria began in the 1980s. Once the benefits of a fully autonomous, light-emitting bioluminescent reporter had been recognized in bacterial species, there was increasing interest in adapting the system to function in a wider variety of organisms so that its usefulness could be exploited across a broader range of circumstances. These initial attempts focused on expressing only the *luxA* and *luxB* genes rather than the full cassette, seeking first to determine how to make the luciferase function and then to apply the lessons learned to expression of the remaining *lux* genes.

Because eukaryotic organisms are not capable of polycistronic expression, the first modification made for the expression of the *luxA* and *luxB* genes was to place each under the control of its own promoter (Koncz et al., 1987). This strategy allowed each mRNA to be transcribed independently. However, because both genes were carried on the same plasmid, they would be expressed in close physical proximity within the host. This strategy circumvents the need for polycistronic expression while maximizing the chance that the LuxA and LuxB proteins will associate *in vivo* to form a functional heterodimer. When this system was expressed in plants, cell extracts produced light in response to treatment with an aldehyde substrate. While this demonstrated that at least a portion of the *lux* cassette could be expressed exogenously, it was still far from practical as an autonomous bioluminescent reporter.

Moving on from this dual-promoter system in plants, several groups began expressing the *luxA* and *luxB* genes as fusion products in yeast (Boylan et al., 1989; Kirchner et al., 1989), *Drosophila* (Almashanu et al., 1990), and even murine cell lines (Pazzagli et al., 1992). Regardless of the host, these experiments generally produced similar outcomes. In yeast cells, bioluminescence upon treatment with the aldehyde substrate was detectable above background, but weaker than bioluminescence from prokaryotic systems tested under the same conditions (Boylan et al., 1989). When this strategy was attempted in higher eukaryotic hosts such as *Drosophila* and murine cell lines, an interesting problem was encountered: bioluminescence was detectable but proved to be highly temperature sensitive.

Because the murine Ltk⁻ cell line requires a higher growth temperature, the *lux* luciferase proteins could not maintain high levels of stability following gene expression. As a result, Ltk⁻ cells transfected with the *luxA* and *luxB* genes produced extremely low levels of bioluminescence when grown at their optimal temperature of 37°C. When the growth temperature was decreased to a tolerable, but not ideal, 30°C, detected bioluminescence increased 10-fold (Pazzagli et al., 1992). The temperature dependence of this decrease was further confirmed in *E. coli*, where hosts expressing LuxA-LuxB fusion proteins produced more than 50,000-fold more bioluminescence when grown at 23°C than when grown at 37°C (Escher et al., 1989). This highlights the need not only to evaluate the potential genetic hurdles to exogenous expression of a target gene, but also to consider the physiological limitations of the protein it encodes. This constraint proved to be a significant challenge in the development of routine eukaryotic expression of these genes, and it would be another decade before it was overcome, finally leading to expression of the full *lux* cassette in a yeast cell model.

#### **5.3 Autonomous bioluminescent expression from the** *lux* **cassette in yeast**

Using the lessons learned from both dual-promoter and fusion-based expression of the *luxA* and *luxB* genes detailed above, work continued toward expression of the full *lux* cassette in a eukaryotic host. The first major breakthrough came from the decision to express *lux* genes from the bacterium *Photorhabdus luminescens* rather than from the classical *lux* model organism, *Vibrio harveyi* (Gupta et al., 2003). Unlike *V. harveyi*, the template organism used in the previous attempts, *P. luminescens* is a terrestrial rather than a marine bacterium. It therefore has a higher native growth temperature, and its protein products remain stable at higher temperatures than those encoded by *V. harveyi*, despite performing the same function *in vivo*. This simple change in the source of the exogenous genes demonstrates how important source selection can be when expressing genes in a foreign host. Without the innate structural stability offered by the *P. luminescens* proteins, no combination of genetic modifications would have been capable of inducing high-level expression in a eukaryotic host at its preferred growth temperature.

Having overcome the intrinsic problems with gene expression at the natural yeast growth temperature, the researchers still had to consider additional genetic modifications before the full *lux* cassette could be autonomously expressed. The first was how to promote constitutive, high-level expression of the genes themselves. This was accomplished by incorporating yeast-specific promoter sequences that had previously been shown to drive high-level expression under most growth conditions. These promoters, the glyceraldehyde-3-phosphate dehydrogenase (GPD) and alcohol dehydrogenase 1 (ADH1) promoters, were used in place of the native upstream regions of the wild-type bacterial genes, which contain either an inducer binding site or an AT-rich region (Meighen, 1991). Replacing these native regions with known, host-expressible promoters ensured high levels of transcription when the genes were expressed in the yeast surrogate.

Next, it was necessary to develop a method for expressing the five *lux* cassette genes simultaneously within the adopted host. Because *S. cerevisiae* is a eukaryote, it cannot carry out the natural polycistronic expression of the cassette that occurs in a wild-type prokaryotic host. To overcome this hurdle, polycistronic expression was mimicked through the incorporation of IRES elements (Gupta et al., 2003). These IRES elements function as linker regions between the individual *lux* genes, allowing multiple ORFs to be transcribed into a single mRNA and then translated individually through cap-independent ribosome recruitment (López-Lastra et al., 2005). While multiple organisms are known to harbor IRES elements, the researchers used an IRES sequence native to *S. cerevisiae* to help ensure that it would function efficiently in this system (Gupta et al., 2003).
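The logic of the IRES workaround can be illustrated with a deliberately simplified model. The Python sketch below is a conceptual toy, not a model of real translation: it assumes that cap-dependent initiation in a eukaryotic host translates only the first ORF of an mRNA, and that any later ORF is translated only when an IRES element directly precedes it. The `Element` class, the `translated_orfs` function, and both construct layouts are hypothetical and do not reproduce the actual vectors of Gupta et al. (2003).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """One hypothetical construct element: 'promoter', 'orf', 'ires' or 'terminator'."""
    kind: str
    name: str


def translated_orfs(construct: List[Element]) -> List[str]:
    """Return the ORF names a simplified eukaryotic host would translate.

    Assumed rules (a toy model, not real translation biology):
    - transcription runs from the first promoter to the next terminator,
      producing one polycistronic mRNA;
    - cap-dependent initiation translates only the first ORF on that mRNA;
    - any later ORF is translated only if an IRES element directly precedes it.
    """
    promoter_idx = next((i for i, e in enumerate(construct) if e.kind == "promoter"), None)
    if promoter_idx is None:
        return []  # no promoter, so no transcript

    mrna: List[Element] = []
    for element in construct[promoter_idx + 1:]:
        if element.kind == "terminator":
            break
        mrna.append(element)

    translated: List[str] = []
    for i, element in enumerate(mrna):
        if element.kind != "orf":
            continue
        if not translated:
            translated.append(element.name)  # first ORF: cap-dependent initiation
        elif mrna[i - 1].kind == "ires":
            translated.append(element.name)  # later ORF: IRES-driven initiation
    return translated


with_ires = [
    Element("promoter", "P"), Element("orf", "luxC"), Element("ires", "IRES"),
    Element("orf", "luxD"), Element("ires", "IRES"), Element("orf", "luxE"),
    Element("terminator", "T"),
]
without_ires = [
    Element("promoter", "P"), Element("orf", "luxC"), Element("orf", "luxD"),
    Element("orf", "luxE"), Element("terminator", "T"),
]
print(translated_orfs(with_ires))     # ['luxC', 'luxD', 'luxE']
print(translated_orfs(without_ires))  # ['luxC']
```

Under these simplified rules, a bacterial-style polycistronic arrangement yields only the first gene product, while inserting IRES linkers lets every ORF on the same transcript be translated.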

Even with the addition of these IRES linker regions and multiple promoters, the sheer number of genes that must be expressed for autonomous light production from the *lux* cassette still presented a significant obstacle. To overcome this problem, the most efficient strategy was determined to be dividing expression of the *lux* cassette between two independent expression vectors (Gupta et al., 2003). In this system, the *luxA* and *luxB* genes were expressed independently from two promoters on a single vector, while the *luxC*, *luxD*, and *luxE* genes were expressed from a second vector and linked using IRES sequences (Figure 6). While the vectors used in this example are capable of episomal expression in yeast, it is important to note that stable eukaryotic expression normally occurs after chromosomal integration of the transfected gene sequences. Because the integration location cannot be controlled, a dual-vector strategy could lead to distal integration of the two gene groups, increasing the probability that they would be expressed with different efficiencies despite their use of identical promoter sequences.

Fig. 6. Expression of the *lux* gene cassette in *S. cerevisiae* was made possible through A) independent expression of the *luxA* and *luxB* genes on one plasmid and B) expression of the remaining *lux* genes using a combination of multiple promoters and IRES linker regions from a second plasmid. Adapted from (Gupta et al., 2003)

Because of these extensive modifications, the *lux* cassette genes produced a well-defined bioluminescent signal when expressed in *S. cerevisiae* (Gupta et al., 2003). This marked the first successful demonstration of *lux*-based autonomous bioluminescence from a eukaryotic host organism. Despite this success, it was determined that the compartmentalization intrinsic to the eukaryotic yeast host was limiting the luciferase's access to its FMNH2 co-substrate. Unlike prokaryotes, eukaryotes do not have large quantities of cytosolically available FMNH2. This required an additional change to the *lux* expression strategy, whereby a flavin reductase gene (*frp*) was added to the *lux* cassette downstream of the *luxE* gene using the previously described IRES linker region and under the control of the ADH1 promoter. This served to increase the amount of FMNH2 available locally to the luciferase enzyme. This final modification both stabilized bioluminescent production and increased light output more than 5-fold (Gupta et al., 2003). Although not often considered during exogenous expression, this addition is an excellent example of how the expression environment must be considered in addition to general genetic modifications. In the case of *lux* expression, adding the *frp* gene was sufficient to shift the environment to a more favorable condition; however, this may not always be the case and should be approached on a case-by-case basis.

#### **5.4 Modification of the** *lux* **cassette for expression in mammalian cells**

Following the successful demonstration of autonomous bioluminescence from the *lux* genes in *S. cerevisiae*, research began on expressing the cassette in human cell lines. It was initially believed that the modifications established for yeast expression would be sufficient for expression in the human cellular background. If this had been the case, it would have been possible simply to transfect human cells with the previously developed vectors and monitor bioluminescent output. Unfortunately, this proved not to be true, and expression of the genes, even with the addition of strong human-specific promoters, could not be detected at levels significantly above background (Close et al., 2010; Patterson et al., 2005). It was therefore necessary to again modify the *lux* expression system to promote expression in a human host cell line.

Just as with previous modification approaches, this work began by focusing on the expression of only a subset of the *lux* cassette genes, *luxA* and *luxB*. Using the lessons learned from *S. cerevisiae* expression, the *luxA* and *luxB* genes were placed under the control of a strong, constitutive human promoter and linked using a human-specific IRES linker region. While this did make it possible to detect bioluminescence from cell extracts supplemented with substrates, it was not a significant improvement over expression in a yeast host. With little more that could be done to improve expression through genetic organization and enhanced promoter sequences, the researchers turned to codon optimization in the hope of increasing transcriptional and translational efficiency and therefore light output. The codon usage patterns of the *P*. *luminescens lux* genes were compared with the codon usage of each amino acid across all known expressed human genes and then altered to more closely match human codon preference. At the same time, the gene sequences were scanned for restriction sites and other regulatory sequences, such as potential hairpins or terminator sequences. These sequences were then eliminated by replacing the DNA sequence with one that encoded the original amino acid sequence with 100% identity but was computationally favored because of its closer match to human codon preferences and its absence of regulatory sequences (Table 6) (Patterson et al., 2005). This codon-optimization process, along with the previously described modifications, boosted bioluminescent output 54-fold over expression of the non-codon-optimized gene sequences. This substantial change highlights how important codon optimization can be when exogenously expressing genes in a distantly related organism.
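A minimal sketch of the core idea, synonymous recoding toward a host's preferred codons while avoiding unwanted sequence motifs, is shown below in Python. It is a swapped-in greedy heuristic, not the computational procedure used by Patterson et al. (2005). It assumes a user-supplied codon usage table (the `HUMAN_LIKE_USAGE` numbers are illustrative placeholders, not an authoritative human table), checks motifs on the forward strand only, and does not model hairpins or terminators. The function names `translate` and `codon_optimize` are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, List

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases ordered T, C, A, G and the first position varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Amino acid -> list of synonymous codons.
SYNONYMS: Dict[str, List[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks stop codons."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def codon_optimize(dna: str, usage: Dict[str, float], avoid: Iterable[str] = ()) -> str:
    """Greedy synonymous recoding toward the codons favoured in `usage`.

    For each amino acid, synonymous codons are tried from most to least frequent
    (ties broken alphabetically). A candidate is rejected if it would create one
    of the `avoid` motifs (forward strand only). If every synonym creates a motif,
    the most frequent codon is kept anyway.
    """
    sites = [s.upper() for s in avoid]
    # Any motif created by a new codon must end inside it, so checking the last
    # len(motif) + 2 bases is enough.
    window = max((len(s) for s in sites), default=0) + 2
    recoded = ""
    for aa in translate(dna):
        ranked = sorted(SYNONYMS[aa], key=lambda c: (-usage.get(c, 0.0), c))
        choice = ranked[0]
        for candidate in ranked:
            tail = (recoded + candidate)[-window:]
            if not any(site in tail for site in sites):
                choice = candidate
                break
        recoded += choice
    return recoded


# Illustrative codon frequencies (per thousand codons); placeholders only.
HUMAN_LIKE_USAGE = {"GCC": 27.7, "GCT": 18.4, "AAG": 31.9, "AAA": 24.4,
                    "CTG": 39.6, "CTC": 19.6, "GGC": 22.2, "GGA": 16.5}

wt = "ATGGCTAAACTTGGA"  # toy sequence encoding M-A-K-L-G
co = codon_optimize(wt, HUMAN_LIKE_USAGE)
print(co)                                                    # ATGGCCAAGCTGGGC
print(translate(co) == translate(wt))                        # True
print(codon_optimize(wt, HUMAN_LIKE_USAGE, avoid=["GGCC"]))  # ATGGCTAAGCTGGGC
```

Because every substitution is synonymous, the recoded sequence encodes the original protein with 100% identity by construction. Passing a motif such as GGCC (the HaeIII recognition site) in `avoid` makes the greedy choice fall back to the next most frequent codon wherever the preferred one would create that site.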


Table 6. Comparison of the *luxA* and *luxB* genes in their wild-type (wt) and codon-optimized (co) forms. The probability of recognition as an exon was determined *in silico* using the genescan algorithm (http://genes.mit.edu). Adapted from (Patterson et al., 2005)

Based on the success of the codon-optimization process for expression of the *luxA* and *luxB* genes in a human host cell, work then immediately began on expressing the full *lux* cassette for autonomous bioluminescent production from a human host. For this process, the vector developed for expression of *luxA* and *luxB* was retained, and the additional *lux* genes were placed into a second vector, mimicking the strategy employed for full *lux* cassette expression in *S. cerevisiae*. One important change, however, was the replacement of the yeast-specific glyceraldehyde-3-phosphate dehydrogenase and alcohol dehydrogenase 1 promoters with the CMV and EF1-α promoters (Close et al., 2010). These promoters allowed strong constitutive expression of the remaining *lux* genes in a way that would not have been possible with the original bacterial AT-rich regions or the yeast promoters. The benefits of codon optimization were again highlighted during optimization of the remaining *lux* genes. The removal of regulatory sequences had a dramatic effect on the expression of the *luxE* gene, where their presence would have moved the predicted translational start point back to the 102nd nucleotide of the DNA sequence. In addition, the GC content of each gene was significantly altered to more closely match that of human coding regions, aiding the recognition, expression, and stability of each gene sequence following transfection into the human cellular genome (Table 7). As before, the *frp* flavin reductase gene was included in these constructs to compensate for the diminished cytosolic availability of FMNH2 in the highly compartmentalized eukaryotic host.
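Two of the quantities reported in Table 7, percent GC content and the number of nucleotide substitutions, are simple sequence statistics. The Python sketch below shows one way to compute them; the function names and the toy sequences are hypothetical. The substitution count is a position-by-position comparison, which only makes sense for equal-length pairs such as *luxC* and *luxD*; the *luxE* and *frp* pairs differ in length in Table 7, so counting their substitutions would require a sequence alignment that this sketch does not attempt.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def count_substitutions(wt: str, co: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(wt) != len(co):
        raise ValueError("position-wise comparison needs equal-length sequences")
    return sum(a != b for a, b in zip(wt.upper(), co.upper()))


# Toy wild-type and codon-optimized sequences (both encode M-A-K-L-G).
wt_seq = "ATGGCTAAACTTGGA"
co_seq = "ATGGCCAAGCTGGGC"
print(f"{gc_content(wt_seq):.1f}% -> {gc_content(co_seq):.1f}%")  # 40.0% -> 66.7%
print(count_substitutions(wt_seq, co_seq))                         # 4
```

In this toy pair, four synonymous third-position changes raise the GC content from 40% to about 67%, the same direction of shift seen for the real genes in Table 7.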



| Gene | Predicted Start Position | Length (bp) | % GC | Number of Nucleotide Substitutions | Probability of Recognition as an Exon |
|---|---|---|---|---|---|
| wt*luxC* | 1 | 1443 | 37% | N/A | 0.921 |
| co*luxC* | 1 | 1443 | 60% | 449 | 0.999 |
| wt*luxD* | 1 | 924 | 38% | N/A | 0.875 |
| co*luxD* | 1 | 924 | 59% | 294 | 0.999 |
| wt*luxE* | 102 | 1087 | 38% | N/A | 0.443 |
| co*luxE* | 1 | 1113 | 60% | 331 | 0.999 |
| wt*frp* | 1 | 613 | 47% | N/A | 0.715 |
| co*frp* | 1 | 723 | 64% | 249 | 0.999 |

Table 7. Codon-optimization of the remainder of the *lux* genes was responsible for significant changes in both transcriptional start sites and the overall GC content. Each of these changes contributed significantly to the probability of the sequence being recognized as a coding sequence in the human host as determined *in silico* using the genescan algorithm (http://genes.mit.edu). Reproduced with permission from (Close et al., 2010)

While the changes required to induce bioluminescent production from the *lux* cassette genes in the human cellular background were extensive, they were all necessary for proper function. The failure of even a single modification would lead to cells that may be capable of expressing the genes but not maintaining expression at a high enough level to be useful as a reporter system (Figure 7).

Fig. 7. Comparison of the bioluminescent expression from the *lux* genes expressed in a human host cell either following the modifications described above (modified *lux*) or without the aforementioned modifications (wild-type *lux*), and background light detection from host cells without *lux* genes (background). Adapted from (Close et al., 2010)

However, through the application of the techniques and considerations described in this chapter, it was possible to turn not just one gene, but an entire cassette of six genes from a reporter system once believed to function only in prokaryotic organisms, into a novel bioluminescent reporter system that can be expressed in a human cell line with a signal bright enough to be detected through tissue, similar to native eukaryotic reporter genes such as firefly luciferase (Figure 8).

Fig. 8. Following modification of the full *lux* cassette, it was capable of being expressed in a human cell line host and producing bioluminescence at levels comparable to detection patterns of the native eukaryotic bioluminescent firefly luciferase (*luc*) gene. Adapted from (Close et al., 2011)

#### **6. Conclusions**

This chapter has detailed many of the concerns that must be considered when attempting to exogenously express a gene of interest in a foreign host. While a strong understanding of the transcriptional, translational, and regulatory processes that dictate the maintenance and expression of all genes is a prerequisite for understanding why certain modifications must be performed to elicit high levels of exogenous expression, the examples provided here should be enough for the average researcher to begin developing an acceptable expression protocol. Not every modification discussed in this chapter needs to be applied to every gene, but a broad understanding of the possible changes provides a wide variety of tools for expressing recalcitrant gene sequences. Just as with the *lux* cassette system, it is often necessary to perform more than one modification to induce acceptable levels of expression from foreign genes in a distantly related host organism. Proceeding in a step-wise fashion will often yield clues as to which modifications need to be performed, and which steps can be avoided, saving time and money when developing a new expression platform for a previously unexpressed gene sequence. It should also be noted that the methods detailed in this chapter are not all-encompassing. In some cases, the host environment may simply be unsuitable for expression of the target gene sequence, and it may not be possible to alter that environment through the expression or deletion of additional genes. However, as the suite of exogenous expression techniques continues to grow through the discovery of new methods, and as our understanding of the cellular processes responsible for the maintenance and expression of genes deepens, the number of inexpressible genes will continue to fall.

#### **7. Acknowledgments**

Portions of this review reflecting work by the authors were supported by the National Science Foundation Division of Chemical, Bioengineering, Environmental, and Transport Systems (CBET) under award number CBET-0853780, the National Institutes of Health, National Cancer Institute, Cancer Imaging Program, award number CA127745-01, the University of Tennessee Research Foundation Technology Maturation Funding program, and the Army Defense University Research Instrumentation Program.

#### **8. References**



Abe, H., & Aiba, H. (1996). Differential contributions of two elements of rho-independent

Acker, M. G., Shin, B.-S., Nanda, J. S., Saini, A. K., Dever, T. E., & Lorsch, J. R. (2009). Kinetic

Allert, M., Cox, J. C., & Hellinga, H. W. (2010). Multifactorial determinants of protein

Almashanu, S., Musafia, B., Hadar, R., Suissa, M., & Kuhn, J. (1990). Fusion of *luxA* and *luxB*

*melanogaster*. *Journal of Bioluminescence and Chemiluminescence,* 5, 1, pp. 89-97.

terminator to transcription termination and mRNA stabilization. *Biochimie,* 78, 11-

analysis of late steps of eukaryotic translation initiation. *Journal of Molecular Biology,* 

expression in prokaryotic open reading frames. *Journal of Molecular Biology,* 402, 5,

and its expression in *Escherichia coli*, *Saccharomyces cerevisiae* and *Drosophila* 

**6. Conclusions** 

**7. Acknowledgments** 

**8. References** 

12, pp. 1035-1042.

385, 2, pp. 491-506.

pp. 905-918.


Ebright, R. H. (2000). RNA polymerase: Structural similarities between bacterial RNA

Escher, A., Okane, D. J., Lee, J., & Szalay, A. A. (1989). Bacterial luciferase alpha-beta fusion

Eyre-Walker, A., & Hurst, L. D. (2001). The evolution of isochores. *Nature Reviews Genetics,* 2,

Falaschi, A. (2000). Eukaryotic DNA replication: a model for a fixed double replisome.

Graf, M., Bojak, A., Deml, L., Bieler, K., Wolf, H., & Wagner, R. (2000). Concerted action of

Grantham, R., Gautier, C., Gouy, M., Jacobzone, M., & Mercier, R. (1981). Codon catalog

Gu, W. J., Zhou, T., & Wilke, C. O. (2010). A universal trend of reduced mRNA stability near

Gupta, R. K., Patterson, S. S., Ripp, S., & Sayler, G. S. (2003). Expression of the *Photorhabdus* 

Gustafsson, C., Govindarajan, S., & Minshull, J. (2004). Codon bias and heterologous protein

Harraghy, N., Gaussin, A., & Mermod, N. (2008). Sustained transgene expression using

Hastings, J., & Nealson, K. (1977). Bacterial bioluminescence. *Annual Reviews in Microbiology,* 

Hershberg, R., & Petrov, D. A. (2008). Selection on codon bias. *Annual Review of Genetics,* 42,

Jackson, R. J. (1988). RNA translation - Picornaviruses break the rules. *Nature,* 334, 6180, pp.

Jang, S. K., Krausslich, H. G., Nicklin, M. J. H., Duke, G. M., Palmenberg, A. C., & Wimmer,

Kane, J. F. (1995). Effects of rare codon clusters on high-level expression of heterologous proteins in *Escherichia coli*. *Current Opinion in Biotechnology,* 6, 5, pp. 494-500. Keck, J. L., & Berger, J. M. (2000). DNA replication at high resolution. *Chemistry & Biology,* 7,

Kim, S., & Lee, S. B. (2006). Rare codon clusters at 5'-end influence heterologous expression

of archaeal gene in *Escherichia coli*. *Protein Expression and Purification,* 50, 1, pp. 49-

E. (1988). A segment of the 5' nontranslated region of encephalomyocarditis virus RNA directs internal entry of ribosomes during in vitro translation. *Journal of* 

expression. *Trends in Biotechnology,* 22, 7, pp. 346-353.

MAR elements. *Current Gene Therapy,* 8, 5, pp. 353-366.

pp. 687-698.

7, pp. 549-555.

10822-10826.

9, 1, pp. R43-R74.

31, 1, pp. 549-595.

1, pp. 287-299.

3, pp. R63-71.

57.

292-293.

*Biology,* 6, 2, pp. e1000664.

*Research,* 4, 3, pp. 305-313.

*Virology,* 62, 8, pp. 2636-2643.

*America,* 86, 17, pp. 6528-6532.

*Trends in Genetics,* 16, 2, pp. 88-92.

polymerase and eukaryotic RNA polymerase II. *Journal of Molecular Biology,* 304, 5,

protein is fully active as a monomer and highly sensitive *in vivo* to elevated temperature. *Proceedings of the National Academy of Sciences of the United States of* 

multiple *cis*-acting sequences is required for Rev dependence of late human immunodeficiency virus type 1 gene expression. *Journal of Virology,* 74, 22, pp.

usage is a genome strategy modulated for gene expressivity. *Nucleic Acids Research,* 

the translation-initiation site in prokaryotes and eukaryotes. *PLoS Computational* 

*luminescens lux* genes (*luxA*, *B*, *C*, *D*, and *E*) in *Saccharomyces cerevisiae*. *FEMS Yeast* 


Meighen, E. A. (1991). Molecular biology of bacterial bioluminescence. *Microbiological* 

Moreira, D., Kervestin, S., Jean-Jean, O., & Philippe, H. (2002). Evolution of eukaryotic

genetic code deviations. *Molecular Biology and Evolution,* 19, 2, pp. 189-200. Morita, S., Kojima, T., & Kitamura, T. (2000). Plat-E: An efficient and stable system for transient packaging of retroviruses. *Gene Therapy,* 7, 12, pp. 1063-1066. Murakami, K. S., & Darst, S. A. (2003). Bacterial RNA polymerases: The whole story. *Current* 

Navarre, W. W., Porwollik, S., Wang, Y. P., McClelland, M., Rosen, H., Libby, S. J., et al.

Nilsson, J., & Nissen, P. (2005). Elongation factors on the ribosome. *Current Opinion in* 

Norrman, K., Fischer, Y., Bonnamy, B., Sand, F. W., Ravassard, P., & Semb, H. (2010).



### **Gateway Vectors for Plant Genetic Engineering: Overview of Plant Vectors, Application for Bimolecular Fluorescence Complementation (BiFC) and Multigene Construction**

Yuji Tanaka¹, Tetsuya Kimura², Kazumi Hikino³, Shino Goto³,⁴, Mikio Nishimura³,⁴, Shoji Mano³,⁴ and Tsuyoshi Nakagawa¹

*¹Department of Molecular and Functional Genomics, Center for Integrated Research in Science, Shimane University, ²Department of Sustainable Resource Science, Graduate School of Bioresources, Mie University, ³Department of Cell Biology, National Institute for Basic Biology, ⁴Department of Basic Biology, School of Life Science, The Graduate University for Advanced Studies, Japan*

#### **1. Introduction**

34 Genetic Engineering – Basics, New Applications and Responsibilities


Transgenic technologies for the genetic engineering of plants are very important for both basic plant research and biotechnology. In basic research, for example, promoter analysis with a reporter such as green fluorescent protein (GFP) is routinely used to determine the expression pattern of genes of interest, and downregulation or controlled expression of target genes is used to determine their function. In plant biotechnology, transgenic overexpression of heterologous genes is widely used to improve industrially important crop plants. Recently, genome projects on various higher plants have provided abundant sequence information, and genome-wide studies of gene function and gene regulation are under way. In these areas, transgenic analyses using genetically modified plants will become even more essential: high-throughput promoter analysis of the temporal and spatial regulation of gene expression, reporter-based determination of the subcellular localization of gene products, and ectopic expression of cDNA clones or RNAi will reveal the functions of a wide variety of genes. For gene manipulation in plants, the binary system of *Agrobacterium*-mediated transformation is the most widely used. This system consists of two plasmids derived from Ti plasmids: a disarmed Ti plasmid and a binary vector (Bevan, 1984). The former carries most of the genes required for T-DNA transfer from *Agrobacterium tumefaciens* to plants, whereas the latter consists of a functional T-DNA and the minimal elements needed for replication in both *Escherichia coli* and *A. tumefaciens*. Most of the widely used binary vectors established in the 1990s were built by traditional methods based on restriction endonucleases. Because these vectors are large and contain many restriction sites outside their cloning sites, constructing modified genes on them with the few available restriction sites was time consuming and laborious. To overcome this limitation and enable high-throughput analysis of plant genes, a cloning system allowing rapid and efficient construction of modified genes on binary vectors was needed. The Gateway cloning system provided by Invitrogen (Carlsbad, CA, USA) is one such solution, and we have constructed a variety of Gateway-compatible Ti-binary vectors for plant transgenic research.

#### **2. Basic Ti-binary vector for** *Agrobacterium***-mediated transformation and Gateway cloning**

Transformation mediated by the soil bacterium *A. tumefaciens* is widely used for gene manipulation in plants. This bacterium carries huge Ti-plasmids (larger than 200 kb) and can transfer the T-DNA region of the Ti-plasmid into plant chromosomes. This natural transformation system can be harnessed to introduce novel genes into a plant genome, and binary vectors carrying a T-DNA region were developed for this purpose. Such a vector must contain a plant selection marker gene, a bacterial antibiotic-resistance gene, a site for cloning foreign genes, T-DNA border sequences for transfer to the plant genome, a broad-host-range origin of replication (*ori*), and an *ori* for *E. coli*. Although binary vectors are much smaller than native Ti-plasmids, they are still large enough to make gene cloning by traditional methods difficult. Gateway Technology (available from Invitrogen) is based on the site-specific recombination system of phage lambda and *E. coli*, modified to improve its specificity and efficiency so that it can serve as a universal cloning system. Gateway cloning requires no restriction endonucleases or DNA ligase, follows a simple and uniform protocol, is highly efficient and reliable, and makes fusion constructs easy to manipulate. A variety of Gateway-compatible vectors designed for different purposes will therefore expand the usefulness of this system in plant research.
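As a rough illustration of the requirements just listed, a vector map can be reduced to a set of named features and checked for completeness. This is a minimal sketch: the feature labels are our own shorthand, not annotations from any real GenBank record, and a practical check would inspect sequence features rather than names.

```python
# Elements a Ti-binary vector must carry, per the list in the text.
# The string labels are illustrative shorthand, not standard annotations.
REQUIRED_ELEMENTS = frozenset({
    "plant selection marker",
    "bacterial antibiotic resistance",
    "cloning site",
    "RB",                    # T-DNA right border
    "LB",                    # T-DNA left border
    "broad-host-range ori",  # replication in A. tumefaciens
    "E. coli ori",
})


def missing_elements(features):
    """Return the required elements absent from a vector's feature list, sorted."""
    return sorted(REQUIRED_ELEMENTS - set(features))


# A hypothetical draft vector map that lacks the left border.
draft_vector = [
    "E. coli ori", "broad-host-range ori", "bacterial antibiotic resistance",
    "RB", "cloning site", "plant selection marker",
]
print(missing_elements(draft_vector))  # ['LB']
```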

#### **2.1 Ti-binary vector for** *Agrobacterium***-mediated plant transformation**

*A. tumefaciens* harboring a Ti-plasmid can transfer a specific segment of the plasmid, the T-DNA region bounded by a right border (RB) and a left border (LB) sequence, to the genome of an infected plant (Figure 1). Expression of the T-DNA genes causes overproduction of phytohormones in the infected cells, resulting in crown gall tumors. Although the T-DNA genes are required for crown gall tumor formation, the *vir* genes located outside the T-DNA region are essential for transferring T-DNA into the host plant genome, and these *vir* genes function even when they reside on a separate plasmid in *A. tumefaciens*. Based on these findings, the Ti-binary vector system was developed to overcome the difficulty of manipulating the huge original Ti-plasmids *in vitro* by recombinant DNA methods (Bevan, 1984). A wide range of shuttle vectors for *E. coli* and *A. tumefaciens* was constructed in which T-DNA border sequences flank multiple restriction sites for cloning foreign DNA and marker genes for selection in plant cells. With this system, DNA manipulation and vector construction are done in *E. coli*; the vector is then transferred into *A. tumefaciens* harboring an artificial Ti-plasmid from which the T-DNA has been deleted. The vector is stably maintained in *A. tumefaciens*, and the cloned foreign DNA and marker gene between RB and LB are transferred to the host plant genome by the machinery encoded by the *vir* genes on the T-DNA-deleted Ti-plasmid. In early studies, *Agrobacterium*-mediated transformation was applied to several dicot plants; now, various dicot and monocot plants can be transformed by co-cultivating leaf slices or cultured calli with *A. tumefaciens* in the presence of chemicals that induce *vir* gene expression. Transformed cells are selected by a marker-gene phenotype such as antibiotic resistance and regenerated into transgenic plants. The model plant *Arabidopsis thaliana* can be easily transformed by *A. tumefaciens* using a floral dip procedure.
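The division of labor in the binary system, in which the binary vector supplies the T-DNA and the disarmed Ti-plasmid supplies the *vir* genes in trans, can be sketched as follows. This is a simplified model under stated assumptions: plasmids are ordered lists of hypothetical feature labels written from RB toward LB, and transfer is treated as copying the features between the borders whenever *vir* genes are present in the same cell.

```python
def transferred_segment(binary_vector, helper_plasmid):
    """Features that would be moved into the plant genome (simplified model).

    binary_vector: ordered feature labels, written from RB toward LB.
    helper_plasmid: feature labels on the disarmed Ti-plasmid (no T-DNA).
    """
    # The vir genes act in trans: they may sit on either plasmid in the cell.
    if "vir" not in helper_plasmid and "vir" not in binary_vector:
        return []
    rb = binary_vector.index("RB")
    lb = binary_vector.index("LB")
    if rb > lb:
        raise ValueError("expected the map to be written from RB to LB")
    # Only the DNA bounded by the two borders is transferred.
    return binary_vector[rb + 1:lb]


# Hypothetical maps; labels are illustrative only.
binary = ["E. coli ori", "broad-host-range ori", "bacterial marker",
          "RB", "P35S", "target gene", "Tnos", "plant marker", "LB"]
disarmed_ti = ["vir", "Ti ori"]
print(transferred_segment(binary, disarmed_ti))
# ['P35S', 'target gene', 'Tnos', 'plant marker']
```

Note that the bacterial replication origins and marker stay behind in *A. tumefaciens*; only the border-flanked cassette reaches the plant.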

Fig. 1. Ti-binary vector system for *Agrobacterium*-mediated plant transformation. A binary vector, in which a target gene and a plant selection marker gene are cloned between the two border sequences (RB and LB), is transformed into *A. tumefaciens* harboring a disarmed Ti-plasmid without the T-DNA region. Plant cells are infected by the transformed *A. tumefaciens*, and the target gene and marker gene are then transferred into a plant chromosome by the *vir* genes on the disarmed Ti-plasmid

#### **2.2 Outline of Gateway cloning**


Gateway cloning technology is based on the lambda phage infection system, in which site-specific, reversible recombination reactions occur during phage integration into and excision from the *E. coli* genome (Figure 2). During integration, the *att*P site (242 bp) of lambda phage recombines with the *att*B site (25 bp) of *E. coli* (the BP reaction), and the lambda phage genome is integrated into the *E. coli* genome. After this recombination, the lambda phage genome is flanked by the *att*L (100 bp) and *att*R (168 bp) sites. In the reverse reaction, the phage DNA is excised from the *E. coli* genome by recombination between the *att*L and *att*R sites (the LR reaction). The BP reaction requires two proteins, the phage integrase (Int) and the *E. coli* integration host factor (IHF); in the Gateway system this mixture is called BP clonase. The LR reaction requires Int, IHF and one more phage protein, excisionase (Xis); this mixture is called LR clonase. The Gateway cloning method uses these *att* sites and clonases to construct recombinant DNA *in vitro*.
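The BP and LR reactions can be pictured as exchanges of arms around a shared core. This is a minimal sketch assuming the textbook description of lambda *att* sites as three parts (attP = POP', attB = BOB', with O the common core where strand exchange occurs); it ignores site lengths, protein-binding sites and the clonase proteins themselves.

```python
from typing import NamedTuple, Tuple


class AttSite(NamedTuple):
    left: str   # left arm
    core: str   # common core, where strand exchange occurs
    right: str  # right arm


def recombine(a: AttSite, b: AttSite) -> Tuple[AttSite, AttSite]:
    """Swap right arms across the shared core (one site-specific crossover)."""
    if a.core != b.core:
        raise ValueError("sites must share a core sequence to recombine")
    return AttSite(a.left, a.core, b.right), AttSite(b.left, b.core, a.right)


attB = AttSite("B", "O", "B'")
attP = AttSite("P", "O", "P'")

# BP reaction (integration): attB x attP -> attL (B O P') + attR (P O B')
attL, attR = recombine(attB, attP)

# LR reaction (excision): attL x attR -> attB + attP
restored_B, restored_P = recombine(attL, attR)
print(restored_B == attB, restored_P == attP)  # True True
```

The same crossover rule, applied twice, returns the original pair, which is why the two reactions are reversible.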

Fig. 2. BP and LR reactions in lambda phage infection of *E. coli*. The site-specific reversible BP and LR recombination reactions occur during lambda phage integration into and excision from the *E. coli* genome

Basic strategies for applying Gateway technology to plasmid construction are shown in Figure 3. For the basic Gateway system, four pairs of modified *att* sites were generated for directional cloning: *att*B1 and *att*B2, *att*P1 and *att*P2, *att*L1 and *att*L2, and *att*R1 and *att*R2. Because recombination strictly depends on the *att* sequences, a reaction occurs only between *att*B1 and *att*P1, *att*B2 and *att*P2, *att*L1 and *att*R1, or *att*L2 and *att*R2 (Hartley et al., 2000; Walhout et al., 2000). In addition to these *att* sites, the negative selection marker *ccd*B, whose protein product inhibits DNA gyrase, and a chloramphenicol-resistance (Cmr) marker are used for selection and maintenance of Gateway vectors. Usually, *att*1 is located at the 5' end of the open reading frame (ORF) and *att*2 at the 3' end, and this orientation is maintained in all cloning steps. First, the gene of interest is cloned into an entry vector by TOPO cloning (pENTR/D-TOPO), a BP reaction (pDONR221), or restriction endonuclease digestion and ligation (pENTR1A); each vector is available from Invitrogen. To make an entry clone by a BP reaction, *att*B1 and *att*B2 sequences are added to the 5' and 3' ends of the ORF, respectively, by adapter PCR. The product (*att*B1-ORF-*att*B2) is subjected to a BP reaction with the donor vector pDONR221, which carries an *att*P1-*ccd*B-Cmr-*att*P2 cassette. Because of the negative selection marker *ccd*B between *att*P1 and *att*P2, only transformants harboring recombined vectors carrying *att*L1-ORF-*att*L2 (the entry clone) can grow on the selection plate. Once the entry clone is in hand, the ORF is transferred to a destination vector carrying an *att*R1-Cmr-*ccd*B-*att*R2 cassette. Because destination vectors also contain *ccd*B between *att*R1 and *att*R2 and carry a selection marker gene different from that of the entry clone, only recombined destination vectors carrying *att*B1-ORF-*att*B2 are selected. Gateway cloning is designed so that the smallest *att* sequence, *att*B (25 bp), remains in the final product, minimizing the length of the cloning junctions after the clonase reaction. In N- or C-terminal fusion constructs, the ORF is joined to the tag through a linker of eight or more amino acids encoded by the *att*B1 or *att*B2 site. Because the reading frame of *att*B1 and *att*B2 is unified in the Gateway system, any entry clone incorporated into a destination vector is correctly fused to the tag sequence.
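The directional BP-then-LR workflow and the *ccd*B selection described above can be modelled at the level of cassettes. This is a minimal sketch under stated assumptions: plasmids are lists of feature labels, only the new clone (not the by-product) is tracked, and resistance labels such as KmR and SpR are illustrative. The only marker constraint taken from the text is that the destination vector's marker differs from the entry clone's.

```python
import re

# att site names such as "attB1": type letter and pair number.
ATT_SITE = re.compile(r"^att([BPLR])([12])$")

# Site type left in the new clone: BP gives attL, LR gives attB.
PRODUCT_TYPE = {("B", "P"): "L", ("L", "R"): "B"}


def att_pair(plasmid):
    """Return ((index, type) of site 1, (index, type) of site 2)."""
    sites = {}
    for i, feature in enumerate(plasmid):
        m = ATT_SITE.match(feature)
        if m:
            sites[m.group(2)] = (i, m.group(1))
    if set(sites) != {"1", "2"} or sites["1"][0] > sites["2"][0]:
        raise ValueError("expected one site 1 upstream of one site 2")
    return sites["1"], sites["2"]


def gateway_swap(donor, vector):
    """Move the donor's segment between sites 1 and 2 into the vector backbone.

    Site 1 pairs only with site 1 and site 2 only with site 2, so the insert
    keeps its orientation.
    """
    (d1, d_type), (d2, _) = att_pair(donor)
    (v1, v_type), (v2, _) = att_pair(vector)
    new_type = PRODUCT_TYPE.get((d_type, v_type))
    if new_type is None:
        raise ValueError(f"att{d_type} does not recombine with att{v_type}")
    return (vector[:v1] + [f"att{new_type}1"] + donor[d1 + 1:d2]
            + [f"att{new_type}2"] + vector[v2 + 1:])


def survives(plasmid, selected_marker):
    """ccdB kills ccdB-sensitive E. coli; the selected marker is also required."""
    return "ccdB" not in plasmid and selected_marker in plasmid


pcr_product = ["attB1", "ORF", "attB2"]                      # adapter PCR
pDONR221 = ["KmR", "ori", "attP1", "ccdB", "Cmr", "attP2"]
entry_clone = gateway_swap(pcr_product, pDONR221)            # BP reaction
destination = ["SpR", "ori", "P35S", "attR1", "Cmr", "ccdB", "attR2", "Tnos"]
expression_clone = gateway_swap(entry_clone, destination)    # LR reaction

print(entry_clone)       # ['KmR', 'ori', 'attL1', 'ORF', 'attL2']
print(expression_clone)  # ['SpR', 'ori', 'P35S', 'attB1', 'ORF', 'attB2', 'Tnos']
```

With `survives`, unreacted pDONR221 and unreacted destination vectors are rejected because they still carry *ccd*B, and leftover entry clones are rejected because they lack the destination vector's marker.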

Fig. 3. Schematic illustration of Gateway cloning. An entry clone is constructed by TOPO directional cloning, a BP reaction, or restriction digestion and ligation. For construction by the BP reaction, the ORF region is amplified by adapter PCR, and the resulting *att*B1-ORF-*att*B2 fragment is cloned into pDONR221 by a BP reaction to generate an entry clone containing *att*L1-ORF-*att*L2. The ORF is then cloned into destination vectors by an LR reaction to generate expression clones, including tagged fusion constructs. For D-TOPO cloning, CACC is added to the 5' end of the ORF by adapter PCR, and the resulting CACC-ORF fragment is cloned into pENTR/D-TOPO. *B1*, *att*B1; *B2*, *att*B2; *P1*, *att*P1; *P2*, *att*P2; *L1*, *att*L1; *L2*, *att*L2; *R1*, *att*R1; *R2*, *att*R2; *Pro*, promoter; *Ter*, terminator; *Cmr*, chloramphenicol resistance marker; *ccd*B, negative selection marker in *E. coli*; *Kmr*, kanamycin-resistance marker
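The adapter PCR step in Figure 3 can be sketched as primer construction. Assumptions not taken from this chapter: the 25-bp attB1/attB2 sequences and the four-G 5' extension follow commonly published Gateway primer recommendations, the gene-specific part is a fixed length with no melting-temperature design, and a C-terminal fusion simply drops the ORF's stop codon so that the downstream tag is read through. Check the vector manual before ordering primers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Commonly published attB cores (assumption; verify against vector documentation).
ATTB1 = "ACAAGTTTGTACAAAAAAGCAGGCT"
ATTB2 = "ACCACTTTGTACAAGAAAGCTGGGT"


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gene_specific_parts(orf, n=20, c_fusion=False):
    """Forward and reverse gene-specific primer parts for an ORF."""
    orf = orf.upper()
    if not orf.startswith("ATG"):
        raise ValueError("ORF should start with ATG")
    if c_fusion and orf[-3:] in STOP_CODONS:
        orf = orf[:-3]  # remove the stop codon so the C-terminal tag is read
    return orf[:n], revcomp(orf[-n:])


def dtopo_primers(orf, n=20, c_fusion=False):
    """pENTR/D-TOPO: CACC placed immediately 5' of the ATG on the forward primer."""
    fwd, rev = gene_specific_parts(orf, n, c_fusion)
    return "CACC" + fwd, rev


def attb_primers(orf, n=20, c_fusion=False):
    """Adapter PCR for a BP reaction: attB1 on the 5' end, attB2 on the 3' end."""
    fwd, rev = gene_specific_parts(orf, n, c_fusion)
    return "GGGG" + ATTB1 + fwd, "GGGG" + ATTB2 + rev


toy_orf = "ATGGCTAGCAAAGGTGAATAA"  # hypothetical 21-nt ORF
print(dtopo_primers(toy_orf, n=6))                 # ('CACCATGGCT', 'TTATTC')
print(dtopo_primers(toy_orf, n=6, c_fusion=True))  # ('CACCATGGCT', 'TTCACC')
```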

#### **3. Binary vectors compatible with Gateway cloning**

A large number of Gateway-compatible binary vectors, known as destination vectors, have been developed; they are summarized in a review (Karimi et al., 2007b). Gateway-compatible binary vectors for promoter analysis have the general structure *att*R1-Cmr-*ccd*B-*att*R2-tag-terminator, and after an LR reaction with an *att*L1-promoter-*att*L2 entry clone, they yield an *att*B1-promoter-*att*B2-tag-terminator binary construct. Gateway-compatible binary vectors for expressing tagged fusion proteins have the general structure promoter-tag-*att*R1-Cmr-*ccd*B-*att*R2-terminator (for N-terminal fusions) or promoter-*att*R1-Cmr-*ccd*B-*att*R2-tag-terminator (for C-terminal fusions). After an LR reaction with an *att*L1-ORF-*att*L2 entry clone, they yield promoter-tag-*att*B1-ORF-*att*B2-terminator or promoter-*att*B1-ORF-*att*B2-tag-terminator, respectively. A tag at the N-terminus of the ORF is joined through the peptide encoded by the *att*B1 sequence (XSLYKKAGX), and a tag at the C-terminus through the peptide encoded by the *att*B2 sequence (XPAFLYKVX). Gateway-compatible binary vectors for RNAi analysis (Helliwell & Waterhouse, 2003; Hilson et al., 2004; Karimi et al., 2002; Miki & Shimamoto, 2004) generally carry inverted cassettes: promoter-*att*R1-*ccd*B-*att*R2-linker-*att*R2-*ccd*B-*att*R1-terminator. An LR reaction with an *att*L1-trigger-*att*L2 entry clone incorporates the trigger sequence into both sites in opposite orientations, yielding a promoter-*att*B1-trigger-*att*B2-linker-*att*B2-(complementary trigger)-*att*B1-terminator construct. When this construct is introduced into plants, a hairpin RNA is expressed and processed into small interfering RNA that functions in gene silencing.
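The linker peptides quoted above can be checked by translating the attB sites. This sketch assumes the commonly published 25-bp attB1 and attB2 sequences (they are not given in this chapter) and the standard genetic code. attB2 is written as it appears in the reverse primer, so on the coding strand of a C-terminal fusion it is read as its reverse complement; the frames chosen are those that reproduce the residues between the X positions in the text.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

ATTB1 = "ACAAGTTTGTACAAAAAAGCAGGCT"  # assumed attB1 core
ATTB2 = "ACCACTTTGTACAAGAAAGCTGGGT"  # assumed attB2 core, reverse-primer strand


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate the complete codons starting at offset `frame` (0, 1 or 2)."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(frame, len(seq) - 2, 3))


n_linker = translate(ATTB1, frame=0)           # attB1 between N-tag and ORF
c_linker = translate(revcomp(ATTB2), frame=2)  # attB2 between ORF and C-tag
print(n_linker, c_linker)  # TSLYKKAG PAFLYKV
```

Both translations contain the residues quoted in the text (SLYKKAG and PAFLYKV).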

Among the many Gateway-compatible binary vector series, the pW (Karimi et al., 2002), pMDC (Brand et al., 2006; Curtis & Grossniklaus, 2003) and pEarleyGate (Earley et al., 2006) series include vectors for many kinds of experiments in plants. The pW series consists of vectors for overexpression or antisense repression under the cauliflower mosaic virus 35S promoter (P35S), for promoter analysis using luciferase (LUC), β-glucuronidase (GUS), or GFP-GUS as reporters, and for construction of gene fusions with GFP, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP) or red fluorescent protein (RFP). The pMDC series consists of vectors for cloning, for overexpression under P35S, for inducible expression by heat shock or estrogen treatment, for promoter analysis using GFP-6xHis or GUS as reporters, and for gene fusions with GFP, GFP-6xHis, or GUS. The pEarleyGate series consists of BASTA®-resistance binary vectors for overexpression under P35S, for promoter analysis using HA, FLAG, Myc, or AcV5, and for gene fusions with YFP, HA, FLAG, Myc, AcV5, tandem affinity purification (TAP) tags, YFP-HA, or GFP-HA.

The vectors described above are useful tools; however, it is sometimes necessary to switch to a different series when a given series lacks a vector of the required type. To allow most experiments to be carried out within a single series (with a unified backbone and a unified junction sequence), we constructed a comprehensive Gateway-compatible binary vector system carrying many reporters and tags on the same backbone, as described in the next section.

#### **4. Development of Gateway binary vector (pGWB) series**

To construct Gateway-compatible binary vectors efficiently, we first sought a systematic method for building a vector series. For this purpose, we designed a method that introduces a tag sequence by blunt-end ligation, avoiding the time and labor otherwise imposed by restriction sites within the tag sequence. Based on this approach, the platform vectors pUGW0 and pUGW2 (Nakagawa et al., 2007a) were made using pUC119 as the backbone. As described below, many Gateway binary vector (pGWB) series were constructed from intermediate pUGW plasmids derived from pUGW0 or pUGW2. The characteristics and accession numbers of each pGWB are summarized in Information of Gateway Binary Vectors (pGWBs) (http://shimane-u.org/nakagawa/gbv.htm).

#### **4.1 Platform vectors pUGW0 and pUGW2 for construction of pGWB series**


The platform vectors pUGW0 and pUGW2 contain P35S and the nopaline synthase terminator (Tnos), as shown in Figure 4. pUGW0 was the starting vector for N-terminal fusions, with the structure *Hin*dIII-P35S-*Xba*I-ATG-*Aor*51HI-*att*R1-Cmr-*ccd*B-*att*R2-*Sac*I-Tnos. A tag (reporter or epitope tag) sequence amplified by blunt-end PCR was introduced into the blunt *Aor*51HI site to yield *Hin*dIII-P35S-*Xba*I-ATG-*tag*-*att*R1-Cmr-*ccd*B-*att*R2-*Sac*I-Tnos. A small epitope tag could instead be introduced directly into the *Aor*51HI site as an oligonucleotide. Translation initiates at the ATG just upstream of the *Aor*51HI site. pUGW2 was the starting vector for C-terminal fusions, with the structure *Hin*dIII-*Xba*I-*Hin*dIII-P35S-*Xba*I-*att*R1-Cmr-*ccd*B-*att*R2-*Aor*51HI-*Sac*I-Tnos; tag sequences were introduced by the same method used for pUGW0. The P35S region can easily be removed by *Xba*I digestion followed by self-ligation to construct promoter-less pUGWs. Because the tag fragment does not need to be digested with restriction enzymes before insertion into the *Aor*51HI site of pUGW0 or pUGW2, any tag fragment can be cloned by the same method. With these simple procedures, a pUGW series containing a variety of tags was generated efficiently. These plasmids served as sources of Gateway cassettes including tag sequences and were used to construct the Gateway binary vectors (pGWBs). The pUGWs are also Gateway-compatible plant vectors in their own right, useful for transient expression analysis after particle bombardment or protoplast transformation. Because of their small size and high copy number in *E. coli*, pUGW plasmids are very easy to prepare and handle.
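The two construction steps just described, blunt insertion of a tag at the *Aor*51HI site and removal of P35S by *Xba*I digestion and self-ligation, can be mimicked on the feature maps given in the text. This sketch tracks only the P35S-Tnos region as feature labels (the pUC119 backbone is omitted), treats the blunt site as consumed by the insert as in the structures above, and does not check the reading frame of the tag.

```python
# Feature maps of the P35S-Tnos region, as written in the text.
PUGW0 = ["HindIII", "P35S", "XbaI", "ATG", "Aor51HI",
         "attR1", "Cmr", "ccdB", "attR2", "SacI", "Tnos"]
PUGW2 = ["HindIII", "XbaI", "HindIII", "P35S", "XbaI",
         "attR1", "Cmr", "ccdB", "attR2", "Aor51HI", "SacI", "Tnos"]


def insert_blunt_tag(plasmid, tag, site="Aor51HI"):
    """Ligate a blunt-ended tag into the blunt site (site assumed consumed)."""
    i = plasmid.index(site)
    return plasmid[:i] + [tag] + plasmid[i + 1:]


def excise_between(plasmid, enzyme="XbaI"):
    """Cut at the first two sites of `enzyme` and self-ligate the backbone.

    One copy of the site remains at the religated junction.
    """
    first = plasmid.index(enzyme)
    second = plasmid.index(enzyme, first + 1)
    return plasmid[:first] + plasmid[second:]


n_fusion = insert_blunt_tag(PUGW0, "sGFP")
c_fusion = insert_blunt_tag(PUGW2, "sGFP")
promoterless_c_fusion = excise_between(c_fusion)
print(n_fusion[:6])  # ['HindIII', 'P35S', 'XbaI', 'ATG', 'sGFP', 'attR1']
print(promoterless_c_fusion)
# ['HindIII', 'XbaI', 'attR1', 'Cmr', 'ccdB', 'attR2', 'sGFP', 'SacI', 'Tnos']
```

Because `insert_blunt_tag` never looks inside the tag, any tag label goes through the same call, which mirrors the point that no tag-specific restriction digestion is needed.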

Fig. 4. Procedure for construction of pUGWs. pUGW0 and pUGW2 are the starting vectors for construction of new pUGW derivatives. The tag sequence amplified by blunt-end PCR is introduced into the *Aor*51HI site of pUGW0 or pUGW2, yielding pUGWs for N- or C-terminal fusion. The region between P35S and Tnos is shown. The nucleotide sequence corresponding to the region from *att*R1 to *att*R2 is underlined. *Cmr*, chloramphenicol resistance marker; *ccd*B, negative selection marker in *E. coli*; *P35S*, 35S promoter

#### **4.2 The pGWB series (pGWBxx and pGWB2xx) based on the pBI plasmid**

Initially, pGWBs were constructed on the backbone of a modified pBI carrying a nopaline synthase promoter (Pnos)-driven neomycin phosphotransferase II (NPTII) gene and a P35S-driven hygromycin phosphotransferase (HPT) gene, which confer kanamycin resistance (Kmr) and hygromycin resistance (Hygr), respectively, on plants (Mita et al., 1995). The initial pGWB series (pGWBxx) consists of 36 vectors designed for simple cloning of genes (pGWB1), for overexpression of ORF clones (pGWB2), and for fusion with a variety of tags (pGWB3 through pGWB45), as shown in the Complete List of pGWB (http://shimane-u.org/nakagawa/gbv.htm). GUS, TAP and LUC are available for C-terminal fusion, and 10 other tags, sGFP, 6xHis, FLAG, 3xHA, 4xMyc, 10xMyc, GST, T7, enhanced yellow fluorescent protein (EYFP) and enhanced cyan fluorescent protein (ECFP), are available for both N- and C-terminal fusion. The promoter-less C-fusion vectors can be used for promoter analysis: an LR reaction with a promoter entry clone creates a promoter:tag binary construct. The remaining N- and C-fusion vectors contain P35S for constitutive expression, and an LR reaction with an ORF entry clone readily yields binary constructs expressing tag-ORF or ORF-tag fusions (Figure 5). With the pGWBs, promoter activity, tagged proteins and the subcellular localization of proteins can be analyzed effectively (Nakagawa et al., 2007a).
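The tag rules in this paragraph can be encoded as a small lookup that predicts the layout of an expression clone. This sketch covers only the tags named above and the P35S or promoter-less layouts described in the text, with the terminator labelled Tnos as in Figure 5; it deliberately does not return pGWB vector numbers, which should be taken from the Complete List of pGWB.

```python
# Tags named in the text for the pGWBxx series.
C_ONLY_TAGS = {"GUS", "TAP", "LUC"}
N_AND_C_TAGS = {"sGFP", "6xHis", "FLAG", "3xHA", "4xMyc", "10xMyc",
                "GST", "T7", "EYFP", "ECFP"}


def expression_layout(insert, tag=None, terminus="C", with_p35s=True):
    """Predicted expression-clone layout after an LR reaction (sketch)."""
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    if tag is not None:
        if tag not in C_ONLY_TAGS | N_AND_C_TAGS:
            raise ValueError(f"{tag} is not among the tags listed")
        if terminus == "N" and tag in C_ONLY_TAGS:
            raise ValueError(f"{tag} is listed only for C-terminal fusion")
        if terminus == "N" and not with_p35s:
            raise ValueError("the N-fusion vectors described carry P35S")
    core = ["attB1", insert, "attB2"]  # attB sites remain after the LR reaction
    if tag is not None:
        core = [tag] + core if terminus == "N" else core + [tag]
    promoter = ["P35S"] if with_p35s else []
    return promoter + core + ["Tnos"]


# Promoter analysis: promoter entry clone into a promoter-less C-fusion vector.
print(expression_layout("promoter", "GUS", with_p35s=False))
# ['attB1', 'promoter', 'attB2', 'GUS', 'Tnos']

# Constitutive N-terminal fusion.
print(expression_layout("ORF", "sGFP", terminus="N"))
# ['P35S', 'sGFP', 'attB1', 'ORF', 'attB2', 'Tnos']
```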

Fig. 5. Cloning into pGWB by LR reaction. The Gateway region in pGWB (top of the figure) represents a variety of acceptor sites (R1-R2) described in the box. The pGWB series includes plasmids with no promoter and no tag, or with no promoter and a C-tag; these are used for expression controlled by a gene's own promoter. The pGWB series also includes plasmids with a 35S promoter and no tag, a 35S promoter and a C-tag, or a 35S promoter and an N-tag; these are used for constitutive expression from the 35S promoter. After an LR reaction with the entry clone, the expression clones indicated in the right panel are obtained. The tag is fused via the *att*B sequence. *B1*, *att*B1; *B2*, *att*B2; *L1*, *att*L1; *L2*, *att*L2; *R1*, *att*R1; *R2*, *att*R2; *Tnos*, nopaline synthase terminator; *M*, selection marker for plant; *Cmr*, chloramphenicol-resistance marker; *ccd*B, negative selection marker in *E. coli*; *P35S*, 35S promoter
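The LR reaction of Figure 5 can be sketched in a simple list-based model: each *att*L site on the entry clone recombines with the *att*R site of the same number on the destination vector, yielding an *att*B site, and the Cmr-*ccd*B cassette between *att*R1 and *att*R2 is lost. This is a hypothetical simplification (no sequences; orientation assumed correct), and the destination layout in the example is an illustrative N-tag pGWB, not a specific catalog entry.

```python
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (left att site, insert, right att site)


def site_number(site: str) -> str:
    """'attL1' -> '1'; assumes names of the form att + L/R/B + number."""
    return site[4:]


def lr_reaction(entry: Entry, destination: List[str]) -> List[str]:
    """attLn x attRn -> attBn; the attR1...attR2 cassette is replaced by the insert."""
    left, insert, right = entry
    if not (left.startswith("attL") and right.startswith("attL")):
        raise ValueError("expected an attL-flanked entry clone")
    i = destination.index("attR" + site_number(left))
    j = destination.index("attR" + site_number(right))
    return (destination[:i]
            + ["attB" + site_number(left), insert, "attB" + site_number(right)]
            + destination[j + 1:])


# Illustrative N-tag destination vector: P35S-tag-attR1-Cmr-ccdB-attR2-Tnos.
dest = ["P35S", "sGFP", "attR1", "Cmr", "ccdB", "attR2", "Tnos"]
clone = lr_reaction(("attL1", "ORF", "attL2"), dest)
print("-".join(clone))  # P35S-sGFP-attB1-ORF-attB2-Tnos
```

The loss of *ccd*B in the product mirrors why the negative selection marker works: only recombined expression clones survive in ordinary *E. coli* strains.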

We also constructed pGWBs carrying the *Pnos:HPT:Tnos* marker instead of the *P35S:HPT:Tnos* marker used in pGWB1-45, to avoid a possible effect of the P35S sequence on the expression pattern and strength of the cloned gene (Zheng et al., 2007). These vectors are named pGWB203, 204, 228 and 235, and their characteristics are shown at the bottom of the Complete List of pGWB (http://shimane-u.org/nakagawa/gbv.htm). In early experiments, when the phosphate transporter PHT1 promoter was used for promoter analysis in *A. thaliana*, GUS activity in plant extracts was 5-fold higher with pGWB3 than with pGWB203 (Nakagawa et al., 2007a).

#### **4.3 Improved Gateway binary vector (ImpGWB) series (pGWB4xx, pGWB5xx, pGWB6xx and pGWB7xx) based on the pPZP plasmid**

We next constructed improved Gateway binary vectors (ImpGWBs) using pPZP as a backbone (Hajdukiewicz et al., 1994). In the ImpGWB system, plasmid handling is much improved, transformation efficiency in *E. coli* is drastically increased, and much larger amounts of plasmid DNA are recovered. The structures and characteristics of pGWBs (pBI backbone) and ImpGWBs (pPZP backbone) are summarized in Figure 6.

Fig. 6. Characteristics of pGWBs and ImpGWBs. The Gateway region in vectors represents a variety of acceptor sites as described in Figure 5. Pnos, nopaline synthase promoter; Tnos, nopaline synthase terminator; P35S, 35S promoter; NPTII, neomycin phosphotransferase II; HPT, hygromycin phosphotransferase; *bar*, bialaphos resistance gene; GPT, UDP-*N*-acetylglucosamine:dolichol phosphate *N*-acetylglucosamine-1-P transferase gene (Koizumi & Iwata, 2008; Koizumi et al., 1999). Kmr, kanamycin-resistance; Hygr, hygromycin-resistance; Spcr, spectinomycin-resistance; BASTA®r, BASTA®-resistance; Tunicamycinr, tunicamycin-resistance

At present, four kinds of ImpGWB, the Kmr subseries (pGWB4xx) (Nakagawa et al., 2007b), Hygr subseries (pGWB5xx) (Nakagawa et al., 2007b), BASTA®-resistance subseries (pGWB6xx) (Nakamura et al., 2010) and tunicamycin-resistance subseries (pGWB7xx) (Tanaka et al., 2011), are available, and they are useful for introducing multiple transgenes into plants by repetitive transformation. Each subseries is composed of 46 vectors as summarized in the Complete List of ImpGWB (http://shimane-u.org/nakagawa/gbv.htm). A set of 16 tags, sGFP, GUS, LUC, EYFP, ECFP, G3 green fluorescent protein (G3GFP), monomeric red fluorescent protein (mRFP), TagRFP, 6xHis, FLAG, 3xHA, 4xMyc, 10xMyc, GST, T7, and TAP, is available in ImpGWB. Because ImpGWB is highly efficient in transformation of *E. coli*, this series was used for development of a new cloning system using multiple LR reactions as described below.

#### **4.4 R4 Gateway binary vector (R4pGWB) series (R4pGWB4xx, R4pGWB5xx, R4pGWB6xx and R4pGWB7xx) for promoter swapping**

To assemble multiple DNA fragments in a desired order, four additional *att* sites (*att*3, *att*4, *att*5 and *att*6) have been developed and applied to MultiSite Gateway cloning (Karimi et al., 2007a; Sasaki et al., 2004). These *att* sites (*att*1-6) extend Gateway cloning to more complex gene constructs. A cloning system equipped with them is useful for swapping promoters, ORFs and tags, and is also applicable to cloning multiple transgenes into one vector (Chen et al., 2006). In a typical MultiSite Gateway system, three entry clones with specialized *att* sites, *att*L4-promoter-*att*R1, *att*L1-ORF-*att*L2 and *att*R2-tag-*att*L3, are simultaneously joined and incorporated into a destination vector carrying *att*R4-Cmr-*ccd*B-*att*R3 acceptor sites to make an *att*B4-promoter-*att*B1-ORF-*att*B2-tag-*att*B3 construct (Figure 7).

Fig. 7. MultiSite Gateway system. In the MultiSite Gateway system, *att*1, *att*2, *att*3 and *att*4 sequences are used for cloning of multiple DNA fragments into one vector. A promoter entry clone (L4-Pro-R1), ORF entry clone (L1-ORF-L2), tag entry clone (R2-tag-L3) and destination vector R4-R3 are subjected to an LR reaction. The promoter, ORF and tag sequences are linked and incorporated into the destination vector to form a *promoter:ORF-tag* clone. *B1*, *att*B1; *B2*, *att*B2; *B3*, *att*B3; *B4*, *att*B4; *L1*, *att*L1; *L2*, *att*L2; *L3*, *att*L3; *L4*, *att*L4; *R1*, *att*R1; *R2*, *att*R2; *R3*, *att*R3; *R4*, *att*R4; *P1*, *att*P1; *P2*, *att*P2; *P3*, *att*P3; *P4*, *att*P4; *P1R*, *att*P1R; *P2R*, *att*P2R; *Cmr*, chloramphenicol-resistance marker; *ccd*B, negative selection marker in *E. coli*; *Pro*, promoter; *Kmr*, kanamycin-resistance marker
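The ordering logic of MultiSite Gateway can be illustrated by chaining fragments whose facing sites pair up (*att*Ln with *att*Rn, giving *att*Bn), starting from the destination vector's *att*R4 and ending at its *att*R3. This is a hypothetical sketch of the pairing rule only: fragments are assumed to be supplied in the orientation written in the text, the vector backbone is not modeled, and the function names are illustrative.

```python
from typing import List, Tuple

Fragment = Tuple[str, str, str]  # (left att site, payload, right att site)


def partner(site: str) -> str:
    """attLn pairs with attRn and vice versa (e.g. 'attR4' -> 'attL4')."""
    kind, num = site[3], site[4:]
    return "att" + ("R" if kind == "L" else "L") + num


def multisite_lr(entries: List[Fragment], dest_left: str, dest_right: str) -> List[str]:
    """Chain entry fragments between the destination sites by matching att pairs."""
    remaining = list(entries)
    chain: List[str] = []
    current = dest_left
    while True:
        match = next((e for e in remaining if e[0] == partner(current)), None)
        if match is None:
            break
        remaining.remove(match)
        chain += ["attB" + current[4:], match[1]]  # recombined site, then payload
        current = match[2]
    # Every fragment must be used, and the last site must pair with the vector.
    if remaining or partner(current) != dest_right:
        raise ValueError("fragments do not form one ordered chain")
    chain.append("attB" + current[4:])
    return chain


entries = [("attL1", "ORF", "attL2"),   # the order of supply does not matter
           ("attR2", "tag", "attL3"),
           ("attL4", "Pro", "attR1")]
print("-".join(multisite_lr(entries, "attR4", "attR3")))
# attB4-Pro-attB1-ORF-attB2-tag-attB3
```

Because the site specificities alone dictate the order, omitting any one entry clone breaks the chain, which is one way to see why all four recombinations must succeed for a correct clone.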


Although MultiSite Gateway cloning is an excellent method for building complicated multigene constructs, it is relatively difficult to obtain the desired clone because recombination must succeed at all four *att* sites. To facilitate multifragment cloning, especially for promoter swapping, we developed the R4 Gateway binary vector (R4pGWB) by reducing the number of required recombination sites from four to three (*att*4, *att*1 and *att*2) (Figure 8, left) (Nakagawa et al., 2008). The R4pGWB series was made by replacing the *att*R1 site of ImpGWBs (the promoter-less C-fusion type with each of the four resistance markers) with the *att*R4 site; all tags used in ImpGWB are also available in the R4pGWB system, as shown in the Complete List of R4pGWB (http://shimane-u.org/nakagawa/gbv.htm). By an LR reaction with a promoter entry clone (*att*L4-promoter-*att*R1), an ORF entry clone (*att*L1-ORF-*att*L2) and an R4pGWB carrying the appropriate tag, chimeric genes combining promoters, ORFs and tags (*att*B4-promoter-*att*B1-ORF-*att*B2-tag) are constructed very easily. The R4pGWB system is a powerful tool for expressing an ORF under any desired promoter, *e.g.*, a promoter for strong expression, for tissue- or cell-specific expression, for developmental-stage-specific expression, or for induction by biotic or abiotic stimuli.

Fig. 8. R4pGWB and R4L1pGWB systems. A promoter entry clone (L4-Pro-R1) is constructed by a BP reaction using pDONR P4-P1R and a B4-Pro-B1 fragment prepared by adapter PCR. Left: in the R4pGWB system, a promoter entry clone (L4-Pro-R1), ORF entry clone (L1-ORF-L2) and R4pGWB are subjected to an LR reaction. The promoter and ORF are linked and incorporated into R4pGWB to form a promoter:ORF-tag clone. Right: in the R4L1pGWB system, only a promoter entry clone (L4-Pro-R1) is used for an LR reaction with an R4L1pGWB. The promoter sequence is incorporated into R4L1pGWB and fused with the tag on the vector. With the R4L1pGWB system, a single LR reaction yields a promoter:tag construct at high efficiency. Nucleotides in red indicate B4 and B1 sequences. *Pro*, promoter; *B1*, *att*B1; *B2*, *att*B2; *B4*, *att*B4; *L1*, *att*L1; *L2*, *att*L2; *L4*, *att*L4; *R1*, *att*R1; *R2*, *att*R2; *R4*, *att*R4; *P4*, *att*P4; *P1R*, *att*P1R; *M*, selection marker for plant; *Cmr*, chloramphenicol-resistance marker; *ccd*B, negative selection marker in *E. coli*; *Kmr*, kanamycin-resistance marker
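Because an R4pGWB carries the tag on the vector and needs only three recombination sites, a set of promoter entry clones, ORF entry clones and tagged R4pGWBs can be crossed combinatorially. The sketch below simply enumerates the resulting *att*B4-promoter-*att*B1-ORF-*att*B2-tag constructs; the promoter and ORF names are hypothetical placeholders, and the tags are taken from those listed for ImpGWB.

```python
from itertools import product
from typing import List


def r4pgwb_construct(promoter: str, orf: str, tag: str) -> str:
    """Expression-clone layout produced by one R4pGWB LR reaction."""
    return f"attB4-{promoter}-attB1-{orf}-attB2-{tag}"


def construct_library(promoters: List[str], orfs: List[str], tags: List[str]) -> List[str]:
    """All promoter x ORF x tag combinations, one LR reaction each."""
    return [r4pgwb_construct(p, o, t) for p, o, t in product(promoters, orfs, tags)]


promoters = ["ProA", "ProB"]        # hypothetical attL4-promoter-attR1 entry clones
orfs = ["ORF1", "ORF2"]             # hypothetical attL1-ORF-attL2 entry clones
tags = ["sGFP", "GUS", "TagRFP"]    # tags carried by the chosen R4pGWB vectors
library = construct_library(promoters, orfs, tags)
print(len(library))   # 12
print(library[0])     # attB4-ProA-attB1-ORF1-attB2-sGFP
```

Swapping a promoter changes only the first argument; the ORF entry clones and tagged vectors are reused unchanged, which is the practical appeal of promoter swapping with this system.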

#### **4.5 R4L1 Gateway binary vector (R4L1pGWB) series (R4L1pGWB4xx and R4L1pGWB5xx) for promoter analysis**

Following establishment of the R4pGWB system, many *att*L4-promoter-*att*R1 entry clones were constructed and have been used as a resource for expressing ORFs in plants. To also use these entry clones for efficient promoter:tag experiments, we developed an R4L1 Gateway binary vector (R4L1pGWB) (Nakamura et al., 2009) containing *att*R4-Cmr-*ccd*B-*att*L1-tag-Tnos. With a simple bipartite LR reaction between an *att*L4-promoter-*att*R1 entry clone and R4L1pGWB, an *att*B4-promoter-*att*B1-tag-Tnos construct for promoter assays is easily obtained (Figure 8, right). The tags in R4L1pGWBs are G3GFP-GUS, GUS, LUC, EYFP, ECFP, G3GFP and TagRFP, as shown in the Complete List of R4L1pGWB (http://shimane-u.org/nakagawa/gbv.htm).
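In this bipartite reaction the promoter entry clone's *att*L4 pairs with the vector's *att*R4, and its *att*R1 pairs with the vector's *att*L1, so the promoter replaces the Cmr-*ccd*B cassette directly upstream of the tag. A hypothetical list-based sketch of that single LR reaction, with the vector layout taken from the text and illustrative names:

```python
from typing import List, Tuple


def r4l1_lr(promoter_entry: Tuple[str, str, str], vector: List[str]) -> List[str]:
    """attL4 x attR4 -> attB4 and attR1 x attL1 -> attB1 in one LR reaction."""
    left, promoter, right = promoter_entry
    if (left, right) != ("attL4", "attR1"):
        raise ValueError("expected an attL4-promoter-attR1 entry clone")
    i = vector.index("attR4")
    j = vector.index("attL1")
    # Everything from attR4 to attL1 (including Cmr-ccdB) is replaced.
    return vector[:i] + ["attB4", promoter, "attB1"] + vector[j + 1:]


r4l1pgwb = ["attR4", "Cmr", "ccdB", "attL1", "G3GFP", "Tnos"]
print("-".join(r4l1_lr(("attL4", "ProX", "attR1"), r4l1pgwb)))
# attB4-ProX-attB1-G3GFP-Tnos
```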

#### **5. Application of the pGWB system**

Because Gateway cloning is efficient, precise, flexible and simple to use, its application will continue to grow in plant research. In this section, we briefly describe two recent advances in our pGWB system, a split reporter for interaction analysis and recycling cloning for multigene constructs.

#### **5.1 Gateway vectors for bimolecular fluorescence complementation (BiFC) assay**

BiFC is based on the reconstitution of a fluorescent signal when two interacting proteins or peptides, each fused to either the N- or the C-fragment of a split fluorescent protein, interact. Owing to its relative technical simplicity and the ability to use ordinary fluorescence microscopes for observation, a growing number of publications describe the use of BiFC to analyze protein-protein interactions. Beyond monitoring protein-protein interactions, the method has expanded to wider applications, such as multicolor BiFC to investigate protein complexes (Hu & Kerppola, 2003; Kodama & Wada, 2009; Lee et al., 2008; Waadt et al., 2008), detection *in vivo* (Bracha-Drori et al., 2004; Walter et al., 2004) and combination with bioluminescence resonance energy transfer (BRET; Chen et al., 2008; Gandia et al., 2008; Xu et al., 2007). To date, several BiFC vectors dedicated to plant research have been constructed. As part of our development of Gateway technology, we have generated various destination vectors for BiFC assays. In this section, we introduce our Gateway technology-based BiFC vectors and describe their application.

#### **5.1.1 Detection of protein-protein interactions in plant cells by BiFC assay**

The investigation of protein-protein interactions provides valuable information in cell biology. In addition to BiFC, several other techniques detect protein-protein interactions, such as co-immunoprecipitation assays (Co-IP), *in vitro* binding assays, the yeast two-hybrid system (Y2H; James et al., 1996), the mating-based split-ubiquitin system (mbSUS; Ludewig et al., 2003; Obrdlik et al., 2004), BRET (Chen et al., 2008; Xu et al., 2007), fluorescence resonance energy transfer (FRET; Day et al., 2001), fluorescence lifetime imaging microscopy (FLIM; Bastiaens & Squire, 1999) and fluorescence correlation spectroscopy (FCS; Hink et al., 2002). Imaging-based approaches such as BiFC and FRET have been widely used in plant research because they enable detection in plant cells, in contrast to Y2H and mbSUS, which function only in yeast cells, and because, unlike Co-IP and *in vitro* binding assays, they require neither specific antibodies nor protein purification.

The BiFC assay is one of the most convenient of the imaging-based approaches. Although FRET and FLIM are powerful techniques for detecting protein-protein interactions, FRET requires complicated analyses such as acceptor photobleaching, and FLIM requires dedicated instrumentation. BiFC assays also call for some precautions, but they need no special equipment for detection and no complicated analysis once image data are obtained. In addition, the BiFC assay reveals the subcellular location of the interacting proteins.

We used our Gateway vector construction system (Hino et al., 2011; Nakagawa et al., 2008; Nakagawa et al., 2007b) to make destination vectors for BiFC assays. Using these vectors, it is easy to make constructs for detection of protein-protein interactions. These Gateway vectors have worked well in plant cells (Goto et al., 2011; Hino et al., 2011; Singh et al., 2009).

#### **5.1.2 Principles of the BiFC assay**

46 Genetic Engineering – Basics, New Applications and Responsibilities


In BiFC assays, a fluorescent reporter, such as CFP, GFP, YFP or RFP, is split into two nonfluorescent fragments, the N- and C-fragments (Figure 9A,B). The two proteins or peptides to be tested for interaction are each fused to the N- or C-terminus of one fragment. When both fusion genes are expressed simultaneously and the two proteins interact, the nonfluorescent fragments are reconstituted and behave as an unsplit fluorescent protein. Detection of fluorescence therefore indicates that the target proteins interact (Figure 9A).

Once the interaction occurs, the reconstituted molecule does not dissociate into nonfluorescent fragments, leading to enhancement of fluorescence due to accumulation of reconstituted fluorescent proteins.

There are eight potential combinations to be tested for protein-protein interactions in a BiFC assay, depending on which of the two partners is fused to which split fragment and whether each partner is attached at the N- or C-terminal end of its fragment (Figure 9C). However, an improper fusion sometimes abolishes protein function and masks information on subcellular targeting. For example, peroxisome targeting signal 2 (PTS2) must be fused to the N-terminus of the split fluorescent protein (Singh et al., 2009; Figure 10B). In contrast, PTS1 must be fused to the C-terminus of a split fluorescent protein, because its location at the C-terminus is necessary for its function. In such cases, fewer combinations need to be tested. If nothing is known about protein function, however, all combinations should be tested. In this light, our destination vectors are useful because several fusion genes can be constructed at the same time.
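The count of eight follows from three binary choices: which partner carries the N-fragment, and, for each partner, whether it sits at the N- or C-terminus of its fusion. The sketch below enumerates these and shows how a positional constraint such as the PTS2 requirement halves the set. "PTS2" and "PTS1" stand for any protein carrying that signal; the function names and the `required` dictionary are illustrative assumptions.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def fusion(protein: str, fragment: str, position: str) -> str:
    """position 'N': protein at the N-terminus (protein-fragment); 'C': fragment-protein."""
    return f"{protein}-{fragment}" if position == "N" else f"{fragment}-{protein}"


def bifc_configurations(a: str, b: str,
                        required: Optional[Dict[str, str]] = None) -> List[Tuple[str, str]]:
    """Enumerate fusion pairs; `required` pins a protein to the N- or C-terminus."""
    required = required or {}
    configs = []
    for a_frag, a_pos, b_pos in product(("nXFP", "cXFP"), ("N", "C"), ("N", "C")):
        b_frag = "cXFP" if a_frag == "nXFP" else "nXFP"  # partners carry complementary halves
        if required.get(a, a_pos) != a_pos or required.get(b, b_pos) != b_pos:
            continue
        configs.append((fusion(a, a_frag, a_pos), fusion(b, b_frag, b_pos)))
    return configs


print(len(bifc_configurations("ProteinA", "ProteinB")))  # 8
# PTS2 only works at the N-terminus of its fusion, so half of the designs drop out.
for pair in bifc_configurations("PEX7", "PTS2", {"PTS2": "N"}):
    print(pair)
```

One of the four remaining pairs is ('nXFP-PEX7', 'PTS2-cXFP'), the layout used for the PEX7/PTS2 test in Figure 10B.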

#### **5.1.3 Destination vectors for the multicolor and in vivo BiFC assays**

Various BiFC vectors have been developed and used in plant research (Bracha-Drori et al., 2004; Diaz et al., 2005; Ding et al., 2006; Goto et al., 2011; Hino et al., 2011; Loyter et al., 2005; Maple et al., 2005; Marrocco et al., 2006; Ohad et al., 2007; Singh et al., 2009; Waadt et al., 2008; Walter et al., 2004; Zamyatnin et al., 2006). All of these vectors, including ours, use P35S to express a fusion gene. There are two ways to insert a target gene at the 5' or 3' end of a split fragment of a fluorescent protein gene: (1) cloning into a multicloning site by digestion and ligation, and (2) Gateway technology (Hino et al., 2011; Walter et al., 2004). Our BiFC vectors were developed to be compatible with Gateway technology. We generated four kinds of destination vectors for BiFC assays (Figure 10A), enabling transfer of a gene of interest from an entry clone to the 5' or 3' end of each split fragment. Researchers can therefore easily fuse a gene of interest to either end of a split fragment and obtain a variety of convenient constructs.

Fig. 9. Principles of the BiFC assay. (A) Nonfluorescent fragments (YN and YC) of a fluorescent protein are brought together through interaction of the tested proteins or peptides (a, b and c) to which they are fused. The interaction of the two proteins causes reconstitution of a fluorescent signal. (B) Diagram of amino acid substitutions among CFP, GFP, YFP and mRFP1, and the positions at which they were split. Although there are alternative positions at which to split a fluorescent protein into two fragments (Hu & Kerppola, 2003; Waadt et al., 2008), the CFP, GFP and YFP in our system were split between residues 174 and 175, and mRFP1, which carries a glutamine-to-threonine substitution at residue 66, was split between residues 154 and 155. Amino acids in CFP and YFP that differ from GFP are depicted in white. For RFP, amino acids that differ from GFP are not shown because there are many substitutions. (C) Potential combinations of two fragments. There are eight possible configurations in the BiFC assay. Each target protein (gray and black) can be fused at its N- or C-terminus to the N- or C-terminal fragment of the fluorescent protein (light green)
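The split positions in Figure 9B use 1-based residue numbering: "between residues 174 and 175" means the N-fragment holds residues 1 to 174 and the C-fragment holds residue 175 onward. A minimal sketch of that convention follows; the toy sequence and the 200-residue placeholder are illustrative only and are not the actual vector sequences.

```python
from typing import Dict, Tuple

# Split points from Figure 9B: cut between residue n and n + 1 (1-based numbering).
SPLIT_AFTER: Dict[str, int] = {"CFP": 174, "GFP": 174, "YFP": 174, "mRFP1": 154}


def split_fluorescent_protein(sequence: str, cut_after: int) -> Tuple[str, str]:
    """Return (N-fragment, C-fragment): residues 1..cut_after and cut_after+1..end."""
    if not 0 < cut_after < len(sequence):
        raise ValueError("cut position must fall inside the sequence")
    # Python slices are 0-based, so [:n] is exactly residues 1..n.
    return sequence[:cut_after], sequence[cut_after:]


n_frag, c_frag = split_fluorescent_protein("MSKGEELFTG", 4)  # toy sequence
print(n_frag, c_frag)  # MSKG EELFTG

placeholder = "X" * 200  # hypothetical length, not a real protein
n_frag, c_frag = split_fluorescent_protein(placeholder, SPLIT_AFTER["mRFP1"])
print(len(n_frag), len(c_frag))  # 154 46
```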

The first BiFC vectors were generated using YFP (Hu et al., 2002). However, other fluorescent proteins, namely BFP (Hu & Kerppola, 2003), CFP (Kodama & Wada, 2009; Lee et al., 2008), GFP (Hu et al., 2002; Kodama & Wada, 2009), Venus (Lee et al., 2008), Cerulean (Lee et al., 2008), DsRed-monomer (Kodama & Wada, 2009), mRFP1 (Jach et al., 2006), mCherry (Fan et al., 2008) and the far-red fluorescent protein mLumin (Chu et al., 2009), have reportedly been useful for BiFC assays. We adopted CFP, GFP, YFP and mRFP1 to generate vectors (Figure 9B), and verified their usefulness for detection of protein-protein interactions (Figure 10B-E).

PTS2-containing proteins are directed to peroxisomes after binding to a receptor, PEX7, in the cytosol (Hayashi & Nishimura, 2006; Mano & Nishimura, 2005). We observed reconstituted CFP fluorescence as punctate structures when *nCFP-PEX7* and *PTS2-cCFP* were co-expressed (Figure 10B), in agreement with a previous report (Singh et al., 2009). Lesion simulating disease 1 (LSD1), a negative regulator of programmed cell death, is a zinc finger protein that forms homodimers. We also tried to detect LSD1 homo-oligomerization using the combination of LSD1-nYFP and LSD1-cYFP. Reconstituted YFP signals were observed in the cytosol and nucleus (Figure 10C), consistent with previous data (Walter et al., 2004). The localization of PIP2, a plasma membrane intrinsic protein of the aquaporin family, and its interaction with other PIP members have been demonstrated by FRET and FLIM assays in maize cells (Zelazny et al., 2007). An Arabidopsis PIP2 gene, *PIP2;1*, was also fused to split fragments of mRFP1 and used to investigate homo-oligomerization (Figure 10D), and reconstituted RFP signals were detected at the plasma membrane.

Fig. 10. Schematic representation of the multicolor BiFC vectors and examples of transient expression. (A) Four kinds of destination vectors for transient expression were generated to be compatible with Gateway technology. nXFP and cXFP, the N- or C-fragment, respectively, of CFP, GFP, YFP or RFP; *ColE1 ori*, ColE1 replication origin; *Ampr*, ampicillin-resistance marker used for selection in bacteria; *Cmr*, chloramphenicol-resistance marker; *ccd*B, negative selection marker used in bacteria; P35S, 35S promoter; *Tnos*, nopaline synthase terminator; *R1*, *att*R1; *R2*, *att*R2. (B-E) Fluorescence images of onion epidermal cells expressing the fusion genes indicated above each panel, acquired 18-24 hr after particle bombardment. Bars = 50 µm

Multicolor BiFC assays have been developed to examine protein-protein interactions among various factors, since some combinations of N- and C-fragments of different fluorescent proteins allow reconstitution of signals (Hu & Kerppola, 2003; Kodama & Wada, 2009; Lee et al., 2008; Waadt et al., 2008). We also investigated which combinations among different fragments in our BiFC vectors are practical for reconstitution of signals using nXFP-PEX7 and PTS2-cXFP (XFP means CFP, GFP, YFP or RFP). Combinations among split fragments of CFP, GFP and YFP enabled the reconstitution of fluorescence (Table 1, Figure 10E), although some combinations did not give reproducible results. In contrast, a reconstituted RFP signal was observed only between split fragments from RFP (Table 1).


Table 1. Summary of the detection of reconstituted signals using various combinations of split fragments from different fluorescent proteins. cC, cG, cY and cR represent the C-fragment of CFP, GFP, YFP and RFP, respectively; nC, nG, nY and nR indicate the N-fragment of CFP, GFP, YFP and RFP, respectively. '+' and '-' denote detection and no detection of an interaction, respectively; '±' indicates that reproducible results could not be obtained

We adapted our BiFC vectors for transient expression (Figure 10A) to binary vectors for *in vivo* BiFC assays (Figure 11A). Using these binary vectors, researchers can easily generate transgenic plants expressing a fusion of the N- or C-fragment with a gene of interest. We prepared two kinds of binary vectors, carrying either the Kmr or the Hygr marker. Therefore, after crossing a transgenic plant expressing the N-fragment fusion with one expressing the C-fragment fusion, plants expressing both fusions can be obtained more easily by screening on medium containing both kanamycin and hygromycin.
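The benefit of two different markers can be seen from simple Mendelian arithmetic. The sketch below assumes each parent carries a single-locus insertion (an assumption, not something stated in the text): a hemizygous parent transmits its transgene to half of its progeny and a homozygous parent to all of them, so the fraction computed here is the expected share of F1 seedlings that survive medium containing both kanamycin and hygromycin.

```python
from fractions import Fraction

# Probability that a parent passes its transgene to a given F1 seedling
# (single-locus insertion and Mendelian segregation assumed).
TRANSMISSION = {"homozygous": Fraction(1), "hemizygous": Fraction(1, 2)}


def f1_double_resistant(km_parent: str, hyg_parent: str) -> Fraction:
    """Expected fraction of F1 seedlings carrying both the Kmr and the Hygr transgene."""
    # The two transgenes come from different parents, so they are inherited independently.
    return TRANSMISSION[km_parent] * TRANSMISSION[hyg_parent]


print(f1_double_resistant("hemizygous", "hemizygous"))  # 1/4
print(f1_double_resistant("homozygous", "homozygous"))  # 1
```

Under these assumptions, double selection removes the three-quarters of F1 seedlings from a hemizygous x hemizygous cross that lack one or both fusions, without any genotyping.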

Agroinfiltration is a powerful technique for expressing a foreign gene *in planta*, and it has been reported to work in BiFC assays (Bracha-Drori et al., 2004; Waadt et al., 2008; Walter et al., 2004). We examined whether our binary vectors also work well in agroinfiltration using *nYFP-Peroxin 6 (PEX6)* and *cYFP-ABERRANT PEROXISOME MORPHOLOGY 9 (APEM9)* (Figure 11B). We have already reported the interaction of PEX6 and APEM9 using transient expression of these fusion genes in onion epidermal cells (Goto et al., 2011). We mixed three cultures of *A. tumefaciens* (strain C58C1RifR) harboring *nYFP-PEX6*, *cYFP-APEM9* or *CFP-PTS1* (a peroxisomal marker), and co-infiltrated them into leaf cells of *Nicotiana tabacum*. Reconstituted YFP signals were observed as punctate structures (Figure 11C), and these signals surrounded the CFP-labeled peroxisomal matrix (Figure 11C-E), showing that BiFC occurs at the peroxisomal membrane, as reported previously (Goto et al., 2011). These results demonstrate that our binary vectors work well for BiFC assays.

#### **5.1.4 Special considerations for BiFC assays using our vectors**

In BiFC assays, fluorescence is derived from reconstituted fluorescence or artificial noise. The same is true for our BiFC vectors. Fluorescence is sometimes observed even in combination with an untagged vector as a negative control. Therefore, it is necessary to test expression using a negative control vector. Conversely, when fluorescence is not observed after expression of two fusion genes, there are two views about the result. One is that the interaction does not occur, although the two fusion genes are expressed properly. The other is that gene expression is inefficient or that the genes were inefficiently introduced into the

Fig. 11. Schematic representation of the binary vectors for the BiFC assay and examples of an *in vivo* BiFC experiment using an *Agrobacterium*-infiltration technique. (A) Four kinds of destination vectors for an *in vivo* BiFC assay were generated to be compatible with Gateway technology. nXFP and cXFP, the N- or C-fragment, respectively, of CFP, GFP, YFP or RFP; cXFP; *sta*, region conferring stability in *Agrobacterium*; *rep*, broad host-range replication origin; *bom*, *cis*-acting element for conjugational transfer; *ori,* ColE1 replication origin; *Cmr*, chloramphenicol-resistance marker; *ccd*B, negative selection marker used in bacteria; *P35S*, 35S promoter; *Tnos*, nopaline synthase terminator; *R1*, *att*R1; *R2*, *att*R2; Black arrowheads indicate right border and left border. (B-E) An example of an *in vivo* BiFC experiment. (B) Three fusion genes, *nYFP-PEX6*, c*YFP-APEM9* and *CFP-PTS1* as peroxisome markers were expressed in *Nicotiana tobaccum*. (C-E) Fluorescence images of leaf epidermal cells were acquired 3 days after infiltration. (C) Reconstituted YFP signals. (D) Peroxisomes visualized with CFP. (E) A merged image of (C) with (D). Insets represent magnified images of a peroxisome. Bars = 20 µm and 1 µm for each inset

cells. We always express an additional gene, such as *CFP-PTS1* in Figure 11, to investigate the efficiency of gene expression in transient assays. At the same time, this helps visualize cells and organelles so that it is easier to observe introduced cells that are bombarded or agro-infiltrated. The alternative method is the detection of fusion protein by immunoblotting. Some vectors are developed to add the epitope tag to split fragments so that the detection of accumulation of fusion proteins is carried out by immunoblotting (Bracha-Drori et al., 2004; Waadt et al., 2008; Walter et al., 2004). Of course, if specific antibodies against target protein are possible to obtain, they are useful for verification of protein accumulation.

#### **5.1.5 Perspectives**

50 Genetic Engineering – Basics, New Applications and Responsibilities

and PTS2-cXFP (XFP means CFP, GFP, YFP or RFP). Combinations among split fragments of CFP, GFP and YFP enabled the reconstitution of fluorescence (Table 1, Figure 10E), although some combinations did not give reproducible results. In contrast, a reconstituted RFP signal

nC-PEX7 nG-PEX7 nY-PEX7 nR-PEX7

PTS2-cC + + + - PTS2-cG + + + - PTS2-cY ± ± + - PTS2-cR - - - + Table 1. Summary of the detection of reconstituted signals using various combinations of split fragments from different fluorescent proteins. cC, cG, cY and cR represent the C-fragment of CFP, GFP, YFP and RFP, respectively. nC, nG, nY and nR indicate the N-fragment of CFP, GFP, YFP and RFP, respectively. '+' and '-' denote detection of

interaction and inability for interaction, respectively. ''±'' indicates that reproducible results

We adapted our BiFC vectors for transient expression (Figure 10A) to binary vectors for *in vivo* BiFC assays (Figure 11A). Using these binary vectors, researchers are able to easily generate transgenic plants expressing a fusion gene of the N- or C-fragment with a gene of interest. We prepared two kinds of binary vectors, containing either Kmr or Hygr markers. Therefore, after crossing transgenic plants expressing either the N- or C-fragment, it will be easier to obtain transgenic plants expressing both N- or C-fragments from screening on

Agroinfiltration is a powerful technique to express an alien gene *in planta*, and it has been reported that this technique is functional in BiFC assays (Bracha-Drori et al., 2004; Waadt et al., 2008; Walter et al., 2004). We examined whether our binary vectors could also work well in agroinfiltration using *nYFP-Peroxin 6 (PEX6)* and *cYFP-ABERRANT PEROXISOME MORPHOLOGY 9 (APEM9)* (Figure 11B). We already reported the interaction of PEX6 and APEM9 using transient expression of these fusion genes in onion epidermal cells (Goto et al., 2011). We mixed three cultures of *A. tumefaciens* (strain C58C1RifR) haboring *nYFP-PEX6*, *cYFP-APEM9* or *CFP-PTS1* as peroxisomal markers, and then co-infiltarted into the leaf cells of *Nicotiana tobaccum*. Reconstituted YFP signal was observed as punctate structures (Figure 11C), and these signals surrounded the CFP-labeled peroxisome matrix (Figure 11C-E), showing that BiFC occurs at the peroxisomal membrane, as reported previously (Goto et al.,

2011). These results demonstrated that our binary vectors for BiFC assays work well.

In BiFC assays, fluorescence is derived from reconstituted fluorescence or artificial noise. The same is true for our BiFC vectors. Fluorescence is sometimes observed even in combination with an untagged vector as a negative control. Therefore, it is necessary to test expression using a negative control vector. Conversely, when fluorescence is not observed after expression of two fusion genes, there are two views about the result. One is that the interaction does not occur, although the two fusion genes are expressed properly. The other is that gene expression is inefficient or that the genes were inefficiently introduced into the

**5.1.4 Special considerations for BiFC assays using our vectors** 

was observed only between split fragments from RFP (Table 1).

could not be obtained

medium with both kanamycin and hygromycin.

Our BiFC vectors are widely applicable to the analysis of protein-protein interactions. Future introduction of the R4pGWB system (Nakagawa et al., 2008) into these BiFC vectors will allow each fusion gene to be driven by its own promoter, so that interactions can be examined with tissue or developmental-stage specificity. In addition, inducible promoters will be used to drive transient expression in transgenic plants harboring R4pGWB-based BiFC fragments. Because a great variety of fluorescent proteins with different properties, such as a large Stokes shift, is available, applying our Gateway-based system to new fluorescent proteins will generate further combinations for multicolour BiFC assays, helping to reveal the relationships among several factors in a complex.
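When new multicolour combinations are being planned, it can help to screen candidate fragment pairs against the outcomes already summarized in Table 1. The sketch below is a minimal illustration of that idea and is not part of the authors' toolkit. It encodes Table 1 as a lookup and lists the pairs with a given result. The dictionary layout and the `pairs_with` function are our own hypothetical names.

```python
from typing import List, Tuple

# Table 1 outcomes: outer keys are PTS2 C-fragment fusions, inner keys are
# PEX7 N-fragment fusions. "+" = signal detected, "-" = no signal,
# "±" = reproducible results could not be obtained.
TABLE_1 = {
    "cC": {"nC": "+", "nG": "+", "nY": "+", "nR": "-"},
    "cG": {"nC": "+", "nG": "+", "nY": "+", "nR": "-"},
    "cY": {"nC": "±", "nG": "±", "nY": "+", "nR": "-"},
    "cR": {"nC": "-", "nG": "-", "nY": "-", "nR": "+"},
}


def pairs_with(result: str) -> List[Tuple[str, str]]:
    """Return (N-fragment, C-fragment) pairs whose Table 1 entry equals `result`."""
    return [
        (n_frag, c_frag)
        for c_frag, row in TABLE_1.items()
        for n_frag, outcome in row.items()
        if outcome == result
    ]


if __name__ == "__main__":
    print("reproducible signal:", pairs_with("+"))
    print("not reproducible:", pairs_with("±"))
```

In this encoding, the RFP fragments do not reconstitute with any CFP, GFP or YFP fragment. Among the tested pairs, the RFP pair is therefore the only one that does not cross-react with the other three. The CFP, GFP and YFP fragments do complement one another.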

#### **5.2 Recycling cloning system for multigene constructs**

Multigene transformation of plants is a powerful technology for molecular breeding because it can simultaneously improve multiple enzymes and factors constituting biological pathways (Ha et al., 2010; Nakayama et al., 2000; Naqvi et al., 2009; Ye et al., 2000). Several methods are available for multigene transformation, including re-transformation, co-transformation and cross-fertilization (Dafny-Yelin & Tzfira, 2007). The most practical method, however, is to use a multigene construct, a single vector carrying multiple expression units (Chen et al., 2006). In this section, we introduce a recycling cloning system that assembles multiple expression units through simple repetitive LR reactions.

Fig. 12. Schematic illustration of recycling cloning. The pRED vector has the structure L1-MCS-R4-RCS-R3-L2. The gene of interest is cloned into the MCS of pRED419 and subsequently subjected to an LR reaction with a destination vector. In this step, the DNA fragment gene-R4-RCS-R3 is incorporated into the destination vector, and a binary clone carrying B1-gene-R4-RCS-R3-B2 is obtained. Next, the resulting binary clone is subjected to an LR reaction with pCON to introduce the R1-RCS-R2 sequence into the binary clone. The binary clone carrying B1-gene-B4-R1-RCS-R2-B3-B2 is then recycled in a second cycle, in which another gene is introduced by an LR reaction with another gene/pRED clone. The marker gene (M) is transcribed in the opposite orientation to the cloned gene. *B1*, *att*B1; *B2*, *att*B2; *B3*, *att*B3; *B4*, *att*B4; *L1*, *att*L1; *L2*, *att*L2; *L3*, *att*L3; *L4*, *att*L4; *R1*, *att*R1; *R2*, *att*R2; *R3*, *att*R3; *R4*, *att*R4; *M*, selection marker for plants; *Cmr*, chloramphenicol-resistance marker; *ccd*B, negative selection marker in *E. coli*; MCS, multiple cloning site; RCS, rare cutter site.
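The two-step cycle in the Fig. 12 caption can be traced with a toy model. This is our own sketch, not software from the authors. Each construct is treated as an ordered list of site names, and the LR rule attLn × attRn → attBn is applied, with the fragment between the two attL sites replacing the segment between the matching attR sites.

Several simplifications are assumptions:
- The destination vector layout omits the backbone, the T-DNA borders and the plant marker M.
- The helper names are hypothetical.
- The rare-cutter digestion and the orientation of the marker gene are not modelled.

```python
from typing import List

Construct = List[str]

# Simplified site layouts (hypothetical): backbone, borders and marker M omitted.
DESTINATION: Construct = ["attR1", "Cmr", "ccdB", "attR2"]
PCON: Construct = ["attL4", "attR1", "RCS", "attR2", "attL3"]


def pred_clone(gene: str) -> Construct:
    """gene/pRED: the gene sits in the MCS between attL1 and attR4."""
    return ["attL1", gene, "attR4", "RCS", "attR3", "attL2"]


def lr_reaction(entry: Construct, acceptor: Construct) -> Construct:
    """Toy LR reaction: attLn x attRn -> attBn (the attP by-product is discarded).

    The fragment between the two attL sites of `entry` replaces the segment
    between the matching attR sites of `acceptor`.
    """
    l_pos = [i for i, site in enumerate(entry) if site.startswith("attL")]
    if len(l_pos) != 2:
        raise ValueError("entry clone must carry exactly two attL sites")
    left, right = l_pos
    n_left, n_right = entry[left][4:], entry[right][4:]
    if f"attR{n_left}" not in acceptor or f"attR{n_right}" not in acceptor:
        raise ValueError("acceptor lacks the matching attR sites")
    r_left = acceptor.index(f"attR{n_left}")
    r_right = acceptor.index(f"attR{n_right}")
    if r_left > r_right:
        raise ValueError("attR sites are in the wrong order")
    fragment = entry[left + 1:right]
    return (acceptor[:r_left] + [f"attB{n_left}"] + fragment
            + [f"attB{n_right}"] + acceptor[r_right + 1:])


def recycling_cloning(genes: List[str], destination: Construct) -> Construct:
    construct = destination
    for gene in genes:
        # Step 1: gene-attR4-RCS-attR3 enters the attR1-attR2 acceptor site.
        construct = lr_reaction(pred_clone(gene), construct)
        # Step 2: pCON turns attR4-RCS-attR3 into a new attR1-RCS-attR2 acceptor.
        construct = lr_reaction(PCON, construct)
    return construct


if __name__ == "__main__":
    print("-".join(recycling_cloning(["geneA", "geneB"], DESTINATION)))
```

After one cycle the model yields attB1-geneA-attB4-attR1-RCS-attR2-attB3-attB2, matching the caption. Every further cycle leaves exactly one attR1-RCS-attR2 acceptor site for the next gene.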

As shown in the right panel of Figure 12, two vectors are used in each cloning cycle of this system. The recycle donor vector pRED has four *att* sites, a multiple cloning site (MCS) and a rare cutter site (RCS) in the following order: *att*L1-MCS-*att*R4-RCS-*att*R3-*att*L2. The RCS has *Asi*I, *Swa*I, *Fse*I, *Pac*I, *Asc*I and *Pme*I sites. The gene of interest is cloned into the MCS of pRED (gene/pRED), which is then subjected to an LR reaction with a destination vector containing *att*R1-*att*R2 acceptor sites. In this step, a binary construct carrying gene-*att*R4-RCS-*att*R3 is obtained. Next, the conversion vector pCON, containing *att*L4-*att*R1-RCS-*att*R2-*att*L3, is used in a second LR reaction to introduce an *att*R1-RCS-*att*R2 acceptor site into the resulting binary construct. The binary construct obtained, which carries *att*R1-RCS-*att*R2, is recycled for the next round of the cloning cycle together with another gene/pRED clone (Figure 12). Before the LR reactions, binary constructs are digested with a rare cutter to suppress colonies derived from non-recombinants. With these simple repetitive reactions, genes are introduced sequentially into one vector. Using this recycling cloning system, we made a multigene construct containing four expression units of reporter genes and confirmed expression of all four reporters in transformed tobacco BY-2 cells (Kimura, unpublished results).
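The text does not state one practical consequence of the rare-cutter digestion, so we infer it here as an assumption. The enzyme used to linearize the acceptor site should not also cut inside the expression units already assembled in the binary construct. The following minimal sketch checks candidate enzymes against insert sequences.

The recognition sequences are the standard sites for five of the six listed enzymes. AsiI is deliberately left out, and its site should be taken from the supplier. The example sequences and the function name are hypothetical. Sites in the vector backbone are not checked.

```python
from typing import Dict, Iterable, List

# Recognition sequences (5'->3') of five rare cutters listed for the pRED RCS.
# AsiI is omitted on purpose: look up its site from the supplier before adding it.
RARE_CUTTERS: Dict[str, str] = {
    "SwaI": "ATTTAAAT",
    "FseI": "GGCCGGCC",
    "PacI": "TTAATTAA",
    "AscI": "GGCGCGCC",
    "PmeI": "GTTTAAAC",
}


def usable_rare_cutters(inserts: Iterable[str]) -> List[str]:
    """Return the enzymes whose site occurs in none of the given insert sequences.

    All five sites are palindromic, so scanning one strand is sufficient.
    """
    seqs = [s.upper() for s in inserts]
    return sorted(
        name for name, site in RARE_CUTTERS.items()
        if not any(site in seq for seq in seqs)
    )


if __name__ == "__main__":
    # Hypothetical toy sequences standing in for cloned expression units.
    units = ["ATGGCGCGCCAAA", "ATGTTTAAACGG"]
    print(usable_rare_cutters(units))  # ['FseI', 'PacI', 'SwaI']
```

In the example, AscI and PmeI are excluded because their sites occur in the first and second toy inserts, respectively.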

### **6. Conclusions**


Because Gateway cloning is an efficient, reliable, easy and flexible technology, many types of Gateway vectors have been developed and are used worldwide. Our pGWB series consists of many vectors with a variety of tags and four resistance markers. They are built on the same vector backbone and therefore provide unified experimental conditions in transgenic research. Because a tag sequence can easily be introduced into pUGW (Figure 4), the number of vectors for fusion with new tags in our Gateway vector system keeps growing. Among them, vectors for fusion with split fluorescent proteins are very important tools for BiFC assays. Our Gateway-based BiFC vectors are useful when several fusion genes must be generated to detect protein-protein interactions among several factors in transient or *in vivo* assays. Introducing the R4pGWB system (Nakagawa et al., 2008) into these BiFC vectors will lead to wider applications. Recycling cloning has the potential to introduce many expression units with high efficiency and will open a new way for the genetic engineering of plants.

#### **6.1 Distribution and information updates**

All vectors described in this chapter are available for non-commercial research purposes, although permission from the original developers is required for some tags. Vectors can be requested by e-mail from mano@nibb.ac.jp (BiFC vectors) and tnakagaw@life.shimane-u.ac.jp (other pGWBs).

The list of pGWBs is updated on our website (http://shimane-u.org/nakagawa/gbv.htm).

#### **7. References**


Bracha-Drori K., Shichrur K., Katz A., Oliva M., Angelovici R., Yalovsky S., & Ohad N. (2004). Detection of protein-protein interactions in plants using bimolecular fluorescence complementation. *The Plant Journal*, Vol.40, No.3, (Nov 2004), pp. 419-427, ISSN 0960-7412 (Print), 0960-7412 (Linking)

Brand L., Horler M., Nuesch E., Vassalli S., Barrell P., Yang W., Jefferson R.A., Grossniklaus U., & Curtis M.D. (2006). A versatile and reliable two-component system for tissue-specific gene induction in Arabidopsis. *Plant Physiology*, Vol.141, No.4, (Aug 2006), pp. 1194-1204

Chen H., Zou Y., Shang Y., Lin H., Wang Y., Cai R., Tang X., & Zhou J.M. (2008). Firefly luciferase complementation imaging assay for protein-protein interactions in plants. *Plant Physiology*, Vol.146, No.2, (Feb 2008), pp. 368-376, ISSN 0032-0889 (Print), 0032-0889 (Linking)

Chen Q.J., Zhou H.M., Chen J., & Wang X.C. (2006). A Gateway-based platform for multigene plant transformation. *Plant Molecular Biology*, Vol.62, No.6, (Dec 2006), pp. 927-936

Chu J., Zhang Z., Zheng Y., Yang J., Qin L., Lu J., Huang Z.L., Zeng S., & Luo Q. (2009). A novel far-red bimolecular fluorescence complementation system that allows for efficient visualization of protein interactions under physiological conditions. *Biosensors and Bioelectronics*, Vol.25, No.1, (Sep 15 2009), pp. 234-239, ISSN 1873-4235 (Electronic), 0956-5663 (Linking)

Curtis M.D. & Grossniklaus U. (2003). A gateway cloning vector set for high-throughput functional analysis of genes in planta. *Plant Physiology*, Vol.133, No.2, (Oct 2003), pp. 462-469

Dafny-Yelin M. & Tzfira T. (2007). Delivery of multiple transgenes to plant cells. *Plant Physiology*, Vol.145, No.4, (Dec 2007), pp. 1118-1128, ISSN 0032-0889 (Print), 0032-0889 (Linking)

Day R.N., Periasamy A., & Schaufele F. (2001). Fluorescence resonance energy transfer microscopy of localized protein interactions in the living cell nucleus. *Methods*, Vol.25, No.1, (Sep 2001), pp. 4-18, ISSN 1046-2023 (Print), 1046-2023 (Linking)

Diaz I., Martinez M., Isabel-LaMoneda I., Rubio-Somoza I., & Carbonero P. (2005). The DOF protein, SAD, interacts with GAMYB in plant nuclei and activates transcription of endosperm-specific genes during barley seed development. *The Plant Journal*, Vol.42, No.5, (Jun 2005), pp. 652-662, ISSN 0960-7412 (Print), 0960-7412 (Linking)

Ding Y.H., Liu N.Y., Tang Z.S., Liu J., & Yang W.C. (2006). Arabidopsis GLUTAMINE-RICH PROTEIN23 is essential for early embryogenesis and encodes a novel nuclear PPR motif protein that interacts with RNA polymerase II subunit III. *The Plant Cell*, Vol.18, No.4, (Apr 2006), pp. 815-830, ISSN 1040-4651 (Print), 1040-4651 (Linking)

Earley K.W., Haag J.R., Pontes O., Opper K., Juehne T., Song K., & Pikaard C.S. (2006). Gateway-compatible vectors for plant functional genomics and proteomics. *The Plant Journal*, Vol.45, No.4, (Feb 2006), pp. 616-629

Fan J.Y., Cui Z.Q., Wei H.P., Zhang Z.P., Zhou Y.F., Wang Y.P., & Zhang X.E. (2008). Split mCherry as a new red bimolecular fluorescence complementation system for visualizing protein-protein interactions in living cells. *Biochemical and Biophysical Research Communications*, Vol.367, No.1, (Feb 29 2008), pp. 47-53, ISSN 1090-2104 (Electronic), 0006-291X (Linking)

Gandia J., Galino J., Amaral O.B., Soriano A., Lluis C., Franco R., & Ciruela F. (2008). Detection of higher-order G protein-coupled receptor oligomers by a combined BRET-BiFC technique. *FEBS Letters*, Vol.582, No.20, (Sep 3 2008), pp. 2979-2984, ISSN 0014-5793 (Print), 0014-5793 (Linking)

Jach G., Pesch M., Richter K., Frings S., & Uhrig J.F. (2006). An improved mRFP1 adds red to bimolecular fluorescence complementation. *Nature Methods*, Vol.3, No.8, (Aug 2006), pp. 597-600, ISSN 1548-7091 (Print), 1548-7091 (Linking)

James P., Halladay J., & Craig E.A. (1996). Genomic libraries and a host strain designed for highly efficient two-hybrid selection in yeast. *Genetics*, Vol.144, No.4, (Dec 1996), pp. 1425-1436, ISSN 0016-6731 (Print), 0016-6731 (Linking)

Karimi M., Inze D., & Depicker A. (2002). GATEWAY vectors for *Agrobacterium*-mediated plant transformation. *Trends in Plant Science*, Vol.7, No.5, (May 2002), pp. 193-195

Karimi M., Bleys A., Vanderhaeghen R., & Hilson P. (2007a). Building blocks for plant gene assembly. *Plant Physiology*, Vol.145, No.4, (Dec 2007), pp. 1183-1191

Karimi M., Depicker A., & Hilson P. (2007b). Recombinational cloning with plant gateway vectors. *Plant Physiology*, Vol.145, No.4, (Dec 2007), pp. 1144-1154

Kodama Y. & Wada M. (2009). Simultaneous visualization of two protein complexes in a single plant cell using multicolor fluorescence complementation analysis. *Plant Molecular Biology*, Vol.70, No.1-2, (May 2009), pp. 211-217, ISSN 1573-5028 (Electronic), 0167-4412 (Linking)

Koizumi N., Ujino T., Sano H., & Chrispeels M.J. (1999). Overexpression of a gene that encodes the first enzyme in the biosynthesis of asparagine-linked glycans makes plants resistant to tunicamycin and obviates the tunicamycin-induced unfolded protein response. *Plant Physiology*, Vol.121, No.2, (Oct 1999), pp. 353-361, ISSN 0032-0889 (Print), 0032-0889 (Linking)

Koizumi N. & Iwata Y. (2008). Construction of a binary vector for transformation of *Arabidopsis thaliana* with a new selection marker. *Bioscience, Biotechnology, and Biochemistry*, Vol.72, No.11, (Nov 2008), pp. 3041-3043

Lee L.Y., Fang M.J., Kuang L.Y., & Gelvin S.B. (2008). Vectors for multi-color bimolecular fluorescence complementation to investigate protein-protein interactions in living plant cells. *Plant Methods*, Vol.4, (2008), pp. 24, ISSN 1746-4811 (Electronic), 1746-4811 (Linking)

Loyter A., Rosenbluh J., Zakai N., Li J., Kozlovsky S.V., Tzfira T., & Citovsky V. (2005). The plant VirE2 interacting protein 1. A molecular link between the *Agrobacterium* T-complex and the host cell chromatin? *Plant Physiology*, Vol.138, No.3, (Jul 2005), pp. 1318-1321, ISSN 0032-0889 (Print), 0032-0889 (Linking)

Ludewig U., Wilken S., Wu B., Jost W., Obrdlik P., El Bakkoury M., Marini A.M., Andre B., Hamacher T., Boles E., von Wiren N., & Frommer W.B. (2003). Homo- and hetero-oligomerization of ammonium transporter-1 NH4 uniporters. *The Journal of Biological Chemistry*, Vol.278, No.46, (Nov 14 2003), pp. 45603-45610, ISSN 0021-9258 (Print), 0021-9258 (Linking)

Mano S. & Nishimura M. (2005). Plant peroxisomes. *Vitamins & Hormones*, Vol.72, (2005), pp. 111-154, ISSN 0083-6729 (Print), 0083-6729 (Linking)

Maple J., Aldridge C., & Moller S.G. (2005). Plastid division is mediated by combinatorial assembly of plastid division proteins. *The Plant Journal*, Vol.43, No.6, (Sep 2005), pp. 811-823, ISSN 0960-7412 (Print), 0960-7412 (Linking)

Marrocco K., Zhou Y., Bury E., Dieterle M., Funk M., Genschik P., Krenz M., Stolpe T., & Kretsch T. (2006). Functional analysis of EID1, an F-box protein involved in phytochrome A-dependent light signal transduction. *The Plant Journal*, Vol.45, No.3, (Feb 2006), pp. 423-438, ISSN 0960-7412 (Print), 0960-7412 (Linking)


signals in mixed DNA cloning by the Multisite Gateway system. *Journal of Biotechnology*, Vol.107, No.3, (Feb 5 2004), pp. 233-243

