#### **3.1 Homologous recombination**

Homologous recombination occurs throughout the genomes of diploid organisms immediately before cell division, in the late S or G2 phases of the cell cycle, and is responsible for recombining large pieces of DNA that have very similar sequences [12]. The mechanism of homologous recombination is complex and may involve many enzymes, but it is very accurate and tightly controlled. It can repair double-stranded breaks with either single or double ends, even those with covalently attached proteins [13]. Holliday junctions are formed, and their resolution determines the outcome.

#### **3.2 Nonhomologous recombination**

Nonhomologous (illegitimate) recombination occurs in regions where no large-scale sequence similarity is apparent and is responsible for translocations between nonhomologous chromosomes or deletions of several genes from a chromosome [13]. It is the main mechanism for DNA repair that takes place throughout the cell cycle, repairing DNA damage due to chemicals and UV light. It efficiently restores chromosomal integrity at the risk of introducing local sequence errors.

The mechanisms of nonhomologous recombination are nonhomologous end-joining (NHEJ) and alternative NHEJ (altNHEJ, also known as microhomology-mediated end-joining, MMEJ). They involve the ligation of two double-stranded break ends with little or no sequence homology, without the need for a repair template [13, 14].

#### **3.3 Replicative recombination**

This is a specialized type of recombination in which a segment of DNA is translocated from one location on a chromosome to another on the same or another chromosome in a process that involves the generation of a new copy of a segment of DNA [11]. Many transposable genetic elements use this process to generate a new copy of the transposable genetic element at a new location.

#### **3.4 Site-specific recombination**

Site-specific recombination (SSR) is widespread in prokaryotes, involves much shorter DNA segments and requires specific nucleotide sequences that are recognized by specific proteins known as recombinases. The lambda integrase system for integration into the *E. coli* genome was the first to be discovered, but many more systems have since been discovered and characterized. Site-specific recombination brings together two short DNA sequences at separate locations on the same or separate DNA molecules, with the cutting and re-joining of the DNA molecules in a recombination reaction catalyzed by specific SSR enzyme systems [10, 15]. The process is conservative since it does not involve DNA synthesis or degradation, or any high-energy cofactors such as ATP, and is thus distinct from homologous, nonhomologous and replicative recombination. The outcomes of SSR are integration/excision, inversion or linear recombination, depending on the initial orientation of the two target sites.

The conservative site-specific recombinases can be classified into two families, serine family recombinases (formerly known as invertase/resolvase) and tyrosine family recombinases (formerly known as integrase), based on the amino acid that acts as the active site nucleophile during DNA breakage [15]. An example of a serine family recombinase is bacteriophage PhiC31 integrase. Examples of tyrosine family recombinases are Lambda integrase, Cre recombinase and Flp recombinase.

The serine family recombinases carry out DNA inversion or DNA resolution (excision) reactions. The mechanism involves staggered double-stranded breaks in the two parallel dsDNA molecules participating in the exchange, followed by a 180° rotation of the recombination complex (in a plane perpendicular to that of the DNA molecules) and then ligation. The tyrosine family recombinases carry out DNA integration reactions. The mechanism involves the formation of a Holliday junction because initial cuts are made in only one (inner) strand of each of two dsDNA molecules positioned antiparallel to each other, and the cut strands are rejoined across the molecules. The Holliday junction is resolved when the outer DNA strands are also cut and rejoined to result in recombinant DNA strands [15]. The reader is referred to Jayaram et al. [15] for more details of recombination geometries.

In plant biotechnology, the Cre-*lox*P system is a historically prominent SSR system and will be considered in more detail below. Recently, the CRISPR/Cas9 system and related nuclease variants have gained great prominence and will also be considered in detail.

#### *3.4.1 Cre-loxP recombination system*

The Cre-*lox*P site-specific recombination system is based on a naturally occurring bacteriophage P1 system. The name 'cre' is derived from '*c*auses/*c*yclization *r*ecombination' while '*lox*P' is derived from '*lo*cus of crossing over, *x*, in P1'. The *lox*P site is composed of a 34 bp consensus sequence consisting of an 8 bp nonsymmetrical central region flanked by two 13 bp palindromic sequences. Cre recombinase is a 38 kDa protein that catalyses the recombination of two *lox*P recognition sites on the same or different DNA strands using tyrosine 324 for the nucleophilic attack [9]. The recombination takes place via a Holliday junction intermediate formed by two antiparallel DNA molecules/segments, with a dimer of Cre recombinase subunits bound to each *lox*P site. Two opposite active Cre recombinase subunits catalyse strand cleavage, exchange and ligation at the 8 bp nonsymmetric central region, thus resolving the Holliday junction intermediate. Excision of DNA flanked by two *lox*P sites occurs if the two sites have the same orientation; if their orientation is opposite, then inversion of the intervening sequence occurs. Strand exchange or translocation will occur if two *lox*P sites located on different DNA molecules recombine.
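The site architecture and orientation rules described above can be illustrated with a minimal Python sketch. The wild-type *lox*P sequence used here is the commonly cited one and is an assumption, since the text does not quote it; the function names are hypothetical and chosen only for illustration.

```python
# Commonly cited wild-type loxP site (assumption: not quoted in the text).
LEFT_ARM = "ATAACTTCGTATA"   # 13 bp
SPACER = "GCATACAT"          # 8 bp nonsymmetrical central region
RIGHT_ARM = "TATACGAAGTTAT"  # 13 bp
LOXP = LEFT_ARM + SPACER + RIGHT_ARM

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def split_loxp(site: str):
    """Split a 34 bp site into (13 bp arm, 8 bp spacer, 13 bp arm)."""
    if len(site) != 34:
        raise ValueError("a loxP site is expected to be 34 bp long")
    return site[:13], site[13:21], site[21:]


def arms_are_inverted_repeats(site: str) -> bool:
    """True if the two 13 bp arms are reverse complements of each other."""
    left, _, right = split_loxp(site)
    return right == reverse_complement(left)


def spacer_is_asymmetric(site: str) -> bool:
    """True if the spacer differs from its own reverse complement,
    which is what gives the site a direction."""
    _, spacer, _ = split_loxp(site)
    return spacer != reverse_complement(spacer)


def predict_outcome(same_molecule: bool, same_orientation: bool) -> str:
    """Outcome of Cre-mediated recombination between two loxP sites,
    following the rules stated in the text."""
    if not same_molecule:
        return "strand exchange/translocation"
    return "excision" if same_orientation else "inversion"
```

For example, `predict_outcome(same_molecule=True, same_orientation=False)` returns `"inversion"`. The asymmetric spacer is what allows two sites to be described as having the same or opposite orientation in the first place.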

#### *3.4.2 CRISPR/Cas9 and other variants*

The CRISPR system was first reported in 1987 in *E. coli* where it functions as a form of adaptive immunity against invading nucleic acid [16] and has since been shown to be of ubiquitous occurrence [17]. Many variations have since been discovered in nature, and modifications have also been introduced by genetic engineering for ease of use.

The CRISPR/Cas9 system currently used is composed of an RNA-guided DNA endonuclease, the Cas9 protein, complexed with a guide RNA (gRNA). The targeting region of the gRNA is only 20 nucleotides long and is complementary to the target DNA, to which it recruits the Cas9 protein [18]. The Cas9/gRNA complex then binds to a short but specific protospacer adjacent motif (PAM) sequence at the 3′ end of the target sequence. For Cas9 from *Streptococcus pyogenes*, the PAM sequence is 5′-NGG-3′. The Cas9 protein then introduces a double-stranded break (DSB) in the target sequence. The DSB will be repaired by homologous recombination (HR) or NHEJ, resulting in insertion, deletion, or fragment replacement within the target site. Thus, recombination will be effected.
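The target-site requirement described above (a 20-nucleotide sequence immediately followed at its 3′ end by an NGG PAM) can be sketched as a simple sequence scan. This is a minimal illustration under stated assumptions, not a guide-design tool: the function name is hypothetical, only the strand that is passed in is searched, and no off-target or efficiency scoring is attempted.

```python
import re


def find_spcas9_targets(dna: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for every site on the given strand
    where `protospacer_len` bases are immediately followed by 5'-NGG-3'.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    dna = dna.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    # Toy sequence, invented purely for illustration.
    example = "A" * 20 + "TGG" + "CCC"
    for start, protospacer, pam in find_spcas9_targets(example):
        print(start, protospacer, pam)
```

To cover both strands, the same function could be applied to the reverse complement of the sequence, with coordinates converted back accordingly.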

## **4. Cisgenics**

Cisgenesis is defined as the genetic modification of a recipient plant with a natural gene (in the sense orientation, with its natural promoter, terminator and introns) from a sexually compatible plant [5]. A closely related concept is that of intragenesis, where an additional hybrid copy of a gene from the same or a crossable species is introduced in sense or antisense orientation, combining promoter, coding region and terminator from different genes [19–22]. In intragenesis, therefore, some changes or reshuffling of coding or control regions of the natural gene(s) will have occurred, unlike in cisgenesis. In addition, Rommens et al. [4] stipulate that for *Agrobacterium*-mediated transformation, border sequences derived from plants (P-DNA) should be used in place of T-DNA. Cisgenesis and intragenesis are contrasted with transgenesis, which is the genetic modification of a recipient plant with one or more genes from any non-plant organism, or from a donor plant that is sexually incompatible with the recipient plant. Holme et al. [21] discuss the varying stringency with which the term 'cisgenic' has been used over the years. The strictest definitions of the terms are advocated, since technological advances now enable more precise genetic modification followed by more detailed sequence analysis of the resulting genetically modified plants. This would also facilitate the implementation of different regulatory regimes for cisgenic and transgenic plants.

Early definitions of cisgenesis emphasized the source of the gene of interest used in transformation and may not have insisted on the complete absence of other accompanying sequences. At that time, almost all transgenic plants were developed using *Agrobacterium*-mediated transformation or biolistics, with the gene of interest being introduced as part of a binary plasmid with the reporter and selectable marker genes. The least stringent definition did not fully consider the possible presence of these extra genetic sequences, and the sites of insertion. Later reports of cisgenesis included procedures to remove extra sequences via traditional crossing or by site-specific recombination procedures.

The strictest definition of cisgenesis should apply only when the procedures through which the plant was modified do not involve any DNA sequences, however short or procedurally essential, from any non-plant organism or sexually incompatible plant. This strict definition has become practicable because of recently developed tools for site-specific recombination and genome editing. Some examples are considered below.

Many different strategies have been used to achieve the marker-free status that is required for cisgenic plants. Where transformation efficiencies are high, plant transformation can be carried out using constructs that do not have selectable markers; transformed lines are identified by screening for the specific gene sequences that have been introduced. Biolistic transformation using appropriate minimal cassettes has also been suggested [7]. Without a selectable marker, however, many lines must be analysed, which makes the approach time-consuming and expensive.

In an alternative strategy, constructs in which selectable markers are flanked by site-specific recombination sites have been used. The selectable markers are later deleted from transformed plants following induction of the site-specific recombination system. Examples where this approach was used include intragenic strawberries [23] and cisgenic and intragenic apples [24, 25]. In maize, a series of transgenic lines that express five different recombinases has been generated and can be used for selectable marker removal and transgene integration into specific loci [26].

Marker-free transformants may also be obtained through a co-transformation strategy, where the selectable marker and the transgene of interest are introduced on different vector constructs so that they integrate into different locations on the plant genome. The two genes may then segregate into different progeny in subsequent generations. Cisgenic durum wheat [27] and cisgenic barley [28] were generated using this strategy.
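How often a marker-free segregant is expected can be worked out for the simplest case. The sketch below assumes, purely for illustration, that the gene of interest and the selectable marker are each present as a single hemizygous insertion at unlinked loci in the primary transformant, that this plant is self-fertilised, and that both insertions segregate in a Mendelian fashion; none of these conditions is stated in the text, and real events may involve linked or multi-copy insertions.

```python
from fractions import Fraction
from itertools import product


def marker_free_fraction() -> Fraction:
    """Fraction of selfed progeny that carry the gene of interest but
    not the selectable marker, under the assumptions listed above."""
    # Each gamete either carries (1) or lacks (0) each insertion;
    # unlinked loci assort independently, so all four types are equally likely.
    gametes = list(product((0, 1), repeat=2))  # (gene_of_interest, marker)
    wanted = 0
    total = 0
    for egg, pollen in product(gametes, repeat=2):
        total += 1
        has_gene = bool(egg[0] or pollen[0])
        has_marker = bool(egg[1] or pollen[1])
        if has_gene and not has_marker:
            wanted += 1
    return Fraction(wanted, total)


if __name__ == "__main__":
    print(marker_free_fraction())  # 3/16
```

Under these assumptions, three in sixteen progeny are expected to carry the gene of interest without the marker (3/4 carrying the gene multiplied by 1/4 lacking the marker), which indicates the scale of screening needed in the following generation.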

In all these strategies, however, the site of integration of the transgene is random, and there is always a chance that vector backbone sequences may also be integrated into the plant. Recent work with CRISPR-based strategies has attempted to address these shortcomings.

## **5. Genome-editing technologies and cisgenics**

Genome editing is the addition, removal or alteration of genetic material (at particular locations) in the genome of an organism. Concurrent developments in site-specific recombination and genome sequencing technologies have made (precision) genome editing a reality. It is now possible to sequence the whole genome of an organism in a very short period and at a cost that is affordable to research laboratories. Many site-specific recombination systems have been developed into technologies that can target specific sites in the genome at which specific, predetermined changes will be introduced. Re-sequencing of the genome will verify the specificity of the modifications.

To initiate genome editing, double-stranded breaks are made in the target genome at the site to be edited. Many tools have been developed for precision targeting of these double-stranded breaks. These include meganucleases, zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and CRISPR/Cas. The requirements for recognition of specific DNA nucleotide sequence sites and the mechanisms of cleavage for these nucleases are reviewed in detail elsewhere [10]. Once the double-stranded cuts have been made in a DNA molecule, endogenous cellular factors recognise and bind to these sites of discontinuity and initiate repair by either HR or NHEJ mechanisms, resulting in addition, removal or some other kind of alteration of the DNA nucleotide sequence following the design of the editing system used. In this paper, we will use the CRISPR/Cas system to further illustrate this, and show how cisgenic plants *sensu stricto* can be obtained.

Truly cisgenic plants should be reporter- and selection marker-free, should not contain sequences from non-crossable species, and the editing must be done by a precise mechanism at a pre-determined genomic site. Most reports on genome editing do not result in cisgenic plants because they fail to satisfy at least one of these requirements. Most reports use selection marker genes or DNA plasmids with sequences of bacterial or other origin, or introduce coding or flanking sequences that have been modified from their native state in the crossable species from which they are derived.
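The requirements listed above can be read as a simple checklist. The following sketch encodes them as such; the record type, its field names and the wording of the messages are hypothetical and introduced only to make the criteria explicit, not to represent any regulatory test.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EditedLine:
    """Hypothetical summary of what is known about an edited plant line."""
    has_reporter_gene: bool
    has_selection_marker: bool
    has_non_crossable_sequence: bool
    edited_at_predetermined_site: bool
    introduced_sequence_is_native: bool


def unmet_requirements(line: EditedLine) -> List[str]:
    """List which of the strict cisgenic requirements a line fails."""
    problems = []
    if line.has_reporter_gene:
        problems.append("reporter gene present")
    if line.has_selection_marker:
        problems.append("selection marker present")
    if line.has_non_crossable_sequence:
        problems.append("sequence from a non-crossable species present")
    if not line.edited_at_predetermined_site:
        problems.append("edit not at a pre-determined genomic site")
    if not line.introduced_sequence_is_native:
        problems.append("introduced sequence modified from its native state")
    return problems


def is_cisgenic_sensu_stricto(line: EditedLine) -> bool:
    """True only if every requirement is met."""
    return not unmet_requirements(line)
```

A line generated with a plasmid-borne selectable marker, for instance, would be reported as failing on "selection marker present" and, if any vector backbone had integrated, on the non-crossable sequence criterion as well.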

Recent developments in the use of the CRISPR/Cas system in plant genome editing are reviewed by Wada et al., Metje-Sprink et al. and Nadakuduti et al. [18, 29, 30]. A strategy that would inspire confidence in both consumers and regulators is one where the transformation method does not involve the use of DNA sequences at all. At least two such DNA-free genome editing strategies have been reported. The first involves the use of viral RNA vectors. The second uses pre-assembled CRISPR/Cas ribonucleoproteins, with only a short gRNA and no other nucleic acids.

An example of the first approach is the RNA virus-vectored system described by Ma et al. [31]. They engineered the negative-strand RNA virus Sonchus yellow net rhabdovirus (SYNV) by inserting the CRISPR sequences for the guide RNA and Cas9 protein between the N and P genes of the virus. No selection marker was used. Infection was carried out by mechanical inoculation or by agroinfiltration of transformed *Agrobacterium* cells. Over 90% of plants regenerated from virus-infected tissue carried the intended deletion of the target GFP gene used in the experiment [31]. The system must now be evaluated using an agriculturally important gene.

In the second approach, pre-assembled CRISPR/Cas9 ribonucleoproteins can be transfected into protoplasts or in vitro fertilized zygotes [32, 33]. This has been successfully done in rice zygotes [33]. However, the difficulty of regenerating whole plants from protoplasts makes this method inapplicable to many important species. The ribonucleoprotein or RNA may also be biolistically delivered into immature embryo cells or calli. This has been done with wheat [34, 35], maize [36] and rice [37]. However, the efficiency of editing is very low.

While there are thousands of CRISPR systems, most of the work has been done using the CRISPR/Cas9 system. However, the recently discovered system from *Prevotella* and *Francisella* (CRISPR/Cpf1, renamed CRISPR/Cas12a) appears to be easier to adapt to DNA-free applications. This is mainly because the Cas12a protein is smaller and will thus be easier to transfect into cells [29].
