#### **4.1 Effects of non-B DNA structures on DNA replication / transcription**

Some regions of DNA form non-B DNA structures during replication or transcription, which may in turn affect these DNA transactions (Van Holde & Zlatanova, 1994; Samadashwily et al., 1997; Krasilnikova et al., 2004; Lin et al., 2006; Mirkin & Mirkin, 2007).


One of the best-studied effects of non-B structures on replication is the blocking of polymerases by template folding, which has been shown for cruciforms/hairpins and H-DNA (Samadashwily et al., 1997; Krasilnikova et al., 2004; Voineagu et al., 2009).

It has been found that triplex DNA can adversely affect DNA replication and potentially lead to replication fork collapse (Samadashwily et al., 1997; Krasilnikova et al., 2004; Voineagu et al., 2009). The polypurine strand of a triplex-forming duplex may not serve as a template, which increases its chance of remaining single-stranded and of forming an intermolecular or intramolecular triplex (Hile & Eckert, 2004; Urban et al., 2010). Besides, a non-B DNA structure may itself directly slow the progression of the replication fork (Samadashwily et al., 1997; Mirkin & Mirkin, 2007; Trinh & Sinden, 1991). Such non-B DNA structures may be an obstacle to fork progression or a target for nucleolytic attack, thus allowing DNA breakage that leads to deletion or recombination (Mirkin, 2006; Kim et al., 2006).

In contrast, the single-stranded parts of a cruciform or H-DNA may serve as recognition elements for replication initiation proteins. For example, cruciform binding proteins (CBP), such as 14-3-3sigma in HeLa cells, recruit replication proteins to a cruciform to start replication (Alvarez et al., 2002; Novac et al., 2002). Therefore, it is possible for a hairpin/cruciform DNA sequence to behave like a replication "origin", inducing origin-independent DNA replication. A similar mode of DNA replication has been found in *E. coli* and named stable DNA replication. More interestingly, origin-independent DNA replication has also been proposed as a mechanism for the production of expanded DNA repeats (Pan, 2006).

In addition, certain non-B DNA structures can also interfere with RNA transcription and recombination (Van Holde & Zlatanova, 1994; Broxson et al., 2011). Conversely, RNA transcription can also promote the formation of non-B DNA structures, including hairpins, triplexes and G4 DNA (Van Holde & Zlatanova, 1994; Broxson et al., 2011).
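The hairpin- and cruciform-forming sequences discussed in this section are inverted repeats, so candidate sites can be flagged from sequence alone. The following Python sketch is a minimal illustration under stated assumptions: the function name `find_inverted_repeats` and the thresholds `min_arm` and `max_loop` are hypothetical choices for this example, and a hit only indicates sequence potential, not that a structure actually forms in vivo.

```python
from typing import List, Tuple

# Watson-Crick partner of each base.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_inverted_repeats(seq: str, min_arm: int = 6, max_loop: int = 4) -> List[Tuple[int, int, int]]:
    """Return (start, arm_length, loop_length) for each inverted repeat.

    For every possible loop position, the two arms are extended outwards
    for as long as the bases can pair, as they would in a hairpin stem.
    """
    seq = seq.upper()
    hits = []
    for left_end in range(len(seq)):          # last base of the left arm
        for loop in range(max_loop + 1):      # unpaired bases between the arms
            right_start = left_end + 1 + loop
            arm = 0
            while (left_end - arm >= 0
                   and right_start + arm < len(seq)
                   and PAIR.get(seq[left_end - arm]) == seq[right_start + arm]):
                arm += 1
            if arm >= min_arm:
                hits.append((left_end - arm + 1, arm, loop))
    return hits


# Example: a 6-bp stem (GCGCAT / ATGCGC) separated by a 4-nt loop.
print(find_inverted_repeats("AAGCGCATTTTTATGCGCCC"))
```

Raising `min_arm` or lowering `max_loop` makes the search more stringent; both values here are illustrative only.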

#### **4.2 Modulation of supercoiling and promoting transcription**

The extent of supercoiling in a DNA segment is known to affect transcription, recombination, and replication, such that an appropriate DNA topology may be critical for them. It has been found that formation of cruciforms, Z-DNA and H-DNA causes partial relaxation of excessive superhelicity in a topological domain. Specific cases of DNA replication and gene expression have also been described as superhelicity-dependent events induced by formation of cruciforms, Z-DNA and H-DNA.
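The relaxation described above can be put in rough numbers. The sketch below uses the standard definition of superhelical density, σ = ΔLk / Lk0 with Lk0 ≈ N / 10.5 for relaxed B-DNA, and assumes that extruding a cruciform from n bp of duplex removes about n / 10.5 helical turns and so absorbs the same number of negative supercoils. The function names and the example domain are hypothetical; Z-DNA and H-DNA relax a different amount per base pair and are not modelled here.

```python
HELICAL_REPEAT_BP = 10.5  # bp per helical turn of relaxed B-DNA (textbook value)


def superhelical_density(length_bp: int, delta_lk: float) -> float:
    """sigma = delta_Lk / Lk0, where Lk0 = length / helical repeat."""
    lk0 = length_bp / HELICAL_REPEAT_BP
    return delta_lk / lk0


def density_after_cruciform(length_bp: int, delta_lk: float, cruciform_bp: int) -> float:
    """Effective sigma of the domain once cruciform_bp base pairs are extruded.

    Assumption: the extruded base pairs no longer contribute duplex twist,
    so cruciform_bp / 10.5 negative supercoils are absorbed.
    """
    absorbed_turns = cruciform_bp / HELICAL_REPEAT_BP
    return superhelical_density(length_bp, delta_lk + absorbed_turns)


# Hypothetical 4200-bp domain with 24 negative supercoils and a 42-bp cruciform.
before = superhelical_density(4200, -24)
after = density_after_cruciform(4200, -24, 42)
print(f"sigma before: {before:.3f}, after extrusion: {after:.3f}")
```

In this illustrative case the cruciform absorbs four of the 24 negative supercoils, which is the kind of partial relaxation the paragraph refers to.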

#### **4.3 Accumulation of DNA damage causing increased mutability within non-B DNA structure-forming sequences or their flanking sequences**

DNA sequences that are prone to adopting non-B DNA secondary structures are associated with hot spots of genomic instability, where repeat expansions, chromosomal fragility, or gross chromosomal rearrangements can often be seen. For example, long repeating tracts of CTG·CAG, CCTG·CAGG, and GAA·TTC are associated with the etiology of myotonic dystrophy type 1 (DM1), myotonic dystrophy type 2 (DM2), and Friedreich's ataxia (FRDA), respectively (Wells, 2007). The repeating sequences involved have the potential to adopt a variety of non-B DNA secondary structures (McMurray, 1999; Pan, 2004, 2006, 2009). Studies in various model systems, including *Escherichia coli* and mammalian cell lines such as COS-7, CV-1, and HEK-293, have revealed that conditions promoting the formation of non-B DNA structures enhance repeat instability. Such instability can occur both within the repeat sequences and in flanking sequences of up to ~4 kbp (Wojciechowska et al., 2006).


Indeed, it has been found that DNA double-strand breaks (DSBs) can sometimes accumulate at or around repeating sequences, and error-prone repair pathways have also been proposed to be involved in forming gross DNA rearrangements (Kurahashi et al., 2006). Moreover, DNA breaks may also occur in single-stranded or structured regions when these serve as targets of nuclease activity, leading to enhanced mutagenesis or recombination. The breakpoints of a disease-causing translocation cluster within a 150-bp genomic region of the *bcl*-2 gene that can potentially form a triplex DNA structure (Adachi & Tsujimoto, 1990; Raghavan et al., 2004, 2005).

It has long been known that DNA replication performs differently on the leading and lagging strand templates of the *E. coli* chromosome. Replication errors and SOS mutator effects occur preferentially in the lagging strand, while intermolecular strand switch events during DNA replication occur preferentially in the leading strand (Iwaki et al., 1995; Trinh & Sinden, 1995; Iwaki et al., 1996; Fijalkowska et al., 1998; Sinden et al., 1999; Maliszewska-Tkaczyk, 2002; Gawel et al., 2002; Hashem & Sinden, 2005). Similarly, unequal fidelities have also been found for deletions between direct repeats in the leading strand template (Hashem & Sinden, 2005). This may be attributable to the potential for non-B DNA structure formation in the leading and lagging strand templates during DNA replication. Likewise, the replication fidelity of various inverted repeats and direct repeats, including trinucleotide repeats, can also be compromised if they adopt non-B DNA conformations, such as hairpin, cruciform, triplex or tetraplex DNA, potentially leading to mutations or rearrangements (Pan & Leach, 2000; Sinden et al., 2002).
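Because the instability discussed in this section centres on long tracts of a few specific repeat units, locating such tracts in a sequence is a natural first step in any analysis. The Python sketch below is a minimal example that searches one strand for uninterrupted runs of the CTG·CAG, CCTG·CAGG and GAA·TTC units named above. The function name `find_repeat_tracts` and the `min_units` cut-off are assumptions made for illustration and do not correspond to any clinical or experimental threshold.

```python
import re
from typing import List, Tuple

# Repeat units and the disorders they are associated with in the text (Wells, 2007).
DISEASE_REPEATS = {"CTG": "DM1", "CCTG": "DM2", "GAA": "FRDA"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_repeat_tracts(seq: str, min_units: int = 5) -> List[Tuple[int, str, int, str]]:
    """Return (start, unit, n_units, disorder) for each uninterrupted tract.

    Each unit is searched together with its reverse complement, since the
    given strand may carry either half of the repeat (e.g. CTG or CAG).
    """
    seq = seq.upper()
    tracts = []
    for motif, disorder in DISEASE_REPEATS.items():
        for unit in (motif, reverse_complement(motif)):
            for m in re.finditer(f"(?:{unit}){{{min_units},}}", seq):
                n_units = (m.end() - m.start()) // len(unit)
                tracts.append((m.start(), unit, n_units, disorder))
    return sorted(tracts)


# Example: a (CTG)6 tract followed by a (TTC)5 tract.
example = "AT" + "CTG" * 6 + "GGG" + "TTC" * 5 + "A"
for tract in find_repeat_tracts(example):
    print(tract)
```

Interrupted or imperfect tracts are deliberately ignored here to keep the example short.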

#### **4.4 Nucleosome exclusion**

In eukaryotes, the wrapping of chromosomal DNA around histones in nucleosomes interferes with protein binding to promoters and origins of replication. Nucleosome formation, on the one hand, and formation of cruciforms, Z-DNA and triplex DNA, on the other hand, are mutually exclusive. Thus, alternative structure-forming DNA sequences may expose nucleosome-free DNA, making it accessible to transcription, replication and recombination proteins as well as nucleases, and producing fragile sites in chromosomes (Schwartz et al., 2006; Lukusa & Fryns, 2008).

Fragile sites are specific loci that appear as constrictions, gaps, or breaks on chromosomes from cells exposed to partial inhibition of DNA replication (Schwartz et al., 2006; Lukusa & Fryns, 2008). At the chromosomal level, fragile sites always lack nucleosomes, and can sometimes be associated with trinucleotide repeats (TNRs) of CGG·CCG, CAG·CTG, GAA·TTC and GCN·NGC, with specific G-rich tetra- to dodecanucleotide repeats, or with long AT-rich repeats, such as the 33 or 42 minisatellites in the FRA16B and FRA10B common fragile sites (Wang & Griffith, 1996). At the same time, fragile sites can be classified as rare or common, depending on their frequency within the population and their specific mode of induction. So far, more than 89 common fragile sites are listed in GDB (Gene Databases); these are considered to be an intrinsic part of the chromosomal structure present in all individuals. Six common fragile sites have been cloned and characterized: FRA3B (Huebner & Croce, 2001; Lettessier et al., 2011), FRA7G, FRA7H, FRA16D (Shah et al., 2010), FRAXB, and FRA6F. Common fragile site instability has been attributed to the fact that these sites contain sequences prone to forming secondary structures that may impair replication fork movement, possibly leading to fork collapse and resulting in DNA breaks. Most rare fragile sites are induced by folate shortage, and others are induced by DNA minor groove binders. So far, seven folate-sensitive (FRA10A, FRA11B, FRA12A, FRA16A, FRAXA, FRAXE and FRAXF) and two nonfolate-sensitive (FRA10B and FRA16B) fragile sites have been molecularly characterized. Interestingly, almost all of these fragile sites have been found to carry expanded DNA repeats resulting from mutation involving the normally occurring polymorphic CCG/CGG trinucleotide repeats and AT-rich minisatellite repeats (Balakumaran et al., 2000; Voineagu et al., 2009).

The expanded repeats have also been shown to have the potential, at least under certain circumstances, to form stable secondary non-B DNA structures, including intrastrand hairpins, slipped-strand DNA or tetrahelical structures, or to present flexible repeat sequences, both of which are expected to affect replication. In addition, these DNA sequences have also been found to decrease the efficiency of nucleosome assembly, resulting in decondensation defects seen as fragile sites (Wang & Griffith, 1996; Freudenreich, 2007).

#### **5. Genes and gene products that are involved in abnormal folding**

Numerous proteins that interact with non-B DNA secondary structures have been characterized recently. These proteins may also be called DNA structure-specific proteins, and include Rad1, Rad2, Rad10, Msh2, Msh3, BLM, WRN and Sgs1 (Bhattacharyya & Lahue, 2004; Nag & Cavallo, 2007; Kantelinen et al., 2010; Pichierri et al., 2011). They can be further classified by function into several distinct groups, depending on their possible effects on the formation and stability of non-B DNA structures. Some of the binding proteins may increase the stability of the bound non-B DNA secondary structures, some may promote the formation of such structures, and others may destabilize them. Indeed, the available data implicate various proteins participating in mismatch repair, nucleotide excision repair, base excision repair and homologous recombination in recognizing non-B DNA secondary structures, in an attempt to avoid so-called structure-directed mutagenesis.

As discussed previously, DNA structures can often induce DNA mutations. This DNA structure-mediated mutagenesis may arise for the following reasons. First, the abnormal positioning of the bases and sugars in non-B DNA conformations may impair the function of some DNA repair proteins on damaged DNA. For example, alkylation damage such as *N7*-methylguanine or *O6*-methylguanine is not repaired as efficiently in Z-DNA as it is in B-DNA. Alternatively, the formation of DNA secondary structures near DNA damage sites might influence damage repair processing, depending on the type of damage, the environment, and the nature of the secondary structure (Pfohl-Leszkowicz et al., 1983; Boiteux et al., 1985).

#### **5.1 MMR proteins**

It has long been known that MMR deficiency is associated with microsatellite sequence instability and human disease. For example, the instability of TNRs and AT-rich minisatellites is associated with their capacity to adopt unusual secondary structures, such as hairpins or DNA triplexes. This feature is common to different types of repeated DNA. Accordingly, repeat instability is dependent on MMR in mice and yeast, consistent with the observation that sequences at repetitive DNA sites form short hairpins or small loops that are targets of the Msh2–Msh6 MMR complex (Modrich, 2006).

MMR proteins bind to non-B DNA secondary structures mainly through their capacity to recognize mismatched base pairs. It has been found that MMR proteins bind mismatches in the stem of a CNG triplet repeat hairpin. Although the MSH2–MSH3 complex also binds perfect hairpins formed by inverted repeats (lacking mismatched regions), its affinity is low, suggesting that mismatches are important for MMR protein binding (Kantelinen et al., 2008). In addition, MutS has also been reported to bind parallel G4 DNA in humans (Fry, 2007).

#### **5.2 NER and HR proteins**


NER proteins, such as UvrB and UvrC in *E. coli* and XPA, XPG and XPC in eukaryotes, and homologous recombination proteins, such as RecA and HsRad51, were found to be involved in H-DNA-mediated repair and recombination (Bacolla et al., 2001). UvrB and UvrC may preferentially recognize the helical distortions, while RecA recognizes the single-stranded DNA region in H-DNA.

#### **5.3 Helicases and junction resolvases**

Proteins that preferentially catalyze the unwinding of non-B DNA secondary structures are DNA helicases, which act in an ATP hydrolysis-dependent manner and preferentially melt some of the non-B DNA structures. The selectivity of helicases for non-B DNA secondary structures has been identified in simian virus 40 (SV40), yeast and human cells. The most studied helicases are members of the RecQ family, whose roles are found in a broad range of organisms, from *E. coli* RecQ to human WRN, BLM and RecQL4 (Mohaghegh et al., 2001; Bachrati & Hickson, 2003; Cobb & Bjergbaek, 2006; Masai, 2011). All helicases that unwind non-B DNA secondary structures act catalytically, and all require hydrolysis of a nucleotide triphosphate, normally ATP, and the presence of Mg²⁺ ions. For example, G-quadruplex DNA substrates are unwound by RecQ helicases with a 3'→5' polarity, and the tetraplex must carry a short 3' single-stranded tail that serves as a "loading dock" for these enzymes (Jain et al., 2010). It should be emphasized, however, that none of the described helicases unwinds tetraplex DNA exclusively; all of these enzymes are also able to unfold, although at a lower efficiency, other DNA structures such as duplex DNA, Holliday junctions or triplexes. Recently, the DHX9 helicase from human cells was found to co-immunoprecipitate with triplex DNA, suggesting a role in maintaining genome stability (Jain et al., 2010). DHX9 displaced the third strand from a specific triplex DNA and catalyzed the unwinding with a 3' to 5' polarity with respect to the displaced third strand (Jain et al., 2010).
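The substrate requirement described above, a G-quadruplex followed by a short 3' single-stranded tail, can be checked at the sequence level when designing or screening oligonucleotides. The Python sketch below is a minimal illustration: it uses the widely used G3+N1-7G3+N1-7G3+N1-7G3+ consensus as a proxy for quadruplex-forming potential and reports how many nucleotides lie 3' of each motif. The function name `g4_candidates` and the `min_tail` threshold are hypothetical, and a motif match does not by itself demonstrate quadruplex folding or helicase loading.

```python
import re
from typing import Dict, List

# Four runs of at least three guanines separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def g4_candidates(oligo: str, min_tail: int = 5) -> List[Dict[str, object]]:
    """List putative G4 motifs in a 5'->3' oligonucleotide.

    For each motif, the number of nucleotides 3' of it is reported, together
    with a flag saying whether that tail reaches min_tail (an assumed value).
    """
    oligo = oligo.upper()
    results = []
    for m in G4_PATTERN.finditer(oligo):
        tail = len(oligo) - m.end()
        results.append({
            "start": m.start(),
            "end": m.end(),
            "tail_3prime": tail,
            "has_loading_tail": tail >= min_tail,
        })
    return results


# Example: four telomere-like GGG runs followed by a 7-nt poly(T) tail.
print(g4_candidates("GGGTTAGGGTTAGGGTTAGGG" + "TTTTTTT"))
```

The same oligonucleotide without the poly(T) extension would be reported with `has_loading_tail` set to `False`.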

#### **5.3.1 RecQ helicases BLM, WRN, RECQL4 and Sgs1**

RecQ helicases are a group of DNA helicases that are conserved from bacteria to humans (Bachrati & Hickson, 2003). The RecQ helicase is named after the *rec*Q gene of *Escherichia coli* and unwinds DNA in the 3'–5' direction relative to the DNA strand to which the enzyme is bound (Mohaghegh et al., 2001). There are at least five homologues in humans, three of which are associated with genetic diseases. The yeast homologue of RecQ is Sgs1, whose function was found to be similar to that of most members of the RecQ family (Bachrati & Hickson, 2003; Cejka & Kowalczykowski, 2010; Masai, 2011).

It has been reported that, without a functional RecQ helicase, DNA replication does not advance normally. In humans, lacking of WRN or BLM protein accumulates aberrant replication intermediates (Harrigan et al., 2003; Cheok et al., 2005), this may allow for certain non-B DNA structure forming (Mohaghegh et al., 2001; Bacolla et al., 2011). Therefore, it is not surprising to see that more and more reports are going to be published

The Gratuitous Repair on Undamaged DNA Misfold 415

1996, 2004; Connelly et al., 1998, 1999) and its eukaryotic homologue of Mre11-Rad50 (Paull


which indicate the important roles of RecQ in resolving non-B DNA structures, including G4 DNA (Kamath-Loeb et al., 2001; Fry & Loeb, 1999). Similarly, the large T antigen and the Dna2 helicase/exonuclease have also been found to unwind the G-tetraduplex (Masuda-Sasa et al., 2008).

#### **5.3.2 Junction resolvases**

A cruciform is similar in appearance to a recombination intermediate, the four-way Holliday junction. Therefore, Holliday junction resolvases, such as RuvABC in prokaryotes or Mus81, Sgs1 and Sgs2 in yeast, might also act on cruciforms formed at inverted repeats (Cejka & Kowalczykowski, 2010; Lilley, 2010; Ashton et al., 2011; Mankouri et al., 2011).

#### **5.4 Topoisomerase**

Non-B DNA structures can be substrates for DNA topoisomerases I and II (Howard et al., 1993; Froelich-Ammon et al., 1994). It has been shown that DNA topoisomerase II binds and cleaves hairpins (e.g., the hairpin formed at a negatively supercoiled 52-bp palindromic sequence in the human β-globin gene), but not cruciforms. DNA topoisomerase II cleavage sites near human immunodeficiency virus integration sites in the human genome consist of Z-DNA-forming sequences and other repetitive sequences (Howard et al., 1993); in contrast, DNA topoisomerase I promotes the formation of parallel G4 DNA in humans. Similarly, RAP1 and Hop1 in yeast and thrombin in humans have also been found to promote the formation of G4 DNA.

#### **5.5 Single strand binding protein (SSB/RPA)**

RPA–ssDNA serves as an intermediate in many DNA repair processes. For example, ssDNA-RPA can be generated by nuclease and helicase action during repair of UV-induced thymine dimers by nucleotide excision repair, and at a replication fork where the DNA polymerase has paused but the DNA helicase has not. RPA may prevent the formation of a non-B DNA structure or destabilize it. For example, human RPA has been found to destabilize G'4 DNA (Fig. 1). In a triplex, RPA preferentially binds the polypyrimidine strand and then forms a complex with XPA and XPC-hHR23B (Vasquez et al., 2002; Thomas et al., 2005). In mammalian cells, RPA binds 50-fold more strongly to pyrimidines than to purines and therefore keeps the polypyrimidine strand single-stranded in an intramolecular triplex structure at neutral pH. Moreover, persistent RPA binding may lead to RPA hyper-phosphorylation, which triggers repair reactions (Thomas et al., 2005). In addition, RPA-ssDNA and an ssDNA–dsDNA junction can also act as initial signals for the cellular response to DNA damage, activating the ATR pathway (Ball et al., 2004; Choi et al., 2010).
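Because the paragraph above turns on the polypyrimidine strand of a triplex, it can help to locate candidate polypyrimidine tracts in a sequence. The following is a minimal sketch, not a predictor of triplex formation: the function name `pyrimidine_tracts` and the 10-nt minimum length are illustrative assumptions, and the mirror-repeat symmetry usually associated with H-DNA is not checked.

```python
import re

def pyrimidine_tracts(seq, min_len=10):
    """Return (start, end) spans of uninterrupted C/T runs.

    Coordinates are 0-based with an exclusive end. min_len is an
    arbitrary illustrative threshold, not a biological constant.
    """
    pattern = re.compile(r"[CT]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]

# Hypothetical test sequence: a 12-nt C/T run flanked by purines.
example = "AGGA" + "CTCTTCCTCTTC" + "GAGA"
tracts = pyrimidine_tracts(example)
print(tracts)
```

Only the given strand is scanned; a purine-rich run on this strand corresponds to a pyrimidine tract on the complementary strand and would need a second pass with `[AG]`.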

#### **5.6 DNA structure-specific nucleases**

Nucleases that specifically cleave DNA next to or within non-B DNA secondary structures have been well studied. The earliest protein with such a function was identified in *Saccharomyces cerevisiae* as the product of the gene *KEM1* (also called SEP1, DST2, XRN1 and RAR5) (Liu et al., 1994, 1995). The *KEM1* product was initially characterized as a telomere binding protein; later, it was found to cleave DNA that includes a four-stranded G4 domain while showing low or no nucleolytic activity toward single- or double-stranded DNA substrates. Other well-known DNA structure-specific nucleases are SbcCD (Connelly & Leach, 1992, 1996, 2004; Connelly et al., 1998, 1999) and its eukaryotic homologue Mre11-Rad50 (Paull & Gellert, 1998, 2000; Sonoda et al., 2006; Carter et al., 2007; Delmas et al., 2009).
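Since several of these nucleases act on G4 domains, a simple way to flag sequences with G4-forming potential is a pattern search. The sketch below uses a widely used sequence heuristic (four runs of at least three guanines separated by loops of 1 to 7 nucleotides); it is an assumption introduced here for illustration, not a method from the studies cited above, and a match indicates only potential, not actual folding. The name `find_g4_motifs` is hypothetical.

```python
import re

# Heuristic pattern: G-run, loop, G-run, loop, G-run, loop, G-run.
# Loop length (1-7 nt) and run length (>= 3) are conventional defaults.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

def find_g4_motifs(seq):
    """Return (start, end, motif) for each non-overlapping match.

    Coordinates are 0-based with an exclusive end. Only the given
    strand is scanned; the C-rich complement is not considered.
    """
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]

# Four telomere-like TTAGGG repeats with short flanks.
telomeric = "AA" + "TTAGGG" * 4 + "TT"
hits = find_g4_motifs(telomeric)
print(hits)
```

To cover both strands, the same function can be run on the reverse complement of the input.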

#### **5.6.1 SbcCD**



It is now known that the influence of repetitive DNA sequences on genomic instability is often attributable to the formation of non-B DNA secondary structures *in vivo*. Once stable, a non-B DNA structure will interfere with DNA replication, repair and/or transcription *in vivo*, resulting in an unstable genome. Such deleterious structures have been found to form in *E. coli*, for example the large hairpin formed by long palindromic DNA sequences (Leach, 1994). The stable hairpin can be cleaved by SbcCD, producing a DNA double-strand break that is then repaired by homologous recombination (Connelly & Leach, 1996; Connelly et al., 1992, 1998, 1999).

Long palindromic sequences are significantly more stable in nuclease-deficient (SbcCD) strains of *E. coli* than in wild-type strains. The SbcCD protein complex is a member of the structural maintenance of chromosomes (SMC) family found in bacteriophage, bacteria, yeast, *Drosophila*, mouse, and human. SbcCD has both 3'–5' exonuclease activity on double-stranded DNA and endonuclease activity on single-stranded DNA (Connelly et al., 1999). *In vitro*, it can recognize and bind hairpin structures and cleave at the loop, immediately 5' of the loop/stem junction.
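To make the notion of a palindromic, hairpin-forming sequence concrete, the sketch below finds the longest stretch that equals its own reverse complement and splits it into two stem arms and a central loop. It assumes an uppercase A/C/G/T alphabet; the function names, the 8-nt minimum length, and the 4-nt loop are illustrative assumptions, and no claim is made about where a nuclease would actually cut.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def longest_palindrome(seq, min_len=8):
    """Return (start, end) of the longest reverse-complement palindrome.

    Coordinates are 0-based with an exclusive end; returns None if no
    palindrome of at least min_len nucleotides is found.
    """
    seq = seq.upper()
    best = None
    # A DNA palindrome has even length, so expand outward from every
    # gap between two adjacent bases while the outer bases still pair.
    for center in range(1, len(seq)):
        left, right = center - 1, center
        while left >= 0 and right < len(seq) and PAIR.get(seq[right]) == seq[left]:
            left -= 1
            right += 1
        start, end = left + 1, right
        if end - start >= min_len and (best is None or end - start > best[1] - best[0]):
            best = (start, end)
    return best

def hairpin_arms(palindrome, loop_len=4):
    """Split a palindrome into (5' arm, loop, 3' arm).

    loop_len is an assumed number of unpaired central nucleotides.
    """
    arm = len(palindrome) // 2 - loop_len // 2
    return (palindrome[:arm],
            palindrome[arm:len(palindrome) - arm],
            palindrome[len(palindrome) - arm:])

seq = "GG" + "ACGTGAATTCACGT" + "AA"   # hypothetical 14-nt palindrome with flanks
span = longest_palindrome(seq)
print(span, hairpin_arms(seq[span[0]:span[1]]))
```

The 3' arm returned by `hairpin_arms` is the reverse complement of the 5' arm, which is what allows a single strand of the palindrome to fold back on itself.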

Further degradation of the hairpin cleavage products can occur through the ATP-dependent double-stranded DNA exonuclease activity of the SbcCD protein complex. The structure-specific endonuclease activity does not require a 3' or 5' terminus (Connelly & Leach, 1992, 1996; Connelly et al., 1998, 1999).

#### **5.6.2 Mre11-Rad50-Nbs1 (MRN) / Mre11-Rad50-Xrs2 (MRX)**

Rad50 and Mre11 are the eukaryotic homologues of SbcCD, although they have not been shown to bind hairpins/cruciforms directly. Mre11 and Rad50, in complex with Nbs1 (in human cells) or Xrs2 (in yeast), show hairpin-cleaving activity *in vitro* and participate in processing double-strand breaks *in vivo* by homologous recombination or non-homologous end-joining (Paull & Gellert, 1998, 2000; Sonoda et al., 2006; Delmas et al., 2009). In hairpin cleavage, MRN/MRX interacts with BRCA1, which preferentially binds four-way branched DNA similar to cruciforms. Mre11 shows incision activity at hairpins/cruciforms and, in yeast, acts as a selective endonuclease that binds G4 DNA or G'2 quadruplex DNA and cleaves the G4 DNA.

#### **5.6.3 Other nucleases**

Besides structure-specific nucleases such as SbcCD and its eukaryotic homologue Mre11-Rad50-Nbs1 (Xrs2), many other DNA structure-specific nucleases have been identified. These nucleases recognize and cleave non-B DNA structures, or DNA sequences that have adopted non-B DNA secondary structures, and play important roles in various DNA transactions including DNA replication, repair and recombination. For example, Rad1-Rad10 (XPF-ERCC1 in humans) has been shown to cleave branched intermediates and flap DNA in repair (Li et al., 2008; Muñoz et al., 2009). The Rad2 family of nucleases, such as human XPG (Class I), FEN1 (Class II), and HEX1/hEXO1 (Class III), show both substrate-specific 5' to 3' exonuclease activity and endonuclease activity in repair, recombination, and/or replication. Among them, the Rad2 domain of human exonuclease 1 (HEX1-N2) has high activity on single- and double-stranded DNA substrates as well as a flap structure-specific endonuclease activity, but does not have specific endonuclease activity at 10-base pair bubble-like structures, G:T mismatches, or uracil residues (Lee & Wilson, 1999). FEN-1, a structure-specific endonuclease essential for DNA replication and repair, removes RNA and DNA 5' flaps (Tsutakawa et al., 2011). FEN-1 was thought to be involved in hairpin structure processing, and was found to be involved in CNG triplet repeat stability in the lagging strand template (Spiro et al., 1999; Singh et al., 2007). Similarly, deletions in PCNA, RPA, and the Bloom protein (BLM), a 3'-5' helicase that reportedly interacts with FEN-1 in cleaving flaps, can also increase CNG repeat expansion or deletion. Recently, NucS from *Pyrococcus abyssi* was found to be the equivalent of FEN-1, cleaving flapped DNA during Okazaki fragment processing in lagging strand DNA replication (Ren et al., 2009; Creze et al., 2011).

SLX1 and SLX4 are further structure-specific endonucleases, acting as a heteromer that cleaves branched DNA substrates, particularly simple-Y, 5'-flap, or replication fork structures. The complex also cleaves the strand bearing the 5' nonhomologous arm at the branch junction and generates ligatable nicked products from 5'-flap or replication fork substrates (Fricke & Brill, 2003).

RAGs is a complex consisting of RAG1, RAG2, and HMGB1 that cleaves 3' overhangs at multiple locations at duplex/single-stranded transitions (Fugmann, 2001). The RAGs complex is able to cleave different non-B DNA structures such as symmetric bubbles, heterologous loops and proposed triplex DNA. For example, the RAGs complex cleaves the *bcl*-2 Mbr at 3' overhangs and non-B DNA structures under physiological buffer conditions (Adachi & Tsujimoto, 1990; Fugmann, 2001; Raghavan et al., 2004, 2005).

In addition, many single-strand specific nucleases, like S1, P1, and mung bean nucleases, are also efficient at cleaving single-stranded DNA in non-B DNA structures, though at low pH. Some non-B DNA structures, e.g. H-DNA and G4 DNA, expose an unstructured single-stranded DNA region, which can therefore serve as a substrate for these single-strand specific nucleases. Recently, a more specific nuclease that cuts single-stranded DNA 5' to a G4 domain was isolated from human cells. This enzyme, initially named G quartet nuclease 1 (GQN1), is thought to be involved in immunoglobulin heavy chain class switch recombination in B cells and does not digest single- or double-stranded DNA, Holliday junctions or tetraplex RNA. It specifically cuts single-stranded DNA located a few nucleotides 5' to either G'2 or G4 domains (Sun et al., 2001). The inability of GQN1 to incise tetraplex RNA marks a significant difference from a mouse cytoplasmic exoribonuclease (mXRN1p), which cleaves G4 RNA (Bashkirov et al., 1997).

structures at CAG repeats, can be recognized by mismatch repair machinery (Yang, 2006). The Msh2/Msh3 complex in eukaryotic cells specifically binds CAG hairpins, and the ATPase activity of the Msh2/Msh3 complex can be altered by the binding. However, the repair is dependent on the number of loops/bulges. A few of them may be repaired by MMR, but too many may not, because multiple MutS binding interferes with MMR. This suggests that repair of a particular non-B DNA conformation is conditional, depending on location and environment. Further, nucleotide excision repair (NER) proteins can bind intermolecular triplexes and are involved in triplex-mediated mutagenesis and recombination (Wang & Vasquez, 2006). In bacterial cells, the NER proteins UvrB and UvrC were responsible for triplex-induced cell growth retardation. Given the similarity between intermolecular and intramolecular triplexes, it is possible that NER contributes to H-DNA-induced mutagenesis and recombination.

#### **6.2 Competitions among multiple repair proteins**

Apart from initiating an individual DNA repair pathway, some non-B DNA structures can also be recognized by more than one repair protein working in different repair pathways, resulting in competition between proteins on the same DNA structure.

Competition among repair proteins on a non-B DNA structure may sometimes be productive, establishing a new cooperative repair process; in contrast, the competition may sometimes be internecine, so that neither pathway completes repair. Under this circumstance, repair of the non-B DNA structure by the combined actions of the DNA structure recognition proteins would be compromised. For example, a stable hairpin may be needed for starting DNA replication, but such a hairpin would also be cleaved by SbcCD or Mre11-Rad50, making a DNA break for homologous recombination to repair (Leach, 1994). Similarly, unwound DNA or small DNA loops may be needed for DNA replication or for transcription, yet they may also be recognized and bound by repair proteins, such as DNA mismatch repair, nucleotide excision repair, and recombination proteins, instead of SSB/RPA (Kirkpatrick & Petes, 1997).

A good demonstration of internecine competition between multiple repair proteins is the folding of TGG and AGG repeats in the lagging strand template of a replication fork (Pan & Leach, 2000; Pan et al., to be published results). TGG, AGG and CGG repeats are a group of NGG repeats with significant potential to fold into non-B DNA secondary structures (Usdin, 1998; Pan & Leach, 2000). AGG repeats formed triplex (Suda et al., 1996; Mishima et al., 1996, 1997), homoduplex (Suda et al., 1995), tetra-duplex (Yang & Hurley, 2006), and a special G-quadruplex, known as tetrad:heptad:heptad:tetrad ((G:H:H:G) or (T:H:H:T)) (Matsugami et al., 2001a, 2001b, 2002, 2003), while CGG and TGG repeats formed pseudo-hairpin and tetra-duplex, respectively (Darlow & Leach, 1998; Usdin, 1998; Pan & Leach, 2000; Zemánek et al., 2005).

It was shown by Pan and Leach that replication of TGG repeats in the lagging strand template involves repeat misfolding, during which both MutS and SbcCD were found to affect later processing by homologous recombination. Binding of MutS to the non-B DNA structure formed by TGG repeats may stabilize the structure while hindering its cleavage by SbcCD. Interestingly, the roles of MutS and SbcCD in this case seemed complex, since TGG repeats can replicate without either MutS or SbcCD, suggesting that they also play the same role in stabilizing the TGG repeat structure. In contrast, similar sized AGG repeats were also found to fold into non-B DNA structures in a similar lagging strand template of a replication fork.
