#### **1. Introduction**

In natural genomes, dozens of DNA structures alternative to the B-DNA conformation have been found. They form through the combined action of weak interacting forces, including hydrophobic and van der Waals interactions and hydrogen bonding between acceptors and donors, and through induction by certain agents (Rao et al., 2010). Of these, hairpins, cruciform junctions, Z-DNA, G-tetrads/quadruplexes, helices, loops and bulges are the most studied so far.

Since the late 1950s, the roles of non-B DNA structures in biological functions have begun to be elucidated (Watson & Crick, 1953; Wilkins et al., 1953a, 1953b; Svozil et al., 2008). Accumulating results suggest that non-B conformations, such as cruciforms, triplexes and tetraplexes, can interact with proteins involved in DNA metabolism, including replication, gene expression and recombination, or influence the formation of nucleosomes and other supramolecular structures (Wang & Griffith, 1996; Shimizu et al., 2000). However, non-B DNA secondary structures may also be treated as DNA misfolds by DNA repair systems. For this reason, non-B DNA secondary structures can serve as end points for several types of genome rearrangements seen in some diseases (Wang & Vasquez, 2006; Wells, 2007; Bacolla & Wells, 2009; Chen et al., 2010).

#### **2. DNA sequences which are susceptible to abnormal folding**

Sequences capable of forming non-B DNA structures are abundant in the genomes of divergent organisms (Table 1) (Cox & Mirkin, 1997; Svozil et al., 2008; Cer et al., 2011). For example, nearly half of the human genome consists of repetitive sequences, which can be arranged as inverted repeats, direct tandem repeats, and homopurine–homopyrimidine mirror repeats.

The Gratuitous Repair on Undamaged DNA Misfold 403


These repeat sequences are major contributors to the formation of non-B DNA structures, although the unusual structures can also be formed by various other sequences that are not repeating tracts (Svozil et al., 2008; Cer et al., 2011). Repeat DNA sequences may adopt either the orthodox right-handed B-DNA or non-B DNA conformations at specific sequence motifs as a function of negative supercoil density, which is created by transcription, protein binding, and other factors. For example, inverted repeats can adopt the B conformation in cells, but can also form hairpin structures, slipped structures with looped-out bases, four-stranded G-quartet structures, left-handed Z-DNA and intramolecular triplex DNA structures (H-DNA), depending on the base composition and arrangement.



#### **2.1 Cruciform motif**

A DNA sequence that reads the same from 5' to 3' on either strand of a duplex is called an inverted repeat or palindromic DNA sequence. A subset of inverted repeat sequences may fold back and form intramolecular, antiparallel double helices stabilized by Watson–Crick hydrogen bonds (van Holde & Zlatanova, 1994; Courey, 1999; Smith, 2008).

For this to happen, the interstrand hydrogen bonds in the inverted repeat must be broken, and intrastrand hydrogen bonds must form between the complementary bases within each single strand, producing two hairpin-like arms with small loops (3-4 unpaired bases) at their tips. The structure resembles a four-way junction in which the nucleobases in and around the junction are fully involved in base pairing.
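The fold-back requirement described above (an arm followed, after a short loop, by its reverse complement on the same strand) can be illustrated with a minimal Python sketch. The function names, the fixed arm length of 6 nt and the loop limit of 4 nt are assumptions chosen for illustration; they are not thresholds given in this chapter or used by any particular database.

```python
# Minimal sketch: scan one strand for candidate cruciform-forming inverted repeats.
# find_inverted_repeats, min_arm and max_loop are hypothetical, illustrative choices.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, min_arm: int = 6, max_loop: int = 4):
    """Return (start, arm_length, loop_length) tuples.

    A hit is an arm of min_arm bases that is followed, after 0..max_loop
    unpaired bases, by its own reverse complement on the same strand.
    Only the shortest qualifying loop is reported for each start position.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * min_arm + 1):
        arm = seq[start:start + min_arm]
        target = reverse_complement(arm)
        for loop in range(max_loop + 1):
            right = start + min_arm + loop
            if seq[right:right + min_arm] == target:
                hits.append((start, min_arm, loop))
                break
    return hits


if __name__ == "__main__":
    # Arm ACGTTG, 3-nt loop TTT, then the reverse complement CAACGT.
    print(find_inverted_repeats("ACGTTGTTTCAACGT"))
```

The sketch reports fixed-length arms only; extending each hit outward to the longest perfect stem, or tolerating mismatches, would be a natural refinement.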

#### **2.2 Potential quadruplex sequences**

Potential quadruplex sequences are usually G-rich, such as the DNA sequences in eukaryotic telomeres and, in non-telomeric genomic DNA, the nuclease-hypersensitive promoter regions (Burge et al., 2006; Rawal et al., 2006; Qin & Hurley, 2008; Sannohe & Sugiyama, 2010). To form a quadruplex, a DNA sequence has to contain four G-blocks, each with the same number (n) of G bases (n varies from 3 to 7), separated from one another by 1–7 nt (Burge et al., 2006). The potential unimolecular (i.e. intramolecular) G-quadruplex forming sequences can be expressed as follows (Burge et al., 2006):

**GaXbGaXcGaXdGa**

where "a" is the number of G residues in each short G-tract, which are usually directly involved in G-tetrad formation. Xb, Xc and Xd can be any combination of residues, including G, and form the loops.

The potential quadruplex sequences were therefore restricted to:

**G3-5NLoop1G3-5NLoop2G3-5NLoop3G3-5**

where NLoop1-3 are loops of unknown length, within the limits 1 ≤ NLoop1-3 ≤ 7 nt.
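The restricted pattern above translates almost directly into a regular expression. The following Python sketch is one possible rendering, assuming loops of 1 to 7 nt drawn from any base (including G); the names `PQS` and `find_pqs` are hypothetical. The pattern is wrapped in a lookahead so that overlapping candidates starting at different positions are all reported.

```python
import re

# G3-5 NLoop1 G3-5 NLoop2 G3-5 NLoop3 G3-5, with each loop 1-7 nt of any base.
# The outer lookahead lets finditer report overlapping candidates.
PQS = re.compile(
    r"(?=(G{3,5}[ACGT]{1,7}G{3,5}[ACGT]{1,7}G{3,5}[ACGT]{1,7}G{3,5}))"
)


def find_pqs(seq: str):
    """Return (start, matched_sequence) for each potential quadruplex sequence."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in PQS.finditer(seq)]


if __name__ == "__main__":
    # Four human telomeric repeats (TTAGGG) contain one candidate.
    print(find_pqs("TTAGGGTTAGGGTTAGGGTTAGGG"))
```

Only the G-rich strand is scanned here; searching the complementary strand would require running the same pattern on the reverse complement (or an equivalent C-rich pattern).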

#### **2.3 Z-DNA motif**

402 DNA Repair

| Structural feature | Human | Chimpanzee | Macaque | Dog | Mouse |
|---|---|---|---|---|---|
| Cruciform | 197910 | 190736 | 128334 | 172032 | 188532 |
| Slipped Motif | 347969 | 314516 | 305285 | 404750 | 695150 |
| Triplex Motif | 179623 | 105640 | 140580 | 303385 | 565479 |
| Z-DNA Motif | 294320 | 278928 | 280982 | 261012 | 690276 |
| G-Tetraduplex | 374545 | 314171 | 298142 | 492535 | 559280 |
| Direct repeats | 871045 | 787335 | 765798 | 968955 | 1593107 |
| Inverted repeats | 1044533 | 998249 | 843889 | 814080 | 801242 |
| Mirror Repeats | 1651723 | 1485135 | 1455025 | 1849897 | 1651723 |

Table 1. Non-B DNA motifs in different mammalian genomes (Cer et al., 2011)

In 1979, the DNA sequence d(CpGpCpGpCpG) was crystallized and found to adopt a left-handed conformation (the Z-DNA conformation) with altered helical parameters relative to the right-handed B-form (Rich et al., 1983; Mirkin, 2008). Later, it was realized that DNA sequences with alternating pyrimidines and purines, such as (CA:TG)n and (CG:CG)n, may wind the double helix into a left-handed zigzag form (Z-DNA). Z-DNA is thinner (18 Å) than B-DNA (20 Å), because its bases are shifted toward the periphery of the double helix. It has only one deep, narrow groove, equivalent to the minor groove in B-DNA.

In general, a run of five or more tandem repeats of an alternating pyrimidine–purine dinucleotide motif, in which the pattern YG is preserved on at least one of the DNA strands, can adopt Z-DNA.
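As a rough illustration of this rule, the Python sketch below scans one strand for five or more tandem pyrimidine–purine dinucleotides. It assumes that "YG preserved on at least one strand" can be checked on a single strand by accepting either `[CT]G` repeats (YG on the scanned strand) or `C[AG]` repeats (which read as YG on the complementary strand). The names `ZDNA` and `find_zdna_candidates` are hypothetical, and this interpretation is a simplification rather than the exact criterion of any particular tool.

```python
import re

# >= 5 tandem Y-R dinucleotides with YG on the scanned strand ([CT]G),
# or with YG on the complementary strand (seen here as C[AG]).
ZDNA = re.compile(r"(?:[CT]G){5,}|(?:C[AG]){5,}")


def find_zdna_candidates(seq: str):
    """Return (start, end, matched_sequence) for each candidate Z-DNA tract."""
    return [(m.start(), m.end(), m.group()) for m in ZDNA.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_zdna_candidates("AAACGCGCGCGCGAAA"))  # (CG)5 tract
    print(find_zdna_candidates("CACACACACA"))        # (CA)5, i.e. (TG)5 on the other strand
```

Sequence alone indicates only the potential to adopt Z-DNA; whether a tract actually flips depends on conditions such as negative supercoiling, as noted earlier in this section.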

#### **2.4 Triplex motif**

A subset of mirror repeat sequences comprises only purines (A and G; R) or only pyrimidines (C and T; Y) on the same strand of a double-stranded DNA, with the repeat halves separated by a few (0–8) nucleotides. These DNA motifs can adopt various intramolecular three-stranded structures (triplex, H-DNA) stabilized by Hoogsteen hydrogen bonds (Casey & Glazer, 2001; Mukherjee & Vasquez, 2011).

The sequence requirement for forming triplex DNA is thought to be that only R·Y-containing mirror repeats can yield A:A\*T and G:G\*C triads. When the hydrogen bonds in the A·T and G·C base pairs are formed in canonical B-form DNA, several hydrogen-bond-forming groups in the bases still remain free. Each purine base has two hydrogen-bond-forming groups on the edge that is exposed in the major groove. These unpaired groups can be used to form base triads, which are the unit blocks of triple-stranded DNA (see the following explanation for detail).

In theory, a homopurine-homopyrimidine duplex can form triplexes of either the purine (Pu) motif (purine, antiparallel motif) or the pyrimidine (Py) motif (pyrimidine, parallel motif). However, under physiological conditions cytosine protonation is not favored, and CG\*G therefore becomes the most stable triad in a Pu motif. To form an intermolecular or intramolecular triplex, adjoining homopurine-homopyrimidine tracts of at least 10 base pairs are normally required for a duplex acceptor, because triplexes formed on shorter tracts can be unstable under physiological conditions (Fox & Brown, 2011).
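The sequence criteria discussed in this section (an all-purine or all-pyrimidine tract, mirror symmetry, and a short spacer) can be combined into a simple scan. The Python sketch below is an illustration only: the function name `find_triplex_candidates` is hypothetical, and applying the 10-nt minimum to each arm of the mirror repeat is an assumption made here, not a rule stated in the text.

```python
# Minimal sketch: homopurine or homopyrimidine mirror repeats on one strand.
# min_arm=10 per arm is an assumed reading of the 10-bp tract requirement;
# max_spacer=8 follows the 0-8 nt spacer range quoted for mirror repeats.

PURINES = set("AG")
PYRIMIDINES = set("CT")


def find_triplex_candidates(seq: str, min_arm: int = 10, max_spacer: int = 8):
    """Return (start, arm_length, spacer_length) tuples.

    A hit is an all-purine or all-pyrimidine arm that is followed, after
    0..max_spacer nucleotides, by its mirror image (the same arm reversed,
    not complemented) on the same strand.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * min_arm + 1):
        arm = seq[start:start + min_arm]
        bases = set(arm)
        if not (bases <= PURINES or bases <= PYRIMIDINES):
            continue  # mixed purine/pyrimidine arm: not an R.Y tract
        mirror = arm[::-1]
        for spacer in range(max_spacer + 1):
            right = start + min_arm + spacer
            if seq[right:right + min_arm] == mirror:
                hits.append((start, min_arm, spacer))
                break
    return hits


if __name__ == "__main__":
    # Purine arm AAGGAGGAGA, 2-nt spacer TT, then the mirrored arm AGAGGAGGAA.
    print(find_triplex_candidates("AAGGAGGAGATTAGAGGAGGAA"))
```

Note the contrast with the cruciform case: a mirror repeat pairs an arm with its plain reversal, whereas an inverted repeat pairs it with its reverse complement.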

A triplex may be mutagenic *in vivo*: double-strand breaks may occur in or near the triplex site, and their processing during DNA replication or recombinational repair may produce triplex-mediated mutagenesis (Chan et al., 1999; Faruqi et al., 2000).

Triplexes can also form during RNA transcription, although their formation is kinetically unfavored compared with duplex annealing. Once formed, however, triplexes of RNA and DNA are stable, with half-lives on the order of days, and may be involved in the molecular mechanism of Friedreich's ataxia (FRDA) (Pan et al., 2009).


AAT, ATT etc. can also adopt hairpin structures with mismatched base pairs in the stem (McMurray, 1999; Trotta et al., 2000).

To form a hairpin/cruciform, the DNA duplex needs to be unwound during replication, transcription, and/or DNA repair processing, affording single-stranded repeat sequences the opportunity to base pair with themselves in an intramolecular fashion. The term "cruciform" originates from the two duplex arms that are formed. The structure adopts either an "open" form, which allows strand migration, or a "stacked" (locked) form, in which the helices stack on each other (Courey, 1999; Khuu et al., 2006; Lilley, 2010). In both cases, the overall conformation and the intraduplex angles resemble those of the Holliday junction recombination intermediates (Fig. 2A) (Courey, 1999; Khuu et al., 2006; Lilley, 2010).

Fig. 2. Hairpin/cruciform of DNA

Both inverted repeats and tandemly arranged trinucleotide repeats were found to be mutagenic, causing genomic instability. Inverted repeats were initially found to cause deletions in *E. coli* (Sinden et al., 1991), and were then seen in humans in the t(8;22)(q24.13;q11.21) and many types of t(11;22) translocations. The breakpoints of these translocations were localized at the center of AT-rich palindromic sequences on 11q23 and 22q11, respectively. So far, t(11;22) is the only known recurrent, non-Robertsonian translocation in humans, and in some cases it leads to male infertility and recurrent abortion (Kurahashi et al., 2000, 2006, 2010; Kurahashi & Emanuel, 2001). Furthermore, deletions stimulated by a poly(R·Y) sequence from intron 21 of the polycystic kidney disease 1 gene (PKD1) have also been characterized (Bacolla et al., 2001; Patel et al., 2004). A long (CCTG-CAGG)n repeat in *E. coli* was also found to form a cruciform (Pluciennik et al., 2002; Dere & Wells, 2006). Interestingly, cruciform-forming inverted repeats have mediated many of the microinversions in evolution that distinguish the human and chimpanzee genomes (Kolb et al., 2009).

In cells, DNA double-strand breaks can arise from cruciforms, because hairpins/cruciforms are substrates for several structure-specific nucleases and/or repair enzymes, such as SbcCD in *E. coli* and Mre11-Rad50 in eukaryotes. The actions of such enzymes make strand breaks, which may result in rearrangements or translocations of chromosomes (Smith, 2008). In addition, proteins working in nucleotide excision repair (NER) can also recognize the helical distortions in a hairpin; NER may therefore recognize a DNA hairpin to resolve the hairpin in the DNA.
