#### **2.3 Reaction protocol**

The detailed reaction protocol that we used is as follows. First, 75 ng of the U5'-LTR cDNA sequence of HIV-1, (+) 5'-TGT GTG CCC GTC TGT TGT GTG ACT CTG GTA ACT AGA GAT CCT CAG ACC TTT TTG GTA GTG TGG AAA ATC TCT AGC A-3' and (-) 5'-ACT GCT AGA GAT TTT CCA CAC TAC CAA AAA GGG TCT GAG GGA TCT CTA GTT ACC AGA GTC ACA CAA CAG ACG GGC ACA CA-3', was incubated with 50 ng recombinant HIV-1 integrase in 10 µl of binding buffer for 1 h at 30 °C. The binding buffer consisted of 1- 0.1 mM MnCl2, 80 mM potassium glutamate, 10 mM mercaptoethanol, 10% DMSO, and 35 mM MOPS (pH 7.2).

Similarly, the 3'-LTR cDNA sequence (75 ng) of HIV-1, (+) 5'-ACT GGA AGG GTT AAT TTA CTC CAA GCA AAG GCA AGA TAT CC TTG ATT TGT GGG TCT ATA ACA CAC AAG GCT ACT TCC CA-3' and (-) 5'-ACTG GGA AGT AGC CTT GTG TGT TAT AGA CCC ACA AAT CAA GGA TAT CTT GCC TTT GCT TGG AGT AAA TTA ACC CTT CCAGT-3', was incubated with the recombinant retroviral integrase. After incubation, the double-stranded (ds) 5'-LTR DNA was combined with the ds 3'-LTR DNA for 1 h at 30 °C, and the LTR DNA was then further incubated with the target DNA for 1 h at 30 °C. As controls, ds 5'-LTR DNA and ds 3'-LTR DNA were also individually incubated with the target DNA. For control target DNAs, we synthesized four random 144-bp sequences, which were designed using a random number generator, and we ligated these sequences into circular DNA in the same manner as described below for the target DNA.

To prevent non-specific reactions at the target DNA sequence, we ligated the target sequence DNA into circular plasmid DNA (Invitrogen pCR2.1 TOPO vector) and used this entire DNA as the target DNA for the assay (Fig. 6). The ratio of LTR DNA to target DNA was optimized to prevent both non-specific reactions and integration due to an excess of LTRs. After the reaction, the DNA was purified using a QIAquick column (QIAGEN GmbH, Germany). PCR amplification was then performed using a retroviral primer, either the HIV-1 U5'-LTR primer, 5'-GTG TGC CCG TCT GTT GTG TGA CTCTGG-3', or the HIV-1 U3'-LTR primer, 5'-CTG GGA AGT AGC CTT GTG TGT TAT AG-3', together with a TOPO vector primer, 5'-TCA CTC ATG GTT ATG GCA GC-3', whose first nucleotide corresponds to nucleotide position 2222 in the TOPO-pCR2.1 plasmid (Invitrogen, Carlsbad, CA). Amplicon copy number was quantified following identification of the HIV-1-substrate DNA junction [14].

**3. Results**

#### **3.1 Selective viral DNA integration into the target sequence**

Figures 9 and 10 show the percentage of viral DNA integration into the target sequence or into random sequences of the same length. Four different random sequences of the same length were used as controls. The horizontal blue line shows the percentage of integration expected if integration occurred uniformly throughout the substrate DNA, that is, the target DNA plus the circular plasmid. These data indicate that the percentage of integration into the target sequence was significantly higher than that into random sequences. In addition, when a target sequence was used in which the middle 5'-CA and 5'-GT nucleotides were deleted, the integration efficiency was significantly decreased. Thus, local nucleotide motifs within the target sequence affect integration efficiency.
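The random control sequences used in this assay (four sequences of 144 bp designed with a random number generator) could be produced along the following lines. This is only a minimal sketch: the source does not state which generator, seed, or base composition was used, so the uniform choice over A, C, G, and T, the seed value, and the function names `random_dna` and `make_controls` are all illustrative assumptions.

```python
import random


def random_dna(length: int, rng: random.Random) -> str:
    """Return a DNA string of `length` bases drawn uniformly from A, C, G, T."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def make_controls(n: int = 4, length: int = 144, seed: int = 0) -> list[str]:
    """Build `n` random control sequences of `length` bp each.

    The seed is a hypothetical value used only to make the sketch
    reproducible; it is not taken from the protocol.
    """
    rng = random.Random(seed)
    return [random_dna(length, rng) for _ in range(n)]


controls = make_controls()
for i, seq in enumerate(controls, start=1):
    print(f"control {i}: {len(seq)} bp, starts with {seq[:12]}")
```

Fixing the seed makes the control set reproducible, which is convenient when the same sequences have to be re-synthesized or reported.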

Fig. 9. Integration into target sequence DNA vs. random sequence DNAs.

The number of PCR product copies derived from viral DNA that had integrated into the target sequence or into random sequences is plotted as a percentage of the total number of PCR product copies, including the PCR products derived from integration into the remainder of the plasmid DNA sequence. The horizontal line shows the percentage expected if integration occurred uniformly throughout the 4-kb substrate DNA.

Fig. 10. A graph of the percentage integration into the target sequence or into random sequences.

The left arrow indicates the percentage of integration into the target DNA (~2.3%) expected if integration occurred uniformly throughout the substrate DNA, that is, the target sequence DNA plus the circular plasmid into which it was ligated. The integration efficiency was significantly decreased when CA and GT were removed from the middle region of the repeat. Thus, local nucleotide motifs affect integration efficiency (\*\*\*\**P* < 0.001).
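The quantities plotted in Figs. 9 and 10 are simple ratios, and the following sketch spells them out. It assumes that the plotted percentage is the number of PCR product copies from integration into the sequence of interest divided by the total number of copies, and that the uniform-integration baseline is the length of the scored window divided by the length of the substrate DNA. The function names and all numerical inputs are hypothetical; in particular, the window length behind the reported ~2.3% baseline is not stated in the text, so the values below are placeholders rather than the authors' data.

```python
def percent_integration(target_copies: float, total_copies: float) -> float:
    """Percentage of PCR product copies that derive from the sequence of interest."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    return 100.0 * target_copies / total_copies


def uniform_baseline_percent(window_bp: int, substrate_bp: int) -> float:
    """Percentage expected in a window if integration were uniform over the substrate."""
    if substrate_bp <= 0:
        raise ValueError("substrate_bp must be positive")
    return 100.0 * window_bp / substrate_bp


# Placeholder numbers, chosen only to illustrate the arithmetic.
observed = percent_integration(target_copies=250, total_copies=1000)
baseline = uniform_baseline_percent(window_bp=100, substrate_bp=4000)
print(f"observed {observed:.1f}% vs uniform baseline {baseline:.1f}%")
```

Comparing the observed percentage with the uniform baseline is what allows a statement such as "integration into the target sequence is higher than expected by chance".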

#### **3.2** *In vitro* **integration site in the target repeat sequence DNA**

Figure 11 shows the *in vitro* integration site in the target repeat sequence DNA. The entire target sequence is shown in this figure. The vertical axis indicates the percentage of PCR amplicons derived from integration into the individual units. Integration efficiency was significantly higher when both the 5'- and the 3'-LTR DNA were used than when either LTR DNA was used alone. The use of both 5'- and 3'-LTR DNA is a distinctive feature of our protocol, since previous protocols used a single 3'-LTR DNA (Figs. 7 and 11).

Fig. 11. *In vitro* integration site in the target repeat sequence DNA.

The vertical axis indicates the percentage of the PCR amplicons derived from proviral DNA integrating into individual units. The entire target sequence is shown. The integration efficiency was significantly higher when both 5'- and 3'-LTR DNA were used rather than when a single LTR DNA was used. The use of both 5'- and 3'-LTR DNA is one of the unique points of our protocol. x, GTGGAGGGCAGT; y, ACTGCCCCCAC. (\*\*\* *P* < 0.001)

Interestingly, we found that the middle segment of the target sequence was more favorable for integration, even though the same sequence units were repeated throughout the target sequence. To explain this observation, we considered the possibility that a structural factor contributes to selective integration into the middle segment. If a single strand of the target DNA appeared locally through unwinding of the double-stranded target DNA, a long hairpin or cruciform structure could form at the target sequence site. If the target sequence DNA is open, or unwound, the top of such a secondary structure would probably be favorable for integration. We performed a thermodynamic DNA folding analysis to predict secondary structure in the target DNA, and this analysis indeed predicted a hairpin structure (Fig. 12).
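A hairpin or cruciform can only form where one part of a strand is (nearly) the reverse complement of another part. As a rough, purely string-based check of this idea, the sketch below compares the two repeat units named in the legend of Fig. 11 (x, GTGGAGGGCAGT; y, ACTGCCCCCAC). It is not a substitute for the thermodynamic folding analysis described in the text, and the helper names `reverse_complement` and `differs_by_one_deletion` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def differs_by_one_deletion(longer: str, shorter: str) -> bool:
    """True if deleting exactly one base from `longer` yields `shorter`."""
    if len(longer) != len(shorter) + 1:
        return False
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


x = "GTGGAGGGCAGT"  # unit x as printed in the Fig. 11 legend
y = "ACTGCCCCCAC"   # unit y as printed in the Fig. 11 legend

rc_x = reverse_complement(x)
print("reverse complement of x:", rc_x)
print("y equals rc(x) minus one base:", differs_by_one_deletion(rc_x, y))
```

With the sequences as printed, y matches the reverse complement of x except for a single missing base, which is at least consistent with the two units pairing in an intramolecular stem.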




Fig. 12. A presumed secondary structure in the target sequence DNA.

The blue lines in the target DNA sequence shown at the top indicate the most frequent integration site. The presumed hairpin-like structure shown at the bottom was constructed from a calculation on the target DNA sequence using the mfold program (http://mfold.rna.albany.edu/?q=mfold/DNA-Folding-Form). dG, dH, and dS represent the Gibbs free energy, enthalpy, and entropy of the ssDNA, respectively. The box marks the most frequent integration site.

#### **3.2 Decoy effect of modified target sequences**

We prepared two modified DNA target sequences in which the 5'-CA and 5'-GT were removed from the repeat unit at the middle site, termed modified sequence I and II, respectively (Fig. 13). PCR analysis of *in vitro* integration into modified sequence I or II revealed significant reductions in the number of PCR product copies compared with integration into the unmodified sequence. In addition, integration selectivity was not evident when using the modified DNA sequences (*P* < 0.05). We next mixed substrate DNA containing the target sequence with substrate DNA containing modified sequence I or II in equal amounts, and examined the number of PCR product copies that originated from integration into the non-modified target sequence. Integration into the original, non-modified target sequence of the substrate DNA was significantly reduced when this DNA was mixed with the modified sequences (Fig. 14).

Fig. 13. Modified target DNA sequences.

Two modified DNA sequences were prepared in which CA and GT were removed from the repeat unit at the middle site, termed modified sequence I and modified sequence II, respectively. Red letters represent the TG/CA motifs. Yellow letters represent the GT/AC motifs that are observed in the HIV-1 proviral genome.

Fig. 14. *In vitro* integration using modified sequence I or II

The results showed significant reductions in the number of copies of PCR products derived from integrated DNA. In addition, integration selectivity was evidently suppressed when using the modified DNA targets (left graph, \**P* < 0.05). Substrate DNA containing the target sequence was then mixed with substrate DNA containing modified sequence I or II in equal amounts, and the percentage of PCR product copies originating from integration into the original target sequence was determined. Integration into the original target sequence was significantly reduced when this target was mixed with the modified sequences (right graph, \**P* < 0.05).

#### **3.3 Biochemistry of the integrase: DNA structure fluctuation enhances selective integration**

We digested circular DNA with HIV-1 integrase in a buffer containing various concentrations of manganese dichloride (MnCl2) and measured the band intensity of linearized DNA following electrophoresis. The relative band intensity increased when the concentration of MnCl2 in the reaction buffer exceeded 40 mM (Fig. 15). This result raised the question of how such fluctuation in DNA structure influences the selectivity of *in vitro* HIV-1 integration. Both the percentage of integration into the target sequence DNA and the proportion of PCR product copies derived from integration into the target sequence DNA increased significantly when the MnCl2 concentration exceeded 40 mM (Fig. 16). We therefore suggest that such fluctuation may generate a conformation of the target DNA that is favorable for integration of the HIV-1 LTR [16, 17].

Fig. 15. Effect of MnCl2 concentration and structural fluctuation of target DNA on integration

(A) Electrophoretogram of plasmid plus target DNA (left) and plasmid DNA (right). P and Q indicate structural isomers of the circular DNA. R, a single fragment, indicates the digested DNA fragment after 90 min incubation. The vertical axis indicates relative signal intensity and the horizontal axis indicates electrophoretic mobility distance. (B) Electrophoresis of the DNA after incubation with recombinant integrase in buffer containing 0, 20, or 40 mM MnCl2 for 0, 50, or 90 min. P, Q, and R are as described in (A). The upper and lower photographs show electrophoresis of plasmid plus target DNA and of plasmid DNA, respectively. Significant fluctuation in R was observed in the electrophoresed DNA following 90 min incubation. (C) Relative signal area of the digested 4.0-kb substrate DNA corresponding to R in (B). Error bars represent standard deviation (S.D.). The area of fragment R was calculated by integrating the individual curves shown in the electrophoretogram (A) with respect to electrophoretic mobility distance. The signal area increased significantly when 40 mM MnCl2 was included in the buffer.
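The area of fragment R is described as the integral of the signal intensity curve with respect to electrophoretic mobility distance. A minimal numerical sketch of such a calculation is given below. The trapezoidal rule is an assumption made here for illustration (the text does not say which integration method was used), and the function name and the sample trace are hypothetical.

```python
def trapezoid_area(distance: list[float], intensity: list[float]) -> float:
    """Integrate intensity over mobility distance with the trapezoidal rule.

    `distance` must be in increasing order and the same length as `intensity`.
    """
    if len(distance) != len(intensity):
        raise ValueError("distance and intensity must have the same length")
    if len(distance) < 2:
        raise ValueError("at least two points are needed")
    area = 0.0
    for i in range(1, len(distance)):
        width = distance[i] - distance[i - 1]
        if width < 0:
            raise ValueError("distance must be in increasing order")
        area += 0.5 * (intensity[i] + intensity[i - 1]) * width
    return area


# Hypothetical trace across a single band (arbitrary units).
distance = [0.0, 1.0, 2.0, 3.0, 4.0]
intensity = [0.0, 1.0, 3.0, 1.0, 0.0]
print("band area:", trapezoid_area(distance, intensity))
```

In practice the integration limits would be restricted to the region of the trace that corresponds to fragment R, and the resulting area would be normalized to a reference lane to obtain a relative signal area.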

Fig. 16. DNA structural fluctuation and integration

A MnCl2 concentration greater than 40 mM significantly increased the percentage of integration into the target sequence DNA. The percentage of copy number of PCR products derived from integration into the target sequence DNA was found to increase significantly when the concentration of MnCl2 exceeded 40 mM. It is concluded that such fluctuation may generate a favorable conformation of target DNA (in red) for integration of HIV-1 LTR DNA.

#### **4. Conclusion**

In conclusion, this study demonstrated selective HIV-1 integration *in vitro*. The factors that determine this selectivity are (i) a sequence motif, including CAGT, and (ii) a structural factor, namely DNA structural fluctuation that can be induced by a high concentration of MnCl2 [16, 17]. The findings shown in Figs. 9 and 10 indicate that the percentage of integration into the target sequence was significantly greater than the percentage of integration into the random and deleted sequences. Moreover, the entire repeat sequence or its secondary structure may be a target of integration.

In particular, we found that sequences similar to the target DNA sequence interfere with integration (Fig. 14). Thus, a modified DNA can act as a decoy for the target DNA. In the present study, integration efficiency and selectivity were highly sensitive to the MnCl2 concentration in the reaction buffer; both increased significantly when the MnCl2 concentration was raised from 30 mM to 40 mM.


Fluctuations in the electrophoretic mobility of the substrate DNA also increased. These results suggest that there is a threshold concentration of MnCl2 for *in vitro* integration, probably because MnCl2 destabilizes secondary structure and may thereby induce a phase transition of the host DNA strand. The target DNA probably cannot adopt the specific stable conformation at MnCl2 concentrations below 40 mM. Based on these data and the data shown in Fig. 15, we propose that structural changes in the substrate DNA are closely correlated with integration selectivity and efficiency (Figs. 16 and 17). We have used MnCl2 for studies of *in vitro* integration because this salt is more appropriate than other salts for the generation of *in vivo* integration. However, during *in vivo* integration into the host genome, numerous DNA-binding proteins and metal ions regulate the reaction in a complex manner. Therefore, the present data cannot be applied directly to *in vivo* systems, and further investigation using cell culture systems is necessary. Nevertheless, this *in vitro* integration assay is expected to facilitate understanding of the pathogenicity of HIV-1.

Fig. 17. A model of integration

The top of the secondary structure may be favorable for integration when the target DNA sequence is open or unwound as a result of protein binding upstream of the target DNA sequence.
