**2.** *In vitro* **integration assay used in previous studies**

In many previous *in vitro* integration protocols, double-stranded DNA corresponding to the HIV-1 3'-LTR proviral end alone is mixed with the substrate DNA in an appropriate buffer containing MnCl2 [14] (Fig. 2, 3). Subsequently, a PCR reaction using a primer set targeted to the proviral DNA and the target DNA amplifies a DNA segment that includes the viral-host DNA junction. In some protocols, even the viral DNA itself is used as the target DNA. Thus, previous studies paid little attention to the target sequence. It is commonly known that integrase binds to the proviral DNA in a regular tetrameric fashion. Some sequence motifs should therefore be favored by an integrase oligomer, just as dimeric transcription factors bind to palindromic sequence motifs such as E-box and GAS elements.

Fig. 2. Scheme of the previously described incubation with recombinant integrase

The 3' end of the HIV-1 LTR sequence is shown. Red letters indicate the conserved dinucleotide motif. Incubation of the 3' end of the HIV-1 LTR sequence DNA with integrase results in processing of the end, which removes the terminal 5'-GT dinucleotide. The resulting exposed hydroxyl group then attacks the target DNA.

Fig. 3. Protocol of the previously described *in vitro* integration assay

A nick is introduced into the host DNA that is attached to a CA dinucleotide in the HIV-1 DNA end. Red letters indicate the conserved dinucleotide motif. Arrows indicate the PCR primer set and the direction of DNA polymerization.
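The end-processing step described for Fig. 2 can be illustrated with a short string operation. This is only a sketch: the function name `three_prime_process` is hypothetical, and the input is the 13-nt LTR 3' end shown in the figure, treated as plain text rather than as a model of the enzyme.

```python
def three_prime_process(strand: str) -> str:
    """Mimic 3'-end processing: drop the terminal GT that follows the conserved CA.

    Hypothetical helper for illustration; it only trims a string and says
    nothing about reaction kinetics or integrase binding.
    """
    strand = strand.upper()
    if not strand.endswith("CAGT"):
        raise ValueError("expected a 3' end of the form ...CAGT")
    # Removing the last two bases leaves the conserved CA at the 3' end.
    return strand[:-2]


unprocessed = "AAAATCTAGCAGT"  # LTR 3' end before incubation with integrase
processed = three_prime_process(unprocessed)
print(processed)
```

The processed strand ends in the conserved CA dinucleotide, whose exposed hydroxyl group is the one that attacks the target DNA.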

### **2.1 Preparation of target sequences for** *in vitro* **integration**

We recently reported the target sequence of MLV integration. We developed an inbred strain of mice that suffers from spontaneous B cell lymphoma caused by MLV integration. MLV integration into *Stat5a* was identified in 25% of the lymphoma genomes [15, 16] (Fig. 4). As depicted in Fig. 4, the hot spot of integration included a 5'-CA-rich sequence as well as a palindromic motif. Downward-facing arrows in the figure indicate the MLV integration sites. The abundance of 5'-CA dinucleotides in the integration hot spots provided us with a hint for the preparation of target DNA for HIV integration, because these motifs are shared by the genome ends of MLV and HIV-1 (Fig. 5). We hypothesized that HIV-1 DNA also favors such a 5'-CA-rich sequence motif.


Figure panels (Figs. 2 and 3): the unprocessed LTR 3' end, 5'-AAAATCTAGCAGT-3' / 3'-TTTTAGATCGTCA-5', and the end used in the *in vitro* integration assay after processing, 5'-AAAATCTAGCA-3' / 3'-TTTTAGATCGTCA-5', which is combined with host DNA in the joining reaction.


Fig. 4. Target sequence of MLV integration within the *Stat5a* gene.

Downward-facing arrows indicate the MLV integration sites [15, 16]. Red letters highlight the CA-rich sequence. Blue boxes represent exons of the *Stat5a* gene. Underlines indicate the palindromic motif.

Fig. 5. Conserved 5'-CA dinucleotide in the proviral ends of HIV-1 and MLV.

The box indicates the integration signal sequence reported by Yoshinaga et al. [9].

### **2.2 Modified** *in vitro* **integration assay**

We then prepared a target sequence for HIV-1 integration. A repeat sequence was prepared in order to enhance integration efficiency. We used the repeat sequence, 5'-(GTCCCTTCC*CAGT*)6(*ACTG*GGAAGGGAC)6-3', or a modification of this sequence, which was ligated into a circular plasmid. The sequence within parentheses is the unit of the repeat. This target sequence includes the 5'-CA dinucleotide motif as well as the 5'-AC dinucleotide found at the HIV-1 DNA termini (Fig. 5). 5'-CAGT and 5'-ACTG (shown in *italics* in the above sequence) in the repeat units are also present in the HIV-1 proviral genome ends. This target DNA was reacted with recombinant integrase and formed a pre-integration (PI) complex. Figures 6 and 7 show our scheme of *in vitro* integration as well as the sequences of the HIV-1 proviral 5'- and 3'-ends. Following incubation of the proviral LTR sequence DNAs with recombinant integrase, the resultant pre-integration complexes were reacted with the target DNA. PCR amplification was performed and the integration sites were analyzed by direct sequencing. Unlike previously reported protocols, we used both 5'- and 3'-LTR sequences in our protocol. The repeat unit was expected to interact directly with the HIV-1 DNA ends through the complementary sequences present in the target DNA. Complementarity between HIV-1 DNA and host DNA is shown in Fig. 8.
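The structure of the repeat target can be checked with a few lines of Python. The sketch below assembles the unmodified repeat sequence given above and tests whether the second unit is the reverse complement of the first, which would make the whole target a palindrome in the double-stranded sense. The names `UNIT_A`, `UNIT_B` and `reverse_complement` are introduced here for illustration only.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


UNIT_A = "GTCCCTTCCCAGT"  # first repeat unit, ends in CAGT
UNIT_B = "ACTGGGAAGGGAC"  # second repeat unit, starts with ACTG
REPEATS = 6               # each unit is repeated six times

# Top strand of the unmodified repeat target, written 5' to 3'.
target = UNIT_A * REPEATS + UNIT_B * REPEATS

# The second unit is the reverse complement of the first, so the
# assembled target reads the same on both strands.
unit_b_is_rc = reverse_complement(UNIT_A) == UNIT_B
target_is_palindromic = reverse_complement(target) == target

# Occurrences of the motifs shared with the HIV-1 proviral ends.
cagt_count = target.count("CAGT")
actg_count = target.count("ACTG")

print(unit_b_is_rc, target_is_palindromic, cagt_count, actg_count)
```

Because the two units are reverse complements of each other, every 5'-CAGT on one strand is mirrored by a 5'-ACTG on the same strand, which is consistent with the palindromic motif observed at the MLV hot spot.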

Fig. 6. Scheme of *in vitro* viral integration into the target sequence.

The red segment in the circular target DNA represents the 144-bp target sequence, and the black line represents the remainder of the circular plasmid DNA into which it was ligated. Following incubation of the proviral LTR sequence DNAs with integrase, the resultant pre-integration complexes were reacted with the substrate DNA. Red letters in the HIV-1 cDNA represent the LTR termini. PCR amplification was performed using primers corresponding to regions in the proviral ends and a plasmid region. The integration sites were analyzed by direct sequencing. "Q" in the PCR product indicates the junction between the provirus and the target DNA, and "R" and "S" represent the 3'-ends of the primers within the HIV-1 DNA and the plasmid DNA, respectively [14].


HIV-1 DNA termini shown in the figure:

5'-LTR: 5'-TGTGTGCCCGTC--------------TGGAAAATCTCTAGCA-3' (top strand), 3'-ACACACGGGCAG--------------ACCTTTTAGAGATCGTCA-5' (bottom strand)

3'-LTR: 5'-ACTGGAAGGGTT--------------CAAGGCTACTTCCCA-3' (top strand), 3'-TGACCTTCCCAA--------------GTTCCGATGAAGGGTCA-5' (bottom strand)


Fig. 7. Target sequence for integration.

The top sequences show the termini of the HIV-1 provirus. The bottom sequence indicates the target sequence and highlights the dinucleotide motif CA/TG (red), and the AC bases (yellow) that are also present at the HIV-1 DNA termini. We prepared a repeat sequence in order to enhance integration efficiency. The sequence shown in parentheses is the unit of the repeat. Our protocol differs from previous protocols in that we used both 5'- and 3'-LTR sequences rather than a single 3'-LTR DNA.

Fig. 8. Integration scheme.

The repeat sequence unit in the target sequence was expected to directly interact with HIV-1 DNA. The dotted lines indicate complementarity between target and viral DNA. The dsDNA sequence at the bottom indicates the 5'- and 3'-ends of the proviral HIV-1 DNA.

### **2.3 Reaction protocol**

The detailed reaction protocol that we used is as follows. First, 75 ng of the HIV-1 U5'-LTR cDNA sequence, (+) 5'-TGT GTG CCC GTC TGT TGT GTG ACT CTG GTA ACT AGA GAT CCT CAG ACC TTT TTG GTA GTG TGG AAA ATC TCT AGC A-3' and (-) 5'-ACT GCT AGA GAT TTT CCA CAC TAC CAA AAA GGG TCT GAG GGA TCT CTA GTT ACC AGA GTC ACA CAA CAG ACG GGC ACA CA-3', was incubated with 50 ng of recombinant HIV-1 integrase in 10 µl of binding buffer for 1 h at 30 °C. The binding buffer consisted of 1- 0.1 mM MnCl2, 80 mM potassium glutamate, 10 mM mercaptoethanol, 10% DMSO, and 35 mM MOPS (pH 7.2).

Similarly, 75 ng of the HIV-1 3'-LTR cDNA sequence, (+) 5'-ACT GGA AGG GTT AAT TTA CTC CAA GCA AAG GCA AGA TAT CC TTG ATT TGT GGG TCT ATA ACA CAC AAG GCT ACT TCC CA-3' and (-) 5'-ACTG GGA AGT AGC CTT GTG TGT TAT AGA CCC ACA AAT CAA GGA TAT CTT GCC TTT GCT TGG AGT AAA TTA ACC CTT CCAGT-3', was incubated with the recombinant retroviral integrase. After incubation, the double-stranded (ds) 5'-LTR DNA was combined with the ds 3'-LTR DNA for 1 h at 30 °C, and the LTR DNA was then further incubated with the target DNA for 1 h at 30 °C. As controls, ds 5'-LTR DNA and ds 3'-LTR DNA were also individually incubated with the target DNA. For control target DNAs, we synthesized four random 144-bp sequences, which were designed using a random number generator, and we ligated these sequences into circular DNA in the same manner as described below for the target DNA.
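The design of the random control sequences can be sketched as follows. The text does not state which generator, seed or base composition was used, so the uniform choice among the four bases, the seed value and the helper name `random_dna` are all assumptions made for this illustration.

```python
import random


def random_dna(length: int, rng: random.Random) -> str:
    """Draw a DNA sequence of the given length, each base chosen uniformly.

    Uniform base composition is an assumption; the original design
    criteria are not described.
    """
    return "".join(rng.choice("ACGT") for _ in range(length))


CONTROL_LENGTH = 144  # length of each control target, as stated in the protocol
N_CONTROLS = 4        # number of control sequences, as stated in the protocol

rng = random.Random(0)  # arbitrary seed, used only to make the sketch reproducible
controls = [random_dna(CONTROL_LENGTH, rng) for _ in range(N_CONTROLS)]

for i, seq in enumerate(controls, start=1):
    print(f"control_{i}: {seq}")
```

Fixing the seed makes the control set reproducible, which is useful if the same sequences need to be re-ordered for synthesis later.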

In order to prevent non-specific reactions at the target DNA sequence, we ligated the target sequence DNA into circular plasmid DNA (Invitrogen pCR2.1 TOPO vector) and used this entire DNA as the target DNA for the assay (Fig. 6). The proportion of LTRs and target DNAs was optimized to prevent both non-specific reactions and integration due to an excess of LTRs. The DNA reacted in the buffer was purified using a QIAquick column (QIAGEN GmbH, Germany). PCR amplification was then performed using retroviral primers: the HIV-1 U5'-LTR primer, 5'-GTG TGC CCG TCT GTT GTG TGA CTCTGG-3', or the HIV-1 U3'-LTR primer, 5'-CTG GGA AGT AGC CTT GTG TGT TAT AG-3', and a TOPO vector primer, 5'-TCA CTC ATG GTT ATG GCA GC-3', whose first nucleotide corresponds to nucleotide position 2222 in the TOPO-pCR2.1 plasmid (Invitrogen, Carlsbad, CA). Amplicon copy number was quantified following identification of the HIV-1-substrate DNA junction [14].
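Identification of the viral-target junction in a sequencing read can be sketched as a simple string search. The chapter reports direct sequencing but does not describe any software, so the function `find_junctions`, the 12-nt flank length and the toy target below are hypothetical; only the LTR terminus `TGGAAAATCTCTAGCA` is taken from the 5'-LTR sequence given above. The sketch considers one strand orientation only.

```python
LTR_END = "TGGAAAATCTCTAGCA"  # processed 5'-LTR terminus, ending in the conserved CA


def find_junctions(read: str, target: str, ltr_end: str = LTR_END,
                   flank_len: int = 12) -> list[int]:
    """Return every target position matching the bases that follow the LTR end.

    A repetitive target can yield several candidate positions, so all
    matches are reported rather than only the first.
    """
    read, target = read.upper(), target.upper()
    start = read.find(ltr_end)
    if start == -1:
        return []  # read does not contain the viral terminus
    flank_start = start + len(ltr_end)
    flank = read[flank_start:flank_start + flank_len]
    if len(flank) < flank_len:
        return []  # not enough target sequence after the junction
    hits = []
    pos = target.find(flank)
    while pos != -1:
        hits.append(pos)
        pos = target.find(flank, pos + 1)
    return hits


# Toy, non-repetitive target and a synthetic read joined at position 20.
toy_target = "ATGCCGTTAGCTAGGCTTAACGGATCCGTACGTTAGCCATGGCATTCGAA"
toy_read = "ACGT" + LTR_END + toy_target[20:40]
print(find_junctions(toy_read, toy_target))
```

With the repeat target used in the assay, a short flank will match several repeat units, which is one reason the plasmid-side primer is needed to anchor the junction unambiguously.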
