**2. An overview of the human genome**

The coding regions of the genome are defined, in part, by an alternative series of motifs responsible for a variety of functions that take place on the DNA and RNA sequences, such as gene regulation, RNA transcription, RNA splicing, and DNA methylation. For example, sequencing of the human genome revealed a still-debated number of interrupted genes (25,000-32,000) with their regulatory sequences [1, 2], representing about 2% of the genome. These genes are immersed in a giant sea of different types of non-coding sequences, which make up around 98% of the genome. The non-coding regions are characterized by many kinds of repetitive DNA sequences; almost 10.6% of the human genome consists of Alu sequences, a type of SINE (short interspersed element) [3]. Alu elements are not randomly distributed throughout the genome but are biased toward gene-rich regions [5]. They can act as insertional mutagens, although the vast majority appear to be genetically inert [6]. LINEs, MIR, MER, LTRs, DNA transposons, and introns are other kinds of non-coding sequences, which together make up about 86% of the genome. In addition, some of these sequences overlap one another, for example the CpG islands (CGI), which complicates analysis of the genomic landscape. In turn, each chromosome is characterized by particular properties of structure and function.


An additional related study by Felice et al. (2009) [32] compared and contrasted the chromosomal integration patterns of a gammaretrovirus (Moloney leukemia virus, MLV) and a Lentivirus (human immunodeficiency virus type 1, HIV-1), finding that gammaretroviral, but not lentiviral, vectors integrate in genomic regions enriched in cell-type-specific subsets of transcription factor binding sites (TFBSs), independently of their position relative to genes and transcription start sites. It is therefore proposed that TFBSs could be differential genomic determinants of retroviral target site selection in the human genome.

Several *in vitro* and *in vivo* studies have shown that HIV-1 integrates predominantly in active transcription units and in genomic zones with high gene density, a high frequency of Alu elements, a low content of CpG islands and open chromatin regions [33]. Notwithstanding this evidence, the identification of particular characteristics of local chromatin that facilitate integration in a wider genomic manner still remains to be elucidated.

The objective of this chapter is to present the main results that our research group has obtained by statistically testing the genomic variables that define a preferred genomic environment for human lentiviral integration and localizing them to specific chromosomal loci, and by constructing gene/protein interaction networks among the cellular genes located around several Lentivirus integration sites in naturally infected humans, as a systemic approach to better understand the lentiviral integration process.

To test our hypothesis we conducted *in silico* studies of the integration profile in the genomic DNA of peripheral blood mononuclear cells (PBMCs) and macrophages for both human Lentiviruses (HIV-1 and HIV-2), using an analysis window size of 100K. The statistical analyses included several genomic variables, such as the chromosomal loci, the numbers of CpG islands, protein-coding genes and transcripts, and the distribution of SINEs, LINEs, LTRs and other elements, as well as the exploration of genomic regions in which epigenetic mechanisms could be associated with the integration process. Together, the results allow us to propose common genomic environments that favor the target chromatin zones for both human Lentiviruses.

**4. Data mining and statistical analyses**

A total of 352 human genome sequences flanking the 5' LTR of human Lentiviruses (176 sequences of HIV-1 [27] and 176 of HIV-2 [33]) were obtained from GenBank (NCBI) under accession numbers CL529260 to CL529766 (HIV-1) and DQ632388 to DQ632563 (HIV-2). Using the BLAST algorithm (NCBI; *http://blast.ncbi.nlm.nih.gov/Blast.cgi*), the sequences were aligned to the draft human genome (hg18), and those that met the following criteria were considered authentic integration sites: (i) contained the terminal 3' end of the HIV-1 or HIV-2 LTR; (ii) had matching genomic DNA within five bp of the end of the viral LTR; (iii) had at least 95% homology to human genomic sequence across the entire sequenced region; (iv) matched a single human genetic locus with at least 95% homology across the entire sequenced region; and (v) had a minimum size of 50 bp.
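The five acceptance criteria for authentic integration sites can be read as a simple filter over alignment results. The following is only a minimal sketch of that logic, not the pipeline used in the study: the record types `Hit` and `FlankingSequence`, their fields, the function `authentic_site`, and the example records are all hypothetical, and it assumes that percent identity and the distance between the LTR end and the first matching genomic base have already been extracted from the BLAST output.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MIN_IDENTITY = 95.0  # criteria (iii) and (iv): minimum percent identity
MAX_LTR_GAP = 5      # criterion (ii): max bp between LTR end and genomic match
MIN_LENGTH = 50      # criterion (v): minimum sequence size in bp


@dataclass
class Hit:
    """One genomic alignment of a flanking sequence (hypothetical record)."""
    chrom: str
    position: int
    identity: float  # percent identity across the entire sequenced region
    ltr_gap: int     # bp between the viral LTR end and the first matching base


@dataclass
class FlankingSequence:
    """One sequenced virus-host junction (hypothetical record)."""
    name: str
    has_ltr_end: bool  # criterion (i): terminal 3' end of the LTR is present
    length: int        # size of the sequenced region in bp
    hits: List[Hit] = field(default_factory=list)


def authentic_site(seq: FlankingSequence) -> Optional[Hit]:
    """Return the unique qualifying hit, or None if any criterion fails."""
    if not seq.has_ltr_end:          # (i)
        return None
    if seq.length < MIN_LENGTH:      # (v)
        return None
    # (iii): keep only hits at or above the identity threshold
    strong = [h for h in seq.hits if h.identity >= MIN_IDENTITY]
    # (iv): exactly one locus may qualify
    if len(strong) != 1:
        return None
    hit = strong[0]
    if hit.ltr_gap > MAX_LTR_GAP:    # (ii)
        return None
    return hit


# Made-up example records, not data from the study.
examples = [
    FlankingSequence("unique_hit", True, 120, [Hit("chr1", 1_000_000, 98.5, 2)]),
    FlankingSequence("two_loci", True, 150,
                     [Hit("chr2", 500, 97.0, 0), Hit("chr7", 900, 96.0, 0)]),
    FlankingSequence("too_short", True, 40, [Hit("chr3", 42, 99.0, 1)]),
    FlankingSequence("no_ltr", False, 200, [Hit("chr4", 77, 99.0, 0)]),
]
accepted = [s.name for s in examples if authentic_site(s) is not None]
print(accepted)  # ['unique_hit']
```

In this sketch a sequence that matches two loci above the identity threshold is rejected outright rather than resolved to its best hit, which is one possible reading of criterion (iv).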
