**5. Patterns of provirus distribution**

No significant differences were observed between the integration profiles of HIV-1 and HIV-2 in lymphocytes (p>0.05, Mann-Whitney test). Integration events for both human lentiviruses were recorded in all chromosomes except Y (figure 1). However, the numbers of HIV-1 and HIV-2 proviruses differed significantly for chromosomes 4, 8, 9, 11 and 16 (p<0.05, χ² test). The largest share of all integrations (39/352) occurred in chromosome 17 (figure 1). Proviruses tended to be distributed preferentially towards the telomeric and subtelomeric regions of most human chromosomes. Consistent with this, other authors have shown that centromeric alphoid repeat regions are disfavored as integration sites [36]. Although proviruses were observed in every chromosome except Y, we identified some chromatin regions with only HIV-1 integrations in chromosomes 4, 6 and 9, and only HIV-2 integrations in chromosome 21.
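The per-chromosome comparison can be illustrated with a minimal 2×2 Pearson chi-square sketch: for one chromosome, the integrations of each virus are split into "on this chromosome" versus "elsewhere" (176 integrations per virus). The counts in the usage lines are hypothetical, and the sketch assumes no continuity correction, which the text does not specify.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # must be > 0
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 df the chi-square upper tail equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

def compare_chromosome(hiv1_here, hiv1_total, hiv2_here, hiv2_total):
    """2x2 table: integrations on one chromosome vs elsewhere, HIV-1 vs HIV-2."""
    return chi_square_2x2(hiv1_here, hiv1_total - hiv1_here,
                          hiv2_here, hiv2_total - hiv2_here)

# Hypothetical counts for a single chromosome (176 integrations per virus)
stat, p = compare_chromosome(20, 176, 8, 176)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

Repeating the call for each chromosome gives one test per chromosome, so in practice the resulting p-values would also need a multiple-comparison adjustment.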

#### **5.1. Functional characterization of genes flanking integration sites**

The genes hosting HIV integration events were characterized using Gene Ontology (GO, from NCBI). Overall, 83% (146/176) of HIV-1 and 77% (135/176) of HIV-2 integrations occurred close to chromatin regions containing protein-coding genes (p>0.05, Student's t-test). Within the 100 kb chromatin regions harboring HIV-1 and HIV-2 proviruses, no differences were observed in gene functional categories (p>0.05, Bonferroni correction). By molecular function, 46% of HIV-1 and 57% of HIV-2 integrations were associated with molecular binding, while 19% and 18%, respectively, occurred in regions coding for genes with enzymatic function (figure 2a). Analysis of biological processes revealed preferential integration into genes involved in metabolism and gene expression for both HIV-1 (36%) and HIV-2 (37%) (p>0.05, Bonferroni correction) (figure 2b).
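The Bonferroni adjustment used for the GO category comparisons can be sketched as follows: each raw p-value is multiplied by the number of categories tested and capped at 1. The category names and p-values below are hypothetical placeholders, not results from this study.

```python
def bonferroni(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, tuple[float, bool]]:
    """Return {category: (adjusted p, significant?)} using the Bonferroni correction."""
    m = len(p_values)  # number of simultaneous comparisons
    result = {}
    for category, p in p_values.items():
        adjusted = min(1.0, p * m)  # controls the family-wise error rate at alpha
        result[category] = (adjusted, adjusted <= alpha)
    return result

# Hypothetical raw p-values from HIV-1 vs HIV-2 comparisons of GO categories
raw = {"binding": 0.03, "catalytic activity": 0.20, "transporter activity": 0.004}
for category, (adj, sig) in bonferroni(raw).items():
    print(f"{category}: adjusted p = {adj:.3f}, significant = {sig}")
```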

58 Bioinformatics

BLAST (NCBI) and the BLAT algorithm of the Genome Browser (University of California, Santa Cruz, Human Genome Project) (*http://www.genome.ucsc.edu/*) were used to obtain information about protein-coding genes (RefSeq), transcripts, CpG islands and repetitive elements. Additional genomic information, including molecular process and molecular function, was obtained from Gene Ontology (GO) (*http://www.geneontology.org/index.shtml*), GeneCards (*http://www.genecards.org/cgi-bin/carddisp.pl*) and Entrez Gene (*http://www.ncbi.nlm.nih.gov/ncbi/geneentrez*). The chromosomal localization of the HIV-1 and HIV-2 proviruses was identified using the G-banding pattern of each chromosome, as proposed by the Paris Conference (1971) [35], updated to 850-band resolution. As the highest number of HIV-1 and HIV-2 proviruses was recorded on chromosome 17, its chromatin structure was characterized extensively using the genomic information available in several Genome Browser tracks: CpG islands and the distribution of their methylation; histone H3 lysine 4 and lysine 27 methylation data from ENCODE Histone Modification (University of Washington ChIP-seq); nucleosome occupancy probabilities in A375 cells (Washington University); and DNase I hypersensitivity (ENCODE, University of Washington) in GM12878 cells.

All statistical analyses were performed using STATISTICA 7 [35]. The Mann-Whitney test (Wilcoxon rank-sum) was used to establish differences between HIV-1 and HIV-2 chromosomal integration. Differences in function, molecular process and cellular localization were analyzed using the t-test for independent samples. The Kolmogorov-Smirnov test was used to assess the normality of the data. To avoid inflated significance levels from multiple comparisons, a Bonferroni correction was applied. Multiple regression analyses were performed to assess the association among CpG island numbers, gene numbers and integrations. CpG islands and genes per Mb for each chromosome were obtained from the NCBI and Ensembl databases (2010 update).
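A minimal sketch of the Mann-Whitney (Wilcoxon rank-sum) comparison, applied to per-chromosome integration counts of the two viruses. It assigns average ranks to ties and uses the large-sample normal approximation without a tie correction; STATISTICA may instead report exact or tie-corrected p-values, and the counts in the usage lines are hypothetical.

```python
from statistics import NormalDist

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for the first sample
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p

hiv1 = [12, 9, 15, 7, 20, 11]  # hypothetical integrations per chromosome
hiv2 = [10, 8, 14, 9, 19, 13]
print(mann_whitney_u(hiv1, hiv2))
```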


**Figure 1.** Chromosomal loci of the 352 HIV-1 and HIV-2 cDNA integrations in the human genome. Chromosomal sequences matching each lentivirus are indicated above each chromosome. Blue lines identify HIV-2 integrations and red lines identify HIV-1 integrations.

#### **5.2. Distribution of the repetitive elements flanking integration sites**

Few repetitive elements, including SINEs, LINEs and LTRs, were identified in the 100 kb of host chromatin flanking the proviruses. In general, the distribution of repetitive element categories (SINEs, LINEs and LTRs) did not differ between HIV-1 and HIV-2 integrations (p>0.05, χ² test). Both lentiviruses integrated preferentially close to Alu elements, which belong to the SINEs. Within LINEs, differences among L1, L2 and L3 were recorded. Other classes of repetitive elements, such as LTRs, simple repeats and low-complexity regions, represented a minor proportion of the integration-associated chromatin (figure 3).
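Counting repeat classes around each provirus can be sketched as an interval-overlap test between a window around the integration site and annotated repeats. This sketch assumes the 100 kb region is centred on the site (±50 kb) and that repeats use half-open coordinates with RepeatMasker-style class labels; the coordinates are hypothetical, and a genome-wide analysis would use an indexed or sorted search instead of this brute-force loop.

```python
from collections import Counter

def repeats_near_sites(sites, repeats, window=100_000):
    """Count repeat classes overlapping a window centred on each integration site.

    sites   -- iterable of (chromosome, position) tuples
    repeats -- list of (chromosome, start, end, repeat_class), half-open [start, end)
    window  -- total window width in bp, assumed to be centred on the site
    """
    half = window // 2
    counts = Counter()
    for chrom, pos in sites:
        lo, hi = pos - half, pos + half
        for r_chrom, start, end, repeat_class in repeats:
            # Two half-open intervals overlap when each starts before the other ends.
            if r_chrom == chrom and start < hi and end > lo:
                counts[repeat_class] += 1
    return counts

# Hypothetical integration site and repeat annotations
sites = [("chr17", 1_000_000)]
repeats = [
    ("chr17", 990_000, 990_300, "SINE/Alu"),
    ("chr17", 1_040_000, 1_046_000, "LINE/L1"),
    ("chr17", 1_200_000, 1_200_300, "SINE/Alu"),  # outside the window
    ("chr4", 1_000_000, 1_000_300, "LTR"),        # different chromosome
]
print(repeats_near_sites(sites, repeats))
```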

Systemic Approach to the Genome Integration Process of Human Lentivirus 61


**Figure 2.** Functional characterization of the protein-coding genes located in the genomic regions (around 100 kb) surrounding human lentivirus integrations. (a) Molecular function (GO) of genes associated with HIV-1 and HIV-2 integration. (b) Biological process (GO) of genes associated with HIV-1 and HIV-2 integration. Blue blocks correspond to HIV-1; red blocks to HIV-2.

#### **5.3. Definition of the common genomic environment of integrations**

As integration does not follow a random model [23-25], some characteristics of the chromatin in regions with a high level of provirus integration support the hypothesis that preferential integration is conditioned by the structural and functional states of local chromatin. These states are defined by several genomic variables, studied in this work, which together would define genomic environments.

The multiple-regression analysis of the HIV-1 and HIV-2 data sets showed that differential distributions of CpG islands, genes and Alu elements together conditioned a specific genomic environment for each chromosome (R²=0.91, p<0.05). Gene density was the independent variable that contributed most to predicting the dependent variable (integrations), having the highest regression coefficient (B=0.83; p<0.05). The highest relative likelihood of hosting a lentiviral integration event in the human genome was registered for chromosome 17 (figure 4a). To test whether integration events are favored by gene-rich regions across all chromosomes, the same variables were compared with chromosome 17 excluded; a high gene density still defined a favorable environment for integration (figure 4b). Because chromosome 17 registered the highest percentage of lentiviral integration events, a detailed analysis of its chromatin structure was performed, correlating several variables that describe cellular chromatin status. In general, the distal chromatin regions of the p and q arms showed similar distributions of CpG island methylation, methylation of several lysine residues of histone H3 (K4, K27 and K36) and variable levels of open chromatin and nucleosome occupancy (figure 5a and b).
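The per-chromosome multiple regression can be sketched as ordinary least squares on z-scored variables, which yields standardized coefficients and R² directly. We assume here that the reported B values are standardized coefficients, which the text does not state. The solver is plain Gaussian elimination so the sketch needs only the standard library, and the example data are synthetic.

```python
from statistics import mean, pstdev

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def zscore(v):
    mu, sd = mean(v), pstdev(v)  # sd must be non-zero
    return [(t - mu) / sd for t in v]

def standardized_ols(predictors, y):
    """OLS on z-scored variables; returns (standardized betas, R^2)."""
    xs, zy = [zscore(col) for col in predictors], zscore(y)
    k = len(xs)
    # Normal equations X'X b = X'y; centring removes the need for an intercept.
    xtx = [[sum(p * q for p, q in zip(xs[i], xs[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(xs[i], zy)) for i in range(k)]
    betas = solve(xtx, xty)
    fitted = [sum(betas[i] * xs[i][obs] for i in range(k)) for obs in range(len(zy))]
    ss_res = sum((o - f) ** 2 for o, f in zip(zy, fitted))
    ss_tot = sum(o ** 2 for o in zy)  # zy already has mean zero
    return betas, 1 - ss_res / ss_tot

# Synthetic data: integrations = 2*genes + 1*CpG + 3, so R^2 should be 1
genes = [1, 2, 3, 4, 5]
cpg = [2, 1, 4, 3, 5]
integrations = [2 * g + c + 3 for g, c in zip(genes, cpg)]
betas, r2 = standardized_ols([genes, cpg], integrations)
print(betas, r2)
```

With real per-chromosome data the columns would be genes per Mb, CpG islands per Mb and Alu counts, one value per chromosome.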


**Figure 3.** Frequencies of several repetitive elements in the 100 kb regions around the HIV-1 and HIV-2 proviruses. SINEs, short interspersed nuclear elements; LINEs, long interspersed nuclear elements.

Experimental studies have demonstrated that regulatory regions in general, and promoters in particular, tend to be DNase-sensitive and are targets for integration of most retroviruses [37, 38]. The complete nucleotide sequence of chromosome 17 was published in 2006 [39]. This chromosome is rich in protein-coding genes, having the second highest gene density in the genome (16.2 genes per Mb), with a relative excess of short interspersed elements (SINEs, 22.3%) and a deficit of long interspersed elements (LINEs, 14.4%). Likewise, this chromosome has a high average GC content (45.5%) and high euchromatin density [39] (figure 6). Our statistical analysis determined that chromosome 17 had the highest number of integrations, concentrated mainly towards the telomeres of both arms.
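To quantify the telomeric bias described above, each integration site can be expressed as a fractional distance from the telomere of its arm (0 at the telomere, 1 at the centromere). This is a sketch with a hypothetical chromosome length, centromere position, set of sites and distal cutoff; a real analysis would take the coordinates from the genome assembly.

```python
def arm_position(pos: int, centromere: int, length: int) -> tuple[str, float]:
    """Return (arm, fraction), where fraction is 0 at the telomere and 1 at the centromere."""
    if not 0 <= pos <= length:
        raise ValueError("position lies outside the chromosome")
    if pos < centromere:
        return "p", pos / centromere
    return "q", (length - pos) / (length - centromere)

def distal_fraction(positions, centromere, length, cutoff=0.2):
    """Share of sites in the distal `cutoff` fraction (telomeric/subtelomeric) of either arm.

    cutoff=0.2 is an arbitrary, illustrative threshold."""
    distal = sum(1 for pos in positions
                 if arm_position(pos, centromere, length)[1] <= cutoff)
    return distal / len(positions)

# Hypothetical chromosome (80 Mb, centromere at 25 Mb) and integration sites
LENGTH, CENTROMERE = 80_000_000, 25_000_000
sites = [1_000_000, 79_000_000, 40_000_000]
print(distal_fraction(sites, CENTROMERE, LENGTH))
```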


**Figure 4.** Multiple-regression analysis of gene density, CpG island number and frequency of HIV-1 and HIV-2 proviruses across human chromosomes. A high correlation is observed mainly for chromosome 17. (a) Analysis including all human chromosomes. (b) The same analysis excluding chromosome 17.

The most relevant relationship concerned the conformational state of chromatin, including nucleosome occupancy, CpG island methylation, DNase hypersensitive regions and transcriptionally active genes found in open, decondensed chromatin regions. These regions provide the environment for DNA regulatory processes such as replication, repair and transcription. Albanese et al. (2008) [40] found that histone and integrase (IN) acetylation may favor integration by tethering the virus to acetylated, decondensed regions of chromatin. We conclude that the structural characteristics and epigenetic modifications observed in regions with a high frequency of viral cDNA integration would synergistically configure a local "genomic environment" that facilitates target site selection during retroviral integration.

## **5.4. Construction of HIV-1 gene/protein networks**

Host-virus interactions constitute a complex level of systems information whose analysis permits a thorough understanding of how the virus exploits the host cell and uses the cellular machinery to integrate into the host genome. The HIV-1 Human Protein Interaction Database (HHPID) recently registered 3959 interactions among 1452 human proteins and nineteen HIV proteins (fifteen of them structural and four intermediate proteins) [41].


**Figure 5.** Genome Browser image showing the distribution of several chromatin characteristics along 9.5 Mb of the p and q arms, representing 25% of chromosome 17. (a) p arm, (b) q arm. The figure shows the GC percentage in 5 bp windows (black), RefSeq genes (black), CpG islands (several tones of gray), levels of open chromatin (ENCODE, Duke) in GM12878 cells with DNase I and FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements) (black), DNase I hypersensitivity (ENCODE, University of Washington) in GM12878 cells (gray), pk (sites identified as signal peaks within FDR 0.5% hypersensitive zones), Hspots (zones identified using the HotSpot algorithm), and predicted nucleosome occupancy in A375 cells (black peaks).

Previous studies have found that most human cellular pathways are disturbed by at least one interaction with an HIV-1 protein during the virus life cycle [42-44]. These interactions are of two types: direct, via host protein-viral protein contacts, or indirect, such as regulatory interactions that alter the expression of human genes [45, 46]; the cc-cytokine signaling network is both disrupted and exploited by HIV at various stages of infection. Twenty-two candidate human class E proteins were connected into a coherent network by 43 different protein-protein interactions, in which AIP1 plays a key role in linking complexes that act early (TSG101/ESCRT-I) and late (CHMP4/ESCRT-III) in the HIV infection pathways [47, 49]. Monocyte/macrophage infection is characterized by viral dynamics substantially different from those of T lymphocytes. In fact, *in vivo* HIV infection of activated CD4 T lymphocytes accounts for the majority of the daily production of virus particles. However, a large number of lymphocytes are in a resting state and thus unable to sustain a complete, productive virus life cycle; they contribute only minimally to daily virus production [50-52]. Because of the limited HIV-induced cytopathic effect and their ability to accumulate high levels of HIV particles in intracellular compartments, HIV-infected macrophages serve as a potentially important reservoir and as "Trojan horses" exploited by the virus to favor its dissemination in different tissues [53, 54].


Cytoscape v.2.63 [55] was also used to construct a gene expression network from two kinds of files. The first contained gene expression profiles as a text file (.pvals) imported from microarray expression data (GEO profiles, NCBI). The second contained annotation data as text files (.sif) describing gene-gene interactions (online databases). For the first, gene expression values were collected from the microarray data series GSE19236, comprising two Agilent platforms (**GPL6480** and **GPL6848**) with 48 samples of monocyte-to-macrophage, macrophage and dendritic cells. These data are available from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) repository (accession number GEO: GSE19236); for our analysis, we selected all macrophage expression samples (GSM476720, GSM476721, GSM476722, GSM476723, GSM476724, GSM476725). An ANOVA test was calculated to identify genes that differed significantly among the microarray samples, considering p<0.001 as significant. Additionally, a hierarchical clustering analysis of the samples was performed using Euclidean distance and average linkage. MultiExperiment Viewer v4.1 [56] was used for the corresponding statistical analyses. Using online data from BOND (Biomolecular Network Data Bank, http://bond.unleashedinformatics.com/Action), BioGRID (Biological General Repository for Interaction Datasets, http://thebiogrid.org/) and KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/), a new file with the interaction data of the 28 genes located close to integration sites was constructed.
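The sample clustering step can be sketched as agglomerative clustering with Euclidean distance and average linkage (UPGMA), which is our reading of the "mean linkage" option. The naive implementation below recomputes distances at every merge, which is fine for a handful of arrays but not for large data; the sample names and expression values are hypothetical, and the ANOVA pre-filter is omitted.

```python
import math
from itertools import combinations

def average_linkage(profiles: dict[str, list[float]], n_clusters: int) -> list[set[str]]:
    """Agglomerative clustering of samples with Euclidean distance and average linkage."""
    clusters = [{name} for name in profiles]

    def linkage(c1: set[str], c2: set[str]) -> float:
        # Average of all pairwise Euclidean distances between members of the two clusters.
        total = sum(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Hypothetical log-expression values of three genes in five samples
profiles = {
    "DC_1": [8.0, 2.0, 5.0], "DC_2": [8.2, 2.1, 4.9],
    "MAC_1": [3.0, 7.5, 6.0], "MAC_2": [3.1, 7.4, 6.2], "MAC_3": [2.9, 7.6, 5.9],
}
for cluster in average_linkage(profiles, n_clusters=2):
    print(sorted(cluster))
```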

Cytoscape v2.6 was used to visualize and analyze the genetic interaction networks of the 28 human macrophage genes and their interactions. The BiNGO v2.6 plugin (Biological Networks Gene Ontology tool) was used to determine which Gene Ontology (GO) terms were significantly overrepresented in the gene set. A hypergeometric test was applied to determine which categories were significantly represented (p<0.01), with significance adjusted for multiple hypothesis testing using the Bonferroni family-wise error rate correction [57]. Network topology parameters were calculated using the NetworkAnalyzer plugin (Max Planck Institute for Informatics), which computes network diameter, the number of connected pairs of nodes and the average number of neighbors; it also analyzes node degrees, shortest paths, clustering coefficients and topological coefficients.
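The over-representation test can be sketched as an upper-tail hypergeometric probability: with N annotated genes in the reference set, K of them carrying a GO term, and n genes in the network, it gives the chance of seeing at least k term-annotated network genes by chance. The Bonferroni step multiplies by the number of terms tested. This is a sketch of the statistic, not BiNGO's code, and the counts are hypothetical.

```python
from math import comb

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N population genes, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enriched_terms(term_counts, N, n, alpha=0.01):
    """term_counts maps GO term -> (K annotated in reference, k annotated in network).

    Returns the Bonferroni-adjusted p-values of terms passing `alpha`."""
    m = len(term_counts)  # number of terms tested
    adjusted = {term: min(1.0, hypergeom_sf(k, N, K, n) * m)
                for term, (K, k) in term_counts.items()}
    return {term: p for term, p in adjusted.items() if p < alpha}

# Hypothetical reference of 20,000 annotated genes and a 28-gene network
counts = {"term_A": (100, 6), "term_B": (1000, 2)}
print(enriched_terms(counts, N=20_000, n=28))
```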

To identify active sub-networks, that is, highly connected regions of the main network, we used the jActiveModules plugin, which groups genes according to significant p-values of gene expression over particular subsets of samples. The result lists active modules by number of nodes, together with an associated Z-score. An active module with a Z-score greater than 3.0 indicates a significant response to the experimental conditions. We kept the standard default values as the most effective for initial analyses [58].
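The Z-score attached to an active module can be sketched following the scoring scheme published with jActiveModules, as we understand it: each node's p-value becomes z_i = Φ⁻¹(1 − p_i), a module of k nodes scores Σz_i/√k, and that score is standardized against random node sets of the same size. The plugin's search over connected subgraphs (simulated annealing) is not shown, and the p-values are hypothetical and must lie strictly between 0 and 1.

```python
import math
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()

def node_z(p: float) -> float:
    """z_i = Phi^-1(1 - p_i); p must lie strictly between 0 and 1."""
    return STD_NORMAL.inv_cdf(1.0 - p)

def aggregate_z(p_values) -> float:
    """Module score: sum of node z-scores divided by sqrt(module size)."""
    zs = [node_z(p) for p in p_values]
    return sum(zs) / math.sqrt(len(zs))

def calibrated_score(module_p, all_p, n_samples=1000, seed=0):
    """Standardize the module score against random node sets of the same size."""
    rng = random.Random(seed)
    k = len(module_p)
    null = [aggregate_z(rng.sample(all_p, k)) for _ in range(n_samples)]
    return (aggregate_z(module_p) - mean(null)) / stdev(null)

# Hypothetical expression p-values: a small candidate module and the whole network
module = [0.001, 0.002, 0.003, 0.004]
network = [i / 101 for i in range(1, 101)]
print(round(calibrated_score(module, network), 2))
```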


**Figure 6.** (A) Interaction values of the 28 human macrophage genes interrupted by HIV-1 cDNA integration. (B) Gene expression network in non-HIV-1-infected macrophages: visualization of the gene network composed of 28 genes located close to regions with a high frequency of HIV-1 provirus in human macrophages. These genes interact with 1202 genes through 2770 interactions. The network was constructed using Cytoscape. Each node corresponds to a gene and edges represent interactions among genes. The color gradient represents the expression values.

Of 41,000 probes, 11,713 significant genes were clustered into two significantly different groups of cells: one included only dendritic cells, while the second grouped monocyte-to-macrophage cells and macrophages, which share similar gene expression patterns. A total of 2,770 interactions among the 28 genes located close to HIV-1 proviruses in human macrophages were recorded. AKT3 was the gene with the highest number of interactions (456), followed by FLT1 (381), STAT5A (356) and AXIN1 (328) (figure 6a). In contrast, the ZNF36, DYRK1A and RBMS3 genes had the lowest numbers of gene interactions. The normal macrophage gene network showed three components: the main cluster, composed of 26 macrophage genes and their interactions, and two minor clusters, in which ZNF36 was the central node with five interactions and STX1A the central node with twelve interactions (Figure 6a).
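Ranking genes by number of interactions and splitting the network into connected components can be sketched with an adjacency-set graph and breadth-first search. The edge list uses placeholder gene names and is hypothetical; it only mirrors the structure described above (one large component plus smaller clusters).

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Undirected adjacency sets; self-loops are ignored."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def top_hubs(adj, n=4):
    """Genes sorted by number of interactions (degree), ties broken by name."""
    return sorted(adj, key=lambda g: (-len(adj[g]), g))[:n]

def connected_components(adj):
    """Breadth-first search; components returned largest first."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Hypothetical interaction list (placeholder gene names)
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("E", "F"), ("E", "G"), ("H", "I")]
adj = build_graph(edges)
print(top_hubs(adj, 2), [len(c) for c in connected_components(adj)])
```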


**Figure 7.** Effects of HIV-1 integration on the topology of the macrophage gene expression network. (A) Normal macrophage gene expression network. (B) HIV-1 integration network when five macrophage genes were turned off.

**Table 1.** Comparison of network parameter values in normal and HIV-1 infected macrophages.

| Parameter | Normal macrophage | HIV-1 infected macrophage |
|---|---|---|
| Clustering coefficient | 0.30 | 0.04 |
| Connected components | 3 | 3 |
| Shortest paths | 94% | 90% |
| Network heterogeneity | 5.63 | 3.75 |
| Centralization | 0.34 | 0.21 |
| Avg. number of neighbors | 4.2 | 2.70 |
| Characteristic path length | 3.30 | 4.13 |

Using the Random Networks plugin for Cytoscape, we found no statistically significant differences between the clustering coefficients of the non-infected and simulated infected networks and those of randomly generated networks (Kruskal-Wallis test, P = 0.317). These data provide strong support that the topologies of both networks, and therefore the simulation of our gene network, are valid.

We tested our hypothesis that HIV-1 integration disturbs gene expression, with a global effect on cellular networks and essential biological pathways. The enriched GO terms were categorized for the normal and infected macrophage networks to identify the functional cellular changes caused by HIV-1 integration. Of all the GO categories covered by the 28 macrophage genes and their interactions, the ten most significant enriched GO terms are listed in table 2.

The Gene Ontology (GO) enrichment analysis showed that the normal network was composed of 423 significant functional categories out of a total of 1190. These individual significant categories
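The Kruskal-Wallis comparison of clustering coefficients across the non-infected, infected and random networks can be sketched with average ranks, the usual tie correction, and a chi-square upper tail with (groups − 1) degrees of freedom computed from a standard recurrence. The input values are hypothetical, and the chi-square approximation is only reasonable when groups are not very small.

```python
import math
from collections import Counter

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for integer df via Q(k+2) = Q(k) + term."""
    q, k = (0.0, 0) if df % 2 == 0 else (math.erfc(math.sqrt(x / 2)), 1)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def kruskal_wallis(*groups):
    """Kruskal-Wallis H (tie-corrected) and its chi-square p-value with len(groups)-1 df."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        h += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)  # undefined if every value is identical
    return h, chi2_sf(h, len(groups) - 1)

# Hypothetical clustering coefficients sampled from three networks
non_infected = [0.30, 0.25, 0.35, 0.28]
infected = [0.27, 0.32, 0.24, 0.31]
random_nets = [0.29, 0.26, 0.33, 0.30]
print(kruskal_wallis(non_infected, infected, random_nets))
```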

To further identify active sub-networks inside the main gene network, we performed an expression clustering analysis using p-values calculated by comparing gene expression levels among five macrophage samples. We found 5 sub-networks; the most significant active module comprised 222 genes, with a score of 3.15 (p<0.01). Among them were 12 genes related to provirus integration sites: AXIN1, NFAT5, STAT5A, FLT1, AKT3, HTT, RIPK2, DGCR8, WWOX, NRG1, DYRK1A and SLC2A14 (figure 6b).

The significant GO functional categories in this active module showed enrichment for positive regulation of biological processes and cell proliferation. Most of the genes identified in this sub-network were associated with cellular pathways that play significant roles in modulating cell signaling networks, including Wnt, MAPK and ErbB signaling.

## **5.5. Effects on normal macrophage gene networks by HIV-1 integration**

In order to better understand the alteration of macrophages homeostasis by the HIV-1 integration, our analyses were focused to simulate what are the effects of viral cDNA integration in the alteration of several gene expression networks in human macrophage. In general the topology of non-infected macrophage network gene was dramatically changed by the HIV-1 integration events that lead to turned off the expression of five genes by the integration of proviral cDNA (Figure 7).

The evaluation of the several topological parameters such as clustering coefficient, shortest paths, network heterogeneity, the centralization, average number of neighbors and characteristic path length, showed a changed in the values of HIV-1 macrophage infected gene network, compared with normal macrophage network. The non- altered network was more condensed, had more number of interactions, was wide open rich in shortest paths and also was composed by one major component and two minor clusters being more heterogeneous and multi-functional (table 1).

Statistical differences between the topology states of two networks were registered for topological coefficients, closeness centrality and neighborhood connectivity distribution (Kolgomorov-Smirnov test p<0.05), but not in average clustering coefficient distribution. These results indicate that normal network was significantly more central and densely connected in comparison with that of HIV-1 macrophage infected network.

signaling.

integration of proviral cDNA (Figure 7).

heterogeneous and multi-functional (table 1).

dendritic cells, meanwhile the second grouped monocyte to macrophages and macrophages which are sharing similar gene expression patterns. A total of 2,770 interactions among 28 genes which were located closed to HIV-1 proviruses in human macrophages were recorded. AKT3 was gene with highest number of interactions (456), followed by FLT1 (381), STAT5A (356) and AXIN1 (328) (figure 6a). In contrast ZNF36, DYRK1A and RBMS3 genes had the lowest number of gene interactions. The normal macrophage gene network showed tree components: the main cluster composed by 26 macrophages genes and its interactions and two minor clusters in which ZNF36 gene was the central node with five interactions;

To further identify active sub-networks inside the main gene network, we performed an expression clustering analysis using p-values calculated by comparison of gene level expression among five macrophage samples. We found 5 subnetworks, in which the most significant active module was integrated by 222 genes with a score of 3.15 (p<0.01). Within them 12 genes related with provirus integration sites were found: AXIN1, NFAT5, STAT5A, FLT1, AKT3, HTT, RIPK2, DGCR8, WWOX, NRG1, DYRK1A and SLC2A14 (figure 6b).

The GO functional significant categories in this active module showed enrichment for positive regulation of biological process and cell proliferation. Most of the genes identified in this sub-network were associated with cellular pathways that play significant role by modulating cell signaling networks including Wnt signaling, MAPK signaling and ErbB

In order to better understand the alteration of macrophages homeostasis by the HIV-1 integration, our analyses were focused to simulate what are the effects of viral cDNA integration in the alteration of several gene expression networks in human macrophage. In general the topology of non-infected macrophage network gene was dramatically changed by the HIV-1 integration events that lead to turned off the expression of five genes by the

The evaluation of the several topological parameters such as clustering coefficient, shortest paths, network heterogeneity, the centralization, average number of neighbors and characteristic path length, showed a changed in the values of HIV-1 macrophage infected gene network, compared with normal macrophage network. The non- altered network was more condensed, had more number of interactions, was wide open rich in shortest paths and also was composed by one major component and two minor clusters being more

Statistical differences between the topology states of two networks were registered for topological coefficients, closeness centrality and neighborhood connectivity distribution (Kolgomorov-Smirnov test p<0.05), but not in average clustering coefficient distribution. These results indicate that normal network was significantly more central and densely

connected in comparison with that of HIV-1 macrophage infected network.

**5.5. Effects on normal macrophage gene networks by HIV-1 integration** 

and STX1A as central node with twelve interactions (Figure 6a).
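The distribution comparisons above rely on the two-sample Kolmogorov-Smirnov test. The following minimal sketch, written with the Python standard library only and not taken from the authors' workflow, computes the statistic D and the usual large-sample critical value; the closeness-centrality numbers are invented placeholders, not values from this study.

```python
from bisect import bisect_right
from math import log, sqrt


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # occurs at one of the pooled observations.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic critical value: H0 (same distribution) is rejected if D exceeds it."""
    c_alpha = sqrt(-0.5 * log(alpha / 2))
    return c_alpha * sqrt((n + m) / (n * m))


# Hypothetical closeness-centrality values for two networks (placeholders only).
normal = [0.52, 0.48, 0.61, 0.55, 0.58, 0.50, 0.63, 0.57]
infected = [0.31, 0.35, 0.29, 0.40, 0.33, 0.38, 0.36, 0.30]
d = ks_statistic(normal, infected)
print(d, d > ks_critical_value(len(normal), len(infected)))  # prints: 1.0 True
```

For samples this small the asymptotic critical value is only approximate; an exact p-value, for example from `scipy.stats.ks_2samp`, would be the safer choice in practice.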

**Figure 7.** Effects of HIV-1 integration on the topology of the macrophage gene expression network. (A) Normal macrophage gene expression network. (B) HIV-1 integration network in which five macrophage genes were turned off.


**Table 1.** Comparison of network parameter values in normal and HIV-1 infected macrophages.
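Table 1 compares standard topological parameters of the two networks. As an illustrative sketch only, the code below computes several of them (average number of neighbors, average clustering coefficient, characteristic path length and number of connected components) for a small hypothetical graph, and models a gene "turned off" by HIV-1 integration as the deletion of its node and all incident edges. The gene labels G1 to G6, the edges and that knockout model are assumptions for illustration; nodes with fewer than two neighbors receive a clustering coefficient of 0, one common convention.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from (gene, gene) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def turn_off(adj, genes):
    """Hypothetical knockout model: delete silenced genes and their edges."""
    off = set(genes)
    return {u: nbrs - off for u, nbrs in adj.items() if u not in off}


def distances_from(adj, source):
    """Breadth-first search: shortest-path length to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def local_clustering(adj, u):
    """Fraction of neighbor pairs of u that are themselves linked."""
    k = len(adj[u])
    if k < 2:
        return 0.0
    links = sum(1 for v, w in combinations(adj[u], 2) if w in adj[v])
    return 2 * links / (k * (k - 1))


def summarize(adj):
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    # Shortest-path lengths over connected ordered pairs only.
    lengths = [d for u in adj for v, d in distances_from(adj, u).items() if v != u]
    seen, components = set(), 0
    for u in adj:
        if u not in seen:
            components += 1
            seen.update(distances_from(adj, u))
    return {
        "nodes": n,
        "edges": sum(degrees) // 2,
        "avg_neighbors": sum(degrees) / n,
        "avg_clustering": sum(local_clustering(adj, u) for u in adj) / n,
        "char_path_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "components": components,
    }


toy = build_graph([("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "G4"),
                   ("G4", "G5"), ("G5", "G6"), ("G4", "G6")])
print(summarize(toy))
print(summarize(turn_off(toy, {"G3"})))
```

Deleting the hypothetical connector G3 splits the toy graph into two components and shifts every parameter at once, which illustrates how silencing a few well-connected genes can reshape a network's topology.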

Using the Random Network plugin for Cytoscape, we compared the clustering coefficients of the non-infected network and of the simulated infected network with those of randomly generated networks and found no statistically significant differences (Kruskal-Wallis test, P=0.317). We interpret these data as strong support for the topology of both reported networks and for the validity of our gene network simulation.
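The randomization check uses the Kruskal-Wallis test. The section does not say how the groups were formed, so this sketch assumes, hypothetically, that clustering coefficients from the non-infected network, the simulated infected network and the randomized networks form three independent samples. It computes H with the usual tie correction; with three groups, H is approximately chi-square distributed with two degrees of freedom, whose survival function is exactly exp(-H/2). The input values are placeholders, not data from this study.

```python
from math import exp


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    if n < 2:
        raise ValueError("need at least two observations")
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 * total / (n * (n + 1)) - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def p_value_three_groups(h):
    """Chi-square survival function with 2 degrees of freedom: exp(-H/2)."""
    return exp(-h / 2)


# Hypothetical clustering coefficients (placeholders only).
non_infected = [0.41, 0.38, 0.45, 0.40]
infected = [0.36, 0.42, 0.39, 0.37]
randomized = [0.40, 0.35, 0.43, 0.38]
h = kruskal_wallis_h(non_infected, infected, randomized)
print(round(h, 3), round(p_value_three_groups(h), 3))
```

For other numbers of groups, `scipy.stats.kruskal` returns both H and the corresponding p-value.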

We tested our hypothesis that HIV-1 integration disturbs gene expression, with a global effect on cellular networks and essential biological pathways. Enriched GO terms were categorized for the normal and infected macrophage networks to identify functional cellular changes caused by HIV-1 integration. Of all the GO categories covered by the 28 macrophage genes and their interactions, the ten most significant enriched GO terms are listed in table 2.

The Gene Ontology (GO) enrichment analysis showed that the normal network was composed of 423 significant functional categories out of a total of 1190. These individual significant categories could be further classified into two major groups: cell function regulation and signaling of biological processes. In contrast, the HIV-1 infected macrophage gene network was enriched in 10 significant functional categories out of a total of 40. The significantly overrepresented categories indicated that this newly emerging gene network was composed of genes involved in metabolic processes and DNA repair.
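The section does not state which statistic produced the 423 and 10 significant categories. Many GO enrichment tools use a one-sided hypergeometric test followed by a multiple-testing correction; the sketch below assumes that approach, with Bonferroni correction, and every number in the usage lines is a placeholder rather than a value from this study.

```python
from math import comb


def enrichment_p(k, population, annotated, selected):
    """One-sided hypergeometric p-value P(X >= k).

    population: genes in the reference set
    annotated:  reference genes carrying the GO term
    selected:   genes in the network being tested
    k:          network genes carrying the GO term
    """
    upper = min(annotated, selected)
    hits = sum(comb(annotated, i) * comb(population - annotated, selected - i)
               for i in range(k, upper + 1))
    return hits / comb(population, selected)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical GO terms tested against a 28-gene network (placeholders only).
raw = [enrichment_p(6, 18000, 120, 28), enrichment_p(2, 18000, 900, 28)]
print(bonferroni(raw))
```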

Systemic Approach to the Genome Integration Process of Human Lentivirus 69


We can conclude that a general effect of HIV-1 integration into macrophage DNA is the disruption of several signaling pathways that control normal cell homeostasis. Comparing the top 10 GO functional categories of normal and infected macrophages showed a dramatic change: the non-infected macrophage, whose main cellular functions are devoted to maintaining crucial cell signaling functions, became an infected cell in which the most important functions are macromolecular biosynthetic processes, maintenance of fidelity during DNA-dependent DNA replication, mismatch repair, age-dependent response to reactive oxygen species during chronological cell aging, and oxidation reduction. Because the HIV-infected macrophage is an abnormal reservoir in which metabolic cascades are altered, it is possible to propose that macrophage metabolism adapts to perform survival functions: the apoptotic process is interrupted, and an SOS metabolism drives the macrophage to change its life style.

*In silico* studies are based on statistical calculations that permit generalizations about a biological process. However, since some variables could affect the *in toto* process, obtaining a realistic history of Lentivirus integration requires considering other factors, including physiological processes and cellular compartments, that could influence *in vivo* integration site selection. Some of these are the cell-cycle phase, the transcriptional state of the cell, the topology of chromosomal DNA, the infected cell type, and the presence of co-helper molecules during conformation of the pre-integration complex (PIC).

**6. Conclusions**

In this study, we simulated at a systemic level the alterations of cellular pathways that occur when the HIV provirus integrates into genes, turning them off and dysregulating several local signaling pathways. One of the target genes associated with HIV-1 integration was AKT3, also called PKB, a member of the serine/threonine protein kinase family. It is involved in a wide range of biological processes, including cell proliferation, differentiation, apoptosis and stimulation of cell growth, and in regulating other biological responses (59, 60). It has also been identified as playing important regulatory roles in the G2/M transition of the cell cycle.


**Table 2.** The top 10 significant biological processes of the normal and HIV-1 infected macrophage networks (a).

**a**. The description of the gene ontology biological processes and the corresponding gene ontology identifiers are given. **b**. p-Value calculated as an exponential function.

Via JNK, AKT3 interacts with NFAT and JUN, which are targets in the HIV-1 macrophage integration network and belong to the mitogen-activated protein kinase (MAPK) cascade, which performs essential functions such as proliferation, survival, inflammation and apoptosis in all cell types. This pathway is associated with others, including the phosphatidylinositol signaling system, the Wnt signaling pathway, the ERK5 pathway and the p53 signaling pathway (61-63). In line with these previous data, we propose that when AKT3 is turned off by HIV-1 integration, its cross talk with other pathways is disrupted, leading to signaling dysfunction in metabolism-associated pathways. When AKT3 was inactive, its direct interaction with MKK7 led to disruption of JNK and subsequently of JUN, which would result in non-activation (by phosphorylation) of apoptotic and cell cycle processes. On the other hand, inactivation of the MAPK pathway in both macrophages and dendritic cells leads to inhibition of proinflammatory cytokine secretion, downregulation of co-stimulatory molecules such as CD80 and CD86, and ineffective T cell priming. The net result is an impaired innate and adaptive immune response (64, 65).

It has recently been reported that HIV-1 infection triggers activation of the PI3K/Akt cell survival pathway in primary human macrophages, as reflected by decreased PTEN protein expression and increased Akt kinase activity, and renders these cells resistant to cytotoxic insults (54, 61, 64, 65). Given HIV-1 integration close to AKT3, and with PTEN, AKT1, AKT2, FOXO1 and MDM2 included in the macrophage gene network, a disruption of the apoptotic process would be expected.
