**5. Visualization and interpretation of data**

Shared words among the checkpoint factor genes suggested the presence of regulatory networks. We assessed these relationships by generating two network depictions, an interaction network (Figure 2) and a Circos diagram (Figure 3), both constructed from the summary data in Table 9. To derive Figure 2, metric multidimensional scaling (MDS) was performed on the affiliation network defined in Table 9. The resulting graph was then spring-embedded, with node repulsion, to facilitate visualization (Borgatti et al., 2002). The interaction network depicts the distribution of the DNA words among the genes (note that each gene appears once, so that all of its alternative promoters are represented by a single node). Genes are denoted by blue squares and words by red circles. Bold lines indicate multiple occurrences of a word in a gene. Reverse complement words are shown independently.
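The affiliation network behind Figure 2 can be sketched as a bipartite edge list built from a word-by-gene occurrence table. The sketch below is illustrative only: the gene names and counts are hypothetical placeholders rather than values from Table 9, the function names are our own, and the layout step (metric MDS followed by spring embedding) is not reproduced.

```python
from collections import defaultdict

# Hypothetical word-by-gene occurrence counts (placeholders, not Table 9 values).
occurrences = {
    "ATGGCTGT": {"GENE_A": 2, "GENE_B": 1},
    "ACTCCCTA": {"GENE_B": 1, "GENE_C": 1},
    "CGGAGCCC": {"GENE_C": 3},
}


def build_edges(table):
    """Return (word, gene, count, bold) tuples.

    bold is True when the word occurs more than once in the gene's
    promoters, mirroring the bold lines described for the network figure.
    """
    edges = []
    for word in sorted(table):
        for gene in sorted(table[word]):
            count = table[word][gene]
            if count > 0:
                edges.append((word, gene, count, count > 1))
    return edges


def node_degrees(edges):
    """Count the distinct neighbours of every word node and gene node."""
    degree = defaultdict(int)
    for word, gene, _count, _bold in edges:
        degree[word] += 1
        degree[gene] += 1
    return dict(degree)


edges = build_edges(occurrences)
degrees = node_degrees(edges)
# Words connected to more than one gene are candidate shared regulatory motifs.
shared_words = [w for w in sorted(occurrences) if degrees.get(w, 0) > 1]
```

An edge list of this kind is the usual input to a layout routine; the coordinates in the published figure depend on the software and settings the authors used.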

The Circos diagram represents the same information in a closed circular space, in which connections run from words on one side of the diagram to genes on the other side. The putative nodes of the regulatory networks are those defined by multiple edges; such a node represents a characterized transcription factor binding site, a novel DNA binding site, or a checkpoint factor gene.
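In a table-style Circos diagram of this kind, each connection corresponds to one cell of the word-by-gene table, segment widths follow the row and column sums, and each cell can be expressed as a share of its row or column total. A minimal sketch of those quantities follows; the labels and counts are hypothetical and the function name is our own.

```python
# Hypothetical word-by-gene counts (rows: words, columns: genes).
rows = ["WORD_1", "WORD_2"]
cols = ["GENE_A", "GENE_B", "GENE_C"]
cells = [
    [2, 1, 0],
    [0, 1, 4],
]

# Segment widths are proportional to the row and column sums.
row_sums = [sum(row) for row in cells]
col_sums = [sum(col) for col in zip(*cells)]


def ribbons(cells, rows, cols, row_sums, col_sums):
    """List one ribbon per non-zero cell with its width and relative shares."""
    out = []
    for i, row in enumerate(cells):
        for j, value in enumerate(row):
            if value:
                out.append({
                    "row": rows[i],
                    "col": cols[j],
                    "width": value,                    # ribbon width follows the cell value
                    "row_share": value / row_sums[i],  # share of the row total
                    "col_share": value / col_sums[j],  # share of the column total
                })
    return out


ribbon_list = ribbons(cells, rows, cols, row_sums, col_sums)
```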

Some of the discovered words correspond to known binding sites for transcription factors, reported in the JASPAR and TRANSFAC databases of transcription factors (see Table 10). The relationships between the top fifteen words and the transcription factors are depicted in the circos diagram in Figure 4. Note that multiple binding site motifs were discovered for many of the transcription factors, and that several of the sites match the binding patterns of more than one transcription factor.
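The two observations above (several motifs per transcription factor, and several factors per motif) can be checked by inverting the word-to-factor assignments of Table 10. The sketch below copies those assignments from the table; treating "Cap" and "cap" as the same annotation through case-insensitive matching is our assumption, and the function name is illustrative.

```python
from collections import defaultdict

# Word-to-factor assignments as listed in Table 10.
word_to_tfs = {
    "ACCCCCAC": ["PF0091", "Pax-4"],
    "ACTCCCTA": ["Helios A", "p300"],
    "ATGGCTGT": ["Cap"],
    "ATTAAAGA": ["Pax-2"],
    "CGGAGCCC": ["LF-A1"],
    "CTGAAATT": ["STAT1", "STAT6"],
    "CTTTTGAA": ["TCF-4"],
    "GAAAAATT": ["CIZ"],
    "GCACCTGC": ["PF0035", "AP-4", "cap", "Lmo2 complex"],
    "GTGGCTGC": ["cap"],
    "TACTTTTT": ["FOXC", "CIZ", "RUSH-1alpha1"],
    "TATATTTA": ["FOXL1", "PF0028", "PF0054"],
    "TCCTTTCT": ["Pax-2"],
    "TTTTTATA": ["FOXL1"],
}


def invert(mapping):
    """Map each factor (lower-cased, an assumption) to the words it matches."""
    tf_to_words = defaultdict(list)
    for word in sorted(mapping):
        for tf in mapping[word]:
            tf_to_words[tf.lower()].append(word)
    return dict(tf_to_words)


tf_to_words = invert(word_to_tfs)
# Factors associated with more than one discovered motif.
multi_motif_tfs = sorted(tf for tf, words in tf_to_words.items() if len(words) > 1)
# Words that match the binding pattern of more than one factor.
multi_tf_words = sorted(w for w, tfs in word_to_tfs.items() if len(tfs) > 1)
```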

Shared Regulatory Motifs in Promoters of Human DNA Repair Genes 81

Table 9. The top ranked words (rows of the table), based on statistical significance (*S*·ln(*S*/*Es*)), and the number of occurrences of each word in the promoter regions of genes (columns).

Fig. 2. Model of the checkpoint regulatory network using multidimensional scaling.

Fig. 3. Circos diagram of the top 15 words, based on statistical significance, and their occurrences in gene promoter regions. Edge: Represents a cell value by using an edge to connect row and column entries. The width is proportional to the cell value. Inner ring: Each edge is colored to correspond with a row item; each edge end is colored to correspond with a column item. Ring 2: Represents a row or a column; width is proportional to sum of cell values in the corresponding row or column. Ring 3: Represents relative contributions of cells to row and column totals. Each color represents one cell; percentage is the proportion of a cell's value to the row or column sum.

| **TFBS** | *S*·ln(*S*/*Es*) | **TF** |
|---|---|---|
| ACCCCCAC | 3.76 | *PF0091, Pax-4* |
| ACTCCCTA | 4.67 | *Helios A, p300* |
| ATGGCTGT | 5.42 | *Cap* |
| ATTAAAGA | 3.72 | *Pax-2* |
| CGGAGCCC | 3.95 | *LF-A1* |
| CTGAAATT | 3.80 | *STAT1, STAT6* |
| CTTTTGAA | 3.83 | *TCF-4* |
| GAAAAATT | 3.76 | *CIZ* |
| GCACCTGC | 3.68 | *PF0035, AP-4, cap, Lmo2 complex* |
| GTGGCTGC | 3.64 | *cap* |
| TACTTTTT | 3.82 | *FOXC, CIZ, RUSH-1alpha1* |
| TATATTTA | 3.82 | *FOXL1, PF0028, PF0054* |
| TCCTTTCT | 3.70 | *Pax-2* |
| TTTTTATA | 3.64 | *FOXL1* |

Table 10. Known transcription factor binding sites (with significance scores and corresponding transcription factor) discovered in the promoters of the checkpoint factor genes.

Additional insight into the regulatory network for the checkpoint factors can be seen in Figure 5, which replaces the DNA binding site motifs with the names of implicated transcription factors for each DNA repair gene. The diagram indicates the discovery of specific transcription factors involved in the control of each gene and shared among multiple genes. Up to seven transcription factors were discovered for each gene.
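The substitution of motifs by transcription factor names can be sketched as a join between a gene-to-word table and a word-to-factor table. In the sketch below the word-to-factor entries are a subset of Table 10, while the gene names and their word assignments are hypothetical placeholders; the function names are our own.

```python
# Subset of the word-to-factor assignments from Table 10.
word_to_tfs = {
    "CTGAAATT": ["STAT1", "STAT6"],
    "ATTAAAGA": ["Pax-2"],
    "TCCTTTCT": ["Pax-2"],
    "TTTTTATA": ["FOXL1"],
}

# Hypothetical promoter contents (placeholders, not results from the study).
gene_to_words = {
    "GENE_A": ["CTGAAATT", "ATTAAAGA"],
    "GENE_B": ["TCCTTTCT", "TTTTTATA"],
    "GENE_C": ["TTTTTATA"],
}


def gene_tf_map(gene_to_words, word_to_tfs):
    """Replace each gene's words by the factors implicated for those words."""
    return {
        gene: sorted({tf for word in words for tf in word_to_tfs.get(word, [])})
        for gene, words in gene_to_words.items()
    }


def shared_tfs(gene_to_tfs):
    """Return the factors implicated for more than one gene."""
    genes_by_tf = {}
    for gene, tfs in gene_to_tfs.items():
        for tf in tfs:
            genes_by_tf.setdefault(tf, []).append(gene)
    return {tf: sorted(genes) for tf, genes in genes_by_tf.items() if len(genes) > 1}


gene_to_tfs = gene_tf_map(gene_to_words, word_to_tfs)
shared = shared_tfs(gene_to_tfs)
```

Collecting factors into a set before sorting means a factor reached through two different motifs is counted once per gene.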


Fig. 4. Circos diagram showing the top 15 DNA motifs found in promoters of checkpoint factor genes and their related transcription factors (number of occurrences are multiplied by 100).

Fig. 5. Relationships between genes and transcription factors.

**7. Acknowledgments**

This work was supported by the Intramural Research Program of NHGRI (LE and LK) and by the Ohio Plant Biotechnology Consortium, the Choose Ohio First Program of the University System of Ohio, and the Ohio University Graduate Research and Education Board (LW).

**8. References**

Bellizzi, A.M., & Frankel, W.L. Colorectal cancer due to deficiency in DNA mismatch repair function: a review. *Adv Anat Pathol*. 2009 Nov;16(6):405-17.

Berwick, M. & Vineis, P. Markers of DNA repair and susceptibility to cancer in humans: an epidemiologic review. *J Natl Cancer Inst*. 2000 Jun 7;92(11):874-97.

Borgatti, S.P., Everett, M.G., & Freeman, L.C. 2002. Ucinet 6 for Windows: Software for Social Network Analysis. Harvard: Analytic Technologies.

Bryne, J.C., Valen, E., Tang, M.H., Marstrand, T., & Winther, O. JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. *Nucleic Acids Res* 2008, 36:D102-106.

Elnitski, L., Lichtenberg, J., & Welch, L.R. Regulatory network nodes of checkpoint factors in DNA repair pathways. *BCB '10: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology,* 529-536. 2010.

Helleday, T., Petermann, E., Lundin, C., Hodgson B., & Sharma, R.A. DNA repair pathways as targets for cancer therapy, *Nature Reviews Cancer* (2008) 8 (3): 193-204.
