**2.2 Transcription factor binding site analysis**

A systematic search for transcription factor binding sites in the list of bidirectional promoters was used to assess regulatory connections at the DNA level; using a motif-finding algorithm, we searched for the motifs reported in (Xie et al. 2005). The search revealed several binding sites shared among the promoters. Notably, identical *ELK1* binding sites were located at the same distance from the *ERBB2*, *FANCD2*, and *BRCA2* transcription start sites (Yang et al. 2007). *ETS* factor binding sites were present as a trio with *SP1* and *PAX4/RXR* binding sites in the majority of the promoters. The transcription factors whose binding motifs were found in all of the promoters are reported in Table 2, along with their descriptions from GeneCards (Safran et al. 2010).
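The idea of locating a known motif and comparing its distance to the transcription start site (TSS) across promoters can be illustrated with a minimal Python sketch. It is not the motif-finding algorithm used in the study: it performs exact string matching on both strands, and the function names, the `tss_index` parameter, the motif word, and the toy sequences are all hypothetical placeholders chosen for illustration.

```python
# Illustrative sketch only: exact-match scan for one motif on both strands,
# reporting each hit's offset from an assumed TSS position.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_sites(promoter, motif, tss_index):
    """Return (offset_from_tss, strand) for every exact motif occurrence.

    tss_index is the 0-based position of the TSS within `promoter`
    (a hypothetical convention); negative offsets lie upstream of it.
    """
    hits = []
    rc = reverse_complement(motif)
    k = len(motif)
    for i in range(len(promoter) - k + 1):
        window = promoter[i:i + k]
        if window == motif:
            hits.append((i - tss_index, "+"))
        if window == rc and rc != motif:
            hits.append((i - tss_index, "-"))
    return hits


def shared_offsets(promoters, motif):
    """Return the offsets at which the motif occurs in every promoter.

    promoters maps a gene name to a (sequence, tss_index) pair.
    """
    offset_sets = [
        {offset for offset, _ in find_sites(seq, motif, tss)}
        for seq, tss in promoters.values()
    ]
    return set.intersection(*offset_sets) if offset_sets else set()


if __name__ == "__main__":
    # Toy sequences, not real promoter data.
    toy_promoters = {
        "geneA": ("TTTCCGGAAGTTTTT", 13),
        "geneB": ("GGACCGGAAGTGGGA", 13),
    }
    print(shared_offsets(toy_promoters, "CCGGAAGT"))
```

A real analysis would typically score position weight matrices rather than exact words, since binding sites tolerate mismatches; exact matching is used here only to keep the distance comparison easy to follow.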

Shared Regulatory Motifs in Promoters of Human DNA Repair Genes 71

**3. Unbiased assessment of transcription factor binding sites in two subgroups of genes from DNA repair pathways**

The research reported in (Yang et al. 2007) provides strong evidence that a unique set of regulatory proteins controls genes that contain bidirectional promoters; that study compared coexpression clusters of genes enriched for bidirectional promoters with clusters depleted for them. This section reports on a study that identified transcription factor binding sites specific to genes in DNA repair pathways (Lichtenberg et al. 2009). The promoters of genes from the DNA repair pathways were partitioned into two groups: those that are bidirectional (32 promoters) and those that are unidirectional (42 promoters).

Each group of promoters was analyzed to discover putative transcription factor binding sites. The analysis was performed with the WordSeeker motif discovery software (Lichtenberg et al. 2010), which employs high-performance, supercomputer-based algorithms to perform motif enumeration and to construct Markov models. Our analysis revealed that the average G+C content of the bidirectional promoters was higher than that of the unidirectional promoters (59.87% versus 50.84%). This difference was controlled for by the Markov model, which accounts for the background frequency of each nucleotide in the collection of sequences. Unique sets of binding sites were identified for each group, some of which represent novel binding sites.

**3.1 Assessment of individual sites**

A statistical analysis of the promoters of the DNA repair genes revealed a number of significant DNA binding site motifs. Some of the discovered motifs correspond to recognition sequences of known proteins. These are listed in Table 3, along with their *p*-values and the transcription factors known to bind them, as determined from the TRANSFAC database (Wingender et al. 2000) and the JASPAR database (Bryne et al. 2008). In addition, novel motifs, representing uncharacterized transcription factor binding sites, were discovered in the bidirectional and unidirectional promoters from DNA repair pathway genes (see Table 4 for the motifs and their *p*-values).

Table 3. Enriched motifs matching characterized transcription factor binding sites discovered in the bidirectional promoters (columns 1 and 2) and in the unidirectional promoters (columns 3 and 4).

| Motif (bidirectional promoters) | *p*-Value | Transcription Factor | Motif (unidirectional promoters) | *p*-Value | Transcription Factor |
|---|---|---|---|---|---|
| AGGGCCGT | 0.04142 | *MYB* | ACCCGCCT | 0.00656 | *SP1* |
| CAGGGGCC | 0.02841 | *V\$WT1\_Q6* | AGGAAACA | 0.03295 | *NFAT* |
| CGTGGGGG | 0.04701 | *E2F* | ATTAAAAT | 0.05372 | *OCT1* |
| GGCCCGCC | 0.06682 | *SP1* | CGGAAACC | 0.04210 | *AREB6* |
| TCCCGGCT | 0.05408 | *ELK1* | GCAGGGCG | 0.07134 | *PF0096* |
| TCCCGGGA | 0.06861 | *STAT5A* | GGGGAGTA | 0.03321 | *FOXC1* |
| TCGCGCCA | 0.01539 | *PF0112* | GGGGCTGC | 0.06212 | *LRF* |
| TCTGAGGA | 0.01350 | *TFIIA* | TGGGCGGA | 0.06334 | *GC* |
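Why background nucleotide frequencies matter when two promoter groups differ in G+C content can be illustrated with a minimal Python sketch. It assumes an order-0 background model (independent nucleotides), which is a simplification chosen here for clarity; the model order and the scoring statistics actually used by WordSeeker are not described in this section. All function names and the toy sequences are hypothetical.

```python
# Illustrative sketch only: G+C content, an order-0 background model, and the
# expected number of occurrences of a word under that background.
from collections import Counter


def gc_content(seqs):
    """Fraction of G and C bases over all sequences in the collection."""
    total = sum(len(s) for s in seqs)
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    return gc / total


def background_freqs(seqs):
    """Order-0 background: relative frequency of each nucleotide."""
    counts = Counter("".join(seqs))
    total = sum(counts[base] for base in "ACGT")
    return {base: counts[base] / total for base in "ACGT"}


def word_probability(word, freqs):
    """Probability of a word if bases are drawn independently from freqs."""
    prob = 1.0
    for base in word:
        prob *= freqs[base]
    return prob


def count_word(word, seqs):
    """Observed number of (possibly overlapping) occurrences of word."""
    k = len(word)
    return sum(
        1
        for s in seqs
        for i in range(len(s) - k + 1)
        if s[i:i + k] == word
    )


def expected_count(word, seqs, freqs):
    """Expected occurrences of word across all start positions in seqs."""
    positions = sum(max(len(s) - len(word) + 1, 0) for s in seqs)
    return positions * word_probability(word, freqs)


if __name__ == "__main__":
    # Toy collections, not real promoter data.
    gc_rich = ["GGCCGCGGCCATGCGC", "CCGGGCGCTAGCGGCC"]
    at_rich = ["ATTAGCATTTAAGCAT", "TTAAGGCATATTACAT"]
    word = "GGCC"
    for name, group in (("GC-rich", gc_rich), ("AT-rich", at_rich)):
        freqs = background_freqs(group)
        print(name, gc_content(group), count_word(word, group),
              expected_count(word, group, freqs))
```

Comparing the observed count with the expectation computed from each group's own background keeps a G+C-rich word from appearing enriched merely because the sequences around it are G+C-rich. Higher-order Markov models extend the same idea by conditioning each base on the bases that precede it.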

Fig. 1. Co-expression clustering analysis of 10 DNA repair genes finds intersecting nodes.


Table 2. Transcription factor binding sites in the promoters of the B/O cancer genes.
