*Gene Expression and Control*

*Data Mining Approaches for Understanding of Regulation of Expression of the Urea Cycle Genes DOI: http://dx.doi.org/10.5772/intechopen.81253*

gene is a good model for human *CPS1*. However, the two species have different metabolic rates because of their different sizes, so regulation of the *CPS1* gene and of the urea cycle may differ between the two organisms. More recently, a region of the human *CPS1* gene that corresponds to the rat *Cps1* promoter and proximal enhancer has been shown to bind HNF3 and direct reporter gene expression in hepatoma cells [64]. The human *CPS1* gene is located on chromosome 2, band 2q34, where it spans approximately 125 kb and has 38 exons that encode a 1500-amino-acid protein.

**2.3 Transcriptional regulation of mammalian OTC gene**

The human *OTC* gene is 70 kb long and has 10 exons, which contain a 1062 bp coding sequence [65]. Transcription of the human *OTC* gene initiates at multiple transcription start sites (TSS) [66], whereas transcription of the mouse and rat *Otc* genes initiates at a single transcription start site located 136 and 98 bp, respectively, upstream of the translation initiation codon [67, 68]. Within the rat *Otc* promoter, four regions, A–D, bind transcription factors that regulate expression of the *Otc* gene [31]. Region A is a negative regulator of *Otc* transcription [31], and the transcription factors that bind to this region have not been identified. Regions B and C bind the transcriptional activator HNF4 and the transcriptional repressor chicken ovalbumin upstream promoter-transcription factor (COUP-TF) [30, 31]. Region D is located downstream of the transcription start site, and its role in expression of the *Otc* gene remains to be elucidated [31].

The rat *Otc* promoter is sufficient for expression of transgenes in the liver and intestine of transgenic mice [69, 70]. An enhancer located approximately 11 kb upstream of the first exon of the rat *Otc* gene is responsible for the high level of expression of the *Otc* gene in the liver [31]. This −11 kb enhancer has four transcription factor-binding sites, designated I–IV [31]; sites I and II bind C/EBPβ, while transcription factor HNF4 binds to sites I and IV in the rat *Otc* enhancer to activate expression of the *Otc* gene [32, 37, 38]. Because comparative genomics studies revealed that the distance between the regions corresponding to the −11 kb rat *Otc* enhancer and the *OTC* promoter varies among mammalian genomes, this region was renamed the liver-specific enhancer (LSE) [71].

**2.4 Transcriptional regulation of the NAGS, CPS1, and OTC genes in the genomics era**

Advances in sequencing technology enabled the sequencing of dozens of mammalian genomes, and comparisons of their sequences revealed conserved non-coding regions that could function as regulatory elements [72, 73]. This strategy was used to identify the *NAGS* promoter and enhancer [50]. Next-generation sequencing also enabled examination of the function of non-coding regions in the human and mouse genomes, including their chromatin structure and the binding of transcription factors and chromatin remodeling proteins, to generate the Encyclopedia of DNA Elements (ENCODE). These studies were first carried out in a limited number of cultured cell lines but are now expanding to include tissues and cultured primary cells, and their results are stored in the ENCODE database [74, 75]. In addition to these large-scale projects, many individual labs have been performing ChIP-Seq experiments, and the publicly available results of these experiments are being gathered in the Cistrome database [76]. The advantage of the Cistrome database is the ability to compare chromatin states and to track changes in the binding of transcription factors in response to signaling molecules, treatments, and environmental stimuli. Data mining of the ENCODE and Cistrome databases presents an opportunity to identify novel regulatory elements in the *NAGS*, *CPS1*, and *OTC* genes and the transcription factors that bind to these regulatory elements.

Both databases were queried for chromatin modifications and binding of transcription factors to the *NAGS*, *CPS1*, and *OTC* genes and their flanking regions in liver tissue using the following coordinates of the GRCh38/hg38 human genome assembly: chr17:43,994,682-44,012,832 for the *NAGS* gene, chr2:210,499,833-210,691,279 for the *CPS1* gene, and chrX:38,334,777-38,459,529 for the *OTC* gene. The following filters were applied to the experimental matrix of the ENCODE database (www.encodeproject.org): organism: *Homo sapiens*; biosample type: tissue; organ: liver; project: ENCODE; genome assembly: GRCh38; assay category: ChIP-Seq and DNA binding; and target of assay: transcription factor, histone, broad and narrow histone mark, and chromatin remodeler. Results of the query were visualized using the UCSC Genome Browser. Results of the ChIP-Seq experiments for the genomic region of interest can be downloaded as either wiggle or bed files using the Tools and Table Browser menus of the UCSC Genome Browser. The ChIP-Seq data for each DNA binding protein and histone modification of interest can be acquired by selecting ENCODE Hub from the group menu, ENCODE ChIP-Seq from the track menu, and the experiment ID from the table menu of the Table Browser page. The Cistrome Data Browser was used to query the Cistrome database (www.cistrome.org); *Homo sapiens* was selected as the species and hepatocyte as the biological source. Experimental results that passed quality controls were visualized in the UCSC Genome Browser, and results of experiments were obtained in the same way as for the ENCODE database.

The 5′-end of each region was chosen based on the presence of RAD21, a component of cohesin, and CTCF binding, which can indicate boundaries of chromatin domains, whereas the 3′-ends of the *NAGS* and *OTC* genomic regions were chosen to be either within or close to their downstream neighboring genes; the 3′-end of the *CPS1* genomic region was chosen to include a conserved region downstream of the last exon of the *CPS1* gene (**Figures 3A**, **4A**, and **5A**). RAD21 and CTCF bind to additional sites within the *NAGS*, *CPS1*, and *OTC* genes, and the role of cohesin and CTCF in expression of the three genes is yet to be determined (**Figures 3A**, **4A**, and **5A**).

The histone H3 modification H3K4me3, which marks active promoters, and H3K27ac, which marks active enhancers, are present at the upstream regions of all three genes (**Figures 3B**, **4B**, and **5B**). The ENCODE database also has DNase I sensitivity data from human fetal liver tissue that show an open chromatin state for the *NAGS* gene and closed chromatin for the *CPS1* and *OTC* genes (**Figures 3B**, **4B**, and **5B**). This difference could be due to the presence of the ubiquitously expressed *TMEM101* gene downstream of the *NAGS* gene (**Figure 3**).

Query of the ChIP-Seq data in the ENCODE database confirmed binding of Sp1, CREB/ATF3, HNF4A, HNF3/FOXA1, HNF3/FOXA2, and COUP-TF to the promoters and enhancers of the *NAGS*, *CPS1*, and *OTC* genes and identified binding of these and several other transcription factors to previously identified as well as novel regulatory elements (**Figures 3C**, **4C**, and **5C**). Transcription factors RXRA, COUP-TF, HNF4A, YY1, and JUND/AP1 also appear to bind the *NAGS* promoter, while RXRA, HNF4A, YY1, and REST bind the −3 kb *NAGS* enhancer (**Figure 3C**). The ChIP-Seq data also show binding of transcription factors to regions of the *NAGS* gene that could be novel regulatory elements. For example, Sp1, RXRA, and HNF4A bind to a region in the first intron of the *NAGS* gene. Transcription factors also bind to a region located between the *NAGS* promoter and the −3 kb enhancer, as well as to two regions upstream of the −3 kb enhancer; these sites could be novel regulatory elements of the *NAGS* gene (**Figure 3C**). The map of the *NAGS* genomic region shows a *PYY* transcript that initiates at the *NAGS* promoter but runs in the opposite direction (**Figure 3D**). This annotation reflects a *PYY* cDNA (GenBank ID BC041057.1) that was isolated from a brain astrocytoma sample and sequenced [77]. This *PYY* transcript may have resulted from aberrant expression of the *PYY* gene in the brain astrocytoma cells, since *PYY* is not expressed in the brain according to the Human Protein Atlas [78] and the GTEx track of the UCSC Genome Browser [79].

#### **Figure 3.**

*Overview of the NAGS genomic region. (A) Binding sites for chromatin remodeling proteins. (B) Epigenetic marks. (C) Transcription factor binding sites. (D) Map and sequence conservation of the NAGS genomic region.*
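The GRCh38 query windows quoted in this section are written in the 1-based, inclusive `chrom:start-end` style displayed by the UCSC Genome Browser, whereas bed files use 0-based, half-open intervals. The following is a minimal sketch of that conversion, not the author's workflow; the names `Region`, `parse_region`, and `REGIONS` are illustrative assumptions, and only the three coordinate strings come from the text.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)

    @property
    def length(self) -> int:
        """Window size in base pairs."""
        return self.end - self.start


def parse_region(text: str) -> Region:
    """Parse a browser-style 'chr:start-end' string (1-based, inclusive)."""
    match = re.fullmatch(r"(chr[0-9A-Za-z_]+):([\d,]+)-([\d,]+)", text.strip())
    if match is None:
        raise ValueError(f"not a genomic region: {text!r}")
    chrom, start, end = match.groups()
    start_1based = int(start.replace(",", ""))
    end_1based = int(end.replace(",", ""))
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError(f"invalid coordinates: {text!r}")
    # Shift only the start: 1-based inclusive -> 0-based half-open.
    return Region(chrom, start_1based - 1, end_1based)


# GRCh38/hg38 windows quoted in the text.
REGIONS = {
    "NAGS": parse_region("chr17:43,994,682-44,012,832"),
    "CPS1": parse_region("chr2:210,499,833-210,691,279"),
    "OTC": parse_region("chrX:38,334,777-38,459,529"),
}

if __name__ == "__main__":
    for gene, region in REGIONS.items():
        print(f"{gene}\t{region.chrom}\t{region.start}\t{region.end}\t{region.length} bp")
```

Under these conventions the three windows cover roughly 18 kb (*NAGS*), 191 kb (*CPS1*), and 125 kb (*OTC*), which is a quick sanity check before filtering downloaded bed records against them.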

Transcription factors HNF3/FOXA1, HNF3/FOXA2, and CREB bind to the human *CPS1* promoter and to the region upstream of human *CPS1* that corresponds to the rat *Cps1* distal enhancer, as well as to additional sites located within the first intron of the *CPS1* gene and upstream of the distal enhancer (**Figure 4C**). Moreover, HNF4A, RXRA, Sp1, YY1, JUND/AP1, and REST also bind to the *CPS1* upstream region and first intron (**Figure 4C**). It is possible that the yet-to-be-identified proteins P1, P2, and P3 that bind to the rat *Cps1* distal enhancer are among these transcription factors. Likewise, the HNF4A and COUP-TF transcription factors that are known to bind to the *OTC* promoter and LSE also bind to sites upstream of the LSE and within the first *OTC* intron, and additional transcription factors bind to these regions (**Figure 5C**). The novel regulatory elements identified by the binding of transcription factors coincide with regions that are conserved in vertebrates, as indicated by the phyloP and phastCons tracks of the UCSC Genome Browser (**Figures 3D**, **4D**, and **5D**).

#### **Figure 4.**

*Overview of the CPS1 genomic region. (A) Binding sites for chromatin remodeling proteins. (B) Epigenetic marks. (C) Transcription factor binding sites. (D) Map and sequence conservation of the CPS1 genomic region.*
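The coincidence between transcription factor binding sites and conserved regions noted in this section amounts to an interval-overlap test. The sketch below shows one way such a check might be scripted, assuming peaks and conserved elements are available as bed-like `(chrom, start, end)` tuples. The intervals are made-up placeholders inside the *NAGS* window, not values from ENCODE or the UCSC conservation tracks, and `conserved_fraction` is a hypothetical helper name.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if on different chromosomes)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so that no base is counted twice."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def conserved_fraction(peak: Interval, conserved: List[Interval]) -> float:
    """Fraction of a peak's length that is covered by conserved elements."""
    covered = sum(overlap_bp(peak, element) for element in merge(conserved))
    return covered / (peak[2] - peak[1])


# Hypothetical placeholder intervals, for illustration only.
peaks = [("chr17", 43_997_000, 43_997_400), ("chr17", 44_001_000, 44_001_200)]
conserved = [("chr17", 43_997_100, 43_997_300), ("chr17", 43_997_250, 43_997_500)]

if __name__ == "__main__":
    for peak in peaks:
        print(peak, round(conserved_fraction(peak, conserved), 2))
```

Merging the conserved elements first keeps overlapping elements from being counted twice, so the reported fraction never exceeds 1.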

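Comparing the factors named for each gene is a simple set operation. The sketch below transcribes only the factors that the text of this section names explicitly for each gene, so the sets are partial and illustrative: the figure panels and the databases list more factors, and absence from a set means only that the text does not name that factor for that gene. The names `BOUND_FACTORS`, `shared_factors`, and `sharing_table` are assumptions made for the example.

```python
from functools import reduce

# Factors named explicitly in the text for each gene (partial lists).
BOUND_FACTORS = {
    "NAGS": {"SP1", "RXRA", "HNF4A", "COUP-TF", "YY1", "JUND", "REST"},
    "CPS1": {"FOXA1", "FOXA2", "CREB", "HNF4A", "RXRA", "SP1", "YY1", "JUND", "REST"},
    "OTC": {"HNF4A", "COUP-TF", "RXRA"},
}


def shared_factors(factors_by_gene):
    """Factors present in every gene's set."""
    return reduce(set.intersection, factors_by_gene.values())


def sharing_table(factors_by_gene):
    """Map each factor to the sorted list of genes it is named for."""
    table = {}
    for gene, factors in factors_by_gene.items():
        for factor in factors:
            table.setdefault(factor, []).append(gene)
    return {factor: sorted(genes) for factor, genes in sorted(table.items())}


if __name__ == "__main__":
    print("named for all three genes:", sorted(shared_factors(BOUND_FACTORS)))
    for factor, genes in sharing_table(BOUND_FACTORS).items():
        print(f"{factor}\t{', '.join(genes)}")
```

With real data, each set would be filled from the peak files for the corresponding genomic window rather than typed in by hand.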

#### **Figure 5.**

*Overview of the OTC genomic region. (A) Binding sites for chromatin remodeling proteins. (B) Epigenetic marks. (C) Transcription factor binding sites. (D) Map and sequence conservation of the OTC genomic region.*

These data mining efforts identified a common set of transcription factors that bind to the regulatory regions of the *NAGS*, *CPS1*, and *OTC* genes in the liver and may be responsible for the coordinated changes in their expression in response to dietary protein intake and hormonal signaling. Knowledge of the transcription factors that regulate expression of urea cycle genes could provide a clue about the amino acid(s) and metabolites that act as sensors of dietary protein intake. Data mining of the ENCODE and Cistrome databases presented here revealed that transcription factor RXRA binds to regulatory elements of the *NAGS*, *CPS1*, and *OTC* genes. RXRA regulates gene expression by forming heterodimers with several transcription factors, including peroxisome proliferator-activated receptor gamma (PPARγ), which regulates glucose metabolism [80]. If future studies show that the PPARγ-RXRA heterodimer regulates expression of the *NAGS*, *CPS1*, and *OTC* genes, that would suggest that glucose, rather than amino acid(s), might be a sensor of the balance of protein and carbohydrate intake that regulates expression of the urea cycle genes. Although similar data sets are not yet available for human small intestine cells, the ENCODE project is ongoing, and future data mining efforts will provide more complete information about transcriptional regulation of the *NAGS*, *CPS1*, and *OTC* genes. Similarly, the Cistrome database is growing, and queries of its data may reveal molecular mechanisms of *NAGS*, *CPS1*, and *OTC* regulation through differential binding of transcription factors to their regulatory elements.

The utility of the data mining approach goes beyond understanding transcriptional regulation of genes. This approach can be used to explain deleterious effects of sequence variants on expression of genes that are associated with human diseases and to identify drug targets for treatment of diseases that can benefit from increased expression of hypomorphic alleles. In the case of *NAGS*, *CPS1*, or *OTC* deficiencies, which have high plasma ammonia, or hyperammonemia, as a common symptom, partial defects in any of the three genes result in decreased activity or abundance of the corresponding enzyme and decreased capacity for ureagenesis. A protein-restricted diet, which minimizes ammonia production, is standard therapy for patients with partial defects of the *NAGS*, *CPS1*, *OTC*, and other urea cycle genes that were not discussed in this chapter. However, a protein-restricted diet also leads to decreased expression of urea cycle genes, including the defective one, which further decreases the patient's capacity for ureagenesis and increases the risk of hyperammonemia. A drug therapy based on transcriptional regulation of the *NAGS*, *CPS1*, and *OTC* genes might be able to increase their expression even when patients are on a protein-restricted diet and decrease patients' risk of hyperammonemia-induced brain damage.

**Acknowledgements**

This work was supported by Public Health Service grants K01DK076846 and R01DK064913 from the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Department of Health and Human Services.

**Conflict of interest**

The author has no conflicts of interest to declare.

**Author details**

Ljubica Caldovic

Children's National Medical Center, Center for Genetic Medicine Research, The George Washington University, Institute for Biomedical Sciences, Washington, DC, USA

\*Address all correspondence to: lcaldovic@childrensnational.org

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
