**5. Summary and discussions**

Reverse engineering GRNs is one of the major challenges in the post-genomics era of biology. In this chapter, we focused on two broad issues in GRN inference: (1) development of an analysis method that uses multiple types of data and (2) network analysis at the NM modular level. Based on currently available information, we proposed a data integration approach that effectively infers the gene networks underlying certain patterns of gene co-regulation in the yeast cell cycle and in cycling human HeLa cells. The predictive strength of this strategy comes from the combined constraints imposed by multiple biological data sources, including time course gene expression data, combined molecular interaction network data, and gene ontology category information.

This computational framework allows us to fully exploit the partial constraints that can be inferred from each data source. First, to reduce the dimensionality of the inference problem, the genes were grouped into clusters by fuzzy c-means, where the optimal fuzziness value was determined from statistical properties of the gene expression data. The optimal cluster number was identified by integrating gene ontology category information. Second, the NM information established from the combined molecular interaction network was used to assign NM(s) to a given transcription factor. Once the NM(s) for a transcription factor were identified, a hybrid GA-PSO algorithm was applied to search for target gene modules that may be regulated by that particular transcription factor. This search was guided by the successful training of a NN model that mimics the regulatory NM(s) assigned to the transcription factor. The effectiveness of this method was illustrated with well-studied cell cycle dependent transcription factors (Figures 3.10 and 3.11). The upstream binding site enrichment analysis indicated that the proposed method has the potential to identify the underlying regulatory relationships between transcription factors and their downstream genes at the modular level. This demonstrates that our approach can serve as a method for analyzing multi-source data at the modular level.
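The clustering step described above can be sketched in a few lines. The following is a minimal, illustrative implementation of standard fuzzy c-means in plain Python, not the chapter's actual code: the function name `fuzzy_c_means`, the toy expression profiles, the fuzziness value `m=2.0`, and the fixed cluster number are all assumptions made for the example. In the chapter, the fuzziness value is chosen from statistical properties of the expression data and the cluster number from gene ontology information.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Cluster `points` (a list of equal-length float lists) with fuzzy c-means.

    Returns (centers, memberships). memberships[i][k] is the degree to which
    point i belongs to cluster k; each row sums to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzziness m must be greater than 1")
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centre update: mean of all points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])

        # Membership update from the distances to the new centres.
        max_change = 0.0
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on a centre: share membership there.
                zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_row = [z / sum(zeros) for z in zeros]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_row = [v / total for v in inv]
            change = max(abs(a - b) for a, b in zip(new_row, u[i]))
            max_change = max(max_change, change)
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u


# Hypothetical toy data: six "genes" measured at three time points,
# forming two visibly different expression patterns.
profiles = [
    [0.1, 0.9, 0.2], [0.0, 1.0, 0.1], [0.2, 0.8, 0.3],
    [0.9, 0.1, 0.8], [1.0, 0.0, 0.9], [0.8, 0.2, 1.0],
]
centers, memberships = fuzzy_c_means(profiles, n_clusters=2, m=2.0)
# Hard labels are only a convenience; the soft memberships are the real output.
labels = [row.index(max(row)) for row in memberships]
```

Because every gene receives a graded membership in every cluster, a gene with an ambiguous profile is not forced into a single group, which is the usual motivation for preferring fuzzy over hard clustering on noisy expression data.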

Compared to the approach developed in [148], our proposed method has several advantages. First, our method infers GRNs from genome-wide expression data together with other biological knowledge. It has been shown that mRNA expression data alone cannot reflect all the activities in a GRN. Additional information helps constrain the search space of causal relationships between transcription factors and their downstream genes. Second, we decompose the GRN into well-characterized functional units, the NMs. Each transcription factor is assigned to specific NM(s), which are then used to infer the downstream target genes. This not only reduces the search space in the inference process, but also provides experimental biologists with regulatory modules for straightforward validation, instead of one whole GRN containing thousands of genes and connections, as is often generated by IPA. Third, we group the genes into functional groups that are potentially regulated by one common transcription factor. By incorporating gene functional annotations, the proposed approach reduces the noise in mRNA expression data.

In summary, we demonstrate that our method can accurately infer the underlying relationships between transcription factors and their downstream target genes by integrating multiple sources of biological data. We believe that the proposed framework, as the first attempt to integrate many different types of data, will improve data analysis, particularly as more data sets become available. Our method could also benefit biologists by predicting the components of the GRN in which their candidate gene is involved, allowing them to design a more streamlined experiment for biological validation.

#### **6. References**

238 Reverse Engineering – Recent Advances and Applications



Reverse Engineering Gene Regulatory Networks by Integrating Multi-Source Biological Data 241

Hartemink, A. J., D. K. Gifford, et al. (2002). "Combining location and expression data for principled discovery of genetic regulatory network models." Pac Symp Biocomput: 437-449.

He, F., R. Balling, et al. (2009). "Reverse engineering and verification of gene networks: principles, assumptions, and limitations of present methods and future perspectives." J Biotechnol 144(3): 190-203.

Holmes, I. and W. J. Bruno (2001). "Evolutionary HMMs: a Bayesian approach to multiple alignment." Bioinformatics 17(9): 803-820.

Hong, R. and D. Chakravarti (2003). "The human proliferating cell nuclear antigen regulates transcriptional coactivator p300 activity and promotes transcriptional repression." J Biol Chem 278(45): 44505-44513.

Iranfar, N., D. Fuller, et al. (2006). "Transcriptional regulation of post-aggregation genes in Dictyostelium by a feed-forward loop involving GBF and LagC." Dev Biol 290(2): 460-469.

Ishida, S., E. Huang, et al. (2001). "Role for E2F in control of both DNA replication and mitotic functions as revealed from DNA microarray analysis." Mol Cell Biol 21(14): 4684-4699.

Iyer, V. R., C. E. Horak, et al. (2001). "Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF." Nature 409(6819): 533-538.

Kennedy, J. and R. C. Eberhart (1995). "Particle swarm optimization." Proceedings of the 1995 IEEE International Conference on Neural Networks (Perth, Australia) IV: 1942-1948.

Lee, T. I., N. J. Rinaldi, et al. (2002). "Transcriptional regulatory networks in Saccharomyces cerevisiae." Science 298(5594): 799-804.

Mangan, S. and U. Alon (2003). "Structure and function of the feed-forward loop network motif." Proc Natl Acad Sci U S A 100(21): 11980-11985.

Mangan, S., A. Zaslaver, et al. (2003). "The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks." J Mol Biol 334(2): 197-204.

Maraziotis, I., A. Dragomir, et al. (2005). "Gene networks inference from expression data using a recurrent neuro-fuzzy approach." Conf Proc IEEE Eng Med Biol Soc 5: 4834-4837.

Martin, P. J., V. Lardeux, et al. (2005). "The proliferating cell nuclear antigen regulates retinoic acid receptor transcriptional activity through direct protein-protein interaction." Nucleic Acids Res 33(13): 4311-4321.

Matys, V., O. V. Kel-Margoulis, et al. (2006). "TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes." Nucleic Acids Res 34(Database issue): D108-110.

Mewes, H. W., D. Frishman, et al. (2002). "MIPS: a database for genomes and protein sequences." Nucleic Acids Res 30(1): 31-34.

Milo, R., S. Itzkovitz, et al. (2004). "Superfamilies of evolved and designed networks." Science 303(5663): 1538-1542.

Milo, R., S. Shen-Orr, et al. (2002). "Network motifs: simple building blocks of complex networks." Science 298(5594): 824-827.

Mitchell, M. (1998). An introduction to genetic algorithms, MIT Press.

Naraghi, M. and E. Neher (1997). "Linearized buffered Ca2+ diffusion in microdomains and its implications for calculation of [Ca2+] at the mouth of a calcium channel." J Neurosci 17(18): 6961-6973.

Odom, D. T., N. Zizlsperger, et al. (2004). "Control of pancreas and liver gene expression by HNF transcription factors." Science 303(5662): 1378-1381.

Przulj, N., D. A. Wigle, et al. (2004). "Functional topology in a network of protein interactions." Bioinformatics 20(3): 340-348.

Ren, B., H. Cam, et al. (2002). "E2F integrates cell cycle progression with DNA repair, replication, and G(2)/M checkpoints." Genes Dev 16(2): 245-256.

Ren, B., F. Robert, et al. (2000). "Genome-wide location and function of DNA binding proteins." Science 290(5500): 2306-2309.

Ressom, H., R. Reynolds, et al. (2003). "Increasing the efficiency of fuzzy logic-based gene expression data analysis." Physiol Genomics 13(2): 107-117.

Ressom, H. W., Y. Zhang, et al. (2006). "Inference of gene regulatory networks from time course gene expression data using neural networks and swarm intelligence." Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, Toronto, ON: 435-442.

Ressom, H. W., Y. Zhang, et al. (2006). "Inferring network interactions using recurrent neural networks and swarm intelligence." Conf Proc IEEE Eng Med Biol Soc (EMBC 2006), New York City, New York, USA.

Romer, K. A., G. R. Kayombya, et al. (2007). "WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches." Nucleic Acids Res 35(Web Server issue): W217-220.

Rual, J. F., K. Venkatesan, et al. (2005). "Towards a proteome-scale map of the human protein-protein interaction network." Nature 437(7062): 1173-1178.

Saddic, L. A., B. Huvermann, et al. (2006). "The LEAFY target LMI1 is a meristem identity regulator and acts together with LEAFY to regulate expression of CAULIFLOWER." Development 133(9): 1673-1682.

Segal, E., B. Taskar, et al. (2001). "Rich probabilistic models for gene expression." Bioinformatics 17 Suppl 1: S243-252.

Shen-Orr, S. S., R. Milo, et al. (2002). "Network motifs in the transcriptional regulation network of Escherichia coli." Nat Genet 31(1): 64-68.

Shibutani, S. T., A. F. de la Cruz, et al. (2008). "Intrinsic negative cell cycle regulation provided by PIP box- and Cul4Cdt2-mediated destruction of E2f1 during S phase." Dev Cell 15(6): 890-900.

Shmulevich, I., E. R. Dougherty, et al. (2002). "Probabilistic Boolean Networks: a rule-based uncertainty model for gene regulatory networks." Bioinformatics 18(2): 261-274.

Simon, I., J. Barnett, et al. (2001). "Serial regulation of transcriptional regulators in the yeast cell cycle." Cell 106(6): 697-708.

Spellman, P. T., G. Sherlock, et al. (1998). "Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization." Mol Biol Cell 9(12): 3273-3297.

Swiers, G., R. Patient, et al. (2006). "Genetic regulatory networks programming hematopoietic stem cells and erythroid lineage specification." Dev Biol 294(2): 525-540.


**11**

**Reverse-Engineering the Robustness of Mammalian Lungs**

Michael Mayo1, Peter Pfeifer2 and Chen Hou3

*1Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS*

*2Department of Physics, University of Missouri, Columbia, MO*

*3Department of Biological Sciences, Missouri University of Science and Technology, Rolla, MO, USA*

**1. Introduction**

This chapter is devoted to reverse-engineering the cause of a dramatic increase in the total oxygen uptake rate by the lung, wherein oxygen is supplied to the blood to meet the increasing energetic demands between rest and exercise. This uptake rate increases despite a much smaller increase in the oxygen partial pressure difference across the lung's exchange tissues (e.g. alveolar membranes), thought to mainly drive the oxygen-blood transfer in a similar way that electric currents are driven by voltage differences according to Ohm's law. As we explain below, a full understanding of this special property has the potential to improve various engineering processes, such as stabilizing chemical yields in heterogeneous catalysis, improving the efficiency of heat-transporters, and improving energy generation in electrochemical reactors.

To reverse-engineer the cause of this mostly pressure-independent increase in the oxygen uptake rate, we focus on the development of mathematical models based on the rate-limiting physical transport processes of i) diffusion through the airway spaces, and ii) "reaction" of the oxygen molecules across the surface of permeable membranes responsible for transferring oxygen from air to blood. Two of these mathematical models treat the terminal, or acinar, airways of mammalian lungs as hierarchical trees; another treats the entire permeable surface as fractal. By understanding how the parameters of these mathematical models restrict the overall oxygen uptake rate, we infer how the lung preserves its function when exposed to environmental hazards (e.g. smoking), damage (e.g. surgery), or disease (e.g. emphysema). The focus of our work here is to discover, or reverse engineer, the operational principles that allow mammalian lungs to match increased oxygen demands with supply without any significant alteration of its "hardware."

We begin with a mathematical description of oxygen diffusion as the primary transport mechanism throughout the airways responsible for the oxygen uptake in the deep regions of the mammalian lungs studied here. We then discuss several different but complementary analytical models that approach the oxygen-transport problem from different directions, while also developing a new one. Although these models are different from one another

