**2.1. Survey of Pathway Databases and Integration Efforts**

8 Bioinformatics

sources and then displays the fetched data to its users. Queries in a federated database are executed within the remote data sources, and the results it displays are extracted remotely from those sources. Due to this capability, federated databasing has two major advantages.

access to up-to-date data deposited in multiple data sources.

with data access methods at diverse remote data sources.

**Service-Oriented Approach**

A decentralized approach is also being developed, in which individual data sources agree to open their data via Web Services (WS). The service-oriented approach enables data integration from multiple heterogeneous data sources through computer interoperability: computers communicate directly with one another through Web APIs, and up-to-date data are retrieved from diverse data sources. Heterogeneous data integration requires that many data sources become service providers by opening their data via WS and by standardizing data identities and nomenclature to ease data exchange and analysis.

**Semantic Web**

Most web pages in biological data sources are designed for human reading rather than for machine processing. The Resource Description Framework (RDF) provides standard formats for data interchange and describes data as simple statements, each a triple consisting of a subject, a predicate, and an object. Any two statements can be linked by an identical subject or object. The Web Ontology Language (OWL) builds on RDF and Uniform Resource Identifiers (URIs) and describes data structure and meaning based on ontology, which enables automated reasoning and inference by computers. Application of semantic Web technologies is a significant advancement for bioinformatics, enabling automated data processing and reasoning. Semantic integration uses ontologies for data description and thus represents ontology-based integration. [27] reviews the current development of semantic network technologies and their applications to the integration of genomic and proteomic data. That work elaborates on applying a semantic network approach to modeling complex cell signaling pathways and simulating the cause-effect of molecular interactions in human macrophages. [31] illustrates this approach by comparing the federated approach versus warehousing versus semantic web using multiple sources.

**Wiki-based Integration**

A weakness common to all the above approaches is that user participation in the process is inadequate. With the increasing volume of biological data, data integration will inevitably require the participation of a large number of users. A successful example that harnesses collective intelligence for data aggregation and knowledge collection is Wikipedia, an online encyclopedia that allows any user to create and edit content. It is

Table 1 below shows various data integration efforts and projects for biological pathways worldwide.



Hierarchical Biological Pathway Data Integration and Mining 11







class predictions.

co-regulated genes.

**2.2. Types of pathways**

Biological networks are studied and modeled at different description levels, which establish different pathway types. For example, metabolic pathways describe the conversion of metabolites by enzyme-catalyzed chemical reactions given by their stoichiometric equations, such as the main pathways of energy metabolism like glycolysis or the pentose phosphate pathway. Another pathway type is the signal transduction pathway, also known as information metabolism, which explains how cells receive, process, and respond to information from the environment. A brief description of the various types of pathways is given below.

**A. Metabolic Pathways** describe the network of enzyme-catalyzed reactions that release energy by breaking down nutrients (catabolism) and build up the essential compounds necessary for growth (anabolism). Experimentally determined metabolic pathways have been established for a few model organisms, but most metabolic pathway databases contain pathway data that have been computationally inferred from genome annotations. Because most genome annotations are incomplete, metabolic pathway databases contain pathway holes, which can only be addressed by experiment or computational inference. A good test of a reconstructed metabolic network is to ask whether it can produce the set of essential compounds necessary for growth, given a known minimal nutrient set. To solve this problem, metabolism can be represented as a bipartite directed graph, where one set of nodes represents metabolites and the other set represents biochemical reactions, with labeled edges used to indicate relationships between nodes (reaction X produces metabolite Y, or metabolite Y is-consumed-by reaction X).
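The producibility test on a bipartite metabolite/reaction graph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from any pathway database: the function names `producible` and `missing_compounds`, the reaction identifiers, and the toy network are all hypothetical, and the sketch ignores stoichiometric coefficients, cofactors, and reaction reversibility. Each reaction node is stored with its incoming "is-consumed-by" metabolites and its outgoing "produces" metabolites, and a reaction may fire once all metabolites it consumes are available.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# A reaction node in the bipartite graph: the metabolites with an
# "is-consumed-by" edge into it, and the metabolites it "produces".
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def producible(reactions: Dict[str, Reaction], nutrients: Iterable[str]) -> Set[str]:
    """Return every metabolite reachable from the nutrient set.

    Repeatedly fire any reaction whose consumed metabolites are all
    available, until no reaction adds a new metabolite (a fixed point).
    """
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for consumed, produced in reactions.values():
            if consumed <= available and not produced <= available:
                available |= produced
                changed = True
    return available


def missing_compounds(
    reactions: Dict[str, Reaction],
    nutrients: Iterable[str],
    essential: Iterable[str],
) -> Set[str]:
    """Essential compounds the network cannot produce (candidate pathway holes)."""
    return set(essential) - producible(reactions, nutrients)


# Hypothetical toy network for illustration only; not real stoichiometry.
TOY_NETWORK: Dict[str, Reaction] = {
    "r1": (frozenset({"glucose"}), frozenset({"g6p"})),
    "r2": (frozenset({"g6p"}), frozenset({"pyruvate"})),
    "r3": (frozenset({"pyruvate", "ammonia"}), frozenset({"alanine"})),
    # No reaction produces chorismate, so tryptophan is unreachable.
    "r4": (frozenset({"chorismate"}), frozenset({"tryptophan"})),
}

holes = missing_compounds(
    TOY_NETWORK,
    nutrients={"glucose", "ammonia"},
    essential={"alanine", "tryptophan"},
)
```

In this toy case `holes` contains only `tryptophan`, because no reaction in the network supplies its precursor. In a real reconstruction such a result would point to a reaction that is missing from the annotation and needs experimental or computational follow-up.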

analysis of groups of genes based on text.

interactions, and Transcription Factor Binding Sites (TFBS).

data and the directory of cis-regulatory sequence elements.

step algorithm to cluster gene expression data.

#### **Table 1.** Various Data Integration Efforts

Other efforts towards designing new applications for data mining and integration at the K.U.Leuven Center for Computational Systems Biology include:

