**Approaches to Access Biological Data Sources**

Assia Rharbi, Khadija Amine, Zohra Bakkoury, Afaf Mikou, Anass Kettani and Abdelkader Betari

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/48740

#### **1. Introduction**

In recent years, technological advances in genomics and proteomics have revolutionized the work of researchers in molecular biology. Thanks to a variety of data generation techniques, researchers now have at hand, on the web, a very large amount of information held in public, heterogeneous data sources. Each source organizes its content around a particular data type: sequences in UniProt (for proteins) and GenBank (for genes and mRNA), protein structures in PDB (Protein Data Bank), and biomedical publications in Medline. Their content is heterogeneous in the sense that the same data item can be represented differently in two data sources (e.g., different names for the same gene). Moreover, data sources vary in structure: there are structured sources such as relational databases, semi-structured sources such as XML, and unstructured sources such as databases composed of flat files. As a result, a biologist who wishes to obtain information from these sources has to query them one by one, then copy and analyze the collected data, and deal with redundancy, complementarity, and inconsistencies in the information. Today, one of the greatest challenges of bioinformatics is to enable biologists to access multiple data sources effectively, each with a different schema. Various approaches have been adopted to unify access to multiple data sources for a given query. Several systems have been built on data warehouses, federations of databases, or mediators.

In this work, we are interested in mediation systems. Such systems offer the user a uniform and centralized view of distributed data; this view may also present the data in a more abstract, condensed, and qualitative form that is therefore more meaningful to the user. Mediation systems are also very useful in the presence of heterogeneous data, because they give the user the impression of working with a single homogeneous system.
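To make the idea of a mediator concrete, the following is a minimal sketch in Python, not the system developed in this chapter. It assumes two toy in-memory sources (one structured, one flat-file), a hypothetical synonym table, and placeholder variant identifiers; all class names, field names, and data are illustrative assumptions.

```python
from dataclasses import dataclass

# Toy synonym table (assumption): maps source-specific gene names
# to a single canonical name used by the mediated schema.
SYNONYMS = {"LDL-R": "LDLR", "ldlr": "LDLR", "LDLR": "LDLR"}


@dataclass(frozen=True)
class Record:
    """Common (mediated) schema returned by every wrapper."""
    gene: str
    variant: str
    source: str


class RelationalWrapper:
    """Wraps a structured source, modeled here as a list of row dicts."""
    name = "relational_source"

    def __init__(self, rows):
        self.rows = rows

    def query(self, gene):
        for row in self.rows:
            symbol = row["gene_symbol"]
            if SYNONYMS.get(symbol, symbol) == gene:
                yield Record(gene, row["variant_id"], self.name)


class FlatFileWrapper:
    """Wraps an unstructured source, modeled as 'name | variant' text lines."""
    name = "flat_file_source"

    def __init__(self, text):
        self.text = text

    def query(self, gene):
        for line in self.text.strip().splitlines():
            raw_gene, variant = (field.strip() for field in line.split("|"))
            if SYNONYMS.get(raw_gene, raw_gene) == gene:
                yield Record(gene, variant, self.name)


class Mediator:
    """Sends one query to every wrapper and merges the answers."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def variants_for(self, gene):
        canonical = SYNONYMS.get(gene, gene)
        merged = {}  # variant -> list of sources that report it
        for wrapper in self.wrappers:
            for record in wrapper.query(canonical):
                merged.setdefault(record.variant, []).append(record.source)
        return merged


# Placeholder data only; these are not real variant identifiers.
rows = [
    {"gene_symbol": "LDLR", "variant_id": "toy-variant-1"},
    {"gene_symbol": "OTHER", "variant_id": "toy-variant-9"},
]
flat = """
LDL-R | toy-variant-1
LDL-R | toy-variant-2
"""

mediator = Mediator([RelationalWrapper(rows), FlatFileWrapper(flat)])
result = mediator.variants_for("LDL-R")
print(result)
```

The user issues a single query against the mediator. Each wrapper hides the structure of its own source, and the merge step exposes redundancy by recording which sources report the same variant.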

We aim to assist biologists in their research by developing a generic tool for integrating heterogeneous genomic data distributed over the web. We place ourselves in a specific context: the study of cardiovascular disease, and in particular familial hypercholesterolemia. This is an inherited disorder, passed down through families, characterized by high levels of LDL ("bad") cholesterol. The disease is caused by a genetic mutation affecting certain lipoproteins. These lipoproteins (called LDL) carry about two thirds of the cholesterol circulating in the blood; they deliver cholesterol to tissues through a recognition system between apolipoprotein B and a receptor, the LDL receptor (a lock-and-key system), which allows LDL and its cholesterol content to enter cells. When the LDL receptor (LDL-R) is defective (as the result of a mutation), LDL accumulates in the blood and in artery walls, causing familial hypercholesterolemia (HF). Knowing these different mutations can therefore greatly facilitate molecular screening for the disease and, in turn, the choice of an appropriate treatment. However, to answer a query such as "What are the mutations that cause familial hypercholesterolemia (HF)?", the biologist has to carry out a tedious search across disparate and heterogeneous databases, which requires a considerable investment of time.

© 2012 Rharbi et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This chapter is structured as follows:

