*Artificial Intelligence - Applications in Medicine and Biology*

disease. In fact, for breast cancer, a genetic testing tool has been implemented [50] based on specific genetic variants in the breast cancer type 1 (BRCA1) and type 2 (BRCA2) susceptibility genes on chromosomes 17 (17q21.31) and 13 (13q13.1) [51], respectively. It is widely known that the outcome of a disease, particularly a complex disease, or the response to a drug is influenced by multiple genes and by a significant contribution from the environment. Genomic analysis alone is therefore not sufficient to capture phenotypic variation and heritability, or to elucidate the complex structure of a disease [52]. Thus, there is a significant need to integrate information derived from environmental studies and other heterogeneous datasets into genomic analysis to enhance its predictive power.

As indicated above, even though genomic information is critical, it cannot completely explain disease outcome and progression, which involve gene-gene and gene-environment interactions. In this context, post-genomic analysis may provide a new paradigm for genomic analysis. It may enable further functional characterization of genetic susceptibility to a disease and correlate disease-associated (candidate) genes by combining association signals from genomic analysis with available knowledge, including functional, environmental, epidemiological, and clinical information. This integrative approach increases the likelihood of effectively identifying suitable candidate genes [53] and biological pathways that may be critical in the etiology and pathogenesis of the disease and in the drug response. The next goal is to integrate large-scale datasets from heterogeneous sources [2, 54], moving beyond a purely genomic approach toward a whole genome-based integrative approach that provides a global view [55]. A biological network, which models a biological system as an entity composed of connected sub-units, has become a useful tool for integrating heterogeneous datasets into a single framework [26].

**4. Challenges and perspectives**

Currently, there is an exponential growth in platforms producing large-scale datasets, including genomics, transcriptomics, proteomics, metabolomics, microbial, and epidemiological data. These high-dimensional datasets from heterogeneous sources create an opportunity to design appropriate data-driven learning algorithms and models for effective post-genomic medicine and biomedical research with increased predictive power. While the use of these large-scale post-genomic datasets offers several potential advantages and opportunities, many challenges remain in terms of computational models, learning algorithms, and the biological interpretation of results. Furthermore, as discussed previously, learning algorithms, including reinforcement and deep learning algorithms, are evolving quickly, with several potential applications in biology and medicine (see **Section 2.6**). Currently, predictions from different models cannot yet contribute to clinical decision processes, because the effectiveness of these models remains difficult to establish in the absence of ground-truth or gold-standard (benchmark) datasets, or of experimental validation. This suggests that one future trend for learning algorithms in biology and medicine will be to enable the integration of the predictive models they generate into dynamic clinical settings. This integration will require that the issues raised above be addressed systematically. Doing so will ensure effective exploitation of post-genomic datasets and could revolutionize the study of human disease and health.

**5. Conclusions**

Numerous large-scale platforms have been designed to produce different types of high-dimensional datasets, including genomics, transcriptomics, proteomics, metabolomics, microbial, and epidemiological data. This data deluge provides a rich source of information that can advance our understanding of human and pathogenic organisms and thereby enhance post-genomic medicine and biomedical research. In this chapter, we have provided some illustrations of machine learning algorithms for knowledge discovery in biological and health areas and discussed existing challenges. This discussion highlights the need for adequate meta-analysis-based post-genomic models to optimally integrate diverse datasets from different sources. It also suggests that existing machine learning algorithms will need to be refined, or new ones developed, to account for current data challenges and to speed up the translation of current and future knowledge into effective new treatment strategies and health measures, enabling efficient clinical disease management and ensuring effective post-genomic medicine.
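As a minimal illustration of what meta-analysis-based integration of evidence from different sources can look like, the sketch below combines per-gene p-values from several data sources with Fisher's method. This is one classical technique chosen here for illustration, not a method prescribed in this chapter. The gene names, the three evidence sources, and all p-values are hypothetical, and the method assumes that the combined tests are independent, which often does not hold across omics layers measured on the same samples.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null hypothesis, X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom, where k = len(p_values).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    # For even degrees of freedom 2k the chi-square survival function has a
    # closed form: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical evidence: one p-value per source (e.g. association signal,
# expression change, protein-level change) for each candidate gene.
evidence = {
    "GENE_A": [0.04, 0.03, 0.20],
    "GENE_B": [0.50, 0.60, 0.45],
}

combined = {gene: fisher_combined_p(ps) for gene, ps in evidence.items()}
ranked = sorted(combined, key=combined.get)  # strongest combined evidence first

for gene in ranked:
    print(f"{gene}: combined p = {combined[gene]:.4f}")
```

A gene with moderate support in several sources can rank above one with no consistent support in any source, which is the intuition behind integrating heterogeneous datasets. In practice, correlated data sources call for methods that model this dependence, and SciPy's `scipy.stats.combine_pvalues` provides Fisher's method alongside several alternatives.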
