**4. Challenges and perspectives**

Currently, there is exponential growth in the number of platforms producing large-scale datasets, including genomics, transcriptomics, proteomics, metabolomics, microbial and epidemiological data. These high-dimensional datasets from heterogeneous sources create an opportunity to design appropriate data-driven learning algorithms and models with increased predictive power, in order to ensure effective post-genomic medicine and biomedical research. Although these heterogeneous datasets offer several potential advantages and opportunities, many challenges remain in terms of computational models, learning algorithms, and the biological interpretation of results. Furthermore, as discussed previously, machine learning, reinforcement learning, and deep learning algorithms are evolving quickly, with several potential applications in biology and medicine (see **Section 2.6**). Currently, predictions from many of these models cannot yet contribute to clinical decision-making, because their effectiveness is difficult to establish in the absence of ground-truth or gold-standard (benchmark) datasets, or of experimental validation. This suggests that one future direction for learning algorithms in biology and medicine will be to enable the integration of the predictive models they generate into dynamic clinical settings. Such integration will require the issues raised above to be addressed systematically. Doing so will ensure effective exploitation of post-genomic datasets and could revolutionize the study of human disease and health.


*Designing Data-Driven Learning Algorithms: A Necessity to Ensure Effective Post-Genomic…*

*DOI: http://dx.doi.org/10.5772/intechopen.84148*

Machine intelligence and deep learning models offer powerful computational techniques that can learn effectively from large, complex datasets. They can reveal hidden interactions among cellular variables and provide deeper insight into the intricate processes linked to disease [56]. However, despite this undoubted wealth of data, our understanding of the mechanisms underlying the pathogenesis, progression, and outcome of many diseases remains very limited. This is reflected in a gap between the available data and its translation into improved treatments and interventions, leading to the paradigm of a "data-rich but information-poor" world. This gap is partly due to issues with existing datasets, including:

1. increased heterogeneity within datasets, which are generally collected at different locations and therefore lack a standardized data representation; and
2. variation in cohort size across populations and geographical locations.

These issues highlight the need to design adequate meta-analysis models that help retrieve useful information from each data source. Addressing them may also require more advanced machine learning techniques, which could play an important role in genomic medicine and advance our knowledge of disease and health.

**5. Conclusions**

Numerous large-scale platforms have been designed to produce different types of high-dimensional datasets, including genomics, transcriptomics, proteomics, metabolomics, microbial and epidemiological data. This data deluge provides a rich source of information that can advance our understanding of human and pathogenic organisms and thereby enhance post-genomic medicine and biomedical research. In this chapter, we have provided some illustrations of machine learning algorithms for knowledge discovery in biology and health and discussed existing challenges. This discussion highlights the need for adequate meta-analysis-based post-genomic models that optimally integrate diverse datasets from different sources. It suggests that existing machine learning algorithms will need to be refined, or new ones developed, to address current data challenges. Such work would speed up the translation of current and future knowledge into effective new treatment strategies and health measures, enabling efficient clinical disease management and ensuring effective post-genomic medicine.

**Conflict of interest**

The authors declare that they have no competing interests.


