Bioinformatics Applications to Identify Molecular Mechanisms, Pathways and Biomarkers in Cancers

#### **Chapter 1**

## Introductory Chapter: Application of Bioinformatics Tools in Cancer Prevention, Screening, and Diagnosis

*Ghedira Kais and Yosr Hamdi*

#### **1. Introduction**

Cancer is a leading cause of death worldwide, with nearly 10 million deaths in 2020, accounting for one in six deaths. Breast, lung, colorectal, and prostate cancers are the most common types [1]. Around one-third of cancer deaths are attributable to environmental factors and lifestyle habits, such as tobacco use, high body mass index, alcohol consumption, low fruit and vegetable intake, and lack of physical activity [2]. In addition, about 10% of cancer cases are due to genetic factors, and cancer-causing infections, such as human papillomavirus (HPV) and hepatitis, are responsible for approximately 30% of cancer cases in low- and lower-middle-income countries [3]. Indeed, HPV infection is the main cause of cervical cancer, a cancer that can be cured if detected early and treated effectively [4]. The multifactorial character of the disease, together with the huge amount of data generated over recent decades on its risk factors, has allowed bioinformatics to play an essential role in cancer research and has made oncology a success story in translating omics data, including genomics, transcriptomics, and proteomics data, into clinical settings [5].

#### **2. Use of bioinformatics integrative approaches in oncology**

Numerous research groups worldwide have attempted to develop strategies to identify novel diagnostic and prognostic markers for different cancer types based on integrative computational analyses and tools. One of the most powerful computational approaches is meta-analysis, in which multiple studies interrogating a common hypothesis are analyzed together [6]. Several studies have applied meta-analysis methods to cancer microarray data in order to identify differentially expressed genes (DEGs) between cancer patients and controls. These methods can be applied to identify robust gene-expression signatures in a single cancer type and/or to look for common expression patterns across different types of cancer. In 2004, Rhodes and co-workers analyzed 40 published cancer microarray data sets, comprising 38 million gene-expression measurements from >3700 cancer samples [7]. With the advent of high-throughput sequencing technology, known as next-generation sequencing (NGS), RNA sequencing (RNA-seq) has been used in several aspects of cancer research and therapy, including the discovery of biomarkers, the characterization of cancer heterogeneity and evolution, cancer immunotherapy, and the investigation of drug resistance [8]. High-throughput sequencing has the advantage of fast sequencing at low cost and with high accuracy compared to the earlier Sanger technology. Unlike microarrays, RNA-seq can also detect previously unknown expressed sequences [9]. Gene-expression profiling often generates large gene-expression signatures that need to be functionally analyzed to identify a handful of genes of interest for experimental validation. Several resources have been developed to support systematic functional analysis of gene-expression signatures, including Gene Ontology (GO) [10, 11], KEGG [12], TransPath [13], and GenMAPP [14]. Finally, to better understand complex biological processes, such as cancer initiation and progression, it is important to integrate transcriptomic data in the context of complex molecular networks. This implies mapping the gene-expression signature onto interactomes of known and predicted protein-protein interactions in order to identify induced or repressed interactome subnetworks [15].
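
The functional analysis step described above is often framed as an over-representation test: given a gene-expression signature, is a given annotation category hit more often than chance would predict? The following is a minimal sketch of that idea using a one-sided hypergeometric test. It is not the method of any specific tool cited here; the gene identifiers, category names, and function names are hypothetical placeholders chosen for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_annotated, n_signature, n_overlap):
    """One-sided p-value P(X >= n_overlap) for drawing n_signature genes
    without replacement from n_universe genes, n_annotated of which
    belong to the category being tested."""
    max_overlap = min(n_annotated, n_signature)
    total = comb(n_universe, n_signature)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_signature - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def enrich(signature, gene_sets, universe):
    """Test every category in gene_sets; return (name, overlap, p) sorted by p."""
    universe = set(universe)
    signature = set(signature) & universe
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe
        overlap = len(signature & members)
        p = hypergeom_enrichment_p(len(universe), len(members), len(signature), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Toy data: hypothetical gene identifiers and categories, not real annotations.
universe = [f"G{i}" for i in range(20)]
gene_sets = {
    "category_A": ["G0", "G1", "G2", "G3", "G4"],
    "category_B": ["G10", "G11", "G12", "G13", "G14"],
}
signature = ["G0", "G1", "G2", "G3", "G15"]

results = enrich(signature, gene_sets, universe)
for name, overlap, p in results:
    print(f"{name}: overlap={overlap}, p={p:.4g}")
```

In practice the p-values would be corrected for multiple testing across all categories, and the choice of background universe strongly affects the result; both points are left out of this sketch for brevity.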

#### **3. Data science in oncology**

In the past decade, artificial intelligence (AI), and machine learning (ML) in particular, has grown rapidly in the context of data analysis and computing, allowing applications and platforms to function in an intelligent manner (https://pubmed.ncbi.nlm.nih.gov/34278328/). ML refers to a broad range of algorithms that make predictions by learning from a subset of data [16]. AI has recently altered the landscape of cancer research and medical oncology through both traditional ML algorithms and cutting-edge deep learning (DL) approaches [17]. Indeed, ML algorithms including Random Forest (RF), Gradient Boosting Machine (GBM), and Neural Network (NN) have been used to optimize cancer classification [18]. Furthermore, DL-based algorithms have been widely applied in medical imaging to accurately diagnose breast cancer [19], colorectal cancer [20], lung cancer [21], and others [22]. Moreover, AI systems have been developed to diagnose early gastric cancer (EGC) from 4667 magnifying image-enhanced endoscopy images, including 1950 EGC images from 1042 cases and 2717 noncancerous images from 769 cases [23].
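
To make the idea of "learning from a subset of data" concrete, the sketch below trains a classifier on a few labeled expression profiles and evaluates it on held-out samples. It deliberately uses a nearest-centroid rule, which is much simpler than the RF, GBM, and NN models cited above and is swapped in here only to keep the example dependency-free. All expression values, labels, and function names are invented for illustration.

```python
from statistics import fmean


def fit_centroids(samples, labels):
    """Learn one mean expression vector (centroid) per class label."""
    grouped = {}
    for vec, lab in zip(samples, labels):
        grouped.setdefault(lab, []).append(vec)
    return {lab: [fmean(col) for col in zip(*vecs)] for lab, vecs in grouped.items()}


def predict(centroids, vec):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], vec))


# Toy data: three hypothetical genes per sample, values are made up.
train_x = [[8.1, 2.0, 5.5], [7.9, 2.3, 5.1], [3.0, 6.8, 5.2], [2.7, 7.1, 4.9]]
train_y = ["tumor", "tumor", "normal", "normal"]
test_x = [[7.5, 2.5, 5.0], [3.2, 6.5, 5.3]]
test_y = ["tumor", "normal"]

centroids = fit_centroids(train_x, train_y)            # learn from the training subset
predictions = [predict(centroids, v) for v in test_x]  # predict unseen samples
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

Keeping the test samples out of `fit_centroids` is the essential design point: performance measured on data the model has already seen would overstate how well it generalizes to new patients.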

#### **4. Tools and databases**

Several publicly accessible databases containing cancer-related data and integrating tools for delivering and analyzing that data, as well as specialized databases dedicated to specific types of cancer, have been developed during the last decades. The most prominent and commonly used ones include the International Cancer Genome Consortium (ICGC) [24] and The Cancer Genome Atlas (TCGA) [25]. A detailed list of publicly available databases and their descriptions has been reported by Pavlopoulou and co-workers [26]. Recently, a novel database integrating RNA-seq, DNA methylation, and related clinical data from over 10,000 cancer patients in the TCGA study, as well as from normal tissues in the GTEx study, has been developed and made freely available [27, 28]. Concerning bioinformatics and computational tools for cancer risk prediction, numerous resources have been developed, including the International Breast Cancer Intervention Study (IBIS) model [29], the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA) [30], BRCAPRO [31], and the Breast Cancer Surveillance Consortium (BCSC) risk model [32]. Comprehensive lists of web tools and web servers for cancer genomics studies and cancer prognosis analysis have been provided by Yang and co-workers [33] and Zheng and colleagues [34].

*Introductory Chapter: Application of Bioinformatics Tools in Cancer Prevention, Screening… DOI: http://dx.doi.org/10.5772/intechopen.104794*

#### **5. Precision oncology application**

Molecular and genetic profiling of tumors plays an increasingly important role not only in cancer research but also in the clinical management of cancer patients [35]. Multi-omics approaches hold the promise of improving diagnostics, prognostics, and personalized treatment. They rely on highly reproducible and robust bioinformatics methods for complex data management and integration, spanning the primary analysis of raw molecular profiling data through the automatic generation of a clinical report and its delivery to the clinical oncologists who make treatment decisions [36]. The initial results of these efforts are promising, but it has also become clear that exploiting the full potential of precision oncology faces many challenges. One major bottleneck is the efficient and precise annotation of variants [37], which requires databases of well-curated variants and their interactions with potential drugs. A second challenge is the rapid development of molecular profiling techniques, each of which calls for new bioinformatics tools, pipelines, and workflows [38]. Moreover, multi-omics approaches provide more insight into dysregulated pathways and increase confidence in reporting actionable variants when these can be confirmed by RNA, protein, or epigenetic profiling. However, the availability of diverse multi-omics data currently poses new bioinformatics challenges in integrating multiple data sets and identifying potentially effective treatments [39]. Finally, interpreting the clinical significance of genomic variants and transcriptional changes is a laborious task that cannot be fully automated in a reliable way; it therefore requires a multidisciplinary team to perform the clinical interpretation, select relevant variants, and recommend targeted, personalized therapies [40]. That being said, bioinformatics still holds promise for bridging cancer research and medical applications to improve the clinical management of patients.
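
The variant annotation step discussed above can be pictured as a lookup against a curated variant-drug table, with any unmatched variant routed to expert review and not silently dropped. The sketch below illustrates that flow under strong simplifying assumptions: the table `CURATED`, the gene and drug names, and the `annotate` function are all hypothetical placeholders and do not represent real clinical knowledge or the interface of any existing database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    protein_change: str


# Hypothetical curated table: placeholder genes and drugs, NOT clinical facts.
CURATED = {
    Variant("GENE_A", "p.X100Y"): ["DRUG_1"],
    Variant("GENE_B", "p.Z200W"): ["DRUG_2", "DRUG_3"],
}


def annotate(variants, curated, rna_supported=frozenset()):
    """Attach curated drug associations to each variant.

    Variants absent from the curated table are flagged for manual review.
    rna_supported lists variants also confirmed at the RNA level, which
    raises the evidence level recorded in the report.
    """
    report = []
    for v in variants:
        drugs = curated.get(v)
        if drugs is None:
            report.append({"variant": v, "status": "needs expert review", "drugs": []})
        else:
            evidence = "DNA+RNA" if v in rna_supported else "DNA only"
            report.append(
                {"variant": v, "status": "curated", "drugs": list(drugs), "evidence": evidence}
            )
    return report


observed = [Variant("GENE_A", "p.X100Y"), Variant("GENE_C", "p.Q1R")]
report = annotate(observed, CURATED, rna_supported={Variant("GENE_A", "p.X100Y")})
for entry in report:
    print(entry)
```

The explicit "needs expert review" status mirrors the point made in the text: automated lookup can pre-sort variants, but clinical interpretation remains the responsibility of a multidisciplinary team.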

*Cancer Bioinformatics*

#### **Author details**

Ghedira Kais1 \* and Yosr Hamdi2 \*

1 Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia

2 Laboratory of Biomedical Genomics and Oncogenetics, Institut Pasteur de Tunis, Université Tunis El Manar, Tunis, Tunisia

\*Address all correspondence to: ghedirakais@gmail.com and yosr.hamdi@pasteur.utm.tn

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


#### **References**

[1] Ferlay J, Ervik M, Lam F, Colombet M, Mery L, Piñeros M, et al. Global Cancer Observatory: Cancer Today. Lyon: International Agency for Research on Cancer; 2020

[2] Cancer Prevention Overview (PDQ®)–Patient Version was originally published by the National Cancer Institute

[3] de Martel C, Georges D, Bray F, Ferlay J, Clifford GM. Global burden of cancer attributable to infections in 2018: A worldwide incidence analysis. The Lancet Global Health. 2020;**8**(2):e180-e190

[4] Burd EM. Human papillomavirus and cervical cancer. Clinical Microbiology Reviews. 2003;**16**(1):1-17. DOI: 10.1128/CMR.16.1.1-17.2003

[5] Brenner C. Applications of bioinformatics in cancer. Cancers (Basel). 2019;**11**(11):1630. DOI: 10.3390/cancers11111630

[6] Rhodes D, Chinnaiyan A. Integrative analysis of the cancer transcriptome. Nature Genetics. 2005;**37**:S31-S37. DOI: 10.1038/ng1570

[7] Rhodes DR, Yu J, Shanker K, et al. Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proceedings of the National Academy of Sciences of the United States of America. 2004;**101**(25):9309-9314. DOI: 10.1073/pnas.0401994101

[8] Wang Y, Mashock M, Tong Z, Mu X, Chen H, Zhou X, et al. Changing technologies of RNA sequencing and their applications in clinical oncology. Frontiers in Oncology. 2020;**10**:447. DOI: 10.3389/fonc.2020.00447

[9] Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Research. 2008;**18**(9):1509-1517. DOI: 10.1101/gr.079558.108

[10] Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, et al. The gene ontology (GO) database and informatics resource. Nucleic Acids Research. 2004;**32**(Database issue):D258-D261. DOI: 10.1093/nar/gkh036

[11] Draghici S, Khatri P, Bhavsar P, Shah A, Krawetz SA, Tainsky MA. Onto-tools, the toolkit of the modern biologist: Onto-express, onto-compare, onto-design and onto-translate. Nucleic Acids Research. 2003;**31**(13):3775-3781. DOI: 10.1093/nar/gkg624

[12] Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: Integrating viruses and cellular organisms. Nucleic Acids Research. 2021;**49**(D1):D545-D551. DOI: 10.1093/nar/gkaa970

[13] Krull M, Voss N, Choi C, Pistor S, Potapov A, Wingender E. TRANSPATH: An integrated database on signal transduction and a tool for array analysis. Nucleic Acids Research. 2003;**31**(1):97-100. DOI: 10.1093/nar/gkg089

[14] Doniger SW, Salomonis N, Dahlquist KD, et al. MAPPFinder: Using gene ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biology. 2003;**4**:R7. DOI: 10.1186/gb-2003-4-1-r7

[15] Erdogan F, Radu TB, Orlova A, Qadree AK, de Araujo ED, Israelian J, et al. JAK-STAT core cancer pathway: An integrative cancer interactome analysis. Journal of Cellular and Molecular Medicine. 2022;**26**(7):2049-2062. DOI: 10.1111/jcmm.17228. Epub 2022 Mar 1. PMID: 35229974; PMCID: PMC8980946

[16] Choi RY, Coyner AS, Kalpathy-Cramer J, Chiang MF, Campbell JP. Introduction to machine learning, neural networks, and deep learning. Translational Vision Science & Technology. 2020;**9**(2):14. DOI: 10.1167/tvst.9.2.14

[17] Kourou K, Exarchos KP, Papaloukas C, Sakaloglou P, Exarchos T, Fotiadis DI. Applied machine learning in cancer research: A systematic review for patient diagnosis, classification and prognosis. Computational and Structural Biotechnology Journal. 2021;**19**:5546-5555. DOI: 10.1016/j.csbj.2021.10.006

[18] Ramroach S, Joshi A, John M. Optimisation of cancer classification by machine learning generates an enriched list of candidate drug targets and biomarkers. Molecular Omics. 2020;**16**(2):113-125. DOI: 10.1039/c9mo00198k

[19] Shang LW, Ma DY, Fu JJ, Lu YF, Zhao Y, Xu XY, et al. Fluorescence imaging and Raman spectroscopy applied for the accurate diagnosis of breast cancer with deep learning algorithms. Biomedical Optics Express. 2020;**11**(7):3673-3683. DOI: 10.1364/BOE.394772

[20] Choi K, Choi SJ, Kim ES. Computer-aided diagnosis for colorectal cancer using deep learning with visual explanations. Annual International Conference of the IEEE Engineering in Medicine & Biology Society. 2020;**2020**:1156-1159. DOI: 10.1109/EMBC44109.2020.9176653

[21] Shimazaki A, Ueda D, Choppin A, Yamamoto A, Honjo T, Shimahara Y, et al. Deep learning-based algorithm for lung cancer detection on chest radiographs using the segmentation method. Scientific Reports. 2022;**12**(1):727. DOI: 10.1038/s41598-021-04667-w

[22] Ma CY, Zhou JY, Xu XT, Guo J, Han MF, Gao YZ, et al. Deep learning-based auto-segmentation of clinical target volumes for radiotherapy treatment of cervical cancer. Journal of Applied Clinical Medical Physics. 2022;**23**(2):e13470. DOI: 10.1002/acm2.13470

[23] Abe S, Tomizawa Y, Saito Y. Can artificial intelligence be your angel to diagnose early gastric cancer in real clinical practice? Gastrointestinal Endoscopy. 2022;**95**(4):679-681. DOI: 10.1016/j.gie.2021.12.042

[24] International Cancer Genome Consortium, Hudson TJ, Anderson W, Artez A, Barker AD, et al. International network of cancer genome projects. Nature. 2010;**464**(7291):993-998. DOI: 10.1038/nature08987

[25] Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nature Genetics. 2013;**45**(10):1113-1120. DOI: 10.1038/ng.2764

[26] Pavlopoulou A, Spandidos DA, Michalopoulos I. Human cancer databases (review). Oncology Reports. 2015;**33**(1):3-18. DOI: 10.3892/or.2014.3579

[27] Tang G, Cho M, Wang X. OncoDB: An interactive online database for analysis of gene expression and viral infection in cancer. Nucleic Acids Research. 2022;**50**(D1):D1334-D1339. DOI: 10.1093/nar/gkab970


[28] Tang G, Cho M, Wang X. OncoDB: An interactive online database for analysis of gene expression and viral infection in cancer. Nucleic Acids Research. 2022;**50**(D1):D1334-D1339

[29] Tyrer J, Duffy SW, Cuzick J. A breast cancer prediction model incorporating familial and personal risk factors. Statistics in Medicine. 2004;**23**(7):1111-1130. DOI: 10.1002/sim.1668. Erratum in: Statistics in Medicine 2005 Jan 15;24(1):156

[30] Lee A, Mavaddat N, Wilcox AN, Cunningham AP, Carver T, Hartley S, et al. BOADICEA: A comprehensive breast cancer risk prediction model incorporating genetic and nongenetic risk factors. Genetics in Medicine. 2019;**21**(8):1708-1718. DOI: 10.1038/s41436-018-0406-9

[31] Antoniou AC, Hardy R, Walker L, Evans DG, Shenton A, Eeles R, et al. Predicting the likelihood of carrying a BRCA1 or BRCA2 mutation: Validation of BOADICEA, BRCAPRO, IBIS, myriad and the Manchester scoring system using data from UK genetics clinics. Journal of Medical Genetics. 2008;**45**(7):425-431. DOI: 10.1136/jmg.2007.056556

[32] Shieh Y, Hu D, Ma L, Huntsman S, Gard CC, Leung JW, et al. Breast cancer risk prediction using a clinical risk model and polygenic risk score. Breast Cancer Research and Treatment. 2016;**159**(3):513-525. DOI: 10.1007/s10549-016-3953-2

[33] Yang Y, Dong X, Xie B, Ding N, Chen J, Li Y, et al. Databases and web tools for cancer genomics study. Genomics Proteomics Bioinformatics. 2015;**13**(1):46-50. DOI: 10.1016/j.gpb.2015.01.005. [Epub 2015 Feb 21]. Erratum in: Genomics Proteomics Bioinformatics. 2015 Jun;13(3):202-203

[34] Zheng H, Zhang G, Zhang L, et al. Comprehensive review of web servers and bioinformatics tools for cancer prognosis analysis. Frontiers in Oncology. 2020;**10**:68. DOI: 10.3389/fonc.2020.00068

[35] Dietel M, Jöhrens K, Laffert MV, Hummel M, Bläker H, Pfitzner BM, et al. A 2015 update on predictive molecular pathology and its role in targeted cancer therapy: A review focussing on clinical relevance. Cancer Gene Therapy. 2015;**22**(9):417-430. DOI: 10.1038/cgt.2015.39

[36] Orlov YL, Baranova AV, Tatarinova TV. Bioinformatics methods in medical genetics and genomics. International Journal of Molecular Sciences. 2020;**21**(17):6224. DOI: 10.3390/ijms21176224

[37] Fröhlich H, Balling R, Beerenwinkel N, et al. From hype to reality: Data science enabling personalized medicine. BMC Medicine. 2018;**16**(1):150. DOI: 10.1186/s12916-018-1122-7

[38] Singer J, Irmisch A, Ruscheweyh HJ, et al. Bioinformatics for precision oncology. Briefings in Bioinformatics. 2019;**20**(3):778-788. DOI: 10.1093/bib/bbx143

[39] Miller DT, Lee K, Gordon AS, Amendola LM, Adelman K, Bale SJ, et al. ACMG secondary findings working group. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2021 update: A policy statement of the American College of Medical Genetics and Genomics (ACMG). Genetics in Medicine. 2021;**23**(8):1391-1398. DOI: 10.1038/s41436-021-01171-4

[40] Qian M, Li Q, Zhang M, et al. Multidisciplinary therapy strategy of precision medicine in clinical practice. Clinical and Translational Medicine. 2020;**10**(1):116-124. DOI: 10.1002/ctm2.15

#### **Chapter 2**
