**2.2 Analyzing gene expression profiles**

With the increased availability of genome-wide gene expression assays in public databases, there is growing demand for more efficient computational models for data interpretation. Artificial neural networks are increasingly favored over traditional analysis methods in biomedical research, as they have been reported to be better classifiers. Deep neural networks that take RNA-seq data as inputs are being used for predictive modeling. Classic models for applications such as predicting patient outcomes from gene expression data still fall short of the expected performance, creating a need for more efficient and robust algorithms. Recent studies applying deep learning models to gene expression data have reported better performance. Urda et al. [17] illustrated the use of a multi-layer feed-forward artificial neural network, shown in **Figure 2**, for analyzing RNA-seq gene expression data.
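As a rough illustration of the kind of network shown in Figure 2, the forward pass of a multi-layer feed-forward classifier can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the implementation of Urda et al. [17]: the sigmoid activation, the layer sizes, the random untrained weights, and the names `init_network` and `forward` are all chosen here for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs, hidden_sizes, seed=0):
    """Build weights for P inputs, k hidden layers, and one output unit.

    Each layer is a list of rows; a row holds one unit's bias followed by
    its incoming synaptic weights w. Weights are random (untrained).
    """
    rng = random.Random(seed)
    sizes = [n_inputs] + list(hidden_sizes) + [1]
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(network, expression_levels):
    """Propagate one sample through every layer and return P(class = 1)."""
    activations = list(expression_levels)
    for layer in network:
        activations = [
            sigmoid(row[0] + sum(w * a for w, a in zip(row[1:], activations)))
            for row in layer
        ]
    return activations[0]


# Hypothetical sample: P = 5 normalized expression levels, k = 2 hidden layers.
network = init_network(n_inputs=5, hidden_sizes=[4, 3])
sample = [0.2, 1.5, 0.0, 0.7, 2.1]
probability = forward(network, sample)
predicted_class = int(probability >= 0.5)  # threshold for binary classification
```

Because the weights are untrained, the output carries no biological meaning; in practice they would be fitted to labeled expression profiles, for example by backpropagation.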

#### **Figure 2.**

*Example neural network for binary classification. An input layer of P gene expression levels is connected to k hidden layers through synaptic weights w.*

Dincer et al. [18] present a model that uses variational auto-encoders (VAEs) to extract latent variables from publicly available expression datasets and uses them as features for predicting phenotypes. Their system, called DeepProfile, uses deep learning to learn a feature representation from large unlabeled expression datasets that are not part of the prediction problem. The system was successfully used to predict response to cancer drugs from gene expression data. It also helped determine the effects of given drugs on specific patients and thus provides a tool for precision medicine. The model was trained on acute myeloid leukemia gene expression data from GEO. Results indicated that the low-dimensional representation (latent variables) generated by the VAEs significantly outperformed the original input feature representation (gene expression levels) in the drug response prediction problem. Variational auto-encoders were therefore shown to be effective in extracting a low-dimensional feature representation from unlabeled gene expression datasets, and these learned features were found to capture important processes relevant to the prediction problem.

It is worth noting that detecting certain differentially expressed genes (DEGs) from RNA-seq results remains challenging despite the quality control measures applied during sample preparation and data analysis. Data processing methods can introduce false positives and false negatives that affect the accuracy and sensitivity of DEG analysis. Combining machine learning techniques with RNA-seq has been shown to significantly improve the sensitivity of DEG detection [18] and thus helps identify DEGs that are missed by traditional RNA-seq techniques. The study by Wang et al. [19] used a machine learning-based differential network analysis to predict stress-responsive genes by learning the patterns of 32 expression characteristics of known stress-related genes. The WEKA 3 data mining software was used for feature selection, classifier training, and evaluation. Three feature selection algorithms, correlation feature selection (CFS), information gain (InfoGain), and RELIEF [20], were used to identify features, and five classifiers that outperformed other machine learning algorithms (logistic regression, random forest, LMT, classification via regression, and random subspace) were deployed to predict up- and down-regulated genes. With this approach, the authors were able to identify the top 23 most informative features.

*Designing Data-Driven Learning Algorithms: A Necessity to Ensure Effective Post-Genomic…*

*DOI: http://dx.doi.org/10.5772/intechopen.84148*

**2.3 Inferring protein-protein interaction and biological networks for knowledge discovery**

In this chapter, we focus only on the protein-protein interaction (PPI) network, which is defined as a set of nodes (or vertices) representing proteins, connected by undirected edges (or links) representing the interactions or relationships between them (either *direct physical* or *functional* interactions). A physical interaction involves physical contact between proteins. A functional interaction is broader: it does not necessarily involve direct physical contact, but rather refers to a mechanism through which a protein participates in cell functions [21]. Several learning algorithms have been used to infer human and human-pathogen PPIs [22], including ANN [23].

Several types of PPI networks exist, depending on the type of interaction; when they are integrated into a single unified network, the relationships between proteins are referred to as functional interactions. Here, we refer only to functional interactions, which include physical and genetic interactions as well as those inferred from knowledge about co-expression, shared evolutionary history, or biological pathways. Other types of biological networks include signaling networks, gene regulatory or DNA-protein interaction networks [24, 25], disease-gene networks linking diseases to the genes that cause them, and drug interaction networks connecting drugs to their targets [26]. These biological networks have been used in several applications, and analyzing their individual, collective, and sub-network behaviors has enabled effective knowledge discovery at different levels of biology.
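The definition of a PPI network given in Section 2.3 (proteins as nodes, interactions as undirected edges labeled as physical or functional) maps directly onto a small graph data structure. The following is a minimal sketch, assuming an adjacency-map representation; the class name `PPINetwork`, its methods, and the placeholder protein identifiers `P1` to `P3` are hypothetical and not taken from any cited work.

```python
from collections import defaultdict


class PPINetwork:
    """Undirected graph: nodes are proteins, edges are typed interactions."""

    def __init__(self):
        # protein -> {partner: set of interaction types on that edge}
        self._adjacency = defaultdict(dict)

    def add_interaction(self, protein_a, protein_b, kind):
        """Record an interaction of the given kind between two proteins."""
        if kind not in ("physical", "functional"):
            raise ValueError(f"unknown interaction type: {kind}")
        # Store the edge in both directions because it is undirected.
        self._adjacency[protein_a].setdefault(protein_b, set()).add(kind)
        self._adjacency[protein_b].setdefault(protein_a, set()).add(kind)

    def partners(self, protein, kind=None):
        """Return sorted interaction partners, optionally filtered by kind."""
        neighbours = self._adjacency.get(protein, {})
        return sorted(
            p for p, kinds in neighbours.items() if kind is None or kind in kinds
        )

    def degree(self, protein):
        """Number of distinct proteins this protein interacts with."""
        return len(self._adjacency.get(protein, {}))


# Placeholder identifiers, not real proteins or curated interactions.
ppi = PPINetwork()
ppi.add_interaction("P1", "P2", "physical")
ppi.add_interaction("P1", "P2", "functional")
ppi.add_interaction("P1", "P3", "functional")
ppi.add_interaction("P2", "P3", "functional")

all_partners = ppi.partners("P1")
physical_partners = ppi.partners("P1", kind="physical")
```

Storing a set of interaction types per edge lets a single pair of proteins carry both a physical and a functional relationship, which is what happens when several interaction sources are integrated into one network.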
