*Designing Data-Driven Learning Algorithms: A Necessity to Ensure Effective Post-Genomic… DOI: http://dx.doi.org/10.5772/intechopen.84148*

and thus provides a tool for precision medicine. The model was trained on acute myeloid leukemia gene expression data from GEO. The results indicated that the low-dimensional representation (latent variables) generated by the VAEs significantly outperformed the original input feature representation (gene expression levels) in the drug response prediction problem. Variational auto-encoders were therefore shown to be effective at extracting a low-dimensional feature representation from unlabeled gene expression datasets, and the learned features were found to capture important processes relevant to the prediction problem.
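To make the notion of latent variables concrete, the following minimal sketch shows the two ingredients that distinguish a VAE from a plain auto-encoder: sampling the latent vector through the reparameterization trick, and penalizing the encoder output with a Kullback-Leibler term against a standard normal prior. It is not the model from the study discussed above. The encoder and decoder networks are omitted, and the function names, the squared-error reconstruction term, and the `beta` weight are illustrative assumptions.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Reconstruction term (assumed here; other likelihoods are possible)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Negative ELBO up to constants: reconstruction error plus weighted KL term."""
    return squared_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


# Hypothetical encoder output for one sample with a 2-dimensional latent space.
rng = random.Random(0)
z = reparameterize([1.0, -2.0], [-100.0, -100.0], rng)  # tiny variance, so z is close to mu
```

In a drug response setting such as the one described above, the mean vector `mu` produced by a trained encoder would typically serve as the low-dimensional feature representation passed to a downstream predictor.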

It is worth noting that detecting certain differentially expressed genes (DEGs) from RNA-seq results remains challenging despite the quality control measures applied during sample preparation and data analysis. Data processing methods can introduce false positives and false negatives that reduce the accuracy and sensitivity of DEG analysis. Combining machine learning techniques with RNA-seq has been shown to significantly improve the sensitivity of DEG detection [18] and thus helps identify DEGs that are missed by traditional RNA-seq techniques. The study by Wang et al. [19] used a machine learning-based differential network analysis to predict stress-responsive genes by learning the patterns of 32 expression characteristics of known stress-related genes. The WEKA 3 data mining software was used for feature selection, classifier training, and evaluation. Three feature selection algorithms, correlation feature selection (CFS), information gain (InfoGain), and RELIEF [20], were used to identify informative features. Five classifiers that performed better than the other machine learning algorithms (logistic regression, random forest, logistic model tree (LMT), classification via regression, and random subspace) were then deployed to predict up- and down-regulated genes. With this approach, the authors were able to identify the 23 most informative features.
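As an illustration of how an information gain criterion ranks features, the sketch below scores each feature by the reduction in class-label entropy obtained when the samples are split on its values, and then keeps the highest-scoring features. It is a simplified stand-in written in plain Python, not the WEKA implementation used in the study. The feature names, the toy values, and the assumption that expression characteristics have already been discretized are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(table, labels, top_k):
    """Return the names of the top_k features by information gain."""
    scores = {name: information_gain(values, labels)
              for name, values in table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data (hypothetical): four genes labelled up- or down-regulated,
# described by three discretized expression characteristics.
labels = ["up", "up", "down", "down"]
table = {
    "f_noise":   ["hi", "lo", "hi", "lo"],   # uninformative
    "f_partial": ["hi", "hi", "hi", "lo"],   # partly informative
    "f_perfect": ["hi", "hi", "lo", "lo"],   # separates the classes exactly
}
top_two = rank_features(table, labels, top_k=2)
```

In a workflow like the one described above, the same ranking step would be applied to all 32 expression characteristics, and the retained subset would then be used to train the classifiers.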
