**2. Databases of genes, pathways and drugs for drug repositioning**

Genes play a critical role in gene signature-based drug repositioning. In particular, drug targets are central to traditional drug development. In general, drug targets are human or viral proteins that are druggable [12] and associated with one or more diseases. So far, about 900 biomolecules are targeted by about 1500 US FDA-approved drugs, as curated by Rita et al. [13]. Access to this information facilitates gene signature-based drug repositioning. Several databases and web servers provide gene information that is useful in drug development [14].

GeneCards (https://www.genecards.org/) is an integrative knowledge base and web server with comprehensive information on all human genes, drawing on more than 150 high-quality web sources and spanning genotype, phenotype and functional information [15]. Although it is a general-purpose database that is not centered on drug development, it provides comprehensive knowledge about a gene of interest. Browsing this website is highly recommended at the start of any study of a target.

DGIdb (Drug-Gene Interaction Database, www.dgidb.org) is a web server with drug-gene interaction and druggable-gene information collected from more than thirty high-quality web sources [16]. Once biomarkers or therapeutic targets have been identified, researchers can use DGIdb to search for drugs that act on them, which offers a quick translational opportunity.

The Open Targets database (https://www.opentargets.org/) aims to identify and prioritize promising therapeutic targets by analyzing human genetics, genomics and functional genomics data [17, 18]. The database emphasizes disease genetics, using genome-wide association studies to support causal inference for genes, which is beneficial to drug development [19, 20].

The Clue.io web server (https://clue.io/) hosts the updated CMap LINCS gene expression resource, which includes profiles of cells perturbed by gene over-expression, RNAi gene knockdown and CRISPR gene knockout (generating loss-of-function mutants) [9, 21]. This web server offers abundant gene perturbation data, providing a valuable resource for studying the effect of modulating a target, which mimics the effect of a drug on that target [22–24]. It also provides a drug repurposing hub for researchers, a curated library of drugs with a companion knowledge resource [25].

Beyond the gene level, pathways can also be a key resource in drug repositioning. A pathway consists of a set of genes and can be drawn from the Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Ontology (GO), the Reactome Pathway Database (https://reactome.org/) or other gene set collections. Because the genes in a pathway are not randomly selected, the pathway concept generalizes to the gene set, which substantially broadens the functional aspects that pathways can cover. A good source of gene sets is the Molecular Signatures Database (MSigDB, https://www.gsea-msigdb.org/gsea/msigdb/), which supplies downloadable gene set collections in GMT format, facilitating their use in bioinformatic analysis [26]. Pathways are important for several reasons. Firstly, they can illuminate the mode of action of drugs by connecting genes and drugs [27]. Secondly, they can serve as features that summarize the gene signature at a higher level, which is useful in machine learning-based modeling. Pathway-level features differ from gene-level features in that they capture different information about drugs or diseases [28–30]. Thirdly, pathway analysis can increase confidence in the predicted candidate drugs [31].
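
As an illustration of the second point, the sketch below reads gene sets in GMT format (one set per line: set name, description, then gene symbols, all tab-separated) and summarizes a gene signature as one value per gene set. The function names `parse_gmt` and `pathway_features`, the toy gene sets and the coverage score are assumptions made for this example; they are not taken from MSigDB or from the cited studies, and a real analysis would more likely use an enrichment statistic.

```python
from io import StringIO


def parse_gmt(handle):
    """Parse GMT-formatted text into {set_name: set_of_genes}.

    Each line is: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def pathway_features(signature, gene_sets):
    """Hypothetical pathway-level feature: the fraction of each
    gene set that is covered by the signature genes."""
    signature = set(signature)
    return {
        name: len(signature & genes) / len(genes)
        for name, genes in gene_sets.items()
        if genes
    }


# Toy GMT content (illustrative only, not real MSigDB gene sets).
toy_gmt = StringIO(
    "SET_A\tna\tTP53\tMDM2\tCDKN1A\tBAX\n"
    "SET_B\tna\tEGFR\tKRAS\n"
)
gene_sets = parse_gmt(toy_gmt)
features = pathway_features(["TP53", "BAX", "KRAS"], gene_sets)
print(features)
```

To use a downloaded collection instead of the toy data, the file can be opened with `open(path)` and passed to `parse_gmt` in place of the `StringIO` object.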

Information about drugs is an invaluable resource for drug repositioning and also serves as an evaluation dataset for repositioning methods. The repoDB database, which consists of 6677 approved and 4123 failed drug-indication pairs, is a standard dataset for benchmarking computational repositioning methods [32]. The Experimental Knowledge-Based Drug Repositioning Database (EK-DRD, http://www.idruglab.com/drd/index.php) curates 1861 FDA-approved and 102 withdrawn drugs with validated drug repositioning annotations [33]. These datasets facilitate the training and testing of machine learning-based models.
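
A minimal sketch of how approved and failed drug-indication pairs might be used to score a repositioning method is given below. It assumes the method outputs a ranked list of (drug, indication) pairs; the pair identifiers, the function name `precision_recall_at_k` and the choice of precision and recall at k are illustrative assumptions, not the repoDB schema or a protocol prescribed by the cited works.

```python
def precision_recall_at_k(ranked_pairs, approved, failed, k):
    """Score a ranked list of (drug, indication) pairs.

    Only pairs with a known label (approved or failed) are kept,
    then precision and recall are computed over the top k of them.
    """
    labeled = [p for p in ranked_pairs if p in approved or p in failed]
    top = labeled[:k]
    hits = sum(1 for p in top if p in approved)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(approved) if approved else 0.0
    return precision, recall


# Toy labels (hypothetical identifiers, not real repoDB entries).
approved = {("drugA", "disease1"), ("drugB", "disease2")}
failed = {("drugC", "disease1")}

# Hypothetical model output, best-scoring pair first.
ranked = [
    ("drugA", "disease1"),
    ("drugC", "disease1"),
    ("drugB", "disease2"),
    ("drugD", "disease3"),  # unlabeled pair, ignored in scoring
]

precision, recall = precision_recall_at_k(ranked, approved, failed, k=2)
print(precision, recall)
```

Restricting the ranking to labeled pairs is one design choice among several; unlabeled pairs could instead be treated as negatives, which would give more conservative estimates.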
