2. Fundamental concepts

In this section, some basic notions of graph mining, pattern recognition [39], and information theory are described. A graph is an ordered pair $G = (V, E)$ comprising a set of vertices $V$ and a set of edges $E$. To avoid ambiguity, graphs here are taken to be undirected and simple. Let $Q = (N, E)$ be an unweighted, undirected graph, and let $H$ be a subgraph of it ($H \subseteq N$). The density of $H$, denoted $Ds(H)$, is defined as $Ds(H) = \frac{|IE(H)|}{|H|}$, where $IE(H)$ denotes the induced edge set of $H$, and $|H|$ refers to the cardinality of $H$. The highest density over all subgraphs, denoted $Ds^{*}$, is defined as $Ds^{*} = \max_{H \subseteq N} \{Ds(H)\}$. If $Q = (N, E)$ is instead a weighted graph, then $Ds(H) = \frac{\sum_{e \in IE(H)} wt_e}{|H|}$, where $IE(H)$ again symbolizes the induced edge set of $H$, and $wt_e$ denotes the weight of the edge $e \in IE(H)$. The entropy of a random variable evaluates the amount of uncertainty associated with that variable [40]. The entropy of a discrete variable $A$, referred to as $EP(A)$, is defined as $EP(A) = -\sum_{a \in A} p(a) \log_b p(a)$, where $p(a)$ refers to the probability mass function of $A$, and the value of $b$ is generally taken as 2. Mutual information [41] between two random variables estimates the quantity of information they share, i.e., the mutual dependency between them. When the mutual information is zero, the two variables are entirely independent of each other; the higher the mutual information, the more strongly the two variables depend on each other.
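As a concrete illustration, the following minimal, self-contained Python sketch implements these definitions directly. The function names (`density`, `weighted_density`, `entropy`, `mutual_information`) and the toy data are illustrative assumptions, not taken from the cited works.

```python
import math
from collections import Counter

# Density of an induced subgraph: Ds(H) = |IE(H)| / |H| (unweighted case).
def density(edges, H):
    """edges: iterable of (u, v) pairs; H: set of vertices."""
    induced = [(u, v) for (u, v) in edges if u in H and v in H]
    return len(induced) / len(H)

# Weighted variant: Ds(H) = (sum of induced edge weights) / |H|.
def weighted_density(weighted_edges, H):
    """weighted_edges: iterable of (u, v, w) triples."""
    return sum(w for (u, v, w) in weighted_edges if u in H and v in H) / len(H)

# Shannon entropy EP(A) = -sum_a p(a) * log_b p(a), with b = 2 by default.
def entropy(samples, b=2):
    n = len(samples)
    return -sum((c / n) * math.log(c / n, b) for c in Counter(samples).values())

# Mutual information via I(A; B) = EP(A) + EP(B) - EP(A, B):
# zero when A and B are independent (for the empirical distribution).
def mutual_information(a, b):
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(density(edges, {1, 2, 3}))                      # 3 edges / 3 vertices = 1.0
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1])) # 1.0 bit (fully dependent)
```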

Topological Overlap Measure (TOM) and other related measures: Ravasz et al. [42] proposed the Topological Overlap Measure (TOM), which quantifies the similarity between two nodes of a network based on the nearest-neighbor concept. Furthermore, various modified versions of TOM, such as weighted TOM (wTOM) [43] and generalized TOM (GTOM) [44], are present in the literature. To compute wTOM, Pearson correlation coefficients are first evaluated for all pairs of vertices; then a soft-thresholding power (say, $\beta \geq 1$) is chosen for the correlation coefficient matrix using the scale-free topology criterion. After that, the weighted adjacency matrix is calculated by raising the entries of the coefficient matrix to the chosen power $\beta$, and wTOM is computed from this weighted adjacency matrix. Similarly, GTOM is defined just like TOM, except that it counts the number of $m$-step neighbors while calculating the overlap between two vertices. For GTOM of order 0 (i.e., GTOM0), the adjacency score itself becomes the GTOM0 score. For GTOM of order higher than zero (i.e., GTOM1, GTOM2, GTOM3, …), the same procedure as TOM is followed, but neighbors up to $m$ steps away are counted for each vertex ($m = 1, 2, 3, \ldots$). Notably, GTOM1, GTOM2, and other higher-order GTOM variants work only on a binary matrix. So, before using these measures, the weighted adjacency matrix is converted into a binary matrix: adjacency values greater than a specified cutoff (e.g., 70% of the distance between the minimum and maximum adjacency values) are converted to 1, and values below the cutoff to 0.
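The following NumPy sketch illustrates this wTOM pipeline under the common unsigned-network convention, where the adjacency is $a_{ij} = |\mathrm{cor}_{ij}|^{\beta}$ and the topological overlap takes the standard form $TOM_{ij} = (l_{ij} + a_{ij}) / (\min(k_i, k_j) + 1 - a_{ij})$ with $l_{ij} = \sum_u a_{iu} a_{uj}$ and connectivity $k_i = \sum_u a_{iu}$. The default $\beta = 6$, the 70% cutoff, and the function names are assumptions for illustration; the text above does not fix them.

```python
import numpy as np

def wtom(expr, beta=6):
    """wTOM sketch: expr is a (samples x genes) expression matrix.
    beta is the soft-thresholding power, assumed to have been chosen
    beforehand via the scale-free topology criterion."""
    # 1. Pearson correlations between all pairs of genes (columns).
    cor = np.corrcoef(expr, rowvar=False)
    # 2. Soft thresholding: unsigned weighted adjacency a_ij = |cor_ij|^beta.
    adj = np.abs(cor) ** beta
    np.fill_diagonal(adj, 0.0)
    # 3. TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    #    where l_ij = sum_u a_iu * a_uj and k_i is node connectivity.
    l = adj @ adj
    k = adj.sum(axis=0)
    tom = (l + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom

def binarize(adj, frac=0.7):
    """Binary adjacency for higher-order GTOM: entries above a cutoff
    placed frac of the way between the min and max adjacency become 1."""
    cutoff = adj.min() + frac * (adj.max() - adj.min())
    return (adj > cutoff).astype(int)
```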

In data mining, hierarchical clustering is one of the most popular cluster analysis methods for forming a hierarchy of clusters. Two types of strategies exist: agglomerative and divisive [45]. Agglomerative hierarchical clustering needs no input parameters other than the similarity matrix. Thus, there is no extra burden of cluster initialization: it simply merges the two closest clusters at each iteration and continues until a single cluster containing all objects remains. Divisive hierarchical clustering follows the same scheme but in reverse order. This is the major benefit of hierarchical clustering over the traditional K-means clustering algorithm, which is sensitive to initialization.
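As a sketch of this workflow, assuming SciPy and a precomputed similarity matrix with entries in $[0, 1]$ (e.g., a TOM matrix), agglomerative clustering can be run without any initialization step; the conversion `1 - sim` and the `average` linkage choice are illustrative defaults.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_from_similarity(sim, n_clusters=3, method="average"):
    """Agglomerative clustering from a similarity matrix alone."""
    dist = 1.0 - sim                            # similarity -> dissimilarity
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)  # condensed form for linkage()
    Z = linkage(condensed, method=method)       # merge two closest clusters per step
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```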

Association rule mining (ARM) [46] is a popular method for generating interesting relationships among different items (viz., genes). Suppose $GST = \{g_1, g_2, \ldots, g_n\}$ is an item set (gene set) and $SST = \{s_1, s_2, \ldots, s_m\}$ is a sample set (viz., transaction set). An association rule can then be stated as $A \Rightarrow C$, where $A, C \subseteq GST$ and $A \cap C = \emptyset$. Notably, $A$ and $C$ denote the antecedent and consequent, respectively. An association rule captures cause-effect relationships among the item sets occurring in the transactions of a transactional data profile, such as that of a big shopping market, where each transaction consists of a set of purchased items. In a similar fashion, many genes may occur together in a sample (transaction) of a gene expression profile or similar profile. Many of these genes may be up-regulated or down-regulated, whereas the remaining genes will be non-differentially expressed.
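The following self-contained sketch computes the two standard ARM quantities, support and confidence, for a rule $A \Rightarrow C$ over gene-set "transactions". The toy samples and function names are illustrative only.

```python
def support(itemset, transactions):
    """Fraction of transactions (samples) containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of A => C: support(A union C) / support(A)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# Toy example: each sample is the set of up-regulated genes observed in it.
samples = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g2", "g3"},
    {"g1", "g2", "g3"},
]
A, C = {"g1"}, {"g2"}
print(support(A | C, samples))    # 0.75: {g1, g2} occurs in 3 of 4 samples
print(confidence(A, C, samples))  # 1.0: every sample with g1 also has g2
```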
