*Applications of Pattern Recognition*

(reads/fragments per kilobase of exon per million reads/fragments mapped), DESeq2's median of ratios and edgeR's trimmed mean of M values (TMM) [9].

**2.5 Clustering**

Cluster analysis techniques have proven helpful for understanding gene expression data. Clustering genes can uncover previously unknown relationships among them, and clustering biological samples can reveal distinct subtypes of a disease [10]. In the following section, we present methods for sample-based and gene-based clustering, starting with traditional methods applied after data transformation and then model-based clustering, which assumes that the data were generated from a mixture of probability distributions (**Figure 3**).

**Figure 3.** *Cluster analysis of RNA-sequencing data.*

**3. Clustering methods for gene expression data**

**3.1 Data transformation methods**

Traditional clustering algorithms such as hierarchical clustering and k-means cannot be applied directly to RNA-seq count data, which tend to follow an over-dispersed Poisson or negative binomial distribution. To use these methods for cluster analysis of RNA-seq data, the counts must first be transformed so that their distribution is closer to a normal distribution. In the following section, we present popular methods for data transformation:

• Logarithmic: a widely used method for dealing with skewed data in many research domains, often applied to reduce the variability of the data and make it conform more closely to a normal distribution. However, it was

*3.2.1 Hierarchical methods*

Hierarchical clustering is the most popular method for gene expression data analysis. In hierarchical clustering, genes with similar expression patterns are grouped together and connected by a series of branches, forming a clustering tree or dendrogram. Samples (experiments) with similar expression profiles can be grouped in the same way. Hierarchical clustering comes in two types: agglomerative and divisive. An agglomerative (bottom-up) method starts by assigning each observation to its own cluster and then repeatedly merges the two closest clusters until all observations belong to a single cluster; a divisive (top-down) method starts with all observations in one cluster and recursively splits it. The distance between two clusters is set by the linkage criterion: the minimum pairwise distance between their members for single linkage, the maximum for complete linkage, and the mean for average linkage. In a comparative study on cancer data [14], three variants of hierarchical clustering algorithms (HCAs), Single-Linkage (SL), Average-Linkage (AL) and Complete-Linkage (CL), were combined with 12 distance measures to cluster RNA-seq samples. The same methods will be used in our study, along with hierarchical clustering based on the Poisson distribution [15].
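The workflow described in Sections 3.1 and 3.2.1, a logarithmic transformation of the counts followed by agglomerative clustering with single, average, or complete linkage, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this study or in [14]: it assumes a log2(x + 1) transform (a pseudocount of 1 is a common but arbitrary choice), Euclidean distance with 1 − Pearson correlation as one alternative, and a naive merge loop. The names `log_transform`, `pearson_distance`, `agglomerative` and `LINKAGES` are hypothetical, and the toy count matrix is invented for illustration.

```python
import math
from itertools import combinations


def log_transform(counts, pseudocount=1.0):
    """Element-wise log2(count + pseudocount) on a samples x genes count matrix.

    The default pseudocount of 1 is an assumed choice that keeps zero counts finite.
    """
    return [[math.log2(c + pseudocount) for c in row] for row in counts]


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation, an alternative distance (profiles must not be constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


# Linkage criteria: how the distance between two clusters is derived
# from the pairwise distances between their members.
LINKAGES = {
    "single": min,                            # closest pair of members
    "complete": max,                          # farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),  # mean over all member pairs
}


def agglomerative(points, n_clusters, linkage="average", dist=euclidean):
    """Naive bottom-up hierarchical clustering.

    Starts with every observation in its own cluster and repeatedly merges the
    two closest clusters until n_clusters remain. Returns the final clusters
    (lists of row indices) and the merge history (left, right, distance),
    which is the information a dendrogram displays.
    """
    link = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]  # each observation starts alone
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = link([dist(points[i], points[j])
                      for i in clusters[a] for j in clusters[b]])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # new list, so merge history stays intact
        del clusters[b]                           # safe: b > a, so index a is unchanged
    return clusters, merges


if __name__ == "__main__":
    # Hypothetical toy counts: 4 samples (rows) x 3 genes (columns).
    counts = [
        [10, 200, 5],
        [12, 180, 6],
        [300, 3, 90],
        [280, 4, 100],
    ]
    profiles = log_transform(counts)
    for method in ("single", "average", "complete"):
        clusters, _ = agglomerative(profiles, n_clusters=2, linkage=method)
        print(method, clusters)  # each prints [[0, 1], [2, 3]]
```

The `dist` parameter is where a different distance measure would be plugged in; the two provided here are examples only, not the 12 measures compared in [14]. The explicit loop is meant to make each merge visible; for real data, SciPy's `scipy.cluster.hierarchy.linkage` offers the same `single`, `average` and `complete` methods with an efficient implementation. The Poisson-based hierarchical clustering of [15] is not covered by this sketch.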
