**6. Discussion and conclusion**

Clustering the samples of the three datasets into the sub-classes defined in the phenotype table of recount2 proved difficult. We first visualized the separation between the subtypes using principal component analysis (**Figures 7**–**10**), and then classified the samples of each dataset using four variants of hierarchical clustering and k-medoids (**Figures 11** and **12** show the hierarchical clustering dendrograms for the dataset SRP032789). The performance of the five methods varied with the dataset (**Tables 3**–**5**), which makes a general recommendation impossible. However, we can see that the k-medoids method has relatively

**Figure 7.** *PCA of data from the study SRP032789.*

**Figure 8.** *PCA of the data from the study SRP049097.*

*Current State-of-the-Art of Clustering Methods for Gene Expression Data with RNA-Seq. DOI: http://dx.doi.org/10.5772/intechopen.94069*

**Figure 9.** *PCA of the data from the study SRP042620.*

**Figure 10.** *PCA of the data from the study SRP044668.*

*Applications of Pattern Recognition*

**Figure 11.** *Dendrograms obtained for the dataset from the study SRP032789 using three variants of the hierarchical clustering method with the Euclidean distance.*

**Figure 12.** *Dendrograms obtained for the dataset from the study SRP032789 using the hierarchical clustering method with the Poisson distance.*

better performance than the other methods for all the datasets. In the second part of the study, we compared several machine learning methods used for the classification of RNA-seq data. These models outperformed the classical methods used previously; in addition, RF and PLDA performed better than SVM, which does not perform well when the dataset is large and noisy. Note that the model accuracies reported in this study should not be generalized. The results can depend on several factors: normalization and transformation methods, gene-wise overdispersions, outliers, the number of classes, etc. (**Tables 6**–**11**).
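To make the k-medoids comparison discussed above more concrete, the following is a minimal sketch and not the code used in this study. It assumes a hypothetical toy count matrix, a log2(count + 1) transformation, the Euclidean distance, a simplified alternating k-medoids with a deterministic farthest-first start (rather than the full PAM swap search), and the Rand index as an illustrative agreement score against known labels. All names and values are illustrative assumptions.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(points, k, max_iter=100):
    """Simplified alternating k-medoids (assumption: not the full PAM swap search)."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]

    # Deterministic start: the most central sample first, then repeatedly
    # the sample farthest from the medoids chosen so far.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    def assign(meds):
        # Each sample gets the index of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign(medoids), medoids


def rand_index(a, b):
    """Fraction of sample pairs on which two labelings agree (same/different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical toy data: 6 samples x 4 genes, two clearly separated subtypes.
counts = [
    [10, 12, 200, 210],
    [11, 9, 190, 220],
    [12, 10, 205, 195],
    [200, 190, 10, 12],
    [210, 205, 9, 11],
    [195, 220, 12, 10],
]
truth = [0, 0, 0, 1, 1, 1]

# Assumed transformation: log2(count + 1) to dampen the effect of large counts.
logged = [[math.log2(c + 1) for c in row] for row in counts]

labels, medoids = k_medoids(logged, k=2)
score = rand_index(truth, labels)
print("cluster labels:", labels)
print("Rand index vs. known subtypes:", score)
```

Real RNA-seq datasets are far noisier than this toy matrix, so the agreement obtained here says nothing about the accuracies in the tables; the sketch only shows the mechanics of clustering samples and scoring the result against phenotype labels.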
