*Current State-of-the-Art of Clustering Methods for Gene Expression Data with RNA-Seq*

*DOI: http://dx.doi.org/10.5772/intechopen.94069*

**5.5 Results**

| Classifier | Accuracy | Kappa |
| --- | --- | --- |
| rf | 0.8235 | 0.765 |
| SVM | 0.7647 | 0.6909 |
| PLDA | 0.7647 | 0.6866 |

**Table 8.** *Classification results for SRP049097 data.*

| Classifier | Accuracy | Kappa |
| --- | --- | --- |
| rf | 0.9412 | 0.9249 |
| SVM | 0.5882 | 0.4685 |
| PLDA | 0.7843 | 0.7267 |

**Table 9.** *Classification results for SRP042620 data.*

| Classifier | Accuracy | Kappa |
| --- | --- | --- |
| rf | 0.8214 | 0.7271 |
| SVM | 0.6786 | 0.5218 |
| PLDA | 0.7143 | 0.5573 |

**Table 10.** *Classification results for SRP044668 data.*

**5.6 Computational time**

All experiments were performed on a machine with 16 GB of RAM and a 1024 GB hard disk, running a Windows operating system and the MLSeq R package.

| Classifier | SRP032789 | SRP049097 | SRP042620 | SRP044668 |
| --- | --- | --- | --- | --- |
| rf | 176.67 | 781.31 | 4234.89 | 1412.19 |
| SVM | 1080.92 | 2333.52 | 6645.21 | 1597.89 |
| PLDA | 31.45 | 60.93 | 234.98 | 72.66 |

**Table 11.** *Computational time in seconds.*

**6. Discussion and conclusion**

Clustering the samples of the three datasets into the sub-classes defined in the phenotype table of recount2 was not easy. We first tried to visualize the separation between the subtypes using principal component analysis (**Figures 7** and **8**–**10**), then classified the samples of each dataset using four variants of hierarchical clustering and k-medoids (**Figures 11** and **12** show the hierarchical clustering plots of the dataset SRP032789). The performance of the five methods differed from one dataset to another (**Tables 3**–**5**), which makes it impossible to give a general recommendation. However, we can see that the k-medoids method has relatively



*Applications of Pattern Recognition*

| Method | Distance | Performance |
| --- | --- | --- |
| hclust (complete) | Euclidean | 0.4146015 |
| hclust (single) | Euclidean | 0.3818763 |
| hclust (average) | Euclidean | 0.4146015 |
| hclust (complete) | Poisson distance | 0.4146015 |
| k-medoids | Euclidean | 0.6798897 |

**Table 3.** *Performance of clustering methods (SRP032789).*

| Method | Distance | Performance |
| --- | --- | --- |
| hclust (complete) | Euclidean | 0.02880412 |
| hclust (single) | Euclidean | −0.003409256 |
| hclust (average) | Euclidean | 0.0005777741 |
| hclust (complete) | Poisson distance | 0.1874828 |
| k-medoids | Euclidean | 0.2791547 |

**Table 4.** *Performance of clustering methods (SRP049097).*

| Method | Distance | Performance |
| --- | --- | --- |
| hclust (complete) | Euclidean | 0.1944569 |
| hclust (single) | Euclidean | 0.005551586 |
| hclust (average) | Euclidean | 0.1285448 |
| hclust (complete) | Poisson distance | 0.1468464 |
| k-medoids | Euclidean | 0.2579758 |

**Table 5.** *Performance of clustering methods (SRP042620).*

| Method | Distance | Performance |
| --- | --- | --- |
| hclust (complete) | Euclidean | 0.2379903 |
| hclust (single) | Euclidean | −0.007755123 |
| hclust (average) | Euclidean | 0.399417 |
| hclust (complete) | Poisson distance | 0.2657942 |
| k-medoids | Euclidean | 0.3771837 |

**Table 6.** *Performance of clustering methods (SRP044668).*

**5.4 Machine learning classification**

Three widely used machine learning algorithms were used to classify the four datasets: random forests (rf), support vector machine (SVM), and Poisson linear discriminant analysis (PLDA). We first split each dataset into a training set and a test set, with 70% of the samples used for training and the remaining 30% for testing. The training set is used to fit the parameters of the model, which is then used to predict the responses for the observations in the test set. The counts were normalized with the DESeq median ratio method and then transformed with the variance stabilizing transformation. Each model was trained using 5-fold cross-validation repeated 2 times, and the number of levels for the tuning parameters was set to 10.
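The preprocessing and resampling protocol described in Section 5.4 can be sketched in a few lines. The study itself relied on the MLSeq R package. The plain-Python code below is only an illustration under stated assumptions: the function names (`size_factors`, `train_test_split`, `repeated_kfold`) are hypothetical, the split is a simple random one (the text does not say whether it was stratified), and neither the variance stabilizing transformation nor the classifier fitting is reproduced.

```python
import math
import random
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples matrix of raw counts."""
    n_samples = len(counts[0])
    # Keep genes observed in every sample so that the log geometric mean exists.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def train_test_split(n_samples, train_fraction=0.7, seed=0):
    """Randomly assign sample indices to a training and a test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * n_samples)
    return indices[:cut], indices[cut:]


def repeated_kfold(indices, k=5, repeats=2, seed=0):
    """Yield (fit, held_out) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [shuffled[i::k] for i in range(k)]
        for i, held_out in enumerate(folds):
            fit = [s for j, fold in enumerate(folds) if j != i for s in fold]
            yield fit, held_out


# Toy counts (3 genes x 2 samples); sample 2 has twice the depth of sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10]]
print(size_factors(toy_counts))

train, test = train_test_split(20)
print(len(train), len(test))             # 14 6
print(len(list(repeated_kfold(train))))  # 10 resampling iterations
```

With 5 folds repeated twice, each candidate setting of the tuning parameters is evaluated on 10 held-out folds drawn from the training set only, so the 30% test set stays untouched until the final evaluation.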

| Classifier | Accuracy | Kappa |
| --- | --- | --- |
| rf | 1 | 1 |
| SVM | 0.6667 | 0.5 |
| PLDA | 1 | 1 |

**Table 7.** *Classification results for SRP032789 data.*
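To make the Accuracy and Kappa columns of the classification tables easier to read, the sketch below computes both from a set of predictions. It assumes that Kappa denotes Cohen's kappa, (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the label frequencies. The text does not state this explicitly. The function name `accuracy_and_kappa` and the labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Agreement expected by chance, from the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when chance agreement is already total.
        return observed, float("nan")
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical labels for six test samples (not taken from the study).
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "C", "C", "A"]
acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(acc, kappa)
```

In this toy case half of the predictions are correct, but one third would be correct by chance alone, so kappa (about 0.25) is much lower than the accuracy (0.5). A classifier that predicts every sample correctly gets an accuracy of 1 and a kappa of 1.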

