*Applications of Pattern Recognition*

*Current State-of-the-Art of Clustering Methods for Gene Expression Data with RNA-Seq DOI: http://dx.doi.org/10.5772/intechopen.94069*

**5.1 Datasets**

**Table 2** describes the four datasets from recount2.

| Dataset (accession) | Number of samples | Number of classes | Classes |
|---|---|---|---|
| SRP032789 | 20 | 4 | 17 breast tumor samples of three different subtypes: • TNBC. • Non-TNBC. • HER2-positive. |
| SRP049097 | 54 | 4 | 3 subtypes of Leiomyosarcoma (LMS): • 8 LMS cases from subtype I. • 6 cases from subtype II. • 3 cases from subtype III. • 7 cases of normal tissues. |
| SRP042620 | 168 | 6 | • 28 breast cancer cell lines. • 42 Triple Negative Breast Cancer (TNBC) primary tumors. • 21 uninvolved breast tissue samples that were adjacent to TNBC primary tumors. • 42 Estrogen Receptor Positive (ER+) and HER2 Negative Breast Cancer primary tumors. • 30 uninvolved breast tissue samples that were adjacent to ER+ primary tumors. • 5 breast tissue samples from reduction mammoplasty procedures performed on patients with no known cancer. |
| SRP044668 | 94 | 3 | • 39 contrast-enhancing glioma core samples. • 36 non-enhancing FLAIR glioma margin samples. • 17 non-neoplastic brain tissue samples. |

**Table 2.** *Description of the four datasets from recount2.*

**5.2 Adjusted Rand Index**

Several similarity measures exist for cluster evaluation; we chose the adjusted Rand index (ARI), which is the corrected-for-chance version of the Rand index. It evaluates the performance of a clustering method by comparing the results of the clustering algorithm against known classes from external criteria [26]. In our study, we applied different sample-based clustering methods to four datasets. We then compared the results with the class labels that we assigned to each sample based on the field "characterization of the samples" in the phenotype table in recount2, using the ARI for cluster validation.

**5.3 Standard deviation**

**Figures 4**–**6** show the standard deviation of the transformed data across samples, plotted against the mean, for the shifted logarithm transformation, the regularized log transformation, and the variance stabilizing transformation.

**Figure 4.** *Standard deviation of the transformed data using the shifted logarithm transformation.*

**Figure 5.** *Standard deviation of the transformed data using the regularized log transformation.*

**Figure 6.** *Standard deviation of the transformed data using the variance stabilizing transformation.*
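As a rough illustration of the quantity plotted in Section 5.3 (per-gene standard deviation across samples against the per-gene mean), the sketch below computes those pairs for the shifted logarithm transformation only. It assumes a pseudocount of 1 and a base-2 logarithm, and it uses a small invented count matrix; the function names and the data are hypothetical and are not taken from the chapter. The regularized log and variance stabilizing transformations rely on fitted dispersion models and are not reproduced here.

```python
from math import log2
from statistics import mean, stdev


def shifted_log(counts, pseudocount=1.0):
    """Apply log2(count + pseudocount) to a genes x samples count matrix."""
    return [[log2(c + pseudocount) for c in gene] for gene in counts]


def mean_sd_per_gene(matrix):
    """Return (mean, sample standard deviation) across samples for every gene."""
    return [(mean(gene), stdev(gene)) for gene in matrix]


# Hypothetical raw counts (assumption): rows are genes, columns are samples.
toy_counts = [
    [0, 1, 0, 2],            # weakly expressed gene
    [12, 9, 15, 11],         # moderately expressed gene
    [980, 1040, 1010, 995],  # highly expressed gene
]

transformed = shifted_log(toy_counts)

# Sort by mean so the pairs follow the x-axis of a mean versus sd plot.
for gene_mean, gene_sd in sorted(mean_sd_per_gene(transformed)):
    print(f"mean={gene_mean:.2f}  sd={gene_sd:.2f}")
```

Plotting the second value of each pair against the first gives one panel of the kind described above; a transformation that stabilizes the variance well would yield a roughly flat trend across the range of means.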


**Table 3.** *Performance of clustering methods (SRP032789).*

**Table 4.** *Performance of clustering methods (SRP049097).*

**Table 5.** *Performance of clustering methods (SRP042620).*

**Table 6.** *Performance of clustering methods (SRP044668).*
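To make the adjusted Rand index of Section 5.2 concrete, the following minimal sketch computes it from the contingency counts between known classes and cluster assignments, using the standard pair-counting formula. The function name and the toy labels are assumptions made for illustration; they are not the chapter's code or data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        # Fewer than two samples: the two partitions are trivially identical.
        return 1.0

    # Contingency counts: samples shared by each (class, cluster) pair.
    cell_counts = Counter(zip(labels_true, labels_pred))
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)

    # Number of sample pairs grouped together per cell, class, and cluster.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_classes = sum(comb(c, 2) for c in class_sizes.values())
    sum_clusters = sum(comb(c, 2) for c in cluster_sizes.values())

    expected = sum_classes * sum_clusters / total_pairs
    max_index = (sum_classes + sum_clusters) / 2
    if max_index == expected:
        # Only happens when both partitions are all-in-one or all singletons.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy labels invented for illustration only (not taken from recount2).
known_classes = ["TNBC", "TNBC", "HER2", "HER2", "Non-TNBC", "Non-TNBC"]
cluster_ids = [0, 0, 1, 1, 2, 1]
print(round(adjusted_rand_index(known_classes, cluster_ids), 3))
```

A value of 1 indicates that the clustering reproduces the known classes exactly, values near 0 indicate agreement no better than chance, and negative values indicate agreement worse than chance. In practice, a library routine such as scikit-learn's `adjusted_rand_score` can be used instead of a hand-written function.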
