**7. References**

We cloned 210 novel microRNA candidates. Examples of the novel microRNAs detected in our study, together with their bioinformatics data, are tabulated in Tables 2 and 3. These novel miRNAs have been deposited with DDBJ under consecutive accession codes from AB372573 to AB372814.

**6. Future direction**

We have demonstrated the usefulness and accuracy of sequencing in genetic research of the liver. One of the main problems with applying sequencing to miRNA transcription research is that it is a time-consuming procedure. An important consideration for the discovery of miRNAs by sequencing is the difficulty of identifying miRNAs that are expressed at low levels, at highly specific stages, or in rare cell types. Moreover, some miRNAs are difficult to profile precisely because of their physical properties or post-transcriptional modifications, such as RNA editing. In principle, these limitations can be overcome by extensive sequencing of small RNA libraries from a broad range of samples. For differential display, the sequencing-based method has the theoretical advantage that it can discover and detect novel miRNAs. Based on our sequence variability results, especially with regard to RNA modifications, the accuracy of the sequence-based method is expected to be superior to that of the hybridization-based method.

For the prediction of novel miRNAs, methods that rely on phylogenetic [...] genes. To overcome this problem, we used a computational approach based on structural conservation criteria, using the thermodynamic stability and intrinsic structural features of miRNAs.

In the clinic, pathologists often face difficult situations in which they cannot clearly tell whether the tissue specimens they are observing are malignant or benign. Thus, in our opinion, using some miRNAs as tumor markers would help clinicians determine whether a tissue is cancerous. miRNA sequencing followed by bioinformatics analysis has greater power than individual miRNAs or other clinicopathological variables for detecting high-risk patient groups with poor prognoses. There is currently little data available on how each miRNA can be used to predict high-risk groups; however, future miRNA work and data accumulation will elucidate such criteria. Further investigation is also warranted to clarify the mechanism of aberrant miRNA expression in cancer and its participation in carcinogenesis. Nevertheless, these findings show that sequence-based miRNA profiling has potential for confirming precise miRNA dynamics in a specific disease. In addition, it will increase our understanding of the mechanisms and factors involved in human liver cancer.

**Author details**

Yoshiaki Mizuguchi and Eiji Uchida

*Nippon Medical School, Department of Surgery for Organ Function and Biological Regulation, Japan*


#### *Websites for databases used in this manuscript*

NCBI Entrez Nucleotide database, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide; European ribosomal RNA database, http://psb.ugent.be/rRNA/; Genomic tRNA database, http://lowelab.ucsc.edu/GtRNAdb/; RNAdb, http://research.imb.uq.edu.au/randb/; NONCODE, http://www.bioinfo.org.cn/NONCODE/; NCBI Reference Sequence, ftp://ftp.ncbi.nih.gov/refseq/; UCSC Genome Bioinformatics Site, http://genome.ucsc.edu; OncoDb HCC, http://oncodb.hcc.ibms.sinica.edu.tw/index.ht

**Chapter 13**

**Ensemble Clustering for Biological Datasets**

Harun Pirim and Şadi Evren Şeker

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/49956

**1. Introduction**

Recent technologies and tools have generated very large amounts of data in the bioinformatics domain. For example, microarrays measure the expression levels of tens of thousands of genes simultaneously on a single chip. Measurements involve relative expression values of each gene through an image processing task.

Biological data require both low- and high-level analysis to reveal significant information that sheds light on biological questions such as disease prediction and the annotation of gene function, and that guides new experiments. In that sense, researchers look for the effect of a treatment or of a change over a time course. For example, they may design a microarray experiment that treats a biological organism with a chemical substance and compare the observed gene expression values with the values before treatment. Such a treatment or change leads researchers to focus on groups of genes, or other biological molecules, that have significant relationships with each other under similar conditions. Gene class labels are usually unknown, since little information is available about the data. Hence, data analysis using an unsupervised learning technique is required.

Clustering is an unsupervised learning technique used in diverse domains, including bioinformatics. Clustering assigns objects to the same cluster based on a cluster definition, or criterion, which is the similarity between the objects. The idea is to find the most important cliques among the many in the data. Therefore, clustering is widely used to obtain biologically meaningful partitions. However, there is no single best clustering approach for the problem at hand, and clustering algorithms are biased towards certain criteria. In other words, each clustering approach has its own objective and assumptions about the data. The diversity of clustering algorithms can be exploited by merging the partitions they generate individually. Ensemble clustering provides a framework for merging individual partitions from different clustering algorithms, and it may generate more accurate clusters than individual clustering approaches. Here, an ensemble clustering framework is implemented as described in [10] to aggregate results from K-means, hierarchical clustering and C-means algorithms. We employ C-means instead of the spectral clustering used in [10]. We also use different

©2012 Pirim and Şeker, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
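The idea of merging partitions from several clustering algorithms, as outlined in the introduction, can be illustrated with a minimal sketch. It uses a co-association matrix, one common way to build a consensus, and is not necessarily the framework of [10]. The function names, the 0.5 threshold, and the three toy partitions are hypothetical. The C-means partition is assumed to have been hardened by assigning each gene to its highest-membership cluster.

```python
from itertools import combinations


def co_association(partitions):
    """Return the fraction of partitions in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("all partitions must label the same number of objects")
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Link objects co-clustered in more than `threshold` of the partitions
    and return the connected components as consensus labels."""
    matrix = co_association(partitions)
    n = len(matrix)
    parent = list(range(n))  # union-find forest over the objects

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical labels for six genes from three algorithms (illustrative only).
partitions = [
    [0, 0, 0, 1, 1, 1],  # e.g. K-means
    [1, 1, 0, 0, 0, 0],  # e.g. hierarchical clustering
    [0, 0, 0, 1, 1, 2],  # e.g. C-means after hardening (assumption)
]
consensus = consensus_clusters(partitions)
print(consensus)  # [0, 0, 0, 1, 1, 1]
```

In this toy case the three algorithms disagree about the third and sixth genes, but each of those genes is grouped with its consensus neighbours in two of the three partitions, so the majority vote recovers two clusters of three genes.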
