**5. Conclusion**

Clustering groups objects so that similar ones are placed in the same cluster. Its application to biological datasets is important because it can help identify natural groups of biological entities that may give insight into biomarkers. In this chapter, we reviewed several clustering algorithms applied to biological data, as well as ensemble clustering approaches for such data. We presented implementations of the K-means, C-means and HC algorithms, and the merging of their results within an ensemble framework, using two datasets: protein and DLBCL-B. Two cluster validation indices, the adjusted Rand index and the silhouette index, were used to compare the partitions produced by the individual algorithms with those produced by ensemble clustering. From Table 1, we conclude that merging the individual partitions improves the C-rand values, meaning that the ensemble approach finds partitions that are closer to the real partitions. The ensemble approach is coded as a Java application and is available upon request.
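To make the comparison criterion concrete, the following is a minimal sketch of how an adjusted (chance-corrected) Rand index can be computed between a clustering result and the real partition. It is an illustration only, not the chapter's Java application: the function name `adjusted_rand_index` and the toy label lists are assumptions, and partitions are assumed to be given as flat lists of cluster labels, one per object.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected Rand index between two partitions of the same objects.

    Hypothetical helper for illustration; labels may be any hashable values.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_true)
    if n < 2:
        # No object pairs to compare; treat the partitions as identical.
        return 1.0

    # Contingency counts: objects per (true cluster, predicted cluster) cell.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Numbers of object pairs co-clustered in cells, rows, and columns.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total_pairs = comb(n, 2)

    expected = sum_true * sum_pred / total_pairs   # agreement expected by chance
    maximum = 0.5 * (sum_true + sum_pred)          # largest attainable agreement
    if maximum == expected:
        # Degenerate case: both partitions are one cluster or all singletons.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Toy labels (assumed for illustration, not taken from the chapter's datasets).
real_partition = [0, 0, 0, 1, 1, 1]
relabeled = [1, 1, 1, 0, 0, 0]      # same grouping with swapped label names
over_split = [0, 0, 1, 1, 2, 2]     # splits the real clusters differently

score_same = adjusted_rand_index(real_partition, relabeled)
score_split = adjusted_rand_index(real_partition, over_split)
```

The index depends only on which objects are grouped together, not on the label names, so `score_same` reaches the maximum of 1 while `score_split` is lower. Values near 0 indicate agreement no better than chance, which is why a higher value for the merged partition is read as closer agreement with the real partition.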

