Preface

*Advances in Knowledge Discovery and Data Mining* aims to help data miners, researchers, scholars, and PhD students who wish to apply data mining techniques. The primary contribution of this book is to highlight frontier fields and implementations of knowledge discovery and data mining. Some techniques may appear to be repeated across chapters, but the same approaches and techniques can be useful in different fields and areas of expertise. Data mining draws on statistics, machine learning, data management and databases, pattern recognition, artificial intelligence, and other areas, and most of these areas are covered in this book through different data mining applications. The eighteen chapters have been classified in two parts: Knowledge Discovery and Data Mining Applications.

The first part of the book covers the first and second phases of knowledge discovery. The second part focuses on data mining applications in energy, electricity, speech recognition and network security.

#### **Section 1. Knowledge Discovery**

Chapter 1 presents the mathematical formulation of a unified theory of data mining and a multi-agent system architecture. The data mining processes of clustering, classification and visualization are unified by means of mathematical functions, and a multi-agent system is developed (Khan, et al.).

Chapter 2 gives an overview of existing methods for data selection and sampling (Borovicka, et al.).

Chapter 3 provides an overview of main dimensionality reduction algorithms, together with a detailed description of the most used similarity measures in time series data mining (Cassisi, et al.).

Chapter 4 explores the effect of different data representations on the performance of neural networks and regression, investigated on different datasets that have a binary or Boolean class target (Siraj, et al.).

Chapter 5 proposes employing rough set theory to obtain the knowledge about the distance protective relay under supervised learning (Othman and Aris).

Chapter 6 presents an unsupervised classification method for hyperspectral remote sensing images. It can effectively extract low-reflectance objects, such as vegetation in shadowed regions or water, from hyperspectral images using spectral data mining (Wen, et al.).

Chapter 7 demonstrates that several parameters must be considered when choosing visualization techniques: data type, task type, data scalability, data dimension and the position of the attributes in the display. It also presents the application of geometrical and iconographic techniques to the results of a clustering algorithm, with the objective of illustrating the contribution of the guidelines (Madalena, et al.).

Chapter 8 proposes two computing frameworks for large-scale data mining: a tree-structured data analysis framework and a parallel machine learning framework (Yanai and Yanase).

#### **Section 2. Data Mining Applications**

Chapter 9 discusses the task of modulation classification in cognitive radio. Modulation classification becomes fundamental, since this information allows the cognitive radio to adapt its transmission parameters so that the spectrum can be shared efficiently, without causing interference to other users (Freitas, et al.).

Chapter 10 proposes a methodology combining ARIMA and an Artificial Neural Network (ANN) for multi-step-ahead, short-term energy price prediction in the Brazilian market (Filho, et al.).

Chapter 11 offers a data mining application for short-term electricity load forecasting to help manage supply and demand fluctuations (Razak, et al.).

Chapter 12 proposes a set of five adaptive ATM interfaces, which are adapted to the behavior of an ATM customer population (Shaikh and Mahmood).

Chapter 13 benchmarks different data mining techniques to obtain the best speech recognition (Venkateswarlu, et al.).

Chapter 14 draws on remote sensing data. Elevation is considered as a variable in the spatial distribution of snow, and vegetation is used as an indicator. The chapter also investigates (1) quantification of SCA (Snow-Cover Area per unit area of elevation band)-Elevation relations and NDVI-Elevation relations; and (2) comparisons among Snow-Elevation-Vegetation relations, to obtain a better understanding of how snow-covered area varies with elevation and how it relates to vegetation, using a quantitative method to detect the Snow-NDVI-Elevation relation (Jie).

Chapter 15 compares the performance of recently developed data mining algorithms for adaptive intrusion detection, which scale up the detection rates (DR) for different types of network intrusions and reduce the false positives (FP) to an acceptable level (Farid, et al.).

Chapter 16 reviews botnet detection, the attack types caused by botnets, well-known botnet classifications and diverse types of detection techniques (Alparslan, et al.).

Chapter 17 proposes applying data mining for forecasting the wind power plants' energy production (Bâra and Lungu).

I would like to thank all the authors for their contributions, the InTech Publishing team for their helpful assistance, and, last but not least, the people who help disseminate this book. I hope that this book will inspire readers to pursue education and research in the emerging field of data mining.

> **Assoc.Prof.Dr. Adem Karahoca** Bahcesehir University, Engineering Faculty, Head of Software Engineering Department, Istanbul, Turkey

**Section 1** 

**Knowledge Discovery** 

**Chapter 1** 

© 2012 Khan et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**Towards the Formulation of a Unified Data Mining Theory, Implemented by Means of Multiagent Systems (MASs)**

Dost Muhammad Khan, Nawaz Mohamudally and D. K. R. Babajee

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/47984

**1. Introduction**

Data mining techniques and algorithms are applied to a wide variety of datasets, such as medical, geographical, web log and agricultural data, among many others. For each category of data or information, one has to apply the best-suited algorithm to obtain optimal results with the highest accuracy. This is still a problem for many data mining tools, as no unified theory has been adopted. The scientific community is well aware of this issue and has faced multiple challenges in establishing consensus on a unified data mining theory. Researchers have attempted to model the best-fit algorithm for specific domain areas, for instance through formal analysis of how overfitting can happen and of how well an algorithm will perform on future data when the estimate is based solely on its training set error (Moore Andrew W., 2001, Grossman. Robert, Kasif. Simon, et al, 1998, Yang. Qlang, et al, 2006 & Wu. Xindong, et al 2008).

Another problem in trying to lay down some kind of formalism behind a unified theory is that current data mining algorithms and techniques are designed to solve individual, consecutive tasks, such as classification or clustering. Most existing data mining tools are efficient only for specific problems, so each tool is limited to a particular set of data for a specific application. These tools also depend on the correct choice of algorithm and on how the output is analyzed, because most of them are generic and have no context-specific logic attached to the application. A theoretical framework that unifies different data mining tasks, including clustering, classification, interpretation and association rules, would assist developers and researchers in their quest for the most efficient and effective tool, commonly called a unified data mining engine (UDME) (Singh. Shivanshu K., Eranti. Vijay Kumer. & Fayad. M.E., 2010, Das. Somenath 2007 & Khan. Dost Muhammad, Mohamudally. Nawaz, 2010).
