**6. Conclusion and future work**

There are a number of cases that result in poor classification performance, such as the following:

• In the presence of noisy or missing data

• If an insufficient number of instances is available

• If the algorithm parameters are not correctly determined

• If there are too many classes

• If feature dependencies are ignored

• If feature selection is not performed well

• If a complex relationship is inherent in the data

• If the class labels are imbalanced

For example, because the "cloud" dataset contains very few instances, inferior results are obtained for most of the applied algorithms, as expected. However, even in such cases, while some algorithms fail, others manage to perform well (e.g., the C4.5 decision tree with 82% accuracy). The classifier's performance can also be enhanced by applying ensemble learning methods, as in the case of AdaBoost, which reaches 84% classification accuracy on the same dataset. AdaBoost is a powerful ensemble learning algorithm because its distribution update step ensures that instances misclassified by the previous classifier are more likely to be included in the training data of the next classifier, with the chance of further enhancement.

| Dataset | SVM C | SVM E | DT C | DT M | KNN N | KNN distance metric | RF I | RF K | AdaBoost I | AdaBoost P |
|---|---|---|---|---|---|---|---|---|---|---|
| Ozone (1 h) | 103 | 2 | 0.05 | 1 | 17 | Euclidean distance | 10 | 5 | 10 | 10 |
| Ozone (8 h) | 103 | 5 | 0.55 | 1 | 5 | Chebyshev distance | 10 | 5 | 100 | 10 |
| Leaf | 100 | 1 | 0.05 | 2 | 1 | Manhattan distance | 60 | 0 | 100 | 10 |
| Eucalyptus | 101 | 1 | 0.15 | 2 | 9 | Manhattan distance | 50 | 2 | 80 | 40 |
| Forest type | 100 | 1 | 0.15 | 3 | 11 | Manhattan distance | 50 | 11 | 10 | 10 |
| Cloud | 100 | 1 | 0.05 | 1 | 17 | Euclidean distance | 100 | 0 | 100 | 40 |

**Table 4.** Optimum classifier parameters corresponding to each dataset.

Because classification accuracy alone is not enough to decide whether a learner is considerably good, the precision, recall, and F-measure values were also calculated for each model (**Table 5**). It is also clear from the table that applying ensemble strategies, compared to single learners, makes more sense in terms of classifier performance.
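The precision, recall, and F-measure values discussed above are straightforward to compute from the counts of true positives, false positives, and false negatives. The following is a minimal sketch; the label lists are illustrative and are not taken from the chapter's datasets or Table 5:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred)  # 0.75, 0.75, 0.75
```

The F-measure here is the balanced F1 score, the harmonic mean of precision and recall, which is the variant most commonly reported alongside accuracy.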

This study aims to provide helpful guidelines for future applications by presenting the advantages and challenges of ensemble-based environmental data mining and by comparing alternative ensemble strategies experimentally. Four ensemble strategies are compared: (i) bagging, (ii) bagging combined with random feature subset selection, (iii) boosting, and (iv) voting. In the experiments, the ensemble methods are tested on several real-world environmental datasets.
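The four strategies can be sketched with off-the-shelf components. The scikit-learn setup below is only an illustration of each strategy's structure, on a synthetic dataset; it does not reproduce the chapter's environmental datasets, tooling, or tuned parameters:

```python
# Sketch of the four ensemble strategies, on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=42)
tree = DecisionTreeClassifier(random_state=42)

strategies = {
    # (i) bagging: each learner is trained on a bootstrap replicate
    "bagging": BaggingClassifier(tree, n_estimators=50, random_state=42),
    # (ii) bagging + random feature subset selection: each learner also
    # sees only a random half of the features
    "bagging+subspace": BaggingClassifier(tree, n_estimators=50,
                                          max_features=0.5, random_state=42),
    # (iii) boosting: AdaBoost reweights the training distribution so that
    # instances misclassified by earlier learners get more emphasis
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=42),
    # (iv) voting: majority vote over heterogeneous base learners
    "voting": VotingClassifier([("svm", SVC()),
                                ("dt", DecisionTreeClassifier(random_state=42)),
                                ("knn", KNeighborsClassifier())]),
}

for name, clf in strategies.items():
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```

Cross-validated accuracy, as printed here, is only a starting point; as noted above, precision, recall, and F-measure should be examined before declaring one strategy superior.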


In the future, the following studies can be carried out:

