Prediction of Monthly Streamflow Using Extreme Gradient Boosting and Extreme Learning Machine

*Sinan Jasim Hadi, Arkan J. Hadi, Kamaran S. Ismail, Mohammad Ali Ghorbani and Mustafa Tombul* 

#### **Abstract**

 Streamflow is an essential part of the hydrologic cycle, and its prediction is important in most water resource management applications. In this study, the performance of two data-driven models, namely, extreme learning machine (ELM) and extreme gradient boosting (XGB), in predicting the streamflow 1 month ahead is evaluated. A basin located in the southeast of Turkey was selected as an application. The downstream flow is predicted 1 month ahead using the optimum lags of the downstream flow itself, the upstream flow, rainfall, temperature, and potential evapotranspiration as input variables. Several input combinations of these variables were developed to identify the best combination. The results showed that using additional variables besides the lagged downstream flow increases the model performance. For example, using only the lagged downstream flow, XGB obtains a Nash-Sutcliffe coefficient of only 0.335 and an RMSE of 26.851 m<sup>3</sup>/s, while using all the variables as inputs it obtains a Nash-Sutcliffe coefficient of 0.671 and an RMSE of 18.879 m<sup>3</sup>/s. This study found that XGB outperformed ELM even though the former is a tree-based model.

**Keywords:** streamflow prediction, extreme learning machine, extreme gradient boosting, data-driven models

#### **1. Introduction**

One of the crucial components of the global- and regional-scale hydrologic cycle is streamflow [1–3]. Streamflow is closely related to flood and drought disasters, and it is the main source of fresh water. Consequently, highly accurate streamflow forecasting, especially in regions vulnerable to floods and droughts, is very important for managing water resources efficiently [1, 4]. Streamflow forecasting falls into two categories: short-term and long-term. Short-term forecasts can be hourly or daily and are important in flood warning applications, while long-term forecasts, which can be monthly or annual, are important for reservoir operation, sediment transport, and irrigation management decisions [5–7].

Hydrologic models are generally divided into two categories: physically based models and data-driven models (DDMs). A physically based model is formulated according to the physical interactions between the hydrological elements. DDMs are derived from the data: the model forms the relationship between the inputs and the outputs regardless of the physical interactions. Several DDMs have been developed around the globe and used for hydrologic modeling in general, and streamflow in particular, such as autoregression, artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS), support vector machine, and genetic programming [8].

The selection of the model is important in relation to the study area and its physical characteristics. From the hydrologic point of view, another important factor in model selection is the number of available variables that can be used as predictors and their record length. As an example, the ANFIS model creates rules between the variables, and the number of these rules depends heavily on the variables used and their number; as a consequence, the computational cost can be very high [1, 8, 9]. Such an issue also exists in other models such as ANN.

 ANN has been the most widely used DDM, especially in hydrologic modeling, due to its ability to model nonlinear relationships and its fast performance. Numerous studies have applied ANN to streamflow modeling [2, 3, 8, 10–13]. Although ANN is considered fast in comparison to the aforementioned DDMs, it is criticized for several shortcomings: the long tuning process of the back-propagation algorithm, local minima, and overfitting. Extreme learning machine (ELM), first proposed by [14], overcomes these shortcomings through its calculation and structure, which consists of a single-hidden-layer feed-forward network (SLFN). ELM has also been used for predicting streamflow by several researchers, such as [7, 15–17].
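The ELM idea above can be sketched in a few lines: the input-to-hidden weights are drawn at random and never tuned, and only the output weights are solved in closed form via a pseudoinverse. This is a minimal illustration under assumed settings (sigmoid activation, uniform random weights), not the exact configuration used in this study:

```python
import numpy as np

def elm_train(X, y, n_hidden=20, seed=0):
    """Fit a single-hidden-layer feed-forward network the ELM way:
    random input weights/biases, closed-form output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))              # sigmoid hidden outputs
    beta = np.linalg.pinv(H) @ y                        # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because no gradient-based iteration is involved, training reduces to a single pseudoinverse, which is why ELM avoids the long tuning process and local-minima issues of back-propagation.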

Another DDM, namely, extreme gradient boosting (XGB), was proposed by [18]. To the best of the authors' knowledge, the XGB model has not previously been used for streamflow prediction in the literature. Therefore, the objective of this study is to compare the performance of the well-tested ELM method with the newer XGB in predicting the monthly streamflow 1 month ahead.

#### **2. Study area and data**

 A basin shown in **Figure 1**, located in the southeast of Turkey, namely, Goksu-Gokdere or 1805 (i.e., its code), is selected as an application to examine the performance of ELM and XGB in predicting the monthly streamflow. The basin lies within the Seyhan basin and is situated between 37°36′07″N–38°17′20″N and 35°34′44″E–36°06′45″E. The physical characteristics of the basin are as follows: the average slope is 23%, the longest water path is 192 km, the area of the basin is about 1790 km<sup>2</sup>, and the elevation varies widely, from 319 to 2967 m.

 Daily streamflows measured at the downstream station, which is to be predicted, and at the upstream station (**Figure 1**) were collected for the period 1973–1994 from the Ministry of Forests and Water Affairs—General Directorate of Water Affairs. The daily streamflow data were aggregated to monthly values. Due to the absence of a meteorological station inside the basin, the rainfall and temperature data were interpolated using inverse distance weighting, which proved to be the best method in this area [18], using the observations of 17 stations around the basin. The monthly potential evapotranspiration was obtained from the CruTS3.23 data, which were assessed as reliable in the local area [19].

*Prediction of Monthly Streamflow Using Extreme Gradient Boosting and Extreme Learning… DOI: http://dx.doi.org/10.5772/intechopen.87836* 

**Figure 1.**  *Study area.* 

#### **3. Methodology**

The number of lags must be identified before developing the model combinations when several parameters are used as inputs. To identify the optimum number of lags, the autocorrelation function (ACF) and cross-correlation function (CCF) were used. According to the ACF and CCF results, the lags correlated with the downstream flow (i.e., the variable to be predicted) are 2, 2, 1, 1, and 1 for the downstream flow, upstream flow, evapotranspiration, temperature, and rainfall, respectively. Several combinations were developed in order to find the best one for predicting the streamflow (**Table 1**). The different combinations were entered into ELM and XGB. In all the models, the output is the downstream flow 1 month ahead, while the inputs are the affecting lags of the downstream flow (DS), upstream flow (US), rainfall (R), temperature (T), and/or potential evapotranspiration (E). The performance of each combination is evaluated using the Nash-Sutcliffe coefficient (NC) and the root-mean-square error (RMSE). The entire dataset was normalized before modeling, and the predicted values were then denormalized. The normalization was done using

$$y = (1 - 0.1)\frac{x - x_{\min}}{x_{\max} - x_{\min}} + 0.1 \tag{1}$$

 where *x<sub>min</sub>* and *x<sub>max</sub>* are the minimum and maximum values of the variable *x*, respectively, and *y* is the normalized value; 1 and 0.1 are the maximum and minimum of the normalized range, respectively.
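Eq. (1) and its inverse, used to map predictions back to the original units, are a direct transcription of the formula above:

```python
def normalize(x, x_min, x_max):
    """Eq. (1): linearly scale x from [x_min, x_max] into [0.1, 1]."""
    return (1 - 0.1) * (x - x_min) / (x_max - x_min) + 0.1

def denormalize(y, x_min, x_max):
    """Invert Eq. (1) to recover values in the original units."""
    return (y - 0.1) * (x_max - x_min) / (1 - 0.1) + x_min
```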


#### **Table 1.**

*Combinations of the developed models.* 
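The lag-identification step described above can be sketched with a small helper that flags the lags whose sample correlation with the target exceeds the approximate 95% significance bound ±1.96/√n. The bound is a common rule of thumb for ACF/CCF plots; the exact significance criterion used by the authors is not stated, so this is an illustrative assumption:

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Correlation between y(t) and x(t - lag): the CCF at `lag`
    (the ACF when x and y are the same series)."""
    if lag == 0:
        return float(np.corrcoef(x, y)[0, 1])
    return float(np.corrcoef(x[:-lag], y[lag:])[0, 1])

def significant_lags(x, y, max_lag=12):
    """Lags whose correlation with y exceeds +/- 1.96/sqrt(n),
    an approximate 95% confidence bound for ACF/CCF plots."""
    bound = 1.96 / np.sqrt(len(y))
    return [lag for lag in range(1, max_lag + 1)
            if abs(lagged_corr(x, y, lag)) > bound]
```

Running this on each candidate series against the downstream flow yields a per-variable list of candidate lags from which the input combinations are built.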

#### **4. Results and discussion**

The different combinations were entered into ELM and XGB one by one, and the best parameters were obtained for every combination. For ELM, only the number of hidden neurons in the hidden layer was optimized (not reported here), while for XGB several parameters, including the maximum depth, eta, gamma, minimum child weight, and subsample, were optimized using a hyperparameter optimization algorithm. The optimized parameters of the XGB are shown in **Table 2**.
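The tuning step can be sketched as an exhaustive grid search over the XGB parameters named above (the names mirror XGBoost's `max_depth`, `eta`, `gamma`, `min_child_weight`, and `subsample`; the grid values and the scoring function are illustrative assumptions, not those used in the study):

```python
from itertools import product

# Hypothetical search space over the XGB parameters named in the text.
GRID = {
    "max_depth": [3, 5, 7],
    "eta": [0.05, 0.1, 0.3],
    "gamma": [0.0, 1.0],
    "min_child_weight": [1, 5],
    "subsample": [0.8, 1.0],
}

def grid_search(score, grid):
    """Score every parameter combination and return the best one.
    `score` maps a parameter dict to a validation metric where higher
    is better, e.g. Nash-Sutcliffe on a held-out subset."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score
```

In practice the `score` callback would train an XGB model with `params` and evaluate it on a validation subset; any smarter search (random or Bayesian) can reuse the same interface.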

The ELM results listed in **Table 3** show that using only the lagged downstream flow to predict the flow 1 month ahead yields a weak model, with a low NC of 0.504 and a high RMSE of 37.135 m<sup>3</sup>/s. Including the other variables increases the performance of the model. Considering only the test subset, the DSUSRE combination, which consists of the downstream flow, upstream flow, rainfall, and evapotranspiration, has the best performance with an NC of 0.67, although not the best RMSE (30.060 m<sup>3</sup>/s).
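The NC and RMSE figures reported here follow the standard definitions, which can be computed as (the `obs`/`pred` arguments are placeholders for the observed and predicted flows):

```python
from math import sqrt

def rmse(obs, pred):
    """Root-mean-square error, in the units of the data (m3/s here)."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nash_sutcliffe(obs, pred):
    """NC = 1 - SSE/SST: 1 is a perfect fit, while 0 means the model is
    no better than predicting the mean of the observations."""
    mean = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean) ** 2 for o in obs)
    return 1.0 - sse / sst
```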

**Table 4**, which contains the XGB performance for the different combinations, shows that the combination with only the downstream flow again has the lowest performance, with an NC of 0.335 and an RMSE of 26.851 m<sup>3</sup>/s. The best-performing


#### **Table 2.**

*The XGB optimized parameters.* 


**Table 3.**  *Results of ELM model.* 


 combination is DSUSRET, which contains all the input variables, with an NC of 0.671 and an RMSE of 18.879 m<sup>3</sup>/s. Hence, using the other variables in addition to the lagged downstream flow improves the model performance dramatically, especially when the upstream flow is added. This is expected, as the downstream flow is a consequence of the upstream flow, and the strength of their relation decreases with the distance between the measuring points.

 Considering both the training and testing subsets, the DSUSRET combination, which contains all the input variables, gives the best performance when used with XGB. Therefore, it is chosen for the comparison between the ELM and XGB models. **Figure 2** presents the Taylor diagram of this combination for both ELM and XGB over the entire dataset. According to the figure, XGB is closer than ELM to the observed data, having a higher correlation, a lower centered RMSE, and a closer standard deviation.
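The three statistics a Taylor diagram summarizes for each model (correlation, standard deviation, and centered RMS difference) can be computed as follows (a sketch; the input arrays are hypothetical):

```python
import numpy as np

def taylor_stats(obs, pred):
    """Return the statistics a Taylor diagram displays for one model:
    Pearson correlation, standard deviation of the predictions, and
    the centered RMS difference (means removed before squaring)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    r = float(np.corrcoef(obs, pred)[0, 1])
    sd = float(np.std(pred))
    centered = (pred - pred.mean()) - (obs - obs.mean())
    crmse = float(np.sqrt(np.mean(centered ** 2)))
    return r, sd, crmse
```

Because the centered RMS difference removes the means, a model with a constant bias but a perfect shape still plots at the observed reference point, which is why the diagram complements, rather than replaces, the RMSE in the tables.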

 **Figures 3** and **4**, which show the scatter plots of the observed versus predicted values of the DSUSRET combination for ELM and XGB, respectively, including both training and test subsets, indicate that XGB is more accurate than ELM, as the points are more scattered around the 1:1 line for ELM. The correlation calculated over the entire dataset is higher for XGB (0.92) than for ELM (0.85).

 Regarding the best-performing combinations, the DSUSRE combination, which contains all the variables except temperature, is identified as the best-performing


#### **Table 4.**

*Results of XGB model.* 

**Figure 2.**  *Taylor diagram of DSUSRET combination of ELM and XGB.* 

**Figure 3.**  *Observed vs. predicted of ELM (m<sup>3</sup>/s) for training and test subsets.* 

**Figure 4.**  *Observed vs. predicted of XGB (m<sup>3</sup>/s) for training and test subsets.* 

combination in the ELM model. This is because predicting the downstream flow requires information from all the used variables except temperature: the information carried by temperature is already contained in the evapotranspiration, since the evapotranspiration is calculated from temperature. In the XGB model, the situation is different, as the best-performing model is DSUSRET, which contains all the variables; this is due to the tree nature of the model, which selects the important inputs and separates them from those that contribute nothing to the prediction.

#### **5. Conclusion**

The performance of extreme learning machine (ELM) and extreme gradient boosting (XGB) in predicting the streamflow 1 month ahead is evaluated and compared in this study. The downstream flow 1 month ahead is used as the output, while lagged


downstream flow, upstream flow, rainfall, temperature, and potential evapotranspiration are used as inputs in different combinations. According to the results, it can be concluded that including variables besides the lagged downstream flow, such as the upstream flow, rainfall, potential evapotranspiration, and/or temperature, can improve the model performance dramatically. ELM is a very fast algorithm, as it requires little tuning, while XGB is slower because several hyperparameters must be optimized. XGB performed better than ELM in terms of prediction accuracy, even though it is a tree-based model. XGB also has a very useful characteristic, the importance matrix, which explicitly identifies the most and least important features in the model. In future research, this matrix can be used as an input selection tool, especially when there is a large number of candidate inputs.

#### **Acknowledgements**

 The authors wish to thank the Turkish Republic General Department of Water Affairs—Ministry of Forest and Water Affairs and Turkish Republic General Department of Meteorological Affairs in Turkey for providing the data necessary to complete this work.

### **Author details**

Sinan Jasim Hadi<sup>1</sup>\*, Arkan J. Hadi<sup>2</sup>, Kamaran S. Ismail<sup>3</sup>, Mohammad Ali Ghorbani<sup>4</sup> and Mustafa Tombul<sup>5</sup>

1 Department of Real Estate Development and Management, Ankara University, Ankara, Turkey

2 Department of Chemical Engineering, Soran University, Soran, Erbil, Iraq


5 Department of Civil Engineering, Eskişehir Technical University, Eskişehir, Turkey

\*Address all correspondence to: sinan.jasim@yahoo.com

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] Hadi SJ, Tombul M. Monthly streamflow forecasting using continuous wavelet and multi-gene genetic programming combination. Journal of Hydrology. 2018;**561**:674-687

 [2] Liu Z, Zhou P, Chen G, Guo L. Evaluating a coupled discrete wavelet transform and support vector regression for daily and monthly streamflow forecasting. Journal of Hydrology. 2014;**519**:2822-2831

[3] Makkeasorn A, Chang N-B, Zhou X. Short-term streamflow forecasting with global climate change implications–A comparative study between genetic programming and neural network models. Journal of Hydrology. 2008;**352**(3):336-354

 [4] Kisi O, Cimen M. A wavelet-support vector machine conjunction model for monthly streamflow forecasting. Journal of Hydrology. 2011;**399**(1):132-140

[5] Guven A. Linear genetic programming for time-series modelling of daily flow rate. Journal of Earth System Science. 2009;**118**(2):137-146

[6] Yaseen ZM, El-Shafie A, Afan HA, Hameed M, Mohtar WHMW, Hussain A. RBFNN versus FFNN for daily river flow forecasting at Johor River, Malaysia. Neural Computing and Applications. 2016;**27**(6):1533-1542

[7] Yaseen ZM, Jaafar O, Deo RC, Kisi O, Adamowski J, Quilty J, et al. Streamflow forecasting using extreme learning machines: A case study in a semi-arid region in Iraq. Journal of Hydrology. 2016;**542**:603-614

[8] Hadi SJ, Tombul M. Forecasting daily streamflow for basins with different physical characteristics through datadriven methods. Water Resources Management. 2018;**32**(10):3405-3422

 [9] Hadi SJ, Tombul M. Streamflow forecasting using four wavelet transformation combinations approaches with data-driven models: A comparative study. Water Resources Management. 2018;**32**(14):4661-4679

[10] Shiau J-T, Hsu H-T. Suitability of ANN-based daily streamflow extension models: A case study of Gaoping River basin, Taiwan. Water Resources Management. 2016;**30**(4):1499-1513

[11] Kişi Ö. Neural networks and wavelet conjunction model for intermittent streamflow forecasting. Journal of Hydrologic Engineering. 2009;**14**(8):773-782

[12] Abdollahi S, Raeisi J, Khalilianpour M, Ahmadi F, Kisi O. Daily mean streamflow prediction in perennial and non-perennial rivers using four data driven techniques. Water Resources Management. 2017;**31**(15):4855-4874

[13] Isik S, Kalin L, Schoonover JE, Srivastava P, Lockaby BG. Modeling effects of changing land use/cover on daily streamflow: An artificial neural network and curve number based hybrid approach. Journal of Hydrology. 2013;**485**:103-112

[14] Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: Theory and applications. Neurocomputing. 2006;**70**(1-3):489-501

[15] Deo RC, Şahin M. An extreme learning machine model for the simulation of monthly mean streamflow water level in eastern Queensland. Environmental Monitoring and Assessment. 2016;**188**(2):90

[16] Li B, Cheng C. Monthly discharge forecasting using wavelet neural networks with extreme learning machine. Science China Technological Sciences. 2014;**57**(12):2441-2452

 [17] Lima AR, Cannon AJ, Hsieh WW. Forecasting daily streamflow using online sequential extreme learning machines. Journal of Hydrology. 2016;**537**:431-443

[18] Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. California, USA: ACM; 2016

[19] Hadi SJ, Tombul M. Conversion of CruTS 3.23 data and evaluation of precipitation and temperature variables in a local scale. In: MATEC Web of Conferences. EDP Sciences. 2017;**120**:05007


**Chapter 59**

Effect of the Mold Size on the Compaction Parameters of the Soils

*Mustafa Özer and Ahmet Erdağ*

#### **Abstract**

In this study, the effect of the mold size on the compaction parameters of soils was investigated. For this purpose, a coarse-grained (finer than 19.00 mm) and a fine-grained (finer than 4.75 mm) soil were used in the tests. Compaction tests were conducted according to the current version of the ASTM D698 standard. In order to determine the effect of mold size, each soil used in the study was compacted with both 6 in and 4 in diameter molds (with volumes of 2124 and 943 cm<sup>3</sup>, respectively). According to the results obtained, slightly greater values of density/unit weight were obtained with the 4 in diameter mold (with a volume of 944 cm<sup>3</sup>). These results are compatible with the results mentioned in the literature.

**Keywords:** compaction, dry unit weight, mold size, optimum water content, proctor

#### **1. Introduction**

In many engineering structures such as highways, railways, airfields, and earth dams, loose soils need to be compacted to increase their dry density. The meaning of the verb "compact" in soil mechanics is to press the soil particles tightly together by expelling air from the void spaces [1]. Compaction improves the strength and settlement characteristics of soils, and it is actually a rather cheap and effective way to improve the properties of a given soil [1]. In order to obtain proper compaction in situ, laboratory compaction tests must be conducted beforehand: the maximum dry unit weight and the optimum water content under a given compaction effort are obtained from the laboratory test to guide the in situ compaction. According to the current ASTM D698 [2], two different mold sizes can be used in laboratory compaction tests. One of these molds has a volume of 2124 cm<sup>3</sup>, while the other has a volume of 943 cm<sup>3</sup>. The selection of the mold is made according to the coarseness of the material to be tested. In ASTM D698 [2], three methods, named Methods A, B, and C, are identified in terms of mold size and the coarseness of the material to be tested. In Method A, a 101.6 mm (4 in) diameter mold (with a volume of 943 cm<sup>3</sup>) shall be used; this method may be used if 25% or less by mass of the material is retained on the No. 4 (4.75 mm) sieve. In Method B, a 101.6 mm (4 in) diameter mold (with a volume of 943 cm<sup>3</sup>) shall be used; this method may be used if 25% or less by mass of the material is retained on the 3/8 in (9.5 mm) sieve. In Method C, the 152.4 mm (6 in) diameter mold (with a volume of 2124 cm<sup>3</sup>) is used.