**3. Results and discussion**

In this section, the computation procedures, which are necessary for obtaining the ensemble model, are described. The ensemble model is proposed to have the following structure:

$$P\_{ensemble}^{j} = \sum\_{i=1}^{n} \beta\_i P\_i^{j} \tag{4}$$

where *β<sub>i</sub>* are the weights of the individual learners and *P<sub>i</sub><sup>j</sup>* is a vector of the flows predicted by model *i* for day *j*. The harmony search method was used to determine the corresponding weights of the individual models. The application of this method to 2-day ahead prediction of flows is described in the subsequent paragraphs.
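
For illustration, the combination in Eq. (4) can be sketched as follows; the member predictions and weights below are hypothetical placeholders, not the study's models or data:

```python
def ensemble_predict(weights, member_predictions):
    """Eq. (4): weighted sum of member forecasts, day by day."""
    n_days = len(member_predictions[0])
    return [sum(b * preds[day] for b, preds in zip(weights, member_predictions))
            for day in range(n_days)]

# Three hypothetical members forecasting flow for four days:
preds = [
    [10.0, 12.0, 11.0, 9.0],
    [11.0, 13.0, 10.0, 8.0],
    [9.0, 12.5, 11.5, 9.5],
]
weights = [0.5, 0.3, 0.2]  # weights sum to alpha = 1
combined = ensemble_predict(weights, preds)  # one combined forecast per day
```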

One harmony consists of *n* members, where *n* is the number of models; in the case of this work, there are nine models in the ensemble. All values of the weights *β<sub>i</sub>* are restricted to the interval ⟨0, 1⟩.

The problem solved should be defined by the objective function, which is proposed in this paper to have the following form:

$$O\_f = 1 - \left(1 - \frac{\sum\_{i=1}^{N} (O\_i - P\_i)^2}{\sum\_{i=1}^{N} (O\_i - \overline{O})^2} \right) + \left| \alpha - \sum\_{i=1}^{n} \beta\_i \right| \tag{5}$$

$$0 \le \beta\_i \le 1,\tag{6}$$

where *P<sub>i</sub>* and *O<sub>i</sub>* are the computed and observed flows, *N* is the number of days and *O̅* is the average value of the observed flows. The expression in the round parentheses is the Nash-Sutcliffe model efficiency coefficient (NSE). It was used in this study to evaluate model efficiency because it is the measure most often used to assess the predictive power of hydrological models. The NSE ranges from −∞ up to 1, where NSE = 1 means perfect agreement between the observed and simulated data; i.e., the closer the model efficiency is to 1, the more accurate the model. The last component of the objective function (the absolute value) forces the sum of the ensemble members' weights *β<sub>i</sub>* to equal *α*, which is a regularization constant, equal to 1 by default. Only rarely, in cases when the models systematically underestimate or overestimate, could the regularization constant take a slightly different value (at most ±0.05). In this work, the authors used only the default value of 1, because relatively good predictions could be expected from the state-of-the-art models used as ensemble members. This objective function is to be minimized; in the case of an ideal model, its value is zero.
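
A minimal sketch of Eqs. (5)-(6), assuming the default *α* = 1 (illustrative only, not the authors' code):

```python
def objective(observed, predicted, betas, alpha=1.0):
    """Eq. (5): O_f = 1 - NSE + |alpha - sum(betas)|; minimized, 0 is ideal."""
    o_bar = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - o_bar) ** 2 for o in observed)
    nse = 1.0 - sse / sst  # Nash-Sutcliffe efficiency
    return 1.0 - nse + abs(alpha - sum(betas))

# A perfect prediction with weights summing to alpha gives O_f = 0:
obs = [3.0, 5.0, 4.0, 6.0]
ideal = objective(obs, obs, [0.5, 0.5])
```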

The harmony search algorithm parameters were set as follows: *HMS* (the harmony memory size) was set to 10; *HMCR* (the harmony memory consideration rate) was set to 0.91; and *PAR*, i.e., the pitch adjustment rate, was set to 0.1. The maximum number of improvisations was *NI* = 500,000.

Step 4. Computation of the new solution's objective function. If the new harmony has a better objective function value than any harmony in the harmony memory, the worst harmony vector in the memory is replaced by the new harmony vector. Step 5. Repeat Steps 3 to 5 until the termination criterion is satisfied. In this work, the harmony search stops if there is no improvement in the objective function during the last 500 iterations or if the total (predefined) number of iterations is reached.
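
The improvisation loop of Steps 3-5 can be sketched as follows. This is a simplified illustration, not the authors' implementation; the pitch-adjustment bandwidth and all function names are assumptions:

```python
import random

def harmony_search(objective, n, hms=10, hmcr=0.91, par=0.1,
                   ni=500_000, stall=500, bandwidth=0.05, seed=0):
    """Minimize `objective` over weight vectors in [0, 1]^n (simplified sketch)."""
    rng = random.Random(seed)
    # Initialization: fill the harmony memory with HMS random harmonies.
    memory = [[rng.random() for _ in range(n)] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    best, no_improve, it = min(scores), 0, 0
    while it < ni and no_improve < stall:
        # Step 3: improvise a new harmony component by component.
        new = []
        for j in range(n):
            if rng.random() < hmcr:          # take a value from memory (HMCR)
                x = rng.choice(memory)[j]
                if rng.random() < par:       # pitch adjustment (PAR)
                    x = min(1.0, max(0.0, x + rng.uniform(-bandwidth, bandwidth)))
            else:                            # otherwise improvise a fresh value
                x = rng.random()
            new.append(x)
        # Step 4: replace the worst harmony if the new one is better.
        s = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if s < scores[worst]:
            memory[worst], scores[worst] = new, s
        # Step 5: stop after `stall` improvisations without improvement.
        if s < best:
            best, no_improve = s, 0
        else:
            no_improve += 1
        it += 1
    i = min(range(hms), key=scores.__getitem__)
    return memory[i], scores[i]

# Toy run: drive the weight sum towards 1.
found_weights, found_score = harmony_search(lambda b: abs(1.0 - sum(b)), n=3)
```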

162 Time Series Analysis and Applications

One of the main issues that must be carefully considered is how exactly to obtain the data *P<sub>i</sub>*, which will serve as inputs to the harmony search optimization objective function (5). As was previously stated, these are basically the computed values of the flows predicted by each model. While performing these computations, we are in the model-building phase, which is why only training data can be used. Two possibilities for obtaining such data are evaluated in this study. The first possibility is achieved using the following steps:


The problem with obtaining the data *P<sub>R,C</sub>* by this methodology, if it is used for calculating the ensemble weights, is that this approach has no mechanism that avoids overfitting of the final ensemble. Overfitting, or a lack of generalization, means that the model weights obtained could work well on the training data but poorly on the testing set. Due to this problem, the authors also propose a second option, which will be compared to the first:


Due to the cross-validation, the number of rows of this input matrix *P<sub>R,C</sub>* is *R* = *N*·*r* and the number of columns is *C* = *n* + 1 (one column is the observed data). In this work, *n* = 9, *k* = 10, *N* = 535 (the data were reduced by the sampling!) and *r* = 5.
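
This second option can be sketched as repeated k-fold cross-validation that stacks out-of-fold predictions into *P<sub>R,C</sub>*. The `fit`/`predict` interface and the `MeanModel` placeholder are hypothetical, used only to make the sketch runnable:

```python
import random

class MeanModel:
    """Hypothetical placeholder learner: always predicts the training mean."""
    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self
    def predict(self, x):
        return self.mean

def build_prediction_matrix(models, X, y, k=10, r=5, seed=0):
    """Stack out-of-fold predictions from r repetitions of k-fold CV.

    Returns R = N*r rows and C = n + 1 columns (the last column holds
    the observed value), matching the dimensions quoted in the text.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(r):                           # r repetitions ...
        order = list(range(len(y)))
        rng.shuffle(order)
        folds = [order[f::k] for f in range(k)]  # ... of a k-fold split
        for test_idx in folds:
            held_out = set(test_idx)
            train = [i for i in order if i not in held_out]
            fitted = [m.fit([X[i] for i in train], [y[i] for i in train])
                      for m in models]
            for i in test_idx:                   # out-of-fold predictions only
                rows.append([m.predict(X[i]) for m in fitted] + [y[i]])
    return rows

# Toy data: N = 20 samples, n = 2 hypothetical members, k = 5, r = 2.
X = list(range(20))
y = [float(v) for v in range(20)]
matrix = build_prediction_matrix([MeanModel(), MeanModel()], X, y, k=5, r=2)
```

Because every prediction in the matrix is made on data the member did not see during fitting, the weights optimized on it are less prone to the overfitting described above.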

The ensemble models obtained from these two approaches are hereinafter identified as EHS1 for the first case and EHS2 for the second.

The process of assessing the performance of a hydrologic model involves making some estimates of the "closeness" of the simulated behavior of the model to observations (in our case, the streamflow). The most basic approach for assessing a model's performance is through a visual inspection of the simulated and observed hydrographs (**Figure 4**). An objective assessment requires the use of a mathematical estimate of the error between the simulated and observed hydrological variables. The predictive accuracy of the ensemble and its members was evaluated using the Nash-Sutcliffe coefficient of efficiency (NSE), the root mean square error (RMSE) and the correlation coefficient (*r*).
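
The three criteria can be written compactly; the following sketch implements their textbook definitions (it is not code from the study):

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 means perfect agreement."""
    o_bar = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - sse / sst

def rmse(obs, sim):
    """Root mean square error of the simulated series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated flows."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    return cov / math.sqrt(sum((o - mo) ** 2 for o in obs)
                           * sum((s - ms) ** 2 for s in sim))
```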

In **Table 2**, the root mean square error, the correlation coefficient and the Nash-Sutcliffe efficiency are evaluated for the ensemble members and the proposed ensembles. The models can be identified from their abbreviations in the table heading. The two ensemble optimization approaches described above, identified as EHS1 and EHS2, are evaluated in **Table 2**.

The selection of the appropriate settings for the ensemble members evaluated in **Table 2** is described in Section 2.2. A grid search was mostly used for the tuning; in some cases, the settings recommended in the scientific literature were applied. Regarding ensembles EHS1 and EHS2, it can be clearly seen that the hypothesis about the poor performance of the first proposition for obtaining matrix *P<sub>R,C</sub>* was confirmed. Ensemble model EHS1 performed well on the training data (with an NSE of 0.82, whereas the best ensemble component, the GBM model, achieved an NSE of 0.79), but on the testing set, which is evaluated in **Table 2**, ensemble EHS1 gives worse results than most of the ensemble members. The ensemble approach to modeling is worth applying only when the ensemble performs better than any of its members. The weight of the multilayer perceptron in ensemble EHS1 is presumably inappropriately high (MLPs are generally less precise models), which means that this model is overfitted and that the poor generalization is a consequence of the approach used for the development of the EHS1 model. In contrast, according to **Table 2**, in which the testing data are evaluated, results with good generalization were achieved by ensemble EHS2. From now on, we will only speak about this second model.


| | GBM | B\_GLM | RF | MLP | MARS | MLR | SVM | B\_GAM | GLMNET | EHS1 | EHS2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| NSE | 0.806 | 0.783 | 0.808 | 0.676 | 0.593 | 0.376 | 0.800 | 0.787 | 0.782 | 0.759 | 0.825 |
| *r* | 0.898 | 0.885 | 0.900 | 0.832 | 0.802 | 0.724 | 0.896 | 0.888 | 0.884 | 0.874 | 0.909 |
| RMSE | 13.575 | 14.371 | 13.519 | 17.548 | 19.661 | 24.355 | 13.788 | 14.219 | 14.410 | 9.684 | 8.247 |
| Weights EHS1 | 0.128 | 0.011 | 0.190 | 0.549 | 0.021 | 0.022 | 0.032 | 0.003 | 0.045 | | |
| Weights EHS2 | 0.134 | 0.056 | 0.379 | 0.034 | 0.083 | 0.021 | 0.218 | 0.029 | 0.046 | | |

**Table 2.** Evaluation of the computations by *r* and NSE and the final values of the model weights in the ensembles.

The evaluation of the ensemble members in **Table 2** could also be seen as a case study of the evaluation of these models: they can be ranked and compared with each other. As could be expected for a process as complicated as the flow formation in a river, this process was described more successfully by the nonlinear models, especially by the recently developed boosting types of algorithms. However, when the weights of the models for the EHS2 ensemble in **Table 2** are considered, it can be seen that this ranking does not imply that the weights are ordered by precision in the same way. An efficient ensemble should consist of predictors that are not only sufficiently precise but also diverse, i.e., ones that, when they make wrong predictions, make them in different parts of the input space, and which are therefore not highly correlated. The correlation of the models is evaluated in **Figure 3**.
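
The pairwise correlations underlying **Figure 3** can be computed as follows; the three member series are hypothetical placeholders, not the study's simulations:

```python
import math

def correlation_matrix(series):
    """Pairwise Pearson correlations between model prediction series."""
    def pearson(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                               * sum((y - mb) ** 2 for y in b))
    return [[pearson(a, b) for b in series] for a in series]

# Two nearly identical hypothetical members and one diverse one:
m1 = [10.0, 12.0, 11.0, 9.0, 13.0]
m2 = [10.1, 12.1, 10.9, 9.2, 12.8]
m3 = [11.0, 9.0, 13.0, 10.0, 8.0]
cm = correlation_matrix([m1, m2, m3])
```

In a weight optimization such as Eq. (5), highly correlated members (here `m1` and `m2`) add little beyond one another, which is consistent with the small weights assigned to redundant models.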

From the conjoint consideration of **Table 2** (the weights of the models for EHS2) and **Figure 3**, it can be seen that, after optimization of the weights, the three best models, the GBM, RF and SVM, are included in the proposed ensemble with the highest contribution (their weights are the highest). However, the next best model, the boosted GAM (B\_GAM), is included in the ensemble with a relatively small weight, because this model is highly correlated with the three best models mentioned and also with the GLMNET model. A similar effect can be observed with some other members of the ensemble. This suggests that the optimization procedure proposed in this paper searches for the best weights not only from the point of view of the best performance of the models but also considers the diversity of the models, which is, as was mentioned, no less important. The authors assume that this is mainly due to the procedure by which matrix *P<sub>R,C</sub>* was obtained for model EHS2. As could be expected, the smallest contribution to the EHS2 ensemble comes from its least precise member: the multiple linear regression (MLR).

**Figure 3.** Correlation between the simulated results obtained by the ensemble members and with the observed data.

**Figure 4.** Time series of the testing dataset of the observed flows and the same flows simulated by the proposed ensemble model in the year 1997.

**Figure 4** shows a time series graph of the testing dataset with the observed flows and the flows simulated by the proposed ensemble model. As can be seen, the predicted flows follow the real values with a high degree of precision, and the proposed ensemble approach could be used as an innovative alternative for flow predictions.
