## **5. Results and discussion**

106 Water Stress in Plants

**Figure 2.** Support vector machine structure.

In this method, the input vectors are considered as supports that form the backbone of the whole model structure through a training process. Given *N* samples of the population, $\{X_k, Y_k\}_{k=1}^{N}$ with $X \in \mathbb{R}^{m}$ and $Y \in \mathbb{R}$, a function or SVM estimator for regression can be written as:

$$f(X) = W^{T}\varphi(X) + b \tag{3}$$

where *X* is an input parameter with *m* components and *Y* is its response output variable, *W* is a weight vector, *b* represents a bias, and *φ* is a transfer function that exhibits nonlinear behavior, mapping the input vectors into a higher dimensional space. Because these mapped vectors can capture the complex nonlinear regression of the input space, Cortes and Vapnik introduced the convex optimization problem with an ε-insensitive loss function [40]:

$$\underset{W,\,b,\,\xi,\,\xi^{*}}{\text{minimize}}\quad \frac{1}{2}\lVert W\rVert^{2} + C\sum_{k=1}^{N}\left(\xi_k + \xi_k^{*}\right) \tag{4}$$

$$\text{subject to}\quad \begin{cases} Y_k - W^{T}\varphi(X_k) - b \le \varepsilon + \xi_k \\ W^{T}\varphi(X_k) + b - Y_k \le \varepsilon + \xi_k^{*} \\ \xi_k,\ \xi_k^{*} \ge 0 \end{cases} \qquad k = 1, 2, \ldots, N \tag{5}$$

where $\xi_k$ and $\xi_k^{*}$ are slack variables that penalize training errors by the loss function over the error tolerance $\varepsilon$, and $C$ is a positive trade-off parameter that determines the degree of the empirical error in the optimization problem. Following previous researchers [41, 42], the
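To make Eqs. (3)-(5) concrete, the Python sketch below evaluates the primal objective under the assumption of a linear feature map, φ(X) = X (the chapter's models use nonlinear kernels). For any fixed (W, b), the smallest slacks that satisfy Eq. (5) are ξ_k = max(0, Y_k − f(X_k) − ε) and ξ_k* = max(0, f(X_k) − Y_k − ε), so the constrained problem reduces to ½‖W‖² plus C times the ε-insensitive loss. The brute-force grid search over hypothetical one-dimensional toy data is for illustration only; SVM software normally solves the dual quadratic program instead.

```python
from typing import List, Sequence, Tuple


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Eq. (3) with an assumed identity feature map: f(X) = W^T X + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def slacks(w, b, X, Y, eps) -> List[Tuple[float, float]]:
    """Smallest (xi_k, xi_k*) that satisfy the constraints of Eq. (5) for fixed W, b."""
    out = []
    for x, y in zip(X, Y):
        r = y - predict(w, b, x)
        out.append((max(0.0, r - eps), max(0.0, -r - eps)))
    return out


def objective(w, b, X, Y, C, eps) -> float:
    """Eq. (4): 0.5 * ||W||^2 + C * sum(xi_k + xi_k*)."""
    reg = 0.5 * sum(wi * wi for wi in w)
    return reg + C * sum(lo + hi for lo, hi in slacks(w, b, X, Y, eps))


def grid_search_1d(X, Y, C, eps):
    """Toy minimizer over a 0.01-spaced grid of (w, b); illustration only."""
    best = None
    for i in range(0, 301):        # w in [0, 3]
        for j in range(0, 201):    # b in [0, 2]
            w, b = [i / 100], j / 100
            val = objective(w, b, X, Y, C, eps)
            if best is None or val < best[0]:
                best = (val, w[0], b)
    return best


# Hypothetical toy data on the line Y = 2X + 1 (not the CKD data set)
X = [[0.0], [0.25], [0.5], [0.75], [1.0]]
Y = [2 * x[0] + 1 for x in X]
val, w_opt, b_opt = grid_search_1d(X, Y, C=10.0, eps=0.1)
print(round(w_opt, 2), round(b_opt, 2), round(val, 4))  # 1.8 1.1 1.62
```

The fitted slope (1.8) is flatter than the true slope of 2: residuals inside the ±ε tube cost nothing, so the ½‖W‖² term pulls W toward zero until the tube just contains the data. Raising C or shrinking ε shifts this trade-off back toward fitting every point.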

The prime objective of using phase space reconstruction was to find a proper lag time for developing the models in this study. To gain a comprehensive understanding of model performance, GEP models were defined using all lag times up to the optimum value determined for water demand in the CKD. The AMI calculations for water demand in the CKD resulted in a lag time of 3 months: **Figure 3** shows that the first local minimum of the AMI occurs at 3 months, which identifies the optimum lag time for phase space reconstruction (AMI = 0.6591 at *τ* = 2 months and 0.5073 at *τ* = 3 months).

**Figure 3.** Average mutual information (AMI) for water demand.

**Figure 4.** Phase space diagrams for lag times of 1–3 months.
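The lag selection described above can be sketched in Python as follows. This is a minimal sketch, assuming an equal-width histogram estimator of AMI (in bits) and defining the first local minimum as the first lag whose AMI is below that of the preceding lag and not above that of the next. The chapter does not state which estimator or bin count was used, so `bins=8` and both function names are hypothetical choices.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def ami(series: Sequence[float], lag: int, bins: int = 8) -> float:
    """Histogram estimate (bits) of the mutual information between x_t and x_{t+lag}."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be in [1, len(series) - 1]")
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # guard against a constant series

    def bin_of(v: float) -> int:
        return min(int((v - lo) / width), bins - 1)

    pairs = [(bin_of(a), bin_of(b)) for a, b in zip(series[:-lag], series[lag:])]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    total = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        total += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return total


def first_local_minimum(ami_by_lag: Dict[int, float]) -> int:
    """Smallest lag whose AMI is below the previous lag's and not above the next lag's."""
    lags = sorted(ami_by_lag)
    for prev, cur, nxt in zip(lags, lags[1:], lags[2:]):
        if ami_by_lag[cur] < ami_by_lag[prev] and ami_by_lag[cur] <= ami_by_lag[nxt]:
            return cur
    raise ValueError("no local minimum in the supplied lag range")
```

For a monthly demand series `demand`, one would compute `{tau: ami(demand, tau) for tau in range(1, 13)}` and pass the result to `first_local_minimum`. The AMI values themselves, and therefore the selected lag, can shift with the estimator and bin count.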

**Figure 4a–c** shows the phase space diagrams of water demand for *τ* = 1, 2, and 3 months, respectively. Each diagram represents the state of WDS demand at a given time. The evolution of the phase space of this time series was obtained by reconstructing a pseudo phase space in which the demand of the CKD, a nonlinear system, was described through its self-interaction, with the lag selected using AMI [43]. **Figure 4c** (*τ* = 3) shows a more regular pattern than the other two phase space states (*τ* = 1 and 2; **Figure 4a** and **b**, respectively), indicating that a lag time of 3 months is optimum.
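A two-dimensional phase space diagram such as those in Figure 4 plots each value against the value *τ* steps later. The sketch below assumes standard time-delay embedding as the reconstruction method; `delay_embed` is a hypothetical helper, and the embedding dimension of 2 simply mirrors the 2-D plots.

```python
from typing import List, Sequence, Tuple


def delay_embed(series: Sequence[float], dim: int, lag: int) -> List[Tuple[float, ...]]:
    """Delay vectors (x_t, x_{t+lag}, ..., x_{t+(dim-1)*lag}) for every valid t."""
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive")
    span = (dim - 1) * lag
    return [tuple(series[t + i * lag] for i in range(dim))
            for t in range(len(series) - span)]


# A 2-D diagram like Figure 4c plots x_t against x_{t+3}.
points = delay_embed([1, 2, 3, 4, 5, 6], dim=2, lag=3)
print(points)  # [(1, 4), (2, 5), (3, 6)]
```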

Prior to analysis with GEP models, a correlation table between the explanatory variables and water demand provided a better understanding of how to define the input factors (**Table 3**). The correlations were 0.92, 0.84, −0.83, 0.11, and −0.01 for *D* vs. *T*, *D* vs. HOR, *D* vs. RH, *D* vs. *P*, and *D* vs. *R*, respectively. Water demand was thus highly correlated with temperature and hotel occupancy rate in the CKD, reflecting the periodic cycle of demand driven by seasonal changes. Nevertheless, this research employed all input factors in evolving the GEP models.
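Assuming the entries of Table 3 are Pearson product-moment coefficients (the chapter does not name the coefficient), a table of this kind could be produced from monthly series as sketched below. The series names follow Table 3, but the values are hypothetical placeholders, not CKD observations.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(data: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Symmetric matrix of pairwise correlations, rounded to two decimals as in Table 3."""
    names = list(data)
    return {r: {c: round(pearson(data[r], data[c]), 2) for c in names} for r in names}


# Hypothetical placeholder series
table = correlation_table({"D": [1.0, 2.0, 3.0], "T": [2.0, 4.0, 6.0], "RH": [3.0, 2.0, 1.0]})
print(table["D"])  # {'D': 1.0, 'T': 1.0, 'RH': -1.0}
```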



**Table 4** shows all 27 GEP models developed in the present study (three classifications of determinants × three lag times × three sets of genetic operators). The superior model in each classification is highlighted. A lag time of 3 months outperformed the other lags in all classifications, which shows the importance of phase space reconstruction in studying complex systems: an appropriate lag time determined by AMI can significantly improve the performance of the forecasting model. Different sets of genetic operators were also used to understand which mathematical operations better define the nature of these determinants. The first operator set (OP1) {+, −, ×} performed better in the first two classifications, i.e., the demand-based and the demand plus climatic information-based categories. The second operator set (OP2) {+, −, ×, *x*<sup>2</sup>, *x*<sup>3</sup>} outperformed the other operator sets in the third classification (demand + socioeconomic + climatic information), in which socioeconomic factors were included. Interestingly, using more complex mathematical operations, as in OP3 {+, −, ×, *x*<sup>2</sup>, *x*<sup>3</sup>, √, *e*<sup>*x*</sup>, log, ln}, consistently reduced the quality of the models' performance. This shows that, despite its complexity, water demand can be reasonably forecast by models using basic mathematical operations. Although MAE and RMSE were used to investigate the sensitivity of the models to determinant classification, genetic operator, and lag time, these indices did little to distinguish among the best-performing models in each category (*M*1*D*3OP1, *M*2*D*3OP1, and *M*3*D*3OP2): MAE = 0.304, 0.3035, and 0.291 and RMSE = 0.3984, 0.3664, and 0.3660, respectively. While *R*<sup>2</sup> values showed the M2 and M3 models to slightly outperform the M1 models, plots of observed and predicted demand over time, together with scatter plots of observed vs. predicted demand, served to further delineate differences in performance (**Figure 5**).
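The indices used to rank the GEP models can be written out as below. This is a sketch: *R*<sup>2</sup> is assumed here to be the squared Pearson correlation between observed and predicted demand, because the chapter reports *R*<sup>2</sup> and the Nash-Sutcliffe *E* as separate indices without giving formulas.

```python
import math
from typing import Sequence


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Squared Pearson correlation of observed vs. predicted (assumed definition of R^2)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


obs, pred = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
print(mae(obs, pred), rmse(obs, pred))  # 0.25 0.5
```

Because RMSE is the quadratic mean of the absolute errors, RMSE ≥ MAE holds for any data, which gives a quick consistency check when reading tables such as Table 4.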
|         | *D*   | *T*   | *R*   | *RH*  | *P*   | *HOR* |
|---------|-------|-------|-------|-------|-------|-------|
| *D*     | 1.00  | 0.92  | −0.01 | −0.83 | 0.11  | 0.84  |
| *T*     | 0.92  | 1.00  | 0.10  | −0.89 | 0.00  | 0.92  |
| *R*     | −0.01 | 0.10  | 1.00  | −0.05 | −0.26 | 0.11  |
| *RH*    | −0.83 | −0.89 | −0.05 | 1.00  | 0.02  | −0.84 |
| *P*     | 0.11  | 0.00  | −0.26 | 0.02  | 1.00  | −0.09 |
| *HOR*   | 0.84  | 0.92  | 0.11  | −0.85 | −0.09 | 1.00  |

*D*, demand; *P*, population; HOR, hotel occupancy rate; *T*, temperature; RH, relative humidity; *R*, rainfall.

**Table 3.** Correlation between water demand and factors impacting demand.

**Figure 5.** Observed and predicted demand over time (left), and scatter plots of observed vs. predicted demand (right) using superior GEP models: (a) *M*1*D*3OP1; (b) *M*2*D*3OP1; (c) *M*3*D*3OP2.

Comparing the cumulative water demand calculated by each of the three top models with observed values showed the *M*1*D*3OP1 and *M*3*D*3OP2 models to be more accurate than *M*2*D*3OP1 (**Figure 6**). To distinguish between *M*1*D*3OP1 and *M*3*D*3OP2, the cumulative (observed − predicted) demand was plotted (**Figure 7**). This plot showed model *M*3*D*3OP2 to be the best, given its smaller error fluctuations and consistent pattern throughout the period. This better performance may be attributable to combining socioeconomic factors with demand and climatic data, which may have produced a more consistently accurate model with lower error than the other two models.
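Figures 6 and 7 rest on running sums. A minimal sketch of both curves follows; since the chapter judges error fluctuation visually, `residual_range` is only one hypothetical way to put a number on it, and all values shown are placeholders.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative(values: Sequence[float]) -> List[float]:
    """Running total, e.g. cumulative observed or predicted demand (Figure 6)."""
    return list(accumulate(values))


def cumulative_residual(obs: Sequence[float], pred: Sequence[float]) -> List[float]:
    """Running sum of (observed - predicted), the quantity plotted in Figure 7."""
    return list(accumulate(o - p for o, p in zip(obs, pred)))


def residual_range(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Max minus min of the cumulative residual curve (hypothetical fluctuation score)."""
    curve = cumulative_residual(obs, pred)
    return max(curve) - min(curve)


obs = [3.0, 3.0, 3.0]
print(cumulative_residual(obs, [2.0, 4.0, 3.0]))  # [1.0, 0.0, 0.0]
```

A flat cumulative residual curve near zero means over- and under-predictions cancel over time; a steadily drifting curve signals systematic bias even when MAE is small.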

**Figure 6.** Cumulative demand with time.



| Model ID\* | Training MAE | Training RMSE | Training *R*<sup>2</sup> | Testing MAE | Testing RMSE | Testing *R*<sup>2</sup> |
|---|---|---|---|---|---|---|
| *M*1*D*1OP1 | 0.4687 | 0.6974 | 0.6284 | 0.4833 | 0.6067 | 0.6343 |
| *M*1*D*1OP2 | 0.4718 | 0.6100 | 0.6252 | 0.4849 | 0.6120 | 0.6300 |
| *M*1*D*1OP3 | 0.4672 | 0.6118 | 0.6235 | 0.4800 | 0.6112 | 0.6281 |
| *M*1*D*2OP1 | 0.3552 | 0.4721 | 0.7754 | 0.378 | 0.4607 | 0.7892 |
| *M*1*D*2OP2 | 0.3574 | 0.4721 | 0.7756 | 0.3794 | 0.4608 | 0.7892 |
| *M*1*D*2OP3 | 0.3008 | 0.4049 | 0.8481 | 0.4188 | 0.5188 | 0.8346 |
| ***M*1*D*3OP1** | **0.3229** | **0.4317** | **0.8156** | **0.3040** | **0.3984** | **0.8452** |
| *M*1*D*3OP2 | 0.2858 | 0.3641 | 0.8691 | 0.3488 | 0.3106 | 0.8452 |
| *M*1*D*3OP3 | 0.3545 | 0.4647 | 0.7849 | 0.3637 | 0.4548 | 0.8029 |
| *M*2*D*1OP1 | 0.3777 | 0.4790 | 0.7735 | 0.4529 | 0.5296 | 0.7552 |
| *M*2*D*1OP2 | 0.3955 | 0.4933 | 0.7560 | 0.4423 | 0.5169 | 0.7546 |
| *M*2*D*1OP3 | 0.3914 | 0.4893 | 0.7903 | 0.4596 | 0.5488 | 0.7643 |

\**M*1, Demand; *M*2, Demand + Climatic; *M*3, Demand + Climatic + Socioeconomic; *D*1, *τ* (lag) = 1 month; *D*2, *τ* = 2 months; *D*3, *τ* = 3 months; OP1, {+, −, ×}; OP2, {+, −, ×, *x*<sup>2</sup>, *x*<sup>3</sup>}; OP3, {+, −, ×, *x*<sup>2</sup>, *x*<sup>3</sup>, √, *e*<sup>*x*</sup>, log, ln}; *R*<sup>2</sup>, coefficient of determination; MAE, mean absolute error; RMSE, root mean square error.

**Table 4.** Performance of GEP models.

**Figure 7.** Cumulative (target-model) demand with time.

The superior GEP models from each classification were compared with SVM models implementing three different kernel functions: radial basis function (RBF), polynomial (Poly), and linear (Lin). The training and testing performance indices of the SVM models developed with each of the three kernel functions showed the *Poly* kernel to outperform the RBF and Lin kernels (**Table 5**). The poor performance of the Lin kernel indicates that the nonlinear nature of the input parameters could not be captured by such a function. The *M*2*D*3Poly model was selected as the superior SVM model for comparison with the GEP models (**Figure 8**).
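The three kernels compared in Table 5 are commonly written in the forms below, each standing in for the inner product φ(X<sub>i</sub>)<sup>T</sup>φ(X<sub>j</sub>) implied by Eq. (3). The parameterization (`gamma`, `coef0`, `degree`) follows a widespread convention and is an assumption; the chapter does not report the kernel parameters used.

```python
import math
from typing import Sequence


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin: K(x, y) = x . y (no mapping beyond the input space)."""
    return dot(x, y)


def poly_kernel(x, y, gamma: float = 1.0, coef0: float = 1.0, degree: int = 3) -> float:
    """Poly: K(x, y) = (gamma * x . y + coef0) ** degree."""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma: float = 0.5) -> float:
    """RBF: K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


print(linear_kernel([1, 2], [3, 4]), poly_kernel([1, 2], [3, 4], degree=2))  # 11 144.0
```

A linear kernel restricts f(X) to a hyperplane in the original input space, which is consistent with the weak Lin results for a nonlinear demand response; the polynomial and RBF kernels allow curved decision functions.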

**Figure 8.** The best SVM model.


\**M*1, Demand; *M*2, Demand + Climatic; *M*3, Demand + Climatic + Socioeconomic; *D*1, *τ* (lag) = 1 month; *D*2, *τ* = 2 months; *D*3, *τ* = 3 months; RBF, radial basis function kernel; Poly, polynomial kernel; Lin, linear kernel; *R*<sup>2</sup>, coefficient of determination; RMSE, root mean square error; *E*, Nash-Sutcliffe coefficient.

**Table 5.** Performance of SVM models.
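Table 5 also reports the Nash-Sutcliffe coefficient *E*. Its standard definition is sketched below: *E* = 1 is a perfect fit, and *E* ≤ 0 means the model predicts no better than the observed mean. The function name is illustrative.

```python
from typing import Sequence


def nash_sutcliffe(obs: Sequence[float], pred: Sequence[float]) -> float:
    """E = 1 - sum((o - p)^2) / sum((o - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


obs = [1.0, 2.0, 3.0, 4.0]
print(nash_sutcliffe(obs, [1.0, 2.0, 3.0, 5.0]))  # 0.8
```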

## **6. Conclusion**


In an attempt to improve model prediction accuracy, researchers have proposed a wide range of modeling techniques in the water demand forecasting field in recent years. The present research explored a new approach to modeling water demand, namely gene expression programming (GEP) combined with phase space reconstruction. In this method, input factors are not randomly chosen, as in previous studies; instead, appropriate lag times determined by the AMI method defined the structure of the explanatory variables employed in the models. The results of this research showed GEP models to be highly sensitive to the classification of input factors, the lag time, and the selection of genetic operators. In general, soft computing techniques such as GEP deserve more attention in forecasting the behavior of complex systems such as WDS. These models can offer valuable information to WDS operators and designers for selecting the optimum determinants in their forecast models. The three best GEP models proposed in this research were compared using different performance indices; however, differentiating between them was difficult because their statistical index values were similar. One of the three GEP models was selected because of its lower cumulative error in predicting demand and smaller fluctuations compared with the other two GEP models. However, these models were slightly outperformed by an SVM model, which showed even better performance indices. This shows that both GEP and SVM can be useful techniques in water demand forecasting and can account for the nonlinearity of the input parameters.
