3. Methodology

#### 3.1. Case study and data information

#### 3.1.1. Understandings

Unlike natural water resources like rainfall, the lower percentage of drinking water which is change to waste water after use, back to the cycle. Water pressure in a pipeline, water quality, supply peak consumption time, pipeline maintenance, maintenance cost, specialist and educated human resources, pipeline failure management, etc. are the variables that all of them should be under control at the same time. Also, to develop an integrated long-term plan, availability of resources is crucial. Therefore, knowing about the value of consumption in a specific period is the first step for any management plan beyond urban drinking water supply and allocation. This chapter investigates the first step of every long-term plan development in urban drinking water as discussed below. Water utility management needs drinking water long-term forecasted values in several terms. (1) water distribution network design; (2) supply and consumption management; (3) efficient application of distribution network; (4) pipeline pressure management; (5) network development; (6) optimizing the cost of water supply and network maintenance.

#### 3.1.2. Study area

better at extracting a mathematical equation which shows the relation between input and output variables [59–61]. Nasseri et al. developed a hybrid model combining the extended Kalman filter with genetic programming for monthly water demand forecasting in Tehran [62]. Shabani et al. proposed a new rationale and a novel technique in forecasting water demand using lag time to feed the determinants of water demand by the development of GEP and SVM models [14]. Yousefi et al. implemented sophisticated mathematical models to forecast water demand of City of Kelowna in monthly temporal scale. Their study assessed the performance

Among the variety of examined methods Artificial Neural Networks (ANNs), have been applied to the various period in the wide variety of hydrological issues. The main reason of ANNs frequent usage is its ability to overcome the relationship in determining the complexity of time series, even with the shortage of amount of data available to train the models. Therefore, most of the studies applicable in area of water resources demand applies ANNs to

Regarding the literature review reported by Nourani et al. concluded about the dominant application of wavelet-based models [63]. Moreover, Labat notified about the improving ability of wavelet in models' performance [64]. Therefore, the application of wavelet brought researchers attention into the area such as denoising [65]; stream flow and water resources [66]; evaporation and climatic models [67]; groundwater level modeling [68]; water demand forecasting [13, 38], where in most of the mentioned studies combination of Wavelet-ANNs performed accurately over conventional models without hybrid wavelet models (e.g. ARIMA,

The objectives of this study are four-fold: (1) to investigate chaotic behavior of case data and finding the proper lag time; (2) to find the accuracy of the forecasting for one-day ahead lead time with various input combination, and (3) to study if phase space reconstruction (PSR) based on optimum embedding dimension would improve the accuracy of the models, and 4) application of wavelet decomposition by five different transform functions combined with all

Unlike natural water resources like rainfall, the lower percentage of drinking water which is change to waste water after use, back to the cycle. Water pressure in a pipeline, water quality, supply peak consumption time, pipeline maintenance, maintenance cost, specialist and educated human resources, pipeline failure management, etc. are the variables that all of them should be under control at the same time. Also, to develop an integrated long-term plan, availability of resources is crucial. Therefore, knowing about the value of consumption in a

of GEP using wavelet decomposition [38].

134 Wavelet Theory and Its Applications

the mentioned models with and without PSR.

3.1. Case study and data information

MLR, ANN and etc.).

3. Methodology

3.1.1. Understandings

forecast short, mid and long-term demand values [13, 30, 31].

The present research selected Water consumption of the City of Kelowna (BC, Canada) as the test case. The city of Kelowna water utility provides services for approximately 65,000 residents. Poplar Point, Eldorado, Cedar Creek and Swick Road pump stations cover services for 99% of the population of the area [69]. However, few areas in the boundary are named as "Future City" where does not contain any population yet, land development plan shows water servicing is considered in the area. Monitoring of water quality, the operation of the pumps, water level in reservoirs, and pipeline pressure are conducted by the use of Supervisory Control and Data Acquisition Software (SCADA).

## 3.1.3. Review of data records

Hourly water demand for the above-mentioned stations has been made available by the city utility of Kelowna. The data used 6 years (approximately 52,464 hourly consumption) starting from January 1st, 2011 to 30th December 2016. Figure 1 shows the variation of daily and monthly water demand and the consumption pattern. Concerning the 6 years water demand samples of daily scale (2186 points), the first 5 years (1882 points) are used for calibrating the models and the last year (365 points – 2016) is considered as the test period. Table 1 shows the characteristics of the dataset in the test case.

#### 3.2. Phase space reconstruction (PSR)

Given a set of physical variables and their interactions, the dynamics of a system (e.g., water consumption) can be defined by a single point moving on a trajectory, where each of its points

Figure 1. Time series plot of (a) daily water demand; (b) average of the consumption pattern in 24 h within 6 years.


Table 1. Statistics of water consumption of Kelowna City in different temporal resolutions (\*m<sup>3</sup> ).

represents a state of the system. The lag-embedding method reconstructs phase-space from a univariate or multivariate time series generated by a deterministic dynamic system [70]. The underlying dynamics can be studied by building an m-dimensional space Xt defined by [49]:

$$X\_t = \{ \mathbf{x}\_t, \mathbf{x}\_{t-\tau}, \mathbf{x}\_{t-2\tau}, \dots, \mathbf{x}\_{t-(m-1)\tau} \}, t = 1, 2, \dots, N \tag{1}$$

ce ¼ lim r!0

specific value of m after which Ce remains unchanged [54, 59].

investigated the number of HLN from 1 to 20 in 1 to 200 epochs.

3.4. Artificial neural networks

determine the output signal as (uj).

3.5. Gene expression programming

ln Cmð Þr

The parameters m and Ce can be determined as the slopes of the lines when plotted Cm(r) against r in logarithmic scale. In a deterministic system, Ce increases by increasing m until eventually remaining unchanged. The correlation dimension of time series is defined as the

Application of Wavelet Decomposition and Phase Space Reconstruction in Urban Water Consumption Forecasting:…

When ANN is based roughly on the neural layout of the human brain and is capable of nonlinear modeling processes that can classify the patterns and recognize the capabilities [77]. Regarding the ability of multilayer perceptron (MLP)-ANN outperformance as a conventional ANN approaches [77, 78, 79], this research employed three-layer MLP-ANN (input, hidden and output layers) and the different number of neurons. In hidden layer, neurons are calculated by the summation of demand values (di) with the given weight for each value (wij) to

> uj <sup>¼</sup> <sup>X</sup><sup>t</sup> i¼1

Oj ¼ ϕ uj � θ<sup>j</sup>

where ϕ is the transfer function and θ is a threshold limit [80, 81, 82]. Among various transfer functions (e.g., sigmoid shape, piecewise, step, linear and non-linear functions), the logistic sigmoid and Purelin (linear) transfer functions. Regarding the large number of input variables in the present study, no transfer function is applied to reduce the computationally demanding. While, the logistic sigmoid and Purelin transfer functions that are commonly used in literature [79, 81, 82] are provided at the output and hidden layers, respectively (further details about the bias and transfer functions are available at [79, 81]). Feed-forward multi linear perceptron is employed in this study containing input, hidden and output layers. The number of neurons in the input layer varies from 1 to 10 (without decomposition) and from 4 to 24 (with decomposition). Moreover, the neurons of the layers are connected with the neurons in the next layer by weights. Also, to consider all optimal solutions with the highest probable accuracy, this study

Evolutionary computation has received significant attention among researchers for studying complex engineering systems. Genetic algorithm (GA), genetic programming (GP), and gene expression programming (GEP) were inspired by Darwin's theory of evolution [60, 61]. GEP defines an algorithm and equation which shows the relation between input and output variables. GA and GP rely on a string of numbers with defined length called "chromosomes", while GEP employs a set of nonlinear entities with different shapes and sizes, "expression/

ln <sup>r</sup> : (4)

http://dx.doi.org/10.5772/intechopen.76537

137

wijdi (5)

� � (6)

where Xt is a vector of the observed data of {xt} t = 1,…,N, N is the total number of observed data, τ is the lag time, and m is embedding dimension. The embedding dimension (m) is typically in the range of 1–10 [53, 54]. The lag-embedding method is sensitive to both embedding parameters of τ and m. Average mutual information (AMI) and autocorrelation function (ACF) are the two well-known methods for estimating the lag time [71, 72]. More details about ACF and different functions are available at [73].

#### 3.3. Correlation dimension (Chaos investigation)

Correlation dimension is a nonlinear measure of the correlation between pairs lying on the attractor. The dimension of a system reveals the number of effective variables in the system. Kermani (2016) classified different dimensions in a system as topological, Hausdorf, box counting, point-wise, and correlation dimension. These dimensions are nearly equal in chaotic systems [52, 74]. This research employed correlation dimension, as it is a lower bound measure of the fractal dimension [59, 74]. For time series whose underlying dynamics is chaotic, the correlation dimension gets a finite fractional value, whereas it is infinite for stochastic systems. The later does not saturate to a specific amount of correlation exponent [75]. For an mdimensional phase-space, the correlation function, Cm(r), is defined as the fraction of states closer than r [76].

$$\mathbb{C}\_{m}(r) = \lim\_{N\_{p} \to \infty} \frac{2}{\left(N\_{p} - w\right)\left(N\_{p} - w - 1\right)} \sum\_{i=1}^{N\_{p}} \sum\_{j=i+1+w}^{N\_{p}} H\left(r - |X\_{i} - X\_{j}|\right) \tag{2}$$

where H is the Heaviside step function, Xi is the ith state vector, Np is the number of points on the reconstructed attractor, r is the radius of a sphere with the content of Xi or Xj. The Theiler window (w) is the correction needed to avoid spurious results due to temporal correlations instead of dynamical ones. Cm(r) is proportional to r for stochastic time series, whereas for chaotic time series it scales with r as:

$$
\mathbb{C}\_m(r) \propto r^{c\_\varepsilon}.\tag{3}
$$

where ce is correlation exponent defined by:

Application of Wavelet Decomposition and Phase Space Reconstruction in Urban Water Consumption Forecasting:… http://dx.doi.org/10.5772/intechopen.76537 137

$$c\_{t'} = \lim\_{r \to 0} \frac{\ln \mathbb{C}\_m(r)}{\ln r}. \tag{4}$$

The parameters m and Ce can be determined as the slopes of the lines when plotted Cm(r) against r in logarithmic scale. In a deterministic system, Ce increases by increasing m until eventually remaining unchanged. The correlation dimension of time series is defined as the specific value of m after which Ce remains unchanged [54, 59].

#### 3.4. Artificial neural networks

represents a state of the system. The lag-embedding method reconstructs phase-space from a univariate or multivariate time series generated by a deterministic dynamic system [70]. The underlying dynamics can be studied by building an m-dimensional space Xt defined by [49]:

Data 2186 114597.2 14,124 43046.4 20074.5 0.46 0.73 �0.38

Average\* Standard

deviation\*

where Xt is a vector of the observed data of {xt} t = 1,…,N, N is the total number of observed data, τ is the lag time, and m is embedding dimension. The embedding dimension (m) is typically in the range of 1–10 [53, 54]. The lag-embedding method is sensitive to both embedding parameters of τ and m. Average mutual information (AMI) and autocorrelation function (ACF) are the two well-known methods for estimating the lag time [71, 72]. More details about

Correlation dimension is a nonlinear measure of the correlation between pairs lying on the attractor. The dimension of a system reveals the number of effective variables in the system. Kermani (2016) classified different dimensions in a system as topological, Hausdorf, box counting, point-wise, and correlation dimension. These dimensions are nearly equal in chaotic systems [52, 74]. This research employed correlation dimension, as it is a lower bound measure of the fractal dimension [59, 74]. For time series whose underlying dynamics is chaotic, the correlation dimension gets a finite fractional value, whereas it is infinite for stochastic systems. The later does not saturate to a specific amount of correlation exponent [75]. For an mdimensional phase-space, the correlation function, Cm(r), is defined as the fraction of states

Np

X Np

H r � Xi � Xj � � �

: (3)

� � � (2)

j¼iþ1þw

i¼1

where H is the Heaviside step function, Xi is the ith state vector, Np is the number of points on the reconstructed attractor, r is the radius of a sphere with the content of Xi or Xj. The Theiler window (w) is the correction needed to avoid spurious results due to temporal correlations instead of dynamical ones. Cm(r) is proportional to r for stochastic time series, whereas for

Cmð Þr ∝ r

ce

2 Np � <sup>w</sup> � � Np � <sup>w</sup> � <sup>1</sup> � �<sup>X</sup>

� �, t <sup>¼</sup> <sup>1</sup>, <sup>2</sup>, …, N (1)

Coefficient of variation

).

Skew Kurtosis

Xt ¼ xt; xt�<sup>τ</sup>; xt�2τ;…; xt�ð Þ <sup>m</sup>�<sup>1</sup> <sup>τ</sup>

Table 1. Statistics of water consumption of Kelowna City in different temporal resolutions (\*m<sup>3</sup>

ACF and different functions are available at [73].

Max. Value\* Min. Value\*

3.3. Correlation dimension (Chaos investigation)

Cmð Þ¼ r lim

chaotic time series it scales with r as:

where ce is correlation exponent defined by:

Np!∞

closer than r [76].

Property Number of Data

136 Wavelet Theory and Its Applications

When ANN is based roughly on the neural layout of the human brain and is capable of nonlinear modeling processes that can classify the patterns and recognize the capabilities [77]. Regarding the ability of multilayer perceptron (MLP)-ANN outperformance as a conventional ANN approaches [77, 78, 79], this research employed three-layer MLP-ANN (input, hidden and output layers) and the different number of neurons. In hidden layer, neurons are calculated by the summation of demand values (di) with the given weight for each value (wij) to determine the output signal as (uj).

$$u\_{\vec{j}} = \sum\_{i=1}^{t} w\_{\vec{i}\vec{j}} d\_{\vec{i}} \tag{5}$$

$$O\_j = \phi(u\_j - \theta\_j) \tag{6}$$

where ϕ is the transfer function and θ is a threshold limit [80, 81, 82]. Among various transfer functions (e.g., sigmoid shape, piecewise, step, linear and non-linear functions), the logistic sigmoid and Purelin (linear) transfer functions. Regarding the large number of input variables in the present study, no transfer function is applied to reduce the computationally demanding. While, the logistic sigmoid and Purelin transfer functions that are commonly used in literature [79, 81, 82] are provided at the output and hidden layers, respectively (further details about the bias and transfer functions are available at [79, 81]). Feed-forward multi linear perceptron is employed in this study containing input, hidden and output layers. The number of neurons in the input layer varies from 1 to 10 (without decomposition) and from 4 to 24 (with decomposition). Moreover, the neurons of the layers are connected with the neurons in the next layer by weights. Also, to consider all optimal solutions with the highest probable accuracy, this study investigated the number of HLN from 1 to 20 in 1 to 200 epochs.

#### 3.5. Gene expression programming

Evolutionary computation has received significant attention among researchers for studying complex engineering systems. Genetic algorithm (GA), genetic programming (GP), and gene expression programming (GEP) were inspired by Darwin's theory of evolution [60, 61]. GEP defines an algorithm and equation which shows the relation between input and output variables. GA and GP rely on a string of numbers with defined length called "chromosomes", while GEP employs a set of nonlinear entities with different shapes and sizes, "expression/ parse trees". The expression tree accommodates the ease of a GA solution as well as the capability of accepting the nonlinear/complex behavior in a typical GP solution. The chromosome can have one or more genes of equal length. A gene represents a set of symbols containing two parts; a head which has functions and terminals and a tail which only has terminals. Initiating with the random generation of chromosomes, GEP is followed by different applications of genetic operators like replication, recombination, mutation, etc. The terminating condition for developing GEP depends upon the selection of maximum fitness. This research applied 30 chromosomes, eight head sizes, three genes, and arithmetic operators of {+,-, �, x, x<sup>2</sup> , √x}.

#### 3.6. Multilinear regression

When MLR corresponds to a linear combination of the components of multiple signals x (e.g. recorded discharge, lag time discharge, or combination of both) to a single output signal y (Demand) by:

$$y = b + \sum\_{i=0}^{N} a\_i \mathbf{x}\_i \tag{7}$$

summation of all is equal with the value of original data. This research employed Haar, the second and fourth order Daubechies (db2, db4), and the second and fourth order Symlets (Sym2, sym4) wavelets to decompose daily water demand time series into sub-series. The software MATLAB 2015 (https://www.mathworks.com) was employed for the analysis.

Application of Wavelet Decomposition and Phase Space Reconstruction in Urban Water Consumption Forecasting:…

This research measured the models' accuracy by coefficient of determination (CD), root mean

<sup>i</sup>¼<sup>1</sup> Oi � <sup>O</sup> � � Fi � <sup>F</sup> � �

<sup>2</sup> PNt

Pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi Nt <sup>i</sup>¼<sup>1</sup> ð Þ Oi � Fi

Nt

where Nt is the number of values, O and F are the observed and forecasted values of demand, respectively. O and F are the mean of the observed and forecasted demand values, respectively. Note that the range of CD is between 0 and 1 with higher positive values indicate better agreement. A lower value of RMSE and MAE indicates better agreement between the observed

Existence of chaotic behavior in the time series is shown in Figure 2. However, the results are not entirely based on the proof of having chaotic behavior, as the figure only shows possible low-dimensional chaotic behavior. Theoretically, several methods are well known for investigating the chaotic behavior such as lag time calculation method (e.g., average mutual information (AMI), Autocorrelation function (ACF)), correlation dimension, largest Lyapunov exponent, etc.). This study investigates the chaotic behavior by applying ACF and correlation dimension. Having chaotic behavior allows using ACF to calculate the lag time of the time

series. The value of lag time is considered as the first approach of ACF to 0 (Figure 2).

The results show 83-days as the lag time of the time series. Therefore, 83-day is used to design combination of inputs as phase space for the time series. In this study, the difference between 1st day and 83rd day is used as delay period for phase space reconstruction varying embedding dimensions from 1 to 10 (m1: Dt; m2: Dt,Dt-τ; m10: Dt,…,Dt-<sup>10</sup>τ). It should be noticed that

<sup>i</sup> Fi � <sup>F</sup> � �<sup>2</sup> h i<sup>1</sup>

2

2

3 7 5

2

: (10)

: (11)

http://dx.doi.org/10.5772/intechopen.76537

139

Oi � Fi j j: (12)

PNt

<sup>i</sup> Oi � <sup>O</sup> � �<sup>2</sup> h i<sup>1</sup>

MAE <sup>¼</sup> <sup>1</sup> n Xn j¼1

s

MSE ¼

4.1. Phase space reconstruction and investigation of chaotic behavior

3.8. Evaluation of models' performance

and forecasted values.

4. Preliminary results

squared error (RMSE) and mean absolute error (MAE) defined as:

2 6 4

PNt

CD ¼

where xi is the defined input (demand) and ai is regression coefficient determined by the least square method with the residual r defined by:

$$r = y - a\_1 \mathbf{x}\_1 - a\_2 \mathbf{x}\_2 - \dots - b. \tag{8}$$

#### 3.7. Wavelet decomposition

Commonly wavelet transforms are used for decomposition, de-noising, and compression of the time series [83]. Time series have a combination of low and high frequency which represent improved features (e.g., cyclical trends) and chaotic element, respectively [84]. Considering these frequencies, separation of low and high frequency is helpful in studying the original pattern and behavior of the time series. One of the mentioned methods is discrete wavelet transform (DWT) to separate per level of frequencies in time series. One of the common discretion ways proposed by Mallat that this study used the mentioned DTW method to separate the frequencies of the applied data [85]. The level of the decomposition shows the subseries. For example, for level 1 decomposition, the number of subseries is two. Therefore, the number of levels indicates the number of subseries plus one. Level 3 is considered as suitable decomposition level in the present study regarding the number of data (2186 day) and following Nourani et al. (2009) that offered [83]:

$$L\_n = \text{int}[\log(N)].\tag{9}$$

where Ln is the number, the level of decomposition and N is the number of used data. Thus, the proper level in this study is considered as 3. However, increasing the level number does not necessarily improve the accuracy of the models. Therefore, the original data are discretized in a high-frequency subset (a3) and three high frequencies as (d1), (d2) and (d3), where the summation of all is equal with the value of original data. This research employed Haar, the second and fourth order Daubechies (db2, db4), and the second and fourth order Symlets (Sym2, sym4) wavelets to decompose daily water demand time series into sub-series. The software MATLAB 2015 (https://www.mathworks.com) was employed for the analysis.

#### 3.8. Evaluation of models' performance

parse trees". The expression tree accommodates the ease of a GA solution as well as the capability of accepting the nonlinear/complex behavior in a typical GP solution. The chromosome can have one or more genes of equal length. A gene represents a set of symbols containing two parts; a head which has functions and terminals and a tail which only has terminals. Initiating with the random generation of chromosomes, GEP is followed by different applications of genetic operators like replication, recombination, mutation, etc. The terminating condition for developing GEP depends upon the selection of maximum fitness. This research applied 30 chromosomes, eight head sizes, three genes, and arithmetic operators of

When MLR corresponds to a linear combination of the components of multiple signals x (e.g. recorded discharge, lag time discharge, or combination of both) to a single output signal y

> <sup>y</sup> <sup>¼</sup> <sup>b</sup> <sup>þ</sup><sup>X</sup> N

> > i¼0

where xi is the defined input (demand) and ai is regression coefficient determined by the least

Commonly wavelet transforms are used for decomposition, de-noising, and compression of the time series [83]. Time series have a combination of low and high frequency which represent improved features (e.g., cyclical trends) and chaotic element, respectively [84]. Considering these frequencies, separation of low and high frequency is helpful in studying the original pattern and behavior of the time series. One of the mentioned methods is discrete wavelet transform (DWT) to separate per level of frequencies in time series. One of the common discretion ways proposed by Mallat that this study used the mentioned DTW method to separate the frequencies of the applied data [85]. The level of the decomposition shows the subseries. For example, for level 1 decomposition, the number of subseries is two. Therefore, the number of levels indicates the number of subseries plus one. Level 3 is considered as suitable decomposition level in the present study regarding the number of data (2186 day)

where Ln is the number, the level of decomposition and N is the number of used data. Thus, the proper level in this study is considered as 3. However, increasing the level number does not necessarily improve the accuracy of the models. Therefore, the original data are discretized in a high-frequency subset (a3) and three high frequencies as (d1), (d2) and (d3), where the

aixi (7)

r ¼ y � a1x<sup>1</sup> � a2x<sup>2</sup> � … � b: (8)

Ln ¼ int log ½ � ð Þ N : (9)

{+,-, �, x, x<sup>2</sup>

(Demand) by:

, √x}.

138 Wavelet Theory and Its Applications

3.6. Multilinear regression

3.7. Wavelet decomposition

square method with the residual r defined by:

and following Nourani et al. (2009) that offered [83]:

This research measured the models' accuracy by coefficient of determination (CD), root mean squared error (RMSE) and mean absolute error (MAE) defined as:

$$\mathbf{CD} = \left[ \frac{\sum\_{i=1}^{N\_l} \left( O\_i - \overline{O} \right) \left( F\_i - \overline{F} \right)}{\left[ \sum\_{i}^{N\_l} \left( O\_i - \overline{O} \right)^2 \right]^{\frac{1}{2}} \left[ \sum\_{i}^{N\_l} \left( F\_i - \overline{F} \right)^2 \right]^{\frac{1}{2}}} \right]^2. \tag{10}$$

$$MSE = \sqrt{\frac{\sum\_{i=1}^{N\_t} \left(O\_i - F\_i\right)^2}{N\_t}}.\tag{11}$$

$$MAE = \frac{1}{n} \sum\_{j=1}^{n} |O\_i - F\_i|. \tag{12}$$

where Nt is the number of values, O and F are the observed and forecasted values of demand, respectively. O and F are the mean of the observed and forecasted demand values, respectively. Note that the range of CD is between 0 and 1 with higher positive values indicate better agreement. A lower value of RMSE and MAE indicates better agreement between the observed and forecasted values.
