**3.4. Overview of the models**

The sediment rating curve method is the traditional method for converting measured flows into predictions of suspended sediment load. This paper aims to test the performance of evolutionary computing models and uses a host of other techniques for inter-comparison purposes. These models are outlined in this section, with evolutionary computing explained in more detail. Their underlying notion is that past values contain a sufficient amount of information to predict future values, and a systematic way of representing this notion is presented in Table 2 in terms of a selection of models. These models, in essence, are reminiscent of regression analysis, but the GEP, ANN and MLR models approach the problem in their own individual ways to unearth the structure of the information inherent in the time series. Notably, the SRC model is expressed by Model 1 and the deterministic chaos model by Model 0. These models are all evaluated using the coefficient of correlation (CC), Mean Absolute Error (MAE) and Root Mean Square Error (RMSE).


| Model | Input variables | Output variables | The structure |
|---|---|---|---|
| Model 0 | *St-1*, *St-2*, … | *St* | Chaos |
| Model 1 | *Qt* | *St* | ANN, SRC |
| Model 2 | *Qt*, *St-1* | *St* | GEP, ANN, MLR |
| Model 3 | *Qt*, *Qt-1* | *St* | GEP, ANN, MLR |
| Model 4 | *Qt*, *Qt-1*, *St-1* | *St* | GEP, ANN, MLR |
| Model 5 | *Qt*, *Qt-1*, *Qt-2* | *St* | GEP, ANN, MLR |
| Model 6 | *Qt*, *Qt-1*, *Qt-2*, *St-1* | *St* | GEP, ANN, MLR |

where *Qt* and *St* represent flow and suspended sediment load at day *t*, respectively.

**Table 2.** Modelling Structures of the Selected Modelling Techniques
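As a concrete illustration of the three evaluation measures named above, the short Python sketch below computes CC, MAE and RMSE for a pair of observed and modelled series (the array contents are hypothetical placeholders, not data from this study):

```python
import numpy as np

def cc(obs, pred):
    """Coefficient of correlation (Pearson's r) between observed and predicted series."""
    return np.corrcoef(obs, pred)[0, 1]

def mae(obs, pred):
    """Mean Absolute Error."""
    return np.mean(np.abs(obs - pred))

def rmse(obs, pred):
    """Root Mean Square Error."""
    return np.sqrt(np.mean((obs - pred) ** 2))

# Hypothetical observed and modelled suspended sediment loads (ton/day)
obs = np.array([1.2e4, 1.5e4, 0.9e4, 1.8e4])
pred = np.array([1.1e4, 1.6e4, 1.0e4, 1.7e4])
print(cc(obs, pred), mae(obs, pred), rmse(obs, pred))
```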

| Statistic\* | Training set: Discharge (m3/sec) | Training set: Suspended Sediment (ton/day) | Testing set: Discharge (m3/sec) | Testing set: Suspended Sediment (ton/day) | All dataset: Discharge (m3/sec) | All dataset: Suspended Sediment (ton/day) |
|---|---|---|---|---|---|---|
| Datapoints | 9131 | 9131 | 365 | 365 | 9496 | 9496 |
| Mean | 6.37E5 | 1.27E4 | 4.52E5 | 1.58E4 | 6.30E5 | 1.28E4 |
| St dev | 6.30E5 | 7.60E3 | 2.53E5 | 7.80E3 | 6.21E5 | 7.60E3 |
| Max | 4.97E6 | 4.25E4 | 1.22E6 | 3.45E4 | 4.97E6 | 4.25E4 |
| Min | 4.00E3 | 2.80E3 | 6.70E4 | 5.40E3 | 4.00E3 | 2.80E3 |
| CV | 1.0 | 0.60 | 0.56 | 0.49 | 0.99 | 0.59 |
| Skew | 1.78 | 0.95 | 0.10 | 0.51 | 1.82 | 0.93 |
| Kurt | 3.98 | 0.34 | -0.98 | -0.85 | 4.20 | 0.27 |

\*Data = Number of Data; Std = Standard Deviation; Max = Maximum Value; Min = Minimum Value; CV = Coefficient of Variation; Skew = Skewness; Kurt = Kurtosis

**Table 1.** Statistical Parameters for Dataset from the Mississippi River Basin

### *3.4.1. Sediment rating curve*


The sediment rating curve is a flux-averaged relationship between suspended sediment load, *S*, and water discharge, *Q*, expressed as a power law of the form *S* = *aQ*<sup>*b*</sup>, where *a* and *b* are coefficients. Values of *a* and *b* for a particular stream are determined from data via a linear regression between log *S* and log *Q*. The SRC model is represented in terms of Model 1 in Table 2. For more critical views on this model, reference may be made to Kisi (2005) and Walling (1977), among others.
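The rating-curve fit described above can be sketched as follows (an illustration in Python/NumPy under the stated log-log regression; the flow and sediment values are hypothetical, not the Mississippi record):

```python
import numpy as np

def fit_rating_curve(Q, S):
    """Fit S = a * Q**b by linear regression between log(S) and log(Q)."""
    b, log_a = np.polyfit(np.log(Q), np.log(S), 1)   # slope, intercept of the log-log line
    return np.exp(log_a), b

# Hypothetical daily flow (m3/s) and suspended sediment load (ton/day)
Q = np.array([4.0e3, 1.2e5, 4.5e5, 9.8e5, 2.1e6])
S = np.array([2.8e3, 5.1e3, 1.1e4, 1.9e4, 3.2e4])
a, b = fit_rating_curve(Q, S)
print(f"S = {a:.2f} * Q^{b:.2f}")   # rating curve of the form S = a*Q^b
S_hat = a * Q ** b                   # rated (predicted) sediment loads
```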

#### *3.4.2. Evolutionary computing*

Evolutionary computing techniques apply optimisation algorithms as a tool to facilitate the mimicking of natural selection. A building-block approach to generalised evolution driven by natural selection is yet to be presented, although Khatibi (2011) has outlined a rationale for it. The traditional understanding of natural selection for biological species is well developed, according to which the process takes place at the gene level of all individuals of all species, which carry hereditary material for reproduction, inheriting from their parents and passing on a range of their characteristics to their offspring. The process of reproduction is never a perfect copying process, as mutation may occur from time to time in biological reproduction, involving the random reshuffling of genes during sexual reproduction. The paper assumes preliminary knowledge of genes, chromosomes, gene pools, DNA and RNA, where the environment also has a role to play. The environment for the production of proteins and for sexual reproduction differs from the outer environment in which the individual entities supported by those proteins, or produced by sexual reproduction, have to perform. The outer environment is characterised by (i) being limited in resources, (ii) having no foresight, (iii) organisms tending to produce more offspring than can be supported, a process driven by positive feedback loops, and (iv) a process of competition and selection. Some of these details are normally overlooked or simplified in evolutionary computing, and therefore the paper stresses the point that natural selection takes place at the gene level and is not directly applicable at the social level.

Facts on natural selection are overwhelming but there are myths as well, e.g. the myth of "the survival of the fittest", a term widely used in evolutionary computing. Although the fittest has a selective advantage to survive, this does not guarantee survival in the natural world. An overview of the dynamics of natural selection in an environment is that (i) the environment can only support a maximum population of a certain size, but there is also a lower limit, a critical mass, below which a population is at risk of losing its viability; (ii) there is a process of reproduction, during which natural selection operates at the gene level, although there are further processes operating at the individual level beyond the direct reach of natural selection (e.g. interactions among the individuals catered for by other mechanisms, or each individual being under selection pressure from the environment); (iii) the process of reproduction is associated with mutation, which gives rise to the production of gene pools.

A great deal of the above overview has been adopted in evolutionary computing, the history of which goes back to the 1960s when Rechenberg (1973) introduced evolution strategies. The variants of this approach include genetic algorithms (Holland, 1975), evolutionary programming (Fogel et al., 1966), genetic programming (Koza, 1992) and Gene Expression Programming (GEP), Ferreira (2001a). This paper uses the latter approach, which in simple terms is a variant of GP, although each of these techniques differs from the others. These techniques have the capability of deriving a set of mathematical expressions to describe the relationship between the independent and dependent variables using such operators as mutation, recombination (or crossover) and evolution.


This chapter is concerned with GEP, and one of the important preliminary decisions in its implementation is to establish the models represented in Table 2 (Models 2-6). There is no prior knowledge of the appropriateness of any of these models and therefore this is normally fixed in a preliminary modelling task through a trial-and-error procedure. Whichever model is chosen (Model 2 to Model 6, or similar), each implementation of GEP builds up the model in terms of the values of the coefficients (referred to as terminals) and the operations (functions) through the procedure broadly outlined in Figure 3.

**Figure 3.** Simplified Outline of Implementation of Evolutionary Programming Models

The working of a gene expression program depicted in Figure 3 is outlined as follows. A chromosome in GEP is composed of genes and each gene is composed of (i) terminals and (ii) functions. The gene structures and chromosomes in GEP are illustrated for the solution obtained for the dataset used in this study (see Section 4.2). The terminals, as their name suggests, are composed of constants and variables, and the functions comprise mathematical operations, as shown by (4.a)-(4.f).


**Figure 4.** Expression Trees – (a) typical expression tree; (b) the selected GEP model in this study

As the term terminal suggests, terminals comprise a set of values at the tail-ends of the genes of the chromosomes, and these are made meaningful by the functions making up the other component of the genes. In GEP, these are represented by a bilingual notation called the Karva language of (i) genetic codes, which are not deemed necessary for a description here, and (ii) expression trees (or parse trees), as illustrated by Figure 4.a; the recommended solution is shown in Figure 4.b, which is transcribed as Equation (5) in Section 4.2. The initial chromosomes of the initial population are no different in kind from the solution shown in Figure 4.b, but the composition of each initial chromosome is often selected at random, and GEP is then expected to improve them through evolution by the strategy of selection, replication and mutation; there are other operators, not detailed here, that facilitate a more robust solution, including inversion, transposition and recombination. The improvements are carried out through selection from one generation to another, and this is why this modelling strategy is called evolutionary computation. The main strength of this approach is that it does not set up a system of equations to predict the future; instead, it evaluates the fitness of each chromosome and selects from those a new population with better performance traits.
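To make the Karva (K-expression) notation and its decoding into an expression tree concrete, the following minimal Python sketch is offered (an illustration only, not the chapter's implementation; the function set, terminals and gene are hypothetical):

```python
import operator

# Function set with arities; anything else is treated as a terminal (variable or constant).
FUNCTIONS = {'+': (operator.add, 2), '-': (operator.sub, 2), '*': (operator.mul, 2)}

def karva_to_tree(gene):
    """Decode a K-expression (list of symbols) into an expression tree, filling levels breadth-first."""
    nodes = [[sym, []] for sym in gene]   # each node: [symbol, children]
    queue, next_free = [nodes[0]], 1      # the first symbol is the root
    while queue:
        node = queue.pop(0)
        arity = FUNCTIONS[node[0]][1] if node[0] in FUNCTIONS else 0
        for _ in range(arity):            # attach the next unused symbols as children
            child = nodes[next_free]
            next_free += 1
            node[1].append(child)
            queue.append(child)
    return nodes[0]

def evaluate(node, env):
    """Evaluate the expression tree for a dictionary of terminal values."""
    sym, children = node
    if sym in FUNCTIONS:
        func = FUNCTIONS[sym][0]
        return func(*(evaluate(c, env) for c in children))
    return env[sym] if sym in env else float(sym)   # variable or numeric constant

# Hypothetical gene: '+', '*', 'Q', 'a', 'S' decodes breadth-first to (a * S) + Q
tree = karva_to_tree(['+', '*', 'Q', 'a', 'S'])
print(evaluate(tree, {'Q': 2.0, 'a': 3.0, 'S': 1.0}))   # -> 5.0
```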

The GEP employed in this study is based on evolving computer programs of different sizes and shapes encoded in linear chromosomes of fixed lengths (Ferreira, 2001a; Ferreira, 2001b). The chromosomes are composed of multiple genes, each gene encoding a smaller subprogram. Furthermore, the structural and functional organisation of the linear chromosomes allows the unconstrained operation of important genetic operators, such as mutation, transposition and recombination. It has been reported that GEP is 100-10,000 times more efficient than GP systems (Ferreira, 2001a; Ferreira, 2001b) for a number of reasons, including: (i) the chromosomes are simple entities: linear, compact, relatively small, and easy to manipulate genetically (replicate, mutate, recombine, etc.); (ii) the parse trees or expression trees are exclusively the expression of their respective chromosomes; they are the entities upon which selection acts and, according to fitness, they are selected to reproduce with modification.
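A minimal sketch of a fixed-length linear chromosome and a validity-preserving point mutation, in the spirit of the head/tail gene organisation used in GEP, is shown below (the head length, function set and mutation rate are illustrative assumptions, not the settings of this study):

```python
import random

FUNC_SET = ['+', '-', '*']          # head positions may hold functions or terminals
TERM_SET = ['Q', 'S', '1.0']        # tail positions may hold terminals only
HEAD_LEN = 5
MAX_ARITY = 2
TAIL_LEN = HEAD_LEN * (MAX_ARITY - 1) + 1   # GEP rule: t = h(n - 1) + 1

def random_gene():
    """Build a random fixed-length gene: head of functions/terminals, tail of terminals."""
    head = [random.choice(FUNC_SET + TERM_SET) for _ in range(HEAD_LEN)]
    tail = [random.choice(TERM_SET) for _ in range(TAIL_LEN)]
    return head + tail

def point_mutate(gene, rate=0.05):
    """Point mutation that keeps every mutant a valid K-expression:
    head symbols may become any symbol, tail symbols only terminals."""
    new = []
    for i, sym in enumerate(gene):
        if random.random() < rate:
            pool = FUNC_SET + TERM_SET if i < HEAD_LEN else TERM_SET
            new.append(random.choice(pool))
        else:
            new.append(sym)
    return new

parent = random_gene()
child = point_mutate(parent)    # child has the same fixed length and remains decodable
```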


### *3.4.3. Artificial Neural Networks (ANNs)*

Whilst evolutionary programming emulates the workings of Nature, ANNs emulate the workings of neurons in the brain. Both the brain and ANNs are parallel information-processing systems consisting of a set of neurons or nodes arranged in layers, but this is where the parallel ends. The actual process of information processing in the brain is a topical research issue, whereas the drivers of ANNs are polynomial algebra, and there is no evidence that the brains of humans, monkeys or any other animals employ algebraic computations such as optimisation methods. Although there is a great incentive to understand the working of the brain, it is not imperative to be constrained by it; the use of algebra in ANNs is not criticised here, but awareness is raised that these two processes are not identical.

The ANN theory has been described in many books, including the text by Rumelhart et al. (1986). The application of ANNs has been the subject of a large number of papers in the recent literature. There are various implementations of ANNs, but the type used in this study is a Multi-Layer feedforward Perceptron (MLP) trained with the back-propagation learning algorithm, with the following functions: (i) the input layer accepts the data, (ii) the intermediate (hidden) layer processes them, and (iii) the output layer displays the resultant outputs. The number of hidden layers is decided in a preliminary modelling process by finding the most efficient number of layers through a trial-and-error procedure. Each neuron in a layer is connected to all the neurons of the next layer, and the neurons in one layer are not connected among themselves. All the nodes within a layer act synchronously.

This study implements the ANN models in terms of Models 1-6 of Table 2, and Figure 5 shows one of the implementations selected. For each of these models, the data passing through the connections from one neuron to another are multiplied by weights that control the strength of a passing signal. When these weights are modified, the data transferred through the network change; consequently, the network output also changes. The signal emanating from the output node(s) is the network's solution to the input problem.

In the back-propagation algorithm, a set of inputs and outputs is selected from the training set and the network calculates the output based on the inputs. This output is subtracted from the actual output to find the output-layer error. The error is back propagated through the network, and the weights are suitably adjusted. This process continues for the number of prescribed sweeps or until a prescribed error tolerance is reached. The mean square error over the training samples is the typical objective function to be minimized. After training is complete, the ANN performance is validated. Depending on the outcome, either the ANN has to be retrained or it can be implemented for its intended use.

**Figure 5.** Implementation of the ANN Models and its Various Layers
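The MLP/back-propagation scheme outlined above can be sketched numerically as follows (Python/NumPy, one hidden layer trained by batch gradient descent on the mean square error; the layer sizes, learning rate and data are hypothetical and differ from the MATLAB configuration used in the study):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_mlp(X, y, hidden=6, lr=0.01, sweeps=2000):
    """Train a one-hidden-layer MLP (tanh hidden units, linear output) by back-propagation,
    minimising the mean square error over the training samples."""
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=(hidden, 1)); b2 = np.zeros(1)
    for _ in range(sweeps):
        H = np.tanh(X @ W1 + b1)            # forward pass: hidden layer
        out = H @ W2 + b2                   # forward pass: output layer
        err = out - y                       # output-layer error
        # back-propagate the error and adjust the weights
        dW2 = H.T @ err / n; db2 = err.mean(axis=0)
        dH = err @ W2.T * (1 - H ** 2)      # derivative of tanh
        dW1 = X.T @ dH / n; db1 = dH.mean(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

# Hypothetical normalised inputs (e.g. Qt, Qt-1, St-1) and target St for a Model-4-like setup
X = rng.normal(size=(200, 3))
y = 0.5 * X[:, :1] + 0.3 * X[:, 2:3] + 0.05 * rng.normal(size=(200, 1))
params = train_mlp(X, y)
```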

#### *3.4.4. Multi Linear Regression (MLR)*


An overview of the data presented in Figure 2 invokes the thought that, other than the annual trend within the data, the underlying process is probably random and a more rational way of explaining the data would be through probabilistic approaches. One such method applied to the selected data is the Multi-Linear Regression (MLR) model. It fits a linear combination of the components of a multi-component signal *x* (e.g. the recorded flow and suspended sediment time series, as defined by Models 2-6 in Table 2) to a single output signal *y* (e.g. the predicted suspended sediment load), as defined by (1a), and returns the residual, *r*, i.e. the difference signal, as defined by (1b):

$$y = b + \sum\_{i=0}^{N} a\_i x\_i \tag{1a}$$

$$r = y - a\_1 x\_1 - a\_2 x\_2 - \dots - b \tag{1b}$$

where *xi* is defined in Table 2 in terms of the various models and the *ai* values are called regression coefficients, which are estimated by the least squares or any other similar method. In this study, the coefficients of the regressions were determined using the least squares method.
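A minimal sketch of the least-squares fit of Eqs. (1a)-(1b) is given below (Python/NumPy; the two predictor columns stand in for the inputs of Model 2 and are hypothetical):

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b + sum_i a_i * x_i; returns (b, a) and the residual r of Eq. (1b)."""
    A = np.column_stack([np.ones(len(X)), X])        # prepend a column of ones for the intercept b
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    b, a = coef[0], coef[1:]
    r = y - A @ coef                                  # residual (difference signal)
    return b, a, r

# Hypothetical inputs for Model 2: columns Qt and St-1, target St
X = np.array([[4.5e5, 1.2e4], [5.1e5, 1.4e4], [3.9e5, 1.1e4], [6.2e5, 1.6e4]])
y = np.array([1.3e4, 1.5e4, 1.2e4, 1.7e4])
b, a, r = fit_mlr(X, y)
```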

#### *3.4.5. Chaos theory*

A cursory view of the suspended sediment record of the Mississippi River in Figure 2 provides no clue to a strategy for its underlying patterns, if any, although an annual trend superimposed on random variation may be the immediate reaction of a hydrologist. Another strategy to explore such possible patterns is through the application of chaos theory, more specifically through the "phase-space diagram" shown in Figure 6 for this river's data. A point in the phase-space represents the state of the system at a given time. The narrow dark band in the figure signifies strong determinism, whereas the scatter around it signifies the presence of noise; there is therefore a possibility of explaining this dataset by chaos theory. The dark band signifies convergence of the trajectories of the phase-space, with a fractal dimension, towards the attractor of the data, where the dynamics of the system can be reduced to a set of deterministic laws to enable the prediction of its future states.


**Figure 6.** Phase-space Diagram of Daily Suspended Sediment Data in the Mississippi River Basin

Chaos theory is a method of nonlinear time series analysis and involves a host of methods, essentially based on the phase-space reconstruction of a process from scalar or multivariate measurements of physical observables. This method is implemented in terms of Model 0 of Table 2. It is largely based on the representation of the underlying dynamics through reconstruction of the phase-space, originally due to Takens (1981), and is implemented in terms of two parameters, the delay time *τ* and the embedding dimension *m*. Given a set of physical variables and an analytical model describing their interactions, the dynamics of the system can be represented geometrically by a single point moving on a trajectory, where each of its points corresponds to a state of the system. The phase-space diagram is essentially a co-ordinate system whose coordinates represent the variables necessary to completely describe the state of the system at any moment.

One difficulty in its construction is that in most practical situations, information on every variable influencing the system may not be available. However, a time series of a single variable may be available, which may allow the construction of a (pseudo) phase-space. The idea behind such a reconstruction is that a non-linear system is characterized by self-interaction, and a time series of a single variable can carry the information about the dynamics of the entire multi-variable system. The trajectories of the phase-space diagram describe the evolution of the system from some initial state, and hence represent the history of the system.
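The delay-embedding reconstruction just described can be sketched as follows (Python/NumPy; the series, delay time and embedding dimension are hypothetical placeholders, to be fixed by the AMI and false nearest neighbours methods listed below):

```python
import numpy as np

def embed(x, m, tau):
    """Reconstruct the (pseudo) phase-space: each row is a state vector
    Y_i = (x_i, x_{i-tau}, ..., x_{i-(m-1)tau})."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[(m - 1 - k) * tau:(m - 1 - k) * tau + n] for k in range(m)])

# Hypothetical scalar series (e.g. daily suspended sediment load) and trial parameters
x = np.sin(np.linspace(0, 60, 600))
Y = embed(x, m=3, tau=10)      # each row is one point of the reconstructed trajectory
```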


This paper applies chaos theory to analyse the suspended sediment load of the Mississippi River data in a similar fashion to the other modelling strategies described above. It uses the local prediction method for training and testing, as outlined below, but it is traditional practice to apply several methods to build evidence for the existence of chaotic signals in a particular dataset. These techniques employ the delay-embedding parameters *τ* and *m*, which are unknown a priori. The following methods are used in this chapter:

1. **Average Mutual Information (AMI)** is used to estimate *τ*, and the minimisation of the False Nearest Neighbours to determine the optimal value of the embedding dimension, *m*.

AMI (Fraser and Swinney, 1986) defines how the measurements *X*(*t*) at time *t* are related, from an information-theoretic point of view, to the measurements *X*(*t* + *τ*) at time *t* + *τ*. The average mutual information is defined as:

$$I(\tau) = \sum\_{X(i), X(i+\tau)} P(X(i), X(i+\tau)) \log[\frac{P(X(i), X(i+\tau))}{P(X(i))P(X(i+\tau))}] \tag{2a}$$

where *i* is the total number of samples. *P*(*X*(*i*)) and *P*(*X*(*i* + *τ*)) are the marginal probabilities for the measurements *X*(*i*) and *X*(*i* + *τ*), respectively, whereas *P*(*X*(*i*), *X*(*i* + *τ*)) is their joint probability. The optimal delay time *τ* minimises the function *I*(*τ*); for *t* = *τ*, *X*(*i* + *τ*) adds the maximum information on *X*(*i*).

The False Nearest Neighbours procedure (Kennel et al., 1992) is a method to obtain the optimum embedding dimension for phase-space reconstruction. By checking the neighbourhood of points embedded in projection manifolds of increasing dimension, the algorithm eliminates 'false neighbours', i.e. points that appear to lie close together due to the projection but are separated in higher embedding dimensions. When the ratio between the number of false neighbours at dimension *m* + 1 and that at *m* is below a given threshold, generally smaller than 5%, each *m*′ ≥ *m* + 1 is an optimal embedding. A poor reconstruction, of few embedding states with several components, is obtained if *m*′ is too large, and the subsequent analyses should then not be performed.

2. **Correlation Dimension (CD) method**: this is a nonlinear measure of the correlation between pairs of points lying on the attractor. For a time series whose underlying dynamics is chaotic, the correlation dimension takes a finite fractional value, whereas for stochastic systems it is infinite. For an *m*-dimensional phase-space, the correlation function *C*(*r*) is defined as the fraction of states closer than *r* (Grassberger and Procaccia, 1983; Theiler, 1986):

$$\mathcal{C}(r) = \lim\_{N \to \infty} \frac{2}{N(N-1)} \sum\_{i,j=1}^{N} H(r - \left| Y\_i - Y\_j \right|) \tag{2b}$$

where *H* is the Heaviside step function, *Yi* is the *i*th state vector, and *N* is the number of points on the reconstructed attractor. The number *w* is called the Theiler window and it is the correction needed to avoid spurious results due to temporal correlations instead of dynamical ones. For stochastic time series *C<sub>m</sub>*(*r*) ∝ *r<sup>m</sup>* holds, whereas for chaotic time series the correlation function scales with *r* as:

$$\mathbb{C}\_m(r) \propto r^{D\_2} \tag{2c}$$



where *D*<sub>2</sub>, the correlation exponent, quantifies the degrees of freedom of the process and is defined by:

$$D\_2 = \lim\_{r \to 0} \frac{\ln C(r)}{\ln r} \tag{2d}$$

and can be reliably estimated as the slope of the ln *C<sub>m</sub>*(*r*) versus ln(*r*) plot.

3. **Local Prediction Model**: The author's implementation of the local prediction method for deterministic chaos is detailed in Khatibi et al. (2011), and an illustrative sketch is given after Eq. (2l); the overview is that a correct phase-space reconstruction in a dimension *m* facilitates an interpretation of the underlying dynamics in the form of an *m*-dimensional map *f<sub>T</sub>*, according to

$$Y\_{j+T} = f\_T(Y\_j) \tag{2e}$$

where *Yj* and *Yj+T* are vectors of dimension *m*, describing the state of the system at times *j* (i.e. the current state) and *j* + *T* (i.e. the future state), respectively. The problem then is to find an appropriate expression for *f<sub>T</sub>* (i.e. *F<sub>T</sub>*). Local approximation entails the subdivision of the *f<sub>T</sub>* domain into many subsets (neighbourhoods), each of which identifies some approximations *F<sub>T</sub>*, valid only in that same subset. In other words, the dynamics of the system is described step-by-step locally in the phase-space. In this *m*-dimensional space, prediction is performed by estimating the change with time of *Xt*, the observed values of the discrete scalar time series, with delay coordinates in the *m*-dimensional phase space. The relation between the points *Xt* and *Xt+p* (the behaviour at a future time *p* on the attractor) is approximated by a function *F* as:

$$X\_{t+p} \cong F(X\_t) \tag{2f}$$

In this prediction method, the change of *Xt* with time on the attractor is assumed to be the same as that of nearby points *XTh* (*h* = 1, 2, …, *n*). Herein, *Xt+p* is determined by the *d*th-order polynomial *F*(*Xt*) as follows (Itoh, 1995):

$$X\_{t+p} \cong f\_0 + \sum\_{k\_1=0}^{m-1} f\_{1k\_1} X\_{t-k\_1\tau} + \sum\_{\substack{k\_2=k\_1 \\ k\_1=0}}^{m-1} f\_{2k\_1k\_2} X\_{t-k\_1\tau} X\_{t-k\_2\tau} + \dots + \sum\_{\substack{k\_d=k\_{d-1} \\ \dots \\ k\_1=0}}^{m-1} f\_{dk\_1k\_2 \dots k\_d} X\_{t-k\_1\tau} X\_{t-k\_2\tau} \cdots X\_{t-k\_d\tau} \tag{2g}$$

Using *n* of the *X<sub>Th</sub>* and *X<sub>Th+p</sub>* for which the values are already known, the coefficients, *f*, are determined by solution of the following equation:

$$X \cong Af \tag{2h}$$

where


$$X = (X\_{T\_1+p}, X\_{T\_2+p}, \dots, X\_{T\_n+p}) \tag{2i}$$

and

$$f = (f\_0, f\_{10}, f\_{11}, \dots, f\_{1(m-1)}, f\_{200}, \dots, f\_{d(m-1)(m-1)\dots(m-1)}) \tag{2j}$$

and *A* is the *n* × [(*m* + *d*)! / (*m*! *d*!)] Jacobian matrix, which in its explicit form is:

$$A = \begin{vmatrix} 1 & X\_{T\_1} & X\_{T\_1-\tau} & \dots & X\_{T\_1-(m-1)\tau} & X\_{T\_1}^2 & \dots & X\_{T\_1-(m-1)\tau}^d \\ 1 & X\_{T\_2} & X\_{T\_2-\tau} & \dots & X\_{T\_2-(m-1)\tau} & X\_{T\_2}^2 & \dots & X\_{T\_2-(m-1)\tau}^d \\ \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & X\_{T\_n} & X\_{T\_n-\tau} & \dots & X\_{T\_n-(m-1)\tau} & X\_{T\_n}^2 & \dots & X\_{T\_n-(m-1)\tau}^d \end{vmatrix} \tag{2k}$$

In order to obtain a stable solution, the number of rows in the Jacobian matrix *A* must satisfy:

$$n \geq \frac{(m+d)!}{m!\,d!} \tag{2l}$$

As stated by Porporato and Ridolfi (1997), even though the *F* values are first-degree polynomials, the prediction is nonlinear, because during the prediction procedure every point *x*(*t*) belongs to a different neighbourhood and is therefore defined by different expressions for *f* (Koçak, 1997).
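The following sketch illustrates the local prediction procedure of Eqs. (2f)-(2h) with a first-degree (d = 1) local polynomial: each prediction is obtained by a least-squares fit over the nearest neighbours of the current reconstructed state. It is an illustration only (the neighbour count k, the test series and the helper name local_predict are assumptions), not the authors' implementation:

```python
import numpy as np

def local_predict(x, m, tau, p, k=20):
    """Predict x[t+p] from the last reconstructed state using a local linear (d = 1) map
    fitted, by least squares, over the k nearest neighbours of that state (cf. Eqs. 2f-2h)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    # reconstructed states Y[j] correspond to time index t_j = (m-1)*tau + j
    Y = np.column_stack([x[(m - 1 - i) * tau:(m - 1 - i) * tau + n] for i in range(m)])
    current, history = Y[-1], Y[:-(p + 1)]           # exclude states whose image x[t+p] is unknown
    dist = np.linalg.norm(history - current, axis=1)
    idx = np.argsort(dist)[:k]                       # k nearest neighbours X_{T_h}
    A = np.column_stack([np.ones(k), history[idx]])  # rows of the Jacobian-like matrix (d = 1)
    targets = x[(m - 1) * tau + idx + p]             # known images X_{T_h + p}
    f, *_ = np.linalg.lstsq(A, targets, rcond=None)  # coefficients f of Eq. (2h)
    return np.r_[1.0, current] @ f                   # apply the local map to the current state

# Hypothetical usage with trial embedding parameters
x = np.sin(np.linspace(0, 60, 600)) + 0.01 * np.random.default_rng(1).normal(size=600)
print(local_predict(x, m=3, tau=10, p=1))
```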

#### **4. Setting up models and preliminary results**

#### **4.1. Performance of sediment rating curve**

The SRC model was implemented by using a simple least squares method leading to

$$S = 13.2\,Q^{1.14} \tag{3}$$

The performance of this model is summarised in Table 3 and shown in Figure 7. Evidently, its performance is poor and the concerns raised in the literature about this model are confirmed. This is sufficient justification to search for more reliable models.


| Model | Input | Training: CC | Training: MAE | Training: RMSE | Testing: CC | Testing: MAE | Testing: RMSE |
|---|---|---|---|---|---|---|---|
| Model 1 | *Qt* | 0.76 | 2.62E5 | 4.11E5 | 0.82 | 3.89E5 | 4.86E5 |

**Table 3.** Statistical Performance of the Sediment Rating Curve for the Training and Test Periods


**Figure 7.** Comparison of Observed Suspended Sediment with that Modelled by SRC; (a) hydrograph, (b) cumulative values

#### **4.2. Implementation of GEP**

The preliminary investigation for the construction of a relationship between flows and suspended sediment in GEP requires: (i) the setting of the functions, as discussed below; (ii) the fitness function; and (iii) a range of other parameters, but the default values, given in Table 11, were sufficient in this study. The following functions were investigated:

{+, −, ×}  (4a)

{+, −, ×, *x*}  (4b)

{+, −, ×, *x*<sup>2</sup>}  (4c)

{+, −, ×, *x*<sup>3</sup>}  (4d)

{+, −, ×, e<sup>*x*</sup>}  (4e)

{+, −, ×, ln(*x*)}  (4f)


| Function set | Metric | *Qt* | *Qt, St-1* | *Qt, Qt-1* | *Qt, Qt-1, St-1* | *Qt, Qt-1, Qt-2* | *Qt, Qt-1, Qt-2, St-1* |
|---|---|---|---|---|---|---|---|
| (4.a) | CC | 0.77 | 0.99 | 0.78 | 0.99 | 0.78 | 0.99 |
| | MAE | 2.79E5 | 3.88E4 | 2.78E5 | 3.25E4 | 2.79E5 | 3.37E4 |
| | RMSE | 4.13E5 | 8.43E4 | 4.07E5 | 7.74E4 | 4.05E5 | 7.98E4 |
| (4.b) | CC | 0.77 | 0.99 | 0.77 | 0.99 | 0.77 | 0.99 |
| | MAE | 2.82E5 | 3.87E4 | 2.81E5 | 3.80E4 | 2.78E5 | 3.27E4 |
| | RMSE | 4.15E5 | 8.42E4 | 4.14E5 | 8.36E4 | 4.10E5 | 7.75E4 |
| (4.c) | CC | 0.77 | 0.99 | 0.78 | 0.99 | 0.78 | 0.99 |
| | MAE | 2.82E5 | 3.89E4 | 2.78E5 | 3.25E4 | 2.76E5 | 3.45E4 |
| | RMSE | 4.15E5 | 8.43E4 | 4.08E5 | 7.74E4 | 4.05E5 | 8.02E4 |
| (4.d) | CC | 0.77 | 0.99 | 0.77 | 0.99 | 0.77 | 0.99 |
| | MAE | 2.43E5 | 3.89E4 | 2.76E5 | 3.21E4 | 2.76E5 | 3.41E4 |
| | RMSE | 4.05E5 | 8.43E4 | 4.12E5 | 7.76E4 | 4.13E5 | 8.03E4 |
| (4.e) | CC | 0.77 | 0.99 | 0.77 | 0.99 | 0.77 | 0.99 |
| | MAE | 2.81E5 | 3.88E4 | 2.81E5 | 3.56E4 | 2.42E5 | 3.64E4 |
| | RMSE | 4.15E5 | 8.43E4 | 4.14E5 | 8.16E4 | 4.00E5 | 8.23E4 |
| (4.f) | CC | 0.76 | 0.99 | 0.76 | 0.99 | 0.78 | 0.99 |
| | MAE | 2.56E5 | 3.89E4 | 2.64E5 | 3.25E4 | 2.60E5 | 3.21E4 |
| | RMSE | 4.09E5 | 8.42E4 | 4.10E5 | 7.72E4 | 4.02E5 | 7.72E4 |

**Table 4.a.** Statistical Performance of a Selection of Functions for the Training Period


The performance of each function was investigated in terms of CC, MAE, and RMSE and the results are shown in Table 4.a for the training periods. The results show that (i) the model performances are more sensitive to the choice of independent variables than the function choices; (ii) the models not including suspended sediment time series perform poorly; and (iii) the model performance is not overly sensitive to the choice of the function. Appendix I, Table 11 specifies the fitness function to be Root Relative Squared Errors (RRSE).


The performance of the GEP model is presented in Table 4.b, according to which there is little difference between the performances of a number of the alternative models, but (4.e) is selected in this study for prediction purposes (its expression tree is given in Figure 4) and is given below.


$$S\_t = S\_{t-1} + 16.77Q\_t - 16.77Q\_{t-1} - 13.87 \tag{5}$$
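As a usage illustration, Eq. (5) can be applied recursively, one day at a time, from an initial observed sediment load (a sketch; the flow values are hypothetical):

```python
def gep_step(S_prev, Q_now, Q_prev):
    """One-step prediction of suspended sediment load with Eq. (5)."""
    return S_prev + 16.77 * Q_now - 16.77 * Q_prev - 13.87

# Hypothetical short sequence of daily flows and an initial observed sediment load
Q = [4.5e5, 4.8e5, 4.4e5, 4.1e5]
S = [1.30e4]                              # S at day 0 (observed)
for t in range(1, len(Q)):
    S.append(gep_step(S[-1], Q[t], Q[t - 1]))
print(S)
```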


| Function set | Metric | *Qt* | *Qt, St-1* | *Qt, Qt-1* | *Qt, Qt-1, St-1* | *Qt, Qt-1, Qt-2* | *Qt, Qt-1, Qt-2, St-1* |
|---|---|---|---|---|---|---|---|
| (4.a) | CC | 0.83 | 0.99 | 0.84 | 0.99 | 0.84 | 0.99 |
| | MAE | 3.99E5 | 2.34E4 | 4.01E5 | 2.32E4 | 4.08E5 | 2.08E4 |
| | RMSE | 4.72E5 | 3.87E4 | 4.73E5 | 4.06E4 | 4.79E5 | 3.75E4 |
| (4.b) | CC | 0.83 | 0.99 | 0.84 | 0.99 | 0.84 | 0.99 |
| | MAE | 3.99E5 | 2.33E4 | 4.01E5 | 2.28E4 | 3.96E5 | 2.34E4 |
| | RMSE | 4.69E5 | 3.87E4 | 4.70E5 | 3.08E4 | 4.65E5 | 4.04E4 |
| (4.c) | CC | 0.83 | 0.99 | 0.84 | 0.99 | 0.84 | 0.99 |
| | MAE | 4.00E5 | 2.35E4 | 4.01E5 | 2.32E4 | 3.79E5 | 2.31E4 |
| | RMSE | 4.69E5 | 3.86E4 | 4.71E5 | 4.06E4 | 4.67E5 | 3.96E4 |
| (4.d) | CC | 0.83 | 0.99 | 0.83 | 0.99 | 0.83 | 0.99 |
| | MAE | 3.87E5 | 2.35E4 | 4.00E5 | 2.10E4 | 3.96E5 | 2.06E4 |
| | RMSE | 4.94E5 | 3.87E4 | 4.75E5 | 3.81E4 | 4.69E5 | 3.69E4 |
| (4.e) | CC | 0.83 | 0.99 | 0.84 | 0.99 | 0.84 | 0.99 |
| | MAE | 3.98E5 | 2.35E4 | 3.97E5 | 2.06E4 | 3.80E5 | 2.17E4 |
| | RMSE | 4.67E5 | 3.86E4 | 4.66E5 | 3.64E4 | 4.84E5 | 3.73E4 |
| (4.f) | CC | 0.83 | 0.99 | 0.83 | 0.99 | 0.83 | 0.99 |
| | MAE | 3.90E5 | 2.35E4 | 3.90E5 | 2.40E4 | 3.81E5 | 2.29E4 |
| | RMSE | 4.93E5 | 3.86E4 | 4.81E5 | 4.18E4 | 4.69E5 | 4.05E4 |

**Table 4.b.** Statistical analysis of the estimated values for the test period

Figure 8 compares modelled suspended sediment against their observed values, according to which the improvement by GEP is remarkable compared with SRC. Overall, the GEP modelling results follow observed values rather faithfully both in values and patterns, although there are still discrepancies in predicted values.

**Figure 8.** Comparison of Observed Suspended Sediment with that Modelled by GEP; (a) hydrograph, (b) cumulative values

#### **4.3. Implementation of ANN**



ANN implements another AI approach to the data represented in Figure 2 by another strategy, as described in Section 3.4.3. A preliminary investigation was carried out to make decisions on the choice of the models given in Table 2 (Models 1-6) and the ANN structure in terms of the neuron structure of the various layers. Table 5 presents the model structures investigated. The preliminary modelling task also included a normalisation function for the data. In this study, MATLAB was employed to develop the ANN model and its *mapstd* function was selected for the normalisation (further default values are given in Table 11). The investigated ANN model structures are defined in Table 5 and their results for both the training and testing periods are presented in Table 6.

| Model Identifier | Model Inputs | Training | Testing |
|---|---|---|---|
| Model 1 | *Qt* | 2-5-1 | 2-5-1 |
| Model 2 | *Qt, St-1* | 3-5-1 | 3-5-1 |
| Model 3 | *Qt, Qt-1* | 3-7-1 | 3-7-1 |
| Model 4 | *Qt, Qt-1, St-1* | 4-6-1 | 4-6-1 |
| Model 5 | *Qt, Qt-1, Qt-2* | 4-9-1 | 4-9-1 |
| Model 6 | *Qt, Qt-1, Qt-2, St-1* | 5-12-1 | 5-12-1 |

**Table 5.** ANN Structure (number of nodes in layers)

The performances of Models 1-6 are shown in Table 6 in terms of the values of three statistical indices: CC, MAE and RMSE. The performance of the different models in terms of CC is remarkably high, but Model 4 (*Qt, Qt-1, St-1*) produces smaller deviations and is selected for the final run.


| Model Inputs | Training: CC | Training: MAE | Training: RMSE | Testing: CC | Testing: MAE | Testing: RMSE |
|---|---|---|---|---|---|---|
| *Qt* | 0.999 | 2.32E4 | 2.70E4 | 0.999 | 2.16E4 | 2.30E4 |
| *Qt, St-1* | 0.999 | 2.59E4 | 3.12E4 | 0.996 | 2.17E4 | 2.64E4 |
| *Qt, Qt-1* | 0.999 | 2.00E4 | 2.79E4 | 0.981 | 4.19E4 | 4.84E4 |
| *Qt, Qt-1, St-1* | **0.999** | **2.01E4** | **2.51E4** | **0.998** | **1.18E4** | **1.37E4** |
| *Qt, Qt-1, Qt-2* | 0.991 | 7.57E4 | 8.42E4 | 0.942 | 8.47E4 | 8.41E4 |
| *Qt, Qt-1, Qt-2, St-1* | 0.995 | 5.66E4 | 6.42E4 | 0.976 | 4.63E4 | 5.47E4 |

**Table 6.** Statistical Performance of the Selected Model Structure for the Training and Testing periods

#### **4.4. Implementation of the MLR model**

The MLR modelling strategy was implemented using Mathematica to derive regression coefficients for both periods of model fitting (training, in the AI terminology) and testing; the statistical indices (CC, MAE and RMSE) are given in Table 7, which shows that Model 2 (*Qt, St-1*) performs relatively better than the others. The regression equation suggested by this technique is given by:

$$S\_t = 0.24Q\_t + 30.99S\_{t-1} \tag{6}$$


| Model Inputs | Training: CC | Training: MAE | Training: RMSE | Testing: CC | Testing: MAE | Testing: RMSE |
|---|---|---|---|---|---|---|
| *Qt* | 0.77 | 2.41E5 | 4.05E5 | 0.833 | 3.85E5 | 4.96E5 |
| *Qt, St-1* | **0.994** | **3.90E4** | **8.40E4** | **0.988** | **2.40E4** | **3.90E4** |
| *Qt, Qt-1* | 0.78 | 2.40E5 | 3.97E5 | 0.837 | 3.85E5 | 4.98E5 |
| *Qt, Qt-1, St-1* | 0.993 | 3.30E4 | 7.60E4 | 0.987 | 2.50E4 | 4.10E4 |
| *Qt, Qt-1, Qt-2* | 0.779 | 2.40E5 | 3.95E5 | 0.840 | 3.84E5 | 4.96E5 |
| *Qt, Qt-1, Qt-2, St-1* | 0.993 | 3.30E4 | 7.70E4 | 0.987 | 2.50E4 | 4.10E4 |

**Table 7.** Statistical analysis of the estimated values for the train and test period

#### **4.5. Implementation of the deterministic chaos model**


A visual assessment of the existence of chaotic behaviour in the suspended sediment time series was presented in Figure 9; although it was not conclusive evidence, it invoked the possibility of the existence of a low-dimensional chaos. Traditionally, several techniques are employed to show the existence of low-dimensional chaos, and below the results of the determination of the dimensions of the phase-state diagram are given:

1. Using the AMI method, the delay time *τ* is estimated for the data by plotting the values of the AMI, evaluated by the TISEAN package (Hegger *et al.*, 1999), against delay times progressively increased from 1 to 100. The value of the delay time is taken as the first (local) minimum in the variation of AMI with delay time. The results are shown in Figure (9.a), signifying a well-defined first minimum at a delay time of 94 days. The delay time is then used in the determination of the sufficient embedding dimension using the percentage of false nearest neighbours for the time series. Figure (9.b) shows the results of the false nearest neighbours method for the embedding dimension *m*, allowing it to vary from 1 to 40, and hence its value is 28.

2. The presence of chaotic signals in the data is further confirmed by the correlation dimension method. Figure (10.a) shows the relationship between the correlation function *C*(*r*) and the radius *r* (i.e. ln *C*(*r*) versus ln(*r*)) for increasing *m*, whereas Figure (10.b) shows the relationship between the correlation dimension values *D*2(*m*) and the embedding dimension values *m*. It can be seen from Figure (10.b) that the value of the correlation exponent increases with the embedding dimension up to a certain value and then saturates beyond it. The saturation of the correlation exponent is an indication of the existence of deterministic dynamics. The saturated correlation dimension is 3.5 (*D*2 = 3.5). The value of the correlation dimension also suggests the possible presence of chaotic behaviour in the dataset. The nearest integer above the correlation dimension value (*D*2 = 4) is taken as the minimum dimension of the phase space.

**Figure 9.** Analysis of the Phase-Space Diagram of Suspended Sediment Data in the Mississippi River basin; (9.a): Average Mutual Information; (9.b) Percentage of false nearest neighbours

**Figure 10.** Correlation Dimension Method to Identify the Presence of Chaos Signal in the Dataset; (10.a): Convergence of log*C*(*r*) versus log(*r*); (10.b): saturation of correlation dimension *D*2(*m*) with embedding dimension *m* – this signifies chaotic signals in the Dataset


3. The local prediction algorithm is used to predict the suspended sediment time series. The procedure involves varying the value of the embedding dimension in a range, say 3-8, and estimating the CC and RMSE. The embedding dimension with the highest coefficient of correlation is selected as the solution. These are given in Table 8 for the Mississippi River basin for the dataset with a daily time interval, as well as a selection of other time steps. It shows that the best predictions are achieved when the embedding dimension is *m* = 3.

**Table 8.** Local Prediction Using Different Embedding Dimension for the Mississippi River Dataset
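The sweep over trial embedding dimensions described in item 3 can be sketched as below, reusing the hypothetical local_predict helper given after Eq. (2l) (the range of *m*, the delay time and the series are illustrative assumptions):

```python
import numpy as np

def sweep_embedding(x, dims=range(3, 9), tau=10, p=1, n_test=50):
    """For each trial embedding dimension m, predict the last n_test points one step ahead
    with the local_predict sketch and report the coefficient of correlation (CC)."""
    x = np.asarray(x, dtype=float)
    scores = {}
    for m in dims:
        preds = [local_predict(x[:t], m, tau, p) for t in range(len(x) - n_test, len(x))]
        obs = x[len(x) - n_test:]
        scores[m] = np.corrcoef(obs, preds)[0, 1]
    return scores

# scores = sweep_embedding(x)           # x as in the earlier sketch
# best_m = max(scores, key=scores.get)  # embedding dimension with the highest CC
```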


**5. Inter-comparison of the models and discussion of results** 

| Model | Performance | Model Structure | Outcome | Comments |
|---|---|---|---|---|
| GEP | Good | Model 4 | Eq. (4.e) | – |
| MLR | Good | Model 2 | Eq. (6) | – |
| SRC | Poor | Model 1 | Eq. (3) | For rough-and-ready estimates |
| ANN | Good | Model 4 | – | The model is bounded to software |
| Chaos | Good | Model 0 | – | Needs expertise to implement |

**Table 9.** Qualitative Overview of the Performances of Various Modelling Strategies

**Figure 11.** Model Predictions for Suspended Sediment – Performances of GP, ANN, MLR, Chaos (closest to observed), and SRC (poor)

Table 9 summarises the performance and main features of each of the modelling strategies. The results presented so far confirm the experience that the traditional SRC model performs poorly and may only be used for rough-and-ready assessments. However, the results of the GEP model show that considerable improvements are possible by using it. This section also analyses the relative performance of the various modelling strategies. An overall visual comparison of all the results is presented in Figure 11, according to which the GEP, ANN, MLR and local prediction models perform remarkably well and similarly to one another.

Scatter diagrams are also a measure of performance. These are presented in Figure 12, which provides a quantitative basis for concluding that (i) SRC performs poorly and (ii) there is little to choose between the other models, although the performance of ANN stands out.

