Genetic Programming – New Approaches and Successful Applications

produce the best results.

3. A local prediction algorithm is used to predict the suspended sediment time series. The procedure involves varying the embedding dimension over a range, say 3-8, and estimating the CC and RMSE for each value. The embedding dimension with the highest correlation coefficient is selected as the solution. The results are given in Table 8 for the Mississippi River basin for the dataset with a daily time interval, as well as a selection of other time steps. The table shows that the best predictions are achieved when the embedding dimension is *m* = 3.

**Figure 9.** Analysis of the Phase-Space Diagram of Suspended Sediment Data in the Mississippi River basin; (9.a): Average Mutual Information; (9.b): Percentage of false nearest neighbours

**Figure 10.** Correlation Dimension Method to Identify the Presence of Chaos Signal in the Dataset; (10.a): Convergence of log *C*(*r*) versus log(*r*); (10.b): saturation of correlation dimension *D*<sub>2</sub>(*m*) with embedding dimension *m*; this signifies chaotic signals in the Dataset

| *m* | CC | RMSE |
|---|---|---|
| **3** | **0.988** | **4.00E4** |
| 4 | 0.988 | 4.10E4 |
| 5 | 0.986 | 4.30E4 |
| 6 | 0.985 | 4.60E4 |
| 7 | 0.986 | 4.40E4 |
| 8 | 0.987 | 4.20E4 |

**Table 8.** Local Prediction Using Different Embedding Dimension for the Mississippi River Dataset

**5. Inter-comparison of the models and discussion of results**

Table 9 summarises the performance and main features of each of the modelling strategies. The results presented so far confirm the experience that the traditional SRC model performs poorly and may only be used for rough-and-ready assessments. In contrast, the results of the GEP model show that considerable improvements are likely when it is used. This section also analyses the relative performance of the various modelling strategies. An overall visual comparison of all the results is presented in Figure 11, according to which the GEP, ANN, MLR and local prediction models perform remarkably well and similarly to one another.
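For reference, the local prediction model that enters this comparison was selected by sweeping the embedding dimension and keeping the value with the highest CC (Table 8). The following is a minimal sketch of such a sweep, not the chapter's implementation. It assumes a zeroth-order local model (averaging the successors of the *k* nearest delay vectors), a delay time of 1 and *k* = 3, none of which are stated in the text. All function names are hypothetical, and the series is a synthetic logistic-map signal standing in for the sediment record.

```python
import math

def embed(series, m, tau=1):
    """Build delay vectors (x_j, x_{j+tau}, ..., x_{j+(m-1)tau}) and the value following each."""
    span = (m - 1) * tau
    vectors, targets = [], []
    for j in range(len(series) - span - 1):
        vectors.append([series[j + i * tau] for i in range(m)])
        targets.append(series[j + span + 1])
    return vectors, targets

def local_predict(train, test, m, tau=1, k=3):
    """One-step-ahead predictions: average the successors of the k nearest training vectors."""
    lib_vecs, lib_next = embed(train, m, tau)
    span = (m - 1) * tau
    # Prepend the tail of the training series so the first test value has a full delay vector.
    joined = train[-(span + 1):] + test
    query_vecs, observed = embed(joined, m, tau)
    predicted = []
    for q in query_vecs:
        ranked = sorted((math.dist(q, v), nxt) for v, nxt in zip(lib_vecs, lib_next))
        nearest = ranked[:k]
        predicted.append(sum(nxt for _, nxt in nearest) / len(nearest))
    return observed, predicted

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def cc(obs, pred):
    """Pearson correlation coefficient."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def sweep(train, test, dims=range(3, 9), tau=1, k=3):
    """Evaluate each embedding dimension and select the one with the highest CC."""
    results = {}
    for m in dims:
        obs, pred = local_predict(train, test, m, tau, k)
        results[m] = (cc(obs, pred), rmse(obs, pred))
    best_m = max(results, key=lambda m: results[m][0])
    return best_m, results

def logistic_series(n, x0=0.3, r=3.9):
    """Synthetic chaotic series (illustrative stand-in, not the Mississippi data)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

series = logistic_series(600)
train, test = series[:500], series[500:]
best_m, results = sweep(train, test)
for m, (c, e) in results.items():
    print(f"m={m}  CC={c:.3f}  RMSE={e:.4f}")
print("selected m =", best_m)
```

With real data, `train` and `test` would be the training and testing portions of the sediment series, and the printed rows would play the role of Table 8.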


**Table 9.** Qualitative Overview of the Performances of Various Modelling Strategies

**Figure 11.** Model Predictions for Suspended Sediment – Performances of GP, ANN, MLR, Chaos (closest to observed), and SRC (poor)

Scatter diagrams are also a measure of performance. These are presented in Figure 12, which provides a quantitative basis for concluding that (i) SRC performs poorly and (ii) there is little to choose between the other models, although the performance of ANN stands out.

Inter-Comparison of an Evolutionary Programming Model of Suspended Sediment Time-Series with Other Local Models

**Figure 12.** Scatter between Modelled and Observed Suspended Sediment Load

The relative performances of the GEP, ANN, MLR and local prediction models are still not distinguishable in Figure 12, and attention is therefore focused on the differences between the GEP and ANN models and their corresponding observed values. Figure 13 shows the respective results for both models; that of ANN is remarkable, as the differences are nearly zero. The differences for the local prediction and MLR models are very close to those of GEP.

**Figure 13.** Performances of the ANN and GEP Models – y-ordinates: observed – modelled values

Due to the importance of the volume of transported sediment, the total predicted values are also compared with the observed values for the testing period, and the results are presented in Table 10. The table shows that the traditional SRC model is in error by nearly 50%, whereas the other models perform well, with ANN having the lowest error. It is also noted that, despite its good performance, the ANN model is not transferable, unlike the GEP models. The implementation of both ANN and deterministic chaos models requires considerable expertise.

| Model | Actual Val. (ton/year) | Estimation Val. (ton/year) | Dif. Val. (%) |
|---|---|---|---|
| **SRC** | 1.65E8 | 3.06E8 | +46 |
| **GEP** | 1.65E8 | 1.65E8 | -0.4 |
| **ANN** | 1.65E8 | 1.64E8 | -0.3 |
| **MLR** | 1.65E8 | 1.66E8 | +0.6 |
| **Chaos** | 1.65E8 | 1.66E8 | +0.7 |

**Table 10.** Total Volume of Suspended Sediment Predicted by each of the Models at Gauging Station for the Mississippi River basin

The chapter presents the performance of the GEP model, as a variation of evolutionary programming, in forecasting the suspended sediment load of the Mississippi River, USA. GEP is just one modelling strategy; any other relevant strategy is just as valid if its performance is satisfactory. The overall results show that the information contained in the observed data can be treated by the following modelling strategies:

1. **Evolutionary computing**: this produced a formula to forecast the future values in terms of recorded values of flows and suspended sediment. The results show that the strategy can be successful in identifying a number of different formulae.
2. **Emulation of the working of the brain**: this successfully fitted an inbuilt polynomial to the data. It performs better than the other tested models but is not readily transferable, as it resides in particular software applications.
3. **Regression analysis**: this produced a regression equation, according to which the future values would regress towards average recorded values, in spite of the presence of noise.
4. **Deterministic chaos**: this produced future values of suspended sediment load by identifying an attractor towards which the system performance would converge even when the internal system behaves erratically.

The only common feature in the above modelling strategies is their use of optimisation techniques. Otherwise, they are greatly different from one another but remarkably, they produce models fit for purpose and can explain the data. Undoubtedly, the data can be explained by many more sets of equations or by other possible strategies. This emphasises that models are just tools and the modelling task is to test the performance of the various models to add confidence to the results. Yet the poor performance of the traditional SRC underlines the fact that a good performance cannot be taken for granted.
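The size of the SRC shortfall can be checked directly from the totals in Table 10. The sketch below recomputes the percentage difference under two conventions, since the chapter does not state which one it uses. For the SRC row, dividing by the estimated total gives about 46%, in line with the tabulated value, whereas dividing by the actual total gives about 85%. Reading the table as normalised by the estimate is therefore an inference, not something the source confirms. For the other rows the three-significant-figure totals are too coarse to reproduce the tabulated decimals. The function names are illustrative.

```python
# Totals transcribed from Table 10 (ton/year):
# model -> (actual, estimated, tabulated difference in %)
TABLE_10 = {
    "SRC": (1.65e8, 3.06e8, 46.0),
    "GEP": (1.65e8, 1.65e8, -0.4),
    "ANN": (1.65e8, 1.64e8, -0.3),
    "MLR": (1.65e8, 1.66e8, 0.6),
    "Chaos": (1.65e8, 1.66e8, 0.7),
}

def pct_of_actual(actual, estimated):
    """Difference as a percentage of the observed total."""
    return 100.0 * (estimated - actual) / actual

def pct_of_estimate(actual, estimated):
    """Difference as a percentage of the modelled total (assumed convention)."""
    return 100.0 * (estimated - actual) / estimated

for model, (actual, estimated, tabulated) in TABLE_10.items():
    print(
        f"{model:6s} tabulated={tabulated:+6.1f}%  "
        f"of actual={pct_of_actual(actual, estimated):+6.1f}%  "
        f"of estimate={pct_of_estimate(actual, estimated):+6.1f}%"
    )
```

Under either convention the conclusion is unchanged: the SRC total is far from the observed total, while the other four models agree with it to within about one per cent.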

A review of the data (in Section 3) shows that the overall contribution of the data points in the test period is average. In terms of kurtosis, the annual hydrograph is less peaked and flatter, but at the same time the suspended sediment load during the year was significantly high. Thus, the minimum values during this year were significantly above the average but persistent and less dynamic. However, all four modelling strategies coped well with these data peculiarities. If the data during the test period have a more pronounced feature that is not common during the training period, the various local modelling strategies are likely to perform poorly, each in its own unique way. One of the greatest tasks of research in modelling should be to investigate and understand these unique features and not to sweep them under the carpet.
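As an illustration of the kurtosis argument, the sketch below computes excess kurtosis from the second and fourth central moments. A value below zero indicates a distribution flatter than the normal distribution, and a value above zero a more peaked one. The chapter does not say which kurtosis estimator was applied in Section 3, so the population-moment form used here is an assumption, and both series are invented purely for illustration.

```python
def excess_kurtosis(values):
    """Excess kurtosis m4 / m2**2 - 3 from population (biased) central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0

# Invented series: one dominated by a single sharp peak, one broad and flat.
peaked_year = [1, 1, 1, 1, 10, 1, 1, 1, 1, 1]
flat_year = [4, 5, 6, 7, 8, 8, 7, 6, 5, 4]

print("peaked:", round(excess_kurtosis(peaked_year), 2))
print("flat:  ", round(excess_kurtosis(flat_year), 2))
```

A year resembling `flat_year` has few extreme values, which is consistent with the description of the test year as persistent and less dynamic.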

A general view projected by the investigation in this chapter is that the performance of modelling techniques must not be the only basis for practical applications. Equal attention must also be paid to the quality of the data used. If the data suffer from inherent uncertainties, no model, however good, will compensate for the inherent shortfalls.

**7. Appendix**

**Appendix I**

Table 11 Default Values Employed in Implementing GEP and ANN Models

| GP: Training parameters | Values | ANN: Training parameters | Values |
|---|---|---|---|
| Crossover rate | 0.1 | Goal | Mean Square Error |
| Mutation rate | 0.044 | Epochs | 10 - 100 |
| Inversion | 0.1 | Training algorithm | Trainlm |
| IS Transposition | 0.1 | | |
| RIS Transposition | 0.1 | | |
| 1-point Recombination | 0.3 | | |
| 2-point Recombination | 0.3 | | |
| Gene Recombination | 0.1 | | |
| Gene Transposition | 0.1 | | |
| Population (Chromosome) size | 30 | | |
| Head Size | 7 | | |
| Number of Genes | 3 | | |
| Linking Function | Addition | | |
| Random Numerical Constants | Yes | | |
| Number of generation | 1000 | | |
| Arithmetic functions | (4.a)-(4.f) | | |
| Fitness Function | RRSE | | |

RRSE: Root Relative Squared Errors

**Table 11.** Default Parameter Values Used by the Model

**Symbols**

| Group | Symbol | Description |
|---|---|---|
| | SRC | Sediment Rating Curve |
| | MLR | Multi Linear Regression |
| | *Q<sub>t</sub>* | Discharge Series |
| | *S<sub>t</sub>* | Sediment Series |
| MLR | *X<sub>i</sub>* | Term of Various Model |
| | *a<sub>i</sub>* | Values Called Regression |
| Chaos | *τ* | Delay Time |
| | *C<sub>m</sub>*(*r*) | Fraction of states |
| | *H* | Heaviside Step |
| | *N* | Number of Points |
| | *D*<sub>2</sub> | Correlation Exponent |
| | *Y<sub>j</sub>* | Vectors of Dimension |
| | *M* | Dimensional phase Step |
| | *A* | Jacobian Matrix |
| | *x*(*t*) | different neighbors |
| | *R* | Radius Spherical |
| | *C*(*r*) | Correlation Function |
