**10. Case study II: Comparison of data-driven modelling techniques for river flow forecasting**

In this case study, GP was used to predict average daily flow one day in advance at two locations, Rajghat and Mandaleshwar, in the Narmada basin, India, using previously measured streamflows at the same two locations. Daily average streamflow observations at both stations for the years 1987–1997 were obtained from the Central Water Commission, Narmada Division, Bhopal, India. Because daily streamflow varies considerably through the year, four separate models were prepared for the monsoon months of July, August, September and October, along with one common model for the non-monsoon months of November–June. Thus five models were developed for each station (10 models in total) to predict discharge one day in advance. To allow a fair comparison, ANN and model tree (MT) approaches were employed alongside GP to develop the same models. The number of antecedent discharge values used to predict the next day's discharge was decided by autocorrelation analysis.
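The lag-selection step can be sketched as follows. This is only an illustration under stated assumptions: the chapter does not say which autocorrelation cut-off was applied, so the `threshold` of 0.5, the `max_lag` of 10 and all function names here are hypothetical.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation of a 1-D series for lags 1..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

def select_lags(series, max_lag=10, threshold=0.5):
    """Count the leading lags whose autocorrelation stays at or above
    `threshold` (an assumed cut-off, not taken from the chapter)."""
    n = 0
    for value in autocorrelation(series, max_lag):
        if value < threshold:
            break
        n += 1
    return n

def lagged_matrix(series, n_lags):
    """Build input/target pairs Q(t-n_lags), ..., Q(t-1) -> Q(t); n_lags >= 1."""
    x = np.asarray(series, dtype=float)
    inputs = np.column_stack([x[i:len(x) - n_lags + i] for i in range(n_lags)])
    target = x[n_lags:]
    return inputs, target
```

With a daily discharge array `q`, `lagged_matrix(q, select_lags(q))` would give the antecedent flows as model inputs and the next day's flow as the target, which is the input structure described above.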

The GP models were developed with mean squared error as the main fitness function, an initial population size of 2048, a mutation frequency of 95% and a crossover frequency of 53%. The same data division was used for both the ANN and GP models so that their results could be compared. All the forecasting models were tested on unseen inputs. Their performance was judged quantitatively by the correlation coefficient (r) between the observed and forecasted values and by the root mean square error (RMSE), and qualitatively by scatter plots of observed against forecasted values. Hydrographs were also plotted to visualize the behavior of the forecasting models, particularly for extreme events (peaks).
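The two quantitative measures named above can be computed as in the minimal sketch below. The function names are illustrative assumptions; the formulas are the standard Pearson correlation coefficient and root mean square error.

```python
import numpy as np

def correlation_coefficient(observed, forecast):
    """Pearson correlation coefficient r between observed and forecasted values."""
    o = np.asarray(observed, dtype=float)
    f = np.asarray(forecast, dtype=float)
    do, df = o - o.mean(), f - f.mean()
    return float(np.sum(do * df) / np.sqrt(np.sum(do ** 2) * np.sum(df ** 2)))

def rmse(observed, forecast):
    """Root mean square error, in the same units as the discharge (m³/s)."""
    o = np.asarray(observed, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((o - f) ** 2)))
```

The two measures are complementary: a forecast that is offset from the observations by a constant amount still has r = 1, while its RMSE equals the size of the offset. This is one reason to report both.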

Genetic Programming: A Novel Computing Approach in Modeling Water Flows 223


For Rajghat in the month of July, the ANN model exhibited a reasonable performance in testing, with an r value of 0.75 between the observed and forecasted discharges. The GP model showed a better r value of 0.78 and performed better for higher values of streamflow, although it overpredicted in some instances. The MT model gave a lower r value of 0.7, and its predictions for high streamflows were poor compared with those of the ANN and GP models. The scatter plot (Fig. 5) between the observed and forecasted discharges confirmed this, with a balanced scatter except at the high values of measured streamflow.

**Figure 5.** Scatter plot for RajJuly Model

For the months of August and September, the models showed similar performance, with the GP models performing better than their ANN and MT counterparts (r_GP = 0.75, r_ANN = 0.7, r_MT = 0.72 for RajAug and r_GP = 0.79, r_ANN = 0.76, r_MT = 0.78 for RajSept). For the October model, the predicted discharges in testing were in close agreement with the observed values for both models, as shown by the discharge hydrograph (Fig. 6). This was supported by high correlation coefficients in testing for all three models (r = 0.92 for ANN and GP and r = 0.87 for MT).

The Mandaleshwar models behaved similarly to the Rajghat models, with correlation coefficients of r > 0.7 for all the ANN, GP and MT models. For the month of August the performance of all models was reasonable, with r values of 0.74, 0.78 and 0.71 for the ANN, GP and MT models respectively. The other monthly ANN, GP and MT models also performed well, with high correlation coefficients in testing (r > 0.86). It was again observed that the GP models worked better when predicting extreme events. The maximum observed discharge of 3790 m³/s was predicted as 1742 m³/s by the ANN model, 3342 m³/s by the GP model and 1718 m³/s by the MT model. Figure 7 shows the discharge hydrographs for the ManNovJune models. The RMSE values showed a trend similar to that of the correlation coefficients.
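To put the peak predictions quoted above on a common scale, the short sketch below expresses each one as a percentage error relative to the observed maximum. The discharge values are those reported in the text; the helper name `peak_error_percent` is an illustrative assumption.

```python
def peak_error_percent(observed_peak, predicted_peak):
    """Signed relative error in percent; negative means underprediction."""
    return 100.0 * (predicted_peak - observed_peak) / observed_peak

observed = 3790.0  # maximum observed discharge, m³/s
predicted = {"ANN": 1742.0, "GP": 3342.0, "MT": 1718.0}  # m³/s

errors = {name: peak_error_percent(observed, q) for name, q in predicted.items()}
for name, err in errors.items():
    print(f"{name}: {err:.1f}%")
# ANN: -54.0%
# GP: -11.8%
# MT: -54.7%
```

On this measure the GP model underpredicted the peak by about 12%, while the ANN and MT models each captured less than half of it.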

Thus the GP technique outperformed both ANN and MT in almost all cases in terms of overall prediction accuracy. GP, being based on evolutionary principles, works quite differently from the ANN technique: it does not involve any transfer function, and instead evolves generations of "offspring" based on the "fitness criteria" and genetic operations. This seems to capture the underlying trends better than the ANN technique does. It can therefore be said that ANN and MT performed almost equally, while GP performed better than both where prediction accuracy for both normal and extreme events is concerned.

**11. Concluding remarks and future scope**

222 Genetic Programming – New Approaches and Successful Applications


Applications of GP for modeling water flows were discussed in the preceding sections of this chapter. Every attempt has been made to give readers the details of the GP techniques and the parameters employed in each work; however, to keep the chapter within its stipulated length, readers are sometimes referred to the original papers. Details about the data are also provided at appropriate places, and interested readers may contact the authors, or download the data from the relevant web sites where possible, to repeat similar exercises.

The applications came from three areas of water flow: Ocean Engineering, Hydrology and Hydraulics. All of them showed that Genetic Programming can be employed to model natural random processes whose underlying phenomena are complex, and in the works discussed its results were found to be superior to those of other contemporary soft computing techniques. It was also seen, however, that the research community has not yet explored the tool to its full capacity in any of these fields. The developed GP models also need to be applied at the operational level, which calls for a partnership between researchers and practitioners. GP models can certainly work as a supplementary tool, if not as a replacement technique.

It can be said that the early days of GP modeling are over and that the tool now needs to be used more judiciously, for problems worthy of its use. Otherwise a stage will be reached wherein GP is used merely because data are available. Its proper use is for phenomena that are difficult to explain and model. If the technique is to endure, it needs to be explored further for more challenging problems such as the modeling of infiltration, high flood events, hurricane paths, storm surge and tsunami water levels, to name a few.


**References**

[2] The ASCE Task Committee (2000) Artificial neural networks in hydrology. I: preliminary concepts. J. Hydrol. Engg. ASCE. 5(2): 115–123.

[3] Maynard S (1975) The Theory Of Evolution. Penguin. London.

[4] Babovic V, Keijzer M (2000) Genetic programming as a model induction engine. Journal of Hydroinformatics. 2(1): 35–61.

[5] Koza J (1992) Genetic Programming: On the Programming of Computers by Means of Natural Selection. A Bradford Book. MIT Press.

[6] Londhe S (2008) Soft computing approach for real-time estimation of missing wave heights. Ocean Engineering. 35: 1080–1089.

[7] Brameier M (2004) On linear genetic programming. Ph.D. thesis. University of Dortmund.

[8] Guven A (2009) Linear genetic programming for time-series modelling of daily flow rate. J. Earth Syst. Sci. 118(2): 137–146.

[9] Jain P, Deo M (2006) Neural networks in ocean engineering. Int. Journal of Ships and Offshore Structures. 1: 25–35.

[10] Maier H, Dandy G (2000) Neural networks for prediction and forecasting of water resources variables: a review of modelling issues and applications. Environ. Model. Soft. 15: 101–124.

[11] Dawson C, Wilby R (2001) Hydrological modelling using artificial neural networks. Progr. Phys. Geogr. 25(1): 80–108.

[12] Kambekar A, Deo M (2012) Wave Prediction Using Genetic Programming And Model Trees. Journal of Coastal Research. 28(1): 43–50.

[13] Londhe S, Charahate S (2010) Comparison of data-driven modelling techniques for river flow Forecasting. Hydrological Sciences. 55(7): 1163–1173.

[14] Kalra R, Deo M (2007) Genetic Programming to retrieve missing information in wave records along the west coast of India. Applied Ocean Research. 29: 99–111.

[15] Ustoorikar K, Deo M (2008) Filling up Gaps in wave data with Genetic Programming. Marine Structures. 21: 177–195.

[16] Charhate S, Deo M, Sanil Kumar V (2007) Soft and Hard Computing Approaches for Real Time Prediction of Coastal Currents in a Tide Dominated Area. Journal of Engineering for the Maritime Environment. Proceedings of the Institution of Mechanical Engineers, London, M4, 221: 147–163.

[17] Gaur S, Deo M (2008) Real time wave forecasting using genetic programming. Ocean Engineering. 35: 1166–1175.

[18] Jain P, Deo M (2008) Artificial intelligence tools to forecast ocean waves in real time. The Open Ocean Engineering Journal. 1: 13–21.

[19] Kambekar A, Deo M (2010) Wave simulation and forecasting using wind time history and data driven Methods. Ships and Offshore Structures. 5(3): 253–266.

[20] Ghorbani M, Khatibi R, Aytek A, Makarynskyy O, Shiri J (2010a) Sea water level forecasting using genetic programming and comparing the performance with Artificial Neural Networks. Computers and Geosciences. 36: 620–627.

[21] Ghorbani M, Makarynskyy O, Shiri J, Makarynska D (2010b) Genetic Programming for Sea Level Predictions in an Island Environment. International Journal of Ocean and Climatic Systems. 1(1): 27–35.

**Figure 6.** RajOct Model results [13]

**Figure 7.** ManNovJune Model results [13]
