218 Genetic Programming – New Approaches and Successful Applications

models were developed with dimensionless variables, separately for the main channel and the floodplain. The velocity predictions for both the floodplain and the main channel showed good correlation with measured values. However, the resulting expressions were complex. A dimensionally aware GP was then used to predict the velocity separately in the main channel and the floodplain. The symbolic expressions induced by the dimensionless GP for the floodplain and main channel performed marginally better than those from the dimensionally aware GP. However, the expressions were more complex and not particularly useful for knowledge induction. The dimensionally aware GP was shown to hold more scientific information, as units of measurement were included, although it was also shown to be open ended in that it does not strictly adhere to the dimensional analysis framework, thereby allowing improved goodness-of-fit whilst yielding on goodness-of-dimension. The paper provides no information about the initial values of the GP parameters used in evolving the GP model.

GP was applied in [37] to determine Chezy's roughness coefficient for corrugated channels in wake-interference flow, i.e. hyper-turbulent flow. The GP models were calibrated using data from experiments on three plastic corrugated pipes with varying discharge and slope. GP quite easily and quickly supplied at least two good formulae that fit the experimental data better and are more parsimonious than the monomial formula (mathematical). Moreover, GP supplied six parsimonious expressions (one or two constants compared to four for the monomial formula) for Chezy's resistance coefficient, all confirming the dependence on hydraulic radius, slope and roughness index. It can be said that the two new formulae for Chezy's resistance coefficient, derived from these GP formulae by means of 'mathematical/physical post-refinement', are suitable for explaining the effect of the macro-roughness elements, with respect to the behavior of rough commercial channels and their traditional expressions for resistance coefficients. The work indicated that this approach, which combines data-mining techniques with theoretical understanding, provides very good results. It was also commented that, strictly speaking, GP is a data-driven technique, but prior knowledge during the setting up of the evolutionary search and final physical post-refinement of the hypothesis should bring it very close to a white-box technique, especially when GP is used in scientific discovery problems. The initial model parameters can be found in the original paper. To save space the list is not provided here.

GP was proposed in [38] as an alternative approach for estimating relative scour depth from field data. A comparison of the GP model with an ANN showed that the GP model predicts the scour depth well. The discharge intensity and height of fall were used as inputs to estimate the scour depth below tail water. The predictive ability of this approach is, however, clouded by the very small number of data sets (91 in total) used for calibration and testing of the model. The values of the initial model parameters can be found in the original paper.

**9. Case study I: Soft computing approach for real-time estimation of missing wave heights**

The work applied GP to retrieve missing or lost wave data at a particular location using the wave heights at other locations in the region. Six regional networks (with buoys 42001, 42003, 42007, 42036, 42039 and 42040) were developed in the Gulf of Mexico (Figure 2), off the USA coastline, to estimate the wave heights at one location using the wave heights at the other five locations in the network. The required data from these six buoys were measured by the National Data Buoy Center (NDBC, http://www.ndbc.noaa.gov) of the National Oceanic and Atmospheric Administration of the USA (NOAA, http://www.noaa.gov). The wave data common to all six locations for the years 2002-2004 were used in the present work. The networks were developed by taking one station as the target location and the remaining five locations as inputs, turn by turn. Approximately 70% of the total values were used to calibrate the model and the remainder was kept unseen for testing. Particular attention was paid to an event that occurred during Hurricane Ivan in 2004 at buoy 42040, involving a significant wave height of 15.96 m, in order to study the performance of the developed models during extreme events. It is to be noted that the exercise was one of estimation and not of forecasting, for which both tools did not perform well, as noted in the section on applications of GP in Ocean Engineering.

Thus a network was developed with wave buoy 42040 as the target and buoys 42001, 42003, 42007, 42036 and 42039 as inputs. Along with 42040, three other locations, namely 42003, 42007 and 42039, also experienced their largest ever wave heights of 11.04, 9.09 and 12.05 m respectively, making the entire event truly extraordinary, with a return period of over 5000 years [39]. The initial parameters selected for a GP run were as follows: initial population size 500, mutation frequency 95%, and crossover frequency 50%. The fitness criterion was the mean squared error.
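The experimental setup described so far can be sketched in a few lines of Python. This is only an illustrative outline, not the code used in [6]: the function names, the settings dictionary and the chronological 70/30 split are assumptions made here, since the source states only the buoy identifiers, the approximate split ratio, the three GP parameters and the use of mean squared error as fitness.

```python
# Buoy identifiers taken from the text.
BUOYS = ["42001", "42003", "42007", "42036", "42039", "42040"]

# GP run parameters quoted in the text (the key names are hypothetical).
GP_SETTINGS = {
    "population_size": 500,
    "mutation_frequency": 0.95,
    "crossover_frequency": 0.50,
}


def build_networks(buoys):
    """Return (input_buoys, target_buoy) pairs, each buoy being the target once."""
    return [([b for b in buoys if b != target], target) for target in buoys]


def split_calibration_testing(records, calibration_fraction=0.7):
    """Split records into calibration and testing parts.

    Assumption: the split is chronological (first ~70% for calibration).
    The source does not say how the 70% was selected.
    """
    cut = int(len(records) * calibration_fraction)
    return records[:cut], records[cut:]


def mean_squared_error(observed, estimated):
    """Fitness criterion named in the text: mean of squared differences."""
    return sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)


networks = build_networks(BUOYS)
for inputs, target in networks:
    print(f"target {target} <- inputs {', '.join(inputs)}")
```

The last pair produced by `build_networks` is the network discussed above, with 42040 as target and the other five buoys as inputs.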

Additionally, a three-layer feed-forward neural network (ANN) was developed for the same buoy network. The results were also compared with those of a large-scale continuous wave modeling and forecasting system (NOAA's WAVEWATCH III model), which follows a physics-based approach. Though WAVEWATCH III is a continuously running forecasting model, it is the only source of information on the wave environment at a location in the absence of reliable observed data, and its results were therefore used for comparison. The GP model estimated a peak wave height of 13.67 m against the observed 15.96 m, compared with 9.05 m for the ANN model and 7.82 m for WAVEWATCH III, which was an excellent result as far as the GP approach is concerned. Figure 3 shows the wave plot at 42040 in testing.
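To put the peak estimates quoted above on a common footing, the shortfall of each model relative to the observed peak can be computed directly. The snippet below is a small worked example using only the four values given in the text; the function and variable names are illustrative.

```python
# Peak significant wave heights at buoy 42040 during Hurricane Ivan (from the text).
OBSERVED_PEAK_M = 15.96
PEAK_ESTIMATES_M = {"GP": 13.67, "ANN": 9.05, "WAVEWATCH III": 7.82}


def underestimation_percent(observed, estimated):
    """Percentage by which the estimate falls short of the observed value."""
    return 100.0 * (observed - estimated) / observed


for model, estimate in PEAK_ESTIMATES_M.items():
    shortfall = underestimation_percent(OBSERVED_PEAK_M, estimate)
    print(f"{model}: {estimate:.2f} m, {shortfall:.1f}% below the observed peak")
```

On these figures the GP estimate falls about 14% short of the observed peak, against roughly 43% for the ANN and 51% for WAVEWATCH III.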

From the results of all the models developed by both approaches (ANN and GP), it was observed that all models performed reasonably well in testing, as is evident from the wave height plots and scatter plots, with the correlation coefficient ranging from 0.85 to 0.98, MAE from 0.13 to 0.28 m, RMSE from 0.20 to 0.45 m and coefficient of efficiency from 0.67 to 0.96. An attempt was made to remove 42001 from the network, as it lies away from the prevailing wind direction, by training a separate GP model with 42003, 42007, 42036 and 42039 as 'input buoys' and 42040 as 'target buoy'. Although the correlation coefficient increased, the peak prediction for the extreme event of Hurricane Ivan was not within a fair range of accuracy. Because the network performed better with buoy 42001 included, especially for the extreme event, buoy 42001 was retained in the network. It was also found that 42039 was a potential candidate for redeployment at any other suitable position outside the network, as the buoy network developed for 42039 provided the wave heights from those at the other five locations with the best accuracy among all the networks (r = 0.98). Figure 4(a, b) shows the scatter plots for buoy 42039. Table 3, reproduced from [6], gives the details of the developed networks along with the correlation coefficient between the model-estimated and observed values for both the GP and ANN models. In general it was shown that GP was superior to the other soft computing tool, ANN, and to the numerical model WAVEWATCH III in retrieving the missing wave heights, including during extreme events, and in supporting redeployment of a buoy at another location outside the network.

Genetic Programming: A Novel Computing Approach in Modeling Water Flows 221

**Figure 2.** Study area and Buoy Locations (Ref: [6])

**Figure 3.** Wave height comparison at 42040 during Hurricane Ivan (Ref: [6])

(Plot of SWH (m) against Time (hr) for the observed, GP, ANN and WAVEWATCH series; annotation: 16/9/2004 21hr.)

**Figure 4.** a. Scatter plot for buoy 42039 (GP approach); b. Scatter plot for buoy 42039 (ANN approach) (Ref: [6])

(Scatter plots of GP estimated SWH (m) and ANN estimated SWH (m) against Observed SWH (m), both axes running from 0 to 15.)

**Table 3.** Results of buoy networks [6]

| Network | Input buoys | Target buoy | r (ANN) | r (GP) |
|---|---|---|---|---|
| BN1 | 42003, 42007, 42036, 42039, 42040 | 42001 | 0.85 | 0.88 |
| BN2 | 42001, 42007, 42036, 42039, 42040 | 42003 | 0.87 | 0.91 |
| BN3 | 42001, 42003, 42036, 42039, 42040 | 42007 | 0.90 | 0.92 |
| BN4 | 42001, 42003, 42007, 42039, 42040 | 42036 | 0.92 | 0.94 |
| BN5 | 42001, 42003, 42007, 42036, 42040 | 42039 | 0.98 | 0.98 |
| BN6 | 42001, 42003, 42007, 42036, 42039 | 42040 | 0.94 | 0.97 |

**10. Case study II: Comparison of data-driven modelling techniques for river flow forecasting**

In this case study GP was used to predict average daily flow values one day in advance at two locations, Rajghat and Mandaleshwar, in the Narmada basin, India, using the previous values of measured streamflow at these two locations. The observations of daily average streamflow at both stations for the years 1987–1997 were obtained from the Central Water Commission, Narmada Division, Bhopal, India. Considering the variations in daily streamflow values, four separate models were prepared for the monsoon months of July, August, September and October, along with one common model for the non-monsoon months of November–June. Thus five models were developed for each station (10 models in total) to predict discharge one day in advance. For a fair comparison, ANN and model tree approaches were also employed alongside GP to develop the models. The number of antecedent discharge values used for predicting discharge one day in advance was decided by carrying out an auto-correlation analysis.
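For the river flow case study, the number of antecedent discharge values is said to come from an auto-correlation analysis. A minimal sketch of such an analysis is given below. The selection rule used here (keep consecutive lags while the autocorrelation stays above an approximate 95% white-noise bound of 1.96/sqrt(n)) is an assumption for illustration; the source does not state which criterion was applied, and the discharge values in the usage lines are made up.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


def significant_lags(series, max_lag, threshold=None):
    """Return consecutive lags 1..k whose autocorrelation exceeds the threshold.

    Assumed rule: stop at the first lag that drops below the bound.
    Default bound: 1.96 / sqrt(n), an approximate 95% limit for white noise.
    """
    if threshold is None:
        threshold = 1.96 / len(series) ** 0.5
    lags = []
    for lag in range(1, max_lag + 1):
        if abs(autocorrelation(series, lag)) < threshold:
            break
        lags.append(lag)
    return lags


# Hypothetical daily discharge values, for demonstration only.
discharge = [120.0, 135.0, 160.0, 210.0, 260.0, 240.0, 200.0, 170.0,
             150.0, 140.0, 132.0, 128.0, 125.0, 150.0, 190.0, 230.0]
print(significant_lags(discharge, max_lag=5))
```

The length of the returned list would then give the number of previous daily discharges supplied as model inputs.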
