**1. Introduction**


With the advent of computers, a wide range of mathematical and numerical models has been developed to predict or approximate parts of the hydrologic cycle. Before conceptual or process-based models became available, physical hydraulic models, which are reduced-scale representations of large hydraulic systems, were commonly used in water resources engineering. Rapid development of computational systems and of numerical solutions to complex differential equations enabled the development of conceptual models to represent physical systems in almost all fields, including hydrological and water resources systems. Thus, in the last two decades a large number of mathematical models have been developed to represent different processes in the hydrological cycle. Hydrological models can be broadly classified into three types: physical models, conceptual models, and models fitted to data by mathematical and statistical techniques.


Physical models are reduced-scale representations of the actual hydrological system, and the responses obtained from these models are up-scaled to estimate the responses of the real system. Conceptual models are based on the individual processes or components of a hydrological process. For example, in modelling the watershed response to a storm event, a conceptual model makes use of different equations to compute components such as subsurface flow, evapotranspiration, channel flow, groundwater flow and surface runoff. The third type of modelling uses mathematical and statistical techniques to fit a model to a data set, which then relates the dependent variable to the independent variables. This type of modelling includes regression models, response matrices, transfer functions, neural networks and support vector machines. The most widely used "black box" modelling approach in the hydrology and water resources literature is neural networks. Genetic programming is a potential tool for developing simple and efficient functional relationships between hydrological variables. In spite of the wide range of possible applications, GP has not been widely reported in the hydrology and water resources literature. The focus of this chapter is to discuss the potential applicability of genetic programming to developing simple and computationally efficient hydrological models, in light of a few studies reported in recent years. The key points discussed are as follows:

1. GP's ability to develop simple, interpretable models that overcome the curse of the "black box" nature of data-intensive models.

2. The smaller number of parameters used in GP models as compared to parallel neural network architectures.

3. GP's ability to parsimoniously identify the significance of the modelling inputs.

© 2012 Sreekanth and Datta, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **1.1. Genetic programming as a modelling tool**

Genetic programming is one of the latest members of the family of evolutionary computation. Evolutionary computation refers to the group of computational techniques that are inspired by and emulate the natural process of evolution, which resulted in the formation of the entire variety of organisms present on earth. Just as evolution and natural selection have produced organisms that are competent and well suited to their natural environments, the same principle has been applied in computational science to evolve solutions to complex engineering problems that are subject to random and chaotic environments, similar to the circumstances in which natural evolution has occurred. Evolutionary computation forms the basic principle behind evolutionary algorithms such as the genetic algorithm (GA), genetic programming (GP), evolutionary programming, evolution strategy and differential evolution. Evolutionary algorithms, widely used in mathematical optimization, are in general based on applying evolutionary principles such as selection, cross-over and mutation to a "population" of candidate solutions over a number of generations to find the optimal solutions to an engineering problem. The genetic algorithm, for example, is a widely used optimization technique that uses these principles as the basic "operators" of the algorithm. Genetic programming [1] is similar to the genetic algorithm in that it uses the genetic operators of selection, cross-over and mutation. However, genetic programming is unique in that it applies these operators to symbolic expressions, formulae or programs rather than to numbers that represent the candidate solutions. Thus, in genetic programming the candidate solutions are symbolic expressions or formulae. In a modelling framework, these symbolic expressions, formulae or programs are candidate models for simulating a physical phenomenon. The parse tree notations of two parent and offspring genetic programs are shown in figure 1. Thus the optimal formula evolved by genetic programming can be used as a best-fit model for predicting the physical phenomenon under consideration.

**Figure 1.** Symbolic representation of parent and offspring genetic programs

In figure 1, two parent programs that model a physical phenomenon are shown. After these programs are tested for their modelling performance, the cross-over operator is applied to them. That is, parts of the programs are exchanged at the dashed locations to generate the offspring programs. Mutation is also illustrated, by arbitrarily changing the parameter 2 to 6.
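The cross-over and mutation operations described for figure 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the nested-list tree representation, the two example parent expressions and all function names are assumptions made here, not the programs shown in the figure or the algorithm of any particular GP package.

```python
import copy
import operator
import random

# Assumed representation: a tree is a leaf (variable name or numeric constant)
# or a list [op, left, right] with op taken from OPS.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate an expression tree for the variable values given in env."""
    if isinstance(tree, list):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree  # numeric constant


def paths(tree, prefix=()):
    """Return the index path of every node (the root is the empty path)."""
    found = [prefix]
    if isinstance(tree, list):
        found += paths(tree[1], prefix + (1,))
        found += paths(tree[2], prefix + (2,))
    return found


def get_subtree(tree, path):
    """Follow an index path down from the root and return the node there."""
    for index in path:
        tree = tree[index]
    return tree


def set_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return copy.deepcopy(new)
    tree = copy.deepcopy(tree)
    parent = get_subtree(tree, path[:-1])
    parent[path[-1]] = copy.deepcopy(new)
    return tree


def crossover(parent_a, parent_b, rng):
    """Exchange one randomly chosen subtree between the two parents."""
    path_a = rng.choice(paths(parent_a))
    path_b = rng.choice(paths(parent_b))
    sub_a = get_subtree(parent_a, path_a)
    sub_b = get_subtree(parent_b, path_b)
    return (set_subtree(parent_a, path_a, sub_b),
            set_subtree(parent_b, path_b, sub_a))


def mutate_constant(tree, old, new):
    """Replace every constant equal to old with new (for example 2 -> 6)."""
    if isinstance(tree, list):
        return [tree[0],
                mutate_constant(tree[1], old, new),
                mutate_constant(tree[2], old, new)]
    if not isinstance(tree, str) and tree == old:
        return new
    return tree


# Hypothetical parent programs: 2*x + y and x*x - 3.
parent_1 = ["+", ["*", 2, "x"], "y"]
parent_2 = ["-", ["*", "x", "x"], 3]

rng = random.Random(0)
child_1, child_2 = crossover(parent_1, parent_2, rng)
mutant = mutate_constant(parent_1, 2, 6)  # becomes 6*x + y

print("offspring:", child_1, child_2)
print("mutant:", mutant)
```

Because the operators act on expression trees rather than on numeric vectors, every offspring is itself a formula that can be evaluated and read directly, which is the property the chapter relies on when it discusses interpretability.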

In the last decade, a few studies in the broad area of hydrology have used genetic programming based models for making hydrological predictions. The utility of GP in developing rainfall-runoff models, which are highly non-linear, was addressed in [2]. The authors combined GP based models with other conceptual models to derive useful hydro-climatic models. It was concluded that GP was able to develop more robust models, in that the functional relationships between different model inputs could be easily identified, resulting in more transparency than "black box" modelling. Another study [3] applied genetic programming and artificial neural networks to model the effect of rain on the runoff flow in an urban basin. This study also illustrated the possibility of including the physical basis of the problem in the GP based model. A further study in this direction [4] compared three artificial intelligence techniques, namely neural networks, adaptive neuro-fuzzy inference system (ANFIS) and genetic programming, for discharge routing of a river in Turkey. The study revealed that GP had an edge over the other two modelling approaches in all the statistics compared, such as the mean absolute error (MAE), the mean squared relative error (MSRE) and the correlation coefficient. Kisi et al (2010) [5] developed a wavelet gene expression programming (WGEP) model for forecasting daily precipitation and compared it with wavelet neuro-fuzzy (WNF) models. The results showed that WGEP models are effective in forecasting daily precipitation, with better performance than WNF models. Selle [6] used genetic programming to systematically develop alternative model structures with different complexity levels for hydrological modelling, with the objective of testing whether GP can be used to identify the dominant processes within the hydrological system. Models were developed for predicting the deep percolation responses under surface irrigated pastures to different soil types, water table depths and water ponding times during surface irrigation. The dominant processes in the model prediction, as determined from the models generated using genetic programming, were found to be comparable to those determined using conceptual models. Thus it was concluded that genetic programming can be used to evaluate the structure of hydrological models. A common aspect of GP based modelling reported by all these studies is that it resulted in fairly simple models which could be easily interpreted for the physical significance of the input variables in making a prediction. Jyothiprakash and Magar (2012) [12] performed a comparative study of reservoir inflow models developed using ANN, ANFIS and linear GP for lumped and distributed data. The study reported superior performance of GP models over ANN and ANFIS models.
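The comparison statistics named above (MAE, MSRE and the correlation coefficient) can be computed as in the following sketch. The formulas are the usual textbook definitions and may differ in detail from those used in [4]; the function names and the small observed and predicted series are hypothetical values chosen only to exercise the code.

```python
import math


def mean_absolute_error(observed, predicted):
    """MAE: average magnitude of the prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_squared_relative_error(observed, predicted):
    """MSRE: average squared error relative to each observation.

    Assumes every observed value is non-zero.
    """
    return sum(((o - p) / o) ** 2
               for o, p in zip(observed, predicted)) / len(observed)


def correlation_coefficient(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical discharge values, not taken from any of the cited studies.
observed = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]

print("MAE :", mean_absolute_error(observed, predicted))
print("MSRE:", mean_squared_relative_error(observed, predicted))
print("R   :", correlation_coefficient(observed, predicted))
```

MAE weights all errors equally in the units of the variable, whereas MSRE penalises errors in proportion to the observed magnitude, so the two can rank competing models differently when low and high flows are predicted with different accuracy.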

prediction could be readily identified from the model structure. When carefully implemented, such models can shed light on and identify the key physical processes contributing to the predicted phenomenon, and hence inform the development of the model. This is an important feature lacking in many data-mining-based prediction models, as a result of which these modelling approaches are often labelled "black-box" models. The "black-box" nature of prediction models often results in the limited use of such models for practical predictive applications.

**2.1. Model complexity of GP and neural networks – Comparative study**

The authors conducted a study [7] to evaluate the complexity of predictive models developed using genetic programming in comparison with models developed using neural networks. The GP and neural network models were developed as potential surrogate models for a complex numerical groundwater flow and transport model. The saltwater intrusion levels at monitoring locations, resulting from the excitation of the aquifer by pumping from a number of groundwater wells, were modelled using GP and neural networks. The pumping rates at these well locations for three different stress periods were the inputs, or independent variables, of the model. The resulting salinity levels at the monitoring locations were the dependent variables, or outputs.

The GP and ANN based surrogate models were trained using training and validation data generated by FEMWATER, a three-dimensional coupled flow and transport simulation model. The GP models were developed using the software Discipulus, which uses a linear genetic programming algorithm. The ANN surrogate models were developed using a feed-forward back-propagation algorithm implemented in the software NeuroShell. The input data considered were the pumping rates at eleven well locations over three different time periods, constituting 33 input variables. Since pumping at each location can take any real value between the prescribed minimum and maximum, these input variables constitute a 33-dimensional continuous space, each dimension representing the pumping rate at a particular location in a particular stress period. Hence, efficient training of the GP and ANN models required carefully chosen input data that are representative of the entire input space. Latin hypercube sampling was performed to choose uniformly distributed input samples from the 33-dimensional input space. An input sample is a vector of 33 pumping rates, one for each of the 11 well locations in each of the three stress periods. The salinity level at each observation location is the dependent variable, or output. The output values required for training the GP and ANN models were generated by running the FEMWATER model once for each input vector. The resulting input-output data set was divided into two sets, with three quarters of the data in one set and the rest in the other. The larger set was used for training the GP and ANN models and the smaller one for validating them. The members of the training and validation sets for both GP and ANN were chosen randomly.
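The sampling and data-splitting procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the number of samples, the pumping-rate bounds and the `stand_in_simulator` function are all hypothetical placeholders (the study used FEMWATER runs to generate the outputs), and the Latin hypercube routine is a generic NumPy implementation rather than the one used by the authors.

```python
import numpy as np

N_WELLS = 11
N_PERIODS = 3
N_INPUTS = N_WELLS * N_PERIODS  # 33 pumping-rate variables


def latin_hypercube(n_samples, n_dims, lower, upper, rng):
    """Draw n_samples points with exactly one point per stratum in each dimension."""
    # For every dimension, assign each sample a distinct stratum index.
    strata = np.column_stack(
        [rng.permutation(n_samples) for _ in range(n_dims)]
    )
    # Place each point at a random position inside its stratum of the unit cube.
    unit = (strata + rng.random((n_samples, n_dims))) / n_samples
    return lower + unit * (upper - lower)


def split_train_validation(inputs, outputs, train_fraction, rng):
    """Randomly assign rows to a training set and a validation set."""
    order = rng.permutation(len(inputs))
    n_train = int(round(train_fraction * len(inputs)))
    train, valid = order[:n_train], order[n_train:]
    return (inputs[train], outputs[train]), (inputs[valid], outputs[valid])


def stand_in_simulator(pumping):
    """Placeholder for the numerical simulation; this is NOT FEMWATER."""
    return pumping.mean(axis=1)


rng = np.random.default_rng(42)

# Hypothetical sample count and pumping-rate bounds (arbitrary units).
samples = latin_hypercube(200, N_INPUTS, lower=0.0, upper=1000.0, rng=rng)
salinity = stand_in_simulator(samples)

# Three quarters of the data for training, the rest for validation.
(train_x, train_y), (valid_x, valid_y) = split_train_validation(
    samples, salinity, train_fraction=0.75, rng=rng
)
print(train_x.shape, valid_x.shape)
```

Stratifying each of the 33 dimensions separately spreads a modest number of expensive simulation runs across the whole range of every pumping variable, which simple random sampling does not guarantee.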
