**3. Results and discussion**

Measured climatic data for the above variables, covering January to June 1998, were fed into both the symbolic regression genetic programming model and the multiple linear regression model in order to estimate water temperature. The models were then applied to data from January to June 1999 in order to approximate water temperature averages. The 1998 and 1999 results were then compared.

The genetic programming algorithm produced the following mathematical model (equation 4), which approximates the water temperature (averaged over each ten-minute interval):

$$\begin{aligned} T\_w &= (T\_a + \cos(\cos((T\_a + \cos T\_a) \ast 0.6904149)) \\ &\quad + \cos(\cos(1.17748531 \ast T\_a + \cos h\_r)) + 1.87808843) \ast 0.67508628 \end{aligned} \tag{4}$$

The best-performing individual, given by equation (4), reported an objective function value of 0.7922.
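As a quick check of how equation (4) behaves, the sketch below evaluates it in Python. It assumes that the cosine arguments are taken in radians and reads the last inner term as the cosine of relative humidity, cos(*hr*). The function name `gp_water_temperature` and the sample inputs are illustrative and do not come from the original study.

```python
import math


def gp_water_temperature(t_a: float, h_r: float) -> float:
    """Evaluate equation (4).

    t_a: filtered average air temperature, in ºC.
    h_r: filtered average relative humidity, as a decimal (e.g. 0.70).
    Assumption: cosine arguments are in radians.
    """
    term_1 = math.cos(math.cos((t_a + math.cos(t_a)) * 0.6904149))
    term_2 = math.cos(math.cos(1.17748531 * t_a + math.cos(h_r)))
    return (t_a + term_1 + term_2 + 1.87808843) * 0.67508628


# Hypothetical inputs: 10 ºC air temperature and 70 % relative humidity.
print(round(gp_water_temperature(10.0, 0.70), 2))
```

Because each nested cosine term lies between cos(1) ≈ 0.540 and 1, equation (4) under these assumptions stays between about 0.675·*Ta* + 2.00 and 0.675·*Ta* + 2.62, so it behaves almost linearly in air temperature.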

Meanwhile, the multiple linear regression model is expressed as follows:

$$T\_w = 0.00022505\,r\_s + 0.00036289\,r\_n + 0.66464617\,T\_a - 0.02807297\,V\_v - 1.24438982\,h\_r + 3.87792166 \tag{5}$$

Where:

*Tw* is the average water temperature over each ten-minute interval at instant *t*+160, in ºC

*Ta* is the average air temperature over each ten-minute interval, with seven-day filtering, at instant *t*+160, in ºC

*hr* is the average relative humidity over each ten-minute interval, with seven-day filtering, at instant *t*+160, expressed as a decimal fraction

*rs* is the average solar radiation over each ten-minute interval, at instant *t*, in W/m2

*rn* is the average net radiation over each ten-minute interval, at instant *t*+160, in W/m2

and finally,


*Vv* represents the average wind speed over each ten-minute interval, at instant *t*+160, in m/s.

The objective function value obtained with equation (5) was 0.8724.
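For comparison, equation (5) can be evaluated in the same way. This is only a sketch: it takes the relative humidity coefficient as 1.24438982 and the third term as air temperature *Ta*, and the function name `mlr_water_temperature` and the sample inputs are hypothetical.

```python
def mlr_water_temperature(r_s: float, r_n: float, t_a: float,
                          v_v: float, h_r: float) -> float:
    """Evaluate equation (5).

    r_s: solar radiation (W/m2); r_n: net radiation (W/m2);
    t_a: filtered air temperature (ºC); v_v: wind speed (m/s);
    h_r: filtered relative humidity (decimal).
    Assumption: the humidity coefficient is 1.24438982.
    """
    return (0.00022505 * r_s
            + 0.00036289 * r_n
            + 0.66464617 * t_a
            - 0.02807297 * v_v
            - 1.24438982 * h_r
            + 3.87792166)


# Hypothetical inputs, chosen only to illustrate the call.
print(round(mlr_water_temperature(200.0, 100.0, 10.0, 3.0, 0.70), 2))
```

Note that the air temperature coefficient in equation (5), 0.6646, is close to the overall multiplier 0.6751 in equation (4).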

Figure 4 shows the measured water temperature and the values calculated with equations (4) and (5) as a function of time. Figure 5 plots the values calculated with equations (4) and (5) against the measured values, together with the identity function.

Figure 4 indicates similar results for the genetic programming and multiple linear regression models in comparison with the measured data.

In Figure 5 the measured data are compared against the identity function; the best correlation between these values was found using genetic programming (r = 0.9697).
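The text does not state how r was computed. Assuming it is the usual Pearson correlation coefficient between measured and calculated values, a minimal sketch is given below; the function name `pearson_r` and the sample values are made up for illustration.

```python
import math


def pearson_r(measured, calculated):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    if n != len(calculated) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_m = sum(measured) / n
    mean_c = sum(calculated) / n
    cov = sum((m - mean_m) * (c - mean_c)
              for m, c in zip(measured, calculated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_m * var_c)


# Made-up water temperatures in ºC, not data from the study.
measured = [8.1, 9.4, 11.0, 12.6, 14.3]
calculated = [8.4, 9.1, 11.3, 12.2, 14.6]
print(round(pearson_r(measured, calculated), 4))
```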

**Figure 4.** Time variation of measured and calculated water temperature data, Ribarroja Station. January to June, 1998

**Figure 5.** Comparison between genetic programming and multiple linear regression models against measured data and the identity function. Ribarroja Station. January to June, 1998

Water temperature approximation with multiple linear regression and the genetic programming algorithm, January to June 1999

Equations (4) and (5) were applied to measured data from 1999 at the Ribarroja Station in order to estimate the average water temperature. The measured water temperature data and the residuals obtained with both models are shown in Figure 6.

According to Figure 6, the differences between measured and calculated water temperature with equation (4) were up to 5.5 °C (underestimation) and about 0.5 °C (overestimation), while equation (5) gave an underestimation of nearly 4.5 °C and an overestimation of almost 2 °C. The range of the residuals is therefore almost the same for both equations (roughly 6 °C and 6.5 °C, respectively).
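The residual figures quoted above can be summarized with a few lines of Python. The sketch assumes that a residual is defined as measured minus calculated temperature, so that positive values correspond to underestimation, and that the population standard deviation is used; neither convention is stated in the text. The function name `residual_summary` and the sample data are hypothetical.

```python
import statistics


def residual_summary(measured, calculated):
    """Summarize residuals, taken here as measured minus calculated."""
    if len(measured) != len(calculated) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [m - c for m, c in zip(measured, calculated)]
    return {
        "mean": statistics.mean(residuals),
        "std": statistics.pstdev(residuals),  # population form (assumed)
        # Largest positive residual: model below the measurement.
        "max_underestimation": max(max(residuals), 0.0),
        # Largest negative residual: model above the measurement.
        "max_overestimation": max(-min(residuals), 0.0),
    }


# Made-up temperatures in ºC, for illustration only.
summary = residual_summary([10.0, 12.0, 15.0], [9.0, 12.5, 11.0])
print(summary)
```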

To obtain better results, future work should examine data standardization as a preprocessing step for deriving new linear and nonlinear mathematical models [44]. The variables could be standardized by subtracting the mean and dividing by the standard deviation:

$$Z = \frac{T\_w - \overline{T}\_w}{\sigma\_{T\_w}} \tag{6}$$

where:

*Z* is the standardized variable, dimensionless;

*Tw* is the variable before standardization, with physical dimensions;

$\overline{T}\_w$ is the mean of *Tw*, in the same units as *Tw* (the arithmetic average can be used); and

$\sigma\_{T\_w}$ is the standard deviation of *Tw*, in the same units as *Tw*.

Another possibility to analyze is splitting the considered function according to the different times of year that cause variations in water temperature behavior.

**Figure 6.** Residuals and measured water temperature data for the year 1999 at the Ribarroja Station in Spain. Axes: *Tw* (ºC) and residuals versus *t* (min); series: measured, residuals MLR eq. (5), residuals GP eq. (4). Plot annotation: residuals eq. (4), mean = 1.04, standard deviation = 1.08; eq. (5), mean = 0.43, standard deviation = 1.09.
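Returning to the standardization in equation (6), the sketch below shows one way it could be applied as a preprocessing step and then reversed to recover physical units. It assumes the population standard deviation; the function names `standardize` and `destandardize` and the sample values are hypothetical.

```python
import statistics


def standardize(values):
    """Apply equation (6): subtract the mean, divide by the standard deviation.

    Returns the standardized values together with the mean and standard
    deviation, which are needed to undo the transformation later.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population form (assumed)
    if std == 0:
        raise ValueError("standard deviation is zero; cannot standardize")
    return [(v - mean) / std for v in values], mean, std


def destandardize(z_values, mean, std):
    """Invert equation (6) to return to the original units."""
    return [z * std + mean for z in z_values]


# Made-up water temperatures in ºC, for illustration only.
z, mean_tw, std_tw = standardize([10.0, 12.0, 14.0])
print(z, mean_tw, std_tw)
print(destandardize(z, mean_tw, std_tw))
```

Keeping the mean and standard deviation from the fitting period (here, 1998) and reusing them on later data is one common design choice, since it keeps both periods on the same scale.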
