**4. Data synthesis and architecture identification of models**

In this work, the diffuse pollution concentration in a stream resulting from application of the herbicide atrazine to corn fields in the watershed is considered. Concentration measurement data were obtained from the National Water Quality Assessment (NAWQA) program of the U.S. Geological Survey (USGS) (http://water.usgs.gov/nawqa/naqamap.html) for the period 1992 to 2002. The stream considered is the White River, and the monitoring site for the atrazine concentration measurements is Hazleton (Crawford, 1995), the outlet of the White River Basin watershed in the state of Indiana. The Hazleton site lies at latitude 38°29'23" and longitude 87°33'00", with a drainage area of 11,305 square miles. The White River Basin is part of the Mississippi River system, where atrazine accounts for 24 percent of all agricultural herbicide applications. The major agricultural chemical characteristics, AC, that contribute to the atrazine concentration at the watershed outlet are identified as the application rate (lb/acre) and the application time. The major land use pattern, AL, is the extent of cropped area (percentage of cultivated area) (Pareira, 1990; Crawford, 2001; Capel and Larson, 2001).

Time series of average monthly values from 1992 to 2001 are used for model building and validation. Together with the application rate, application time, and cropped-area characteristics identified above (Crawford, 1995, 2001), these data are used to identify the architectures of the fuzzy and ANN based models by applying the methodologies discussed in the previous sections. The performance evaluation criteria are used to judge the predictive capability of the best performing fuzzy and ANN models. The fuzzy rule based model is developed using the atrazine application rate as the first input, the atrazine application season as the second input, and the percentage of area applied with atrazine as the third input. The atrazine concentration values observed at the monitoring site are the output of the fuzzy rule based model. The weighted averages of the herbicide application rate and of the percentage of area applied for the corn and soybean cropped area are given in Table 1. The seven years of data from 1992 to 1998 are used for training, and the three years of data from 1999 to 2001 (Table 1) are used for testing the models.


Table 1. Agricultural Herbicide Atrazine Application Rate and Percentage Area Applied for the Corn Crop.


| Year | Weighted Percentage Area | Application Rate (lb/Acre) |
|------|--------------------------|----------------------------|
| 1992 | 79 | 1.35 |
| 1993 | 91 | 1.31 |
| 1994 | 87 | 1.35 |
| 1995 | 87 | 1.31 |
| 1996 | 91 | 1.31 |
| 1997 | 84 | 1.33 |
| 1998 | 89 | 1.36 |
| 1999 | 91 | 1.26 |
| 2000 | 80 | 1.41 |
| 2001 | 94 | 1.35 |
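The training and testing split described above can be sketched in a few lines of Python. The values are those listed for Table 1. The names `TABLE_1` and `split_by_year` are hypothetical and chosen only for this illustration.

```python
# Table 1 values as listed in the text:
# year -> (weighted percentage area, application rate in lb/acre)
TABLE_1 = {
    1992: (79, 1.35), 1993: (91, 1.31), 1994: (87, 1.35), 1995: (87, 1.31),
    1996: (91, 1.31), 1997: (84, 1.33), 1998: (89, 1.36), 1999: (91, 1.26),
    2000: (80, 1.41), 2001: (94, 1.35),
}


def split_by_year(table, last_training_year=1998):
    """Split the yearly records into a training part and a testing part."""
    train = {y: v for y, v in table.items() if y <= last_training_year}
    test = {y: v for y, v in table.items() if y > last_training_year}
    return train, test


train, test = split_by_year(TABLE_1)
print("training years:", sorted(train))
print("testing years:", sorted(test))
```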

#### **4.1 Evaluation of fuzzy c-means centers**

The FCM model represented by equation (5) is used to divide the input data into fuzzy partitions. The FCM algorithm is implemented in MATLAB version 6.5 with ε equal to $10^{-5}$ to obtain the pre-specified number of fuzzy centers. The 3, 4, and 5 fuzzy centers obtained with the FCM model for the inputs application rate and weighted percentage area are shown in Table 2. Instead of iterating to find the optimal number of fuzzy centers, prior knowledge about the fuzzy partitioning for the fuzzy rule based models was used in implementing the fuzzy c-means algorithm.
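As an illustration of how fuzzy centers can be obtained for a pre-specified number of partitions, the following is a minimal fuzzy c-means sketch in Python with NumPy. It is not the MATLAB implementation used in this work. It assumes the standard FCM update equations with fuzzifier m = 2, evenly spaced initial centers, and a stopping test on the change in the membership matrix. The function name `fuzzy_c_means` and its parameters are hypothetical. The input values are the weighted percentage areas listed for Table 1.

```python
import numpy as np


def fuzzy_c_means(x, n_centers, m=2.0, eps=1e-5, max_iter=500):
    """Cluster 1-D data x into n_centers fuzzy partitions.

    Returns (centers, u), where u[i, k] is the membership of x[k] in
    partition i. m = 2 and the initialization are assumptions.
    """
    x = np.asarray(x, dtype=float)
    # Deterministic start: spread the initial centers over the data range.
    centers = np.linspace(x.min(), x.max(), n_centers)
    u = None
    for _ in range(max_iter):
        # Distance from every center to every point (floored to avoid 1/0).
        d = np.maximum(np.abs(centers[:, None] - x[None, :]), 1e-12)
        # Standard FCM membership update; each column sums to 1.
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0, keepdims=True)
        # Center update: membership-weighted mean of the data.
        w = u_new ** m
        centers = (w @ x) / w.sum(axis=1)
        converged = u is not None and np.max(np.abs(u_new - u)) < eps
        u = u_new
        if converged:
            break
    order = np.argsort(centers)
    return centers[order], u[order]


# Weighted percentage area values listed for Table 1 (1992-2001).
area = [79, 91, 87, 87, 91, 84, 89, 91, 80, 94]
centers, u = fuzzy_c_means(area, n_centers=4)
print(np.round(centers, 2))
```

The centers printed by this sketch need not match those reported in Table 2, since the initialization and fuzzifier used in the original implementation are not stated here.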


Table 2. Different Fuzzy Partition Centers Using FCM Model

#### **4.2 Training and testing the fuzzy rule based model with FCM**

The seven years of data (1992-1998) are used for training and the three years of data (1999-2001) are used for testing the fuzzy rule based model with FCM. The model is assumed to perform satisfactorily when the model efficiency coefficient (MENash), as given by equation (12), is greater than 90 percent and the other performance indices also improve. Although arbitrary, this threshold may be used as a stopping criterion to limit the large number of rules that must be processed as the number of linguistic fuzzy variables for the inputs increases.
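The growth in the number of rules mentioned above can be illustrated with a short sketch. It assumes one rule for every combination of linguistic variables across the three inputs, with the same number of variables for application rate and weighted percentage area and the 12 seasonal variables used in this section. The helper name `rule_count` is hypothetical.

```python
def rule_count(n_rate, n_area, n_season=12):
    """Number of rules in a complete rule grid over the three inputs."""
    return n_rate * n_area * n_season


for n in (3, 4, 5, 7):
    print(f"{n} variables per input: {rule_count(n, n)} rules")
```

For 4 and 7 linguistic variables this gives 192 and 588 rules, which matches the totals reported in this section.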

The performance of fuzzification of the inputs application rate and weighted percentage area was studied by assigning 3, 5, and 7 fuzzy variables without using FCM (Singh, 2008). Although fuzzification with 7 variables performed better than fuzzification with 3 or 5 variables, the results with 5 fuzzy variables are comparable to those with 7, as shown in Table 3. The fuzzy rule based models with 3, 5, and 7 fuzzy variables are denoted Fuzzy\_3M, Fuzzy\_5M, and Fuzzy\_7M, respectively, in Table 3. Because 3 partitions are not adequate, four fuzzy partitions were specified for the fuzzy rule based system with the FCM model. The four centers obtained using FCM, shown in Table 2, are assigned to four linguistic fuzzy variables: low, medium, high, and very high.
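The assignment of centers to linguistic variables can be sketched as follows. The sketch assumes triangular membership functions that peak at each center and fall to zero at the neighboring centers, with shoulders (semi-trapezoids) at the domain edges. The four center values are hypothetical placeholders, not the FCM centers of Table 2, and the function name `memberships` is likewise an assumption.

```python
import numpy as np


def memberships(x, centers):
    """Degree of membership of scalar x in each fuzzy set.

    Interior sets are triangles peaking at their center; the first and
    last sets are shoulders that stay at 1 beyond the outermost centers.
    """
    c = np.asarray(centers, dtype=float)
    mu = np.zeros(len(c))
    if x <= c[0]:
        mu[0] = 1.0
    elif x >= c[-1]:
        mu[-1] = 1.0
    else:
        # Index j such that c[j] <= x < c[j + 1].
        j = np.searchsorted(c, x, side="right") - 1
        t = (x - c[j]) / (c[j + 1] - c[j])
        mu[j], mu[j + 1] = 1.0 - t, t
    return mu


LABELS = ["low", "medium", "high", "very high"]
centers = [1.26, 1.31, 1.36, 1.41]  # hypothetical, NOT the Table 2 values
mu = memberships(1.33, centers)
for label, value in zip(LABELS, mu):
    print(f"{label}: {value:.2f}")
```

With this construction the memberships of any input sum to 1, and at most two adjacent linguistic variables are active at a time.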


The performance of the FCM based fuzzy rule based model is evaluated using the performance indices described in the performance evaluation criteria. These include the root mean square error (RMSE), the correlation coefficient (R) between the actual and estimated monthly average atrazine concentration values, the standard error of estimate (SEE), and MENash. The results of the fuzzy rule based model with four fuzzy variables obtained using FCM, denoted Fuzzy\_4\_FCM, are compared with those of the fuzzy rule based models with 3, 5, and 7 linguistic variables for both input 1 and input 3. The performance of the Fuzzy\_4\_FCM model is also compared with the results of an artificial neural network (ANN) based model trained with the back propagation algorithm (Rumelhart et al., 1986), denoted ANN\_M in Table 3.
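Three of these indices can be computed as in the sketch below. It uses the standard textbook definitions of RMSE, the Pearson correlation coefficient, and the Nash-Sutcliffe efficiency, which are assumed here to correspond to the chapter's own equations. SEE is omitted because its exact form is not restated in this section. The sample values are hypothetical and are not measurements from this study.

```python
import numpy as np


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def correlation(obs, pred):
    """Pearson correlation coefficient R."""
    return float(np.corrcoef(obs, pred)[0, 1])


def nash_efficiency(obs, pred):
    """Nash-Sutcliffe model efficiency: 1 is a perfect fit, 0 means the
    model is no better than predicting the mean of the observations."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    residual = np.sum((obs - pred) ** 2)
    variance = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - residual / variance)


# Hypothetical observed and predicted concentrations, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(obs, pred), correlation(obs, pred), nash_efficiency(obs, pred))
```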

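Training error: 1992-1998. Testing error: 1999-2001.

| Models | Training RMSE | Training R | Training SSE | Training MENash | Testing RMSE | Testing R | Testing SSE | Testing MENash |
|--------|-------|-------|-------|-------|-------|-------|-------|-------|
| **Fuzzy\_3M** | 1.318 | 0.891 | 1.377 | 0.550 | 0.703 | 0.886 | 0.771 | 0.623 |
| **Fuzzy\_5M** | 0.836 | 0.969 | 0.837 | 0.894 | 0.455 | 0.952 | 0.498 | 0.855 |
| **Fuzzy\_7M** | 0.706 | 0.970 | 0.775 | 0.915 | 0.342 | 0.975 | 0.375 | 0.914 |
| **ANN\_M** | 1.153 | 0.918 | 1.264 | 0.752 | 0.906 | 0.759 | 0.993 | 0.446 |
| **Fuzzy\_4M\_FCM** | 0.492 | 0.998 | 0.539 | 0.967 | 0.725 | 0.968 | 0.416 | 0.901 |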

Table 3. Comparison of training and testing errors for different models.

It can be noted from Table 3 that the error statistics for the prediction of atrazine concentration values are better for the Fuzzy\_4M\_FCM model than for the Fuzzy\_3M, Fuzzy\_5M, and ANN\_M models in both training and testing. Its performance is even better than that of the Fuzzy\_7M model in training: the model efficiency (MENash) in training is 94.3 percent, whereas it is 91.5 percent for the Fuzzy\_7M model. The RMSE, R, and SSE values are similarly comparable. In testing, the results are also comparable, although the error statistics of the Fuzzy\_7M model are slightly better than those of Fuzzy\_4\_FCM. Thus, the FCM optimized membership function partitions of the Fuzzy\_4\_FCM model perform comparably to almost double the number of fuzzy partitions without FCM in the Fuzzy\_7M model. Figure 2 shows the better RMSE value obtained by the Fuzzy\_4\_FCM model in comparison with the other models.

It can also be noted from Table 3 that the performance of the fuzzy rule based model is better than that obtained using the ANN\_M model, an ANN with 2 inputs (atrazine application rate and weighted percentage area), 12 outputs (average monthly concentration measurements), and 11 hidden nodes (selected on the basis of experimentation). The poor performance of the ANN\_M model may be due to an inadequate number of training patterns, as the total number of free parameters exceeds the number of training patterns even with 1 hidden node in the hidden layer.
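The free-parameter argument can be checked with a short count. The sketch assumes a fully connected network with a single hidden layer, optionally with one bias per hidden and output node, and one training pattern per training year (seven in total). Both are assumptions, since the exact ANN layout and pattern construction are not restated here. The helper name `free_parameters` is hypothetical.

```python
def free_parameters(n_in, n_hidden, n_out, bias=True):
    """Weights (and optionally biases) of a one-hidden-layer network."""
    weights = n_in * n_hidden + n_hidden * n_out
    biases = (n_hidden + n_out) if bias else 0
    return weights + biases


N_TRAINING_PATTERNS = 7  # assumption: one pattern per year, 1992-1998

for hidden in (1, 11):
    with_bias = free_parameters(2, hidden, 12)
    without_bias = free_parameters(2, hidden, 12, bias=False)
    print(hidden, with_bias, without_bias,
          without_bias > N_TRAINING_PATTERNS)
```

Under these assumptions, even a single hidden node gives 14 weights (27 parameters with biases), which already exceeds seven training patterns. The 11-node network has 154 weights (177 with biases).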

A sample schematic representation of the membership functions for the input atrazine application rate is shown in Figure 1.

Fig. 1. A sample representation of linguistic variables membership function for first input.

The input application season is assigned 12 fuzzy variables, S1-S12, one for each month of the year. The output atrazine concentration is represented by 25 fuzzy centers obtained with the FCM model and denoted by the fuzzy variables C1-C25, so that the full range of atrazine concentration values in the data set for the period 1992-2001 is adequately represented. All fuzzy variables for the inputs and the output have triangular membership functions, except at the domain edges, where they are semi-trapezoidal. This representation was selected on the basis of the literature because of its computational efficiency (Khrisnapuram, 1998; Guillaume and Charnomordic, 2004). Other divisions of the input and output domains and other shapes of membership functions are, of course, possible. With 4 linguistic variables for each of the inputs application rate and weighted percentage area and 12 fuzzy variables for the season, the total number of rules is 192 (4 × 4 × 12). The total number of rules was much higher, 588 (7 × 7 × 12), when 7 fuzzy variables were used for the inputs application rate and weighted percentage area. The model building process is completed by creating the combined fuzzy rule base from the input-output pairs of the training data set. Finally, defuzzification converts the fuzzy output produced by the fuzzy rule base model into a crisp output for any new inputs.

**5. Concentration measurement estimation results**
