5. Expanding applications of ANNs

#### 5.1. Deducing general rules from an ANN-analysis approach

It is well documented that soil properties, especially those associated with soil drainage, can be described in terms of DEM-generated topo-hydrologic variables. However, relationships between soil drainage and these variables are usually difficult to define with conventional statistical methods because they are strongly non-linear. ANNs provide a useful tool for this non-linear mapping. However, ANNs are "black boxes" whose internal behavior is difficult or impossible to interpret (Ref. [43]), so the relationships between soil drainage class and the independent variables are not transparent to users. Furthermore, ANN prediction accuracy depends heavily on the data used to calibrate the model. ANNs can also over-fit the calibration data, which decreases prediction accuracy outside the range of the calibration data (Ref. [34]). These problems inherently limit the use of ANNs outside the areas where the model was originally developed. ANNs could, however, be used to analyze the relationships between soil drainage class and topo-hydrologic variables, as quantified by the network parameters.

Once the ANNs were trained and tested, they were used to generate the relationships (curves) between ANN-predicted soil drainage classes and topo-hydrologic variables (Table 4). For ANNs with one topo-hydrologic variable, the ANN-predicted soil drainage class (dependent variable) was plotted against the single independent variable, with the coarse-resolution soil drainage data (CSD) held constant. For ANNs with two topo-hydrologic variables, the ANN-predicted soil drainage class was plotted as a three-dimensional surface against the two variables, again with CSD held constant.
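The curve-generation step can be sketched as follows. This is a minimal illustration, not the chapter's MATLAB workflow: `stand_in_ann` is a hypothetical placeholder for a trained CSD-VSP network, and its functional form, the grid limits, and the number of grid points are all assumptions chosen only to make the example runnable.

```python
import math


def stand_in_ann(csd, vsp):
    """Hypothetical placeholder for a trained CSD-VSP ANN.

    Returns a drainage class on the 0-6 scale of Table 4. The formula is
    invented for illustration and is not taken from the chapter.
    """
    return max(0.0, min(6.0, csd + 3.0 * math.exp(-0.3 * vsp) - 1.0))


def response_curve(model, csd_value, var_min, var_max, n_points):
    """Sweep one topo-hydrologic variable while CSD is held constant."""
    step = (var_max - var_min) / (n_points - 1)
    grid = [var_min + k * step for k in range(n_points)]
    return [(v, model(csd_value, v)) for v in grid]


# CSD fixed at class 2 (well drained); VSP swept over an assumed 0-20 m range.
curve = response_curve(stand_in_ann, csd_value=2, var_min=0.0, var_max=20.0, n_points=41)
```

Each `(variable, predicted class)` pair in `curve` is one point on an ANN-generated curve of the kind summarized in Table 4. Repeating the sweep for each CSD class would give one curve per class.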

The ANN-generated soil drainage-variable relationships (curves) were subsequently formulated as simple mathematical equations using non-linear regression. Parameters of the soil drainage equations were estimated with the Curve Fitting Tool of MATLAB. To avoid biases (Ref. [39]), weighted least-squares regression was used; it minimizes an error estimate that includes an additional scale factor (the weight, taken as the cell count (%) of the topo-hydrologic variable), as in Eq. (10):

$$S = \sum\_{i=1}^{n} w\_i \left( y\_i - \widehat{y}\_i \right)^2 \tag{10}$$

where w<sub>i</sub> are the weights, n is the number of data points included in the fit, S is the weighted sum of squared residuals, y<sub>i</sub> is the observed response value, and ŷ<sub>i</sub> is the fitted response.
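The objective in Eq. (10) can be illustrated with a short sketch. The chapter's equations are non-linear and were fitted with MATLAB's Curve Fitting Tool. Here a straight line is used as a stand-in model only because its weighted least-squares solution has a closed form. All data values and the function names `weighted_sse` and `weighted_line_fit` are hypothetical.

```python
def weighted_sse(y_obs, y_fit, weights):
    """Eq. (10): S = sum_i w_i * (y_i - yhat_i)^2."""
    return sum(w * (y - f) ** 2 for y, f, w in zip(y_obs, y_fit, weights))


def weighted_line_fit(x, y, weights):
    """Return (a, b) minimizing Eq. (10) for the stand-in model yhat = a + b*x."""
    sw = sum(weights)
    sx = sum(w * xi for w, xi in zip(weights, x))
    sy = sum(w * yi for w, yi in zip(weights, y))
    sxx = sum(w * xi * xi for w, xi in zip(weights, x))
    sxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    denom = sw * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values with non-zero weight must not all be equal")
    b = (sw * sxy - sx * sy) / denom
    a = (sy - b * sx) / sw
    return a, b


# Hypothetical values: x is a topo-hydrologic variable, y the ANN-predicted
# drainage class, and cell_pct the cell count (%) used as the weight.
x = [0.0, 2.0, 4.0, 6.0, 8.0]
y = [5.0, 4.0, 3.0, 2.0, 6.0]              # the last point is an outlier
cell_pct = [30.0, 30.0, 25.0, 15.0, 0.0]   # no cells occur at the outlier

a, b = weighted_line_fit(x, y, cell_pct)
fitted = [a + b * xi for xi in x]
S = weighted_sse(y, fitted, cell_pct)
```

Because the outlier carries zero weight, it does not influence the fit. Values of the variable that cover few cells contribute little to S in the same way.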

Soil drainage equations with single topo-hydrologic variables are summarized in Table 4. Most of the soil drainage equation curves (fitting curves) compared well with the corresponding ANN-generated curves, indicating that the prediction performance of the soil drainage equations agreed with that of the ANNs in most cases. The maps predicted by the best single-variable soil drainage equation (the soil drainage-VSP equation) had an accuracy of 44%, a reduction of 2% compared with the corresponding ANN.


66 Advanced Applications for Artificial Neural Networks


\* ANN structure: input-layer nodes (number of inputs), hidden-layer nodes (20), output-layer nodes (1).

\*\*Digital soil drainage classes: very rapidly drained (VR)-0, rapidly drained (R)-1, well drained (W)-2, moderately well drained (MW)-3, imperfectly drained (I)-4, poorly drained (P)-5, very poorly drained (VP)-6.

Table 4. ANNs, ANN-generated curves with fitting curves, and equations for soil drainage.

Some disagreements were also observed between the soil drainage equation curves (fitting curves) and the ANN-generated curves, and these point to an advantage of the soil drainage equations. The disagreements are most likely to occur where there are few or no data points in the calibration or validation data sets. In these cases, the ANN predictions appeared unrealistic. For example, when VSP was >18.5 m, the CSD-VSP ANN predictions showed a sudden change that could not be explained and was highly unrealistic. In contrast, the corresponding soil drainage equation curve (fitting curve) extended its curvilinear trend logically, avoiding the unrealistic predictions made by the ANN in value ranges with insufficient calibration data. Thus, the soil drainage equations could overcome the poor generalization of the ANN models.

In addition, the soil drainage equations require no special software to make predictions, which is another advantage over ANNs run in MATLAB (Ref. [34]) or soil-landscape models run in ARC/INFO (Ref. [17]).

For ANNs with two topo-hydrologic variables, we intended to produce three-dimensional surfaces (Table 4). However, these surfaces could not be expressed as meaningful mathematical equations because of the complexity of the data and the uneven distribution of data points across the range of the independent variables. For example, the soil drainage surface (CSD = well) from the CSD-VSP-slope ANN model was too difficult to formulate because it lacked a general pattern.

#### 5.2. Mapping soil properties over a very large area

Various models, including ANNs, have been developed to predict soil properties. However, it is difficult to use these existing models to produce high-resolution soil property maps over a very large area (>1000 km<sup>2</sup>). This is because these models are either interpolation models or statistical models built on relationships between local environmental variables and soil property conditions observed in the field. When applied over a large area, they may perform well in areas whose landforms resemble those where field samples were collected, but perform poorly in areas with significantly different landforms. It is also difficult to build a new model for a large area because collecting sufficient field samples for either interpolation or model calibration is very difficult.

To produce a soil drainage map over a very large area (e.g., the province of Nova Scotia) with a limited number of field samples, a two-stage approach was used (Ref. [44]). In the first stage, the soil drainage-VSP equation, generated from a soil drainage ANN in BBW, was used as the base model because it captures the general trend of soil drainage distribution along the topographic gradient. The base equation was applied directly to predict soil drainage across Nova Scotia. In the second stage, the province was divided into sub-areas (landforms) using different division methods, and corresponding linear transformation models were developed to adapt the soil drainage classes produced by the base model to fit the field samples. Each linear transformation model is composed of a set of linear equations, one for each landform, of the form given in Eq. (11).

$$SD\_{linear}^{i} = a^{i} + b^{i} SD\_{base} \tag{11}$$

where SD<sub>base</sub> is the initial drainage class produced by the base model.

| Inputs of ANNs\* | Soil drainage curves | Soil drainage equation\*\* |
| --- | --- | --- |
| CSD = well, VSP, SDR | (surface plot not reproduced) | none |
| CSD = well, VSP, slope | (surface plot not reproduced) | none |

Table 4 (continued). ANNs, ANN-generated curves with fitting curves, and equations for soil drainage.

The parameters a<sup>i</sup> and b<sup>i</sup> and the output SD<sup>i</sup><sub>linear</sub> correspond to a specific landform (i) of Nova Scotia. a<sup>i</sup> is the shifting parameter, which describes the average difference in soil drainage conditions between the BBW and a specific landform of Nova Scotia. b<sup>i</sup> is the stretching parameter, which describes the rate of change of soil drainage conditions between the BBW and that landform. SD<sup>i</sup><sub>linear</sub> is the adapted soil drainage class. Attributes of coarse soil maps, including slope, topographic pattern, drainage, and texture, were used as the criteria to divide the entire area of Nova Scotia into sub-areas (landforms). Each dividing criterion corresponds to a set of landforms (Table 5). For each landform of each linear transformation model, the parameters a<sup>i</sup> and b<sup>i</sup> were estimated by regression analysis, using all field samples within the landform (sub-area) as calibration data. Only linear equations that passed the F and t tests for the significance of the correlation coefficient (P < 0.05) were kept. To reduce the number of linear equations, field samples from different landforms were combined when no significant differences were detected (P > 0.05).

| Attributes of dividing subarea (landform set) | Deduced soil drainage curves from base equation | Parameters of linear equation (SD<sup>i</sup><sub>linear</sub> = a<sup>i</sup> + b<sup>i</sup> SD<sub>base</sub>) |
| --- | --- | --- |
| Topographic pattern: drumlinoid-1, hummocky-2, knob and knoll-3, ridged-4, smooth-5 | (curves not reproduced) | a<sup>1</sup> = 0.132, b<sup>1</sup> = 0.856; a<sup>2</sup> = 1.694, b<sup>2</sup> = 0.432; a<sup>3</sup> = 1.209, b<sup>3</sup> = 0.576; a<sup>4</sup> = 0.566, b<sup>4</sup> = 0.775; a<sup>5</sup> = 0.990, b<sup>5</sup> = 0.550 |
| Drainage: well-1, imperfect-2 | (curves not reproduced) | a<sup>1</sup> = 1.011, b<sup>1</sup> = 0.593; a<sup>2</sup> = 1.567, b<sup>2</sup> = 0.364 |
| Slope: level (L)-1, undulating (U)-2, rolling (R)-2 | (curves not reproduced) | a<sup>1</sup> = 2.083, b<sup>1</sup> = 0.341; a<sup>2</sup> = 1.065, b<sup>2</sup> = 0.604; a<sup>3</sup> = 1.060, b<sup>3</sup> = 0.500 |

Table 5. Linear transformation models with different landform sets.
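The second-stage calibration of Eq. (11) can be sketched as a separate least-squares line fitted for each landform. This is a minimal illustration under stated assumptions: the sample values, the landform identifiers, and the function names `fit_linear_transforms` and `adapt` are hypothetical, and the significance screening (F and t tests) and the merging of similar landforms described in the text are omitted.

```python
from collections import defaultdict


def fit_linear_transforms(samples):
    """Fit Eq. (11) per landform by ordinary least squares.

    samples: iterable of (landform_id, sd_base, sd_observed) field samples.
    Returns {landform_id: (a, b)}.
    """
    groups = defaultdict(list)
    for landform, sd_base, sd_obs in samples:
        groups[landform].append((sd_base, sd_obs))

    params = {}
    for landform, pairs in groups.items():
        n = len(pairs)
        mean_x = sum(x for x, _ in pairs) / n
        mean_y = sum(y for _, y in pairs) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
        if sxx == 0:
            continue  # slope is undefined when all base classes are identical
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        b = sxy / sxx            # stretching parameter b^i
        a = mean_y - b * mean_x  # shifting parameter a^i
        params[landform] = (a, b)
    return params


def adapt(sd_base, landform, params):
    """Eq. (11): SD_linear^i = a^i + b^i * SD_base."""
    a, b = params[landform]
    return a + b * sd_base


# Hypothetical field samples: (landform id, base-model class, observed class).
samples = [
    (1, 1.0, 2.5), (1, 3.0, 3.5), (1, 5.0, 4.5),
    (2, 2.0, 2.2), (2, 4.0, 3.4), (2, 6.0, 4.6),
]
params = fit_linear_transforms(samples)
adapted = adapt(4.0, 1, params)
```

In this toy data set, landform 1 yields a = 2.0 and b = 0.5, so a base-model class of 4 is adapted to 4.0 there, while landform 2 yields a = 1.0 and b = 0.6.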

As shown in Figure 11, the prediction accuracies of the linear transformation models under different landform sets (Table 5) were always better than the prediction accuracy of the base equation. This indicates that the two-stage method provides a viable way to extend the base equation to generate soil drainage maps over a large area with a limited number of field samples.

Figure 11. Accuracy comparison of base equation and linear transformation models with different landform sets.
