**Model Averaging for Semivariogram Model Parameters**

Asim Biswas and Bing Cheng Si

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/52339

## **1. Introduction**

[57] Vachaud, G., Passerat De Silans, A., Balabanis, P., & Vauclin, M. (1985). Temporal Stability of Spatially Measured Soil Water Probability Density Function. Soil sci. soc.

[58] Vereecken, H., Kamai, T., Harter, T., Kasteel, R., Hopmans, J., & Vanderborght, J. (2007). Explaining Soil Moisture Variability as a Function of Mean Soil Moisture: A Stochastic Unsaturated Flow Perspective. Geophys. res. lett., 34, DOI:

[59] Vivoni, E. R., Gebremichael, M., Watts, C. J., Bindlish, R., & Jackson, T. J. (2008). Comparison of Ground-Based and Remotely-Sensed Surface Soil Moisture Estimates

[60] Western, A., Grayson, R., Blöschl, G., Willgoose, G., & McMahon, T. (1999). Observed Spatial Organization of Soil Moisture and Its Relation to Terrain Indices. Water re‐

[61] Wilson, D. J., Western, A. W., & Grayson, R. B. (2005). A Terrain and Data-Based Method for Generating the Spatial Distribution of Soil Moisture. Adv. water resour.,

[62] Yoo, C., & Kim, S. (2004). EOF Analysis of Surface Soil Moisture Field Variability.

[63] Zar, J. H. (1972). Significance Testing of the Spearman Rank Correlation Coefficient. J.

[64] Zhao, Y., Peth, S., Wang, X. Y., Lin, H., & Horn, R. (2010). Controls of Surface Soil Moisture Spatial Patterns and Their Temporal Stability in a Semi-Arid Steppe. Hy‐

[65] Zhou, X., Lin, H., & Zhu, Q. (2007). Temporal Stability of Soil Moisture Spatial Varia‐ bility at Two Scales and Its Implication for Optimal Field Monitoring. Hydrol. earth

[66] Zhu, Q., & Lin, H. (2011). Influences of Soil, Terrain, and Crop Growth on Soil Mois‐

ture Variation from Transect to Farm Scales. Geoderma, 163, 45-54.

over Complex Terrain During SMEX04. Remote sens. environ., 112, 314-325.

Am. J., 49, 822-828.

80 Advances in Agrophysical Research

10.1029/2007GL031813.

sour. res., 35, 797-810.

Adv. water resour., 27, 831-842.

am. stat. assoc., 67, 578-580.

drol. process., 24, 2507-2519.

syst. sci. discuss, 4, 1185-1214.

28, 43-54.

Soil varies considerably from location to location (Nielsen et al., 1973). Knowledge on soil spa‐ tial variability is important in ecological modelling (Burrough, 1983; Corwin et al., 2006), envi‐ ronmental prediction (Trangmar et al., 1985), precision agriculture (Goderya, 1998), soil quality assessment (Heuvelink and Pebesma, 1999; McBratney et al., 2000), and natural resour‐ ces management. Adequate understanding of the variability in soil properties as a function of space and time is necessary for developing logical, empirical and physical models of soil and landscape processes (Burrough, 1993; Foussereau et al., 1993; Wilding et al., 1994). Geostatis‐ tics, a widely used approach, has been used to identify the spatial structure in the variability of soil attributes (Vieira, 2000; Carvalho et al., 2002; Vieira et al., 2002). Semivariance function characterizes the spatial continuity between points. When the semivariance is plotted against the lag distance or separation distance between points, the plot is called semivariogram (Fig. 1; McBratney and Webster, 1986; Isaaks and Srivastava, 1989). The structure of the semivario‐ gram is explained by three properties; the nugget, the sill and the range (Fig. 1). These spatial structures of semivariogram help in identifying autocorrelation and replicating samples, re‐ vealing dominant pattern in data series, identifying major ongoing processes (Si et al., 2007), designing experiments (Fagroud and van Meirvenne, 2002) and monitoring networks (Pra‐ kash and Singh, 2000), selecting proper data analysis method and interpreting data (Lambert et al., 2004), and assessing simulation and uncertainty analysis in a better way (Papritz and Du‐ bois, 1999). The semivariogram structures also help to quantify spatial dependence between observations. Modelling of observed semivariance values helps in predicting the spatial distri‐ bution of attribute values (Goovaerts, 1998). The spatial distribution of attribute values is very important in separating random noise in semivariance, and interpolation and mapping analy‐ sis such as kriging (Deutsch and Journel, 1998; Nielsen and Wendorth, 2003).

© 2013 Biswas and Si; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. © 2013 Biswas and Si; licensee InTech. This is a paper distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

which makes the interpretation difficult. Therefore, interpretation of a particular spatial proc‐ ess from the parameter values of a particular model will be biased. In addition the parameter value from individual models may have large uncertainty. This uncertainty is inherent to the model selection and is well beyond just the issue of determining best model(s). Model averag‐ ing is a technique designed to help in accounting the uncertainty associated with the models

The model averaging is not a new concept in statistics. People started combining and aver‐ aging models since 1970s for different purposes (Hoeting et al., 1999). During averaging models are assigned with weights based on their performance. The AIC values are used to calculate the weights for the models (Burnham and Anderson, 2002). Before the recent de‐ velopment of computational power, the averaging procedure ignored the uncertainty associ‐ ated with models. Recently, the use of model averaging is increased to reduce the uncertainty associated with the model selection. Information on the use of model averaging procedure in soil science is scarce (Webster and McBratney, 1989) and there is no informa‐ tion of reducing uncertainty associated with the semivariogram model parameters from averaging of commonly used semivariogram model parameters. Therefore, the objective of this paper is to reduce the uncertainty associated with semivariogram model parameters through averaging. The weighted average of the parameter values of commonly used mod‐

els can be a better way of describing the spatial processes.

γ*s*

γ*l*

Exponential Model <sup>γ</sup>*e*(*<sup>h</sup>* ;*a*, *<sup>b</sup>*, *<sup>c</sup>*)=*<sup>a</sup>* <sup>+</sup> (*b*−*a*){1−exp( <sup>−</sup>*<sup>h</sup>*

**Table 1.** Commonly used models for semivariogram fitting in soil science

(*<sup>h</sup>* ;*a*, *<sup>b</sup>*, *<sup>c</sup>*)={*<sup>a</sup>* <sup>+</sup> (*b*−*a*) 1.5 *<sup>h</sup>*

*b otherwise*

<sup>γ</sup>*g*(*<sup>h</sup>* ;*a*, *<sup>b</sup>*, *<sup>c</sup>*)=*<sup>a</sup>* <sup>+</sup> (*b*−*a*){1−exp( <sup>−</sup>*<sup>h</sup>* <sup>2</sup>

*h <sup>c</sup> if <sup>h</sup>* <sup>≤</sup>*<sup>c</sup>*

*b otherwise*

(*<sup>h</sup>* ;*a*, *<sup>b</sup>*, *<sup>c</sup>*)={*<sup>a</sup>* <sup>+</sup> (*b*−*a*)

Geostatistics can explain the spatial variability and the patterns of a variable in field from its autocorrelation. It describes the relationship between measurements at different locations

*<sup>c</sup>* <sup>−</sup>0.5( *<sup>h</sup> c* ) 3

*<sup>c</sup>* )}

*<sup>c</sup>* <sup>2</sup> )}

*if h* ≤*c*

Model Averaging for Semivariogram Model Parameters

http://dx.doi.org/10.5772/52339

83

**Name of Model Equations †**

and their parameters.

Spherical Model

Gaussian Model

**2. Theory**

Linear Plateau Model

† In all models; γ = 0 when h = 0

**Figure 1.** A typical example of semivariogram showing different components.

The choice of theoretical models and its fitting procedure is very important to get a better pre‐ diction of unsampled locations (McBratney and Webster, 1986). Spherical model (Burgess and Webster, 1980; Vieira et al., 1981; Van Kuilenburg et al., 1982; Vauclin et al., 1983; Trangmar, 1985), exponential model (David, 1977), Gaussian model (McBratney and Webster, 1986) and linear plateau model (Burgess and Webster, 1980; Hajrasuliha et al., 1980; Vauclin et al., 1983) are important among the most commonly used semivariogram models in the field of soil sci‐ ence. The maximum likelihood method (Cressie, 1991) or least square regression (Vieira et al., 1981; Yost et al., 1982; Trangmar et al., 1985) including the weighted least square methods (Cressie, 1985) optimize the parameter value by minimizing the deviations of model predic‐ tion from the experimental semivariances. Small sum of deviations between the model and the experimental values indicate superior performance of the models. Small and comparable sum of the deviations indicates comparable performance of the models. Therefore, selection of model is prerequisite for better prediction (Burnham and Anderson, 2002). In reality, there can be several models to choose from. Selection of good models requires balancing of goodness of fit and complexity, which is generally, determined using a likelihood ratio approach leading to a chi-square test. The Akaike Information Criterion (AIC) values can be calculated from maxi‐ mum likelihood function (Akaike, 1973), which is used to evaluate the performance of the models (Webster and McBratney, 1989). The smallest AIC value indicates the best model, but the values of parameters such as nugget, sill and range can be different from model to model, even though they have the same physical meaning (Trangmar, 1985). Different value of a pa‐ rameter for different models clearly indicates the uncertainty associated with the parameter, which makes the interpretation difficult. Therefore, interpretation of a particular spatial proc‐ ess from the parameter values of a particular model will be biased. In addition the parameter value from individual models may have large uncertainty. This uncertainty is inherent to the model selection and is well beyond just the issue of determining best model(s). Model averag‐ ing is a technique designed to help in accounting the uncertainty associated with the models and their parameters.

The model averaging is not a new concept in statistics. People started combining and aver‐ aging models since 1970s for different purposes (Hoeting et al., 1999). During averaging models are assigned with weights based on their performance. The AIC values are used to calculate the weights for the models (Burnham and Anderson, 2002). Before the recent de‐ velopment of computational power, the averaging procedure ignored the uncertainty associ‐ ated with models. Recently, the use of model averaging is increased to reduce the uncertainty associated with the model selection. Information on the use of model averaging procedure in soil science is scarce (Webster and McBratney, 1989) and there is no informa‐ tion of reducing uncertainty associated with the semivariogram model parameters from averaging of commonly used semivariogram model parameters. Therefore, the objective of this paper is to reduce the uncertainty associated with semivariogram model parameters through averaging. The weighted average of the parameter values of commonly used mod‐ els can be a better way of describing the spatial processes.


† In all models; γ = 0 when h = 0

**Table 1.** Commonly used models for semivariogram fitting in soil science

## **2. Theory**

**Figure 1.** A typical example of semivariogram showing different components.

82 Advances in Agrophysical Research

The choice of theoretical models and its fitting procedure is very important to get a better pre‐ diction of unsampled locations (McBratney and Webster, 1986). Spherical model (Burgess and Webster, 1980; Vieira et al., 1981; Van Kuilenburg et al., 1982; Vauclin et al., 1983; Trangmar, 1985), exponential model (David, 1977), Gaussian model (McBratney and Webster, 1986) and linear plateau model (Burgess and Webster, 1980; Hajrasuliha et al., 1980; Vauclin et al., 1983) are important among the most commonly used semivariogram models in the field of soil sci‐ ence. The maximum likelihood method (Cressie, 1991) or least square regression (Vieira et al., 1981; Yost et al., 1982; Trangmar et al., 1985) including the weighted least square methods (Cressie, 1985) optimize the parameter value by minimizing the deviations of model predic‐ tion from the experimental semivariances. Small sum of deviations between the model and the experimental values indicate superior performance of the models. Small and comparable sum of the deviations indicates comparable performance of the models. Therefore, selection of model is prerequisite for better prediction (Burnham and Anderson, 2002). In reality, there can be several models to choose from. Selection of good models requires balancing of goodness of fit and complexity, which is generally, determined using a likelihood ratio approach leading to a chi-square test. The Akaike Information Criterion (AIC) values can be calculated from maxi‐ mum likelihood function (Akaike, 1973), which is used to evaluate the performance of the models (Webster and McBratney, 1989). The smallest AIC value indicates the best model, but the values of parameters such as nugget, sill and range can be different from model to model, even though they have the same physical meaning (Trangmar, 1985). Different value of a pa‐ rameter for different models clearly indicates the uncertainty associated with the parameter,

Geostatistics can explain the spatial variability and the patterns of a variable in field from its autocorrelation. It describes the relationship between measurements at different locations (or times) separated by certain distance (or time). For example, a soil property *Z(xi )* meas‐ ured at location *xi* , where *i* =1, 2, …, *n* and *n* is the number of samples along a transect. The continuity of this relationship can be investigated from *h* scatterplots, which can be done by plotting the measured values *Z(xi )* as horizontal axis and *Z(xi +h)* as vertical axis, where *i* =1, 2, …. The *h* scatterplot will show a cloud of points, where every point represents one pair of sample locations *Z(xi )* and *Z(xi +h)* (Pannatier, 1996; Zawadzki and Fabijanczyk, 2007). For each *h*, half of the mean value of the squared difference *Z(xi )* - *Z(xi +h)* is defined as semivar‐ iance (Matheron, 1962).

$$\mathcal{N}\left(h\right) = \frac{1}{2N\left(h\right)} \sum\_{i=1}^{N\left(h\right)} \left[ Z\left(\mathbf{x}\_i\right) - Z\left(\mathbf{x}\_i + h\right) \right]^2 \tag{1}$$

2 1

<sup>=</sup> (4)

Model Averaging for Semivariogram Model Parameters

http://dx.doi.org/10.5772/52339

85

<sup>2</sup> ë û (6)

æ ö é ù = - ç ÷ ë û è ø (7)

^ <sup>|</sup> *<sup>y</sup>* is the maximized likelihood and can

(*j* = 1, 2, 3,...,*p*), the esti‐

*i*

mation the fit can always be improved by diminishing the residual sum of squared errors with addition of parameters to the models (Webster and McBratney, 1989). In case of the same number of parameters in the most commonly used semivariogram models, the small and comparable RSS values indicate the comparable performance of different models. A sol‐ ution can be achieved by fitting each model using the least square approximation and then comparing the goodness of fit for each model (Webster and McBratney, 1989). The perform‐ ance of a model can be evaluated and the fitting can be said as the best with the lowest Akaike Information Criterion (AIC) (Akaike, 1973). AIC is an estimate of the expected Kull‐ back-Leiler information (a ruler to measure the similarity between the statistical model and the true distribution) lost by using a model to approximate the process that generated ob‐ served pattern (Burnham and Anderson, 2002; Johnson and Omland, 2004). It is also defined

<sup>2</sup> is the variance calculated in Eq. 2 for lag *i*. In this weighted least squares approxi‐

*AIC* = - 2ln maximized likelihood 2 number of parameters ( ) + ´ ( ) (5)

( ˆ *AIC L y p* = - + 2ln é ù q

( ˆ ln ln

q

selected model. If there are *R* numbers of models with parameters *θ<sup>j</sup>*

2 *n RSS L y*

where, *n* is the number of data points. With close performance, different models with same number of parameters having the same physical meaning produces different optimized pa‐ rameter values. A single process in field will be represented by different values based on the selected models. In this case the estimated parameter values will be associated with uncer‐ tainty, which can be reduced by making weighted average based on the performance of the

mated parameters can be calculated from the Eq. 8 (Burnham and Anderson, 2002; Johnson

*n*

s

*i*

*w*

where, *σ<sup>i</sup>*

as Eq. 5:

Or it can be estimated as in Eq. 6.

be estimated from Eq. 7.

and Omland, 2004).

where, *p* is the number of parameters and ln *L* (*θ*

The variance (*σ<sup>i</sup>* 2 ) of the squared difference *Z(xi )* - *Z(xi +h)* can be calculated using Eq. (2).

$$
\sigma\_i^2 = \frac{1}{N\binom{h}{h} - 1} \sum\_{i=1}^{N\binom{h}{i}} \left(a\_i - \overline{a}\_i\right)^2 \tag{2}
$$

where, *ai* = *Z*(*xi* ) −*Z*(*xi* + *h* ) 2 and *N(h)* is the number of pairs in a cloud for a particular *h*.

The experimental semivariogram can be fitted to a mathematical model. Not all the mathe‐ matical functions that seem to fit the observed values can be considered as semivariogram model. One important criterion for the semivariogram models is to be "positive-definite" to ensure the nonnegative covariance values restricting the models to be permissible for fitting (Isaaks and Srivastava, 1989; Goovaerts, 1997; Deutsch and Journel, 1998). There are a num‐ ber of permissible semivariogram models including the most commonly used spherical, ex‐ ponential, Gaussian, and linear plateau (Table 1).

Among the fitting procedures, the weighted nonlinear least square method is the most ro‐ bust and reliable method (Cressie, 1985). This procedure minimizes the residual sum of squared errors (RSS) between experimental semivariance data and the models by optimiz‐ ing the model parameters: nugget, sill and range values. The RSS can be calculated using Eq. 3.

$$RSS = \sum\_{i=1}^{m} w\_i \left[ \tilde{\mathcal{Y}}\_i - \mathcal{Y}\_i \right]^2 \tag{3}$$

where, *m* is the number of lags, *γ*˜*<sup>i</sup>* is the semivariance value for lag *i*, *γ<sup>i</sup>* are the correspond‐ ing model predictions, and *wi* are weighting factors. The weighing factor used in calculating RSS is related to the variance associated with the semivariance calculation (Jian et al., 1996).

$$w\_i = \frac{1}{\sigma\_i^2} \tag{4}$$

where, *σ<sup>i</sup>* <sup>2</sup> is the variance calculated in Eq. 2 for lag *i*. In this weighted least squares approxi‐ mation the fit can always be improved by diminishing the residual sum of squared errors with addition of parameters to the models (Webster and McBratney, 1989). In case of the same number of parameters in the most commonly used semivariogram models, the small and comparable RSS values indicate the comparable performance of different models. A sol‐ ution can be achieved by fitting each model using the least square approximation and then comparing the goodness of fit for each model (Webster and McBratney, 1989). The perform‐ ance of a model can be evaluated and the fitting can be said as the best with the lowest Akaike Information Criterion (AIC) (Akaike, 1973). AIC is an estimate of the expected Kull‐ back-Leiler information (a ruler to measure the similarity between the statistical model and the true distribution) lost by using a model to approximate the process that generated ob‐ served pattern (Burnham and Anderson, 2002; Johnson and Omland, 2004). It is also defined as Eq. 5:

$$AIC = -\text{ } \text{2ln}\left(\text{maximal likelihood}\right) + \text{ } \text{2} \times \left(\text{number of parameters}\right) \tag{5}$$

Or it can be estimated as in Eq. 6.

(or times) separated by certain distance (or time). For example, a soil property *Z(xi*

( ) ( ) ( ) ( )

1

*i h Zx Zx h*

1

*N h*

=

*N h*

continuity of this relationship can be investigated from *h* scatterplots, which can be done by

2, …. The *h* scatterplot will show a cloud of points, where every point represents one pair of

*)* as horizontal axis and *Z(xi*

( ) <sup>2</sup>

*i i*

*)* - *Z(xi*

*a a*

) −*Z*(*xi* + *h* ) 2 and *N(h)* is the number of pairs in a cloud for a particular *h*.

( ) ( ) ( ) <sup>2</sup> <sup>2</sup>

*N h i i i i*

1

1

The experimental semivariogram can be fitted to a mathematical model. Not all the mathe‐ matical functions that seem to fit the observed values can be considered as semivariogram model. One important criterion for the semivariogram models is to be "positive-definite" to ensure the nonnegative covariance values restricting the models to be permissible for fitting (Isaaks and Srivastava, 1989; Goovaerts, 1997; Deutsch and Journel, 1998). There are a num‐ ber of permissible semivariogram models including the most commonly used spherical, ex‐

Among the fitting procedures, the weighted nonlinear least square method is the most ro‐ bust and reliable method (Cressie, 1985). This procedure minimizes the residual sum of squared errors (RSS) between experimental semivariance data and the models by optimiz‐ ing the model parameters: nugget, sill and range values. The RSS can be calculated using

*ii i*

ing model predictions, and *wi* are weighting factors. The weighing factor used in calculating RSS is related to the variance associated with the semivariance calculation (Jian et al., 1996).

g g

1

*i RSS w*

=

*m*

2

= - é ù å ë û % (3)

is the semivariance value for lag *i*, *γ<sup>i</sup>* are the correspond‐

=

, where *i* =1, 2, …, *n* and *n* is the number of samples along a transect. The

*+h)* (Pannatier, 1996; Zawadzki and Fabijanczyk, 2007). For

<sup>=</sup> é ù - + å ë û (1)

<sup>=</sup> - - å (2)

*+h)* can be calculated using Eq. (2).

*)* - *Z(xi*

ured at location *xi*

84 Advances in Agrophysical Research

sample locations *Z(xi*

iance (Matheron, 1962).

The variance (*σ<sup>i</sup>*

where, *ai* = *Z*(*xi*

Eq. 3.

plotting the measured values *Z(xi*

2

*)* and *Z(xi*

g

ponential, Gaussian, and linear plateau (Table 1).

where, *m* is the number of lags, *γ*˜*<sup>i</sup>*

each *h*, half of the mean value of the squared difference *Z(xi*

1 2

*N h*

) of the squared difference *Z(xi*

s

*)* meas‐

*+h)* as vertical axis, where *i* =1,

*+h)* is defined as semivar‐

$$AIC = -2\ln\left[L\left(\left.\,\,\hat{\theta}\right|y\right) + 2p\right] \tag{6}$$

where, *p* is the number of parameters and ln *L* (*θ* ^ <sup>|</sup> *<sup>y</sup>* is the maximized likelihood and can be estimated from Eq. 7.

$$\ln\left[L\left(\cdot\,\,\hat{\theta}\,\,\middle|\,y\right)=-\frac{n}{2}\ln\left(\frac{RSS}{n}\right)\right] \tag{7}$$

where, *n* is the number of data points. With close performance, different models with same number of parameters having the same physical meaning produces different optimized pa‐ rameter values. A single process in field will be represented by different values based on the selected models. In this case the estimated parameter values will be associated with uncer‐ tainty, which can be reduced by making weighted average based on the performance of the selected model. If there are *R* numbers of models with parameters *θ<sup>j</sup>* (*j* = 1, 2, 3,...,*p*), the esti‐ mated parameters can be calculated from the Eq. 8 (Burnham and Anderson, 2002; Johnson and Omland, 2004).

$$
\hat{\vec{\theta}} = \sum\_{j=1}^{R} \mathcal{W}\_{j} \hat{\theta}\_{j} \tag{8}
$$

It is conventional to remove the factors of 2 by defining

*ab*

a

é ù = ë û

a

be written as (Press et al., 1992);

al., 1992) (Eq. 17).

eter values from fitting, and *θ*¯

2 2 1

*d dz dz* c

*a b*

For a linear equation, the second derivative term can be dismissed because it is zero or small enough when compared to the term involved in first order derivative. So the *αab* formula can

*<sup>N</sup>* 1 ;,, ;,,

é ù <sup>=</sup> ê ú

So for a three parameter (*a, b, c*) semivariogram model we will get a 3 × 3 Hessian Matrix, (Eq. 16).

*dd dd dd da da da db da dc dd dd dd db da db db db dc dd dd dd dc da dc db dc dc*

 gg

 gg

 gg


2

ë û <sup>å</sup> (18)

^ is the estimated model param‐

 qq

is the average of estimated model parameter values.

 gg

é ù ê ú

> gg

> gg

ë û

The approximation of the covariance matrix can be done by inversing the matrix *α* (Press et

This matrix component indicates the variance associated with each parameter for different models. At the same time we need to calculate the variance associated with the averaged pa‐ rameter. The parameter variance for each model and the weight assigned to the model are used to calculate the variance of the averaged parameter (Eq. 18) (Burnham and Anderson, 2002)

*i i a b*

111

*iii NNN*

===

gg

gg

gg

*iii NNN*

===

ååå

*iii*

===

ååå

*NNN*

ååå

111

111

<sup>1</sup> *<sup>C</sup>* a

( ) ( ) ( )

*j j*

q

é ù <sup>=</sup> ê ú + -

ˆ ˆ ˆ ˆ varˆ var

*W g*

1

*j*

=

q

where, *W* is the weights assigned to *R* number of models, *g*, *θ*

^

*R*

( ) ( )

 g

*d habc d habc dz dz*

º (14)

Model Averaging for Semivariogram Model Parameters

http://dx.doi.org/10.5772/52339

87

ë û <sup>å</sup> (15)

(16)

2 *ab*

a

2 1

= s g

where, *Wj* are the weights assigned to a particular model. The weights can be calculated from the AIC values for each model (Eq. 9).

$$\mathcal{W}\_{\circ} = \frac{\exp\left(-\Delta\_{\circ}/2\right)}{\sum\_{j=1}^{R} \exp\left(-\Delta\_{j}/2\right)}\tag{9}$$

where, *Δ<sup>j</sup>* = *AICj* − *AIC*min and *AICj* and *AICmin* are the AIC values for a particular model and the lowest AIC value model.

To explain the uncertainty associated with the parameter, we need to calculate the variance associated with each optimized parameter. If a model has *p* parameters, we can calculate *p* × *p* covariance matrix. From the calculation of Hessian Matrix (the second derivative matrix of *χ*<sup>2</sup> merit function) we can estimate the standard error in fitted parameters. In the calculation of Hessian Matrix (Press et al., 1992), let us assume we have to fit a model:

$$
\gamma = \gamma \left( h; a, b, c \right) \tag{10}
$$

The *χ*<sup>2</sup> merit function, that will be equal to RSS (Eq. 11),

$$\log^2\left(a,b,c\right) = \sum\_{i=1}^{m} \frac{1}{\sigma\_i^2} \left[\tilde{\boldsymbol{\gamma}}\_i - \boldsymbol{\gamma}\left(h;a,b,c\right)\right]^2 = RSS\left(a,b,c\right) \tag{11}$$

The *χ*<sup>2</sup> gradient for the parameters, *z*, will be zero when *χ*<sup>2</sup> is minimum. The components can be calculated as Eq. (12).

$$\frac{d\chi^2}{dz\_k} = -2\sum\_{i=1}^N \frac{\left[\tilde{\gamma}\_i - \gamma\left(h; a, b, c\right)\right]}{\sigma\_i^2} \frac{d\chi\left(h; a, b, c\right)}{dz\_k} \text{where, } k = a, b, c. \tag{12}$$

An additional derivative will give,

$$\frac{d^2 \chi^2}{dz\_a dz\_b} = 2 \sum\_{i=1}^N \frac{1}{\sigma\_i^2} \left[ \frac{d \chi\left(h; a, b, c\right)}{dz\_a} \frac{d \chi\left(h; a, b, c\right)}{dz\_b} - \left\langle \tilde{\gamma}\_i - \gamma\left(h; a, b, c\right) \right\rangle \frac{d^2 \chi\left(h; a, b, c\right)}{dz\_a dz\_b} \right] \tag{13}$$

It is conventional to remove the factors of 2 by defining

1 <sup>ˆ</sup> <sup>ˆ</sup> *<sup>R</sup>*

*j* q

1

*j*

=

*j R*

=

of Hessian Matrix (Press et al., 1992), let us assume we have to fit a model:

g g

2 1

*i*

g g

( ) ( ) ( ) <sup>2</sup> <sup>2</sup>

<sup>1</sup> , , ;,, , ,

;,, ;,, <sup>2</sup> where, , , .

g

é ù - ë û <sup>=</sup> - = <sup>å</sup> %

( ) ( ) { ( )} ( ) 2 2 <sup>2</sup>

*a b i i a b a b d d habc d habc d habc*

*dz dz dz dz dz dz*

 g

<sup>1</sup> ;,, ;,, ;,, <sup>2</sup> ;,,

= ê ú - -

*i*

å % (13)

g g

ë û

é ù

*habc*

*h a b c RSS a b c*

=- = é ù å ë û % (11)

merit function, that will be equal to RSS (Eq. 11),

*m*

= s

1

g

=

2 1

*<sup>N</sup> <sup>i</sup>*

g g

*i i*

gradient for the parameters, *z*, will be zero when *χ*<sup>2</sup>

( ) ( ) <sup>2</sup> 2

*k k i i d habc d habc*

s

*dz dz*

*W*

where, *Wj*

86 Advances in Agrophysical Research

The *χ*<sup>2</sup>

The *χ*<sup>2</sup>

from the AIC values for each model (Eq. 9).

where, *Δ<sup>j</sup>* = *AICj* − *AIC*min and *AICj*

c*abc*

can be calculated as Eq. (12).

c

An additional derivative will give,

*N*

= s

c

the lowest AIC value model.

*j j*

( )

*j*

exp 2

To explain the uncertainty associated with the parameter, we need to calculate the variance associated with each optimized parameter. If a model has *p* parameters, we can calculate *p* × *p* covariance matrix. From the calculation of Hessian Matrix (the second derivative matrix of *χ*<sup>2</sup> merit function) we can estimate the standard error in fitted parameters. In the calculation

exp 2


( )

*j*

are the weights assigned to a particular model. The weights can be calculated

<sup>=</sup> å (8)

å -D (9)

and *AICmin* are the AIC values for a particular model and

= (*habc* ;,, ) (10)

*k abc*

g

is minimum. The components

(12)

 q *W* =

$$
\alpha\_{ab} \equiv \frac{1}{2} \frac{d^2 \chi^2}{dz\_a dz\_b} \tag{14}
$$

For a linear equation, the second derivative term can be dismissed because it is zero or small enough when compared to the term involved in first order derivative. So the *αab* formula can be written as (Press et al., 1992);

$$\alpha\_{ab} = \sum\_{i=1}^{N} \frac{1}{\sigma\_i^2} \left[ \frac{d\gamma\left(h;a,b,c\right)}{dz\_a} \frac{d\gamma\left(h;a,b,c\right)}{dz\_b} \right] \tag{15}$$

So for a three parameter (*a, b, c*) semivariogram model we will get a 3 × 3 Hessian Matrix, (Eq. 16).

$$
\begin{bmatrix}
\sum\_{i=1}^{N} \frac{d\boldsymbol{\gamma}}{da} \frac{d\boldsymbol{\gamma}}{da} & \sum\_{i=1}^{N} \frac{d\boldsymbol{\gamma}}{da} \frac{d\boldsymbol{\gamma}}{db} & \sum\_{i=1}^{N} \frac{d\boldsymbol{\gamma}}{da} \frac{d\boldsymbol{\gamma}}{dc} \\
\sum\_{i=1}^{N} \frac{d\boldsymbol{\gamma}}{db} \frac{d\boldsymbol{\gamma}}{da} & \sum\_{i=1}^{N} \frac{d\boldsymbol{\gamma}}{db} \frac{d\boldsymbol{\gamma}}{db} & \sum\_{i=1}^{N} \frac{d\boldsymbol{\gamma}}{db} \frac{d\boldsymbol{\gamma}}{dc} \\
\sum\_{i=1}^{N} \frac{d\boldsymbol{\gamma}}{dc} \frac{d\boldsymbol{\gamma}}{da} & \sum\_{i=1}^{N} \frac{d\boldsymbol{\gamma}}{dc} \frac{d\boldsymbol{\gamma}}{db} & \sum\_{i=1}^{N} \frac{d\boldsymbol{\gamma}}{dc} \frac{d\boldsymbol{\gamma}}{dc}
\end{bmatrix}
\tag{16}
$$

The approximation of the covariance matrix can be done by inversing the matrix *α* (Press et al., 1992) (Eq. 17).

$$
\begin{bmatrix} \mathbb{C} \\ \end{bmatrix} = \begin{bmatrix} a \\ \end{bmatrix}^{-1} \tag{17}
$$

This matrix component indicates the variance associated with each parameter for different models. At the same time we need to calculate the variance associated with the averaged pa‐ rameter. The parameter variance for each model and the weight assigned to the model are used to calculate the variance of the averaged parameter (Eq. 18) (Burnham and Anderson, 2002)

$$\operatorname{var}\left(\hat{\overline{\theta}}\right) = \sum\_{j=1}^{R} W\_j \left| \operatorname{var}\left(\hat{\theta} \mid \mathbb{g}\_j\right) + \left(\hat{\theta} - \hat{\overline{\theta}}\right)^2 \right| \tag{18}$$

where, *W* is the weights assigned to *R* number of models, *g*, *θ* ^ is the estimated model param‐ eter values from fitting, and *θ*¯ ^ is the average of estimated model parameter values.


**Table 2.** Descriptive statistics of the data sets used for model fitting

#### **3. Materials and methods**

Demonstration of this average parameter estimation procedure was done using two soil pa‐ rameters, which were chosen from two different experimental datasets. One soil property, sand content, was collected from a regularly spaced (3 m) 128 point transect (Zeleke and Si, 2005). The sampling site was located at Smeaton, Saskatchewan, Canada (53° 40′N latitude and 104° 58′W longitude). The hydrometer method was used to determine the particle sizes (Gee and Bauder, 1986). Another soil property, copper (Cu) content, was selected from a da‐ taset consisting of 359 topsoil samples collected by the Swiss Federal Institute of Technology in the Swiss Jura (Atteia et al., 1994). The data points were selected according to a regular configuration of a mesh of 250 m × 250 m with several clusters from an area of approximate‐ ly 1450 ha. The cluster samples were collected following nested sampling design with 100, 40, 15, and 6 m sampling interval. Copper content was measured using a direct current plas‐ ma spectroscopy (ARL, Spectran V) with other 5 heavy metals (Atteia et al., 1994).

**Lag Distance (m)**

**Lag Distance (km) 012**

**Figure 3.** Empirical semivariogram and fitted models of logarithm-transformed copper content data.

**Experimental data Spherical Model Exponential Model Gaussian Model Linear Plateau Model**

Model Averaging for Semivariogram Model Parameters

http://dx.doi.org/10.5772/52339

89

**Experimental data Spherical Model Exponential Model Gaussian Model Linear Plateau Model**

**0 40 80 120**

**Semivariance (%)**

**Semivariance [ln(mg**

**0.2**

**0.4**

**0.6**

**2/kg2)]** **20**

**Figure 2.** Empirical semivariogram and fitted models for sand content data.

**30**

**40**

#### **4. Results and discussion**

The exploratory information about the dataset is given in Table 2. The skewness of the sand 0.28, which indicates a near normal statistical distribution of the dataset. The copper content is highly skewed (skewness = 3.02) and exhibits a log normal statistical distribution. There‐ fore, we decided to normalize the copper content using natural logarithm transformation. The exploratory information of logarithm transformed values for copper content is present‐ ed in parentheses in Table 2.

**Figure 2.** Empirical semivariogram and fitted models for sand content data.

**Characteristics Sand (Zeleke and Si, 2005) Copper (Atteia et al., 1994)**

Demonstration of this average parameter estimation procedure was done using two soil pa‐ rameters, which were chosen from two different experimental datasets. One soil property, sand content, was collected from a regularly spaced (3 m) 128 point transect (Zeleke and Si,

(Gee and Bauder, 1986). Another soil property, copper (Cu) content, was selected from a da‐ taset consisting of 359 topsoil samples collected by the Swiss Federal Institute of Technology in the Swiss Jura (Atteia et al., 1994). The data points were selected according to a regular configuration of a mesh of 250 m × 250 m with several clusters from an area of approximate‐ ly 1450 ha. The cluster samples were collected following nested sampling design with 100, 40, 15, and 6 m sampling interval. Copper content was measured using a direct current plas‐

The exploratory information about the dataset is given in Table 2. The skewness of the sand 0.28, which indicates a near normal statistical distribution of the dataset. The copper content is highly skewed (skewness = 3.02) and exhibits a log normal statistical distribution. There‐ fore, we decided to normalize the copper content using natural logarithm transformation. The exploratory information of logarithm transformed values for copper content is present‐

58′W longitude). The hydrometer method was used to determine the particle sizes

40′N latitude

2005). The sampling site was located at Smeaton, Saskatchewan, Canada (53°

ma spectroscopy (ARL, Spectran V) with other 5 heavy metals (Atteia et al., 1994).

Minimum 52.50 3.55 (1.27) Maximum 78.75 166.40 (5.11) Mean 64.23 23.58 (2.87) Median 64.41 17.20 (2.84) Mode 58.75 16.40 (2.80) Skewness 0.28 3.02 (0.42) Kurtosis -0.73 11.88 (0.11) Standard Deviation 6.06 22.24 (0.72) Variance 36.76 494.48 (0.52)

**Table 2.** Descriptive statistics of the data sets used for model fitting

**3. Materials and methods**

88 Advances in Agrophysical Research

**4. Results and discussion**

ed in parentheses in Table 2.

and 104°

**Figure 3.** Empirical semivariogram and fitted models of logarithm-transformed copper content data.

Semivariance for sand content (Fig. 2) and logarithm transformed copper content (Fig. 3) are calculated and plotted as a function of lag interval. For the semivariance calculation of regu‐ larly spaced sand samples, the lag interval is 6 m, which is twice the minimum sampling interval. The maximum lag distance used for this calculation is 144 m and the minimum number of pairs used for the semivariogram calculation is 24. The semivariance of the cop‐ per content is calculated using minimum and maximum lag distance of 0.1 km and 2 km re‐ spectively, which leads to at least 20 pairs of data points. The experimental semivariograms are fitted with four most commonly used semivariogram models (Table 1) following weight‐ ed least square estimation. The fitted semivariogram models and the experimental semivar‐ iograms are shown in Fig. 2 and Fig. 3 for sand and copper, respectively. The optimized parameters (nugget, sill and range) values for each of the models are presented in Table 3. The nugget value optimized by different models for the sand content varies from 14.67 % (exponential model) to 18.13 % (Gaussian model). The value of sill varies from 36.40 % (line‐ ar plateau model) to 39.27 % (exponential model). The optimized range value that indicates the distance over which the processes are spatially dependent, for the sand content varies widely from 47.10 m (Gaussian model) to over 100 m (Spherical model). The variance (val‐ ues in the paresis in Table 3) associated with each parameters are approximated from the calculation of Hessian Matrix (Press et al., 1992). The variance associated with the parameter estimate is very high as the semivariance is calculated from the semivariance cloud, which spreads over a range of values. The nugget, sill, and range have their own physical signifi‐ cance and different models should result in the same set of parameter values. However, due to model uncertainty, four models have different set of optimized parameter values for the sand content. The range of optimized parameter values (Table 3) from different models clearly indicate how uncertain our models and parameters are.

AIC values for all four models are very similar for copper content (vary from -111.739 to -113.739). The similar AIC values for the models indicate that all four models are almost equivalent. Though the performance of the models is very comparable, the optimized pa‐ rameter values from different models are not similar. In this situation, the selection of pa‐

**Parameter Model Nugget Sill Range VN/VS RSS**


36.71 (231.66)

39.27 (1081.00)

36.57 (214.91)

**ln(mg2/kg2) ln(mg2/kg2) --km--**

0.553 (0.037)

0.548 (0.039)

0.544 (0.034)

0.543 (0.033)

36.40 (181.53) 74.37

100.94 (29610.00)

Model Averaging for Semivariogram Model Parameters

http://dx.doi.org/10.5772/52339

91

49.15 (32550.00)

47.10 (5233.00)

(8379.00)

0.480 (0.767)

0.132 (0.133)

0.171 (0.097)

0.271 (0.235) 0.438 0.021

0.374 0.021

0.495 0.040

0.455 0.026

0.183 0.067

0.049 0.061

0.224 0.061

0.164 0.063

rameter value from a particular model can be associated with high uncertainty.

(257.45)

(423.95)

(198.24)

(217.15)

(0.074)

(0.177)

(0.062)

(0.080)

**Table 3.** Optimized parameter values and their variance (in paresis) for different semivariogram models for sand

The model averaging is conducted to reduce parameter uncertainty. Weights are assigned to the models based on their performance and importance. The model with lowest AIC value indicates best model. Based on the smallest AIC, ΔAIC is calculated from the difference be‐ tween AIC value of a particular model and the smallest AIC value out of four models. The ΔAIC is used to measure the likelihood of each model. The value of the likelihood of models helps in assigning weights for different models. The spherical and the exponential models for the sand semivariogram have the highest weight, which are almost equal. This indicates an equivalent performance of these two models. The Gaussian and linear plateau models have very small weights, indicating less importance of the models than that of the other two. Whereas for copper, the weights assigned to different models are close enough to explain the performance as equivalent. Based on the weights assigned to each model, the average

Sand Spherical 16.08

Copper Spherical 0.101

content and copper

Exponential 14.67

Gaussian 18.13

Linear Plateau 16.57

Exponential 0.027

Gaussian 0.122

Linear Plateau 0.089

Similarly there are a wide range of optimized parameter values from different models for copper content. The optimized nugget values vary widely in different models (0.027 ln(mg2 /kg2 ) for exponential model to 0.122 ln(mg2 /kg2 ) for Gaussian model) (Table 3). Wide range of sill and range values clearly indicates the uncertainty associated with the parame‐ ters. Different parameter values for different models make the interpretation of the parame‐ ters and the semivariogram structures difficult and uncertain. The ratio of the nugget semivariance (VN) to the sill semivariance (VS) gives a measure of the strength or degree of spatial structure (Cambardella et al., 1994). Therefore, it is sometime difficult to explain the semivariogram structure of a particular property with different nugget and sill values for different models.

In this situation the goodness of fit for each model is calculated. The residual sum of squared error (RSS) between the experimental data and the model data is presented in Table 3. The comparable RSS values indicate that the four models are comparable. From the RSS value, the maximum likelihood is calculated for each model. Akaike Information Criterion (AIC) for different models is calculated from the maximum likelihood of the models and is presented in Table 4. The AIC values for the spherical (-167.269) and exponential models (-167.187) fitted for sand semivariogram are very close indicating a close performance of the models. The performance of the linear plateau and Gaussian models are also acceptable. The AIC values for all four models are very similar for copper content (vary from -111.739 to -113.739). The similar AIC values for the models indicate that all four models are almost equivalent. Though the performance of the models is very comparable, the optimized pa‐ rameter values from different models are not similar. In this situation, the selection of pa‐ rameter value from a particular model can be associated with high uncertainty.

Semivariance for sand content (Fig. 2) and logarithm transformed copper content (Fig. 3) are calculated and plotted as a function of lag interval. For the semivariance calculation of regu‐ larly spaced sand samples, the lag interval is 6 m, which is twice the minimum sampling interval. The maximum lag distance used for this calculation is 144 m and the minimum number of pairs used for the semivariogram calculation is 24. The semivariance of the cop‐ per content is calculated using minimum and maximum lag distance of 0.1 km and 2 km re‐ spectively, which leads to at least 20 pairs of data points. The experimental semivariograms are fitted with four most commonly used semivariogram models (Table 1) following weight‐ ed least square estimation. The fitted semivariogram models and the experimental semivar‐ iograms are shown in Fig. 2 and Fig. 3 for sand and copper, respectively. The optimized parameters (nugget, sill and range) values for each of the models are presented in Table 3. The nugget value optimized by different models for the sand content varies from 14.67 % (exponential model) to 18.13 % (Gaussian model). The value of sill varies from 36.40 % (line‐ ar plateau model) to 39.27 % (exponential model). The optimized range value that indicates the distance over which the processes are spatially dependent, for the sand content varies widely from 47.10 m (Gaussian model) to over 100 m (Spherical model). The variance (val‐ ues in the paresis in Table 3) associated with each parameters are approximated from the calculation of Hessian Matrix (Press et al., 1992). The variance associated with the parameter estimate is very high as the semivariance is calculated from the semivariance cloud, which spreads over a range of values. The nugget, sill, and range have their own physical signifi‐ cance and different models should result in the same set of parameter values. However, due to model uncertainty, four models have different set of optimized parameter values for the sand content. The range of optimized parameter values (Table 3) from different models

Similarly there are a wide range of optimized parameter values from different models for copper content. The optimized nugget values vary widely in different models (0.027

range of sill and range values clearly indicates the uncertainty associated with the parame‐ ters. Different parameter values for different models make the interpretation of the parame‐ ters and the semivariogram structures difficult and uncertain. The ratio of the nugget semivariance (VN) to the sill semivariance (VS) gives a measure of the strength or degree of spatial structure (Cambardella et al., 1994). Therefore, it is sometime difficult to explain the semivariogram structure of a particular property with different nugget and sill values for

In this situation the goodness of fit for each model is calculated. The residual sum of squared error (RSS) between the experimental data and the model data is presented in Table 3. The comparable RSS values indicate that the four models are comparable. From the RSS value, the maximum likelihood is calculated for each model. Akaike Information Criterion (AIC) for different models is calculated from the maximum likelihood of the models and is presented in Table 4. The AIC values for the spherical (-167.269) and exponential models (-167.187) fitted for sand semivariogram are very close indicating a close performance of the models. The performance of the linear plateau and Gaussian models are also acceptable. The

/kg2

) for Gaussian model) (Table 3). Wide

clearly indicate how uncertain our models and parameters are.

) for exponential model to 0.122 ln(mg2

ln(mg2

/kg2

90 Advances in Agrophysical Research

different models.


**Table 3.** Optimized parameter values and their variance (in paresis) for different semivariogram models for sand content and copper

The model averaging is conducted to reduce parameter uncertainty. Weights are assigned to the models based on their performance and importance. The model with lowest AIC value indicates best model. Based on the smallest AIC, ΔAIC is calculated from the difference be‐ tween AIC value of a particular model and the smallest AIC value out of four models. The ΔAIC is used to measure the likelihood of each model. The value of the likelihood of models helps in assigning weights for different models. The spherical and the exponential models for the sand semivariogram have the highest weight, which are almost equal. This indicates an equivalent performance of these two models. The Gaussian and linear plateau models have very small weights, indicating less importance of the models than that of the other two. Whereas for copper, the weights assigned to different models are close enough to explain the performance as equivalent. Based on the weights assigned to each model, the average value of the parameter and the variance associated with each parameter is estimated. Sand has an average nugget, sill and range value of 15.42 %, 37.92 % and 75.53 m respectively. The variance associated with each averaged parameters are presented in the paresis in Table 4. The higher weight of the spherical and exponential model indicates that two models have large contribution to the average value of parameters. Similarly, the variance of the aver‐ aged parameters will have large contribution from spherical and exponential models. Whereas for copper, the contribution to the average value by each models is comparable. The average value of nugget, sill and range for copper are 0.081 ln(mg2 /kg2 ), 0.546 ln(mg2 / kg2 ), and 0.219 km respectively. The variance of each averaged parameter is also calculated and presented in Table 4 (in paresis). The averaged variance reduced the uncertainty associ‐ ated with the parameters by taking the average.

makes the prediction of spatial processes difficult and uncertain. A model averaging procedure is

Model Averaging for Semivariogram Model Parameters

http://dx.doi.org/10.5772/52339

93

Two soil properties (sand content and copper content) are taken as examples for demon‐ strating the average parameter estimation procedure. The sand content is measured at regu‐ lar sampling interval along a transect, whereas the copper content is measured at variable sampling intervals in a two-dimensional field. The semivariogram is calculated for both properties and fitted with four most commonly used mathematical models (spherical, expo‐ nential, Gaussian and linear plateau). Weighted least square estimation is used for fitting these models to the experimental semivariogram. The goodness of fit for each model is cal‐ culated from their residual sum of squares. The parameter for each models are optimized during the fitting procedure. The likelihood of each model is calculated based on the Akaike Information Criterion (AIC) for each model. Different weights were assigned to each model based on their performance and importance from the AIC values. These weights are used for obtaining the weighted average of the optimized parameters. The weighted average of the estimated parameters reduced the uncertainty associated with the parameters and the bias in their selection. The average parameter values and reduced uncertainty provide more concise information about the spatial structure and consequently provide better guidance

The project was funded by a CSIRO Post Doctoral Fellowship and the University of Sas‐

2 Department of Soil Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

[1] Akaike H (1973) Information theory as an extension of the maximum likelihood prin‐ ciple. In: Second International Symposium on Information Theory. Petrov, B.N., Csa‐

used to obtain the parameter value and to reduce model parameter uncertainty.

for sampling, experimental design, and interpolation and mapping.

**Acknowledgements**

katchewan.

**Author details**

**References**

Asim Biswas1,2\* and Bing Cheng Si2

\*Address all correspondence to: asim.biswas@csiro.au

1 Department of Natural Resource Sciences, McGill University, Canada

ki, F., Eds, Akademiai Kiado, Budapest, 267-281 p.


**Table 4.** Parameter values obtained through model averaging of commonly used models and their variance (in paresis).

When different models optimize parameters with a wide range of values, the weighted aver‐ age of the estimated parameter values provide more concise information and better under‐ standing about the parameters and thus the spatial structure. The variance of the optimised parameters clearly indicates the reduced uncertainty of the parameters. This weighted aver‐ age value of the parameters provides a better prediction about the underlying processes by reducing the uncertainty associated with the parameters and the bias in their selection. Bet‐ ter understanding of the parameters provides guidance for sampling and insights on experi‐ mental design.

## **5. Conclusion**

The performance of the commonly used semivariogram models, used for fitting the experimen‐ tal semivariogram, is comparable. Different models optimize a particular parameter differently, which indicates the uncertainly associated with the parameter. The uncertain parameter value makes the prediction of spatial processes difficult and uncertain. A model averaging procedure is used to obtain the parameter value and to reduce model parameter uncertainty.

Two soil properties (sand content and copper content) are taken as examples for demon‐ strating the average parameter estimation procedure. The sand content is measured at regu‐ lar sampling interval along a transect, whereas the copper content is measured at variable sampling intervals in a two-dimensional field. The semivariogram is calculated for both properties and fitted with four most commonly used mathematical models (spherical, expo‐ nential, Gaussian and linear plateau). Weighted least square estimation is used for fitting these models to the experimental semivariogram. The goodness of fit for each model is cal‐ culated from their residual sum of squares. The parameter for each models are optimized during the fitting procedure. The likelihood of each model is calculated based on the Akaike Information Criterion (AIC) for each model. Different weights were assigned to each model based on their performance and importance from the AIC values. These weights are used for obtaining the weighted average of the optimized parameters. The weighted average of the estimated parameters reduced the uncertainty associated with the parameters and the bias in their selection. The average parameter values and reduced uncertainty provide more concise information about the spatial structure and consequently provide better guidance for sampling, experimental design, and interpolation and mapping.

## **Acknowledgements**

value of the parameter and the variance associated with each parameter is estimated. Sand has an average nugget, sill and range value of 15.42 %, 37.92 % and 75.53 m respectively. The variance associated with each averaged parameters are presented in the paresis in Table 4. The higher weight of the spherical and exponential model indicates that two models have large contribution to the average value of parameters. Similarly, the variance of the aver‐ aged parameters will have large contribution from spherical and exponential models. Whereas for copper, the contribution to the average value by each models is comparable.

), and 0.219 km respectively. The variance of each averaged parameter is also calculated and presented in Table 4 (in paresis). The averaged variance reduced the uncertainty associ‐

**Parameter Model AIC value Weight Average parameter value**

Sand Spherical -167.269 0.4946 15.42 % 37.92 % 75.53 m

Exponential -167.187 0.4748 (335.79) (635.06) (31007.42)

km Exponential -113.703 0.3285 Gaussian -113.658 0.3212 (0.106) (0.036) (0.235)

**Table 4.** Parameter values obtained through model averaging of commonly used models and their variance (in paresis).

When different models optimize parameters with a wide range of values, the weighted aver‐ age of the estimated parameter values provide more concise information and better under‐ standing about the parameters and thus the spatial structure. The variance of the optimised parameters clearly indicates the reduced uncertainty of the parameters. This weighted aver‐ age value of the parameters provides a better prediction about the underlying processes by reducing the uncertainty associated with the parameters and the bias in their selection. Bet‐ ter understanding of the parameters provides guidance for sampling and insights on experi‐

The performance of the commonly used semivariogram models, used for fitting the experimen‐ tal semivariogram, is comparable. Different models optimize a particular parameter differently, which indicates the uncertainly associated with the parameter. The uncertain parameter value

kg2)

/kg2

**Nugget Sill Range**

0.546 ln(mg2/ kg2)

), 0.546 ln(mg2

0.219

/

The average value of nugget, sill and range for copper are 0.081 ln(mg2

Gaussian -151.279 0.0001 Linear Plateau -161.687 0.0303 Copper Spherical -111.739 0.1230 0.081 ln(mg2/

Linear Plateau -112.964 0.2270

ated with the parameters by taking the average.

kg2

92 Advances in Agrophysical Research

mental design.

**5. Conclusion**

The project was funded by a CSIRO Post Doctoral Fellowship and the University of Sas‐ katchewan.

## **Author details**

Asim Biswas1,2\* and Bing Cheng Si2


2 Department of Soil Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

## **References**

[1] Akaike H (1973) Information theory as an extension of the maximum likelihood prin‐ ciple. In: Second International Symposium on Information Theory. Petrov, B.N., Csa‐ ki, F., Eds, Akademiai Kiado, Budapest, 267-281 p.

[2] Atteia O, Dubois JP, Webster R (1994) Geostatistical analysis of soil contamination in the Swiss Jura. Environ. poll. 86, 315-327.

[19] Hajrasuliha SW, Baniabassi N, Metthey J, Neilsen DR (1980) Spatial variability of soil

Model Averaging for Semivariogram Model Parameters

http://dx.doi.org/10.5772/52339

95

[20] Heuvelink GBM, Pebesma EJ (1999) Spatial aggregation and soil process modelling.

[21] Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging:

[22] Isaaks EH, Srivastava RM (1989) An introduction to applied geostatistics, Oxford

[23] Jian X, Olea RA, Yu Y (1996) Semivariogram modelling by weighted least squares.

[24] Johnson JB, Omland KS (2004) Model selection in ecology and evolution. Trends ecol.

[25] Lambert DM, Lowenberg-Debeor J, Bongiovanni R (2004) A comparison of four spa‐ tial regression models for yield monitor data: a case study from Argentina. Pre‐

[26] Matheron G (1962) Traité de Géostatistique Appliqué, Tome 1. Memoires du Bureau

[27] McBratney AB, Bishop TFA, Teliatnikov IS (2000) Two soil profile reconstruction

[28] McBratney AB, Webster R (1986) Choosing functions for semi-variograms of soil

[29] Nielsen DR, Biggar JW, Erh KT (1973) Spatial variation of field measured soil water

[30] Nielsen DR, Wendroth O (2003) Spatial and temporal statistics: sampling field soil

[31] Pannatier Y (1996) VARIOWIN: Software for spatial data analysis in 2D. Springer-

[32] Papritz A, Dubois JP (1999) Mapping heavy metals in soil by (non-) linear kriging: an empirical validation. In: Gómez-Herńandez, J. et al. (Eds.) geoENV II-Geostatistics for environmental applications. Kluwer Academic Publ. Dordrecht, The Netherlands.

[33] Prakash MR, Singh VS (2000) Network for ground water monitoring-a case study.

[34] Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical recipies in C.

The art of scientific computing.Cambridge University press, NY.

properties and fitting them to sampling estimates. J. soil sci. 37, 617-639.

and their vegetation. Catena Verlag, Reiskirchen, Germany.

sampling for salinity studies in South-West Iran. Irrig. sci. 1, 197-208.

Geoderma 89, 47-65.

a tutorial. Statist. sci. 14, 382-417.

University Press, Toronto, Canada.

de RecherchesGéologiquesetMiniéres, Paris.

techniques. Geoderma 97, 209–221.

properties. Hilgardia 42, 214-259

Comp. Geosci. 22, 387-397.

evol. 19, 101-108.

cis.agr. 5, 579-600.

Verlag, NY.

429-440 p.

Environ. geol. 39, 628-632.


[19] Hajrasuliha SW, Baniabassi N, Metthey J, Neilsen DR (1980) Spatial variability of soil sampling for salinity studies in South-West Iran. Irrig. sci. 1, 197-208.

[2] Atteia O, Dubois JP, Webster R (1994) Geostatistical analysis of soil contamination in

[3] Burgess TM, Webster R (1980) Optimal interpolation and isarithmic mapping of soil properties. I. The semivariogram and punctual kriging. J. soil sci. 31, 315-331.

[4] Burnham KP, Anderson DR (2002) Model selection and multimodal inference: A

[5] Burrough PA (1983) Multiscale sources of spatial variation in soil. I. The application of fractal concepts of nested levels of soil variogram. J. soil. sci. 34: 577-597.

[6] Cambardella CA, Moorman TB, Novak JM, Parkin TB, Karlen DL, Turco RF, Konop‐ ka AE (1994) Field scale variability of soil properties in central Iowa soils. Soil sci.

[7] Carvalho JRP, Silveira PM, Vieira SR (2002) Geostatistics in determining the spatial variability of soil chemical characteristics under different preparations. Pesq. agro‐

[8] Corwin DL, Hopmans J, de Rooij GH (2006) From Field- to Landscape-Scale Vadose Zone Processes: Scale Issues, Modeling, and monitoring. Vadose zone j. 5, 129-139.

[9] Cressie N (1985) Fitting variogram models by weighted least squares. J. int. assoc.

[12] Deutsch CV, Journel AG (1998) GSLIB Geostatistical software library and user's

[13] Fagroud M, Van Meirvenne M (2002) Accounting for soil spatial correlation in the

[14] Foussereau X, Hornsby AG, Brown RB (1993) Accounting for variability within map units when linking a pesticide fate model to soil survey. Geoderma 60, 257-276.

[15] Gee GW, Bauder JW (1986) Particle size analyses. In: A. Klute (Eds.) Method of soil analyses. Part 1: Physical and Mineralogical Methods. ASA, Madison, Wisconsin,

[16] Goderya FS (1998) Field scale variations in soil properties for spatially variable con‐

[17] Goovaerts P (1997) Geostatistics for natural resources evaluation.Oxford Uni. Press.,

[18] Goovaerts P (1998) Geostatistical tools for characterizing the spatial variability of mi‐ crobiological and physio-chemical soil properties. Biol. fertil.soils 27, 315-334.

[10] Cressie NAC (1991) Statistics for spatial data. John Wiley & Sons, NY, 900 p.

[11] David M (1977) Geostatistical Ore reserve estimation. Elsevier, Amsterdam.

design of experimental trails. Soil sci. soc.am. j. 66, 1134-1142.

the Swiss Jura. Environ. poll. 86, 315-327.

soc.am. j. 58, 1501-1511.

94 Advances in Agrophysical Research

pec. bras. 37, 1151-1159.

math. geol. 17, 563-586.

USA.

NY.

guide. 2nd eds. Oxford Uni. Press, NY.

trol: a review. J. soil contam. 7, 243-264.

Practical Information-Theoretic approach, Springer.


[35] Si BC, Kachanoski RG, Reynolds WD (2007) Analysis of soil variability. In:E.G. Gre‐ gorich (Eds.) Soil sampling and methods of analysis. 1163-1191 p.

**Chapter 5**

**Scale Dependence and Time Stability of Nonstationary**

Soil water is one of the most important limiting factors for semi-arid agricultural production and a key element in environmental health. It controls a large number of surface and subsurface hydrological processes that are critical in understanding a broad variety of natural processes (geomorphological, climatic, ecological) acting over a range of spatio-temporal scales (Entin et al. , 2000). Knowledge on the behavior of soil water storage and its spatiotemporal distribution provides essential information on various hydrologic, climatic, and general circulation models (Beven, 2001; Western et al. , 2002), weather prediction, evapo‐ transpiration and runoff (Famigleitti and Wood, 1995), precipitation (Koster et al. , 2004) and

The distribution of soil water in the landscape is the response of a number of highly het‐ erogeneous factors and processes acting in different intensities over a variety of scales (Goovaerts, 1998; Entin et al. , 2000). The heterogeneity in factors and processes make the spatial distribution of soil water highly heterogeneous in space and time and create a challenge in hydrology (Quinn, 2004). Therefore, a large number of samples are needed in order to characterize the field averaged soil water with certain level of precision. Howev‐ er, if a field or watershed is repeatedly surveyed for soil water, some sites or points are consistently wetter or consistently drier than the field average. Vachaud et al. (1985) were the first to examine the similarity of the spatial pattern in soil water storage over time and termed this phenomenon time stability. The time stability is defined as a time invariant association between spatial location and classical statistical measures of soil water, most

> © 2013 Biswas and Si; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

© 2013 Biswas and Si; licensee InTech. This is a paper distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**Soil Water Storage in a Hummocky Landscape Using**

**Global Wavelet Coherency**

Additional information is available at the end of the chapter

atmospheric variability (Delworth and Manabe, 1993).

Asim Biswas and Bing Cheng Si

http://dx.doi.org/10. 5772/52340

**1. Introduction**

