*Multifunctionality and Impacts of Organic and Conventional Agriculture*

important to irrigation and mechanization practices, they were selected as criteria for agricultural zoning.

**Figure 3.**

*Depth functions for soil key physical properties. PWP, permanent wilting point.*

**3.2 Predictive models**

### *3.2.1 Pearson's correlation*

The Pearson correlation analysis (**Table 2**) showed that, in general, the environmental covariates were significantly correlated with all soil properties analyzed (p < 0.05). The sand content was significantly correlated with covariates b5, b7, NDVI, b3/b7 ratio, and GSI (**Table 2**). These results do not agree with [44], who found no correlation between sand content and Landsat 5 TM image data. Only the covariate b3/b7 was strongly correlated with the sand content (r = 0.58 and 0.49, topsoil and subsoil, respectively) [54], whereas NDVI was moderately correlated (r = 0.39 and 0.36, topsoil and subsoil, respectively) and the other covariates were weakly correlated, with r values below 0.27 (positive or negative) (**Table 3**).

#### **Table 2.**

*Pearson's correlation (p-value) between soil properties (sand, clay, CEC, and Na) and environmental covariates (b1, b2, b3, b4, b5, b7, NDVI, b3/b2, b3/b7, b5/b7, and GSI).*

| Variables | Sand, 0–20 cm | Sand, 20–60 cm | Clay, 0–20 cm | Clay, 20–60 cm | CEC, 0–20 cm | CEC, 20–60 cm | Na, 0–20 cm | Na, 20–60 cm |
|---|---|---|---|---|---|---|---|---|
| b1 | 0.303 | 0.212 | 0.236 | 0.599 | 0.289 | 0.667 | 0.001 | 0.000 |
| b2 | 0.198 | 0.321 | 0.431 | 0.177 | 0.848 | 0.723 | 0.001 | 0.000 |
| b3 | 0.281 | 0.396 | 0.631 | 0.274 | 0.360 | 0.687 | 0.001 | 0.000 |
| b4 | 0.037 | 0.040 | 0.009 | 0.072 | 0.002 | 0.016 | 0.019 | 0.000 |
| b5 | 0.000 | 0.001 | 0.000 | 0.002 | 0.000 | 0.001 | 0.099 | 0.001 |
| b7 | 0.000 | 0.002 | 0.000 | 0.003 | 0.000 | 0.000 | 0.152 | 0.001 |
| NDVI | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| b3/b2 | 0.267 | 0.218 | 0.712 | 0.291 | 0.028 | 0.051 | 0.108 | 0.041 |
| b3/b7 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| b5/b7 | 0.702 | 0.990 | 0.274 | 0.718 | 0.000 | 0.001 | 0.617 | 0.082 |
| GSI | 0.000 | 0.000 | 0.000 | 0.000 | 0.649 | 0.493 | 0.010 | 0.000 |

Clay content showed inverse relationships with the environmental covariates, with magnitudes similar to those observed for sand (**Table 2**), except for the covariate b4 (topsoil), which had no correlation with sand; similar results were reported by [42, 55]. The most relevant covariates were b3/b7 (r = 0.56 and 0.51, topsoil and subsoil, respectively) and NDVI (r = 0.36 and 0.37, topsoil and subsoil, respectively). Reported results vary among authors. In [44], clay was significantly correlated with NDVI and with the b3/b2 and b5/b7 band ratios, but not with the b3/b7 band ratio. Ahmed and Iqbal [56], on the other hand, found significant correlations between clay and bands 4 and 6 of Landsat 5 TM.

The CEC was significantly correlated with the covariates b4, b5, b7, NDVI, and the b3/b7 and b5/b7 ratios (**Table 2**). Only the covariate b3/b7 was strongly correlated with the CEC (r = 0.45 and 0.44, topsoil and subsoil, respectively) [54], whereas the other covariates were weakly correlated, with r values below 0.27 (positive or negative) (**Table 3**). A strong correlation between the CEC and covariates measured by ASTER spectral bands (1–8) was observed by [55]. The Na content was significantly correlated with most of the covariates, except b7 (topsoil) and the b3/b2 and b5/b7 ratios (**Table 2**). However, none of the covariates had a strong or moderate correlation with Na content (**Table 3**), which may explain the very low performance of the random forest model in predicting this soil property.

The study by Demattê et al. [57] highlighted that the correlation with a particular spectral band is directly related to soil characteristics in specific regions, which explains the differences between case studies.

#### **Table 3.**

*Pearson's correlation coefficient (r) between soil properties (sand, clay, CEC, and Na) and environmental covariates (b1, b2, b3, b4, b5, b7, NDVI, b3/b2, b3/b7, b5/b7, and GSI).*
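The correlation screening summarized in Tables 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the sample values are made up, the function names are hypothetical, and the p-value uses the Fisher z normal approximation, which is an assumption and not necessarily the test applied by the authors.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two sequences of equal length with n >= 4")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 (Fisher z approximation, assumed here)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical covariate (e.g., a band ratio) and soil property values.
covariate = [1, 2, 3, 4, 5]
soil_property = [1, 3, 2, 5, 4]

r = pearson_r(covariate, soil_property)
p = approx_p_value(r, len(covariate))
print(f"r = {r:.2f}, approximate p = {p:.3f}")
```

With only a handful of samples the normal approximation is rough; a t-based test from a statistics library would be the usual choice for small n.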

### *3.2.2 Predictive models*

In the training process, the RF model retains only the covariates that had a moderate or strong correlation with the soil properties (**Table 3**). The results obtained by the predictive models (RF and OK) using an independent validation dataset (90 samples) are illustrated in **Figure 4**. In this study, the R<sup>2</sup> values obtained by the RF models were higher in the topsoil than in the subsoil for sand (R<sup>2</sup> = 0.58 and 0.44 and RMSE = 91.95 and 90.32 g kg<sup>−1</sup>, topsoil and subsoil, respectively) and clay (R<sup>2</sup> = 0.53 and 0.45 and RMSE = 75.71 and 73.45 g kg<sup>−1</sup>, topsoil and subsoil, respectively). The R<sup>2</sup> values were higher in the subsoil than in the topsoil for CEC (R<sup>2</sup> = 0.44 and 0.59 and RMSE = 8.62 and 7.47 cmol<sub>c</sub> kg<sup>−1</sup>, topsoil and subsoil, respectively) and Na content (R<sup>2</sup> = 0.15 and 0.26 and RMSE = 0.18 and 0.37 cmol<sub>c</sub> kg<sup>−1</sup>, topsoil and subsoil, respectively) (**Figure 4**).

**Figure 4.**

*Results obtained by prediction models using independent validation samples. (a) Topsoil sand (RF); (b) subsoil sand (RF); (c) topsoil clay (RF); (d) subsoil clay (RF); (e) topsoil CEC (RF); (f) subsoil CEC (RF); (g) topsoil Na (OK); (h) subsoil Na (OK).*

#### **Table 4.**

*Descriptive statistics from soil samples and predictive models for topsoil and subsoil.*

| Layer | Statistics | Dataset: sand (g kg<sup>−1</sup>) | Dataset: clay (g kg<sup>−1</sup>) | Dataset: CEC (cmol<sub>c</sub> kg<sup>−1</sup>) | Dataset: Na (cmol<sub>c</sub> kg<sup>−1</sup>) | Random forest: sand (g kg<sup>−1</sup>) | Random forest: clay (g kg<sup>−1</sup>) | Random forest: CEC (cmol<sub>c</sub> kg<sup>−1</sup>) | Ordinary kriging: Na (cmol<sub>c</sub> kg<sup>−1</sup>) |
|---|---|---|---|---|---|---|---|---|---|
| Topsoil | Min | 75 | 133 | 5.71 | 0.01 | 175 | 235 | 12.30 | 0.14 |
| Topsoil | Max | 802 | 640 | 61.41 | 3.59 | 626 | 594 | 54.33 | 0.57 |
| Topsoil | Mean | 334 | 449 | 33.29 | 0.37 | 338 | 449 | 34.27 | 0.34 |
| Topsoil | SD | 144 | 113 | 11.91 | 0.37 | 103 | 78 | 8.52 | 0.07 |
| Topsoil | CV (%) | 43 | 25 | 36 | 100 | 30 | 17 | 25 | 206 |
| Subsoil | Min | 100 | 160 | 6.59 | 0.01 | 165 | 204 | 10.69 | 0.17 |
| Subsoil | Max | 672 | 673 | 62.15 | 7.99 | 576 | 623 | 51.28 | 1.26 |
| Subsoil | Mean | 308 | 484 | 33.99 | 0.71 | 308 | 483 | 35.24 | 0.65 |
| Subsoil | SD | 125 | 102 | 11.54 | 0.69 | 85 | 76 | 7.86 | 0.21 |
| Subsoil | CV (%) | 41 | 21 | 34 | 97 | 28 | 16 | 22 | 32 |

*Min, minimum; Max, maximum; SD, standard deviation; CV, coefficient of variation.*

*Pedometric Tools Applied to Zoning Management of Areas in Brazilian Semiarid Region*

*DOI: http://dx.doi.org/10.5772/intechopen.88526*

The results achieved by the RF models were superior (topsoil) or similar (subsoil) to those of [58], who used terrain properties derived from a digital elevation model, for sand (variance explained = 30%) and clay (variance explained = 43%). In a study in Nigeria [59], the percentages of variance explained (Varex) by RF models were 48–49% for sand and 53–56% for clay in the topsoil layer (0–15 cm). These values are inferior or similar to those obtained in this study for sand, similar for clay in the topsoil, and superior for clay in the subsoil. However, the RMSE results for sand (19.26–19.67 g kg<sup>−1</sup>) and clay (13.11–13.59 g kg<sup>−1</sup>) were lower than those in the present study for the topsoil and subsoil layers (**Figure 4**). Lower Varex values were reported by [60] for sand (33–35%) and clay (31–35%).
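The validation statistics compared above (R<sup>2</sup>, read as the percentage of variance explained, and RMSE) can be computed from observed and predicted values as in the following minimal sketch. It assumes R<sup>2</sup> is the coefficient of determination, 1 − SS<sub>res</sub>/SS<sub>tot</sub>; the text does not state which definition was used, and the function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the soil property."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant")
    return 1.0 - ss_res / ss_tot


def varex_percent(observed, predicted):
    """Variance explained as a percentage, e.g. R2 = 0.59 reads as 59%."""
    return 100.0 * r_squared(observed, predicted)


# Hypothetical validation samples (not data from the study).
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(f"RMSE = {rmse(obs, pred):.3f}, Varex = {varex_percent(obs, pred):.1f}%")
```

Because RMSE keeps the units of the predicted property, it is only comparable between studies that report the property in the same units, which matters when comparing against values published elsewhere.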

The Varex obtained for CEC using the validation samples was 47% (R<sup>2</sup> = 0.47) for the topsoil layer and 59% (R<sup>2</sup> = 0.59) for the subsoil layer. The goodness of fit estimated by RMSE was 8.62 and 7.47 cmol<sub>c</sub> kg<sup>−1</sup> (topsoil and subsoil, respectively). Regarding the model's performance, the results can be considered satisfactory; moreover, few studies in the literature have used RF to predict soil CEC, and none has used only remote sensing data as the main covariates. The present results showed worse statistical indexes than those obtained by Lagacherie et al. [61], who reached a Varex of 79% for the layer between 15 and 30 cm and an RMSE of 3.4 cmol<sub>c</sub> kg<sup>−1</sup>, using as input data terrain attributes and hyperspectral data in the visible and near infrared (AISA-Dual) with a spatial resolution of 5 m. The difference from this study may be related to the coarser spatial resolution of the Landsat 5 images (30 m) compared with the hyperspectral data (5 m) used by Lagacherie et al. [61]. The influence of spatial resolution on the prediction of soil properties has been reported in other studies [62, 63].

**Figure 5.**

*Spatial distribution of soil physical properties estimated by the RK models. (a) Topsoil sand; (b) subsoil sand; (c) topsoil clay; (d) subsoil clay.*


However, the CEC prediction in this study performed better than the value (9.49 cmol<sub>c</sub> kg<sup>−1</sup>) achieved by [60]. Those authors attributed the poor performance to the small-scale variation of parent material and erosion/deposition rates, which was not captured by the spatial resolution of the covariates (100 m). They also highlighted the importance of the input dataset for improving the models' performance.

The semivariogram obtained for Na content (**Figure 4**) describes the spatial dependence and indicates processes related to the spatial distribution [65]. For both depths, the exponential model gave the best semivariogram. The semivariograms for topsoil and subsoil (**Figure 4**) used to estimate the Na content presented an R<sup>2</sup> of 0.26 and RMSE of 0.37 cmol<sub>c</sub> kg<sup>−1</sup> and an R<sup>2</sup> of 0.15 and RMSE of 0.18 cmol<sub>c</sub> kg<sup>−1</sup>, respectively. The descriptive statistics of the soil property predictions for the predictive models are presented in **Table 4**.
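For reference, an exponential semivariogram of the kind selected above can be written as a short function. This is a sketch under stated assumptions: it uses the common form γ(h) = c₀ + c(1 − exp(−h/a)), and the nugget, partial sill, and range values are hypothetical, since the fitted parameters are not given in the text. Some software uses exp(−3h/a) instead, so that a is the practical range.

```python
import math


def exponential_semivariogram(h, nugget, partial_sill, range_param):
    """Exponential model: gamma(h) = nugget + partial_sill * (1 - exp(-h / range_param)).

    h is the lag distance; gamma(0) is defined as 0.
    """
    if h < 0 or range_param <= 0:
        raise ValueError("lag must be non-negative and range must be positive")
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-h / range_param))


# Hypothetical parameters (not the values fitted in the study).
nugget, partial_sill, range_param = 0.1, 0.9, 500.0
for lag in (0, 250, 500, 2000):
    gamma = exponential_semivariogram(lag, nugget, partial_sill, range_param)
    print(f"lag = {lag:5d} m  gamma = {gamma:.3f}")
```

The model approaches the sill (nugget plus partial sill) only asymptotically, which is why a practical range is often quoted for exponential fits.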

**Figure 6.**

*Spatial distribution of soil chemical properties estimated by the RF and OK models. (a) Topsoil CEC; (b) subsoil CEC; (c) topsoil Na content; (d) subsoil Na content.*

The RF model produced predicted values for sand, clay, and CEC within the range of the original values (**Table 4**), with a smaller standard deviation and coefficient of variation, as expected for this model [64]. The same was true for Na content in the OK model. However, the map produced presented a smaller range of values than the original dataset, which, according to [65], means that this model had low accuracy in describing the spatial variation of Na content, corroborating the results found when predicting this property at both depths. The mean values for sand, clay, CEC, and Na content at both depths (**Table 4**) are close to the values of the original dataset. The CV for both models was smaller than the CV of the original dataset, except for Na content in the topsoil.
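The descriptive statistics discussed above follow from standard formulas, sketched below. The CV is the standard deviation expressed as a percentage of the mean; for example, the topsoil sand dataset in Table 4 (mean 334, SD 144) gives a CV of about 43%. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def cv_percent(mean, sd):
    """Coefficient of variation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * sd / mean


def describe(values):
    """Min, max, mean, sample SD, and CV (%) for a list of measurements."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1), assumed here
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "sd": sd,
        "cv": cv_percent(mean, sd),
    }


# Check against the rounded values reported for topsoil sand in Table 4.
print(round(cv_percent(334, 144)))

# Hypothetical measurements, only to show the summary structure.
print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```

A smaller SD and CV in the predictions than in the samples is the usual smoothing effect of model-based maps, which is the pattern the paragraph above describes.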

These results could be explained by the moderate correlation of the soil properties with the covariates and the predominance of short-scale variations that could not be modeled from the set of profiles used. The results were considered satisfactory, except for Na content, and they can be ascribed to the physical interference of soil properties in the incident and reflected energy. However, quantifying soil properties with an orbital sensor is not an easy task because of the complexity of soil dynamics and formation [66]. The spatial distribution of soil properties according to the predictive models is shown in **Figures 5** and **6**.

The conditions used to define the agricultural zones have to be adjusted according to the available data and the heterogeneity of soil properties in the study area, in this case located in a semiarid region.

**Figure 7.**

*Zoning of the study area according to selected soil key properties.*

Zone 1 has the greatest amount of Na in both layers, associated with greater CEC values, as indicated by the mean values, than Zones 2 and 3. The same holds for clay content, for which Zone 1 has the greatest mean value in both layers (0–20 and 20–60 cm). As expected, sand content shows the
