*Multifunctionality and Impacts of Organic and Conventional Agriculture*

At the study site, part of the original vegetation has been removed, and signs of land degradation, such as bare soil, are common. The landscape consists mainly of flat surfaces, with maximum slopes of 8%, and plateaus are common in the region. The geology comprises limestone rocks of the Caatinga Formation, gneiss-granite rocks of the Caraiba-Paramirim Complex, and recent colluvium/alluvium sediments [33].

**Figure 1.**

*Study area and location of soil profiles over a Landsat image (band 3).*

The soil types in the northeast region of Brazil vary widely according to parent material and relief. They range from shallow soils with a high content of basic cations (Ca, Mg, K, and Na) to deep, leached profiles [34]. According to the World Reference Base for Soil Resources [35], the dominant soil classes are Vertisols, Cambisols, and Planosols, with smaller extents of Regosols, Acrisols, and Luvisols.

**2.2 Soil input data and covariates**

The soil properties used in this study were sand, clay, and cation exchange capacity (CEC), determined according to [36]. The properties were selected on two main conditions: (i) variability with depth and (ii) importance for agricultural management (e.g., clay or sodium (Na) content). The original dataset of 523 profiles came from a survey carried out by the *Companhia de Desenvolvimento do Vale do São Francisco* (CODEVASF) in 1989 [29]. From it, the profiles with a complete morphological description and analytical data were selected. The input dataset therefore comprised the topsoil (0–20 cm) and subsoil (20–60 cm) layers of 290 soil profiles from this soil survey (legacy data). Sampling followed the requirements of the national soil survey service for the detailed soil survey level [37, 38].

#### **Table 1.**

*Covariates used in spatial prediction of soil properties.*

**2.3 Modeling procedures**

Quantitative soil properties, such as clay and sand content, can be estimated along the depth of the soil profile with the slice-wise algorithm [16]. This algorithm is implemented in R through the Algorithm for Quantitative Pedology (AQP) package [46]. The functions estimate the central trend of the values, represented by the median, and the variation interval, represented by the quartiles (25th and 75th percentiles of the data distribution). The values shown in a column correspond to the percentage of soil profiles contributing to the function at each depth.

First, Pearson's linear correlation analysis was used to measure the linear association between variables. The goal was to determine which of the soil properties defined by AQP correlated with the environmental covariates (**Table 1**). This analysis was implemented in R [46] with the function cor.test, following [42, 44, 47]. In Pearson's correlation test, the p-value determines whether two variables are significantly correlated. In this study, p-values lower than 0.02 were taken to indicate a significant correlation.

The random forest (RF) predictions were run in R with the randomForest package. RF is a nonparametric technique developed by Breiman [48] as an extension of classification and regression tree (CART) systems, designed to improve predictive performance. An RF model requires three parameters: the number of trees in the forest (ntree), the minimum number of data points in each terminal node (nodesize), and the number of covariates randomly sampled as candidates at each split (mtry) [49]. The ntree value was set to the default (500), although more stable results can be achieved with a larger number [50]. The nodesize value was set to 5, as is usual in regression studies. The mtry value followed Liaw and Wiener [49], who propose one third of the total number of predictor variables for regression problems.

Although Na showed a significant correlation at both depths in the Pearson correlation analysis, the preliminary results with the random forest (RF) model were very unsatisfactory. Therefore, for this property only, the RF model was replaced by ordinary kriging (OK). Semivariograms were used to analyze the spatial structure of Na and to generate predictive maps at both depths. OK was performed in R with the krige function [46]. OK is the most widely used type of kriging and provides an accurate estimate for the area around a measured sample [51].
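The OK step can be illustrated with a small, self-contained sketch that builds and solves the ordinary kriging system for one prediction point. It relies on these assumptions:

- The spherical model and its nugget, sill, and range values are hypothetical placeholders, not the semivariograms fitted for Na.
- The function names are hypothetical.
- The hand-written linear solver is for illustration only.

```python
# Illustrative ordinary kriging (OK) at one location, pure Python 3.8+.
# Hypothetical: spherical model and its nugget/sill/range values; these are
# placeholders, not the semivariograms fitted for Na in this study.
import math


def spherical(h, nugget=0.1, sill=1.0, rng=3.0):
    """Spherical semivariogram with gamma(0) = 0; `sill` is the total sill."""
    if h == 0:
        return 0.0
    if h >= rng:
        return sill
    s = h / rng
    return nugget + (sill - nugget) * (1.5 * s - 0.5 * s ** 3)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(points, values, target, gamma=spherical):
    """Return (estimate, kriging variance, weights) at `target`."""
    n = len(points)
    # Semivariances between samples, bordered by the unbiasedness constraint.
    a = [[gamma(math.dist(p, q)) for q in points] + [1.0] for p in points]
    a.append([1.0] * n + [0.0])
    # Semivariances between samples and the target, plus sum(weights) = 1.
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]  # mu is the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
    na = [1.0, 2.0, 3.0, 4.0]  # toy Na values, not study data
    print(ordinary_kriging(pts, na, (0.5, 0.5)))
```

Predicting at a sampled location returns the measured value with zero kriging variance. This is the exact-interpolator property of OK when the semivariogram is zero at zero distance. The extra row and column of ones force the weights to sum to one, which makes the estimate unbiased.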

Model performance was evaluated on an independent validation set that was not used in training. To this end, the 290 soil samples were randomly divided in R into two independent datasets: one for training (200 soil samples) and the other for validation (90 soil samples). Performance was assessed from the agreement between the measured values (validation samples) and the estimated values. Three statistics were used: the coefficient of determination (R²), the root mean square error (RMSE), and the mean error (ME). The last two are given by Eqs. (1) and (2):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2} \tag{1}$$

$$\mathrm{ME} = \frac{1}{n} \sum_{i=1}^{n} d_i \tag{2}$$

In Eqs. (1) and (2), $d_i$ is the difference between the estimated and observed values for validation sample $i$, and $n$ is the number of samples used in the validation process.

The RMSE measures the overall error of the estimation. It is commonly used to estimate the error or uncertainty at locations where the error was not measured directly; the higher the RMSE, the greater the differences between the datasets [52]. The ME gives the bias and shows whether the model overestimates (positive values) or underestimates (negative values); values close to zero are preferable.

**2.4 Definition of management zones**

Management zones were defined in this study according to the potential for agriculture. The criteria were the variability of the soil key properties along profile depth, the importance of the soil properties for land management, and the performance of the models in predicting the spatial variation of the properties. Based on the maps of the selected soil key properties, an unsupervised classification was performed on a set of input raster bands (Na, CEC, clay, and sand). The classification used the Iso Cluster and Maximum Likelihood Classification tools of ArcGIS Desktop 10.3.

The Iso Cluster tool uses a modified iterative optimization clustering procedure, also known as the migrating means technique. The algorithm separates all cells into the user-specified number of distinct unimodal groups in the multidimensional space of the input bands. The "iso" prefix of the isodata clustering algorithm is an abbreviation of "iterative self-organizing", the way the clustering is performed. During each iteration of the clustering process, all samples are assigned to the existing cluster centers, and new means are recalculated for every class. Because the optimal number of classes is usually unknown, it is advisable to enter a conservatively high number, analyze the resulting clusters, and rerun the function with a reduced number of classes [53].

*Pedometric Tools Applied to Zoning Management of Areas in Brazilian Semiarid Region*

*DOI: http://dx.doi.org/10.5772/intechopen.88526*

**3. Results and discussion**

**3.1 Algorithm for quantitative pedology (AQP)**

The AQP package gathers a set of functions for working with and analyzing large collections of soil profiles. Depth functions of the soil key properties used to distinguish the profiles are presented in **Figure 2** and **Figure 3**. The percentage values plotted along the profile show the relative number of profiles used in the soil-depth function to calculate the statistics (median and quartiles) at each depth.

The study area presents shallow soils, sometimes with high base saturation. However, the main limitations for agricultural use are soil texture and high clay activity. Because of the high plasticity of these soils, these factors determine the soil moisture conditions under which the soils can be adequately managed.

The pH and the calcium (Ca) content showed a linear trend of median values with depth (**Figure 2**), even though fewer soil profiles (less than 25%) contributed to the estimate below 150 cm. In contrast, potassium (K), sodium (Na), electrical conductivity (EC), and cation exchange capacity (CEC) showed different patterns. The soil key properties Ca, K, and CEC showed more variability around the median value. They therefore have better predictive potential for distinguishing soils by parent material and by clay activity.

**Figure 2.**

*Depth functions for soil key chemical properties. EC, electrical conductivity; CEC, cation exchange capacity.*

It is pertinent to select the soil key properties that indicate differences in soil behavior along depth and are relevant for land management. The analysis of **Figure 3** shows that clay and sand contents follow opposite trends and vary widely among the profiles of the collection. They therefore have great potential to aid in the definition of soil management zones. Since both properties are
