**2. Materials and methods**

#### **2.1 Study area**

The study was conducted on the sub-Saharan mixed ecoregion, highly dependent on varied annual precipitations (AP) and yearly medium to high average temperatures (T<sup>0</sup> ), which both influence relative humidity (RH) changes. The area belongs to the medium Cameroon (central Africa), between latitudes 50 0'-80 5'N and longitudes 10<sup>0</sup> 00 -15<sup>0</sup> 50 E, a climatic transition between the agroecological zones of western highlands (AP = 1800-2500 mm; T0 = 19.5°C; RH = 75%), bi-modal rainforest (AP = 1700-2000 mm; T0 = 24°C; RH = 80%), and then Guinean high savanna (AP = 1500–1800 mm; T<sup>0</sup> = 30°C; RH = 60%) to Sudano-Sahelian savanna (AP = 400– 1200 mm; T0 = 28°C; RH = 50%) for the core area. Specifically, the vegetation density broadly reflects the climate gradient of dense moist broadleaf forest and highland forest to sparse extensive savannas featuring the co-dominance of woody shrubs, grassland in plains, and herbaceous steppe at the edge of Sahel (**Figure 1**) [22].

#### **2.2 Data and working environment description**

The study was conducted using European Spatial Agency, ESA, Sentinel2-A multispectral instrument (MSI) data, which represents a very valuable opportunity for the fine characterization and monitoring of vegetation types on large scales, but is poorly investigated for the tropical biome study [11, 23]. This sensor provides 13 varied spectral bands from 0.443 to 2.190 micrometers, a 10-day repeat cycle, and a spatial resolution up to 10 meters (Additional material 1).

The phenological dry season, globally from November to March, was selected because of its high temperatures and less rainfalls, assuming they are ideal conditions to observe the vegetation adaptations to extremes, for years 2015 to 2021. In the GEE cloud coding environment and using the JavaScript opensource simplified coding, the median reducer function was appended as the pixel-wise computation of all bi-annual collection images, based on a band per band processing [24]. Then, applying a date filter from November 15 to end of March 31, a boundary filter for the study area, and a cloud cover acceptance filter below 10%, one image of 13 bands was outputted per biannual periods 2015–2016 to 2020–2021, displayed a,nd converted from 16 bits to 8 bits before further processings. Offline tools, i.e., desktop software, were used to extract statistics and for some complementary processings using less memory, including final layouts. Namely, Erdas Imagine 2020, ArcGIS Pro 2020, and Microsoft Excel with extension Xlstat were specifically used.

#### **Figure 1.**

*Location of the study: (A) Cameroon in the context of African ecoregions based on Olson et al. (2001); (B) distribution of ecoregions in Cameroon; (C) preview of the subset ecoregions; (D) Sentinel2-a median image of the subset for mid-November 2020 to end march 2021.*

### **2.3 Inter-seasonal dynamics, spectral assumptions, and casting of indices**

In arid and semi-arid regions, common changes in density, spatial distribution, chlorophyll, pigmentation, water stress, anthocyanin, nitrogen, carotenoids, leaf structure, and browning or senescence differently impact the biomass [25]. Previous spectral indices-based applications investigated that, the visible (0.4–0.7 μm) wavelengths respond to photosynthetic and non-photosynthetic pigments, the NIR (0.7– 1.4 μm) wavelengths respond to the cellular structure and exhibit solar-induced fluorescence (SIF) and the SWIR (1.4–3 μm) wavelengths respond to senescent nonphotosynthetic vegetation. As such, the anisotropic behavior of vegetation at visible-SWIR wavelengths has been parameterized to describe vegetation structure [26].

We computed twelve spectral indices, whose three were selected as reference data according to their ability to better highlight the land cover targeted, i.e., second modified soil adjusted vegetation index, MSAVI2 [27], to assess the vegetation cover, normalized difference soil drought index NDSoDI [14], to highlight the dry bare soils, and new water index, NWI [28], to map the surface water. After testing dozens of other indices, two assumptions were emitted for an efficient last casting, such as *i) two or more indices targeting the same anomaly, with different or improved (increase/decrease)* *Dynamics, Anomalies and Boundaries of the Forest-Savanna Transition: A Novel Remote… DOI: http://dx.doi.org/10.5772/intechopen.105074*

*spatial patterns, express new information to be consider*; *ii) two or more indices targeting a different anomaly should show different spatial patterns to be considered enough informative*. Accordingly, nine vegetation indices were selected and computed, i.e., photosynthetic vigor index (PVR), global vegetation moisture index (GVMI), plant biochemical index (PBI), first red-edge inflection point (REIP1), modified chlorophyll reflectance index green (mCRIG), leaf water content index (LWCI), modified anthocyanin reflectance index (MARI), simple ratio pigment index (SRPI), and plant senescence reflectance index (PSRI). **Table 1** gives details of their formulations with references.

These indices were stacked and previewed with, MSAVI2, NDSoDI and NWI seeking the following: *i) agreement or disagreement with MSAVI2*; *ii) absolute disagreement with NDSoDI*; *iii) disagreement when possible NWI* (Additional material 2). On the transversal transect of 1544 pixels covering all types of vegetation and land cover, 500 pixels were sampled on different sections (100/300 pixels) for each index, to compare trends and relationships with MSAVI2 and NDSoDI. The two periods moving average was used to better visualize trends, while the simple linear regression was performed to extract the coefficient of correlation, *R***<sup>2</sup>**, and the root mean square error, RMSE (**Figure 2**). Based on patterns distribution and these statistics, a threshold was defined for each index, to separate vegetated and non-vegetated areas (Additional material 3). The binarized images were used as entries for the change assessment.


*\*Was inverted after first preview, to better highlight vegetation patterns instead of soil dynamics. \*\*Was adapted to SENTINEL2-A covered wavelengths by using band 6.*

**Table 1.** *Spectral indices used.*

#### **Figure 2.**

*Relationships between each analytical spectral index and both referential ones. The bright green curve represents the dependent variable, while the two others are explanatory variables (MSAVI2 = dark green; NDSoDI = light brown). N = 500 for all variables and regression parameters. The following is noticed from the trends of curves: PVR, PBI and GVMI have a neat positive correlation with MSAVI2, but a sharp negative one with NDSoDI; conversely, MARI and PSRI curves describe the exact opposite trends (positive with NDSoDI and negative with MSAVI2); mCRIG and LWCI curves evolve in another direction cutting the MSAVI2 and NDSoDI curves, while IREIP1 curve shows no real relationship them.*

#### **2.4 Change vector analysis and spatial dynamics assessment**

This process creates a difference image between two or more bands in a multitemporal image analysis, so to detect changes in the type or the conditions of surface features. Depending on the study, change vector analysis (CVA) can use calculation principles of the Mahalanobis or the Euclidean distance as in this study, according to the following expression [38]:

$$\mathbf{R} = \sqrt{\left(\boldsymbol{\beta}\sigma\_1 - \boldsymbol{\beta}\sigma\_2\right)^2 + \left(\boldsymbol{\beta}\rho\_1 - \boldsymbol{\beta}\rho\_2\right)^2} \tag{1}$$

Where *R* is magnitude of the vector change, *βσ***<sup>1</sup>** and *βσ***<sup>2</sup>** are fraction in date 1 and date 2 respectively, *βρ***<sup>1</sup>** and *βσ***<sup>2</sup>** are fraction cover in date 1 date 2.

Four classes of magnitude are represented for either degradation or re-growing [39] (Additional material 4). For each binarized image, a CVA process was performed, the magnitudes of increase and decrease patterns were used, while the stability magnitudes were ignored. A total of twelve difference images per index resulted, i.e., six per selected magnitude. Considering the need of complementarity among indices, only one CVA magnitude, i.e., increase or decrease, was selected per index for further processing. Criteria used to optimize the selection were the original goal of each index, as well as its spatial and statistical relationship with MSAVI2, NDSoDI and NWI (**Table 2**).

*Dynamics, Anomalies and Boundaries of the Forest-Savanna Transition: A Novel Remote… DOI: http://dx.doi.org/10.5772/intechopen.105074*


#### **Table 2.**

*CVA magnitude selected per spectral index.*

#### **2.5 Averaged moving standard deviation index method and anomalies mapping**

The moving standard deviation index, MSDI, is a filter applied to satellite images multispectral or derivative channels using the moving standard deviation calculation, generally to assess degradation [40]. One common application is the vegetation and soil of semi-arid systems, where the variability of the MSDI is used to indicate levels of habitat degradation [41–43]. MSTDI has been proven efficient to operate well in complex regions [40, 42, 44].

Here, the five derivative images per selected CVA were used as entries. A standard deviation was computed for each entry. Difference between consecutive standard deviations were calculated, giving four new images. Then, the averaged MSDI (aMSDI) was conceived as follows:

$$a \text{MSDI} = \frac{\sum\_{i=1}^{n} \sigma\_i b\_i}{n + \tau} \tag{2}$$

For all **1** *i n*, *σibi* represent the *i th* CVA-band contribution to the information distribution in the outputted image, whereas *τ* is the difference between the number of times each index was originally calculated (**7** times, starting from the multispectral median images) and the number of resulting standard deviations computed among selected CVA-bands **(5)**. Here, *τ* ¼ **2**. Computations were performed at **3** � **3 -**, **5** � **5**- and **7** � **7-**pixel moving window. The same steps were applied to both of decrease and increase CVA magnitudes of MSAVI2. The goal at this point was to assess spatial convergence and divergence trends with the nine analytical vegetation spectral indices' aMSDI, as a first-hand validation process of anomalies distribution. Two autoregression-based tools were used as metrics at this step (ESRI, 2014):

• The global Moran's I index was computed to assess the spatial autocorrelation of aMSDI outputs along the 1544 pixels transect. Its integration in spatial analysis is important to avoid incorrect statistical inference from inefficient or biased parameter estimates [45]. The formula is expressed such as follows:

$$I = \frac{n}{\mathcal{S}\_0} \frac{\sum\_{i=1}^n \sum\_{j=1}^n w\_{i,j} x\_i x\_j}{\sum\_{i=1}^n x\_i^2} [-\mathbf{1}, \mathbf{1}] \tag{3}$$

Where *zi* is the deviation of an attribute for pixel *i* from its mean *xi* � *X*, *wi*,*<sup>j</sup>* is the spatial weight between pixel *i* and *j*, *n* is equal to the total number of pixels, and *SO* is the aggregate of all the spatial weights, developed as:

$$\mathcal{S}\_O = \sum\_{i=1}^{n} \sum\_{j=1}^{n} w\_{ij} \tag{4}$$

Values closer to **0** indicate little or no spatial autocorrelation, regular values indicate a clustered trend, and negative values indicate a dispersed trend. In this paper and based on the proposed model, i.e., one CVA magnitude per index and average of the MSDI among years, the clustered trends were targeted to express values closeness. The null hypothesis for the patterns analysis was emitted for a less randomness and a significant clustering, comparing how good would be the model to spot and regroup spatially related pixels of anomalies, and discriminate them from "healthier" pixels. High standard deviation (all *z* � *scores >* **2***:***5**) and low probability (all *p* � *values* ¼ **0**) were ranges targeted, meaning that the observed spatial patterns are probably too unusual to be the result of random chance.

• Ordinary Least Squares (OLS) regression was computed as the non-spatial measure of the spatial convergence/divergence of aMSDI. This method predicts or models a dependent variable relationship with a set of explanatory variables. The regression coefficients are usually estimated by using least-square techniques such as:

$$y = a + \sum\_{j=1}^{n} \mathbf{x}\boldsymbol{\beta}\_{j} + \varepsilon \tag{5}$$

Where, *α* is the intercept, *β<sup>j</sup>* is the regression coefficient, and *ε* is the residual term, given by the difference between observed and expected value. Under these assumptions, the regression coefficient can be obtained by [46]:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y} \tag{6}$$

Here, each aMSDI of the nine indices was considered as dependent variable, whereas MSAVI2's aMSDI of both CVA, were simultaneous explanators. Three OLS parameters were selected for interpretation, i.e., adjusted R-squared, corrected Akaike's information criterion (AICc) which assesses the best-fit model between spatial and non-spatial Ordinary Least Squares (OLS) models, root mean squares error (RMSE), and three graphs, i.e., scatterplots of the relationship among variables, histograms of standard deviation probability and plots of residuals versus predicted.

At this point, the spatial correlation was assessed through the cross-correlation mapping process using only the finest **3** � **3** -pixel moving window, that consistently shows best statistic performances than the two others [43]. The parameters of the algorithm were set for the maximum gap (maximum pixel shifting) at **1**, and the

*Dynamics, Anomalies and Boundaries of the Forest-Savanna Transition: A Novel Remote… DOI: http://dx.doi.org/10.5772/intechopen.105074*

maximum masked fraction (maximum fraction of the pixel within the correlation window that are allowed to be masked) at **0***:***5**.

Further, the individual maps of anomalies were used to produce a single one. The first step consisted in stacking the aMSDI for each **3** � **3** -pixel moving window. Then, another correlation map was computed for all the nine inputs, only selecting the correlation coefficient layer. The highest values of the outputs were assumed show agreement and disagreement among aMSDI for high and low anomalies. At last, to keep the highest values as signs for anomaly spots and the lowest ones as potentially "healthier" areas, a simple linear combination, SLC, was performed among the nine aMSDI for each moving window size, by summing the layers without a weighing factor.

#### **2.6 Machine learning for vegetation discrimination**

Based on visual trends, binarized images were regrouped in three axes of three indices of vegetation each. A principal component analysis, PCA, was performed for each ax using the covariance reducer algorithm, that reduces some number of **1** � *D* arrays of the same length *N* to a covariance matrix of shape *N* � *N*. This reducer uses the one-pass covariance formula [24]. The first component of each output was chosen, and a threshold was set to separate vegetated and non-vegetated areas. Then the three outputs were stacked with NDSoDI and NWI to form a new five bands image. In Ref. to both this new image and to the last study period median image (2021–2022), three classes of vegetation were defined, completed by soil/Built-up and water classes. At least two spectral curves were produced per class to confirm the trends, except for water that appeared uniform. The land use land cover, LULC, classes sampling was performed using 115 points, and a process of ½ � **1** : **1** ratio between training samples on the PCA stacked image and testing samples on the MS image (**Table 3**).

Three machine learning algorithms were performed, such as the Classification and Regression Trees (CART), Random Forest (RF) and Support vector machine, (SVM) [47–49]. The metrics used to assess general performance of each classifier were overall accuracy, OA, and the kappa coefficient, KC [50, 51]. Whereas, the thorough assessment of their efficiency on individual LULC classes was done using the error matrix to extract the producer accuracy (PA) and the user accuracy (UA). Eqs. (7) and (8) formulate the main quality measures of the learning:

$$\text{OA} = \frac{\text{Total number of correct samples}}{\text{Total number of samples (96)}} \tag{7}$$

$$\text{KC} = \frac{\varepsilon\_1 - \varepsilon\_2}{1 - \varepsilon\_2} \tag{8}$$


**Table 3.** *Samples distribution.*

#### **Figure 3.**

*Overview of the study design.*

with,

$$\mathbf{e}\_1 = \frac{\sum\_{i=1}^{n} \mathbf{D}\_{ii}}{N}, \text{and}, \mathbf{e}\_2 = \frac{\sum\_{i=1}^{n} \mathbf{D}\_{i+} \mathbf{D}\_{+i}}{N^2}$$

Where, *Dii* is the number of observations in row *i* and column *i* of the confusion matrix, *n* is the number of rows in the error matrix, *N* is total number of counts in the confusion matrix, *xi*<sup>þ</sup> is the marginal total of row *i*, and *x*þ*<sup>i</sup>* is the marginal total of column *i*. Main steps of the processing are summarized in **Figure 3**.
