3. Meteorological indices

Drought can be defined as a period of unusually arid conditions (usually due to rainfall deficiency) that has lasted long enough to cause an imbalance in a region's hydrological situation. Based on its intensity and persistence, drought can be classified into four categories [53]: (1) meteorological drought, which occurs when precipitation is less than usual and is characterized by changes in weather patterns; (2) agricultural (vegetation) drought, which refers to water deficits in plants and occurs after meteorological drought and before hydrological drought; (3) hydrological drought, which ensues when the level of surface water and the groundwater table are below the long-term average; and finally, (4) socioeconomic drought, which materializes when the water resources required for industrial, agricultural, and household consumption fall short of demand and thus cause socioeconomic anomalies.

A drought index is an indicator or measure derived from a series of observations that reveals some of the cumulative effects of a prolonged and abnormal water deficit. It integrates pertinent meteorological and/or hydrological parameters (accumulated precipitation, temperature, and evapotranspiration) into a single numerical value or formula and gives a comprehensive picture of the situation [53]. Such an index is more readily usable and comprehensible than the raw data and, if presented as a numerical value, makes it easier for planners and policymakers to make decisions. Authorities and public and private committees evaluate the impact of drought using these indices and take measures to prevent its effects [54].

More than 100 drought indices have so far been proposed, each formulated for a specific condition [55]. The reclamation drought index (RDI), for example, was developed in the USA to activate drought emergency relief funds associated with public lands affected by drought; the crop moisture index (CMI) was designed to show the effects of water conditions on growing crops in the short term and is not a good instrument for displaying long-term conditions. Here we will only describe the standardized precipitation index, which is the index used in the case studies.

#### 3.1 Standardized precipitation index (SPI)

Most of the forecasting works reviewed here are based on SPI [56]. It is perhaps the most popular index for forecasting meteorological drought and has been recommended by the World Meteorological Organization [57]. It can be defined as the number of standard deviations by which the observed cumulative rainfall at a given time scale (e.g., 1, 3, or 6 months) deviates from the long-term mean for that same time scale over the entire length of the record (a z-score).

More specifically, SPI is calculated by building a frequency distribution from historical precipitation data (at least 30 years) at a specific location for the precipitation accumulated during a specified period, for example, 1 month (SPI1), 3 months (SPI3), 24 months (SPI24), and so on. A theoretical probability density function (usually the gamma distribution) is fitted to the empirical distribution for the selected time scale, and the fitted cumulative probabilities are then transformed into values of the standard normal distribution.
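The two-step recipe just described (fit a gamma distribution to the accumulated totals, then map the fitted probabilities through the inverse standard normal) can be sketched in Python with SciPy. This is a simplified illustration on synthetic data; it omits the zero-precipitation correction used in the full SPI formulation:

```python
import numpy as np
from scipy import stats

def spi(precip, scale=3):
    """Simplified SPI: accumulate precipitation over `scale` months,
    fit a gamma distribution, and convert probabilities to z-scores."""
    totals = np.convolve(precip, np.ones(scale), mode="valid")  # rolling accumulated sums
    shape, loc, sc = stats.gamma.fit(totals, floc=0)            # fit gamma (location fixed at 0)
    cdf = stats.gamma.cdf(totals, shape, loc=loc, scale=sc)     # accumulated total -> probability
    return stats.norm.ppf(cdf)                                  # probability -> standard normal

rng = np.random.default_rng(0)
monthly_rain = rng.gamma(2.0, 30.0, size=360)  # 30 years of synthetic monthly rainfall (mm)
spi3 = spi(monthly_rain, scale=3)              # scale=12 would give SPI12 on the same record
```

By construction the resulting series has approximately zero mean and unit variance, which is what makes SPI comparable across stations and climates.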

SPI1 to SPI6 are considered indices for short-term or seasonal variation (soil moisture), whereas SPI12 is considered a long-term drought index (groundwater and reservoir storage).

Most researchers use DWT; for more information about CWT, the reader should refer to [51]. The filterbank implementation of wavelets can be interpreted as computing the wavelet coefficients of a discrete set of child wavelets for a given mother. This mother wavelet function was defined at scale a and location b as

$$\psi\_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right) \tag{9}$$

where $\psi\_{0,0}(t)$ is a mother wavelet prototype and a, b are scaling and shifting parameters, respectively.

Several wavelet families have proven useful for forecasting various hydrological time series. As an example, we can mention Haar, which is also known as daubechies1 or db1 [50]. It is defined as

$$\psi(t) = \begin{cases} 1 & \text{if } 0 < t < 0.5 \\ -1 & \text{if } 0.5 < t < 1 \\ 0 & \text{otherwise} \end{cases} \tag{10}$$

DWT operates two sets of functions (scaling and wavelets) viewed as high-pass (HPF) and low-pass (LPF) filters. The signal is convolved with the pair of HPF and LPF followed by subband downsampling, producing two components. The first component, obtained by passing the signal through the low-pass filter, is called an approximation component (or series), and the other component (fast events) is called a detailed component (Figure 7). This process is iterated n times, with successive approximation series being decomposed in turn, so that the original time series is broken down into the minimum number of components needed to reflect the time series according to the mother wavelet. A full description of DWT can be found in [50, 52].

#### Figure 7.

Time series wavelet-ANN conjunction model. (A) Three-level wavelet decomposition tree (DWT). (B) Example of the decomposition of a precipitation signal.

The "drought" part of the SPI range is arbitrarily split into "near normal" (0.99 > SPI > −0.99), "moderately dry" (−1.0 > SPI > −1.49), "severely dry" (−1.5 > SPI > −1.99), and "extremely dry" (SPI ≤ −2.0) conditions [56]. A drought event starts when SPI becomes negative and ends when it becomes positive again.
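As a minimal sketch, the split above maps onto a small lookup function (the wet classes above SPI = 1.0 exist in the full classification but are collapsed here into a single label):

```python
def spi_category(spi):
    """Map an SPI value to its drought class, using the thresholds of [56]."""
    if spi <= -2.0:
        return "extremely dry"
    if spi <= -1.5:
        return "severely dry"
    if spi <= -1.0:
        return "moderately dry"
    if spi < 1.0:
        return "near normal"
    return "wet"  # the symmetric wet classes are not detailed in the text above
```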


SPI is easy to calculate (using precipitation only) and can characterize drought or abnormal wetness on different time scales. Its standardization ensures independence from geographical position, and it is thus more comparable across regions with different climates. The index can be computed using several packages of the R project [58], for example, the SPEI package [59] or the SPI package [60]. Limitations of SPI include the following: (1) it does not account for evapotranspiration; (2) it is sensitive to the quantity and reliability of the data used to fit the distribution; and (3) it does not consider the intensity of precipitation and its potential impacts on runoff, streamflow, and water availability within the system. A more detailed explanation of how SPI is calculated can be found at [43].

#### 3.2 Other indices

Other indices that use only precipitation data include EDI [61], SIAP [62], the deciles index (DI), percent of normal (PN), the standard precipitation index (SPI), the China-Z index (CZI), modified CZI (MCZI), and the z-score [55].

### 4. Forecasting meteorological drought

Forecasting meteorological drought using historical data is not a trivial task. The time series that characterize the evolution of meteorological events (drought, precipitation) in the temporal domain have localized high- and low-frequency components with dynamic nonlinearity and non-stationary features. Several statistical indicators have been proposed to evaluate the success of prediction. Most of these metrics are not independent; for example, MSE can be decomposed in many ways to link it with the bias and the correlation coefficient [63]. A standard practice of model corroboration is to compute a common set of performance metrics, typically more than three. Most important is that at least three critical components, that is, one dimensionless statistic, one absolute error index statistic, and one graphical technique, should be represented in the corroboration [64].

Regarding the dimensionless statistic, we must mention:

• Pearson's correlation coefficient (R) is used to evaluate how well the estimates correspond to the observed values. Due to the standardization of many indices, the robustness of R can be limited [64].

$$R = \frac{\sum\_{i=1}^{n} \left(p\_i - \overline{p}\right) (o\_i - \overline{o})}{\sqrt{\sum\_{i=1}^{n} \left(p\_i - \overline{p}\right)^2} \sqrt{\sum\_{i=1}^{n} (o\_i - \overline{o})^2}} \tag{11}$$

• Coefficient of determination (R2) measures the degree of association between the observed (oi) and predicted (pi) values.

$$\text{R2} = \frac{\sum\_{i=1}^{n} \left(o\_i - p\_i\right)^2}{\sum\_{i=1}^{n} \left(o\_i - \overline{o}\right)^2} \tag{12}$$

Satellite Data and Supervised Learning to Prevent Impact of Drought on Crop… DOI: http://dx.doi.org/10.5772/intechopen.85471

• Nash-Sutcliffe efficiency (NSE) or MSE skill [65].


$$NSE = 1 - \frac{\sum\_{i=1}^{n} \left(p\_i - o\_i\right)^2}{\sum\_{i=1}^{n} \left(o\_i - \overline{o}\right)^2} \tag{13}$$

• Willmott's index (WI) represents the ratio of the mean square error and the potential error [66].

$$WI = 1 - \left[\frac{\sum\_{i=1}^{n} \left(p\_i - o\_i\right)^2}{\sum\_{i=1}^{n} \left(\left|p\_i - \overline{o}\right| + \left|o\_i - \overline{o}\right|\right)^2}\right] \tag{14}$$

Among the most commonly used absolute error index statistics are:

• Mean squared error (MSE) estimates the average squared error of the estimates.

$$\text{MSE} = \frac{1}{n} \sum\_{i=1}^{n} \left( p\_i - o\_i \right)^2 \tag{15}$$

• Mean absolute error (MAE).

$$\text{MAE} = \frac{1}{n} \sum\_{i=1}^{n} |p\_i - o\_i| \tag{16}$$

In all the formulas presented above, oi, pi represent the observed and estimated values, n is the number of records, and o, p indicate the means of the observed and predicted values, respectively.

Here we included R and R2, two standard regression criteria, in the group of dimensionless statistics.
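Eqs. (11)–(16) translate almost one-to-one into NumPy. The sketch below uses the standard Willmott form for Eq. (14), with deviations taken from the observed mean, and toy observed/predicted vectors:

```python
import numpy as np

def mse(o, p):
    return float(np.mean((p - o) ** 2))                              # Eq. (15)

def mae(o, p):
    return float(np.mean(np.abs(p - o)))                             # Eq. (16)

def nse(o, p):
    return 1 - np.sum((p - o) ** 2) / np.sum((o - o.mean()) ** 2)    # Eq. (13)

def wi(o, p):                                                        # Eq. (14)
    denom = np.sum((np.abs(p - o.mean()) + np.abs(o - o.mean())) ** 2)
    return 1 - np.sum((p - o) ** 2) / denom

def r(o, p):                                                         # Eq. (11)
    return float(np.corrcoef(o, p)[0, 1])

o = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # observed (toy data)
p = np.array([1.1, 1.9, 3.2, 3.8, 5.1])   # predicted
```

The dependence among these metrics noted at the start of Section 4 can be checked numerically: with population standard deviations, MSE equals (p̄ − ō)² + (σp − σo)² + 2σpσo(1 − R), one of the decompositions linking MSE to bias and correlation [63].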

Finally, we present just one example of the graphical technique, mainly to show how a training and evaluation process is executed with a ML algorithm (Figure 8).

#### 4.1 Case studies

Some inconsistencies in the observations and the limited duration of satellite records introduce difficulties and uncertainties when applying forecast methods. At least 30 years of records are required to compute and forecast SPI; therefore, some of the examples we present here are based exclusively on ground gauge data. This situation is about to change, since satellite observations are reaching the minimum number of years required and the data are calibrated with ground observations (Table 1).

Shirmohammadi et al. [67] evaluated the performance of two ANN architectures (feedforward neural network and Elman or recurrent neural network), different kinds of ANFIS (four different membership functions: Gaussian, bell-shaped, triangular, and pi-shaped), WT-ANFIS, and WT-ANN. The wavelet families used here were db4, bior1.1, bior1.5, rbio1.1, rbio1.5, coif2, and coif4.

#### Figure 8.

Generic example of time series forecasting using two different ML methods. The green dotted line indicates a "bad" forecast method. The red dashed line indicates an appropriate method for the data, that is, the curve is closer to the observed time series. Both methods were trained using 80% of the data and tested on the remaining 20%.
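The training and evaluation process illustrated in Figure 8 can be mimicked with synthetic data: a chronological 80/20 split, a deliberately "bad" method (always predicting the training mean) and a more appropriate one (seasonal persistence, reusing the value observed 12 months earlier), compared by RMSE on the held-out 20%:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(240)  # 20 years of synthetic monthly values
series = np.sin(2 * np.pi * t / 12) + 0.3 * rng.standard_normal(t.size)

split = int(0.8 * t.size)                    # chronological 80/20 split
train, test = series[:split], series[split:]

# "Bad" method: always predict the training mean (ignores seasonality).
pred_mean = np.full(test.size, train.mean())
# Better method: seasonal persistence -- reuse the value from 12 months earlier.
pred_seasonal = series[split - 12 : t.size - 12]

rmse = lambda obs, pred: float(np.sqrt(np.mean((obs - pred) ** 2)))
rmse_mean, rmse_seasonal = rmse(test, pred_mean), rmse(test, pred_seasonal)
```

On this seasonal series the persistence forecaster tracks the observations closely while the mean forecaster does not, mirroring the red-dashed versus green-dotted curves of Figure 8.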

#### Table 1.

Coefficient of determination (R2) of 10 ML methods to predict 3, 12, and 24 months SPI. Extracted from [71].

| Model | SPI3 | SPI12 | SPI24 |
| --- | --- | --- | --- |
| SVR | 0.54 | 0.84 | 0.89 |
| BSVR | 0.47 | 0.86 | 0.91 |
| BS-SVR | 0.62 | 0.93 | 0.92 |
| ANN | 0.64 | 0.89 | 0.93 |
| BANN | 0.55 | 0.87 | 0.92 |
| BS-ANN | 0.67 | 0.95 | 0.98 |
| WBANN | 0.64 | 0.87 | 0.93 |
| WBS-ANN | 0.69 | 0.90 | 0.95 |
| WBSVR | 0.57 | 0.85 | 0.90 |
| WBS-SVR | 0.67 | 0.95 | 0.94 |

Abbreviations: SVR, support vector regression; BSVR, bootstrap SVR ensemble; BS-SVR, boosting-SVR; ANN, artificial neural networks; BANN, bootstrap ANN ensemble; BS-ANN, boosting-ANN; WBANN, wavelet coupled bootstrap ANN ensemble; WBS-ANN, wavelet boosting-ANN; WBSVR, wavelet coupled bootstrap SVR ensemble; WBS-SVR, wavelet boosting-SVR.

Training data came from 1952 to 1992 rain records from East Azerbaijan province (Iran). More than 1000 model structures were tested to predict SPI6 for 1, 2, and 3 months' lead-time over the test period covering from 1992 to 2011. R2, NSE, and RMSE evaluated the performance of the models.

ANFIS models provided more accurate predictions than ANN models, and the inclusion of WT could improve meteorological drought modeling: WT-ANFIS (best RMSE = 0.097), WT-ANN (best RMSE = 0.227), ANFIS (best RMSE = 0.089), and ANN (best RMSE = 1.81).

Belayneh et al. [68] used precipitation records (1970 to 2005) to generate SPI3 and SPI6 time series from 12 stations in the Awash River Basin of Ethiopia (that is, 12 x 2 independent time series). The forecast was performed with ANN (RMSNN trained with the Levenberg-Marquardt back propagation), SVR, and the coupled models: WA-ANN and WA-SVR. About 80% of the data was used for training, 10% for validation, and 10% for testing, and ARIMA forecasting was used as a benchmark [69]. Regarding wavelet decomposition, each time series was decomposed between one and nine levels, and the appropriate level was selected by comparing results among all decomposition levels. The results of all the methods were compared by RMSE, MAE, and R2. Overall, the WA-ANN and WA-SVR models were effective in forecasting SPI3 although most WA-ANN models had more accurate estimates (1- or 3-month lead). The WA-ANN model seemed to be more effective in anticipating extreme SPI values (severe drought or heavy precipitation), whereas WA-SVR closely reflected the observed SPI trends but underestimated the extreme events.
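The wavelet preprocessing step shared by all the WA-* models above can be sketched with the PyWavelets package: the series is decomposed into one approximation and several detail components, which are then fed to the regression model. The SPI series here is a random stand-in:

```python
import numpy as np
import pywt

rng = np.random.default_rng(2)
spi_series = rng.standard_normal(256)  # stand-in for an SPI time series

# Three-level discrete wavelet decomposition (cf. Figure 7): one approximation
# component (coeffs[0]) plus three detail components, coarsest first.
coeffs = pywt.wavedec(spi_series, "db4", level=3)
approx, details = coeffs[0], coeffs[1:]

# DWT is invertible: reconstructing from all components recovers the signal.
reconstructed = pywt.waverec(coeffs, "db4")[: spi_series.size]
```

In the studies cited, the decomposition level (here fixed at three) was itself a tuning parameter, chosen by comparing forecast skill across levels.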


Satellite Data and Supervised Learning to Prevent Impact of Drought on Crop… DOI: http://dx.doi.org/10.5772/intechopen.85471

Abbreviations: SVR, support vector regression; BSVR, bootstrap SVR ensemble; BS-SVR, boosting-SVR; ANN, artificial neural networks; BANN, bootstrap ANN ensemble; BS-ANN, boosting-ANN; WBANN, wavelet coupled bootstrap ANN ensemble; WBS-ANN, wavelet boosting-ANN; WBSVR, wavelet coupled bootstrap SVR ensemble; WBS-SVR, wavelet boosting-SVR.

#### Table 1.

Training data came from 1952 to 1992 rain records from East Azerbaijan province (Iran). More than 1000 model structures were tested to predict SPI6 for 1, 2, and 3 months' lead-time over the test period covering from 1992 to 2011. R2, NSE,

Generic example of time series forecasting using two different ML methods. The green dotted line indicates a "bad" forecast method. The red dashed line indicates an appropriate method for the data, that is, the curve is closer to the observed time series. Both methods were trained using 80% of the data and tested on the remaining

ANFIS models provided more accurate predictions than ANN models, and the inclusion of WT could improve meteorological drought modeling: WT-ANFIS (best RMSE = 0.097), WT-ANN (best RMSE = 0.227), ANFIS (best RMSE = 0.089), and

Belayneh et al. [68] used precipitation records (1970 to 2005) to generate SPI3 and SPI6 time series from 12 stations in the Awash River Basin of Ethiopia (that is, 12 x 2 independent time series). The forecast was performed with ANN (RMSNN trained with the Levenberg-Marquardt back propagation), SVR, and the coupled models: WA-ANN and WA-SVR. About 80% of the data was used for training, 10% for validation, and 10% for testing, and ARIMA forecasting was used as a benchmark [69]. Regarding wavelet decomposition, each time series was decomposed between one and nine levels, and the appropriate level was selected by comparing results among all decomposition levels. The results of all the methods were compared by RMSE, MAE, and R2. Overall, the WA-ANN and WA-SVR models were effective in forecasting SPI3 although most WA-ANN models had more accurate estimates (1- or 3-month lead). The WA-ANN model seemed to be more effective in anticipating extreme SPI values (severe drought or heavy precipitation), whereas WA-SVR closely reflected the observed SPI trends but underestimated the extreme

and RMSE evaluated the performance of the models.

ANN (best RMSE = 1.81).

Drought - Detection and Solutions

events.

14

Figure 8.

20%.

Coefficient of determination (R2) of 10 ML methods to predict 3, 12, and 24 months SPI. Extracted from [71].

For SPI3 (1 month lead time) forecast, the best results in terms of RMSE (0.407) and MAE (0.391) were obtained by the WA-ANN model at the Ziquala station, whereas in terms of R2 (0.881), the Ginchi station had the best WA-ANN model.

When the lead-time was raised to 3 months, WA-ANN remained the best model. One station (Bantu Liben) had the model with the lowest RMSE and MAE values (0.510 and 0.4941), whereas a second station (Sebeta) had the best results in terms of R2 (0.7304).

Regarding SPI6 forecasts, the WA-ANN and WA-SVR models provided the best SPI6 forecasts. Neither method was meaningfully better than the other. The predictions for SPI6 were significantly better than SPI3 predictions according to three performance measures. As the forecast lead time increased, the forecast accuracy of all the models declined. This drop was most evident in the ARIMA, ANN, and SVR models.

These results were similar to those of [70]. The authors used precipitation records (1970–2005) from 20 stations in the same basin of Ethiopia (three different sub-basins) to generate SPI3 and SPI12 series. ANN, SVR, and WA-ANN were evaluated for 1- and 6-month lead time prediction. The comparison was made using RMSE, MAE, and R2. Forecasting SPI12 with all the models gave better performance than predicting SPI3, regardless of the lead time (best R2 = 0.953, WA-ANN). The performance of all the models declined when the lead time increased.

Belayneh et al. [71] modeled ANN and SVR as in [68] to forecast SPI 3, SPI12, and SPI24, but they included bootstrap (BANN and BSVR), boosting (BS-ANN, BS-SVR), wavelet coupled bootstrap ensemble (WBANN and WBSVR), and wavelet coupled boosting (WBS-ANN and WBS-SVR) in the analysis.

In general, the performances of SVR and ANN were comparable, although ANN performance was slightly higher. The inclusion of wavelets improved both techniques (wavelet decomposition denoises the time series). All models were more effective at forecasting SPI12 and SPI24 than SPI3 (Table 1). All boosting ensemble models were developed in MATLAB ("fitensemble" function).

The WBS-ANN and WBS-SVR models provided better prediction results than all the other types of models evaluated.
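MATLAB's "fitensemble" has close analogues in scikit-learn. The following is a sketch of bootstrap (bagging) and boosting ensembles on synthetic data, with shallow regression trees standing in for the ANN/SVR base learners of [71]:

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
X = rng.standard_normal((300, 4))                  # stand-in predictors (e.g., lagged SPI)
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(300)

base = DecisionTreeRegressor(max_depth=3)
# Bagging: each tree is fitted on a bootstrap resample of the training set.
bagged = BaggingRegressor(base, n_estimators=25, random_state=0).fit(X[:240], y[:240])
# Boosting: trees are fitted sequentially, reweighting poorly predicted samples.
boosted = AdaBoostRegressor(base, n_estimators=25, random_state=0).fit(X[:240], y[:240])

rmse = lambda m: float(np.sqrt(np.mean((m.predict(X[240:]) - y[240:]) ** 2)))
```

Both ensembles should comfortably beat a constant (mean) predictor on the held-out slice, which is the qualitative pattern the bootstrap/boosting variants showed in Table 1.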

Ali, Deo, et al. [43] evaluated the performance of three models (ANFIS, M5Tree, and MPMR) to forecast SPI3, SPI6, and SPI12 calculated from a 35-year rainfall data set (1981–2015) from three stations in Pakistan. SPI data were partitioned into 70% (training) and 30% (testing) periods. M5Tree is a kind of decision tree with linear regression functions on the leaves [72], whereas MPMR stands for minimax probability machine regression [73] and was also applied to benchmark the ensemble-ANFIS model. Regarding the SPI3 forecast, ANFIS (R = 0.889 to 0.946) outperformed MPMR (R = 0.843 to 0.935) and M5Tree (R = 0.831 to 0.916). Similarly, for SPI6, ANFIS (R = 0.968 to 0.974) outperformed M5Tree (R = 0.950 to 0.967) and MPMR (R = 0.952 to 0.970). For SPI12, ANFIS (R = 0.987 to 0.993) outperformed M5Tree (R = 0.950 to 0.967) and MPMR (R = 0.984 to 0.986). The other statistics (e.g., RMSE, WI) corroborated the superior performance of ANFIS. Just as important, the ensemble-ANFIS model achieved the highest accuracy at the three stations when predicting moderate, severe, and extreme droughts.


### 5. Satellite precipitation products (SPPs)

There is no satellite that can reliably quantify rainfall under all circumstances. However, ground observations, although reliable and with long-term records, do not provide a consistent spatial representation of precipitation, particularly in certain world regions. Therefore, satellite data become necessary, as they provide more homogeneous data quality compared to ground observations [81, 82]. To our knowledge, merged satellite-gauge products are becoming indispensable.

Precipitation data sets may be classified into one of four categories: gauge data sets (e.g., CRU TS [83], APHRODITE [84]), satellite-exclusive (e.g., CHOMPS [85]), merged satellite-gauge products (e.g., GPCP [86], TRMM3B42), and reanalysis (e.g., NCEP1/NCEP2 [87], ERA-Interim [88]). Reanalysis implies integrating irregular observations with models encompassing physical and dynamic processes in order to generate an estimate of the state of the system across a uniform grid and with temporal continuity [89].

Many studies show that satellite precipitation algorithms show different biases, detection probabilities, and missing rainfall ratios in summer and winter. Sources of error include the satellite sensor itself, the retrieval error [90], and spatial and temporal sampling [91, 92].

Algorithms that estimate rainfall from satellite observations are based on either thermal infrared (TIR) bands (inferring cloud-top temperature), passive microwave sensors (PMW), or active microwave sensors (AMW). The TIR-based approach takes into account cold cloud duration or CCD, that is, the time that a cloud has a temperature below the threshold at a given pixel [93]. The PMW-based approach takes advantage of the fact that microwaves can penetrate clouds to explore their internal properties through the interaction of raindrops [94]. AMW is what is usually known as precipitation radar [95].

There is a plethora of validation studies of satellite-based rainfall estimates (SREs). Normally, these SREs are compared against ground rainfall estimates [91, 96].

Sun et al. [97] reviewed 30 currently available global precipitation (gauge-based, satellite-related, or reanalysis) data sets. The degree of variability of the precipitation estimates varies by region. Large differences in annual and seasonal estimates were found in tropical oceans, complex mountain areas, Northern Africa, and some high-latitude regions. Systematic errors are the main sources of errors over large parts of Africa, northern South America, and Greenland. Random errors are the dominant kinds of error in large regions of global land, especially at high latitudes. Regarding satellite assessments, PERSIANN-CCS has larger systematic errors than CMORPH, TRMM 3B42, and PERSIANN-CDR. The spatial distribution of systematic errors is similar for all reanalysis products [97].

Table 2 presents a comparison of several representative satellite rainfall products. More information regarding these and other products can be found in [97, 98]. Abbreviations: (IR) infrared satellite imagery, (MW) microwave estimates, (GG) ground gauges, (AMSU) Advanced Microwave Sounding Unit, (AMSU-B) Advanced Microwave Sounding Unit-B, (SSM/I) Special Sensor Microwave/Imager; (AMSR-E) Advanced Microwave Scanning Radiometer for the Earth Observing System (EOS).


Khosravi et al. [74] used rainfall data from the Tropical Rainfall Measuring Mission (TRMM) during 2000–2014 in the eastern district of Isfahan to generate 12-month SPI. The first 85% of the data was used to train a single-hidden-layer feedforward ANN, an SVR with RBF kernel, an LS-SVR with RBF kernel, and an ANFIS method. Optimum values of SVR and LS-SVR were obtained by a grid search within the ranges [10^-3, 10^+3] and [2^-3, 2^+3] for C and γ (SVR), or (10, 100, and 1000) for g and (1, 0.5, and 1) for γ (LS-SVR).

For SPI12, SVR achieved the highest accuracy (RMSE = 0.21), followed by LS-SVR (RMSE = 0.38), ANN (RMSE = 1.24), and ANFIS (RMSE = 1.36). The best ANN model consisted of three layers (input, hidden, and output) with 30, 8, and 1 neuron, respectively.
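A grid search of the kind described above can be sketched with scikit-learn. The data and the grid below are illustrative rather than those of [74], and chronological cross-validation folds are used, as is appropriate for time series:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 3))  # stand-in predictors (e.g., lagged SPI values)
y = X @ np.array([0.5, -0.2, 0.1]) + 0.1 * rng.standard_normal(200)

# Log-spaced grids over C and gamma, in the spirit of the ranges quoted above.
param_grid = {"C": [1e-3, 1e-1, 1e1, 1e3],
              "gamma": [2.0 ** k for k in (-3, -1, 1, 3)]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                      cv=TimeSeriesSplit(n_splits=3),  # chronological folds
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)
best_params, best_rmse = search.best_params_, -search.best_score_
```

Each (C, γ) pair is scored by cross-validated RMSE, and the best pair is then refitted on the full training set.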

Chen et al. [75] evaluated RF and ARIMA to forecast SPI3 (short-term drought) with a 1-month lead time and SPI12 (long-term drought). Both models were developed based on data from 1966 to 1995 (four stations in China), and predictions (1 month or 6 months ahead) were made from 1996 to 2004. Overall, RF performed consistently better than ARIMA. Results also suggested that RF is more robust in predicting dry events. Finally, ARIMA lost the capacity to predict SPI12, whereas the accuracy of RF was less affected by the longer lead time.

Agana and Homaifar [76] developed a hybrid model using a denoised empirical mode decomposition [77] and DBN. The proposed method was applied to predict a standardized streamflow index (SSI) across the Colorado River basin (ten stations). The new model was compared with MLP and SVR in predicting SSI12 (1, 6, and 12 months lead time). DBN, SVR, and their hybrid versions displayed rather similar prediction errors. However, DBN and EMD-DBN outperformed all other models for two-step predictions at almost all stations. As with wavelets, empirical mode decomposition significantly improves the quality of prediction.

Finally, we want to mention two examples where ML was directly applied to rainfall prediction.

El Shafie et al. [77] evaluated a radial basis function neural network (RBFNN) to forecast rainfall in Alexandria City, Egypt. The model was trained using rainfall data from 1960 to 2001 (four stations) and tested with data from 2002 to 2009 to predict yearly and monthly (January and December) precipitation. Regarding yearly model efficiency, R2 = 0.94 for RBFNN, whereas the control (a multiple linear regression MR model) only reached R2 = 0.21. Regarding monthly precipitation, RBFNN was very successful (R2 = 0.899 for January and R2 = 0.997 for December) as compared to the control (R2 = 0.997 and 0.34, respectively).

Sumi et al. [78] compared ANNs, multivariate adaptive regression splines or MARS [79], k-nearest neighbor [80], and SVR with RBF kernel to predict daily and monthly rainfall in Fukuoka, Japan. A preprocessed training set (1975–2004) was


used to train the algorithms with extensive parameter optimization, whereas the test set covered from 2005 to 2009. For monthly rainfall, SVR produced the most accurate forecast (lowest RMSE) and the best rainfall mapping (R2 = 0.93), whereas for the daily rainfall series, the MARS method produced the best R2 value (0.99). All the metrics were calculated based on single-step ahead forecasting.
