**Abstract**

The impact of increasing climate variability on crop yield is now evident. Predicting the potential effects of climate change on crops calls for statistical models that measure how a crop responds to climate variables. This chapter examines the use of regression analysis in predicting crop yield under a changing climate. Data quality control is explained, and the application of descriptive statistics, correlation analysis and contingency tables is discussed. Methodological aspects of crop yield modeling and prediction using climate variables are described. Estimation of yield via a multilinear regression approach is outlined, and an overview of statistical model verification is introduced. The study recommends the use of regression models in estimating crop yield, while taking into account the many other externalities that can contribute to yield change.

**Keywords:** crop, yield, prediction, climate change, regression model

## **1. Introduction**

In this chapter, we describe an experimental approach that can be employed in predicting crop yield in a changing climate. An introductory applied approach to linear statistical modeling and correlation analysis is examined.

Climate change is now evident, with well-documented socio-economic impacts that will affect food production [1, 2]. The decline in food production corresponding to a reduction in crop yields can be investigated using statistical models [3, 4]. While climate-related factors can affect crop yield, other externalities also influence yield, including soil quality, the use of commercial fertilizers or organic manures, and the residual effects of chemical substances in soils [5, 6].

**Figure 1** below shows the projected changes in crop yield due to the impacts of increasing climate variability.

**Figure 1.** *Projected changes in crop yield associated with increasing climate variability (source: [7]).*

Increasing climate variability, its associated uncertainties, and its impacts on food production and general livelihoods prompt the use of prediction models to estimate future food production for early warning and planning. Projections of crop yield in a changing climate carry uncertainties that are continuously being reduced by improvements in climate parameter response functions, including those for temperature [7]. Developing countries have also been identified with weaker monitoring and reporting of crop health, which can lead to an absence of early warning systems and slow responses to droughts and potential food shortages [8].

The prediction models employed are broadly classified as statistical or dynamic (mechanistic). In some instances, however, the modeling has been enhanced by artificial neural network technology, which has been applied to generate regional time series of crop yield using highly resolved output of global and regional models. The Global Circulation Models (GCMs) give coarse output of important climate parameters, best applied at larger spatial scales, while the Regional Circulation Models (RCMs) refine GCM output to finer resolution over smaller spatial scales. Where the variables of interest cannot be described by standard linear models, nested-error regression models with both fixed and random effects are usually applied, together with Monte Carlo simulation, in order to improve precision. Nested-error modeling works better at regional spatial scales and performs poorly at large-scale spatial coverage [9].
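For orientation, a nested-error regression model is commonly written in the form below. The notation is an assumption made here for illustration and is not taken from [9]: $i$ indexes regions, $j$ indexes units within a region, $\boldsymbol{\beta}$ holds the fixed effects, $u_i$ is the random effect shared by all units in region $i$, and $e_{ij}$ is the unit-level error.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + e_{ij},
\qquad u_i \sim N(0, \sigma_u^2),
\qquad e_{ij} \sim N(0, \sigma_e^2)
```

The region-level term $u_i$ is what distinguishes this from a standard linear model, in which only $e_{ij}$ would appear.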

Advances in computing have allowed historical data sets covering at least 30 years for a given crop to be used with artificial neural networks to investigate, simulate and predict time series of crop yields in climate zones over regions. The resulting neural networks are trained with data sets from selected climate zones and tested against an independent zone in order to improve the power of crop yield prediction [10]. A combination of neural networks and fuzzy set theory has also been applied to construct Fuzzy Neural Networks (FNN) and Granular Neural Networks (GNN), which have been used for predicting crop yields over different spatial locations with inputs from SPOT vegetation cover data [11].
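The idea of training on selected climate zones and testing on an independent zone can be sketched as follows. This is a minimal illustration only: a one-predictor least-squares line stands in for the neural network, and the zone names and the rainfall and yield values are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def holdout_zone_rmse(data, test_zone):
    """Fit on every zone except test_zone, then return the RMSE on test_zone."""
    train_x = [x for z, rows in data.items() if z != test_zone for x, _ in rows]
    train_y = [y for z, rows in data.items() if z != test_zone for _, y in rows]
    slope, intercept = fit_line(train_x, train_y)
    squared_errors = [(y - (intercept + slope * x)) ** 2 for x, y in data[test_zone]]
    return mean(squared_errors) ** 0.5

# Hypothetical (seasonal rainfall in mm, yield in t/ha) pairs per climate zone.
zones = {
    "zone_a": [(400, 1.8), (520, 2.4), (610, 2.9)],
    "zone_b": [(450, 2.0), (560, 2.6), (640, 3.1)],
    "zone_c": [(480, 2.3), (590, 2.7), (700, 3.3)],
}

rmse_c = holdout_zone_rmse(zones, "zone_c")
print(f"RMSE on the held-out zone: {rmse_c:.3f} t/ha")
```

Because the held-out zone contributes nothing to the fit, its error gives a more honest picture of how the model might transfer to a new zone than an error computed on the training zones.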

Assessment of vegetative cover using standardized vegetation indices has also been adopted to estimate crop yield over regions, either separately or in combination with the above approaches. In this method, easily measurable proxies that are positively correlated with crop yield are applied, including the Normalized Difference Vegetation Index (NDVI), the Green Vegetation Index (GVI) and the Soil Adjusted Vegetation Index (SAVI), alongside Back-propagation Neural Networks (BPNN) [12]. Statistical models that combine vegetative and thermal indices from satellite data have performed better in predicting crop yield than those based on vegetative cover indices alone [13]. While mechanistic models have been applied alongside statistical models, the latter have been able to reproduce key features of crop responses to warming and precipitation changes when tested using a process-based model approach [3, 14]. The Crop Environment Resource Synthesis (CERES) model has been applied as a decision-making tool in crop yield estimation [15], while PRECIS (Providing REgional Climates for Impacts Studies) has been used to assess the impacts of climate change on crop yield [14].
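Two of the indices named above have simple closed forms in terms of near-infrared (NIR) and red reflectance. The sketch below uses the standard definitions of NDVI and SAVI; the function names and the reflectance values are hypothetical, and the soil adjustment factor of 0.5 is the commonly used default rather than a value given in [12].

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denominator = nir + red
    if denominator == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / denominator

def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index with soil brightness factor L."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)

# Hypothetical surface reflectances for a vegetated pixel.
pixel_ndvi = ndvi(0.5, 0.1)
pixel_savi = savi(0.5, 0.1)
print(f"NDVI = {pixel_ndvi:.3f}, SAVI = {pixel_savi:.3f}")
```

Setting `soil_factor` to zero reduces SAVI to NDVI, which is a quick way to check the implementation.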

The need to keep improving our understanding of yield estimation and prediction in a changing climate therefore cannot be overemphasized. Against this backdrop, this study adopts a simplified linear statistical approach to crop yield estimation. Basic statistical description is defined and the methodology is discussed. Hybrid models that are both statistical and mechanistic, integrated by neural network technology and based on multiple variables of climate and crop physiological importance, are found to have higher crop yield predictive power.

*Agrometeorology*

*Prediction of Crop Yields under a Changing Climate DOI: http://dx.doi.org/10.5772/intechopen.94261*

## **2. Statistical determination of crop yield**

**2.1 Descriptive statistical analysis**

Descriptive statistical analysis entails summarizing data meaningfully without drawing conclusions beyond the data. Measures of central tendency and measures of spread are widely used for this purpose. The former include the mean, median and mode, while the latter include the variance and standard deviation; skewness and kurtosis additionally describe the shape of the distribution. These parameters can be used to describe climate and crop yield data as a preliminary step alongside data quality control.

Data quality control involves approaches used to detect defects and inconsistencies in data sets. Various methods are used to determine the quality of climate data, including linear regression approaches. In one such method, the single mass curve technique, cumulative values of a climate variable are plotted against a linear scale. The closer the resulting curve is to a straight line, the better the quality of the data. This method is also called the data homogeneity test. Data that fail this test, or data with more than 10 percent of values missing, are judged to be of poor quality and not fit for inferential statistics.

**2.2 Crop yield modeling and prediction**

Statistical models have been applied in predicting crop yield, and their ability to accurately predict yield responses to changes in mean temperature and precipitation has been assessed using process-based crop models. Prototype models include the Crop Environment Resource Synthesis (CERES) model, which can be applied to a crop to simulate the corresponding yield and can be used for projecting future yield responses, with their usefulness being higher at broader spatial scales [16]. Mechanistic models are also used alongside statistical models to predict crop yields [17]. Crop Yield Simulation and Land Assessment Models (CYSLAM) are applied to model the interaction of environmental variables, physiological responses, inputs, yields and land management in mechanistic simulations of crop yield [17].

Yields constrained by radiation and temperature within 10-day periods (dekads) are estimated first, in order to account for effective rainfall, evapotranspiration, percolation and soil moisture. This is followed by a simulation of the crop/soil water balance through the crop growth cycle, accounting for periods of moisture stress, and consequently an estimate of crop yield [17]. The moisture-dependent yield is then adjusted for nutrient supply, toxicities and drainage conditions of the soil [17]. However, the modules for moisture-limited yield, nutrient-limited yield, and radiation- and temperature-limited yield are validated separately by comparison with historical crop data.

*2.2.1 Multilinear regression yield estimation*

The single mass curve technique is used for data quality control, with cumulative values of the data plotted against a temporal scale. The nature and variability of climate elements are then determined using the mean, skewness, standard deviation, Student's t-test and correlation analysis. Trend is determined by dividing the data into two sets of equal length and testing the difference in the means of the two sets using the t-test [4].
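The single mass curve check, the 10 percent missing-value rule and the split-sample trend test can be sketched as follows. This is an illustration under stated assumptions: the rainfall series is hypothetical, using R² of a straight-line fit as the measure of linearity is a choice made here rather than something the chapter prescribes, and the t statistic uses the pooled-variance (Student) form for two halves of equal length.

```python
from itertools import accumulate
from statistics import mean, variance

def missing_fraction(values):
    """Share of records that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)

def mass_curve_r2(values):
    """R^2 of a straight-line fit to the cumulative series against time index."""
    cumulative = list(accumulate(values))
    t = list(range(1, len(cumulative) + 1))
    mt, mc = mean(t), mean(cumulative)
    stt = sum((ti - mt) ** 2 for ti in t)
    scc = sum((ci - mc) ** 2 for ci in cumulative)
    stc = sum((ti - mt) * (ci - mc) for ti, ci in zip(t, cumulative))
    return stc ** 2 / (stt * scc)

def split_half_t(values):
    """Student's t statistic and degrees of freedom for second half vs first half."""
    half = len(values) // 2
    first, second = values[:half], values[half:2 * half]
    pooled_var = (variance(first) + variance(second)) / 2  # equal group sizes
    standard_error = (pooled_var * 2 / half) ** 0.5
    return (mean(second) - mean(first)) / standard_error, 2 * half - 2

# Hypothetical annual rainfall totals (mm) for ten consecutive years.
rainfall = [820, 790, 845, 810, 800, 760, 735, 770, 745, 750]

usable = missing_fraction(rainfall) <= 0.10
r2 = mass_curve_r2(rainfall)
t_stat, dof = split_half_t(rainfall)
print(f"usable={usable}, mass-curve R^2={r2:.4f}, t={t_stat:.2f} on {dof} d.f.")
```

The t statistic would then be compared with the critical value for the chosen significance level and degrees of freedom; a value near zero gives no evidence of a shift in the mean between the two halves.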

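Estimating yield from several climate variables with a multilinear regression of the form yield = b0 + b1·x1 + ... + bk·xk can be sketched as follows. This is a minimal illustration that solves the least-squares normal equations directly; the choice of rainfall and temperature as predictors, the function names, and the noise-free data values are all hypothetical and are not taken from the chapter.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_multilinear(predictors, yields):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1.0 is the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, yields)) for i in range(k)]
    return solve(xtx, xty)

def predict(coeffs, predictor):
    """Predicted yield for one set of predictor values."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], predictor))

# Hypothetical (seasonal rainfall in mm, mean temperature in deg C) per season.
climate = [(500, 24), (620, 23), (450, 26), (700, 22), (560, 25), (610, 24.5)]
# Hypothetical yields (t/ha), built without noise as 0.5 + 0.004*rain - 0.05*temp.
yields = [1.3, 1.83, 1.0, 2.2, 1.49, 1.715]

coeffs = fit_multilinear(climate, yields)
estimate = predict(coeffs, (580, 24))
print("coefficients:", [round(c, 4) for c in coeffs])
print("estimated yield:", round(estimate, 3))
```

With real data the predictors are often correlated and differ widely in scale, so standardizing them or using a library least-squares routine is usually numerically safer than forming the normal equations by hand.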
