**3. Model based forecast products**

Ensemble strategies in theory allow for better estimation of the probability of extreme, or less likely, events. The nonlinear nature of the coupled ocean-atmosphere system means that these probabilities may not be well estimated from a single 'best guess' deterministic forecast. Using simple decision models, which are discussed in more detail below, Palmer [18] demonstrated that the economic value of ensemble forecasts is greater than that of individual models or simple ensemble means.

Because basic physics does not change under global warming, dynamical models are less compromised by climate change than statistical models. GCMs explicitly take into account climate processes that are important for seasonal climate prediction, such as equatorial oceanic waves and atmospheric convection driven by ocean temperatures, and are not constrained by what has occurred in the past. GCMs implicitly include the effects of a changing climate, whatever its character or cause, and can predict outcomes not seen previously.

Seasonal climate forecasts are inherently probabilistic due to imperfect model initialisation, instabilities in the modelled system, and model error. One approach to transforming a GCM ensemble forecast into a probabilistic forecast is to define one or more event thresholds, and then take the fraction of ensemble members above each threshold as the probability forecast. This approach effectively takes the model ensemble distribution as a best guess of the probabilities of future states of the system. These can be referred to as 'ensemble relative frequency' or 'perfect model' probabilities, as they assume that the model ensemble is a perfect sample from the possible futures consistent with the model initial conditions. This procedure does provide an adjustment for model biases (for example, if the model tends to be biased towards warmer temperatures), because the ensemble distribution for a particular realisation is measured against the model's own climatological state.

One event for which probabilities may be desired is the occurrence of above-median monthly rainfall over a region of interest. Figure 4 shows the POAMA hindcast ensemble for the year 1997 and its conversion to a probabilistic forecast of monthly rainfall being above the long-term median in the Murray Darling Basin, a region of high agricultural importance [19]. The probability forecasts were generated from retrospective seasonal forecasts made with the POAMA 1.5 model for the period 1980 to 2006. The individual ensemble members show that for each month a range of outcomes is possible, including both above- and below-median rainfall. These retrospective forecasts are produced from the first season of model output, meaning there is no time elapsed between the model initialisation and the period being forecast for.

**Figure 4.** Retrospective forecasts of mean seasonal rainfall for the Murray Darling Basin produced using the POAMA 1.5 CGCM for 1997. (Upper) Time series of the model ensemble rainfall anomaly (mm/day). (Middle) Probability forecast derived from the number of ensemble members lying above the model median. (Lower) Observed seasonal rainfall anomaly (mm/day), where E denotes the occurrence of the above-median event.
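As a concrete illustration of the conversion shown in Figure 4, the sketch below computes an ensemble relative frequency probability as the fraction of members exceeding the event threshold. This is a minimal Python sketch with invented member values, not actual POAMA output; note that using the model's own median as the threshold provides the bias adjustment described above.

```python
import numpy as np

def ensemble_probability(members, threshold):
    """Ensemble relative frequency: the fraction of ensemble members
    exceeding the event threshold is taken as the event probability."""
    members = np.asarray(members, dtype=float)
    return float(np.mean(members > threshold))

# Invented example: ten ensemble-member rainfall anomalies (mm/day).
# Anomalies are relative to the model's own climatology, so the
# above-median event threshold is 0.0 by construction.
members = [0.4, -0.1, 0.9, 0.2, -0.3, 0.6, 0.1, -0.2, 0.5, 0.3]
print(ensemble_probability(members, threshold=0.0))  # 0.7
```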

**4. Accounting for model error**

Uncertainty in probability forecasts can be divided into three distinct categories. The first category of prediction uncertainty is linked to the non-linearity of climate dynamics, which causes sensitivity to initial conditions. This is the so-called butterfly effect, which imposes hard limits on our ability to make deterministic predictions of nonlinear systems. The simple fact that we do not have infinite precision means that instabilities on scales smaller than the smallest resolved model scale inevitably grow and affect the larger scales until no predictive skill remains [20]. The 'saturation time' after which the system is effectively unpredictable is longer for the ocean than for the atmosphere.

The second major category of prediction uncertainty stems from the sparseness and imprecision of earth system observations. As discussed above, the analysed state of the atmosphere and ocean is necessarily different from its actual state, so model projections propagate an imperfect estimate of the initial state forward in time; even with a perfect physical model, predictions would be imperfect. This source of error interacts with the first, because instabilities growing from initial conditions that are not present in nature may produce future states that are inconsistent with actual potential future states. Ensemble forecasting allows this initial condition uncertainty to be estimated and quantified by sampling the space of plausible initial conditions and projecting this sample forward in time. These two kinds of uncertainty can be described as 'flow dependent' [21] because their rates of growth and magnitudes are sensitive to the stability of the point in phase space characterising the flow.
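Although a GCM is far beyond the scope of a code sketch, the growth and saturation of initial-condition uncertainty can be illustrated with a toy chaotic system. The following sketch integrates a small perturbed ensemble of the Lorenz-63 equations (a standard low-dimensional stand-in for nonlinear dynamics, not a climate model); the ensemble spread grows from a tiny initial perturbation until it saturates at the size of the attractor, after which no predictive skill remains.

```python
import numpy as np

def lorenz63(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 system."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z), x * y - beta * z])

def integrate(s, dt=0.01, steps=2000):
    """Fixed-step fourth-order Runge-Kutta integration."""
    traj = [s]
    for _ in range(steps):
        k1 = lorenz63(s)
        k2 = lorenz63(s + 0.5 * dt * k1)
        k3 = lorenz63(s + 0.5 * dt * k2)
        k4 = lorenz63(s + dt * k3)
        s = s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        traj.append(s)
    return np.array(traj)

rng = np.random.default_rng(0)
base = np.array([1.0, 1.0, 1.0])
# An 'ensemble' sampling a tiny analysis (initial condition) uncertainty.
trajs = np.array([integrate(base + 1e-6 * rng.standard_normal(3))
                  for _ in range(10)])

# Mean ensemble spread grows until it saturates at the attractor scale.
spread = trajs.std(axis=0).mean(axis=1)
for step in (0, 500, 1000, 1500, 2000):
    print(f"t = {step * 0.01:5.1f}   spread = {spread[step]:.6f}")
```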


The final category of uncertainty is model error, the fact that our mathematical idealisations of the climate system are not perfect. This includes errors due to imperfect physical parameterisations, errors due to unresolved processes at the sub-grid scale and differences between the mean state of the model and the true system. This class of error is widely studied and motivates research into better models with improved representation of physics, and model calibration techniques that can account for or correct the errors.

Single model ensemble forecasts only capture the components of prediction uncertainty associated with uncertain initial conditions and model-captured instability, and these are only fully captured in the ideal case of an infinite ensemble that uniformly samples initial condition uncertainty. An ensemble of a single model provides no information about the model error component of prediction uncertainty (Stephenson, 2005), and models that are structurally similar will invariably share biases.

### **4.1. Assessing forecast error**

Forecast validation is the process of measuring the correctness of a set of issued forecasts. It is distinct from model validation, which is concerned with determining whether a model correctly resolves physical processes [20].

Here we give an example of forecast validation based on the definition of discrete events, for example the event of rainfall over a three-month period exceeding a given threshold, and of categorical forecasts, for example low, medium and high probability of the event. For three forecast categories, the contingency table summarising the forecast-verification set has the form shown in Table 1, with forecast categories $f_1, f_2, f_3$, counts of observed events $o_1, o_2, o_3$ and counts of non-events $n_1, n_2, n_3$ for each forecast category. The 'distributions oriented' theory of forecast verification interprets the contingency table statistics in terms of the joint, marginal and conditional probability distributions of events and forecasts [22]. In this theory, the contingency table contains all the information required to generate a standard set of verification scores.


| Forecast | Observed Events | Observed Non-Events |
|---|---|---|
| $f_1$ | $o_1$ | $n_1$ |
| $f_2$ | $o_2$ | $n_2$ |
| $f_3$ | $o_3$ | $n_3$ |

**Table 1.** Contingency table for a binary event with three forecast categories.
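A minimal sketch of how such a table can be tallied from forecast-verification pairs, assuming probability forecasts binned into the three categories used later in this section; the sample data are invented.

```python
import numpy as np

def contingency_table(probs, events):
    """Count observed events (o_l) and non-events (n_l) within
    low (0-33%), medium (33-66%) and high (66-100%) forecast bins."""
    probs = np.asarray(probs, dtype=float)
    events = np.asarray(events, dtype=bool)
    bins = np.digitize(probs, [1 / 3, 2 / 3])  # 0 = low, 1 = medium, 2 = high
    o = np.array([np.sum(events[bins == b]) for b in range(3)])
    n = np.array([np.sum(~events[bins == b]) for b in range(3)])
    return o, n

# Invented forecast probabilities and observed event outcomes.
probs = [0.10, 0.25, 0.40, 0.55, 0.70, 0.90, 0.80, 0.20, 0.60, 0.35]
events = [False, True, False, True, True, True, True, False, True, False]
o, n = contingency_table(probs, events)
for name, o_l, n_l in zip(("low", "medium", "high"), o, n):
    print(f"{name:6s}  events = {o_l}  non-events = {n_l}")
```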

The joint distribution of the forecasts in one bin $F_l$ and observed events $E$ is $p(F_l, E) = o_l/N$, where the total number of forecast-verification pairs is $N = \sum_l (o_l + n_l)$.


The calibration-refinement factorisation of the joint distribution for a particular forecast bin,

$$p(F_l, E) = p(E \mid F_l)\, p(F_l),$$

is composed of two factors: the true positive ratio $p(E \mid F_l)$ and the marginal frequency $p(F_l) = (o_l + n_l)/N$, where

$$p(E \mid F_l) = \frac{o_l}{o_l + n_l}\,.$$

The true positive ratio $p(E \mid F_l)$ is the conditional probability of the event given this particular forecast, while $p(F_l)$ is the probability that the forecast system produces this category of forecast, which indicates whether the system is biased in one way or another. $p(E \mid F_l)$ can be considered an estimate of the expected probability of the event of above-median rainfall, based on the information from the forecast and its verification.
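These quantities are simple ratios of the contingency table counts, so the factorisation can be verified numerically. A brief sketch using the MDB counts tabulated in Table 2 below:

```python
import numpy as np

# Counts per forecast bin (low, medium, high): o_l events, n_l non-events,
# taken from Table 2.
o = np.array([37, 67, 59])
n = np.array([59, 70, 32])
N = (o + n).sum()                 # total forecast-verification pairs

p_E_given_F = o / (o + n)         # true positive ratio p(E|F_l)
p_F = (o + n) / N                 # marginal frequency p(F_l)
p_joint = o / N                   # joint distribution p(F_l, E)

# The calibration-refinement factorisation holds by construction.
assert np.allclose(p_joint, p_E_given_F * p_F)
print(np.round(p_E_given_F, 2))   # [0.39 0.49 0.65], the p(E|F) column of Table 3
```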

We apply this simple forecast validation scheme to the POAMA MDB rainfall forecasts discussed above. To compute meaningful statistics on these probability outlooks, three bins for the probability of rainfall exceeding the climatological median were used: a small number of forecast-verification pairs in any particular bin markedly reduces the statistical significance of the results, and larger probability bins mitigate this, at the expense of forecast resolution and sharpness. The three bins translate into categorical forecasts of a low, medium and high probability of an above-median rainfall event. The binned forecasts were verified against Australian rainfall data from the Australian Bureau of Meteorology National Climate Centre's gridded atmospheric data set [23].


Table 2 shows these counts for the MDB rainfall forecasts described above, for all months in the hindcast period.

| Forecast | Observed Events | Observed Non-Events | Total |
|---|---|---|---|
| Low | 37 | 59 | 96 |
| Medium | 67 | 70 | 137 |
| High | 59 | 32 | 91 |

**Table 2.** Contingency table for MDB seasonal monthly rainfall hindcasts from POAMA 1.5, all start months.

| Forecast (model probability) | Mean Ensemble Frequency | $p(E \mid F)$ | 90% Probability Interval of $p(E \mid F)$ |
|---|---|---|---|
| Low (0-33%) | 0.21 | 0.39 | 0.31 – 0.47 |
| Medium (33-66%) | 0.50 | 0.49 | 0.42 – 0.56 |
| High (66-100%) | 0.80 | 0.65 | 0.56 – 0.73 |

**Table 3.** Calibration table for GCM forecasts of above-median seasonal rainfall, computed using the data in Table 2, with 90% probability intervals.

If the calibration distribution in each bin is assumed to be a Bernoulli distribution, probability intervals for the parameter can be generated for the forecasts by a permutation counting method. An alternative method for larger datasets, for which permutation counting is prohibitive, is to use percentiles of a normal posterior distribution. Table 3 gives the true positive ratio with a 90% probability interval for the data in Table 2. It can immediately be seen that the probability distribution implied by the model ensemble is not consistent with the probability distribution implied by the verification of the forecasts. For example, the mean probability for 'low probability' forecasts is 21%, but the event occurs 39% of the time for this forecast category.
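The permutation counting method is not spelled out in the text, so the sketch below implements only the normal posterior approximation mentioned above, together with a Beta posterior under a uniform prior (an assumption of ours) for comparison, applied to the Table 2 counts. The resulting intervals agree with the published Table 3 values to within rounding.

```python
import numpy as np
from scipy import stats

def normal_interval(o, n, level=0.90):
    """Normal approximation: mean o/(o+n), variance p(1-p)/(o+n)."""
    p = o / (o + n)
    z = stats.norm.ppf(0.5 + level / 2.0)
    se = np.sqrt(p * (1.0 - p) / (o + n))
    return p - z * se, p + z * se

def beta_interval(o, n, level=0.90):
    """Beta(o+1, n+1) posterior for the Bernoulli parameter (uniform prior)."""
    tail = (1.0 - level) / 2.0
    post = stats.beta(o + 1, n + 1)
    return post.ppf(tail), post.ppf(1.0 - tail)

for name, o, n in (("low", 37, 59), ("medium", 67, 70), ("high", 59, 32)):
    lo_n, hi_n = normal_interval(o, n)
    lo_b, hi_b = beta_interval(o, n)
    print(f"{name:6s}  p(E|F) = {o / (o + n):.2f}"
          f"  normal: ({lo_n:.2f}, {hi_n:.2f})  beta: ({lo_b:.2f}, {hi_b:.2f})")
```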


The earth system is very high dimensional, and the procedure used here reduces the dimensionality of the problem. Such dimension reduction may result in a loss of information about the performance of the system – we are faced with a trade-off between retaining the information contained in the model-based forecasts and seeking to extract information from the model-reforecast dataset. In this case simple binning is used; more sophisticated methods such as principal component analysis could also be employed. We will return to this point later in the chapter when calibration is discussed.


With the verification set restricted to a single season, the number of forecast-verification pairs is far smaller and the probability intervals widen accordingly. Table 4 shows the result for POAMA MDB June-July-August rainfall.

| Forecast | Events | Non-Events | $p(E \mid F)$ | 90% Probability Interval |
|---|---|---|---|---|
| Low (0-33%) | 3 | 6 | 0.33 | 0.14 – 0.59 |
| Medium (33-66%) | 1 | 3 | 0.25 | 0.04 – 0.63 |
| High (66-100%) | 10 | 4 | 0.71 | 0.50 – 0.87 |

**Table 4.** True positive ratio for POAMA MDB June-July-August rainfall.

### **4.2. Assessment of probability forecasts**

In the assessment of probability forecasts the two main aspects of performance are resolution and reliability. Reliability is defined as the degree to which the observed frequency of an event coincides with its forecast probability. Reliability does not guarantee useful skill, but forecasts that are not reliable cannot be taken at face value and must be adjusted, either implicitly, as occurs when a verification plot demonstrating overconfidence is published next to a forecast, or explicitly, by downgrading probabilities that are not justified by model performance. The term 'well calibrated' is used to describe probability forecasts that are reliable. Resolution is defined as the frequency with which different observed outcomes follow different forecast categories, in other words the degree to which the forecast system can 'resolve' different outcomes.

Figure 5 shows the reliability diagram for the POAMA 1.5 Murray Darling Basin average monthly mean rainfall, for all months. Reliability diagrams are plots of the true positive ratio (also known as the calibration function, observed relative frequency, likelihood and hit rate) against the mean probability of the forecasts in each bin; they are used to assess the degree to which the model forecast probabilities agree with the observed frequencies. In Figure 5 the green bar marks the 90% probability interval for the forecasts, and the purple bar marks the 90% probability interval for perfect forecasts with the same sample size. The figure shows that even when the small sample size is taken into account, the forecasts are overconfident. Resolution is represented by the vertical spread of points on the reliability diagram; it can be seen that the model has some ability to resolve between the two outcomes.

**Figure 5.** Reliability diagram for Murray Darling Basin monthly rainfall forecasts.
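A reliability diagram of the kind shown in Figure 5 can be drawn directly from the calibration table. The sketch below plots the Table 3 values with their 90% probability intervals; the styling is ours, and the intervals for perfect forecasts shown in the published figure are omitted for brevity.

```python
import matplotlib.pyplot as plt

# Table 3: mean forecast probability per bin, observed relative frequency,
# and the 90% probability interval for p(E|F).
mean_forecast = [0.21, 0.50, 0.80]
observed_freq = [0.39, 0.49, 0.65]
interval_lo = [0.31, 0.42, 0.56]
interval_hi = [0.47, 0.56, 0.73]

yerr = [[f - lo for f, lo in zip(observed_freq, interval_lo)],
        [hi - f for f, hi in zip(observed_freq, interval_hi)]]
plt.errorbar(mean_forecast, observed_freq, yerr=yerr, fmt="o", capsize=4,
             label="forecasts (90% interval)")
plt.plot([0, 1], [0, 1], "k--", label="perfect reliability")
plt.xlabel("mean forecast probability")
plt.ylabel("observed relative frequency")
plt.title("Reliability: MDB above-median monthly rainfall")
plt.legend()
plt.show()
```

Points above the diagonal at low forecast probabilities and below it at high probabilities are the signature of overconfidence discussed above.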

**5. Understanding prediction utility: Simple decision models**

In order to begin to understand potential uses of seasonal forecasts it is instructive to study simple cost-loss decision models. Such models provide a framework for beginning to quantify the potential value of forecasts. Before proceeding, we note that real-world decisions are typically made with far more parameters, and subject to greater uncertainty regarding potential costs and payoffs, than the simple models studied here.

We first consider a simple binary event, binary decision model in which there are two possible outcomes – the occurrence or non-occurrence of an event – and the user makes a decision to protect, or not protect, against the event. Protection has a cost; failure to protect incurs a loss. The classic example is the decision to carry an umbrella to protect against the possibility of rain. A seasonal-timescale example is the decision to apply fertiliser to a crop based on the likelihood of future rainfall over a season. An early study of these issues was made by Anders Angstrom, as documented by [24].

Taking protective action incurs a cost C; a failure to protect results in a loss L. In this framework it only makes sense to take protective action, given a probability P of the event, if P > C/L; otherwise the expected loss is less than the cost of protecting. The combination of the joint distribution of forecasts and observations and the decision-maker's cost function determines the potential economic value of the forecasts (Table 5).
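A minimal sketch of this decision rule, with illustrative cost and loss values only:

```python
def expected_expense(p_event, cost, loss):
    """Expected expense of the optimal action in the cost-loss model:
    protect (pay the cost) when p_event > cost / loss, else risk the loss."""
    if p_event > cost / loss:
        return cost               # protecting is cheaper in expectation
    return p_event * loss         # accept the expected loss

C, L = 20.0, 100.0                # protection cost and potential loss
for p in (0.1, 0.2, 0.3, 0.5):
    action = "protect" if p > C / L else "do not protect"
    print(f"P = {p:.1f}: {action}, expected expense = {expected_expense(p, C, L):.0f}")
```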

### **5.1. Adjusting model output: Introducing calibration**

Decisions about the use of GCMs for seasonal climate forecasting are usually based upon measures of model performance over a hindcast (retrospective forecast) period. A natural
