methods, the interested reader is referred to [7]. The SLA has become very popular in the financial industry due to its simplicity and can be stated as follows: if *T* is the true underlying severity distribution function of the individual losses and *λ* the true annual frequency, then the 100(1 − *γ*)% VaR of the compound loss distribution may be approximated by *T*<sup>−1</sup>(1 − *γ*/*λ*):

$$CoP^{-1}(1-\gamma) \approx T^{-1}\left(1-\frac{\gamma}{\lambda}\right), \tag{1}$$

or, as modified by [8] for large *λ*, by *T*<sup>−1</sup>(1 − *γ*/*λ*) + *λμ*, where *μ* is the finite mean of the true underlying severity distribution. The first order approximation by [6] states that the 100(1 − *γ*)% VaR of the aggregate loss distribution may be approximated by the 100(1 − *γ*/*λ*)% VaR of the severity distribution, if the latter is part of the sub-exponential class of distributions. This follows from a theorem from extreme value theory (EVT) which states that, for the aggregate loss $A = \sum_{n=1}^{N} X_n$,

$$P(A > x) \approx P\left(\max\{X_1, \ldots, X_N\} > x\right) \quad \text{as } x \to \infty$$

(see e.g. [9]). The result is quite remarkable in that a quantile of the aggregate loss distribution may be approximated by a more extreme quantile (if *λ* > 1) of the underlying severity distribution. EVT is all about modelling extremal events and is especially concerned with modelling the tail of a distribution (see e.g. [10]), i.e. that part of the distribution we are most interested in. Bearing this in mind we might consider modelling the body and tail of the severity distribution separately as follows.

Let *q* be a quantile of the severity distribution *T*. We use *q* as a threshold that splices *T* in such a way that the interval below *q* is the expected part and the interval above *q* the unexpected part of the severity distribution. Define two distribution functions

$$T_e(x) = T(x)/T(q) \ \text{ for } x \le q \quad \text{and} \quad T_u(x) = \left[T(x) - T(q)\right]/\left[1 - T(q)\right] \ \text{ for } x > q, \tag{2}$$

i.e. *T<sub>e</sub>*(*x*) is the conditional distribution function of a random loss *X* ~ *T* given that *X* ≤ *q* and *T<sub>u</sub>*(*x*) is the conditional distribution function given that *X* > *q*. Note that we then have the identity

$$T(x) = T(q)\,T_e(x) + \left[1 - T(q)\right]T_u(x), \tag{3}$$

with the conventions *T<sub>e</sub>*(*x*) = 1 for *x* > *q* and *T<sub>u</sub>*(*x*) = 0 for *x* ≤ *q*.

**Figure 1.**
*Variation obtained in the VaR estimates for different values of EVI and frequency.*

*Linear and Non-Linear Financial Econometrics - Theory and Practice*

**3. Historical data and scenario modelling**

It is common practice in operational risk management to use different data sources for modelling future losses. Banks have been collecting their own data, but realistically, most banks only have between five and ten years of reliable loss data. To address this shortcoming, loss data from external sources and scenario data can be used by banks in addition to their own internal loss data and controls [12]. Several external loss databases exist, including publicly available data, insurance data and consortium data. The process of incorporating data from external sources requires due consideration because of biases in the external data. One method of combining operational losses collected from various banks with different sizes and loss reporting thresholds is discussed in [13]. In the remainder of our discussion we will only refer to historical data, which may be a combination of internal and external loss data.

Three types of scenario assessments are also suggested to improve the estimation of the severity distribution, namely the individual scenario approach, the interval approach and the percentile approach. In the remainder of the chapter we discuss the percentile approach, as we believe it is the most practical of the existing approaches available in the literature [4]. That being said, it should be noted that probability assessments by experts are notoriously difficult and unreliable, as discussed in [14]. We mentioned previously that it is often an extreme quantile of the aggregate loss distribution that is of interest. In the case of operational risk, the regulator requires that the one-in-a-thousand-year quantile of this distribution be estimated, in other words the aggregate loss level that will be exceeded once in a thousand years. Considering that banks only have limited historical data available, i.e. a maximum of ten years of internal data, estimating such a quantile from historical data alone is a near impossible task. Modellers have therefore suggested the use of scenarios and experts' assessments thereof.

We advocate the use of the so-called 1-in-*c* year scenario approach as discussed in [4]. In this approach, the experts are asked to answer the question: 'What loss level *q<sub>c</sub>* is expected to be exceeded once every *c* years?'. Popular choices for *c* vary between 5 and 100, and often three values of *c* are used. As an example, the bank alluded to at the start of this chapter used *c* = 7, 20 and 100, and motivated the first choice as the number of years of reliable historical data available to them. In this case the largest loss in the historical data may serve as a guide for
choosing *q*<sub>7</sub>, since this loss level has been reached once in 7 years. If the experts judge that the future will be better than the past, they may want to provide a lower assessment for *q*<sub>7</sub> than the largest loss experienced so far. If they foresee deterioration, they may judge that a higher assessment is more appropriate. The other choices of *c* are selected in order to obtain a scenario spread within the range for which one can expect reasonable improvement in accuracy from the experts' inputs. Of course, the choice of *c* = 100 may be questionable, because judgements on a 1-in-100 years loss level are likely to fall outside many of the experts' experience. In the banking environment, experts may also take additional guidance from external data of similar banks, which in effect increases the number of years for which historical data are available. It is argued that this is an essential input into scenario analysis [12]. Of course, requiring that the other banks are similar to the bank in question may be a difficult issue, and the scaling of external data in an effort to make it comparable to the bank's own internal data raises further problems (see e.g. [15]). We will not dwell on this issue here and henceforth assume that we do have the 1-in-*c* years scenario assessments for a range of *c*-values, but have to keep in mind that subjective elements may have affected the reliability of the assessments.
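Before turning to the experts' assessments, the splicing of the severity distribution *T* at a threshold *q* into a conditional body component *T<sub>e</sub>* and tail component *T<sub>u</sub>* can be verified numerically. The sketch below is a minimal check, assuming a lognormal severity with arbitrary illustrative parameters (the chapter prescribes neither the family nor the values):

```python
import math
from statistics import NormalDist

# Illustrative lognormal severity: log X ~ N(mu, sigma); parameters are assumptions.
log_loss = NormalDist(mu=10.0, sigma=2.0)

def T(x: float) -> float:
    """Severity distribution function T(x) = P(X <= x)."""
    return log_loss.cdf(math.log(x))

def T_inv(p: float) -> float:
    """Severity quantile function T^{-1}(p)."""
    return math.exp(log_loss.inv_cdf(p))

q = T_inv(0.9)  # threshold splicing T into expected (body) and unexpected (tail) parts

def T_e(x: float) -> float:
    """Conditional body: P(X <= x | X <= q); equals 1 above q."""
    return min(T(x) / T(q), 1.0)

def T_u(x: float) -> float:
    """Conditional tail: P(X <= x | X > q); equals 0 at or below q."""
    return 0.0 if x <= q else (T(x) - T(q)) / (1.0 - T(q))

# Mixed-model identity: T(x) = T(q) T_e(x) + (1 - T(q)) T_u(x) for every x.
for x in (q / 2, q, 2 * q, 10 * q):
    mixed = T(q) * T_e(x) + (1.0 - T(q)) * T_u(x)
    assert abs(mixed - T(x)) < 1e-12
print("mixed-model identity holds at all test points")
```

Any continuous severity family could be substituted for the lognormal; only the conditional definitions of *T<sub>e</sub>* and *T<sub>u</sub>* matter for the identity.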

If the annual loss frequency is *Poi*(*λ*) distributed and the true underlying severity distribution is *T*, and if the experts are of oracle quality in the sense of actually knowing *λ* and *T*, then the assessments provided should be

$$q_c = T^{-1}\left(1 - \frac{1}{c\lambda}\right). \tag{4}$$
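To make the oracle relation concrete: for a given annual frequency *λ*, (4) pins down the severity-quantile level that each 1-in-*c* assessment corresponds to. A small sketch, taking *λ* = 50 (the value used in the chapter's illustration):

```python
# Severity-quantile levels implied by q_c = T^{-1}(1 - 1/(c*lambda)), eq. (4).
lam = 50  # annual loss frequency; the chapter's illustrative value
levels = {c: 1 - 1 / (c * lam) for c in (7, 20, 100)}
for c, p in levels.items():
    print(f"1-in-{c} years assessment targets the {p:.5f} severity quantile")
```

So the experts' three scenario answers are, implicitly, estimates of the 99.714%, 99.9% and 99.98% severity quantiles.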

To see this, let *N<sub>c</sub>* denote the number of loss events experienced in *c* years and let *M<sub>c</sub>* denote the number of these that are actually greater than *q<sub>c</sub>*. Then *N<sub>c</sub>* ~ *Poi*(*cλ*) and the conditional distribution of *M<sub>c</sub>* given *N<sub>c</sub>* is binomial with parameters *N<sub>c</sub>* and 1 − *p<sub>c</sub>* = *P*(*X* ≥ *q<sub>c</sub>*) = 1 − *T*(*q<sub>c</sub>*), with *X* ~ *T* and *p<sub>c</sub>* = *T*(*q<sub>c</sub>*) = 1 − 1/(*cλ*). Therefore *EM<sub>c</sub>* = *E*[*E*(*M<sub>c</sub>* | *N<sub>c</sub>*)] = *E*[*N<sub>c</sub>*(1 − *p<sub>c</sub>*)] = *cλ*(1 − *T*(*q<sub>c</sub>*)). Requiring that *EM<sub>c</sub>* = 1 yields (4).

As illustration of the complexity of the experts' task, take *λ* = 50; then *q*<sub>7</sub> = *T*<sup>−1</sup>(0.99714), *q*<sub>20</sub> = *T*<sup>−1</sup>(0.999) and *q*<sub>100</sub> = *T*<sup>−1</sup>(0.9998), which implies that the quantiles that have to be assessed are very extreme.

Returning to the SLA, i.e. *CoP*<sup>−1</sup>(1 − *γ*) ≈ *T*<sup>−1</sup>(1 − *γ*/*λ*), and taking *γ* = 0.001, which implies *c* = 1000, we could ask the oracle the question 'What loss level *q*<sub>1000</sub> is expected to be exceeded once every 1000 years?'. The oracle will then produce an answer that can be used directly as an approximation for the 99.9% VaR of the aggregate loss distribution. Of course, the experts we are dealing with are not of oracle quality.

In the light of the above arguments one has to take into consideration that: (a) the SLA gives only an approximation to the VaR we are trying to estimate, and (b) experts are very unlikely to have the experience or the information at their disposal to assess a 1-in-1000 year event reliably. One can realistically only expect them to assess events occurring more frequently, such as once in 30 years.

Returning to the oracle's answer in (4), the expert has to consider both the true severity distribution and the annual frequency when an assessment is provided. In order to simplify the task of the expert, consider the mixed model in (3) discussed in the previous section. This model will assist us in formulating an easier question for the expert to answer. Note that the oracle's answer to the question in the previous setting can be stated as *T*(*q<sub>c</sub>*) = 1 − 1/(*cλ*) (from (4)) and therefore depends on the annual frequency. However, using the definition of *T<sub>u</sub>* and taking *q* = *q<sub>b</sub>*, *b* < *c*, it follows that *T<sub>u</sub>*(*q<sub>c</sub>*) = 1 − *b*/*c*, which does not depend on the annual frequency. This fact, namely that *q<sub>c</sub>* = *T*<sup>−1</sup>(1 − 1/(*cλ*)) = *T<sub>u</sub>*<sup>−1</sup>(1 − *b*/*c*), suggests an interesting reformulation of the basic question of the 1-in-*c* years approach. For example, if we take *b* = 1 then *q*<sub>1</sub> would be the experts' answer to the question 'What loss level is expected to be exceeded once annually?'. Unless we are dealing with only rare loss events, a reasonably accurate assessment of *q*<sub>1</sub> should be possible. Then *T<sub>u</sub>*(*q<sub>c</sub>*) = 1 − 1/*c*, or 1 − *T<sub>u</sub>*(*q<sub>c</sub>*) = 1/*c*. Keeping in mind the conditional probability meaning of *T<sub>u</sub>*, this tells us that *q<sub>c</sub>* would be the answer to the question: 'Amongst those losses that are larger than *q*<sub>1</sub>, what level is expected to be exceeded only once in *c* years?'. Conditioning on the losses larger than *q*<sub>1</sub> has the effect that the annual frequency of all losses drops out of consideration when an answer is sought. In the remainder of the chapter we will assume that this question is posed to the experts to make their assessments.

*Construction of Forward-Looking Distributions Using Limited Historical Data and Scenario…*
*DOI: http://dx.doi.org/10.5772/intechopen.93722*

**4. Estimating VaR**

Suppose we have available *a* years of historical loss data *x*<sub>1</sub>, *x*<sub>2</sub>, … , *x<sub>K</sub>* and scenario assessments $\tilde{q}_7$, $\tilde{q}_{20}$ and $\tilde{q}_{100}$ provided by the experts. In the previous sections two modelling options have been suggested for modelling the true severity distribution *T*, and a third will follow below. The estimation of the 99.9% VaR of the aggregate loss distribution is of interest and we will consider three approaches to estimate it, namely the naïve approach, the GPD approach and Venter's approach. The naïve approach makes use of historical data only; the GPD approach (which is based on the mixed model formulation) and Venter's approach make use of both historical data and scenario assessments. Below we demonstrate that, as far as estimating VaR is concerned, Venter's approach is preferred to the GPD and naïve approaches.

**4.1 Naïve approach**

Assume that we have available only historical data and that we collected the loss severities of a total of *K* loss events spread over *a* years; denote these observed or historical losses by *x*<sub>1</sub>, … , *x<sub>K</sub>*. Then the annual frequency is estimated by $\hat{\lambda} = K/a$. Let *F*(*x*; *θ*) denote a suitable family of distributions to model the true loss severity distribution *T*. The fitted distribution is denoted by $F(x; \hat{\theta})$, with $\hat{\theta}$ denoting the (maximum likelihood) estimate of the parameter(s) *θ*. In order to estimate VaR, a small adjustment of the Monte Carlo approximation approach, discussed earlier, is necessary.

*4.1.1 Naïve VaR estimation algorithm*

i. Generate *N* from the Poisson distribution with parameter $\hat{\lambda}$;

ii. Generate $X_1, \ldots, X_N \sim \text{iid } F(x; \hat{\theta})$ and calculate $A = \sum_{n=1}^{N} X_n$;

iii. Repeat i and ii *I* times independently to obtain *A<sub>i</sub>*, *i* = 1, 2, … , *I*. Then the 99.9% VaR is estimated by *A*<sub>([0.999 · *I*] + 1)</sub>, where *A*<sub>(*i*)</sub> denotes the *i*-th order statistic and [*k*] the largest integer contained in *k*.

*4.1.2 Remarks*

The estimation of VaR using the above-mentioned naïve approach has been discussed in several books and papers (see e.g. [11]). [16] stated that heavy-tailed
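Steps i–iii of the naïve algorithm translate directly into a Monte Carlo routine. The sketch below assumes a lognormal severity fit with made-up parameters and $\hat{\lambda} = 50$; none of these values come from the chapter:

```python
import math
import random

random.seed(12345)

# Assumed inputs (illustrative): lambda-hat = K/a and a fitted lognormal severity.
lam_hat = 50.0
mu, sigma = 10.0, 2.0  # fitted log-severity parameters (assumed)

def draw_poisson(lam: float) -> int:
    """Step i: inverse-transform sampling of N ~ Poisson(lam)."""
    u, n = random.random(), 0
    p = math.exp(-lam)
    cum = p
    while u > cum:
        n += 1
        p *= lam / n
        cum += p
        if n > 1000:  # safety cap; practically unreachable for lam = 50
            break
    return n

def annual_loss() -> float:
    """Step ii: aggregate N iid lognormal severities."""
    n = draw_poisson(lam_hat)
    return sum(math.exp(mu + sigma * random.gauss(0.0, 1.0)) for _ in range(n))

# Step iii: repeat I times; the 99.9% VaR estimate is the ([0.999*I]+1)-th order statistic.
I = 10_000
A = sorted(annual_loss() for _ in range(I))
var_999 = A[int(0.999 * I)]  # 0-based index 9990 == the 9991-th smallest value
print(f"naive 99.9% VaR estimate: {var_999:,.0f}")
```

Because an extreme order statistic is being estimated, *I* typically needs to be large before the estimate stabilises.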
