### **3.4 Analysis of Taxi claims data**

Again, the Taxi claims data and the R code used to analyze them can be obtained from the authors on request. **Table 5** provides descriptive statistics for the Taxi claims data.

Both the sample skewness and the kurtosis are positive. Based on **Tables 3** and **4**, we therefore conclude that the data are skewed to the right and heavy-tailed.
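To make this check concrete, the sketch below computes moment-based sample skewness and excess kurtosis in plain Python. It is an illustration, not the authors' R code. The function names and claim amounts are hypothetical, and we assume the kurtosis in **Table 5** is reported as excess kurtosis (zero for a normal distribution), so that a positive value indicates tails heavier than those of the normal.

```python
def sample_skewness(x):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def sample_excess_kurtosis(x):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0


# Hypothetical claim amounts, NOT the Taxi claims data.
claims = [1200.0, 2500.0, 3100.0, 4000.0, 5200.0, 7800.0, 12500.0, 48000.0]
print("skewness:", round(sample_skewness(claims), 3))
print("excess kurtosis:", round(sample_excess_kurtosis(claims), 3))
```

A single large claim is enough to make both statistics clearly positive, which is the right-skewed, heavy-tailed pattern described above.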


#### **Table 5.**

*Descriptive statistics of the Taxi claims data.*

**Figure 7.** *Boxplot of Taxi claims data.*

**Figure 8.** *Histogram and CDF of Taxi claims data.*

**Figure 7a** shows that the Taxi claims data contain many extreme observations in the right tail; **Figure 7b**, which excludes these extreme values, shows the lower quantiles of the boxplot more clearly. **Figure 8** shows the corresponding PDF, estimated by a histogram (**Figure 8a**), and the CDF (**Figure 8b**) of the Taxi claims data.
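Boxplots usually flag an observation as extreme when it lies more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies this rule to hypothetical claim amounts; the function names are our own. In R, a plot like **Figure 7b** can be drawn with `boxplot(x, outline = FALSE)`, although we do not know which options the authors actually used.

```python
import statistics


def tukey_fences(x, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    Quartiles use method="inclusive", which matches R's default quantile()
    (type 7). R's boxplot() uses Tukey's hinges instead, so on small samples
    its fences can differ slightly from these.
    """
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_extremes(x, k=1.5):
    """Split x into (body, extremes); extremes lie outside the fences."""
    lower, upper = tukey_fences(x, k)
    body = [v for v in x if lower <= v <= upper]
    extremes = [v for v in x if v < lower or v > upper]
    return body, extremes


# Hypothetical claim amounts, NOT the Taxi claims data.
claims = [1200.0, 2500.0, 3100.0, 4000.0, 5200.0, 7800.0, 12500.0, 48000.0]
body, extremes = split_extremes(claims)
print("fences:", tukey_fences(claims))
print("extreme claims:", extremes)
```

Because claim amounts cannot be negative, the lower fence is usually irrelevant for loss data; all flagged values sit in the right tail.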

Next, we fit gamma, Weibull, log-normal, Burr, and Pareto distributions to the Taxi claims data. The parameter estimates and goodness-of-fit results are given in **Table 6**, and the corresponding Q-Q plots in **Figure 9**.
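Of these five distributions, only the log-normal has closed-form maximum likelihood estimators: the estimates of mu and sigma are the mean and the (divisor-n) standard deviation of the log-claims. The gamma, Weibull, Burr, and Pareto fits typically require numerical optimization (in R, for example, with `fitdistrplus::fitdist`), so the sketch below covers only the log-normal case. It assumes the meanlog/sdlog parameterization of R's `dlnorm`; how these map to the "shape" and "scale" reported in **Table 6** is an assumption, and the function name and claim amounts are hypothetical.

```python
import math


def fit_lognormal_mle(x):
    """Closed-form MLE (mu, sigma) for a log-normal, i.e. R's (meanlog, sdlog).

    mu is the mean of log(x); sigma uses divisor n (the MLE), not n - 1.
    """
    if any(v <= 0 for v in x):
        raise ValueError("the log-normal requires strictly positive claims")
    logs = [math.log(v) for v in x]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / n)
    return mu, sigma


# Hypothetical claim amounts, NOT the Taxi claims data.
claims = [1200.0, 2500.0, 3100.0, 4000.0, 5200.0, 7800.0, 12500.0, 48000.0]
mu_hat, sigma_hat = fit_lognormal_mle(claims)
print(f"meanlog = {mu_hat:.4f}, sdlog = {sigma_hat:.4f}")
```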

**Figure 9.** *Q-Q plots of the Taxi claims data for the fitted loss distributions.*


#### **Table 6.**

*MLE parameters and goodness-of-fit values.*

**Figure 9** shows that the log-normal distribution fits the Taxi claims data better than the other candidate distributions, which is consistent with the goodness-of-fit values in **Table 6**.
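A Q-Q plot pairs the ordered claims with the quantiles the fitted model predicts at the same probability levels; points on the 45-degree line indicate agreement. The sketch below computes these pairs for a log-normal fitted by maximum likelihood, using plotting positions (i - 0.5)/n. The function name and claim amounts are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def lognormal_qq_points(x, mu, sigma):
    """Return (theoretical, sample) quantile pairs for a log-normal Q-Q plot.

    Plotting positions are p_i = (i - 0.5) / n. R's ppoints() uses
    (i - 3/8) / (n + 1/4) when n <= 10, so small-sample plots may differ.
    """
    sample = sorted(x)
    n = len(sample)
    z = NormalDist()  # standard normal
    theoretical = [
        math.exp(mu + sigma * z.inv_cdf((i - 0.5) / n)) for i in range(1, n + 1)
    ]
    return theoretical, sample


# Hypothetical claim amounts, NOT the Taxi claims data.
claims = [1200.0, 2500.0, 3100.0, 4000.0, 5200.0, 7800.0, 12500.0, 48000.0]
logs = [math.log(v) for v in claims]
mu, sigma = fmean(logs), pstdev(logs)  # log-normal MLE (divisor n)
theo, samp = lognormal_qq_points(claims, mu, sigma)
for t, s in zip(theo, samp):
    print(f"{t:12.1f} {s:12.1f}")
```

If the upper-tail points lie above the line (sample quantile larger than the model quantile), the data have a heavier tail than the fitted model.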

The log-normal distribution has the lowest Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) statistics of all the tested distributions. Because the KS statistic is most sensitive near the center of the distribution and the AD statistic gives more weight to the tails, this suggests that, among the candidates, both the body and the tail of the data are best described by the log-normal distribution. Overall, the heavy-tailed log-normal and Pareto distributions fit the data well, whereas the thin-tailed gamma and Weibull distributions fit poorly. We therefore conclude that, of the distributions considered, the data are best modeled by the log-normal distribution.
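The KS statistic is the largest vertical gap between the empirical and fitted CDFs, while the AD statistic is a weighted squared distance that emphasizes the tails. The sketch below computes both for a log-normal fitted by maximum likelihood; the function names and claim amounts are hypothetical. Because the parameters are estimated from the same data, standard KS and AD p-value tables are not exact, so the statistics are best used, as in **Table 6**, to rank candidate models.

```python
import math
from statistics import NormalDist, fmean, pstdev


def ks_statistic(x, cdf):
    """Kolmogorov-Smirnov D: largest gap between the empirical CDF and cdf."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs, start=1):
        f = cdf(v)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ad_statistic(x, cdf):
    """Anderson-Darling A^2; requires 0 < cdf(v) < 1 at every observation."""
    xs = sorted(x)
    n = len(xs)
    u = [cdf(v) for v in xs]
    s = sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    )
    return -n - s / n


# Hypothetical claim amounts, NOT the Taxi claims data.
claims = [1200.0, 2500.0, 3100.0, 4000.0, 5200.0, 7800.0, 12500.0, 48000.0]
logs = [math.log(v) for v in claims]
mu, sigma = fmean(logs), pstdev(logs)  # log-normal MLE (divisor n)


def lognormal_cdf(v):
    """CDF of the fitted log-normal: Phi((log v - mu) / sigma)."""
    return NormalDist(mu, sigma).cdf(math.log(v))


print("KS:", round(ks_statistic(claims, lognormal_cdf), 4))
print("AD:", round(ad_statistic(claims, lognormal_cdf), 4))
```

Swapping in the CDF of another fitted distribution gives its KS and AD values, so the smallest values identify the preferred model.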

**Figure 10** shows the Q-Q plot for the log-normal distribution. Although the log-normal was the best distribution for modeling the Taxi claims data according to the KS and AD statistics, it does not model the extreme losses well (see the tails of **Figure 10**).

**Figure 10.** *Log-normal QQ-plots.*

### **3.5 VaR sensitivity**

Value-at-risk (VaR) is the maximum amount that can be lost over a specific holding period at a given confidence level. Four methods are used to determine VaR:

The empirical approach sorts the data from lowest to highest and reads off the required quantiles. Here, the focus is on the 90th, 95th, 97.5th, and 99th percentiles.
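The empirical approach above can be sketched as follows. The text does not state how quantiles between order statistics are computed, so we assume linear interpolation as in R's default `quantile()` (type 7). The function name and claim amounts are hypothetical.

```python
import math


def empirical_var(x, p):
    """Empirical VaR at level p: the sample p-quantile with linear
    interpolation between order statistics (R's default quantile type 7)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(x)
    n = len(xs)
    h = (n - 1) * p  # 0-based fractional position
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


# Hypothetical claim amounts, NOT the Taxi claims data.
claims = [1200.0, 2500.0, 3100.0, 4000.0, 5200.0, 7800.0, 12500.0, 48000.0]
for p in (0.90, 0.95, 0.975, 0.99):
    print(f"VaR{p}: {empirical_var(claims, p):,.2f}")
```

With few observations, the high percentiles are driven almost entirely by the largest one or two claims, so other interpolation rules can give noticeably different values.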

The parametric approach uses the quantile function of the fitted model. Since the log-normal was the best-fitting distribution for the Taxi claims data, we used the "VaRes" package in R, supplied the fitted shape and scale parameters, and computed the 90th, 95th, 97.5th, and 99th percentiles.
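For the log-normal, the parametric VaR has a closed form: the p-quantile is exp(mu + sigma z_p), where z_p is the standard normal p-quantile. In base R, the same quantity is `qlnorm(p, meanlog, sdlog)`; we do not reproduce the VaRes call used by the authors. The sketch below assumes the meanlog/sdlog parameterization, and its parameter values are hypothetical, not the estimates in **Table 6**.

```python
import math
from statistics import NormalDist


def lognormal_var(p, mu, sigma):
    """Parametric VaR: the p-quantile of a log-normal with meanlog mu and
    sdlog sigma, exp(mu + sigma * z_p), where z_p is the standard normal
    p-quantile (the same quantity as R's qlnorm(p, mu, sigma))."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


# Hypothetical parameter values, NOT the estimates in Table 6.
mu, sigma = 8.5, 1.2
for p in (0.90, 0.95, 0.975, 0.99):
    print(f"VaR{p}: {lognormal_var(p, mu, sigma):,.2f}")
```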

The stochastic approach simulates from a log-normal distribution with the shape and scale parameters estimated from the Taxi claims data and computes the 90th, 95th, 97.5th, and 99th percentiles of the simulated values.
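The stochastic approach can be sketched by drawing many log-normal values and taking their empirical quantiles; as the number of simulations grows, the result converges to the closed-form log-normal quantile. The number of simulations, the seed, the function name, and the parameter values are all hypothetical choices, not those used for **Table 7**.

```python
import math
import random


def simulated_lognormal_var(levels, mu, sigma, n_sims=100_000, seed=2024):
    """Monte Carlo VaR: draw n_sims log-normal values and take their
    empirical quantiles (linear interpolation, as in R's quantile type 7)."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    sims = sorted(rng.lognormvariate(mu, sigma) for _ in range(n_sims))
    result = {}
    for p in levels:
        h = (n_sims - 1) * p
        lo = math.floor(h)
        hi = min(lo + 1, n_sims - 1)
        result[p] = sims[lo] + (h - lo) * (sims[hi] - sims[lo])
    return result


# Hypothetical parameter values, NOT the estimates in Table 6.
var_sim = simulated_lognormal_var((0.90, 0.95, 0.975, 0.99), mu=8.5, sigma=1.2)
for p, v in var_sim.items():
    print(f"VaR{p}: {v:,.2f}")
```

The Monte Carlo error is largest at the highest level (VaR0.99), where few simulated values lie beyond the quantile, so more simulations are needed there for a stable estimate.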

The generalized extreme value (GEV) distribution approach simulates from a GEV distribution and computes the 90th, 95th, 97.5th, and 99th percentiles of the simulated values.
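The GEV approach can be sketched with inverse-transform sampling: a uniform draw u is mapped through the GEV quantile function Q(u) = loc + scale((-log u)^(-shape) - 1)/shape, with the Gumbel limit loc - scale log(-log u) when shape = 0. The text does not say how the GEV parameters were estimated, so the location, scale, and shape below are hypothetical, as are the function names. We use the convention in which shape > 0 gives a heavy (Frechet-type) tail.

```python
import math
import random


def gev_quantile(p, loc, scale, shape):
    """p-quantile of the GEV distribution; shape > 0 gives a heavy
    (Frechet-type) tail and shape == 0 is the Gumbel limit."""
    y = -math.log(p)
    if abs(shape) < 1e-12:
        return loc - scale * math.log(y)
    return loc + scale * (y ** (-shape) - 1.0) / shape


def simulated_gev_var(levels, loc, scale, shape, n_sims=100_000, seed=2024):
    """Monte Carlo VaR from the GEV via inverse-transform sampling."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        u = rng.random()
        while u == 0.0:  # random() lies in [0, 1); exclude 0 so log(u) is finite
            u = rng.random()
        sims.append(gev_quantile(u, loc, scale, shape))
    sims.sort()
    result = {}
    for p in levels:  # empirical quantiles, linear interpolation (type 7)
        h = (n_sims - 1) * p
        lo = math.floor(h)
        hi = min(lo + 1, n_sims - 1)
        result[p] = sims[lo] + (h - lo) * (sims[hi] - sims[lo])
    return result


# Hypothetical GEV parameters, NOT values estimated from the Taxi claims data.
loc, scale, shape = 5000.0, 3000.0, 0.3
var_gev = simulated_gev_var((0.90, 0.95, 0.975, 0.99), loc, scale, shape)
for p, v in var_gev.items():
    print(f"VaR{p}: {v:,.2f} (exact {gev_quantile(p, loc, scale, shape):,.2f})")
```

Comparing each simulated value with the exact GEV quantile shows how much of the spread in a VaR table can come from simulation noise rather than from the model itself.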

**Table 7** and **Figure 11** summarize the VaR estimates from the different approaches. For example, the entry in column "VaR0.9" of **Table 7** for the empirical approach is interpreted as follows: "we can be 90% confident that the maximum amount claimed will be R30 616.67." The remaining entries are interpreted in the same way at their respective confidence levels. At the 90% level, the empirical VaR is close to the GEV estimate, whereas at the higher levels (VaR0.975 and VaR0.99) the GEV approach overestimates the risk. For the Taxi claims data, one would therefore be inclined to use the empirical approach, because it does not assume any underlying distribution, and the log-normal distribution appears to capture the extreme tail of the data poorly.


**Table 7.** *VaR for the Taxi claims data.*

**Figure 11.** *Graphical representation of VaR.*

In this section, we showed how to quantify operational risk using parametric loss distributions (i.e. exponential, log-normal, gamma, Weibull, Pareto, and Burr distributions). As an application, we fitted these distributions to the Taxi claims data and found that, of all the distributions fitted, the log-normal provided the best fit. We also observed a drawback of parametric distributions: a single fitted distribution tends to capture neither the peak nor the tail of the data very well, which suggests that the true underlying distribution may differ from the fitted one. Hence, when a parametric approach is used to quantify operational risk, it is better to fit separate distributions to the body and the tail of the data to obtain better estimates. This is evident in our data: beyond a certain threshold, the log-normal distribution underestimates the tail probability, whereas the GEV distribution fits better.
