**5. Threshold selection**

Having established that the underlying distribution is heavy-tailed, we now select a threshold above which the distribution of the impact is approximated by the GPD. We will use the techniques provided in Section 3.3 to estimate the appropriate threshold. First, we plot the mean excesses at 200 different thresholds spanning the whole dataset against their corresponding thresholds, with a significance level of 0.05. The goal is to find the lowest threshold above which the graph is linear in the threshold, within uncertainty.

The mean residual life plot in **Figure 7** looks like a straight line right from the beginning, except at the very right end. The plot suggests that the threshold should lie between zero and around 250,000.
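The quantity behind a mean residual life plot can be sketched in a few lines. The snippet below is a minimal illustration, not the code used for **Figure 7**: the function names, the equally spaced threshold grid and the sample values are all assumptions made for the example, and no confidence bounds are computed.

```python
def mean_excess(data, threshold):
    """Return (mean excess over the threshold, number of exceedances)."""
    excesses = [x - threshold for x in data if x > threshold]
    if not excesses:
        return None, 0
    return sum(excesses) / len(excesses), len(excesses)


def mean_excess_curve(data, n_thresholds=200):
    """Evaluate the mean excess on an equally spaced grid of thresholds.

    The grid runs from the minimum up to (but excluding) the maximum,
    so every threshold has at least one exceedance when the data vary.
    The equal spacing is an assumption of this sketch.
    """
    lo, hi = min(data), max(data)
    step = (hi - lo) / n_thresholds
    curve = []
    for i in range(n_thresholds):
        u = lo + i * step
        me, count = mean_excess(data, u)
        curve.append((u, me, count))
    return curve


# Hypothetical values for illustration only, not the chapter's dataset.
sample = [1_000, 5_000, 20_000, 60_000, 150_000, 400_000]
for u, me, count in mean_excess_curve(sample, n_thresholds=5):
    print(f"threshold={u:>10.0f}  mean excess={me:>10.0f}  exceedances={count}")
```

Plotting the first two columns against each other gives the mean residual life plot; the third column shows how quickly the number of exceedances shrinks as the threshold rises.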

The threshold selection can be carried out more carefully through a parameter stability plot. Recall that if the threshold is too low, the assumption underlying the GPD will be violated. On the other hand, if the threshold is too high, we will have too few observations to fit the distribution effectively. Given that natural disasters can be regarded as extreme and rare events, we will limit the range of the threshold to within the upper and lower quantiles of the impact dataset: 0 ≤ *v* ≤ 250,000, which agrees with the values indicated by the mean excess plot. The resulting number of exceedances then ranges between 13 and 42. **Figure 8** shows

*On Modelling Extreme Damages from Natural Disasters in Kenya DOI: http://dx.doi.org/10.5772/intechopen.94578*

**Figure 7.** *Mean residual Life Plot.*

greater than the median (**Table 1**), indicating that the underlying distribution has a long tail. We will further investigate this by plotting the empirical mean excess function against different threshold values. We test the assumption using an empirical mean excess plot.

An upward trend in the mean excess plot indicates heavy-tailed behaviour, as explained in Section 4.2. From **Figure 5**, we can observe a general upward trend in the graph, except for the area between one million and two million.

To get more conclusive results, we also draw an exponential Q-Q plot and observe the pattern of the points in relation to the straight line. Heavy-tailed behaviour will be indicated by a convex departure from the straight line, as explained in Section 4.2. A shorter-tailed distribution will show a concave departure, and if the data are a sample from an exponential distribution, the points should be approximately linear. We can observe the convex behaviour of the exponential Q-Q plot in **Figure 6**.

Thus, we can conclude that the underlying distribution is heavy-tailed.

*Natural Hazards - Impacts, Adjustments and Resilience*

**Figure 6.** *Exponential Q-Q plot.*


**Figure 8.** *Parameter Stability Plot.*

**Figure 9.** *Gerstengarbe Plot.*


**Table 2.** *Mann-Kendall Test Results.*

the plot of the MLE estimates of the GPD parameters against their 80 corresponding thresholds, within the range 0 ≤ *v* ≤ 250,000, together with 95% confidence intervals.

The estimates of the shape parameter appear to be constant over the whole dataset, that is, from a threshold of *v* = 0 onwards. The re-parametrized scale parameter estimates, on the other hand, seem to be stable beyond 50,000 but no longer so beyond 225,000.
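The reason stability is assessed on a re-parametrized scale can be made explicit. The relation below is the standard threshold-stability property of the GPD and is assumed here to be the re-parametrization referred to above; the symbols $\sigma_v$ (scale at threshold $v$), $v_0$ (lowest valid threshold) and $\sigma^{*}$ are notation introduced for this sketch, with $\zeta$ denoting the shape parameter.

$$
\sigma_v = \sigma_{v_0} + \zeta\,(v - v_0), \qquad v \ge v_0,
$$

$$
\sigma^{*} = \sigma_v - \zeta\, v = \sigma_{v_0} - \zeta\, v_0 .
$$

If the GPD holds above $v_0$, the shape $\zeta$ is unchanged for every higher threshold while the scale grows linearly in $v$. Subtracting $\zeta v$ removes that drift, so both $\zeta$ and $\sigma^{*}$ should plot as roughly horizontal lines above a suitable threshold.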

The information provided by these plots can, however, be rather approximate. The Gerstengarbe plot provides a more powerful procedure for threshold estimation [24].
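As a rough sketch of what sits behind such a plot, the snippet below computes the normalised sequential Mann-Kendall series on the differences between consecutive order statistics, which is the usual description of the Gerstengarbe construction. It is an illustration under that assumption rather than the procedure of [24] verbatim: the function names and sample values are hypothetical, and the way the forward and backward series are aligned to locate the crossing point differs between implementations, so that step is left out.

```python
import math


def order_differences(data):
    """Differences between consecutive order statistics of the sample."""
    xs = sorted(data)
    return [b - a for a, b in zip(xs, xs[1:])]


def sequential_mann_kendall(series):
    """Normalised sequential Mann-Kendall statistics U_1, ..., U_n.

    For each i, the running count of pairs (j < k <= i) with
    series[j] < series[k] is centred by i(i-1)/4 and scaled by
    sqrt(i(i-1)(2i+5)/72). U_1 is set to 0 because its variance is 0.
    """
    u = []
    cumulative = 0
    for i in range(1, len(series) + 1):
        current = series[i - 1]
        cumulative += sum(1 for earlier in series[: i - 1] if earlier < current)
        if i == 1:
            u.append(0.0)
            continue
        mean = i * (i - 1) / 4
        variance = i * (i - 1) * (2 * i + 5) / 72
        u.append((cumulative - mean) / math.sqrt(variance))
    return u


# Hypothetical values for illustration only, not the chapter's dataset.
diffs = order_differences([3, 1, 4, 10, 30])
forward = sequential_mann_kendall(diffs)         # series from start to end
backward = sequential_mann_kendall(diffs[::-1])  # same procedure, end to start
```

The plot overlays the forward and backward series; the observation at which they cross is read off as the candidate start of the extreme region.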

The cross point of the Gerstengarbe graph (**Figure 9**) is at the observation numbered *k* = 19, which corresponds to a threshold of 150,000. The null hypothesis that there is no change in the series of differences is rejected with a p-value less than 0.001 (see **Table 2**). Therefore, all three techniques place the threshold within 0 ≤ *v* ≤ 250,000. The parameter stability plot estimates the threshold to be 50,000 and the Gerstengarbe plot 150,000. We will investigate the goodness-of-fit of the GPD to each of the three cases.

**Figure 10** shows that the GPD fits the data best when we set the threshold at 50,000 and 150,000. The difference between the fits in the two cases also appears to be very small. Since we want to have as many exceedances as possible, we choose the threshold to be 50,000.

**Figure 10.** *Q-Q Plot of GPD under different thresholds.*

**6. Distribution of the exceedances**

Given a threshold of 50,000, we are interested in the distribution of the excesses. The number of exceedances is 22, and **Figure 11** shows the plot of the kernel density estimates of the number and value of the exceedances.

**Figure 11.** *Density of the Exceedances.*

The number of exceedances is assumed to follow a negative binomial distribution. We carry out a chi-squared test using the MLE estimates to test its goodness of fit.

**Table 3** shows the output of the chi-squared test carried out using the "fitdistrplus" package in R. The p-value is greater than 0.01; hence, we fail to reject the null hypothesis that the data follow a negative binomial distribution.

**Table 3.** *Goodness of fit of Negative Binomial to the distribution of the number of Exceedances.*

| Name | Value |
|---|---|
| Chi-squared statistic | 3.01 |
| Degrees of freedom | 2.00 |
| Chi-squared p-value | 0.22 |

In Section 3, we saw that the distribution of the exceedances can be approximated by a GPD. We will test whether this approximation is justified for our dataset. We use the "bootstrap goodness-of-fit test for the GPD" [25] provided in the R package "gPdtest". This test investigates the goodness-of-fit of the GPD for cases where the distribution is heavy-tailed (shape parameter *ζ* ≥ 0) and non-heavy-tailed (*ζ* < 0).
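The p-value reported in **Table 3** can be cross-checked by hand. For 2 degrees of freedom the upper-tail probability of the chi-squared distribution has the closed form exp(-x/2), so no statistical library is needed. The sketch below is only a consistency check on the reported figures, not a reproduction of the "fitdistrplus" output; the function names are hypothetical.

```python
import math


def chi_squared_statistic(observed, expected):
    """Pearson chi-squared statistic for matching observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_squared_p_value_df2(statistic):
    """Upper-tail p-value for a chi-squared variable with exactly 2 degrees
    of freedom, using the closed form P(X > x) = exp(-x / 2)."""
    return math.exp(-statistic / 2.0)


# Statistic reported in Table 3 (2 degrees of freedom).
p_value = chi_squared_p_value_df2(3.01)
print(round(p_value, 2))  # 0.22
```

The result agrees with the tabulated p-value of 0.22, which is well above the 0.01 level used in the text.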

