**3. Model performance evaluation and discussion**

#### **Table 8.**

*Count model regression results of λ*10*ZINB*.

#### **Table 9.**

*Zero-inflation model regression results of λ*10*ZINB*.

In this section, we assess the performance of our prediction models and determine an appropriate statistical distribution to combine with them, so as to estimate as accurately as possible the probability of accidents occurring at a given SAL2 in a given year. The Bayesian information criterion (BIC) [29], Akaike's information criterion (AIC) [30], the Pearson chi-square statistic (PCS) test [31], and the degrees of freedom (DF) are used to evaluate the goodness of fit (GOF) of the model. They are expressed, respectively, as follows:

$$\text{BIC} = n + n \times \ln\left(2\pi\right) + n \times \ln\left(\text{RSS}/n\right) + (l+1)\ln\left(n\right),\tag{22}$$

$$\text{AIC} = n + n \times \ln\left(2\pi\right) + n \times \ln\left(\text{RSS}/n\right) + 2(l+1),\tag{23}$$

$$\text{PCS} = \sum_{i=1}^{n} \frac{\left(O_i - \lambda_i\right)^2}{\lambda_i},\tag{24}$$

$$\text{DF} = n - (l + 1),\tag{25}$$

where RSS is the sum of squared residuals between the observed and the estimated annual accident frequencies, *n* is the sample size, *l* is the number of independent exponential parameters, *λ<sub>i</sub>* is the expected annual accident frequency, and *O<sub>i</sub>* is the observed annual accident frequency.

*Accident Prediction Modeling Approaches for European Railway Level Crossing Safety DOI: http://dx.doi.org/10.5772/intechopen.109865*
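As an illustration, Eqs. (22) to (25) can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation used in this study: the function name `gof_metrics` and its argument names are hypothetical, and it assumes that every expected frequency is strictly positive and that the RSS is non-zero.

```python
import math


def gof_metrics(observed, expected, n_params):
    """Return BIC, AIC, PCS and DF as written in Eqs. (22)-(25).

    observed -- observed annual accident frequencies O_i
    expected -- expected annual accident frequencies lambda_i (all > 0)
    n_params -- l, the number of independent exponential parameters
    (All names are hypothetical; they do not come from the chapter.)
    """
    n = len(observed)
    if n != len(expected):
        raise ValueError("observed and expected must have the same length")

    # Residual sum of squares between observed and estimated frequencies.
    rss = sum((o - e) ** 2 for o, e in zip(observed, expected))

    # Term shared by Eq. (22) and Eq. (23).
    base = n + n * math.log(2 * math.pi) + n * math.log(rss / n)

    bic = base + (n_params + 1) * math.log(n)   # Eq. (22)
    aic = base + 2 * (n_params + 1)             # Eq. (23)

    # Pearson chi-square statistic, Eq. (24).
    pcs = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    df = n - (n_params + 1)                     # Eq. (25)
    return {"BIC": bic, "AIC": aic, "PCS": pcs, "DF": df}
```

Calling `gof_metrics(observed, expected, l)` on the fitted frequencies of each candidate model returns the four quantities in one dictionary, so that models fitted to the same dataset can be compared side by side.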

The BIC and AIC are used to compare the relative quality of models fitted to the same dataset; smaller values indicate a better fit. The PCS test is used to determine whether the expected and the observed values differ significantly. If the model fits the data perfectly, with no over- or under-dispersion, the PCS is roughly equal to the DF. Hence, the closer the PCS is to the DF, the better the model fits the data [14].
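One possible way to read these criteria together is sketched below: candidate models are ranked by AIC, and the gap between PCS/DF and 1 is reported alongside. The helper `rank_models`, the model names, and all numerical values are hypothetical placeholders chosen for illustration only; they are not results from this study.

```python
def rank_models(candidates):
    """Rank models by ascending AIC and report |PCS/DF - 1| for each.

    candidates -- dict mapping a model name to a dict holding the keys
                  "AIC", "BIC", "PCS" and "DF" (hypothetical layout).
    """
    rows = []
    for name, m in candidates.items():
        # A PCS/DF ratio close to 1 means the PCS is close to the DF.
        dispersion_gap = abs(m["PCS"] / m["DF"] - 1.0)
        rows.append((name, m["AIC"], m["BIC"], dispersion_gap))
    # Smaller AIC first.
    return sorted(rows, key=lambda row: row[1])


# Made-up numbers, for illustration only.
candidates = {
    "model_A": {"AIC": 1520.0, "BIC": 1541.0, "PCS": 980.0, "DF": 700},
    "model_B": {"AIC": 1498.0, "BIC": 1524.0, "PCS": 715.0, "DF": 699},
}

for name, aic, bic, gap in rank_models(candidates):
    print(f"{name}: AIC={aic:.1f}  BIC={bic:.1f}  |PCS/DF - 1|={gap:.3f}")
```

Ranking by AIC is an arbitrary choice made for this sketch; BIC could be used as the sort key instead, and the two criteria need not agree because BIC penalizes additional parameters more heavily for large samples.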

The log-likelihood (LL) statistic is used to assess the GOF of the accident frequency prediction model combined with a statistical distribution. The larger the LL, the better the model [14]. The LL is given by:

$$\text{LL} = \sum_{i=1}^{n} \ln \left( \hat{P}_i \right), \tag{26}$$

where *n* is the sample size and *P̂<sub>i</sub>* is the estimated probability of the observed accident frequency. *P̂<sub>i</sub>* is computed from the accident frequency prediction model combined with either the Poisson or the NB distribution.
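Eq. (26) can be sketched as follows for both distributions. This is a hedged illustration under stated assumptions: the function names are hypothetical, and the NB distribution is assumed to be parameterized by its mean *λ<sub>i</sub>* and a dispersion parameter `alpha` such that the variance equals λ + αλ², which this section does not confirm. Probabilities are handled on the log scale to avoid numerical underflow.

```python
import math


def poisson_log_pmf(y, lam):
    """ln P(Y = y) for a Poisson distribution with mean lam > 0."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def nb_log_pmf(y, lam, alpha):
    """ln P(Y = y) for an NB distribution with mean lam > 0.

    Assumed parameterization: variance = lam + alpha * lam**2, alpha > 0.
    """
    r = 1.0 / alpha  # inverse dispersion
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + lam))
        + y * math.log(lam / (r + lam))
    )


def log_likelihood(observed, expected, alpha=None):
    """Eq. (26): sum of ln(P_hat_i) over all observations.

    With alpha=None the Poisson distribution is used; otherwise the NB
    distribution with the given dispersion parameter is used.
    """
    if alpha is None:
        return sum(poisson_log_pmf(o, e) for o, e in zip(observed, expected))
    return sum(nb_log_pmf(o, e, alpha) for o, e in zip(observed, expected))
```

Evaluating `log_likelihood(observed, expected)` and `log_likelihood(observed, expected, alpha)` on the same data gives the two LL values to be compared, the larger one being preferred.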
