## **3.4 Step 4: database processing**

In this step, we compile the database from the questionnaires applied to each evaluated user. This database is then analyzed to characterize the users' socioeconomic profile and road knowledge. **Figure 1** shows the road education variables analyzed from the survey results for each type of user evaluated. Overall, freight drivers (FD) and vehicle drivers (VD) showed the lowest road knowledge, whereas cyclists (C) obtained the highest. Motorcyclists (M) scored low in regulation and recommendations (R&R), while pedestrians (P) showed low knowledge in courtesy and urbanity (C&U).

The remaining road education variables are abbreviated as follows: traffic signals (TS), current situation in road safety and human factor (CRS&HF), infrastructure (Infra), and applied situations (AS).

## **3.5 Step 5: the probabilistic model**

The use of *Logit* models to estimate the probability of accidents has been reported in the literature [7, 28]. Accordingly, this research estimated the occurrence of road accidents using *Logit* models. The models were estimated with the commercial software NLOGIT version 5, which Tay [29] used for the same purpose; Tay notes that binary regression models are adequate techniques for predicting a binary dependent variable as a function of predictor variables.

Because it is easy to estimate, the *logit* transformation is one of the most widely used in such studies. The search for an analytically more convenient choice model led to the binary *logit* model, which assumes that the error term *ε*n is logistically distributed [29]. The probability of choosing alternative *i* is given by Eq. (2).

$$P_n(i) = \frac{1}{1 + e^{-\mu\left(V_{in} - V_{jn}\right)}}\tag{2}$$
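As an illustration, Eq. (2) can be evaluated numerically with a few lines of Python. This is a minimal sketch, not the NLOGIT procedure used in the study: it assumes the two utility terms in the exponent are the systematic utilities of alternatives *i* and *j* for individual *n*, and the function and argument names (`binary_logit_probability`, `v_i`, `v_j`, `mu`) are hypothetical.

```python
import math


def binary_logit_probability(v_i: float, v_j: float, mu: float = 1.0) -> float:
    """Probability of choosing alternative i over j under a binary logit model.

    v_i, v_j: systematic utilities of the two alternatives (assumed inputs).
    mu: scale parameter of the logistic error term (assumed to be positive).
    """
    z = mu * (v_i - v_j)
    # Two algebraically equivalent branches avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Equal utilities give an even chance of choosing either alternative.
print(binary_logit_probability(0.0, 0.0))  # 0.5
```

The probability depends only on the utility difference scaled by `mu`, so the probabilities of the two alternatives always sum to one.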

**Figure 1.** *Road education grade of each user.*

In this model, the dependent variable *P*(*i*) is a probability (between 0 and 1) that cannot be observed directly; only each individual's choice is observed, recorded as a binary variable (0 or 1).
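Because only the 0/1 choices are observed, logit models are commonly fitted by maximum likelihood: the parameters are chosen so that the predicted probabilities best match the observed choices. The sketch below computes the binary logit log-likelihood for a set of observed choices. It is an illustrative assumption about how such a fit is scored, not a description of the NLOGIT internals; the function name `log_likelihood` and its arguments are hypothetical.

```python
import math


def log_likelihood(choices, utility_diffs, mu=1.0):
    """Binary logit log-likelihood of observed 0/1 choices.

    choices[k] is 1 if individual k chose alternative i, otherwise 0.
    utility_diffs[k] is the utility difference (i minus j) for individual k.
    mu is the scale parameter (assumed to be positive).
    """
    total = 0.0
    for y, dv in zip(choices, utility_diffs):
        z = mu * dv
        # log P(i) = -log(1 + exp(-z)), written in an overflow-safe form.
        if z >= 0:
            log_p_i = -math.log1p(math.exp(-z))
        else:
            log_p_i = z - math.log1p(math.exp(z))
        # log P(j) = log(1 - P(i)) = log P(i) - z
        log_p_j = log_p_i - z
        total += y * log_p_i + (1 - y) * log_p_j
    return total


# Utility differences that agree with the observed choices score higher.
agree = log_likelihood([1, 0], [2.0, -2.0])
disagree = log_likelihood([1, 0], [-2.0, 2.0])
print(agree > disagree)  # True
```

In an actual estimation, the utility differences would be functions of the predictor variables and their coefficients, and an optimizer would search for the coefficients that maximize this quantity.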
