*3.3.4. Interpretation of the results*

The prediction of PM2.5 improves clearly in the morning (r ≈ 0.5). This can be explained by the fact that the PBL is relatively shallow in the morning: pollutant dilution is reduced, so the level of PM2.5 becomes strongly correlated with the pollution produced by vehicles. The higher the traffic activity, the higher the concentration of fine particulate matter (see the high weight of the %orange parameter).

For the other two models, the accuracy is about the same as that of the global model (r ≈ 0.3). Their predictive performance appears reduced because the depth of the PBL increases with rising solar radiation (maximal around noon). The poor predictive power of these two models is likely caused by the reduced influence of traffic on the level of PM2.5, since the weight of the %orange parameter drops at midday and in the afternoon.

Nevertheless, averaged over the day, the approach based on three models provides slightly better accuracy than the single model (see Eq. (8)). This suggests that the best prediction of PM2.5 from traffic monitoring is obtained by analyzing the typical daily fluctuation of the PM2.5 concentration and applying a specific model according to the occurrence of the pollution peaks, especially in the morning.

$$\overline{r} = \frac{0.49 + 0.29 + 0.28}{3} = 0.35\tag{8}$$

However, the calculation of the WD is more complex. The hourly WD cannot be computed as a simple mean direction, because this can give a completely wrong result. For instance, if the wind comes from around the east (90°) four times and from around the west (270°) the other two times, the mean WD will be south-southeast (150°), even though the wind never blew from that direction. To tackle this issue, the most representative WD for each hour is calculated as follows:

• Sample the WD to transform continuous values into discrete values.

• Fit a normal distribution to the data.

• Take the mean of the Gaussian as the hourly WD.

**Figure 4** shows an example of how the WD is obtained.

**Figure 4.** Representation of the calculation of the WD for a specific hour. The graphic indicates the WD angles, in degrees (x-axis), and their respective ratio of occurrences (y-axis). The black curve represents the normal distribution that fits the data. Here, the value of the hourly WD is μ ≈ 191°.

Regression Models to Predict Air Pollution from Affordable Data Collections

http://dx.doi.org/10.5772/intechopen.71848

*4.1.2. Data transformation*

Another data preparation is required before running the machine-learning algorithms. The polar coordinates of the WD (0–360°) are transformed into Cartesian coordinates, by consider-
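As an illustration of the WD handling described above, the following Python sketch contrasts the plain arithmetic mean with an hourly WD taken from a Gaussian estimate, and then converts an angle into Cartesian components. It is a minimal sketch under stated assumptions, not the procedure used in this study: the function names, the 10° bin width, the ±60° window around the most frequent bin, and the moment-based Gaussian estimate are all hypothetical choices, and the sine/cosine convention (angle measured clockwise from north) is assumed because the exact transformation is not specified here.

```python
import math
from collections import Counter


def naive_mean(angles):
    """Arithmetic mean of angles in degrees (ignores wrap-around and bimodality)."""
    return sum(angles) / len(angles)


def hourly_wd(angles, bin_width=10, window=60):
    """Estimate a representative wind direction (degrees) for one hour.

    Assumed procedure (hypothetical parameters):
    1. Sample: round each angle to the nearest multiple of bin_width.
    2. Fit: estimate the Gaussian mean and standard deviation by moment
       matching over the bins within +/- window degrees of the most
       frequent bin, measuring offsets on the circle.
    3. Return (mu, sigma); mu is taken as the hourly WD.
    """
    # Step 1: discretise the continuous angles into bins on [0, 360).
    bins = Counter(round(a / bin_width) * bin_width % 360 for a in angles)
    mode = max(bins, key=bins.get)

    # Step 2: signed circular offset of each bin from the mode, in [-180, 180).
    near = []
    for b, count in bins.items():
        offset = (b - mode + 180) % 360 - 180
        if abs(offset) <= window:
            near.append((offset, count))

    total = sum(count for _, count in near)
    mean_offset = sum(offset * count for offset, count in near) / total
    variance = sum(count * (offset - mean_offset) ** 2 for offset, count in near) / total

    # Step 3: the Gaussian mean, mapped back onto [0, 360).
    mu = (mode + mean_offset) % 360
    return mu, math.sqrt(variance)


def wd_to_cartesian(angle_deg):
    """Map a direction in degrees to unit-circle components.

    Assumed convention: angle measured clockwise from north,
    so x = sin(angle) (east) and y = cos(angle) (north).
    """
    rad = math.radians(angle_deg)
    return math.sin(rad), math.cos(rad)


if __name__ == "__main__":
    readings = [90, 90, 90, 90, 270, 270]  # example from the text
    print(naive_mean(readings))
    print(hourly_wd(readings))
    print(wd_to_cartesian(90))
```

With the example from the text (four readings near 90° and two near 270°), the arithmetic mean is 150°, whereas the estimate restricted to the dominant peak stays at 90°. Measuring offsets on the circle also keeps readings on either side of north (for instance 350° and 10°) in the same peak.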

This performance could be further improved by analyzing a reduced image of the traffic map that closely matches the footprint of the PM2.5 concentrations measured by the monitoring station. In this study, the picture used covers an area of 22.4 km<sup>2</sup>, whereas the footprint area for the Belisario station (monitoring station height of 10 m) would be only around 3 km<sup>2</sup> [16]. However, we chose a larger traffic map area to obtain a more representative picture of the traffic situation in the city.
