38 Machine Learning - Advanced Techniques and Emerging Applications

*4.3.3. Afternoon model*

The linear regression model obtained after running the algorithm is as follows:

**PM2.5 = −0.02 \* minutes + 28.0895 \* %red + 0.4498 \* RH − 2.7491 \* pressure − 2002.1108**

The prediction accuracy of the model is evaluated as

**r = 0.56, RMSE = 6.61**

This model is simpler (only four features) and does not use exactly the same attributes as the two previous models (Pressure is used, and %red is preferred to %orange). In Section 3.3.3, differences were already noted in the afternoon model with respect to the morning and midday models. The explanation seemed to be related to the difficulty of obtaining a reliable predictive model of PM2.5 when the particulates are strongly diluted in the atmosphere. In such a situation, the fair performance of the model (r = 0.56; RMSE = 6.61) would owe more to the reduced fluctuation of the PM2.5 values (**Figure 3** shows a maximum peak at around 20 μg/m³, against 30 μg/m³ in the morning) than to the reliability of the prediction per se.

*4.3.4. Interpretation of the results*

Eq. (11) presents the average prediction accuracy obtained by modeling the air pollution through the three daily models.

$$\bar{r} = \frac{0.58 + 0.56 + 0.56}{3} = 0.57 \tag{11}$$

Although the morning model is slightly more accurate than the other two, the mean value of the regression coefficient is not better than the regression coefficient of the single model, especially if this model is obtained by a model tree algorithm.

Thus, when meteorological factors are taken into account, it does not seem to be advantageous to consider three regression models per day. This can be explained by the fact that the weather conditions have a very strong effect on the levels of PM2.5 (e.g., rain and wind tend to clean the atmosphere). Including these factors as model features therefore reduces the influence of traffic alone on the value of PM2.5. And since the impact of this human activity is more

**5. Adding trace gas concentrations**

This part verifies the prediction accuracy of the methods described in the previous sections. To do so, the precision of the prediction based on low-cost data collection is compared with that of pollution monitoring that uses costlier technologies (i.e., EPA-approved chemical sensors). Then, a hybrid model is proposed from a selection of the most relevant features to minimize the prediction error.
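As a minimal sketch of how such a comparison could be scored, the following Python snippet computes the two accuracy measures reported in the previous sections, r and RMSE, for two sets of predictions. It assumes that r denotes the Pearson correlation between observed and predicted values. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean squared error, in the unit of the inputs (e.g., ug/m3)."""
    n = len(observed)
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


# Hypothetical PM2.5 values (ug/m3), for illustration only.
observed = [12.0, 18.5, 25.0, 9.5, 15.0]
low_cost_pred = [14.0, 16.0, 21.5, 12.0, 17.5]   # assumed low-cost model output
reference_pred = [12.5, 18.0, 23.5, 10.5, 15.5]  # assumed sensor-based model output

for name, pred in (("low-cost", low_cost_pred), ("reference", reference_pred)):
    print(f"{name}: r = {pearson_r(observed, pred):.2f}, "
          f"RMSE = {rmse(observed, pred):.2f}")
```

A higher r together with a lower RMSE on the same observations would indicate the more accurate of the two approaches. Scoring both models against the same held-out measurements keeps the comparison fair.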
