relative humidity and Xminutes. The regression models that depend on the RH threshold (nine features) are slightly simpler than the models that depend on the Xminutes threshold (11 features). Note that when the tree algorithm is applied, the SR is included in the model, even though its weight is quite low. As expected, the model tree (four rules and an average of 10 features per rule) is more complex than the linear regression model (seven features). Nevertheless, the model tree is still easy to interpret and provides a prediction performance slightly better than the linear regression (+0.05 for the correlation coefficient of the tree).

*4.2.3. Interpretation of the results*

This analysis shows that including meteorological factors as model inputs improves the prediction accuracy of PM2.5 concentrations (r = 0.58). The performance is slightly improved by applying a model tree, which is composed of four linear regressions (r = 0.63).

Thus, the results suggest that the use of a fairly affordable meteorological station enables us to significantly improve the prediction of the concentration of fine particulate matter (the correlation coefficient is twice as high as with traffic monitoring only). All the meteorological factors are relevant for the prediction, except the precipitation accumulation. Rain seems to be excluded from the model because it is a very rare event.

**4.3. Multiple models**

Next, we study whether a multiple-model approach, based on three models a day, could improve the prediction accuracy.

The same division of the dataset into three periods as in Section 3.3 is carried out. Since the day is divided into three independent parts, the dataset can be reduced to 12 features: minutes, %orange, %red, SR, T, P, rain, RH, WS, Xwind, Ywind, and PM2.5 (the feature to predict). The three datasets are composed of 110, 116, and 145 instances for the morning, midday, and afternoon models, respectively.

*4.3.1. Morning model*

The linear regression model obtained after running the algorithm is as follows:

**PM2.5 = 0.0513 \* minutes + 41.7958 \* %orange − 0.23 \* RH − 2.8397 \* temperature + 2.5325 \* Xwind + 38.6386**

The prediction accuracy of the model is evaluated as

**r = 0.58**

**RMSE = 9.56**
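The two accuracy measures reported for the model can be computed from paired observed and predicted values. The sketch below is a minimal illustration, assuming that r denotes the Pearson correlation coefficient between observed and predicted PM2.5 and that RMSE is the usual root mean squared error. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


# Made-up values for illustration only; these are NOT data from the study.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print("r    =", round(pearson_r(observed, predicted), 3))
print("RMSE =", round(rmse(observed, predicted), 3))
```

A higher r indicates that predictions follow the observed trend more closely, while RMSE expresses the typical error in the same unit as the PM2.5 measurements.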

The model retains only six features, which means that many attributes are filtered out, especially among the meteorological factors (SR, pressure, rain, and WS are removed). This can be explained by the fact that the PM2.5 level in the morning appears to be mainly correlated with the traffic density (see Section 3.3). However, the morning model does not differ significantly from the single multiple regression, either in terms of features (five identical attributes) or in terms of performance (r = 0.58 in both cases).
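To make the morning model concrete, the following sketch evaluates the reported linear regression for one set of inputs. The coefficients are those given in the text, and 38.6386 is taken as the intercept. The dictionary keys, the function name, and the sample input (including its units and scaling, for example %orange expressed as a fraction) are assumptions made for illustration only.

```python
# Coefficients as reported for the morning model in the text.
MORNING_COEFFS = {
    "minutes": 0.0513,
    "pct_orange": 41.7958,
    "RH": -0.23,
    "temperature": -2.8397,
    "Xwind": 2.5325,
}
# Assumption: the standalone constant 38.6386 is the intercept of the model.
MORNING_INTERCEPT = 38.6386


def predict_pm25_morning(features):
    """Evaluate the linear morning model for a dict of feature values."""
    weighted_sum = sum(
        coef * features[name] for name, coef in MORNING_COEFFS.items()
    )
    return MORNING_INTERCEPT + weighted_sum


# Hypothetical input: the units and scaling of each feature are assumed,
# not taken from the study.
sample = {
    "minutes": 480,
    "pct_orange": 0.10,
    "RH": 70.0,
    "temperature": 15.0,
    "Xwind": 0.5,
}

print("Predicted PM2.5:", round(predict_pm25_morning(sample), 2))
```

Keeping the coefficients in a dictionary makes the sign of each contribution easy to inspect: in this model, higher relative humidity and temperature lower the prediction, while a larger share of orange traffic raises it.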
