*5.2.2. Multiple models*

**5.2. Prediction from full data sources**

This section explores whether a hybrid model that combines all of the data sources mentioned previously yields a better prediction of air pollution. The objective is to define the best predictive model for estimating the concentration of PM2.5 from all the available types of data.

*5.2.1. Single model*

The full dataset is used for this analysis. There are 17 features in total: Xminutes, Yminutes, %red, %orange, relative humidity (RH), precipitation, pressure, solar radiation (SR), temperature, wind speed (WS), Xwind, Ywind, CO, NO2, O3, SO2, and PM2.5 (the feature to predict).

The linear regression model obtained after running the algorithm is as follows:

$$
\begin{aligned}
\text{PM2.5} ={}& 1.4412 \cdot \text{Yminutes} + 0.2212 \cdot \text{RH} - 0.0035 \cdot \text{SR} + 0.9367 \cdot \text{temperature} \\
&+ 1.2377 \cdot \text{WS} + 0.7501 \cdot \text{Xwind} + 0.3971 \cdot \text{Ywind} + 8.3473 \cdot \text{CO} \\
&+ 0.2691 \cdot \text{NO}_2 + 0.1878 \cdot \text{O}_3 + 1.0463 \cdot \text{SO}_2 - 30.8553
\end{aligned}
$$

The prediction accuracy of the model is evaluated as

**r = 0.81, RMSE = 5.31**

The results show that the regression model uses three of the four classes of parameters (time, meteorology, and criteria pollutants) to predict the value of PM2.5. Traffic information is filtered out, most likely because it is redundant with time. After attribute selection (M5 method), the final model is composed of 11 features out of 16. As hypothesized, a model based on a hybrid data source allows for a significant improvement of the prediction
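As an illustration, the regression equation reported for the single model in Section 5.2.1 can be evaluated directly in code. The following is a minimal Python sketch, not the authors' implementation. The coefficient values are copied from the reported model, while the function names (`predict_pm25`, `rmse`, `pearson_r`), the dictionary-based input format, and the reading of r as the Pearson correlation between observed and predicted values are assumptions made here. The units and encodings of the inputs (for example, how Yminutes is derived from the time of day) are not specified in this section, so no realistic sample values are given.

```python
import math

# Coefficients copied from the reported single linear regression model.
# The short keys (RH, SR, WS, ...) follow the abbreviations used in the equation.
COEFFICIENTS = {
    "Yminutes": 1.4412,
    "RH": 0.2212,           # relative humidity
    "SR": -0.0035,          # solar radiation
    "temperature": 0.9367,
    "WS": 1.2377,           # wind speed
    "Xwind": 0.7501,
    "Ywind": 0.3971,
    "CO": 8.3473,
    "NO2": 0.2691,
    "O3": 0.1878,
    "SO2": 1.0463,
}
INTERCEPT = -30.8553


def predict_pm25(sample):
    """Evaluate the linear model for one observation.

    `sample` is a dict mapping each feature name in COEFFICIENTS to a number
    (hypothetical input format, chosen for this sketch).
    """
    missing = [name for name in COEFFICIENTS if name not in sample]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return INTERCEPT + sum(coef * sample[name] for name, coef in COEFFICIENTS.items())


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumed here to be the reported r)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# With every input set to zero, the prediction reduces to the intercept.
zero_sample = {name: 0.0 for name in COEFFICIENTS}
print(predict_pm25(zero_sample))  # equals the intercept, -30.8553
```

Keeping the coefficients in a single dictionary makes it easy to check that the model uses exactly the 11 selected features, and the two metric helpers mirror the way the accuracy is reported (r and RMSE) once observed and predicted PM2.5 values are available.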
