**5.2. Prediction from full data sources**

This section explores whether a hybrid model that combines all of the previously mentioned data sources yields a better prediction of air pollution. The objective is to identify the best predictive model for estimating the concentration of PM2.5 from all the available types of data.
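As a rough illustration of what building such a hybrid model involves, the sketch below fits an ordinary least-squares linear regression on a chemical-only feature set and on a combined (hybrid) feature set, then reports the correlation coefficient r and the RMSE for each. Everything in it is an assumption made for illustration: the data are synthetic, and the feature groupings, coefficients, and function names (`fit_linear_model`, `correlation`, `rmse`) are hypothetical and do not reproduce the chapter's dataset or the tool used to obtain its results.

```python
import numpy as np


def fit_linear_model(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, intercept)."""
    A = np.column_stack([X, np.ones(len(X))])  # extra column of ones for the intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[:-1], beta[-1]


def predict(X, coef, intercept):
    """Apply the fitted linear model to a feature matrix."""
    return X @ coef + intercept


def correlation(y_true, y_pred):
    """Pearson correlation coefficient r between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in data (NOT the chapter's measurements).
rng = np.random.default_rng(0)
n = 500
meteo = rng.normal(size=(n, 3))  # hypothetical columns: RH, pressure, temperature
chem = rng.normal(size=(n, 2))   # hypothetical columns: CO, NO2
X_hybrid = np.hstack([meteo, chem])
true_coef = np.array([0.3, -1.7, -0.7, 4.6, 0.4])  # arbitrary illustrative values
pm25 = X_hybrid @ true_coef + 20.0 + rng.normal(scale=1.0, size=n)

# Fit and score each feature set on the same data (in-sample, for brevity).
results = {}
for name, X in [("chemical only", chem), ("hybrid", X_hybrid)]:
    coef, intercept = fit_linear_model(X, pm25)
    y_hat = predict(X, coef, intercept)
    results[name] = {"r": correlation(pm25, y_hat), "rmse": rmse(pm25, y_hat)}
    print(f"{name}: r = {results[name]['r']:.2f}, RMSE = {results[name]['rmse']:.2f}")
```

Because the scores here are computed on the training data, the model with more features can never score worse; a fair comparison on real measurements would use held-out data or cross-validation.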

accuracy. The values of the correlation coefficient and the RMSE are better for the hybrid than the chemical model.

Regression Models to Predict Air Pollution from Affordable Data Collections

http://dx.doi.org/10.5772/intechopen.71848

*5.2.2. Multiple models*

*5.2.2.1. Morning model*

The linear regression model obtained after running the algorithm is as follows:

**PM2.5 = 0.0379 \* minutes + 0.3438 \* RH − 1.7248 \* pressure − 0.6846 \* temperature + 4.5902 \* CO + 0.4294 \* NO<sub>2</sub> + 2.0133 \* SO<sub>2</sub> + 0.6343 \* O<sub>3</sub> + 1209.4494**

The prediction accuracy of the model is evaluated as

**r = 0.85, RMSE = 6.04**

*5.2.2.2. Midday model*

The linear regression model obtained after running the algorithm is as follows:

**PM2.5 = −0.0362 \* minutes − 1.1911 \* pressure − 0.0122 \* SR + 2.3857 \* temperature + 1.4346 \* Ywind + 0.2274 \* RH + 14.8788 \* CO +**
