#### *5.2.2.1. Morning model*

The linear regression model obtained after running the algorithm is as follows:


**1209.4494**

The prediction accuracy of the model is evaluated as

**r = 0.85, RMSE = 6.04**
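The two accuracy metrics reported for each model can be computed as in the sketch below. It assumes that r is the Pearson correlation between observed and predicted PM2.5 values and that RMSE is expressed in the units of PM2.5; the function names and the toy values are illustrative and are not taken from the study.

```python
import math
from typing import Sequence


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target (here PM2.5)."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need two non-empty equal-length sequences")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Toy values for illustration only (not data from this study).
obs = [10.0, 12.0, 15.0, 20.0]
pred = [11.0, 12.0, 14.0, 21.0]
print(round(pearson_r(obs, pred), 3), round(rmse(obs, pred), 3))
```

A high r indicates that predictions follow the variations of the observations, while RMSE measures the typical size of the errors, so the two metrics are reported together.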

#### *5.2.2.2. Midday model*

The linear regression model obtained after running the algorithm is as follows:

**PM2.5 =**



The prediction accuracy of the model is evaluated as

**r = 0.87, RMSE = 5.33**

#### *5.2.2.3. Afternoon model*

The linear regression model obtained after running the algorithm is as follows:


The prediction accuracy of the model is evaluated as

**r = 0.66, RMSE = 6.29**
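Fitting a separate linear regression for each time window, as done for the morning, midday, and afternoon models, can be sketched with plain ordinary least squares. This is an assumed simplification: the tool used in the study may apply its own attribute selection. The text only states that the morning and midday windows together span 6 am to 2 pm, so the exact window boundaries, the function names, and the synthetic data below are hypothetical.

```python
import numpy as np


def fit_linear_model(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares; returns [w_1, ..., w_k, intercept]."""
    design = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def fit_per_period(hours: np.ndarray, X: np.ndarray, y: np.ndarray,
                   periods: dict) -> dict:
    """Fit one model per window; periods maps name -> (start_hour, end_hour)."""
    models = {}
    for name, (start, end) in periods.items():
        mask = (hours >= start) & (hours < end)  # rows observed in this window
        models[name] = fit_linear_model(X[mask], y[mask])
    return models


# Hypothetical window boundaries (only the 6 am to 2 pm span is stated).
PERIODS = {"morning": (6, 10), "midday": (10, 14), "afternoon": (14, 20)}

# Synthetic, noise-free data for illustration only (not study data).
rng = np.random.default_rng(0)
hours = rng.integers(6, 20, size=300)  # hours 6..19
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -0.5, 2.0]) + 3.0
models = fit_per_period(hours, X, y, PERIODS)
```

Each window gets its own coefficient vector, which is why the three models above can reach different accuracies on the same feature set.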

#### *5.2.2.4. Interpretation of the results*

As Eq. (12) shows, the average prediction accuracy of the three models, evaluated by the correlation coefficient r, is

$$
\overline{r} = \frac{0.85 + 0.87 + 0.66}{3} = 0.79\tag{12}
$$
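The arithmetic of Eq. (12), together with the average over the morning and midday models only, can be checked with a few lines of code. The dictionary simply restates the correlation coefficients reported above.

```python
from statistics import mean

# Correlation coefficients reported for the three period models.
r_values = {"morning": 0.85, "midday": 0.87, "afternoon": 0.66}

r_all = mean(r_values.values())                              # Eq. (12)
r_6am_2pm = mean([r_values["morning"], r_values["midday"]])  # morning + midday

print(f"{r_all:.2f} {r_6am_2pm:.2f}")  # prints: 0.79 0.86
```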

Thus, using several models with all the available features to predict fine particulate matter appears to be justified only for PM2.5 levels from 6 am to 2 pm, where the morning and midday models reach an average accuracy of r = (0.85 + 0.87)/2 ≈ 0.86. After this period, the modeling becomes more complex and the afternoon model is less reliable (r = 0.66). This result confirms previous analyses suggesting that the accuracy of models estimating PM2.5 concentrations from traffic, meteorological, and air pollutant data is higher when gases and particulates are less diluted in the atmosphere.

Regression Models to Predict Air Pollution from Affordable Data Collections

http://dx.doi.org/10.5772/intechopen.71848

**6. Simplification and recommendations**

Since the full-feature model (Section 5.2) is quite complex, the present stage consists of removing insignificant and/or redundant features in order to optimize the model. The goal is to find a simple model that is still able to provide a reliable estimation of PM2.5 concentrations. The simplest best model is defined as a model that maintains high accuracy (r ≥ 0.8) with at most eight features. The method used to obtain this model is the ranker search method, which sorts the attributes according to their individual evaluation and allows the number of attributes to retain to be specified.

**Table 1** presents the ranked attributes, sorted in descending order of their individual performance in predicting the output value.

**6.1. The simplest best model**

The linear regression model obtained after running the algorithm is as follows:

$$
\mathrm{PM}_{2.5} = 0.2032\,\mathrm{RH} + 0.6507\,\mathrm{temperature} - 0.0021\,\mathrm{SR} + 0.4549\,\mathrm{Xwind} + 8.8163\,\mathrm{CO} + 0.225\,\mathrm{NO_2} + 0.2159\,\mathrm{O_3} + 1.0707\,\mathrm{SO_2} - 23.9476
$$

The prediction accuracy of the model is evaluated as

**r = 0.8, RMSE = 5.34**

With eight features and r = 0.8, this model meets both criteria defined for the simplest best model.
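The ranking step and the resulting eight-feature model can be sketched as follows. The coefficients and intercept restate the simplest best model; input units must match those used in the study, which are not given in this section. The ranker is an assumption: it scores each feature by its absolute Pearson correlation with PM2.5 and keeps the top k, whereas the text does not state which attribute evaluator was paired with the ranker search method. All function names are hypothetical.

```python
import numpy as np

# Coefficients of the simplest best model (Section 6.1).
COEFFICIENTS = {
    "RH": 0.2032,
    "temperature": 0.6507,
    "SR": -0.0021,
    "Xwind": 0.4549,
    "CO": 8.8163,
    "NO2": 0.225,
    "O3": 0.2159,
    "SO2": 1.0707,
}
INTERCEPT = -23.9476


def predict_pm25(features: dict[str, float]) -> float:
    """Evaluate the eight-feature linear model for one observation."""
    missing = COEFFICIENTS.keys() - features.keys()
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    return INTERCEPT + sum(w * features[name] for name, w in COEFFICIENTS.items())


def rank_by_abs_correlation(X: np.ndarray, y: np.ndarray,
                            names: list[str], k: int) -> list[tuple[str, float]]:
    """Hypothetical ranker: sort features by |Pearson r| with y, keep the top k."""
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    order = np.argsort(scores)[::-1]  # indices in descending order of score
    return [(names[j], float(scores[j])) for j in order[:k]]


# Placeholder inputs, not measurements from the study.
example = {name: 1.0 for name in COEFFICIENTS}
print(round(predict_pm25(example), 4))
```

Capping k at eight reproduces the "at most eight features" constraint; the retained subset would then be refitted and checked against the r ≥ 0.8 threshold.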

