*3.2.5. Interpretation of the results*

The models reach a correlation coefficient slightly above 0.3 and an RMSE of around 8.5 between traffic and PM2.5. Models that include traffic monitoring are more accurate than a model based on time only, which suggests that traffic is a more reliable predictor of air quality than time of day. This difference could shrink if weekends (when air pollution levels are usually low) were excluded, since traffic follows a fairly regular pattern on workdays. Moreover, adding the time of day does not significantly improve the accuracy of a model based on traffic monitoring, because this information is mostly redundant with the traffic data.
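As an illustration of the two metrics discussed above, the following minimal sketch fits a simple linear regression of PM2.5 on a traffic feature and reports the correlation coefficient and the RMSE. The data values and the names `fit_line`, `pearson_r`, `rmse`, `orange_ratio`, and `pm25` are hypothetical and are not taken from the study. The sketch also assumes that the correlation is computed between observed and predicted PM2.5, which the text does not state explicitly.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def rmse(observed, predicted):
    """Root mean square error between observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(observed))


# Hypothetical toy data: share of orange (medium traffic) pixels and PM2.5.
orange_ratio = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
pm25 = [12.0, 15.0, 14.0, 20.0, 19.0, 24.0]

slope, intercept = fit_line(orange_ratio, pm25)
predicted = [slope * x + intercept for x in orange_ratio]

r = pearson_r(pm25, predicted)
error = rmse(pm25, predicted)
print(f"r = {r:.2f}, RMSE = {error:.2f}")
```

Fitting a second model on time of day alone and comparing its two metrics with these would mirror the comparison described in this section.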

Overall, Google Maps Traffic appears to provide reasonably useful information for predicting the level of PM2.5. From this data source, the number of orange pixels (medium amount of traffic) appears to be the most relevant feature. A possible explanation is that medium traffic shows the largest amplitude of variation over the day, and is thus the category that best represents the traffic density in the city. Nevertheless, the accuracy could be improved by modeling air pollution with several daily models, defined by the variation of air pollution levels over the day (two peaks a day), instead of a single one.


Regression Models to Predict Air Pollution from Affordable Data Collections

http://dx.doi.org/10.5772/intechopen.71848


The morning model covers the period from 6 am (360th minute) to 10 am (600th minute). **Figure 3** shows that the PM2.5 concentration increases steadily during this period. The two main factors likely to explain this increase are the intensification of traffic and the shallow morning PBL. If this assumption is correct, then the predictive accuracy of a regression model that considers traffic

**Figure 3.** Typical profile of the PM2.5 concentrations during the day in the Belisario district of Quito (2007–2016 data). Although a slight reduction in the level of pollution was observed over the years, the air contamination peaks always occur at the same time of day (around 10 am and 7 pm).


*3.3.1. Morning model*

**Figure 2.** PM2.5 plotted against the ratio of medium traffic (each dot is an observation), with the simple linear regression between these two features (line). The higher the share of medium traffic (%orange), the higher the concentration of fine particulate matter (PM2.5).

#### **3.3. Multiple models**

In the city of Quito, as in most cities worldwide, there are two peaks of PM2.5 pollution during the day. The first peak is in the morning (around 10 am) and the second is in the evening (around 7 pm). **Figure 3** shows the two daily peaks of fine particulate contamination averaged over 10 years (2007–2016) for the district of Belisario. (These peaks occur at approximately the same time in any district of Quito.) In the morning, the rush hour actually lasts longer than the visible PM2.5 concentration peak, but a sudden decline can be observed due to the deepening of the planetary boundary layer (PBL). PBL growth during the day depends on the solar heating of the surface and the vertical mixing it induces. The maximum depth of the PBL can vary from one day to another due to differences in solar radiation intensity, solar angle, and especially cloud cover [14]. The PBL is shallow in the morning (up to a few hundred meters) and deepens during the day, reaching up to a few kilometers [15]. As a consequence, air contaminants are less diluted in the morning than in the afternoon. All of these variations would reduce the performance of a single daily regression model for predicting PM2.5 from vehicle emissions in the city. Thus, the present section describes a prediction of fine particulate matter from three daily models determined by the two peaks of pollution, namely a morning model [6–10 h], a midday model [10–14 h], and an afternoon model [14–19 h]. It is not necessary to consider a night model, because the level of air pollution drops during this period.
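The three-window scheme described above can be sketched as follows: each observation is assigned to a window by its minute of day, one simple linear regression is fitted per window, and observations outside 6 am to 7 pm are skipped because no night model is used. This is only a sketch under stated assumptions. The names `daily_window`, `fit_window_models`, and `predict`, and all data values, are hypothetical. The window boundaries at 10 h and 14 h are treated here as half-open intervals (10:00 belongs to the midday model, 14:00 to the afternoon model), a choice the text leaves open. A single traffic feature stands in for whatever predictors the actual models use.

```python
def daily_window(minute_of_day):
    """Return the model window for a minute of day, or None outside 6 am to 7 pm."""
    if 360 <= minute_of_day < 600:       # 6 am to 10 am
        return "morning"
    if 600 <= minute_of_day < 840:       # 10 am to 2 pm
        return "midday"
    if 840 <= minute_of_day <= 1140:     # 2 pm to 7 pm
        return "afternoon"
    return None                          # night: no model is fitted


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_window_models(observations):
    """Fit one regression per window from (minute_of_day, traffic, pm25) tuples."""
    grouped = {}
    for minute, traffic, pm25 in observations:
        window = daily_window(minute)
        if window is None:
            continue  # skip night observations
        xs, ys = grouped.setdefault(window, ([], []))
        xs.append(traffic)
        ys.append(pm25)
    # A line needs at least two distinct traffic values to be fitted.
    return {
        window: fit_line(xs, ys)
        for window, (xs, ys) in grouped.items()
        if len(set(xs)) >= 2
    }


def predict(models, minute_of_day, traffic):
    """Predict PM2.5 with the model of the matching window, or None if there is none."""
    window = daily_window(minute_of_day)
    if window not in models:
        return None
    slope, intercept = models[window]
    return slope * traffic + intercept


# Hypothetical toy observations: (minute of day, medium-traffic ratio, PM2.5).
observations = [
    (380, 0.10, 14.0), (480, 0.20, 18.0), (580, 0.30, 22.0),
    (620, 0.25, 20.0), (720, 0.20, 16.0), (820, 0.15, 12.0),
    (860, 0.15, 12.0), (1000, 0.25, 15.0), (1120, 0.35, 18.0),
    (1300, 0.05, 9.0),  # night observation, ignored
]

models = fit_window_models(observations)
for window, (slope, intercept) in models.items():
    print(f"{window}: PM2.5 = {slope:.1f} * traffic + {intercept:.1f}")
```

Keeping the window assignment in a single function means the same boundaries are applied at fitting time and at prediction time.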
