18 Machine Learning - Advanced Techniques and Emerging Applications

In this work, we present several regression models that provide a reliable estimation of the current level of PM2.5 from data collection methods of different levels of affordability. In Section 3, we describe a prediction of PM2.5 concentrations based on real-time traffic monitoring only; this type of data costs the user nothing, as it is based on publicly available worldwide traffic data. Section 4 describes a prediction that adds meteorological data.

**2.2. Cumulative modeling method**

derivative. Once the algorithm has cycled through all the features of the model, the counter *t* is incremented and the convergence condition is tested to decide whether the program must loop again. When the minimum is reached (‖∇*RSS*(*w*→(*t*))‖ ≤ *ε*), the respective values of the regression coefficients are used as the model parameters to form the predictions.

Algorithm 1. Gradient descent algorithm for multiple regression.

1: init *w*→(1) = 0, t = 1

2: while ‖∇*RSS*(*w*→(*t*))‖ > *ε*

3:  for j = 0, …, D

4:   partial[j] = −2∑<sub>*i*=1</sub><sup>*N*</sup> *h<sub>j</sub>*(*x*→<sub>*i*</sub>)(*y<sub>i</sub>* − *ŷ<sub>i</sub>*(*w*→(*t*)))

5:   *w*→<sub>*j*</sub>(*t*+1) ← *w*→<sub>*j*</sub>(*t*) − *η*·partial[j]

6:  t ← t + 1
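Algorithm 1 can be sketched in NumPy as follows. This is a minimal sketch, not the study's implementation: the feature matrix `H` (whose column *j* holds *h<sub>j</sub>*(*x*→<sub>*i*</sub>)), the step size `eta`, the tolerance `eps`, and the iteration cap are illustrative assumptions.

```python
import numpy as np

def gradient_descent_regression(H, y, eta=1e-3, eps=1e-6, max_iter=100_000):
    """Multiple linear regression fitted by gradient descent (Algorithm 1).

    H   : (N, D+1) feature matrix, column j holding h_j(x_i)
    y   : (N,) observed targets
    eta : step size (assumed value, not taken from the chapter)
    eps : tolerance on the gradient norm of the RSS
    """
    w = np.zeros(H.shape[1])                 # 1: init w(1) = 0
    for _ in range(max_iter):                # counter t
        residual = y - H @ w                 # y_i - y_hat_i(w(t))
        partial = -2.0 * (H.T @ residual)    # 4: partial[j] = -2 sum_i h_j(x_i)(y_i - y_hat_i)
        if np.linalg.norm(partial) <= eps:   # 2: stop once ||grad RSS(w(t))|| <= eps
            break
        w = w - eta * partial                # 5: w_j(t+1) <- w_j(t) - eta * partial[j]
    return w
```

For example, fitting noise-free data generated as *y* = 1 + 2*x* recovers coefficients close to (1, 2).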

In addition, the final regression models of this study are obtained after an attribute selection using the M5 method, which steps through the attributes, removing the one with the smallest standardized coefficient until no improvement is observed in the estimate of the error given by the Akaike information criterion (AIC) [11],

*AIC* = *N* ln(*RSS* / (*N* − *D*)) + 2*D* (5)

where *N* is the number of observations (or instances) and *D* is the number of features (or attributes). The selected model is the one with the lowest AIC.
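Eq. (5) and the lowest-AIC selection rule translate directly into a few lines of Python. The candidate-model entries below (their RSS values and feature counts) are hypothetical, since the chapter's actual attribute subsets are not reproduced here.

```python
import numpy as np

def aic(rss, n, d):
    """Akaike information criterion of Eq. (5): AIC = N ln(RSS / (N - D)) + 2D."""
    return n * np.log(rss / (n - d)) + 2 * d

def select_lowest_aic(candidates, n):
    """Return the candidate model (a dict with 'rss' and 'd') with the lowest AIC."""
    return min(candidates, key=lambda m: aic(m["rss"], n, m["d"]))
```

At equal RSS, a model with more attributes is penalized by the 2*D* term, so the selection favors the smaller model.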

All the models presented in the manuscript are obtained after a normalization of the values of the variables, in order to avoid dominance of the variables with the largest intrinsic values. The method used to evaluate model accuracy is 10-fold cross-validation. The regression modeling is performed with Pandas and the scikit-learn machine-learning library for Python.
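This setup (normalize, fit a linear regression, score by 10-fold cross-validation) can be reproduced with a short scikit-learn sketch. The synthetic feature matrix below is a stand-in for the study's real traffic and meteorological variables, chosen with deliberately mismatched scales to show why the normalization step matters.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real features (traffic, meteorology, ...),
# with very different intrinsic scales per column.
rng = np.random.default_rng(0)
X = rng.normal(size=(1118, 3)) * [1.0, 100.0, 1e4]
y = X @ [0.5, 0.02, 3e-4] + rng.normal(scale=0.1, size=1118)

# Normalization first, so no variable dominates by its intrinsic scale,
# then ordinary least squares; accuracy estimated by 10-fold cross-validation.
model = make_pipeline(StandardScaler(), LinearRegression())
scores = cross_val_score(model, X, y, cv=10)
print(scores.mean())  # mean R^2 across the 10 folds
```

Wrapping the scaler and the regressor in one pipeline ensures the scaling parameters are re-fitted inside each training fold, avoiding leakage from the held-out fold.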

**3.1. Dataset**

Air pollution data (PM2.5) were collected in central Quito over a period of two months, June and July 2017, by the city's Secretariat of the Environment. The Belisario measurement station (alt. 2835 m a.s.l., coord. 78°29′24″ W, 0°10′48″ S) was set up following the criteria of the United States Environmental Protection Agency (USEPA). PM2.5 concentrations were measured with a Thermo Scientific FH62C14-DHS 5014i continuous ambient particulate monitor, based on the beta-ray attenuation method (EPA No. EQPM-0609-183). For all the data, 1-hour averages were calculated, resulting in 1118 instances.
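The 1-hour averaging step can be illustrated with Pandas. The raw 10-minute feed and the column name `pm25` below are assumptions made for the sketch, not the station's actual data format.

```python
import numpy as np
import pandas as pd

# Hypothetical raw feed: 10-minute PM2.5 readings over two days (the real
# station data covers June-July 2017; 'pm25' is an assumed column name).
idx = pd.date_range("2017-06-01", periods=288, freq="10min")
raw = pd.DataFrame(
    {"pm25": np.random.default_rng(1).uniform(5, 40, size=288)}, index=idx
)

# 1-hour averages, the same aggregation used to build the dataset's instances.
hourly = raw.resample("1h").mean()
print(len(hourly))  # 288 readings at 10-min spacing span 48 hours -> 48 rows
```

Each row of `hourly` is the mean of the six 10-minute readings falling inside that hour; in the study, this aggregation over the two-month campaign yielded the 1118 instances.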
