**3.** *PM*<sup>10</sup> **time series**

In the present case study, the analysis of daily concentrations of *PM*<sup>10</sup> (*µg*/*m*3), measured at one of the monitoring stations of Brindisi district during the period 2010-2013, has been conducted through geostatistical techniques.

These data have been collected by the Environmental Protection Agency of Apulian region (ARPA Puglia) which controls the air quality of urban, suburban, and industrialized areas of the region.

Note that *PM*<sup>10</sup> monitoring stations are classified in the following three categories:


The analyzed station, named "Torchiarolo" is located in the municipality of Torchiarolo (Brindisi district), as shown in Fig. 1. It is classified as industrial station, since it is strictly close to an industrial site, i.e. the thermoelectric power station "Enel-Federico II" in Cerano (Brindisi district).

**Figure 1.** "Torchiarolo" monitoring station belonging to the Environmental Protection Agency of Apulian region (ARPA Puglia)

## **3.1. Exploratory Data Analysis**

4 Current Air Quality Issues

where

unsampled point [16, 25, 26].

two different approaches:

**3.** *PM*<sup>10</sup> **time series**

the region.

(Brindisi district).

In this context, given the observed time series *xti*

nonperiodic variogram model has been fitted;

conducted through geostatistical techniques.




expectation of an indicator random field *I*(*t*; *x*) [27], that is

probability *Prob* {*Xt* ≤ *x*|H*n*}, with H*<sup>n</sup>* = {*xti*

The kriging approach, based on the knowledge of variogram, leads naturally to nonparametric estimation [17]. Indicator kriging is a nonparametric approach to estimate the posterior cumulative distribution function (c.d.f.) of the variable under study at an

*Prob* {*Xt* ≤ *x*|H*n*} = *E* [*I*(*t*; *x*|H*n*)]

In the case study presented hereafter, ordinary kriging and indicator kriging are applied for interpolation and prediction purposes of an environmental variable. Note that a *GSLib* routine for kriging, named "KT3DP" [12], has been used in order to define appropriate temporal search neighborhoods in presence of periodicity, since environmental time series, such as the ones for air pollution data, usually are characterized by a periodic behavior. Hence, the use of periodic and nonperiodic variogram models have been proposed through

• the periodic component has been factored out using the moving average method [5] and

In the present case study, the analysis of daily concentrations of *PM*<sup>10</sup> (*µg*/*m*3), measured at one of the monitoring stations of Brindisi district during the period 2010-2013, has been

These data have been collected by the Environmental Protection Agency of Apulian region (ARPA Puglia) which controls the air quality of urban, suburban, and industrialized areas of

The analyzed station, named "Torchiarolo" is located in the municipality of Torchiarolo (Brindisi district), as shown in Fig. 1. It is classified as industrial station, since it is strictly close to an industrial site, i.e. the thermoelectric power station "Enel-Federico II" in Cerano

• the periodicity has been retained and described by a periodic variogram model.

Note that *PM*<sup>10</sup> monitoring stations are classified in the following three categories:

 1, *if Xt* <sup>≤</sup> *<sup>x</sup>* 0, *if Xt > x*.

*I*(*t*; *x*) =

, *ti*, *i* = 1, 2, ..., *n*, the conditional

, *ti*, *i* = 1, 2, ..., *n*}, is interpreted as conditional

In order to assess the statistical properties of *PM*<sup>10</sup> measured at the "Torchiarolo" station in the period 2010-2013, an exploratory data analysis has been performed. Some results are shown in Tab. 1.


**Table 1.** Descriptive statistics of *PM*<sup>10</sup> (*µg*/*m*3), measured in the period 2010-2013 at the "Torchiarolo" monitoring station

According to the National Law concerning the human health protection, *PM*<sup>10</sup> daily average concentrations cannot be greater than 50 *µg*/*m*<sup>3</sup> for more than 35 times per year. During the period under study, the *PM*<sup>10</sup> daily values exceeded the threshold 243 times; in addition, the station has measured more than 35 exceedances per year.

The box plot in Fig. 2 shows that the observed time series is characterized by a seasonal component. During summertime, particle pollution shows lower levels compared to those recorded during wintertime; in particular, in summertime, *PM*<sup>10</sup> doesn't exceed the limit value fixed by the National Law.

On the other hand, in wintertime changes in the lower layer of the troposphere determine *PM*<sup>10</sup> stagnation. Hence, high levels of this pollutant are recorded.

In the following sections, the study of the temporal evolution of *PM*<sup>10</sup> at the analyzed station has been conducted by performing


**Figure 2.** Box plot of *PM*<sup>10</sup> daily concentrations, grouped by month, and limit value fixed by the National Decree (Decree-law 60/2002)

#### **4. Structural Analysis**

As previously pointed out, the variogram can describe a wider class of stochastic processes, that is the class of intrinsic stochastic processes and is usually preferred to the use of the covariance function.

In structural analysis, before modeling the temporal correlation described by the variogram, its estimation from data is required. The following classical estimator [9] is often used:

$$\hat{\gamma}(r\_t) = \frac{1}{2\left|M(r\_l)\right|} \sum\_{M(r\_l)} \left[X\_{t+h\_l} - X\_t\right]^2 \tag{4}$$

where *rt* is the temporal lag, *M*(*rt*) = {*t* + *ht* ∈ *H* and *t* ∈ *H*, such that *rt* − *ht < δt*}, *δ<sup>t</sup>* is the tolerance, *H* is the set of data at different time points (not necessarily equally-spaced) and |*M*(*rt*)| is the cardinality of this set.

In the present case study the variogram has been used as an exploratory tool to assess stationarity and periodicity. In particular, sample temporal variogram for *PM*<sup>10</sup> daily observations, shown in Figure 3-a), reproduces the seasonal behavior of the variable under study, which presents an annual periodicity at 365 days. In equation (5) the analytic expression of the periodic variogram model, fitted to the sample variogram for the observed values, is proposed:

$$\gamma(h\_{\rm t}) = 265 \operatorname{Exp}(|h\_{\rm t}|; 10) + 130 \operatorname{Cost}(|h\_{\rm t}|; 365); \tag{5}$$

where *Exp*(·) and *Cos*(·) are the exponential and the cosine variogram models [29], respectively.

On the other hand, since the variable under study, is characterized by periodicity, this seasonal component could be factored out. Moving average and monthly averages techniques have been applied in order to obtain *PM*<sup>10</sup> residuals. Note that the FORTRAN program "REMOVE" [11] has been used to apply moving average techniques.

**Figure 3.** Sample temporal variograms and fitted models. (a) Variogram for *PM*<sup>10</sup> daily concentrations (b) Variogram for *PM*<sup>10</sup> residuals

The sample variogram of the residuals has been computed and modelled and the following nonperiodic variogram model has been chosen:

$$\tilde{\gamma}(h\_t) = \mathbf{348} \exp(|h\_t|; \mathbf{30}) + \mathbf{30} \exp(|h\_t|; \mathbf{365});\tag{6}$$

where *Exp*(·) is the exponential variogram model [8].

6 Current Air Quality Issues

(Decree-law 60/2002)

covariance function.

values, is proposed:

respectively.

**4. Structural Analysis**

PM

and |*M*(*rt*)| is the cardinality of this set.

10

months

**Figure 2.** Box plot of *PM*<sup>10</sup> daily concentrations, grouped by month, and limit value fixed by the National Decree

As previously pointed out, the variogram can describe a wider class of stochastic processes, that is the class of intrinsic stochastic processes and is usually preferred to the use of the

In structural analysis, before modeling the temporal correlation described by the variogram, its estimation from data is required. The following classical estimator [9] is often used:

> *M*(*rt*)

where *rt* is the temporal lag, *M*(*rt*) = {*t* + *ht* ∈ *H* and *t* ∈ *H*, such that *rt* − *ht < δt*}, *δ<sup>t</sup>* is the tolerance, *H* is the set of data at different time points (not necessarily equally-spaced)

In the present case study the variogram has been used as an exploratory tool to assess stationarity and periodicity. In particular, sample temporal variogram for *PM*<sup>10</sup> daily observations, shown in Figure 3-a), reproduces the seasonal behavior of the variable under study, which presents an annual periodicity at 365 days. In equation (5) the analytic expression of the periodic variogram model, fitted to the sample variogram for the observed

where *Exp*(·) and *Cos*(·) are the exponential and the cosine variogram models [29],

On the other hand, since the variable under study, is characterized by periodicity, this seasonal component could be factored out. Moving average and monthly averages techniques have been applied in order to obtain *PM*<sup>10</sup> residuals. Note that the FORTRAN program

*Xt*<sup>+</sup>*ht* − *Xt*

*γ*(*ht*) = 265 *Exp*(|*ht*|; 10) + 130 *Cos*(|*ht*|; 365); (5)

<sup>2</sup> (4)

<sup>2</sup> <sup>|</sup>*M*(*rt*)<sup>|</sup> <sup>∑</sup>

*<sup>γ</sup>*ˆ(*rt*) = <sup>1</sup>

"REMOVE" [11] has been used to apply moving average techniques.

limit value (50 / ) fixed by the national law

*g m*3

daily values (

 / )

3 *g m*

> The sample temporal variogram for *PM*<sup>10</sup> daily residuals and the corresponding nonperiodic fitted model (6) are illustrated in Fig. 3-b).

> In both cases (original data and residuals), the behavior of the variogram functions near the origin is assumed to be linear with no nugget effect.

> The goodness of variogram models (5) and (6) has been evaluated through cross-validation, which allows the estimation for *PM*<sup>10</sup> daily concentrations and *PM*<sup>10</sup> residuals, respectively, at all data points. Figure 4 shows the scatter plots of *PM*<sup>10</sup> observed values (a) and *PM*<sup>10</sup> residuals (b) towards the corresponding estimated values. The high values of the linear correlation coefficients (0.783 and 0.780, respectively) confirm the goodness of the above fitted models.

> It is important to point out that the variogram model (5) has been validated using a modified version of the *GSLib* program "KT3D" [13], named "KT3DP". This program has been developed in order to properly define the neighborhood, i.e. the subset of available data used in the kriging system.

> By taking into account the main features of the analyzed pollutant and its temporal behavior (periodicity at 365 days), the kriging routine has been modified and the value at time *t* is estimated by considering data observed


• at the same day of two years before and/or later, (*t* − 2*d*) and (*t* + 2*d*), with *d* = 365 and some days before and/or later, (*t* − 2*d* ± *k*) and (*t* + 2*d* ± *k*), with *k* = 1, 2, 3,

up to a maximum number of eight values.

The variogram model (6), which describes the temporal correlation for *PM*<sup>10</sup> residuals, has been validated using the *GSLib* program "KT3D".

**Figure 4.** Scatter plots between observed and estimated values. (a) Diagram of *PM*<sup>10</sup> daily concentrations towards the estimated ones (b) Diagram of *PM*<sup>10</sup> residuals towards the estimated ones

## **5. Estimation of missing values**

In this section the reconstruction of *PM*10, by using the kriging technique, has been discussed [20, 21, 30, 31].

The reconstruction of temporal data is required if a time series is incomplete. This problem could be due to a malfunction of the monitoring station or the presence of invalid data.

With this aim, six consecutive *PM*<sup>10</sup> values from the 12th to the 17th of June 2011, have been considered as missing, both for the observed time series with a 365-day periodic behavior and the deseasonalized values.

Kriging daily estimations for these missing values have been obtained using, alternatively


Since the time series of the observed values is characterized by a periodic behavior, *GSLib* routine "KT3DP", properly modified in order to define an appropriate neighborhood, has been used with the aim to estimate *PM*<sup>10</sup> daily measurements.

On the other hand, for the deseasonalized time series, *PM*<sup>10</sup> residuals have been estimated by the original version of "KT3D". Finally the periodic component, previously estimated by the moving average and monthly averages techniques, has been added to the estimated residuals, in order to obtain estimates of *PM*<sup>10</sup> daily concentrations.

Time series of estimated missing values, obtained with the periodic variogram model (5) and the nonperiodic variogram model (6), are shown in Fig. 5, together with the time series of *PM*<sup>10</sup> values, observed on June 2011.

**Figure 5.** Time plot of *PM*<sup>10</sup> estimated missing values and *PM*<sup>10</sup> daily concentrations (*µg*/*m*3), from the 12th to the 17th of June 2011

In order to test the validity of the estimation procedure, the linear correlation coefficients have been computed. In particular, the linear correlation coefficient between the *PM*<sup>10</sup> observed values and the corresponding estimates, obtained with the periodic variogram model, is equal to 0.805. On the other hand, the linear correlation coefficient between the residuals and the corresponding estimates, obtained with the nonperiodic variogram model, is equal to 0.831. These results confirm the goodness of the kriging technique as estimator of missing values.

In Table 2 some results of estimation procedure are shown. Note that the mean value of the kriging standard error is lower if the periodic variogram model is used, compared with the kriging results based on the nonperiodic variogram model.

Therefore, the flexibility of kriging to reconstruct the time series has been demonstrated even when the periodic component is not factored out and the temporal correlation is described by a periodic variogram model.
