**2. A non-homogeneous Markov chain model**

The mathematical model considered here may be described as follows. Let *N* > 0 be a natural number representing the number of years in which measurements were taken. Let *Ti*, *i* = 1, 2, . . . , *N*, be natural numbers representing the amount of observations in each year. Hence, we have that for a given year *i*, either *Ti* = 366 or *Ti* = 365, depending on whether or not we have a leap year, *i* = 1, 2, . . . , *N*.

Let *<sup>Z</sup>*(*i*) *<sup>t</sup>* be the ozone concentration on the *t*th day of the *i*th year, *t* = 1, 2, . . . , *Ti*, *i* = 1, 2, . . . , *N*. Following [23], we will set *Ti* = *T* = 366, *i* = 1, 2, . . . , *N*, with the convention that for non leap year, we assign *<sup>Z</sup>*(*i*) *<sup>T</sup>* = 0.

*Remark.* Since, we are taking all years of the same length we will drop the index *i* from the notation.

Denote by *L* > 0 the environmental threshold we are interested in knowing if the ozone concentration has surpassed or not. Define **Y** = {*Yt* : *t* ≥ 0} by,

$$Y\_t = \begin{cases} 0, \text{ if } Z\_t < L \\ 1, \text{ if } Z\_t \ge L. \end{cases} \tag{1}$$

Hence, *Yt* indicates whether or not in the *t*th day the threshold *L* was exceeded.

2 Air Pollution

Markov chain.

Let *<sup>Z</sup>*(*i*)

notation.

In [24], the order of the Markov chain was estimated using auto-correlation function. Its transition matrix was estimated using the maximum likelihood method (see, for instance, [27, 28], among others). In [25], the order of the chain was also considered an unknown quantity that needed to be estimated. The Bayesian approach (see, for example, [29]) was used to estimate the order as well as the transition probabilities of the chain. In particular, the maximum *à posteriori* method was used. In [26], the estimation of the order of the chain is performed using the Bayesian approach using the so-called trans-dimensional Markov chain Monte Carlo algorithm ([30, 31]). The transition matrix of the chain was obtained through the maximum *à posteriori* method. However, the common denominator of those works is that the Markov chain model used was a time homogeneous one. Since ozone data are not, in general, time homogeneous, the data had to be split into time homogeneous segments and

Here, the interest also resides in estimating, for instance, the probability that the ozone measurement will be above a given threshold some days into the future, given where it stands today and in the past few days. Although in the present work we also use Markov chain models and the Bayesian approach, the novelty here is that the time-homogeneous assumption is dropped. Here, we consider a non-homogeneous Markov chain model. We assume that the order of the chain as well as its transition probabilities are unknown and need to be estimated. The chosen method of estimation is also the maximum *à posteriori*. This work is presented as follows. In Section 2 the non-homogeneous Markov chain model is given. Section 3 presents the Bayesian formulation of the model. An application to ozone measurements from Mexico City is given in Section 4. In Section 5 some comments about the methodology and results are made. In an Appendix, before the list of references, we present the code of the programme used to estimate the order and the transition probabilities of the

The mathematical model considered here may be described as follows. Let *N* > 0 be a natural number representing the number of years in which measurements were taken. Let *Ti*, *i* = 1, 2, . . . , *N*, be natural numbers representing the amount of observations in each year. Hence, we have that for a given year *i*, either *Ti* = 366 or *Ti* = 365, depending on whether or

*<sup>t</sup>* be the ozone concentration on the *t*th day of the *i*th year, *t* = 1, 2, . . . , *Ti*, *i* = 1, 2, . . . , *N*. Following [23], we will set *Ti* = *T* = 366, *i* = 1, 2, . . . , *N*, with the convention

*Remark.* Since, we are taking all years of the same length we will drop the index *i* from the

Denote by *L* > 0 the environmental threshold we are interested in knowing if the ozone

0, if *Zt* < *L*

1, if *Zt* <sup>≥</sup> *<sup>L</sup>*. (1)

*<sup>T</sup>* = 0.

*Yt* =

concentration has surpassed or not. Define **Y** = {*Yt* : *t* ≥ 0} by,

the analysis was made for each segment separately.

**2. A non-homogeneous Markov chain model**

not we have a leap year, *i* = 1, 2, . . . , *N*.

that for non leap year, we assign *<sup>Z</sup>*(*i*)

As in [25], we assume that **Y** is ruled by a Markov chain of order *K* ≥ 0. In contrast with that work, in the present case the Markov chain is a non-homogeneous one. Hence, denote by *<sup>X</sup>*(*K*) = {*X*(*K*) *<sup>t</sup>* : *<sup>t</sup>* <sup>=</sup> 1, 2, . . . *<sup>T</sup>*}, the corresponding non-homogeneous Markov chain of order *K*. We assume that *K* has as state space a set S = {0, 1, . . . , *M*}, for some fixed integer *M* ≥ 0, such that, *M* ≤ *T* with probability one.

Note that, *<sup>X</sup>*(*K*) has as state space the set *<sup>S</sup>*(*K*) <sup>1</sup> <sup>=</sup> {(*x*1, *<sup>x</sup>*2,..., *xK*) ∈ {0, 1}*K*}, with *<sup>S</sup>*(0) <sup>1</sup> <sup>=</sup> *<sup>S</sup>*(1) <sup>1</sup> . Also, note that (see [25]), if the set of observed value is (*y*1, *y*2,..., *yT*), then the transition probabilities of *X*(*K*) are such that

$$P(X\_{t+1}^{(K)} = w \mid X\_t^{(K)} = \mathfrak{x}\_t = (y\_{t+1}, y\_{t+2}, \dots, y\_{t+K}))\_{\prime \prime}$$

is different of zero if, and only if, *<sup>w</sup>* = (*yt*+2, *yt*+3,..., *yt*+*K*+1) <sup>∈</sup> *<sup>S</sup>*(*K*) <sup>1</sup> , with 0 <sup>≤</sup> *<sup>t</sup>* <sup>≤</sup> *<sup>T</sup>* <sup>−</sup> *<sup>K</sup>*. Therefore, *<sup>w</sup>* occurs, if and only if, the observation following *yt*+1, *yt*+2,..., *yt*+*K*, is *yt*+*K*+1. This enables us to work with a more treatable state space for *X*(*K*), and therefore, to have a better form for the transition matrix.

Hence, as in [25, 32], we consider the transformed state space *<sup>S</sup>*(*K*) <sup>2</sup> <sup>=</sup> {0, 1, . . . , 2*<sup>K</sup>* <sup>−</sup> <sup>1</sup>}, which is obtained from *<sup>S</sup>*(*K*) <sup>1</sup> by using the transformation *<sup>f</sup>* : *<sup>S</sup>*(*K*) <sup>1</sup> <sup>→</sup> *<sup>S</sup>*(*K*) <sup>2</sup> , given by, *<sup>f</sup>*(*w*1, *<sup>w</sup>*2,..., *wK*) = <sup>∑</sup>*K*−<sup>1</sup> *<sup>l</sup>*=<sup>0</sup> *wl*<sup>+</sup><sup>1</sup> <sup>2</sup>*<sup>l</sup>* . Let (*x*1, *<sup>x</sup>*2,..., *xK*) ↔ *<sup>m</sup>* indicate that the state (*x*1, *<sup>x</sup>*2,..., *xK*) <sup>∈</sup> *<sup>S</sup>*(*K*) <sup>1</sup> corresponds to the state *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*) <sup>2</sup> . Hence, the transition probabilities of *X*(*K*) may be written as (see, for instance, [25]),

$$P\_{\overline{\mathfrak{m}}\mathfrak{j}}^{(K)}(t) = P(\mathbf{Y}\_{t+K+1} = j \mid X\_t^{(K)} = (y\_{t+1}, y\_{t+2}, \dots, y\_{t+K}) \leftrightarrow \overline{\mathfrak{m}}),\tag{2}$$

where *<sup>m</sup>* ∈ *<sup>S</sup>*(*K*) <sup>2</sup> , *<sup>j</sup>* ∈ {0, 1}, and 0 <sup>≤</sup> *<sup>t</sup>* <sup>≤</sup> *<sup>T</sup>* <sup>−</sup> *<sup>K</sup>*.

Now, indicate by *<sup>Q</sup>*(*K*) *<sup>m</sup>* (*t*), *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*) <sup>2</sup> , *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*) <sup>2</sup> , the probability *<sup>P</sup>*(*X*(*K*) *<sup>t</sup>* = *m*). Hence, when *<sup>t</sup>* = 1, we have that *<sup>Q</sup>*(*K*) *<sup>m</sup>* (1), *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*) <sup>2</sup> , is the initial distribution of *<sup>X</sup>*(*K*). When *<sup>K</sup>* <sup>=</sup> 0, we have that *<sup>P</sup>*(0) *mj* (*t*) = *<sup>Q</sup>*(0) *<sup>j</sup>* (*t*), *<sup>t</sup>* <sup>=</sup> 1, 2, . . . , *<sup>T</sup>*, *<sup>j</sup>* <sup>=</sup> 0, 1, *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(0) <sup>2</sup> <sup>=</sup> {0, 1}.

*Remarks.* 1. When *<sup>K</sup>* = 1, we have that *<sup>P</sup>*(0) *mj* (*t*), *<sup>j</sup>* <sup>=</sup> 0, 1, *<sup>t</sup>* <sup>=</sup> 1, 2, . . . , *<sup>T</sup>* <sup>−</sup> *<sup>K</sup>*, are the usual one-step transition probabilities. When *K* = 0, the transition probabilities are just the probabilities *<sup>Q</sup>*(0) *<sup>m</sup>* (*t*), associated to each state *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(0) <sup>2</sup> <sup>=</sup> {0, 1}, with *<sup>t</sup>* <sup>=</sup> 1, 2, . . . , *<sup>T</sup>*.

2. Unless otherwise stated, from now on, we are going to use the state space *<sup>S</sup>*(*K*) <sup>2</sup> and the corresponding transition probabilities.

3. **Y** is going to represent our observed data.

In addition to estimating the order *K* of the Markov chain, we will also estimate its transition probabilities *<sup>P</sup>*(*K*) *mj* (*t*) as well as the probabilities *<sup>Q</sup>*(*K*) *<sup>m</sup>* (*t*), *<sup>j</sup>* ∈ {0, 1}, *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*) <sup>2</sup> , for each *t*. We indicate by *<sup>P</sup>*(*K*)(*t*) = � *<sup>P</sup>*(*K*) *mj* (*t*) � *j*∈{0,1},*m*∈*S*(*K*) 2 , the transition matrix at time *t*. Note that, if *<sup>K</sup>* = 0, then *<sup>P</sup>*(0) *mj* (*t*) = *<sup>Q</sup>*(0) *<sup>j</sup>* (*t*), *<sup>j</sup>* ∈ {0, 1}, *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(0) <sup>2</sup> , *t* = 1, 2, . . . , *T*.

## **3. A Bayesian estimation of the parameters of the model**

There are many ways of estimating the order and the transition matrix of a non-homogeneous Markov chain. One way of estimating the order is via the auto-correlation function associated to the chain throughout the years. Another way is to use the Bayesian approach. When it comes to estimating the transition probabilities we have, for instance, the maximum likelihood method ( [33]) and the empirical estimator ([34]) which are essentially the same. In the present work, we will use the Bayesian approach (see, for instance, [29, 35]) to estimate the order and the transition probabilities. In particular, we are going to adopt the maximum *à posteriori* approach. Inference will be performed using the information provided by the so-called posterior distribution of the parameters. The posterior distribution of a vector of parameters θ given the observed data **D**, indicated by *P*(θ | **D**), is such that *P*(θ | **D**) ∝ *L*(**D** | θ) *P*(θ), where *L*(**D** | θ) is the likelihood function of the model, and *P*(θ) is the prior distribution of the vector θ.

In the present case, we have that the vector of parameter is θ = (*K*, *Q*(*K*)(1), *P*(*K*)(*t*), *t* = 1, 2, . . . , *T* − *K*). If *K* = 0, the range of *t* is {1, 2, . . . , *T*}. The vector θ belongs to the following sample space

$$\Theta = \bigcup\_{K=0}^{M} \left( \{K\} \times \Delta\_2^{2^K} \times \Delta\_2^{(T-K)2^K} \right).$$

where <sup>∆</sup><sup>2</sup> = {(*x*1, *<sup>x</sup>*2) ∈ **<sup>R</sup>**<sup>2</sup> : *xi* ≥ 0, *<sup>i</sup>* = 1, 2, *<sup>x</sup>*<sup>1</sup> + *<sup>x</sup>*<sup>2</sup> = <sup>1</sup>} is the one dimensional simplex. (Note that if we have *K* = 0, then the parametric space reduces to Θ = ∆*<sup>T</sup>* <sup>2</sup> .) In the present case we have **D** = **Y**

Let (*x*1, *<sup>x</sup>*2,..., *xK*) be such that *Yt* <sup>=</sup> *<sup>x</sup>*1. Indicate by *<sup>n</sup>*(*K*) *mi* (*t*) the number of years in which the vector (*x*1, *<sup>x</sup>*2,..., *xK*) corresponding to a state *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*) <sup>2</sup> is followed by the observation *i*, *<sup>i</sup>* = 0, 1. Also define *<sup>n</sup>*(0) *<sup>m</sup>* (*t*), *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(0) <sup>2</sup> <sup>=</sup> {0, 1}, as the number of years in which we have the observation *<sup>m</sup>* at time *<sup>t</sup>*, *<sup>t</sup>* = 1, 2, . . . , *<sup>T</sup>*. Additionally, let *<sup>n</sup>*(*K*) *<sup>m</sup>* indicate the number of years in which the state corresponding to the initial *<sup>K</sup>* days is equivalent to the value *<sup>m</sup>* ∈ *<sup>S</sup>*(*K*) <sup>2</sup> , *<sup>K</sup>* ≥ 0. In the case of *<sup>K</sup>* = 0, we have *<sup>n</sup>*(0) *<sup>m</sup>* <sup>=</sup> *<sup>n</sup>*(1) *<sup>m</sup>* , and *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(0) <sup>2</sup> <sup>=</sup> *<sup>S</sup>*(1) <sup>2</sup> <sup>=</sup> {0, 1}.

Therefore, since a Markovian model is assumed, the likelihood function is given by (see, for instance, [23, 33])

$$L(\mathbf{Y}|\boldsymbol{\theta}) \propto \left(\prod\_{\overline{\mathbf{m}} \in \mathbf{S}\_2^{(K)}} \left[\mathcal{Q}\_{\overline{\mathbf{m}}}^{(K)}(\mathbf{1})\right]^{n\_{\overline{\mathbf{m}}}^{(K)}}\right) \left(\prod\_{t=1}^{T-K} \left[P\_{\overline{\mathbf{m}}0}^{(K)}(t)\right]^{n\_{\overline{\mathbf{m}}}^{(K)}(t)} \left[1 - P\_{\overline{\mathbf{m}}0}^{(K)}(t)\right]^{n\_{\overline{\mathbf{m}}}^{(K)}(t)}\right). \tag{3}$$

Note that when *K* = 0, the expression (3) simplifies to

$$L(\mathbf{Y}|\boldsymbol{\theta}) \propto \prod\_{t=1}^{T} \prod\_{m=0}^{1} \left[ \mathbb{Q}\_{\overline{m}}^{(0)}(t) \right]^{n\_{\overline{m}}^{(0)}(t)} \lambda$$

where *<sup>Q</sup>*(0) *<sup>m</sup>* (*t*) = *<sup>P</sup>*(*X*(0) *<sup>t</sup>* <sup>=</sup> *<sup>m</sup>*) = *<sup>P</sup>*(*Yt* <sup>=</sup> *<sup>m</sup>*), *<sup>t</sup>* <sup>=</sup> 1, 2, . . . , *<sup>T</sup>*, *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(0) <sup>2</sup> <sup>=</sup> {0, 1}.

4 Air Pollution

indicate by *P*(*K*)(*t*) =

*<sup>K</sup>* = 0, then *<sup>P</sup>*(0)

sample space

case we have **D** = **Y**

*<sup>i</sup>* = 0, 1. Also define *<sup>n</sup>*(0)

instance, [23, 33])

*L*(**Y** | θ) ∝

*<sup>K</sup>* ≥ 0. In the case of *<sup>K</sup>* = 0, we have *<sup>n</sup>*(0)

 ∏ *m*∈*S*(*K*) 2

� *<sup>Q</sup>*(*K*) *<sup>m</sup>* (1)

Note that when *K* = 0, the expression (3) simplifies to

� *<sup>P</sup>*(*K*) *mj* (*t*) �

*mj* (*t*) = *<sup>Q</sup>*(0)

the prior distribution of the vector θ.

*j*∈{0,1},*m*∈*S*(*K*) 2

There are many ways of estimating the order and the transition matrix of a non-homogeneous Markov chain. One way of estimating the order is via the auto-correlation function associated to the chain throughout the years. Another way is to use the Bayesian approach. When it comes to estimating the transition probabilities we have, for instance, the maximum likelihood method ( [33]) and the empirical estimator ([34]) which are essentially the same. In the present work, we will use the Bayesian approach (see, for instance, [29, 35]) to estimate the order and the transition probabilities. In particular, we are going to adopt the maximum *à posteriori* approach. Inference will be performed using the information provided by the so-called posterior distribution of the parameters. The posterior distribution of a vector of parameters θ given the observed data **D**, indicated by *P*(θ | **D**), is such that *P*(θ | **D**) ∝ *L*(**D** | θ) *P*(θ), where *L*(**D** | θ) is the likelihood function of the model, and *P*(θ) is

In the present case, we have that the vector of parameter is θ = (*K*, *Q*(*K*)(1), *P*(*K*)(*t*), *t* = 1, 2, . . . , *T* − *K*). If *K* = 0, the range of *t* is {1, 2, . . . , *T*}. The vector θ belongs to the following

{*K*} × ∆2*<sup>K</sup>*

where <sup>∆</sup><sup>2</sup> = {(*x*1, *<sup>x</sup>*2) ∈ **<sup>R</sup>**<sup>2</sup> : *xi* ≥ 0, *<sup>i</sup>* = 1, 2, *<sup>x</sup>*<sup>1</sup> + *<sup>x</sup>*<sup>2</sup> = <sup>1</sup>} is the one dimensional simplex.

in which the state corresponding to the initial *<sup>K</sup>* days is equivalent to the value *<sup>m</sup>* ∈ *<sup>S</sup>*(*K*)

Therefore, since a Markovian model is assumed, the likelihood function is given by (see, for

� *<sup>P</sup>*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*)

�*T*−*K* ∏ *t*=1

*<sup>m</sup>* <sup>=</sup> *<sup>n</sup>*(1)

�*n*(*K*) *m*  <sup>2</sup> <sup>×</sup> <sup>∆</sup>(*T*−*K*) <sup>2</sup>*<sup>K</sup>* 2

*<sup>m</sup>* , and *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(0)

�*n*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*) �

�

<sup>2</sup> <sup>=</sup> {0, 1}, as the number of years in which we have the

<sup>2</sup> <sup>=</sup> *<sup>S</sup>*(1)

<sup>1</sup> − *<sup>P</sup>*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*)

<sup>2</sup> .) In the present

<sup>2</sup> ,

. (3)

*mi* (*t*) the number of years in which

<sup>2</sup> is followed by the observation *i*,

*<sup>m</sup>* indicate the number of years

�*n*(*K*) *<sup>m</sup>*<sup>1</sup> (*t*) �

<sup>2</sup> <sup>=</sup> {0, 1}.

*<sup>j</sup>* (*t*), *<sup>j</sup>* ∈ {0, 1}, *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(0)

**3. A Bayesian estimation of the parameters of the model**

Θ = � *M*

Let (*x*1, *<sup>x</sup>*2,..., *xK*) be such that *Yt* <sup>=</sup> *<sup>x</sup>*1. Indicate by *<sup>n</sup>*(*K*)

the vector (*x*1, *<sup>x</sup>*2,..., *xK*) corresponding to a state *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*)

*<sup>m</sup>* (*t*), *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(0)

observation *<sup>m</sup>* at time *<sup>t</sup>*, *<sup>t</sup>* = 1, 2, . . . , *<sup>T</sup>*. Additionally, let *<sup>n</sup>*(*K*)

*K*=0

�

(Note that if we have *K* = 0, then the parametric space reduces to Θ = ∆*<sup>T</sup>*

, the transition matrix at time *t*. Note that, if

<sup>2</sup> , *t* = 1, 2, . . . , *T*.

The prior distribution of the vector of parameters is given as follows. We assume a prior independence of *P*(*<sup>K</sup>*)(*t*) as functions of *t*. Also, since the forms of *P*(*<sup>K</sup>*)(*t*) and *Q*(*<sup>K</sup>*)(1) depend on the value of *K*, we have that for θ = (*K*, *Q*(*<sup>K</sup>*)(1), *P*(*<sup>K</sup>*)(*t*), *t* = 1, 2, . . . , *T* − *K*),

$$P(\boldsymbol{\theta}) = P(\boldsymbol{Q}^{(K)}(1) \mid \boldsymbol{K}) \left[ \prod\_{t=1}^{T-K} P\left(P^{(K)}(t) \mid \boldsymbol{K}\right) \right] P(\boldsymbol{K})\_t$$

where *P*(*Q*(*<sup>K</sup>*)(1)| *K*) and *P P*(*<sup>K</sup>*)(*t*)| *K* are the prior distributions of the initial distribution *Q*(*<sup>K</sup>*)(1) and of the transition matrix *P*(*<sup>K</sup>*)(*t*) given the order of the chain, respectively, and *P*(*K*) the prior distribution of the order *K*.

*Remark.* When we have *<sup>K</sup>* = 0, the vector of parameters is <sup>θ</sup>′ = (*Q*(0) *<sup>m</sup>* (*t*); *t* = 1, 2, . . . , *T*), whose prior distribution is

$$P(\boldsymbol{\theta}') = \left[ \prod\_{t=1}^{T} P(\boldsymbol{\mathcal{Q}}^{(0)}(t)) \right] \boldsymbol{\wedge}$$

where *<sup>P</sup>*(*Q*(0)(*t*)) is the prior distribution of the probability vector *<sup>Q</sup>*(0)(*t*)=(*Q*(0) *<sup>m</sup>* (*t*), *m* = 0, 1), *t* = 1, 2, . . . , *T*.

Given the nature of transition matrices, we are going to assume that rows are independent. We also assume that, given the order *K* of the chain, each row of the transition matrix *P*(*<sup>K</sup>*)(*t*) will have as prior distribution a Dirichlet distribution with appropriate hyperparameters. Therefore, given that *<sup>K</sup>* = *<sup>k</sup>*, row (*P*(*k*) *<sup>m</sup>*<sup>0</sup> (*t*), *<sup>P</sup>*(*k*) *<sup>m</sup>*<sup>1</sup> (*t*)) has as prior distribution a Dirichlet(*α*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*), *<sup>α</sup>*(*K*) *<sup>m</sup>*<sup>1</sup> (*t*)), *t* = 1, 2, . . . , *T*; i.e.,

$$P\left(P^{(K)}(t)\mid K\right) = \prod\_{\overline{m}\in S\_2^{(K)}} \left(\frac{\Gamma(a\_{\overline{m}0}^{(K)}(t) + a\_{\overline{m}1}^{(K)}(t))}{\Gamma(a\_{\overline{m}0}^{(K)}(t))\,\Gamma(a\_{\overline{m}1}^{(K)}(t))} \left\{ \left[P\_{\overline{m}0}^{(K)}(t)\right]^{a\_{\overline{m}1}^{(K)}(t)-1} \left[P\_{\overline{m}1}^{(K)}(t)\right]^{a\_{\overline{m}1}^{(K)}(t)-1} \right\} \right)$$

for *t* in the appropriate range. In the case of initial distribution *Q*(*<sup>K</sup>*)(1), we also have a Dirichlet prior distribution, but now with hyperparameters (*α*(*K*) *<sup>m</sup>* ; *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*) <sup>2</sup> ).Therefore,

$$P\left(Q^{(K)}(1)\mid K\right) = \frac{\Gamma\left(\sum\_{\overline{m}\in S\_2^{(K)}} \alpha\_{\overline{m}}^{(K)}\right)}{\prod\_{\overline{m}\in S\_2^{(K)}} \Gamma\left(\alpha\_{\overline{m}}^{(K)}\right)} \prod\_{\overline{m}\in S\_2^{(K)}} \left[Q\_{\overline{m}}^{(K)}(1)\right]^{\alpha\_{\overline{m}}^{(K)}-1}.$$

If *K* = 0, then *Q*(0)(*t*) has as prior distribution a Dirichlet distribution with hyperparameters (*α*(0) *<sup>m</sup>* (*t*); *<sup>t</sup>* <sup>=</sup> 1, 2, . . . , *<sup>T</sup>*; *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(0) <sup>2</sup> <sup>=</sup> {0, 1}). Therefore, for any given *<sup>t</sup>*,

$$P\left(\mathcal{Q}^{(0)}(t) \mid K=0\right) = \frac{\Gamma\left(\sum\_{\overline{m}=0}^{1} \alpha\_{\overline{m}}^{(0)}(t)\right)}{\prod\_{\overline{m}=0}^{1} \Gamma\left(\alpha\_{\overline{m}}^{(0)}(t)\right)} \prod\_{\overline{m}=0}^{1} \left[\mathcal{Q}\_{\overline{m}}^{(0)}(t)\right]^{\alpha\_{\overline{m}}^{(0)}(t)-1}.$$

We assume that *K* has as prior distribution a truncated Poisson distribution defined on the set S with rate *λ* > 0; i.e.,

$$P(K) = \frac{\lambda^K}{K!} \ I\_{\mathcal{S}}(K)\_{\mathcal{N}}$$

where *IA*(*x*) = 1, if *<sup>x</sup>* ∈ *<sup>A</sup>* and is zero otherwise.

Therefore, we have from [25, 32, 36], that the conditional posterior distribution of *P*(*K*)(*t*) given *K*, is

$$P\left(P^{(K)}(t)\mid \mathbf{K}, \mathbf{Y}\right) \propto \prod\_{t} \left\{ \prod\_{\overline{m} \in S\_{2}^{(K)}} \left( \left[ P\_{\overline{m}0}^{(K)}(t) \right]^{n\_{\overline{m}0}^{(K)}(t) + a\_{\overline{m}0}^{(K)}(t) - 1} \left[ P\_{\overline{m}1}^{(K)}(t) \right]^{n\_{\overline{m}1}^{(K)}(t) + a\_{\overline{m}1}^{(K)}(t) - 1} \right) \right\}.$$

Hence, *P* � *P*(*K*)(*t*)| *K*, **Y** � is proportional to the product of Dirichlet distributions with hyperparameters (*n*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*) + *<sup>α</sup>*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*), *<sup>n</sup>*(*K*) *<sup>m</sup>*<sup>1</sup> (*t*) + *<sup>α</sup>*(*K*) *<sup>m</sup>*<sup>1</sup> (*t*)). The mode of each Dirichlet distribution is known and is given by (see [37]),

$$P\_{\overline{m}\overline{i}}^{(K)}(t) = \frac{n\_{\overline{m}\overline{i}}^{(K)}(t) + a\_{\overline{m}\overline{i}}^{(K)}(t) - 1}{\sum\_{j=0}^{1} \left[ n\_{\overline{m}\overline{j}}^{(K)}(t) + a\_{\overline{m}\overline{j}}^{(K)}(t) - 1 \right]}, \quad i = 0, 1; \; \overline{m} \in S\_2^{(K)}; \; K \in \mathcal{S}; \; t = 1, 2, \dots, T - K. \tag{4}$$

Additionally, the posterior distribution of the initial distribution *Q*(*K*)(1) given *K* is

$$P(\mathbf{Q}^{(K)}(1) \mid \mathbf{K}, \mathbf{Y}) \propto \prod\_{\overline{m} \in S\_2^{(K)}} \left[ \mathbf{Q}\_{\overline{m}}^{(K)}(1) \right]^{n\_{\overline{m}}^{(K)} + n\_{\overline{m}}^{(K)} - 1}.$$

Therefore, *P*(*Q*(*K*)(1)| *K*, **Y**) is proportional to a Dirichlet distribution with hyperparameters (*α*(*K*) *<sup>m</sup>* <sup>+</sup> *<sup>n</sup>*(*K*) *<sup>m</sup>* ; *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*) <sup>2</sup> ). Hence, as in the case of the posterior distribution of *<sup>P</sup>*(*K*)(*t*), the mode of *P*(*Q*(*K*)(1)| *K*, **Y**) is,

$$\mathbb{Q}\_{\overline{\mathcal{W}}}^{(K)}(1) = \frac{n\_{\overline{\mathcal{W}}}^{(K)} + a\_{\overline{\mathcal{W}}}^{(K)} - 1}{\sum\_{\overline{\mathcal{W}}' \in \mathcal{S}\_2^{(K)}} \left[ n\_{\overline{\mathcal{W}}}^{(K)} + a\_{\overline{\mathcal{W}}}^{(K)} - 1 \right]}, \quad \overline{\mathcal{m}} \in \mathbb{S}\_2^{(K)}; \ K \in \mathcal{S}. \tag{5}$$

When *K* = 0, we have

6 Air Pollution

(*α*(0)

given *K*, is

Hence, *P*

*<sup>P</sup>*(*K*)

(*α*(*K*) *<sup>m</sup>* <sup>+</sup> *<sup>n</sup>*(*K*)

*P* � *P*(*K*)

*<sup>m</sup>* (*t*); *<sup>t</sup>* <sup>=</sup> 1, 2, . . . , *<sup>T</sup>*; *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(0)

*P* � *Q*(0)

set S with rate *λ* > 0; i.e.,

(*t*)| *K*, **Y**

�

*mi* (*t*) = *<sup>n</sup>*(*K*)

∑1 *j*=0 � *<sup>n</sup>*(*K*)

*<sup>m</sup>* ; *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*)

*<sup>Q</sup>*(*K*)

mode of *P*(*Q*(*K*)(1)| *K*, **Y**) is,

with hyperparameters (*n*(*K*)

� ∝ ∏ *t*

*P*(*K*)(*t*)| *K*, **Y**

(*t*)| *K* = 0

where *IA*(*x*) = 1, if *<sup>x</sup>* ∈ *<sup>A</sup>* and is zero otherwise.

 ∏ *m*∈*S*(*K*) 2

�

distribution is known and is given by (see [37]),

*mi* (*t*) + *<sup>α</sup>*(*K*)

*mj* (*t*) + *<sup>α</sup>*(*K*)

*P*(*Q*(*K*)

*<sup>m</sup>* (1) = *<sup>n</sup>*(*K*)

∑*m*′ ∈*S*(*K*) 2 � *<sup>n</sup>*(*K*) *<sup>m</sup>*′ <sup>+</sup> *<sup>α</sup>*(*K*)

�� *<sup>P</sup>*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*)

*<sup>m</sup>*<sup>0</sup> (*t*) + *<sup>α</sup>*(*K*)

*mi* (*t*) <sup>−</sup> <sup>1</sup>

*mj* (*t*) <sup>−</sup> <sup>1</sup>

�

<sup>=</sup> <sup>Γ</sup>(∑<sup>1</sup>

∏<sup>1</sup>

*<sup>P</sup>*(*K*) = *<sup>λ</sup><sup>K</sup>*

If *K* = 0, then *Q*(0)(*t*) has as prior distribution a Dirichlet distribution with hyperparameters

*<sup>m</sup>*=<sup>0</sup> *<sup>α</sup>*(0) *<sup>m</sup>* (*t*))

We assume that *K* has as prior distribution a truncated Poisson distribution defined on the

Therefore, we have from [25, 32, 36], that the conditional posterior distribution of *P*(*K*)(*t*)

�*n*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*)+*α*(*K*)

*<sup>m</sup>*<sup>1</sup> (*t*) + *<sup>α</sup>*(*K*)

� , *<sup>i</sup>* <sup>=</sup> 0, 1; *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*)

*<sup>m</sup>*<sup>0</sup> (*t*), *<sup>n</sup>*(*K*)

Additionally, the posterior distribution of the initial distribution *Q*(*K*)(1) given *K* is

*m*∈*S*(*K*) 2

Therefore, *P*(*Q*(*K*)(1)| *K*, **Y**) is proportional to a Dirichlet distribution with hyperparameters

*<sup>m</sup>* <sup>−</sup> <sup>1</sup>

*<sup>m</sup>*′ <sup>−</sup> <sup>1</sup>

� *<sup>Q</sup>*(*K*) *<sup>m</sup>* (1)

(1)<sup>|</sup> *<sup>K</sup>*, **<sup>Y</sup>**) <sup>∝</sup> ∏

*<sup>m</sup>* <sup>+</sup> *<sup>α</sup>*(*K*)

*<sup>m</sup>*=<sup>0</sup> <sup>Γ</sup>(*α*(0)

<sup>2</sup> <sup>=</sup> {0, 1}). Therefore, for any given *<sup>t</sup>*,

*<sup>m</sup>* (*t*))

*<sup>K</sup>*! *<sup>I</sup>*<sup>S</sup> (*K*),

1 ∏*m*=0

*<sup>m</sup>*<sup>0</sup> (*t*)−<sup>1</sup> �

*<sup>P</sup>*(*K*) *<sup>m</sup>*<sup>1</sup> (*t*)

is proportional to the product of Dirichlet distributions

�*α*(*K*) *<sup>m</sup>* <sup>+</sup>*n*(*K*) *<sup>m</sup>* <sup>−</sup><sup>1</sup> .

<sup>2</sup> ). Hence, as in the case of the posterior distribution of *<sup>P</sup>*(*K*)(*t*), the

� , *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*)

�*n*(*K*) *<sup>m</sup>*<sup>1</sup> (*t*)+*α*(*K*)

*<sup>m</sup>*<sup>1</sup> (*t*)). The mode of each Dirichlet

<sup>2</sup> ; *<sup>K</sup>* ∈ S; *<sup>t</sup>* <sup>=</sup> 1, 2, . . . , *<sup>T</sup>* <sup>−</sup> *<sup>K</sup>*. (4)

<sup>2</sup> ; *<sup>K</sup>* ∈ S. (5)

*<sup>m</sup>*<sup>1</sup> (*t*)−<sup>1</sup>

� .

� *<sup>Q</sup>*(0) *<sup>m</sup>* (*t*)

�*α*(0) *<sup>m</sup>* (*t*)−<sup>1</sup> .

$$P(\mathcal{Q}^{(0)}(t) \mid \mathcal{K} = 0, \mathbf{Y}) \propto \left( \prod\_{\overline{m} = 0}^{1} \left[ \mathcal{Q}\_{\overline{m}}^{(0)}(t) \right]^{n\_{\overline{m}}^{(0)}(t) + a\_{\overline{m}}^{(0)}(t) - 1} \right), \quad t = 1, 2, \dots, T.$$

Therefore, for each *t* = 1, 2, . . . , *T*,

$$Q\_{\overline{\mathcal{W}}}^{(0)}(t) = \frac{n\_{\overline{\mathcal{W}}}^{(0)}(t) + a\_{\overline{\mathcal{W}}}^{(0)}(t) - 1}{\sum\_{\overline{\mathcal{W}}'=0}^{1} \left[ n\_{\overline{\mathcal{W}}}^{(0)}(t) + a\_{\overline{\mathcal{W}}}^{(0)}(t) - 1 \right]}, \quad \overline{m} = 0, 1. \tag{6}$$

Furthermore, we also have, from [25], that

$$L(\mathbf{Y}|K) \propto \frac{\Gamma\left(\sum\_{\overline{m} \in S\_2^{(K)}} a\_{\overline{m}}^{(K)}\right)}{\Gamma\left(\sum\_{\overline{m} \in S\_2^{(K)}} \left[a\_{\overline{m}}^{(K)} + n\_{\overline{m}}^{(K)}\right]\right)} \left(\prod\_{\overline{m} \in S\_2^{(K)}} \frac{\Gamma\left(n\_{\overline{m}}^{(K)} + a\_{\overline{m}}^{(K)}\right)}{\Gamma\left(a\_{\overline{m}}^{(K)}\right)}\right)$$

$$\prod\_{t=1}^{T-K} \left\{ \prod\_{\overline{m} \in S\_2^{(K)}} \left(\frac{\Gamma[a\_{\overline{m}0}^{(K)}(t) + a\_{\overline{m}1}^{(K)}(t)]}{\Gamma(\sum\_{j=0}^{1} [n\_{\overline{m}j}^{(K)}(t) + a\_{\overline{m}j}^{(K)}(t)])} \prod\_{j=0}^{1} \frac{\Gamma(n\_{\overline{m}j}^{(K)}(t) + a\_{\overline{m}j}^{(K)}(t))}{\Gamma(a\_{\overline{m}j}^{(K)}(t))}\right) \right\}.$$

with the appropriate adaptation for the case of *K* = 0. Hence, the posterior distribution of the order *K* is

$$P(K|\mathbf{Y}) = \frac{1}{c} L(\mathbf{Y}|K) \frac{\lambda^K}{K!} \tag{7}$$

where *<sup>c</sup>* <sup>=</sup> <sup>∑</sup>*k*∈S *<sup>L</sup>*(**<sup>Y</sup>** <sup>|</sup> *<sup>K</sup>* <sup>=</sup> *<sup>k</sup>*) � *λk*/*k*! � is the normalising constant.

Therefore, in order to obtain the probability of interest, we just have to use (7) to estimate the value of *K* that maximises that posterior probability, and then use (4), (5), and/or (6) (depending on the case), in order to calculate the corresponding transition matrix and initial distribution, respectively.

The hyperparameters appearing in the prior distribution will be considered known and will be specified later.

#### **4. Application to ozone data from the monitoring network of Mexico City**

In this section we apply the model to the Mexico City's ozone measurements. The data used consist of twenty two years of the daily maximum ozone measurements (from 01 January 1990 to 31 December 2011) provided by the monitoring network of the Metropolitan Area of Mexico City. The Metropolitan Area is divided into five regions, namely, Northeast (NE), Northwest (NW), Centre (CE), Southeast (SE), and Southwest (SW). The monitoring stations are placed throughout the city. Measurements in each monitoring station are obtained minute by minute and the averaged hourly result is reported at each station. The daily maximum measurement for a given region is the maximum over all the maximum averaged values recorded hourly during a 24-hour period by each station placed in the region. Since emergency alerts in Mexico City are declared regionally, we will analyse each region separately.

The Mexican ozone standard considers the threshold 0.11ppm (see [38]). Hence, we will take that value as one of our thresholds. Additionally, for comparison purpose, we will also take the threshold values 0.15ppm and 0.17ppm. One of the reasons for choosing these latter values is that we would like to know what would happen if the threshold for declaring emergency alerts in Mexico City was lowered to 0.17ppm. The reason for choosing the threshold 0.15ppm is because it is an intermediate value between the Mexican standard and 0.17ppm.

During the observational period considered here, we have that the mean of the daily observed measurements were 0.12, 0.098, 0.13, 0.12, and 0.14, in regions NE. NW, CE, SE, and SW, respectively, with corresponding standard deviations of 0.06, 0.04, 0.06, 0.05, and 0.06, for those same regions. The threshold 0.11ppm was either reached or exceeded in 4280, 3139, 4921, 4921, and 5711 days in regions NE, NW, CE, SE, and SW, respectively. In those same regions, the threshold 0.15ppm was reached or exceeded in 2460, 963, 2819, 2299, and 3594 days, and the numbers in the case of the threshold 0.17ppm are, 1769, 479, 1896, 1419, and 2660, respectively.

Even though it is a general belief that ozone measurements depend on the measurements of only a few days in the past, we are taking *M* = 16 when we consider the threshold values 0.15ppm and 0.17ppm. We have decided to do that because in previous works the order for homogeneous segments could have higher order. In the case of *L* = 0.11ppm, in some cases, larger values of *M* were needed. Hence, we also take *M* = 16, in the case of region NW, and we take *M* = 18 in the case of regions CE, NE, SE, and SW. In order to account also for the possibility of low order, we take *λ* = 1 in the prior distribution of *K*.

The hyperparameters of the Dirichlet prior distributions are assigned as in [25]. Therefore, the values of *<sup>α</sup>*(*K*) *mi* (*t*), *<sup>α</sup>*(0) *<sup>m</sup>* (*t*), and *<sup>α</sup>*(*K*) *<sup>m</sup>* will belong to the set {3, 4, 5, 6, 7, 8}. Hence, assign *α*(*K*) *mi* (*t*) = 8 for the coordinate corresponding to the max{*n*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*), *<sup>n</sup>*(*K*) *<sup>m</sup>*<sup>1</sup> (*t*)}. Depending on the difference max{*n*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*), *<sup>n</sup>*(*K*) *<sup>m</sup>*<sup>1</sup> (*t*)} − min{*n*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*), *<sup>n</sup>*(*K*) *<sup>m</sup>*<sup>1</sup> (*t*)}, an integer value in {3, 4, 5, 6, 7} is assigned to the hyperparameter corresponding to min{*n*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*), *<sup>n</sup>*(*K*) *<sup>m</sup>*<sup>1</sup> (*t*)}. If we have *<sup>n</sup>*(*K*) *mi* (*t*) = 0, then the value 3 is automatically assigned to the corresponding *<sup>α</sup>*(*K*) *mi* (*t*). Similar procedure is applied in the cases of *<sup>α</sup>*(0) *<sup>m</sup>* (*t*) and *<sup>α</sup>*(*K*) *<sup>m</sup>* .

Table 1 gives the values of *P*(*K* | **Y**). Even though, S includes the values 0, 1, 2, and 3, since the posterior probabilities at those points are of order 10−<sup>8</sup> and below, we have omitted those values of *K*. We use the symbol "-" to indicate that the specific value of *K* either was not considered in the corresponding region or the probability associated to it was small compared to the values shown.

Looking at Table 1 we may see that, if we consider the threshold *L* = 0.11ppm, then the selected order of the chain is *K* equal to 16 in the case of region NE, equal to 12 for region NW, and equal to 17 for regions CE and SE. When we consider region SW, the value of *K* is

A Non-Homogeneous Markov Chain Model to Study Ozone Exceedances in Mexico City 9


8 Air Pollution

382 Current Air Quality Issues

region separately.

0.17ppm.

2660, respectively.

the values of *<sup>α</sup>*(*K*)

difference max{*n*(*K*)

is applied in the cases of *<sup>α</sup>*(0)

compared to the values shown.

*α*(*K*)

*mi* (*t*), *<sup>α</sup>*(0)

*<sup>m</sup>*<sup>0</sup> (*t*), *<sup>n</sup>*(*K*)

obtained minute by minute and the averaged hourly result is reported at each station. The daily maximum measurement for a given region is the maximum over all the maximum averaged values recorded hourly during a 24-hour period by each station placed in the region. Since emergency alerts in Mexico City are declared regionally, we will analyse each

The Mexican ozone standard considers the threshold 0.11ppm (see [38]). Hence, we will take that value as one of our thresholds. Additionally, for comparison purpose, we will also take the threshold values 0.15ppm and 0.17ppm. One of the reasons for choosing these latter values is that we would like to know what would happen if the threshold for declaring emergency alerts in Mexico City was lowered to 0.17ppm. The reason for choosing the threshold 0.15ppm is because it is an intermediate value between the Mexican standard and

During the observational period considered here, we have that the mean of the daily observed measurements were 0.12, 0.098, 0.13, 0.12, and 0.14, in regions NE. NW, CE, SE, and SW, respectively, with corresponding standard deviations of 0.06, 0.04, 0.06, 0.05, and 0.06, for those same regions. The threshold 0.11ppm was either reached or exceeded in 4280, 3139, 4921, 4921, and 5711 days in regions NE, NW, CE, SE, and SW, respectively. In those same regions, the threshold 0.15ppm was reached or exceeded in 2460, 963, 2819, 2299, and 3594 days, and the numbers in the case of the threshold 0.17ppm are, 1769, 479, 1896, 1419, and

Even though it is a general belief that ozone measurements depend on the measurements of only a few days in the past, we are taking *M* = 16 when we consider the threshold values 0.15ppm and 0.17ppm. We have decided to do that because in previous works the order for homogeneous segments could have higher order. In the case of *L* = 0.11ppm, in some cases, larger values of *M* were needed. Hence, we also take *M* = 16, in the case of region NW, and we take *M* = 18 in the case of regions CE, NE, SE, and SW. In order to account also for the

The hyperparameters of the Dirichlet prior distributions are assigned as in [25]. Therefore,

*<sup>m</sup>*<sup>0</sup> (*t*), *<sup>n</sup>*(*K*)

Table 1 gives the values of *P*(*K* | **Y**). Even though, S includes the values 0, 1, 2, and 3, since the posterior probabilities at those points are of order 10−<sup>8</sup> and below, we have omitted those values of *K*. We use the symbol "-" to indicate that the specific value of *K* either was not considered in the corresponding region or the probability associated to it was small

Looking at Table 1 we may see that, if we consider the threshold *L* = 0.11ppm, then the selected order of the chain is *K* equal to 16 in the case of region NE, equal to 12 for region NW, and equal to 17 for regions CE and SE. When we consider region SW, the value of *K* is

*<sup>m</sup>* will belong to the set {3, 4, 5, 6, 7, 8}. Hence, assign

*<sup>m</sup>*<sup>0</sup> (*t*), *<sup>n</sup>*(*K*)

*<sup>m</sup>*<sup>0</sup> (*t*), *<sup>n</sup>*(*K*)

*<sup>m</sup>*<sup>1</sup> (*t*)}, an integer value in {3, 4, 5, 6, 7} is

*<sup>m</sup>*<sup>1</sup> (*t*)}. Depending on the

*mi* (*t*). Similar procedure

*mi* (*t*) =

*<sup>m</sup>*<sup>1</sup> (*t*)}. If we have *<sup>n</sup>*(*K*)

possibility of low order, we take *λ* = 1 in the prior distribution of *K*.

*<sup>m</sup>*<sup>1</sup> (*t*)} − min{*n*(*K*)

0, then the value 3 is automatically assigned to the corresponding *<sup>α</sup>*(*K*)

*<sup>m</sup>* .

*<sup>m</sup>* (*t*) and *<sup>α</sup>*(*K*)

*<sup>m</sup>* (*t*), and *<sup>α</sup>*(*K*)

*mi* (*t*) = 8 for the coordinate corresponding to the max{*n*(*K*)

assigned to the hyperparameter corresponding to min{*n*(*K*)

**Table 1.** Posterior distribution of the order of the chain for all regions and threshold considered. The symbol "-" is used to indicate that the specific value of *K* either was not considered in the corresponding region or the probability associated to it was small compared to the values shown.

either larger than or equal to 18 with probability one. If we take into account the threshold *L* = 0.15ppm, then, also by looking at Table 1, we have that the chosen orders are 10, 6, 11, 9, and 14, in the cases of regions NE, NW, CE, SE, and SW, respectively. When we consider the threshold *L* = 0.17ppm, then the selected orders are 7, 8, and 9, for regions NE, CE, and SW, respectively. In the cases of regions NW and SE, the estimated order is 5. Therefore, using this information and (4), the corresponding transition and initial probabilities may then be calculated.

As an example, consider the case of region CE and the threshold 0.17ppm. In that case, we have that the order of the chain is *K* = 8. Therefore, *S*(*K*) <sup>2</sup> = {0, 1, . . . , 255}. In Table 2, we have the approximated estimated values of the initial distribution *Q*(*K*) *<sup>m</sup>* (1), and of the transition probabilities *P*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*), *t* = 1, 2. (We have truncated the values and the total sum is approximately one.) We use the notation *m* − *m*, to indicate that for all values of *m* in {*m* , *m* + 1, . . . , *m*}, the estimated probabilities are equal to the values shown.

Looking at Table 2, it is possible to see that the highest initial probability is that associated to the state 0, i.e., the first eight days of the year form a string of zeros, meaning that the concentration levels are below 0.17ppm. Additionally, once you have the information that the ozone concentration levels on the first eight days are below 0.17ppm, then the highest transition probability is also associated to the transition to zeros, i.e., the two days following the eight initial days with concentration below 0.17ppm are more likely to present lower concentration levels as well.

In order to illustrate the type of information that may be obtained using the methodology considered here, take the case of the year 2012 and region CE. Suppose we want to calculate the probability that during the first nine days of January we have that the ozone concentration is below 0.17ppm from the first eight days, and it is above it on the ninth. Therefore, we want to know the probability that (0, 0, 0, 0, 0, 0, 0, 0) is followed by one. Hence, we want the probability of having the following sequence of zeros and ones: 0, 0, 0, 0, 0, 0, 0, 0, 1. Therefore,

$$P((0,0,0,0,0,0,0,0,1)) = P\_{\mathbb{D}1}^{(8)}(1) \times Q\_{\mathbb{D}}^{(8)}(1) = 0.238 \times 0.0327 \approx 0.008.$$

10 Air Pollution


**Table 2.** Transition probabilities *P*(8) *<sup>m</sup>*<sup>0</sup> (1) and *<sup>P</sup>*(8) *<sup>m</sup>*<sup>0</sup> (2) as well as the initial probabilities *<sup>Q</sup>*(8) *<sup>m</sup>* (1), for all values of *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*) 2 in the case of region CE and threshold 0.17ppm. The notation *m* − *m* is used to indicate that for all values of *m* in {*m* , *m* + 1, . . . , *m*}, the estimated probabilities are equal to the values shown.

In order to obtain the values of the probabilities of interest, recall that *P*(*K*) *<sup>m</sup>*<sup>0</sup> (*t*) = <sup>1</sup> <sup>−</sup> *<sup>P</sup>*(*K*) *<sup>m</sup>*<sup>1</sup> (*t*), *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*) <sup>2</sup> , *<sup>t</sup>* <sup>=</sup> 1, 2, . . . , *<sup>T</sup>* <sup>−</sup> *<sup>K</sup>*. Therefore, looking at Table 2, we have that *<sup>P</sup>*(8) <sup>01</sup> (1) is one minus the value on the column corresponding to *P*(*K*) *<sup>m</sup>*<sup>0</sup> (1) with *m* = 0. Similar comment is valid in the case of *Q*(8) *<sup>m</sup>* (1), *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*) <sup>2</sup> .

Suppose now that we want to know the probability of having (0, 0, 0, 0, 0, 0, 0, 0) followed by (1, 1). Hence, we want to know what the probability that (0, 0, 0, 0, 0, 0, 0, 0) is followed by one and that (0, 0, 0, 0, 0, 0, 0, 1) is followed by one. Therefore, we need to calculate

$$P((0,0,0,0,0,0,0,0,1,1)) = P\_{\overline{255}1}^{(8)}(2) \times P\_{\overline{01}}^{(8)}(1) \times Q\_{\overline{0}}^{(8)}(1) = 0.5 \times 0.238 \times 0.0327 \approx 0.0004.$$

Proceeding in this way we may calculate the probability of having any string of states at any time of the year.

If we compare to the actual measurements in the year 2012, then we have that in the first ten days, the sequence **Y**, in the case of region CE, has the configuration 0, 0, 0, 0, 0, 0, 0, 0, 0, 0. In fact, the estimated probability of that sequence of zeros and ones is 0.5 × 0.762 × 0.0327 ≈ 0.0125 which is three times higher than the probability of having (0, 0, 0, 0, 0, 0, 0, 0) followed by (1, 1). If we consider also the year 2013, the results are similar. Hence, the methodology used here can produce estimated values that may describe well the behaviour of the data.

## **5. Conclusion**

10 Air Pollution

**Table 2.** Transition probabilities *P*(8)

{*m*

*<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*)

the case of *Q*(8)

time of the year.

*m Q*(8)

*<sup>m</sup>*<sup>0</sup> (1) and *<sup>P</sup>*(8)

, *m* + 1, . . . , *m*}, the estimated probabilities are equal to the values shown.

the value on the column corresponding to *P*(*K*)

<sup>2</sup> .

*<sup>m</sup>* (1), *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*)

*P*((0, 0, 0, 0, 0, 0, 0, 0, 1, 1)) = *P*(8)

In order to obtain the values of the probabilities of interest, recall that *P*(*K*)

255 1(2) <sup>×</sup> *<sup>P</sup>*(8)

<sup>2</sup> , *<sup>t</sup>* <sup>=</sup> 1, 2, . . . , *<sup>T</sup>* <sup>−</sup> *<sup>K</sup>*. Therefore, looking at Table 2, we have that *<sup>P</sup>*(8)

*<sup>m</sup>* (1) *m P*(8)

0 0.0327 0 0.762 0.889 1 − 15 0.0036 1 − 11 0.5 0.5 16 0.0073 12 0.5 0.286 17 − 23 0.0036 13 − 62 0.5 0.5 24 0.0073 63 0.286 0.8 25 − 27 0.0036 64 − 127 0.5 0.5 28 0.0073 128 0.5 0.444 29 − 31 0.0036 129 − 158 0.5 0.5 32 0.0073 159 0.5 0.286 33 − 62 0.0036 160 − 247 0.5 0.5 63 0.0073 248 0.286 0.5 63 − 125 0.0036 249 0.5 0.5 126 0.0073 250 0.286 0.5 127 − 191 0.0036 251 0.5 0.6 192 0.0073 252 − 253 0.5 0.286 193 − 241 0.0036 254 − 255 0.5 0.5 242 0.0073 ––– 243 0.0036 ––– 244 0.0073 ––– 245 − 247 0.0036 ––– 248 0.0073 ––– 249 0.0036 ––– 250 0.0073 ––– 251 − 255 0.0036 –––

*<sup>m</sup>*<sup>0</sup> (2) as well as the initial probabilities *<sup>Q</sup>*(8)

in the case of region CE and threshold 0.17ppm. The notation *m* − *m* is used to indicate that for all values of *m* in

Suppose now that we want to know the probability of having (0, 0, 0, 0, 0, 0, 0, 0) followed by (1, 1). Hence, we want to know what the probability that (0, 0, 0, 0, 0, 0, 0, 0) is followed by one and that (0, 0, 0, 0, 0, 0, 0, 1) is followed by one. Therefore, we need to calculate

<sup>01</sup> (1) <sup>×</sup> *<sup>Q</sup>*(8)

Proceeding in this way we may calculate the probability of having any string of states at any

If we compare to the actual measurements in the year 2012, then we have that in the first ten days, the sequence **Y**, in the case of region CE, has the configuration 0, 0, 0, 0, 0, 0, 0, 0, 0, 0. In fact, the estimated probability of that sequence of zeros and ones is 0.5 × 0.762 × 0.0327 ≈ 0.0125 which is three times higher than the probability of having (0, 0, 0, 0, 0, 0, 0, 0) followed by (1, 1). If we consider also the year 2013, the results are similar. Hence, the methodology used here can produce estimated values that may describe well the behaviour of the data.

*<sup>m</sup>*<sup>0</sup> (1) *<sup>P</sup>*(8)

*<sup>m</sup>*<sup>0</sup> (2)

*<sup>m</sup>* (1), for all values of *<sup>m</sup>* <sup>∈</sup> *<sup>S</sup>*(*K*)

*<sup>m</sup>*<sup>0</sup> (*t*) = <sup>1</sup> <sup>−</sup> *<sup>P</sup>*(*K*)

*<sup>m</sup>*<sup>0</sup> (1) with *m* = 0. Similar comment is valid in

<sup>0</sup> (1) = 0.5 × 0.238 × 0.0327 ≈ 0.0004.

<sup>01</sup> (1) is one minus

2

*<sup>m</sup>*<sup>1</sup> (*t*),

In this work we have considered a non-homogeneous Markov chain model to study the ozone's behaviour in Mexico City. The interest resides in estimating the probability that the ozone level will be above (below) a certain threshold given that it is either above or below it in the present and in the recent past. Due to the nature of the questions asked here, a natural way of trying to answer them is to use Markov chain models. However, due to the non-homogeneity of the data, a non-homogeneous version of the chain is used.

**Figure 1.** Proportion of years in which, for a given day, the threshold 0.11ppm was exceeded by the ozone concentration.

Using the Bayesian approach a maximum *à posteriori* estimation of the order of the matrix as well as its transition matrix and initial distribution was made. The results have shown that higher order should be considered for the chain. One explanation for that could be the way the empirical probability of having an exceedance in a given day behaves. That can be seen in Figure 1 where, as an illustration, the proportion of exceedances of the threshold 0.11ppm is presented for each region. The values correspond to the proportion of years in which in a fixed day the threshold was exceeded. As we vary the days, we have the behaviour throughout the 366 days. In that figure we have in the horizontal axis the days and in the vertical axis we have the values of *prop*(*t*)=(1/*N*) ∑*<sup>N</sup> <sup>i</sup>*=<sup>1</sup> *<sup>Y</sup>*(*i*) *<sup>t</sup>* , which represents the proportion of years with an exceedance in the *<sup>t</sup>*th day, *<sup>t</sup>* = 1, 2, . . . , *<sup>T</sup>*. The notation *<sup>Y</sup>*(*i*) *<sup>t</sup>* is used to indicate the variable *Yt* defined in (1) on the *i*th year.

The plots in Figure 1 reflect well the fact that in region SW, in most of the days of the year, there are exceedances of the threshold 0.11. We may also see the influence of the seasons of the year. The hill between days 100 and 200 appearing in every plot, corresponds to measurements taken between April and June. Higher values occur during the days corresponding to approximately mid April to mid May. Those months are in the middle of Spring. During this season it does rain much in Mexico City. Additionally, there is a lot of sunlight. Hence, the ozone concentration is bound to be high, and as a consequence, the proportion of years in which exceedances occur at that period is large. The values decrease when the raining season starts (around the beginning of June).

If we consider the threshold values 0.15ppm and 0.17ppm, the behaviour of the proportion of years where in a given day exceedances occurred is similar to the case of 0.11ppm. The difference is that the values of the proportions are smaller. It is possible to see that the proportion of exceedances may vary according to the seasons of the year, and that, within a given season, changes are, in general, not drastic. Therefore, it is possible that measurements from more than a few days may have an influence on the behaviour of future measurements, and with that, make the estimation method considered here to produce high values for the order of the chain.
