$$f\_t(y\_t|\mathcal{F}\_{t-1}; \pi\_t, \omega\_t) = \begin{cases} \omega\_t + (1 - \omega\_t)(1 - \pi\_t)^{n\_t}, & \text{if } y\_t = 0, \\ (1 - \omega\_t) \binom{n\_t}{y\_t} \pi\_t^{y\_t} (1 - \pi\_t)^{n\_t - y\_t}, & \text{if } y\_t > 0. \end{cases} \tag{5}$$

Similarly, $\omega_t$ and $\pi_t$ represent the zero-inflation parameter and the intensity parameter, respectively. Both parameters are modeled via logit link functions. Specifically, we assume that

$$\text{logit}(\omega\_t) = \mathbf{x}\_{1,t}^{\mathrm{T}}\boldsymbol{\gamma}, \tag{6}$$

$$\text{logit}(\pi\_t) = \mathbf{x}\_{2,t}^{\mathrm{T}}\boldsymbol{\beta} + \sum\_{j=1}^{p} \phi\_j y\_{t-j}, \tag{7}$$

where $\mathbf{x}_{1,t}$ and $\mathbf{x}_{2,t}$ are sets of time-dependent explanatory variables for the corresponding vectors of regression coefficients $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$, and $\boldsymbol{\phi} = [\phi_1, \dots, \phi_p]^{\mathrm{T}}$ is a vector of autoregressive coefficients corresponding to the past responses $[y_{t-1}, \dots, y_{t-p}]^{\mathrm{T}}$. For simplicity, we treat the zero-inflation parameter $\omega_t$ as a constant that does not vary over time. In the observation-driven ZIB model, serial correlation is accommodated by introducing lagged values of the response into the linear predictor.

The partial data likelihood of the observed series is

$$\mathrm{PL}(\boldsymbol{\theta}) = \prod\_{t=1}^{n} f\_t\left(y\_t | \mathcal{F}\_{t-1}\right), \tag{8}$$

where $\boldsymbol{\theta} = [\boldsymbol{\beta}^{\mathrm{T}}, \boldsymbol{\phi}^{\mathrm{T}}, \boldsymbol{\gamma}^{\mathrm{T}}]^{\mathrm{T}}$ is the vector of unknown parameters. The partial likelihood does not require the derivation of the joint distribution of the response and the covariates, and is largely simplified relative to the full likelihood. This approach facilitates conditional inference for a fairly large class of transitional processes where the response depends on its past values.

The log-likelihood for the observation-driven ZIB model is

$$\log \mathrm{PL}(\boldsymbol{\theta}) = \sum\_{t=1}^{n} \log\left\{ \omega\_t I(y\_t = 0) + (1 - \omega\_t) \binom{n\_t}{y\_t} \pi\_t^{y\_t} (1 - \pi\_t)^{n\_t - y\_t} \right\}. \tag{9}$$

The vector $\hat{\boldsymbol{\theta}}$ obtained by maximizing the partial likelihood is called the maximum partial likelihood estimator (MPLE).

Similar to Section 2.1, we can apply the EM algorithm or the Newton–Raphson method to obtain the MPLE. This estimation can be conveniently carried out in practice using standard software tools available for fitting classical ZIB models. In SAS, we can use the finite mixture models (FMM) procedure to fit the observation-driven ZIB model, while in R, we can use the function gamlss in the generalized additive models for location, scale and shape (GAMLSS) package. Hypothesis testing for $\boldsymbol{\theta}$ is carried out through the partial likelihood method. The common tests are based on Wald statistics, score statistics, and partial likelihood ratio statistics. All of these tests are conducted within the framework of classical maximum likelihood inference.
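As an illustration, the MPLE can also be obtained by direct numerical maximization of Eq. (9). The sketch below, in Python with NumPy/SciPy, assumes arrays `y`, `nt`, `x1`, and `x2` are already available; all names are illustrative, and the SAS FMM and R gamlss routes above remain the standard tools.

```python
# Sketch: maximum partial likelihood for the observation-driven ZIB model,
# under assumed inputs y (counts), nt (binomial sizes), x1, x2 (covariates).
import numpy as np
from scipy.special import comb, expit

def neg_log_pl(theta, y, nt, x1, x2, p):
    """Negative log partial likelihood of Eq. (9); theta packs [gamma, beta, phi]."""
    k1, k2 = x1.shape[1], x2.shape[1]
    gamma, beta, phi = theta[:k1], theta[k1:k1 + k2], theta[k1 + k2:]
    # Lagged responses enter the intensity linear predictor, Eq. (7).
    lags = np.column_stack([np.r_[np.zeros(j), y[:len(y) - j]]
                            for j in range(1, p + 1)])
    omega = expit(x1 @ gamma)                       # Eq. (6)
    pi = expit(x2 @ beta + lags @ phi)              # Eq. (7)
    pmf = comb(nt, y) * pi**y * (1.0 - pi)**(nt - y)
    f = np.where(y == 0, omega + (1 - omega) * pmf, (1 - omega) * pmf)
    return -np.sum(np.log(f))

# MPLE: scipy.optimize.minimize(neg_log_pl, theta0,
#                               args=(y, nt, x1, x2, p), method="BFGS")
```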

## 3. Parameter-driven ZIB models

### 3.1. Model formulation
An alternative approach to describe binomial time series with excess zeros is based on parameter-driven ZIB models. This class of models can be viewed as an analogue of the parameter-driven ZIP models presented by Yang et al. [5].

To account for temporal dynamics in the series, we introduce a latent stationary autoregressive process $\{z_t\}$ of order $p$ (AR($p$)):

$$z\_t = \sum\_{i=1}^p \phi\_i z\_{t-i} + \varepsilon\_t. \tag{10}$$

Here, $\varepsilon_t$ is a Gaussian white noise process with mean 0 and variance $\sigma^2$. The coefficient $\phi_i$ describes how the past state $z_{t-i}$ relates to the current state $z_t$.

Let $y_t$ be the observed count at time $t$. Given the current state $z_t$, the count response $y_t$ is assumed to follow a ZIB distribution with the probability mass function

$$f\_t(y\_t|z\_t; \pi\_t, \omega\_t) = \begin{cases} \omega\_t + (1 - \omega\_t)(1 - \pi\_t)^{n\_t}, & \text{if } y\_t = 0, \\ (1 - \omega\_t) \binom{n\_t}{y\_t} \pi\_t^{y\_t} (1 - \pi\_t)^{n\_t - y\_t}, & \text{if } y\_t > 0. \end{cases} \tag{11}$$

Similar to the previous model parameterizations, $\omega_t$ and $\pi_t$ represent the zero-inflation parameter and the intensity parameter, respectively. Both parameters are modeled via logit link functions and could be time-varying. To relate the intensity parameter $\pi_t$ to the latent component $z_t$, we use the model

$$\text{logit}(\pi\_t) = \mathbf{x}\_t^{\mathrm{T}}\boldsymbol{\beta} + z\_t, \tag{12}$$

where $\mathbf{x}_t$ is a set of explanatory variables observed at time $t$, and $\boldsymbol{\beta}$ is the corresponding vector of regression coefficients. In the present setting, $\mathbf{x}_t$ is assumed fixed. For simplicity, we treat the zero-inflation parameter $\omega_t$ as a constant that does not vary over time.

For the parameter-driven ZIB model, the conditional mean and variance of the response variable $y_t$ are given by

$$\mathrm{E}\left(Y\_t|z\_t\right) = (1 - \omega\_t) n\_t \pi\_t, \tag{13}$$

$$\operatorname{Var}(Y\_t|z\_t) = (1 - \omega\_t) n\_t \pi\_t \left[1 - \pi\_t(1 - \omega\_t n\_t)\right]. \tag{14}$$

Obviously, the presence of zero-inflation ($\omega_t > 0$) not only explains the excess zeros in the series, but also introduces overdispersion. Additionally, the correlated random effects $z_t$ contribute to the extra variance.
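To make the data-generating mechanism of Eqs. (10)–(12) concrete, here is a minimal simulation sketch, assuming an AR(1) latent process and a constant $\omega$; all parameter values and variable names are illustrative.

```python
# Sketch: simulate one series from the parameter-driven ZIB model.
import numpy as np

rng = np.random.default_rng(1)
T, nt = 500, 20                        # series length, binomial size n_t
phi1, sigma, omega = 0.7, 0.5, 0.25    # AR(1) coefficient, noise sd, zero-inflation
beta = np.array([-1.0, 0.8])
x = np.column_stack([np.ones(T), rng.normal(size=T)])   # covariates x_t

z = np.zeros(T)                        # latent AR(1) process, Eq. (10)
for t in range(1, T):
    z[t] = phi1 * z[t - 1] + rng.normal(scale=sigma)

pi = 1.0 / (1.0 + np.exp(-(x @ beta + z)))   # logit link, Eq. (12)
u = rng.binomial(1, omega, size=T)           # latent membership indicators
y = (1 - u) * rng.binomial(nt, pi)           # ZIB response, Eq. (11)
print(f"proportion of zeros: {(y == 0).mean():.2f}")
```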

We can write the parameter-driven ZIB model in the following hierarchical form:

$$\mathbf{s}\_t|\mathbf{s}\_{t-1} \sim \mathcal{N}\_p(\boldsymbol{\Phi}\mathbf{s}\_{t-1}, \boldsymbol{\Sigma}), \tag{15}$$

$$u\_t \sim \text{Bernoulli}(\omega), \tag{16}$$

$$y\_t | \mathbf{s}\_t, u\_t \sim \text{Binomial}\left(n\_t, (1 - u\_t)\pi\_t\right), \tag{17}$$

where $\mathbf{s}_t = [z_t, \dots, z_{t-p+1}]^{\mathrm{T}}$ is a $p$-dimensional state vector with $z_t$ as its first element, $u_t$ is an unobservable membership indicator that determines whether the response comes from a degenerate distribution or an ordinary binomial distribution, $\boldsymbol{\Phi}$ is an unknown transition matrix, and $\boldsymbol{\Sigma}$ is the covariance matrix of the state noise process. The process $\mathbf{s}_t$ is initiated with a normal vector $\mathbf{s}_0$ that has mean $\boldsymbol{\mu}_0$ and covariance matrix $\boldsymbol{\Sigma}_0$. Diffuse priors are often assigned to $\mathbf{s}_0$ in practice. Given the two unobserved latent processes $\mathbf{s}_t$ and $u_t$, we can conceptualize a sequential update of the response variable $y_t$.

In Eq. (15), $\boldsymbol{\Phi}$ and $\boldsymbol{\Sigma}$ are $p \times p$ matrices defined as follows:

$$\begin{aligned} \boldsymbol{\Phi} = \begin{bmatrix} \phi\_1 & \phi\_2 & \cdots & \phi\_{p-1} & \phi\_p \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix}. \end{aligned} \tag{18}$$

The transition matrix $\boldsymbol{\Phi}$ governs the generation of the state vector $\mathbf{s}_t$ from the past state $\mathbf{s}_{t-1}$ for time points $t = 1, \dots, n$. Note that the covariance matrix $\boldsymbol{\Sigma}$ in Eq. (18) is not positive definite. This is both legitimate and common in the state-space modeling approach.
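For concreteness, the companion-form matrices of Eq. (18) and one draw from the state equation (15) can be set up as in the sketch below; the order $p$ and the coefficient values are illustrative.

```python
# Sketch: build Phi and Sigma of Eq. (18) and apply one state transition.
import numpy as np

phi = np.array([0.5, 0.3, -0.1])   # illustrative AR coefficients, p = 3
sigma = 0.5
p = len(phi)

Phi = np.zeros((p, p))
Phi[0, :] = phi                    # first row carries phi_1, ..., phi_p
Phi[1:, :-1] = np.eye(p - 1)       # sub-diagonal shifts the state down

Sigma = np.zeros((p, p))
Sigma[0, 0] = sigma**2             # singular: only z_t receives noise

rng = np.random.default_rng(0)
s_prev = rng.normal(size=p)        # stand-in for s_{t-1}
eps = np.zeros(p)
eps[0] = rng.normal(scale=sigma)   # draw consistent with the singular Sigma
s_t = Phi @ s_prev + eps           # one transition, Eq. (15)
```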

### 3.2. Parameter estimation via MCEM algorithm

#### 3.2.1. Model fitting

To fit the parameter-driven ZIB model, in principle, one would first obtain the marginal likelihood of the observed data y1,…, yn by integrating out unobserved components. However, because of the presence of correlated random effects and the non-Gaussian nature of the response, these integrals are not analytically tractable. Therefore, approximations or numerical solutions for the maximum likelihood estimates (MLEs) are necessary. Instead of obtaining the MLEs based on the marginal likelihood, we propose an EM algorithm [23], which relies on the complete-data likelihood to estimate the parameters.

Let $y_{1:t} = [y_1, y_2, \dots, y_t]^{\mathrm{T}}$ denote the vector of observed data from time point 1 through $t$. In a similar fashion, let $\mathbf{s}_{0:t} = [\mathbf{s}_0, \mathbf{s}_1, \dots, \mathbf{s}_t]^{\mathrm{T}}$ and $u_{1:t} = [u_1, u_2, \dots, u_t]^{\mathrm{T}}$ denote the vectors of the two latent processes over the same time frame. Let $\boldsymbol{\theta} = [\omega, \boldsymbol{\beta}^{\mathrm{T}}, \boldsymbol{\phi}^{\mathrm{T}}, \sigma]^{\mathrm{T}}$ denote the vector of unknown parameters.

To develop an EM algorithm for parameter estimation of the mixture model, Eqs. (15)–(17), we begin by formulating the complete-data likelihood, i.e., the joint density of $\mathbf{s}_{0:n}$, $u_{1:n}$, and $y_{1:n}$. The two latent processes $\mathbf{s}_{0:n}$ and $u_{1:n}$ are considered missing. Based on the state-space representation, the complete-data likelihood may be orthogonally decomposed as follows:

$$\begin{split} L\_c(\boldsymbol{\theta}) &= f\left(\mathbf{s}\_{0:n}, u\_{1:n}, y\_{1:n}\right) \\ &= f(\mathbf{s}\_{0:n}, u\_{1:n}) f\left(y\_{1:n} | \mathbf{s}\_{0:n}, u\_{1:n}\right) \\ &= f(\mathbf{s}\_{0:n}) f(u\_{1:n}) f\left(y\_{1:n} | \mathbf{s}\_{0:n}, u\_{1:n}\right) \\ &= f(\mathbf{s}\_0) \prod\_{t=1}^n f(\mathbf{s}\_t | \mathbf{s}\_{t-1}) \prod\_{t=1}^n f(u\_t) \prod\_{t=1}^n f(y\_t | \mathbf{s}\_t, u\_t) .\end{split} \tag{19}$$

Here, the initial state vector $\mathbf{s}_0$ is assumed to be normally distributed with mean vector $\boldsymbol{\mu}_0$ and covariance matrix $\boldsymbol{\Sigma}_0$. In implementing the algorithm, we set $\boldsymbol{\mu}_0 = \mathbf{0}$ and $\boldsymbol{\Sigma}_0 = \mathbf{I}_p$, as the effect of the starting values of $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$ on the estimated parameters $\boldsymbol{\theta}$ is negligible.

Up to an additive constant, the complete-data log-likelihood is given by

$$\begin{split} l\_c(\boldsymbol{\theta}) &= -\frac{n}{2} \log \sigma^{2} - \frac{1}{2\sigma^{2}} \sum\_{t=1}^{n} \left( z\_{t} - \boldsymbol{\phi}^{\mathrm{T}} \mathbf{s}\_{t-1} \right)^{2} \\ &+ \sum\_{t=1}^{n} \left\{ u\_{t} \log \omega + (1 - u\_{t}) \log \left( 1 - \omega \right) \right\} \\ &+ \sum\_{t=1}^{n} (1 - u\_{t}) \left\{ y\_{t} \mathbf{x}\_{t}^{\mathrm{T}} \boldsymbol{\beta} - n\_{t} \log \left( 1 + \exp \left( \mathbf{x}\_{t}^{\mathrm{T}} \boldsymbol{\beta} + z\_{t} \right) \right) \right\}. \end{split} \tag{20}$$

The complete-data log-likelihood is the sum of three functionally independent parameter blocks, $l_c(\boldsymbol{\theta}) = l(\boldsymbol{\phi}, \sigma \mid \mathbf{s}_t) + l(\omega \mid u_t) + l(\boldsymbol{\beta} \mid \mathbf{s}_t, u_t)$, which eases the maximization in the M-step for each set of parameters.
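As a sketch of this separability, the function below evaluates Eq. (20) as the sum of the three blocks, assuming latent draws are available: `s` is an $(n+1) \times p$ array holding $\mathbf{s}_0, \dots, \mathbf{s}_n$ and `u` holds $u_1, \dots, u_n$; all names are illustrative.

```python
# Sketch: complete-data log-likelihood of Eq. (20), up to an additive constant.
import numpy as np

def complete_loglik(y, nt, x, s, u, beta, phi, sigma, omega):
    z = s[1:, 0]                        # z_t is the first element of s_t
    resid = z - s[:-1] @ phi            # z_t - phi^T s_{t-1}
    l_state = (-0.5 * len(z) * np.log(sigma**2)
               - np.sum(resid**2) / (2 * sigma**2))
    l_mix = np.sum(u * np.log(omega) + (1 - u) * np.log(1 - omega))
    eta = x @ beta + z                  # x_t^T beta + z_t
    l_obs = np.sum((1 - u) * (y * (x @ beta) - nt * np.log1p(np.exp(eta))))
    return l_state + l_mix + l_obs      # l(phi,sigma|s) + l(omega|u) + l(beta|s,u)
```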

To implement the EM algorithm, we need to compute the conditional expectation of $l_c(\boldsymbol{\theta})$ given the observed data $y_{1:n}$. Deriving an analytical form for the conditional expectation is not feasible due to the nonlinear forms in the latent variables and the response, as well as the non-Gaussian distributions of the response and the latent indicators. Many numerical methods are available to approximate the conditional expectation, such as the Markov chain Monte Carlo (MCMC) algorithm [24], the MCEM algorithm [7, 25], the penalized quasi-likelihood (PQL) method [2], and integrated nested Laplace approximations (INLA) [26]. Following Yang et al. [5], we develop an MCEM algorithm to approximate the conditional expectation.

To simplify the notation, let $\mathbf{A}_t^{(j)}$, $\mathbf{b}_t^{(j)}$, $c_t^{(j)}$, $d_t^{(j)}$, $e_t^{(j)}$, and $f_t^{(j)}$ denote the conditional expectations of $\mathbf{s}_{t-1}\mathbf{s}_{t-1}^{\mathrm{T}}$, $z_t\mathbf{s}_{t-1}$, $z_t^2$, $u_t$, $(1-u_t)\log\left(1+\exp(\mathbf{x}_t^{\mathrm{T}}\boldsymbol{\beta}+z_t)\right)$, and $(1-u_t)\exp(\mathbf{x}_t^{\mathrm{T}}\boldsymbol{\beta}+z_t)/\left(1+\exp(\mathbf{x}_t^{\mathrm{T}}\boldsymbol{\beta}+z_t)\right)$, respectively, evaluated at $\boldsymbol{\theta}^{(j)}$. In the Monte Carlo E-step of the algorithm, we first compute the conditional expectation of $l_c(\boldsymbol{\theta})$:

$$\begin{split} Q\left(\boldsymbol{\theta}|\boldsymbol{\theta}^{(j)}\right) &= E\left\{l\_c(\boldsymbol{\theta})|y\_{1:n}, \boldsymbol{\theta}^{(j)}\right\} \\ &= -\frac{n}{2}\log\sigma^{2} - \frac{1}{2\sigma^{2}}\sum\_{t=1}^{n}\left(c\_{t}^{(j)} - 2\boldsymbol{\phi}^{\mathrm{T}}\mathbf{b}\_{t}^{(j)} + \boldsymbol{\phi}^{\mathrm{T}}\mathbf{A}\_{t}^{(j)}\boldsymbol{\phi}\right) \\ &+ \sum\_{t=1}^{n}\left\{d\_{t}^{(j)}\log\omega + \left(1 - d\_{t}^{(j)}\right)\log\left(1 - \omega\right)\right\} \\ &+ \sum\_{t=1}^{n}\left\{\left(1 - d\_{t}^{(j)}\right)y\_{t}\mathbf{x}\_{t}^{\mathrm{T}}\boldsymbol{\beta} - n\_{t}e\_{t}^{(j)}\right\}, \end{split} \tag{21}$$

where particle filtering and smoothing techniques are used to approximate the conditional expectations. The details of the particle methods for the parameter-driven ZIB model are presented in Section 3.3.

The following partial derivatives are applied to maximize $Q(\boldsymbol{\theta}\mid\boldsymbol{\theta}^{(j)})$ in the M-step:

$$\frac{\partial \mathcal{Q}}{\partial \omega} = \frac{1}{\omega} \sum\_{t=1}^{n} d\_t^{(j)} - \frac{1}{1 - \omega} \sum\_{t=1}^{n} \left( 1 - d\_t^{(j)} \right), \tag{22}$$

$$\frac{\partial \mathcal{Q}}{\partial \phi} = \frac{1}{\sigma^2} \sum\_{t=1}^{n} \left( \mathbf{b}\_t^{(j)} - \mathbf{A}\_t^{(j)} \phi \right), \tag{23}$$

$$\frac{\partial \mathcal{Q}}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum\_{t=1}^{n} \left( c\_t^{(j)} - 2\boldsymbol{\phi}^{\mathrm{T}} \mathbf{b}\_t^{(j)} + \boldsymbol{\phi}^{\mathrm{T}} \mathbf{A}\_t^{(j)} \boldsymbol{\phi} \right), \tag{24}$$

$$\begin{split} \frac{\partial \mathcal{Q}}{\partial \boldsymbol{\beta}} &= \frac{\partial E\left\{ l\_c(\boldsymbol{\theta})|y\_{1:n}, \boldsymbol{\theta}^{(j)} \right\}}{\partial \boldsymbol{\beta}} \\ &= E\left( \frac{\partial l\_c(\boldsymbol{\theta})}{\partial \boldsymbol{\beta}} \Big| y\_{1:n}, \boldsymbol{\theta}^{(j)} \right) \\ &= E\left( \sum\_{t=1}^{n} \left\{ (1 - u\_t) y\_t - n\_t (1 - u\_t) \frac{\exp\left(\mathbf{x}\_t^{\mathrm{T}}\boldsymbol{\beta} + z\_t\right)}{1 + \exp\left(\mathbf{x}\_t^{\mathrm{T}}\boldsymbol{\beta} + z\_t\right)} \right\} \mathbf{x}\_t \Big| y\_{1:n}, \boldsymbol{\theta}^{(j)} \right) \\ &= \sum\_{t=1}^{n} \left\{ \left(1 - d\_t^{(j)}\right) y\_t - n\_t f\_t^{(j)} \right\} \mathbf{x}\_t. \end{split} \tag{25}$$

At the $j$th iteration, we obtain the following closed-form solutions for $\omega^{(j+1)}$, $\boldsymbol{\phi}^{(j+1)}$, and $\sigma^{(j+1)}$:

$$
\omega^{(j+1)} = \frac{1}{n} \sum\_{t=1}^{n} d\_t^{(j)},\tag{26}
$$

$$\phi^{(j+1)} = \left(\sum\_{t=1}^{n} \mathbf{A}\_t^{(j)}\right)^{-1} \sum\_{t=1}^{n} \mathbf{b}\_t^{(j)}.\tag{27}$$

$$\sigma^{(j+1)} = \sqrt{\frac{1}{n} \left\{ \sum\_{t=1}^{n} c\_t^{(j)} - \left(\sum\_{t=1}^{n} \mathbf{b}\_t^{(j)}\right)^{\mathrm{T}} \left(\sum\_{t=1}^{n} \mathbf{A}\_t^{(j)}\right)^{-1} \sum\_{t=1}^{n} \mathbf{b}\_t^{(j)} \right\}}. \tag{28}$$

In addition, we can easily compute $\boldsymbol{\beta}^{(j+1)}$ through iterative algorithms such as Broyden–Fletcher–Goldfarb–Shanno (BFGS). Once we acquire the particle smoothers from the smoothing step, we can obtain the MCEM estimates by plugging in the sample means of the functions of particle smoothers for the conditional expectations.
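A sketch of the resulting M-step follows, taking the particle-approximated conditional expectations as precomputed arrays; the shapes and names are assumptions for illustration.

```python
# Sketch: closed-form M-step updates of Eqs. (26)-(28).
import numpy as np

def m_step(A, b, c, d):
    """A: (n, p, p); b: (n, p); c, d: (n,) conditional expectations at step j."""
    n = len(d)
    omega_new = d.mean()                                  # Eq. (26)
    A_sum, b_sum = A.sum(axis=0), b.sum(axis=0)
    phi_new = np.linalg.solve(A_sum, b_sum)               # Eq. (27)
    sigma_new = np.sqrt((c.sum() - b_sum @ phi_new) / n)  # Eq. (28)
    return omega_new, phi_new, sigma_new

# beta^(j+1) has no closed form; pass the gradient of Eq. (25) to a
# quasi-Newton routine, e.g. scipy.optimize.minimize(..., method="BFGS").
```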

To offset the slow convergence and to reduce the computational cost of the EM algorithm, starting with good initial parameters is essential. For the proposed parameter-driven ZIB model, we suggest using the estimates of the parameters from a classical ZIB model or from the observation-driven ZIB model discussed in Section 2.2.

#### 3.2.2. Standard errors

Standard errors of the parameter estimators can be obtained either by using the inverse of the observed information to approximate the variance/covariance matrix, or by employing a collection of replicated bootstrapped parameter estimates. Given the computational cost of the MCEM algorithm, we pursue the first approach by applying Louis's formula [27] to compute the observed information matrix Io(θ). Based on the missing information principle, we have

$$\mathbf{I}\_o(\boldsymbol{\theta}) = \mathbf{I}\_c(\boldsymbol{\theta}) - \mathbf{I}\_m(\boldsymbol{\theta}),\tag{29}$$

where Ic(θ) and Im(θ) are defined as follows:

$$\mathbf{I}\_c(\boldsymbol{\theta}) = E\left( -\frac{\partial^2 l\_c}{\partial \boldsymbol{\theta} \partial \boldsymbol{\theta}^{\mathrm{T}}} \Big| y\_{1:n} \right), \tag{30}$$

$$\mathbf{I}\_m(\boldsymbol{\theta}) = E\left(\frac{\partial l\_c}{\partial \boldsymbol{\theta}} \frac{\partial l\_c}{\partial \boldsymbol{\theta}^{\mathrm{T}}} \Big| y\_{1:n}\right) - E\left(\frac{\partial l\_c}{\partial \boldsymbol{\theta}} \Big| y\_{1:n}\right) E\left(\frac{\partial l\_c}{\partial \boldsymbol{\theta}^{\mathrm{T}}} \Big| y\_{1:n}\right). \tag{31}$$

The first-order derivatives of $l_c(\boldsymbol{\theta})$ are given by

$$\frac{\partial l\_c}{\partial \omega} = \frac{1}{\omega} \sum\_{t=1}^{n} u\_t - \frac{1}{1 - \omega} \sum\_{t=1}^{n} (1 - u\_t), \tag{32}$$

$$\frac{\partial l\_c}{\partial \boldsymbol{\beta}} = \sum\_{t=1}^{n} (1 - u\_t) \left\{ y\_t - n\_t \frac{\exp\left(\mathbf{x}\_t^{\mathrm{T}}\boldsymbol{\beta} + z\_t\right)}{1 + \exp\left(\mathbf{x}\_t^{\mathrm{T}}\boldsymbol{\beta} + z\_t\right)} \right\} \mathbf{x}\_t, \tag{33}$$

$$\frac{\partial l\_c}{\partial \boldsymbol{\phi}} = \frac{1}{\sigma^2} \sum\_{t=1}^{n} \left( z\_t - \boldsymbol{\phi}^{\mathrm{T}} \mathbf{s}\_{t-1} \right) \mathbf{s}\_{t-1}, \tag{34}$$

$$\frac{\partial l\_c}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum\_{t=1}^{n} \left( z\_t - \boldsymbol{\phi}^{\mathrm{T}} \mathbf{s}\_{t-1} \right)^2. \tag{35}$$

The second-order derivatives of $l_c(\boldsymbol{\theta})$ are given by

$$\frac{\partial^2 l\_c}{\partial \omega \partial \omega} = -\frac{1}{\omega^2} \sum\_{t=1}^{n} u\_t - \frac{1}{(1-\omega)^2} \sum\_{t=1}^{n} (1-u\_t), \tag{36}$$

$$\frac{\partial^2 l\_c}{\partial \boldsymbol{\beta} \partial \boldsymbol{\beta}^{\mathrm{T}}} = -\sum\_{t=1}^{n} (1 - u\_t) n\_t \frac{\exp\left(\mathbf{x}\_t^{\mathrm{T}}\boldsymbol{\beta} + z\_t\right)}{\left[1 + \exp\left(\mathbf{x}\_t^{\mathrm{T}}\boldsymbol{\beta} + z\_t\right)\right]^2} \mathbf{x}\_t \mathbf{x}\_t^{\mathrm{T}}, \tag{37}$$

$$\frac{\partial^2 l\_c}{\partial \boldsymbol{\phi} \partial \boldsymbol{\phi}^{\mathrm{T}}} = -\frac{1}{\sigma^2} \sum\_{t=1}^{n} \mathbf{s}\_{t-1} \mathbf{s}\_{t-1}^{\mathrm{T}}, \tag{38}$$

$$\frac{\partial^2 l\_c}{\partial \sigma \partial \sigma} = \frac{n}{\sigma^2} - \frac{3}{\sigma^4} \sum\_{t=1}^{n} \left( z\_t - \boldsymbol{\phi}^{\mathrm{T}} \mathbf{s}\_{t-1} \right)^2, \tag{39}$$

$$\frac{\partial^2 l\_c}{\partial \boldsymbol{\phi} \partial \sigma} = -\frac{2}{\sigma^3} \sum\_{t=1}^{n} \left( z\_t - \boldsymbol{\phi}^{\mathrm{T}} \mathbf{s}\_{t-1} \right) \mathbf{s}\_{t-1}. \tag{40}$$

Again, particle filtering and smoothing techniques are used to approximate the conditional expectations in $\mathbf{I}_c(\boldsymbol{\theta})$ and $\mathbf{I}_m(\boldsymbol{\theta})$.

In principle, the variance/covariance matrix can be approximated by taking the inverse of the observed information matrix. However, the computation of the inverse is often problematic. As indicated by Kim and Stoffer [25], the observed information matrix is not guaranteed to be numerically positive definite. To address this problem, we slightly modify Louis's formula by introducing a slack variable ξ, such that

$$\mathbf{I}\_o(\boldsymbol{\theta}) = \mathbf{I}\_c(\boldsymbol{\theta}) - (1 - \boldsymbol{\xi})\mathbf{I}\_m(\boldsymbol{\theta}),\tag{41}$$

where $\xi$ ranges from 0 to 1. In practice, we can iteratively increase this value until the observed information matrix can be inverted.
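One simple way to operationalize this is sketched below: start at $\xi = 0$ and increase it until a Cholesky factorization of the modified matrix succeeds. The step size and the Cholesky test are implementation choices, not prescribed by the text.

```python
# Sketch: inflate the slack variable xi in Eq. (41) until I_o is invertible.
import numpy as np

def observed_information(I_c, I_m, step=0.05):
    """Return the smallest-xi modified observed information that is pos. definite."""
    xi = 0.0
    while xi <= 1.0:
        I_o = I_c - (1.0 - xi) * I_m          # Eq. (41)
        try:
            np.linalg.cholesky(I_o)           # succeeds iff I_o is pos. definite
            return I_o, xi
        except np.linalg.LinAlgError:
            xi += step
    raise np.linalg.LinAlgError("I_o not positive definite even at xi = 1")

# Standard errors then follow from np.sqrt(np.diag(np.linalg.inv(I_o))).
```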

### 3.3. Particle methods

Particle filtering [21] and particle smoothing [22] belong to the class of sequential Monte Carlo (SMC) methods [28]. These particle methods can be viewed as the nonlinear and non-Gaussian extensions of the popular Kalman filtering and smoothing algorithms for traditional state-space models. Rather than yielding a single estimate of the filter or the smoother, as computed through conventional Kalman filtering and smoothing, particle methods provide a set of particles with associated weights to approximate the conditional densities governing the filters and smoothers. Implemented via sequential importance sampling (SIS) in the E-step of the EM algorithm, particle methods provide approximate solutions to the intractable integrals corresponding to the conditional expectations of functions of the latent components given the observed data. However, sample degeneracy is a typical problem for SIS methods: degeneracy occurs when most particles carry weights close to zero, rendering their contributions to the conditional density negligible. Resampling (e.g., bootstrapping) offers a recourse for eliminating particles with negligible effects. Kim [29] provides an elegant treatment of particle filtering and smoothing for state-space models.

#### Particle filtering


For the parameter-driven ZIB model, we implement particle filtering by first generating $\mathbf{s}_{0|0}^{(i)} \sim \mathcal{N}_p(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$ for $i = 1, \dots, N$. Then for $t = 1, \dots, n$:

(F.1) Generate the one-step-ahead particles $\mathbf{s}_{t|t-1}^{(i)} \sim \mathcal{N}_p\big(\boldsymbol{\Phi}\mathbf{s}_{t-1|t-1}^{(i)}, \boldsymbol{\Sigma}\big)$ and $u_{t|t-1}^{(i)} \sim \text{Bernoulli}(\omega)$, following Eqs. (15) and (16).

(F.2) Compute the filtering weights
$$q\_{t|t-1}^{(i)} \propto \binom{n\_t}{y\_t} \left( \left(1 - u\_{t|t-1}^{(i)}\right) \pi\_{t|t-1}^{(i)}\right)^{y\_t} \left(1 - \left(1 - u\_{t|t-1}^{(i)}\right) \pi\_{t|t-1}^{(i)}\right)^{n\_t - y\_t}, \tag{42}$$

where $\text{logit}\big(\pi_{t|t-1}^{(i)}\big) = \mathbf{x}_t^{\mathrm{T}}\boldsymbol{\beta} + z_{t|t-1}^{(i)}$, and $z_{t|t-1}^{(i)}$ is the first element of $\mathbf{s}_{t|t-1}^{(i)}$.

(F.3) Generate $\big(\mathbf{s}_{t|t}^{(i)}, u_{t|t}^{(i)}\big)$ by resampling $\big(\mathbf{s}_{t|t-1}^{(i)}, u_{t|t-1}^{(i)}\big)$ with replacement based on the preceding filtering weights.

As a byproduct of the particle filtering, the observed-data log-likelihood can be approximated by

$$\sum\_{t=1}^{n} \log \left( \frac{1}{N} \sum\_{i=1}^{N} q\_{t|t-1}^{(i)} \right), \tag{43}$$

where N is the number of particles in the filtering step.
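Putting steps (F.1)–(F.3) together, a bootstrap-style implementation might look as follows; the propagation in (F.1) is written out under the hierarchical form (15)–(17), and all function and variable names are illustrative.

```python
# Sketch: particle filter for the parameter-driven ZIB model, accumulating
# the log-likelihood approximation of Eq. (43).
import numpy as np
from scipy.special import expit
from scipy.stats import binom

def particle_filter(y, nt, x, Phi, Sigma0, beta, sigma, omega, N=1000, rng=None):
    rng = rng or np.random.default_rng()
    T, p = len(y), Phi.shape[0]
    s = rng.multivariate_normal(np.zeros(p), Sigma0, size=N)  # s_0|0 particles
    loglik = 0.0
    for t in range(T):
        # (F.1) propagate through the state equation and draw memberships
        eps = np.zeros((N, p))
        eps[:, 0] = rng.normal(scale=sigma, size=N)
        s = s @ Phi.T + eps
        u = rng.binomial(1, omega, size=N)
        # (F.2) filtering weights, Eq. (42)
        pi = expit(x[t] @ beta + s[:, 0])
        q = binom.pmf(y[t], nt[t], (1 - u) * pi)
        loglik += np.log(q.mean())            # accumulates Eq. (43)
        # (F.3) resample with replacement
        idx = rng.choice(N, size=N, p=q / q.sum())
        s, u = s[idx], u[idx]
    return loglik
```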

#### Particle smoothing

Next, we employ the particle smoothing algorithm proposed by Godsill et al. [22] to obtain the conditional expectations of the functions of the latent variables given the complete set of observed data. In this step, we first choose $\big(\mathbf{s}_{n|n}^{(r)}, u_{n|n}^{(r)}\big) = \big(\mathbf{s}_{n|n}^{(i)}, u_{n|n}^{(i)}\big)$ with probability $q_{n|n-1}^{(i)}$. Then for $t = n-1, \dots, 1$:

(S.1) Calculate the smoothing weights

$$q\_{t|n}^{(i)} \propto q\_{t|t-1}^{(i)} \exp\left\{ -\frac{1}{2\sigma^2} \left( z\_{t+1|n}^{(i)} - \boldsymbol{\phi}^{\mathrm{T}} \mathbf{s}\_{t|t}^{(i)} \right)^2 \right\} \omega^{u\_{t+1|n}^{(i)}} (1 - \omega)^{1 - u\_{t+1|n}^{(i)}}. \tag{44}$$

(S.2) Choose $\big(\mathbf{s}_{t|n}^{(r)}, u_{t|n}^{(r)}\big) = \big(\mathbf{s}_{t|t}^{(i)}, u_{t|t}^{(i)}\big)$ with probability $q_{t|n}^{(i)}$.

We obtain independent realizations by repeating the preceding process for $r = 1, \dots, R$.
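A sketch of one backward pass is given below, assuming the filtered particles, membership indicators, and pre-resampling weights from each time step were stored during the forward pass; the storage layout and names are assumptions. Note that the Bernoulli factor $\omega^{u}(1-\omega)^{1-u}$ in Eq. (44) is constant across particles once $u_{t+1|n}$ is fixed, so it cancels after normalization; it is kept here for fidelity to the formula.

```python
# Sketch: one smoothed trajectory via the backward weights of Eq. (44).
import numpy as np

def smooth_one_path(s_filt, u_filt, q_filt, phi, sigma, omega, rng):
    """s_filt: list of (N, p) arrays; u_filt, q_filt: lists of (N,) arrays."""
    N = s_filt[0].shape[0]
    i = rng.choice(N, p=q_filt[-1] / q_filt[-1].sum())   # draw at t = n
    path_s, path_u = [s_filt[-1][i]], [u_filt[-1][i]]
    for t in range(len(s_filt) - 2, -1, -1):             # t = n-1, ..., 1
        z_next, u_next = path_s[-1][0], path_u[-1]       # z_{t+1|n}, u_{t+1|n}
        w = q_filt[t] * np.exp(-(z_next - s_filt[t] @ phi) ** 2 / (2 * sigma**2))
        w = w * omega**u_next * (1 - omega) ** (1 - u_next)   # Eq. (44)
        i = rng.choice(N, p=w / w.sum())                 # step (S.2)
        path_s.append(s_filt[t][i])
        path_u.append(u_filt[t][i])
    return np.array(path_s[::-1]), np.array(path_u[::-1])
```

Repeating this draw $R$ times yields the independent smoothed realizations whose sample means approximate the conditional expectations needed in the E-step.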
