1. Introduction

Multiple time series are of considerable interest in an array of domains, such as finance, economics, and engineering. The data are collected in time order and consist of several related variables of interest, for instance, stock price indexes or status data from critical instruments such as space shuttles. It is of much practical significance to model this kind of data well. Moreover, many commonly seen multiple time series are correlated, which makes it reasonable to regard them as a single vector and to fit them using multivariate models. Multivariate models perform well in exploring the interdependencies among multiple time series and in capturing their dynamic structure.

Plenty of contributions have been made in the field of parametric models for multivariate time series. For instance, Sims proposed vector autoregressive (VAR) models in 1980 [1], Engle and Kroner considered multivariate generalized autoregressive conditional heteroscedastic (GARCH) models in 1995 [2], and Tsay developed the multivariate threshold models in 1998 [3]. Compared

© 2018 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


to parametric models, nonparametric models require fewer assumptions about the model structure and are more flexible. Combined with the fact that nonlinearity is widespread in time series, this makes nonparametric models attractive for modeling multiple time series. However, relatively little has been achieved in this direction, partly due to the complexity of nonparametric smoothing as well as the curse of dimensionality. With these objectives in mind, Jiang proposed the multivariate functional-coefficient model in 2014 [4], which provides a useful tool for modeling vector time series data.

In this chapter, we first review some vector time series models, next extend them to include an error-correction term by incorporating cointegration among integrated variables, then develop a single index model for choosing the smoothing variable and a variable selection method for the multivariate functional-coefficient models, and finally study multivariate time-varying coefficient models and related hypothesis testing problems.

The remainder of this chapter is organized as follows. In Section 2, we review vector autoregressive (VAR) models. In Section 3, we consider multivariate functional-coefficient regression models and their extensions, where a model selection rule is also proposed. In Section 4, we introduce multivariate time-varying coefficient models and propose a generalized likelihood ratio test. In Section 5, we conclude and discuss some interesting open research topics.

## 2. Review of VAR models

The vector autoregressive model is a generalization of the univariate autoregressive model for forecasting a vector of time series. This model was pioneered by Sims [1] and has acquired a prominent role in analyzing macroeconomic time series. Prior to 1980, large-scale dynamic simultaneous equations models (DSEMs), often containing dozens or even hundreds of equations, were widely used in empirical macroeconomics, and as the economic environment grew more complicated, these models grew ever larger. Sims argued that the exclusion restrictions used to identify simultaneous equations models make little sense, and that variables should not be dichotomized a priori into "endogenous" and "exogenous." Thus, he advocated the vector autoregressive (VAR) model for describing the interrelationships among a set of macroeconomic variables. In a VAR model, each variable is a linear function of past lags of itself and past lags of the other variables. Sims demonstrated that VARs provide a flexible and tractable framework for analyzing economic time series. While relying only lightly on economic theory, VAR models have proven efficient in capturing the dynamics of multivariate systems as well as in forecasting [1]. Specifically, a vector autoregressive model of order p [VAR(p)] has the following general form:

$$y\_t = c + A\_1 y\_{t-1} + \dots + A\_p y\_{t-p} + e\_t \tag{1}$$

where $y\_t = (y\_{1t}, \dots, y\_{Kt})^T$ is a set of $K$ time series variables, $c$ is a $K \times 1$ vector of constants, the $A\_i$ are $K \times K$ coefficient matrices, and the $e\_t$ are error terms. Usually, the $e\_t$ are assumed to be zero-mean independent white noise with a time-invariant, positive-definite covariance matrix $\Sigma$. For example, a VAR(1) model with two time series components can be written as:

$$
\begin{pmatrix} y\_{1t} \\ y\_{2t} \end{pmatrix} = \begin{pmatrix} c\_1 \\ c\_2 \end{pmatrix} + \begin{pmatrix} A\_{11} & A\_{12} \\ A\_{21} & A\_{22} \end{pmatrix} \begin{pmatrix} y\_{1,t-1} \\ y\_{2,t-1} \end{pmatrix} + \begin{pmatrix} e\_{1,t} \\ e\_{2,t} \end{pmatrix},
$$

or the equation set


94 Time Series Analysis and Applications


$$\begin{aligned} y\_{1t} &= c\_1 + A\_{11} y\_{1, t-1} + A\_{12} y\_{2, t-1} + e\_{1, t}, \\ y\_{2t} &= c\_2 + A\_{21} y\_{1, t-1} + A\_{22} y\_{2, t-1} + e\_{2, t} \end{aligned}$$
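As a quick illustration, the bivariate VAR(1) system above can be simulated and used for one-step-ahead forecasting. A minimal sketch; the parameter values below are our own illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
c = np.array([0.1, -0.2])                 # intercept vector c
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])                # coefficient matrix A (stable)

n = 500
y = np.zeros((n, 2))
for t in range(1, n):
    e_t = rng.standard_normal(2)          # white-noise error e_t
    y[t] = c + A @ y[t - 1] + e_t         # y_t = c + A y_{t-1} + e_t

y_hat = c + A @ y[-1]                     # one-step-ahead point forecast
```

The same recursion extends to VAR(p) by summing over p lagged coefficient matrices.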

Using lag-operator L, Eq. (1) can be written as the following form:

$$y\_t = c + \left(A\_1 L + A\_2 L^2 + \dots + A\_p L^p \right) y\_t + e\_t \tag{2}$$

Let $A(z) = I - A\_1 z - A\_2 z^2 - \dots - A\_p z^p$, where $z$ is a complex number. Then the VAR process is stable if

$$\det(A(z)) \neq 0 \quad \text{for } |z| \le 1. \tag{3}$$

In other words, the determinant of the matrix polynomial has no roots in and on the complex unit circle. If the stability conditions are satisfied and the process can be extended to the infinite past, then the VAR process is stationary.
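Numerically, condition (3) is equivalent to requiring that every eigenvalue of the VAR companion matrix lie strictly inside the unit circle, which is easy to check. A minimal sketch (the function name is ours):

```python
import numpy as np

def is_stable(A_list):
    """Return True if det(I - A1 z - ... - Ap z^p) has no roots with
    |z| <= 1, checked via the equivalent condition that all eigenvalues
    of the companion matrix have modulus strictly less than one."""
    K = A_list[0].shape[0]
    p = len(A_list)
    companion = np.zeros((K * p, K * p))
    companion[:K, :] = np.hstack(A_list)          # [A1 A2 ... Ap] on top
    if p > 1:
        companion[K:, :-K] = np.eye(K * (p - 1))  # identity shift blocks
    return bool(np.all(np.abs(np.linalg.eigvals(companion)) < 1))

# Illustrative coefficient matrix (not from the text):
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
```

For this `A1` the eigenvalues are about 0.57 and 0.23, so `is_stable([A1])` returns `True`; an identity coefficient matrix (a unit root) fails the check.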

For model (1), since the right-hand side consists of only predetermined variables and the error terms are assumed to be independent white noise with time-invariant covariance, each equation can be estimated by ordinary least squares (OLS). Zellner proved that the OLS estimator coincides with the generalized LS (GLS) estimator [5].
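Equation-by-equation OLS can be sketched as follows; because every equation shares the same regressors, this coincides with the GLS estimator, as in Zellner's result quoted above. The function name and data layout are our own conventions:

```python
import numpy as np

def fit_var_ols(y, p):
    """OLS estimation of a VAR(p), one equation at a time.
    y: (n, K) array of observations.
    Returns (c_hat, [A1_hat, ..., Ap_hat])."""
    n, K = y.shape
    # Stack an intercept and p lags of all K variables as regressors.
    X = np.hstack([np.ones((n - p, 1))] +
                  [y[p - i:n - i] for i in range(1, p + 1)])
    Y = y[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)     # (1 + K p) x K
    c_hat = B[0]
    A_hats = [B[1 + K * (i - 1): 1 + K * i].T for i in range(1, p + 1)]
    return c_hat, A_hats
```

On simulated data from a stable VAR(1), the estimates converge to the true intercept and coefficient matrix as the sample grows.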

The celebrated model (1) is easy to fit, and its autoregressive structure allows one to study feedback effects and Granger causality. However, model (1) employs only the lagged values of $y\_t$ for forecasting and ignores the effects of other potentially important variables. In addition, its coefficients remain constant over time, which may contradict real situations in which the dynamic structure of the relationship among different time series evolves with time.

### 3. Multivariate functional-coefficient regression models and extensions

We briefly reviewed VAR models in the previous section. This parametric method has been significantly developed and widely applied to econometric dynamics as well as other domains. An alternative for modeling vector time series is the nonparametric method, which requires far fewer assumptions on the model structure and may shed light on later parametric fitting. To illustrate the basic idea of this approach, let us begin with the multivariate threshold autoregressive model [3].

#### 3.1. Multivariate threshold autoregressive model

The multivariate threshold autoregressive model is a generalization of the univariate threshold autoregressive model [6]. The idea is to partition a one-dimensional threshold variable into $s$ regimes and impose an AR model with exogenous variables in each regime. Consider a $k$-dimensional time series $y\_t = (y\_{1t}, \dots, y\_{kt})^T$ and a $v$-dimensional exogenous variable $x\_t = (x\_{1t}, \dots, x\_{vt})^T$, for $t = 1, \dots, n$. Let $-\infty = r\_0 < r\_1 < \cdots < r\_s = \infty$. The multivariate threshold model with threshold variable $z\_t$ and delay $d$ has the following form:

$$y\_t = c\_j + \sum\_{i=1}^{p} \phi\_i^{(j)} y\_{t-i} + \sum\_{i=1}^{q} \beta\_i^{(j)} x\_{t-i} + \varepsilon\_t^{(j)} \quad \text{if } r\_{j-1} < z\_{t-d} \le r\_j \; (j = 1, \dots, s), \tag{4}$$

where $p$ and $q$ are nonnegative integers and $\varepsilon\_t^{(j)} = \Sigma\_j^{1/2} a\_t$, with $\Sigma\_j^{1/2}$ being a positive-definite matrix and $\{a\_t\}$ a sequence of serially uncorrelated random vectors with mean zero and covariance matrix $I\_k$. The threshold variable $z\_t$ is assumed to be stationary and to have a continuous distribution.

Model (4) is piecewise linear in the threshold space of $z\_{t-d}$, but it is nonlinear when $s > 1$ [3]. This model has proven useful in practice. Nevertheless, an assumption embedded in the model weakens its practicability: the coefficients are assumed to be constant within each regime of $z\_{t-d}$. This assumption is questionable, since economic conditions tend to change slowly over time and the coefficient functions may vary smoothly. Motivated by this, Jiang proposed the multivariate functional-coefficient model, in which the coefficients are functions of the threshold variable $z\_{t-d}$ instead of constants [4].
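For intuition, a self-exciting two-regime special case of model (4) (univariate, $k = 1$, $p = 1$, no exogenous $x\_t$, with the threshold $r$ and delay $d$ treated as known) can be fit by splitting the sample on $z\_{t-d}$ and running OLS within each regime; in practice $r$ and $d$ are chosen by grid search [3]. A minimal sketch with our own function name:

```python
import numpy as np

def fit_threshold_ar(y, r, d=1):
    """Per-regime OLS for a univariate two-regime threshold AR(1) with
    self-exciting threshold variable z_{t-d} = y_{t-d}.
    Returns [(c_1, phi_1), (c_2, phi_2)] for regimes z <= r and z > r."""
    t = np.arange(d, len(y))
    z = y[t - d]                               # threshold variable
    params = []
    for mask in (z <= r, z > r):               # regimes j = 1, 2
        X = np.column_stack([np.ones(mask.sum()), y[t - 1][mask]])
        coef, *_ = np.linalg.lstsq(X, y[t][mask], rcond=None)
        params.append(coef)
    return params
```

Each regime's intercept and AR coefficient are recovered consistently from data generated by the corresponding piecewise-linear mechanism.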

#### 3.2. Multivariate functional-coefficient models

The multivariate functional-coefficient model has the following form:

$$y\_t = c(z\_{t-d}) + \sum\_{i=1}^{p} \phi\_i(z\_{t-d}) y\_{t-i} + \sum\_{i=1}^{q} \beta\_i(z\_{t-d}) x\_{t-i} + \varepsilon\_t, \tag{5}$$

where $c(\cdot)$ is a $k \times 1$ functional vector, the $\phi\_i(\cdot)$ are $k \times k$ functional matrices, and the $\beta\_i(\cdot)$ are $k \times v$ functional matrices. The innovation satisfies $\varepsilon\_t = \sigma\_t^{*} a\_t$, where $\sigma\_t^{*}$ is a positive-definite matrix and $\{a\_t\}$ is as in Eq. (4). Assume that $\sigma\_t^{*}$ is measurable with respect to the $\sigma$-field generated by the historical information $\mathcal{F}\_{t-1} = \{(w\_j, z\_{j-d}) : j \le t\}$, where $w\_j = (x\_{j-1}, \dots, x\_{j-q}, y\_{j-1}, \dots, y\_{j-p})$. For model (5), we are interested in estimating the regression part. Once it is estimated, one may consider making simultaneous inference about the parameters and using the residuals to study the structure of the volatility matrix. This model generalizes vector autoregressive models [1], threshold models [3], and functional-coefficient models [7–10]. Even in one-dimensional settings with $k = 1$, model (5) includes important predictive regression models in econometrics, such as linear predictive models with nonstationary predictors [11–13] and functional-coefficient models for nonstationary time series data [14]. Model (5) can also be used to investigate Granger causality [15–17] and the feedback effect in engineering and finance [18, 19].

For model (5), a weighted local least squares estimation method was provided in [4]. Let $X\_t = (1, y\_{t-1}^T, \dots, y\_{t-p}^T, x\_{t-1}^T, \dots, x\_{t-q}^T)^T$ and $\Phi(z) = (c(z), \phi\_1(z), \dots, \phi\_p(z), \beta\_1(z), \dots, \beta\_q(z))$. Then model (5) becomes


$$y\_t = \Phi(z\_{t-d}) X\_t + \varepsilon\_t, \tag{6}$$

where $\Phi(\cdot)$ is a $k \times m$ matrix-valued function and $X\_t$ is an $m \times 1$ vector with $m = 1 + pk + qv$. For any $z\_{t-d}$ in the neighborhood of $z$, by the Taylor expansion, we have

$$
\Phi(z\_{t-d}) \simeq \Phi(z) + \Phi'(z)(z\_{t-d} - z) \equiv A + B(z\_{t-d} - z).
$$

Let $S$ and $V$ be $2 \times 2$ matrices whose $(i,j)$th elements are $\mu\_{i+j-2} = \int u^{i+j-2} K(u)\,du$ and $\nu\_{i+j-2} = \int u^{i+j-2} K^2(u)\,du$, respectively, and let $s = (\mu\_2, \mu\_3)^T$. Given any invertible working variance matrix $\sigma\_t^2$ in place of $\sigma\_t^{*2}$, the estimator $(\tilde{A}, \tilde{B})$ is obtained by minimizing

$$\sum\_{t=s'+1}^{n} \left\| \sigma\_t^{-1} \left[ y\_t - A X\_t - B X\_t (z\_{t-d} - z) \right] \right\|^2 K\_{h\_n}(z\_{t-d} - z),$$

where $\|\cdot\|$ denotes the Euclidean norm, $s' = \max(p, d, q)$, and $K\_{h\_n}(x) = h\_n^{-1} K(x/h\_n)$ for a kernel function $K(\cdot)$ with bandwidth $h\_n$ controlling the amount of smoothing. Let $K\_{h\_n}^{(i)}(z\_{t-d} - z) = h\_n^{-i} (z\_{t-d} - z)^i K\_{h\_n}(z\_{t-d} - z)$ and $\tilde{S}\_{ni} = \sum\_{t=s'+1}^{n} (X\_t X\_t^T) \otimes \sigma\_t^{-2} K\_{h\_n}^{(i)}(z\_{t-d} - z)$ for $i = 0, 1, 2$. Then the weighted estimators $(\tilde{A}, \tilde{B})$ admit the closed form:

$$
\begin{pmatrix} \operatorname{vec}(\tilde{A}) \\ \operatorname{vec}(h\_n \tilde{B}) \end{pmatrix} = \begin{pmatrix} \tilde{S}\_{n0} & \tilde{S}\_{n1} \\ \tilde{S}\_{n1} & \tilde{S}\_{n2} \end{pmatrix}^{-1} \begin{pmatrix} \sum\_{t=s'+1}^{n} \{X\_t \otimes \sigma\_t^{-2}\} y\_t K\_{h\_n}(z\_{t-d} - z) \\ \sum\_{t=s'+1}^{n} \{X\_t \otimes \sigma\_t^{-2}\} y\_t K\_{h\_n}^{(1)}(z\_{t-d} - z) \end{pmatrix}.
$$

Under certain conditions, the weighted estimators are asymptotically normal (see [4]).
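A minimal sketch of this local linear fit for the unweighted special case $\sigma\_t = I$, with a Gaussian kernel; the function name and data layout are our own conventions:

```python
import numpy as np

def local_linear_coef(y, X, z, z0, h):
    """Estimate Phi(z0) in y_t = Phi(z_{t-d}) X_t + eps_t by minimizing
    sum_t || y_t - A X_t - B X_t (z_t - z0) ||^2 K_h(z_t - z0)
    with identity working variance.  y: (n, k); X: (n, m); z: (n,) values
    of the smoothing variable.  Returns the k x m estimate A ~ Phi(z0)."""
    n, m = X.shape
    w = np.exp(-0.5 * ((z - z0) / h) ** 2) / h      # Gaussian K_h weights
    D = np.hstack([X, X * (z - z0)[:, None]])       # local linear design
    sw = np.sqrt(w)[:, None]
    beta, *_ = np.linalg.lstsq(sw * D, sw * y, rcond=None)
    return beta[:m].T                               # A-part of (A, B)
```

Evaluating `local_linear_coef` over a grid of `z0` values traces out the whole coefficient function.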

Recall that, in model (5), $\sigma\_t^{*}$ is a positive-definite matrix measurable with respect to the sigma-algebra generated by the historical information. If there is a parametric structure for $\sigma\_t^{*}$, for example generalized autoregressive conditional heteroscedastic (GARCH) errors [4], then it helps to improve the efficiency of the weighted estimation. Example 3 in [4] exemplifies this point. Our intuition is that, if a parametric structure for $\sigma\_t^{*}$ is correctly specified, then the weighted estimation mimics the oracle estimation in which $\sigma\_t^{*}$ is known. This intuition can be verified theoretically, since $\sigma\_t^{*}$ can be estimated at the rate $\sqrt{n}$, which is faster than what can be achieved for the regression function in model (5).

#### 3.3. Extension of multivariate functional-coefficient models

Because many economic factors are not stationary, classic regression analysis, which requires stationarity, suffers from a great limitation. Cointegration analysis has become a powerful toolkit for analyzing nonstationary economic time series. The concept of cointegration goes back to Granger [20] and sparked a veritable research boom. Engle and Granger proposed the well-known Engle-Granger test to examine whether there is a cointegrating relationship among a set of first-order integrated variables [21].

Motivated by Granger [20] and Engle and Granger [21], Jiang proposed an error-correction version of model (5) by incorporating the cointegrating relationship of first-order integrated variables [4]. This allows us to cope with the nonstationarity of vector time series and to improve the accuracy of forecasting.

Let $s\_t$ denote a $k \times 1$ vector of first-order integrated variables and let $y\_t = s\_t - s\_{t-1}$. Assume that there is a cointegrating relationship for $s\_t$; that is, there exists a unique $k \times r$ $(0 < r < k)$ deterministic matrix $\theta$ of rank $r$ and a stationary process $u\_t$ such that $\theta^T s\_t = u\_t$. Then an error-correction form of model (5) is

$$y\_t = \mathbf{c}(\mathbf{z}\_{t-d}) + \gamma(\mathbf{z}\_{t-d})u\_{t-1} + \sum\_{i=1}^p \phi\_i(\mathbf{z}\_{t-d})y\_{t-i} + \sum\_{i=1}^q \beta\_i(\mathbf{z}\_{t-d})\mathbf{x}\_{t-i} + \varepsilon\_t \tag{7}$$

where $\gamma(z\_{t-d})$ is a $k \times r$ coefficient matrix. This model reduces to the error-correction form in the Granger representation theorem if the coefficient functions are constant and there are no exogenous variables [4].
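As a sketch of how the error-correction term $u\_t$ arises, the first step of the Engle-Granger procedure can be illustrated on simulated $I(1)$ data; the series, the cointegrating vector $\theta = (1, -2)^T$, and all names below are our own illustrative choices:

```python
import numpy as np

# Two I(1) series sharing a common stochastic trend, so that
# u_t = s1_t - 2 * s2_t is stationary (theta = (1, -2)').
rng = np.random.default_rng(3)
n = 2000
common = np.cumsum(rng.standard_normal(n))   # shared random-walk trend
s1 = common + rng.standard_normal(n)
s2 = 0.5 * common + rng.standard_normal(n)

# Engle-Granger step 1: estimate the cointegrating relation by OLS of
# s1 on s2; the slope is superconsistent for 2.
X = np.column_stack([np.ones(n), s2])
(alpha, beta), *_ = np.linalg.lstsq(X, s1, rcond=None)
u = s1 - alpha - beta * s2                   # error-correction term u_t
```

The residual series `u` (lagged once) is what enters model (7) as $u\_{t-1}$.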

Due to the widespread presence of cointegrating variables in finance and economics, model (7) should improve the practicability of model (5). However, model (7) requires specification of the variable $z\_t$. This requirement can be relaxed by using the idea of single-index models. Recall that model (5) can be represented in the succinct form (6); similar operations can be applied to model (7). Now set $z\_t = \gamma^T X\_t$ and let the data decide the value of $\gamma$. Then model (7) can be extended as

$$y\_t = \Phi(\gamma^T X\_{t-d}) X\_t + \varepsilon\_t, \tag{8}$$

where $\gamma$ is a directional vector whose first nonzero entry is positive. Model (8) is more flexible than model (7), and the key is to estimate $\gamma$. We introduce the profile least squares method to estimate model (8). The estimation procedure consists of several steps:

Step 1. Given an initial value of $\gamma$, obtain the weighted estimator $\hat{\Phi}(\cdot; \gamma)$ of the coefficient function in the same way as for model (6).

Step 2. Find the value $\hat{\gamma}$ that minimizes

$$\sum\_{t=s'+1}^{n} \left\| y\_t - \hat{\Phi}\left(\gamma^T X\_{t-d}; \gamma\right) X\_t \right\|^2. \tag{9}$$

Step 3. Update the value of $\gamma$ to $\hat{\gamma}$, and repeat Steps 1 and 2 until convergence. The coefficient function $\Phi(\cdot)$ is estimated by $\hat{\Phi}(\cdot; \hat{\gamma})$.

It can be shown that $\hat{\Phi}(\cdot; \hat{\gamma})$ shares the same asymptotic normality as the oracle weighted estimator that knows the true value of $\gamma$, since $\hat{\gamma}$ is $\sqrt{n}$-consistent.
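The three steps above can be sketched on i.i.d. toy data, simplifying the time-series setting with identity weights, a Gaussian kernel, a unit-norm normalization of $\gamma$ standing in for the sign convention, and a direct search over directions playing the role of the iteration; all names are ours:

```python
import numpy as np

def local_fit(y, X, z, zpts, h):
    """Step 1: local linear estimate of Phi at each point in zpts
    (identity working variance, Gaussian kernel)."""
    out = []
    for z0 in zpts:
        w = np.exp(-0.5 * ((z - z0) / h) ** 2)
        D = np.hstack([X, X * (z - z0)[:, None]])
        sw = np.sqrt(w)[:, None]
        b, *_ = np.linalg.lstsq(sw * D, sw * y[:, None], rcond=None)
        out.append(b[:X.shape[1], 0])
    return np.array(out)

def profile_rss(gamma, y, X, h):
    """Step 2 objective (9): Phi is profiled out given gamma."""
    z = X @ gamma                              # single index gamma^T X_t
    Phi = local_fit(y, X, z, z, h)
    return np.sum((y - np.sum(Phi * X, axis=1)) ** 2)

# Toy i.i.d. data with true index direction (1, 1)/sqrt(2).
rng = np.random.default_rng(4)
n = 200
X = rng.standard_normal((n, 2))
z_true = X @ np.array([1.0, 1.0]) / np.sqrt(2)
y = z_true * X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(n)

# Step 3 stand-in: search over unit-norm candidate directions.
thetas = np.linspace(0.0, np.pi, 60)
cands = np.column_stack([np.cos(thetas), np.sin(thetas)])
rss = np.array([profile_rss(g, y, X, h=0.3) for g in cands])
gamma_hat = cands[np.argmin(rss)]
```

In the actual procedure, Steps 1 and 2 alternate from an initial $\gamma$ rather than scanning a grid, but the profiled objective being minimized is the same.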

#### 3.4. Variable selection of multivariate functional-coefficient models


In this section, we consider variable selection for model (6). Increasing the lags $p$ and $q$ will necessarily reduce the sum of squared errors. However, doing so increases the burden of coefficient estimation and may also lead to overfitting. Hence, for the multivariate functional-coefficient model, order selection is of much importance.

Two widely used model selection criteria are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). However, these stepwise methods impose a heavy computational burden and, furthermore, make it difficult to establish asymptotics for the estimation of the selected models. The problems become more severe for high-dimensional data. Various regularization methods have been proposed to deal with these problems. Among them, a popular approach called the LASSO, proposed by Tibshirani, performs variable selection and parameter estimation simultaneously; see Ref. [22]. For univariate varying-coefficient regression models with i.i.d. data, Wang and Xia [23] developed a shrinkage estimation method by combining the idea of the group LASSO [24] with kernel smoothing. In the following, we develop a shrinkage estimation method for the multivariate functional-coefficient model (6):

$$y\_t = \Phi(z\_{t-d}) X\_t + \varepsilon\_t,$$

where the functional-coefficient matrix $\Phi(z) = (c(z), \phi\_1(z), \dots, \phi\_p(z), \beta\_1(z), \dots, \beta\_q(z))$. Since each column of $\Phi(\cdot)$ corresponds to the effect of a component of $X\_t$, for variable selection of $X\_t$ we should penalize each column of $\Phi(\cdot)$ as a whole. This leads to minimizing

$$Q\_{\lambda}(\Phi) = \sum\_{i=s'+1}^{n} \sum\_{t=s'+1}^{n} \left\| y\_t - \Phi(z\_{i-d}) X\_t \right\|^2 K\left\{h^{-1}(z\_{t-d} - z\_{i-d})\right\} + \sum\_{j=1}^{p+q+1} \lambda\_j \|\Phi^j\|, \tag{10}$$

where $\Phi^j = (\Phi\_j(z\_{s'+1-d}), \dots, \Phi\_j(z\_{n-d}))$ with $\Phi\_j(\cdot)$ being the $j$th column of $\Phi(\cdot)$, the $\lambda\_j$ are tuning parameters, and for any matrix $A$ we use $\|A\|$ to denote its Hilbert-Schmidt norm. It is of interest to establish model selection consistency and the oracle property of this shrinkage estimation.
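A direct evaluation of objective (10) for $k = 1$, with $\Phi$ represented by its values on the sample points and a Gaussian kernel (our choices for illustration), makes the column-as-group penalization concrete:

```python
import numpy as np

def Q_lambda(Phi_paths, y, X, z, h, lam):
    """Objective (10) for k = 1.  Phi_paths: (n, J) values of the J
    coefficient functions at each grid point z[i] (z[i] stands in for
    z_{i-d}); y: (n,) responses; X: (n, J) regressors; lam: (J,) tuning
    parameters.  Column j is penalized through the norm of its whole
    path Phi^j, so a regressor drops out only when its entire
    coefficient path is shrunk to zero."""
    loss = 0.0
    for i in range(len(z)):
        resid = y - X @ Phi_paths[i]                  # Phi(z_i) X_t, all t
        w = np.exp(-0.5 * ((z - z[i]) / h) ** 2)      # K{h^{-1}(...)}
        loss += np.sum(w * resid ** 2)
    penalty = np.sum(lam * np.linalg.norm(Phi_paths, axis=0))
    return loss + penalty
```

A full shrinkage estimator would minimize this objective over `Phi_paths`, e.g. by group-coordinate descent as in Wang and Xia [23]; the sketch here only evaluates it.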

### 4. Multivariate time-varying coefficient models

Parallel to functional-coefficient model (5), it is natural to consider its alternative with time-varying coefficients [25]:

$$y\_t = c(t/T) + \sum\_{i=1}^{p} \phi\_i(t/T) y\_{t-i} + \sum\_{i=1}^{q} \beta\_i(t/T) x\_{t-i} + \varepsilon\_t, \quad t = 1, \dots, T, \tag{11}$$

where $y\_t$ is a $k \times 1$ vector, $x\_t$ is a $v \times 1$ vector, $c(\cdot)$ is a $k \times 1$ functional vector, the $\phi\_i(\cdot)$ are $k \times k$ smooth matrices, and the $\beta\_i(\cdot)$ are $k \times v$ smooth matrices. The innovation satisfies the same conditions as in model (5). It is known that, as time evolves, economic conditions change slowly and smoothly. Model (11) reflects this by allowing the coefficients to be smooth functions of time.

Let

$$\Phi(t/T) = \left( c(t/T), \phi\_1(t/T), \dots, \phi\_p(t/T), \beta\_1(t/T), \dots, \beta\_q(t/T) \right).$$

Using similar arguments to model (6), we can rewrite model (11) as

$$y\_t = \Phi(t/T) X\_t + \varepsilon\_t, \quad t = 1, \dots, T, \tag{12}$$

where $\Phi(\cdot)$ is a $k \times m$ matrix-valued function and $X\_t$ is the same as in model (6). By the Taylor expansion, for any $t$ in the neighborhood of $t\_0 \in (0, T)$, we have

$$
\Phi(t/T) \approx \Phi(t\_0/T) + \Phi'(t\_0/T)((t - t\_0)/T) \equiv P + Q((t - t\_0)/T).
$$

Running the local linear smoother for model (12), we minimize

$$\sum\_{t=s+1}^{T} \left\| y\_t - P\mathbf{X}\_t - Q\mathbf{X}\_t((t-t\_0)/T) \right\|^2 \mathbf{K}\_h(t-t\_0) \tag{13}$$

over $P$ and $Q$, where $s = \max(p, q)$ and $K\_h(x) = h^{-1} K(x/(hT))$. It is then straightforward to obtain an explicit form of the minimizer $(\hat{P}, \hat{Q})$ for the above optimization problem:

$$
\begin{pmatrix} \operatorname{vec}(\hat{P}) \\ \operatorname{vec}(h \hat{Q}) \end{pmatrix} = \begin{pmatrix} S\_{T0} & S\_{T1} \\ S\_{T1} & S\_{T2} \end{pmatrix}^{-1} \begin{pmatrix} \sum\_{t=s+1}^{T} (X\_t \otimes I\_k) y\_t K\_h(t - t\_0) \\ \sum\_{t=s+1}^{T} (X\_t \otimes I\_k) y\_t K\_h^{(1)}(t - t\_0) \end{pmatrix} \tag{14}
$$

where $S_{Ti} = \sum_{t=s+1}^{T} (X_t X_t') \otimes I_k\, K_h^{(i)}(t - t_0)$ and $K_h^{(i)}(t - t_0) = (Th)^{-i}(t - t_0)^i K_h(t - t_0)$, for $i = 0, 1, 2$.

Define $M = E[(X_t X_t^T) \otimes I_k]$ and $N = E[(X_t X_t^T) \otimes (\sigma_t^*)^2]$. Let $\mu_i = \int u^i K(u)\,du$, $\nu_i = \int u^i K^2(u)\,du$,

$$U = \begin{pmatrix} \mu_0 & \mu_1 \\ \mu_1 & \mu_2 \end{pmatrix}, \quad V = \begin{pmatrix} \nu_0 & \nu_1 \\ \nu_1 & \nu_2 \end{pmatrix}.$$

Using similar arguments to [4], we can show that this estimator is asymptotically normal with mean zero and variance $\Sigma = (U^{-1} V U^{-1}) \otimes (M^{-1} N M^{-1})$.
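The minimizer in (14) is just a kernel-weighted least squares solve: because the objective (13) decouples across the $k$ components of $y_t$, each row of $(P, Q)$ can be fitted by weighted least squares on the stacked regressors $(X_t, u_t X_t)$. The sketch below is a minimal illustration of that computation; the function name and the choice of the Epanechnikov kernel are our own, not from the text.

```python
import numpy as np

def local_linear_fit(y, X, t0, h,
                     kernel=lambda u: 0.75 * np.maximum(1 - u**2, 0.0)):
    """Local linear estimate of Phi(t0/T) in y_t = Phi(t/T) X_t + eps_t.

    y: (T, k) responses; X: (T, m) regressors; h: bandwidth.
    Returns (P_hat, Q_hat), the local level and (scaled) slope, each k x m.
    """
    T = y.shape[0]
    u = (np.arange(1, T + 1) - t0) / (T * h)  # scaled distance from t0
    w = kernel(u)                             # kernel weights K_h(t - t0)
    Z = np.hstack([X, u[:, None] * X])        # regressors [X_t, u_t X_t]
    sw = np.sqrt(w)[:, None]
    # Weighted least squares; decouples across the k components of y_t.
    coef, *_ = np.linalg.lstsq(sw * Z, sw * y, rcond=None)
    m = X.shape[1]
    return coef[:m].T, coef[m:].T             # P_hat, Q_hat (each k x m)
```

In practice one would evaluate this at a grid of points $t_0$ to trace out $\hat{\Phi}(\cdot)$ over the whole sample.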

#### 4.1. Generalized likelihood ratio tests

The multivariate time-varying coefficient regression model is flexible and powerful for estimating the dynamic changes of coefficients. After fitting a given dataset, some important questions arise: for example, are the coefficient functions actually constant, or do they take some particular forms? Such questions lead to statistical hypothesis testing. To answer them, we develop generalized likelihood ratio statistics for the corresponding hypothesis testing problems about the coefficient functions [26].

For the multivariate time-varying coefficient model (12), assume $\Sigma_0^{-1/2}\varepsilon_t$ has mean zero and covariance matrix $I_k$, with $\Sigma_0$ being a symmetric positive-definite constant matrix.

Consider the following hypothesis testing problem:


$$H_0: \Phi(t/T) \in \Theta_0(t/T) \leftrightarrow H_a: \Phi(t/T) \notin \Theta_0(t/T), \tag{15}$$

where $\Theta_0(t/T)$ is either a known constant matrix $\Phi_0$ or a set of functional matrices. Let $\hat{\Phi}(t/T)$ denote the nonparametric estimator of $\Phi$, and let $\hat{\Phi}_0(t/T)$ denote the true or estimated value of the coefficients under the null hypothesis. Following Fan et al. [26] and Fan and Jiang [27], we define a generalized likelihood ratio statistic for testing problem (15):

$$
\lambda_T = \frac{T}{2} \log\left( \frac{RSS_0}{RSS_a} \right), \tag{16}
$$

where $RSS_0 = \sum_{t=1}^{T} \big(y_t - \hat{\Phi}_0(t/T)X_t\big)^T \Sigma^{-1} \big(y_t - \hat{\Phi}_0(t/T)X_t\big)$ and $RSS_a = \sum_{t=1}^{T} \big(y_t - \hat{\Phi}(t/T)X_t\big)^T \Sigma^{-1} \big(y_t - \hat{\Phi}(t/T)X_t\big)$, with $\Sigma$ being a known constant covariance matrix from a working model. It is meaningful to study the asymptotic distributions of the test statistic under the null and alternatives.
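Given fitted coefficient paths under the null and the alternative, the statistic is a log-ratio of two weighted residual sums of squares. A minimal sketch, assuming the statistic has the form $\lambda_T = (T/2)\log(RSS_0/RSS_a)$; the array-shape conventions and function name are our own:

```python
import numpy as np

def glr_statistic(y, X, Phi0, Phi_hat, Sigma):
    """Generalized likelihood ratio statistic (T/2) * log(RSS_0 / RSS_a).

    y: (T, k); X: (T, m); Phi0, Phi_hat: (T, k, m) coefficient matrices
    evaluated at each t/T under the null and the alternative;
    Sigma: known k x k working covariance matrix.
    """
    Sinv = np.linalg.inv(Sigma)

    def rss(Phi):
        # residuals r_t = y_t - Phi(t/T) X_t, then sum of r_t' Sigma^{-1} r_t
        r = y - np.einsum('tkm,tm->tk', Phi, X)
        return np.einsum('tk,kl,tl->', r, Sinv, r)

    return 0.5 * y.shape[0] * np.log(rss(Phi0) / rss(Phi_hat))
```

By construction $\lambda_T \ge 0$ whenever the null fit is nested in the alternative fit, and $\lambda_T$ grows as the null model fits increasingly worse than the alternative.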

In the following example, we consider the case where $\Theta_0(\cdot)$ is a known constant. For any $u = t/T \in (0, 1)$, if we rewrite the matrix $\Phi(u)$ as a vector, $\Delta(u) \equiv \mathrm{vec}(\Phi_1(u), \dots, \Phi_m(u))$, and denote $\Delta_0(u) \equiv \mathrm{vec}(\Phi_{01}^*(u), \dots, \Phi_{0m}^*(u))$, then the power of the test is evaluated against the alternatives:

$$H_a: \Delta(u) = \Delta_0(u) + \frac{1}{\sqrt{Th}} G(u), \tag{17}$$

where $G(u) = (g_1(u), \dots, g_m(u))^T$ is a vector of functions.

Example 1. To investigate the performance of the proposed generalized likelihood ratio test, we generated 600 replications for each of the sample sizes T = 200, T = 400 and T = 800 from the multivariate time-varying coefficient model:

$$y_t = \Phi(t/T)X_t + \varepsilon_t, \quad t = 1, \dots, T,$$

where $k = 2$, $v = p = q = 1$, and $\Delta = \mathrm{vec}(0.5, 0.0074, 0.08, 0.65, 0.25, 0.75)^T$. We set the initial values $x_1 = 0$ and $y_1 = (0.15, 0.2)$. Accordingly, $X_t = \mathrm{vec}(y_{1,t-1}, y_{2,t-1}, x_{t-1})$ for $t = 2, \dots, T$. Three distributions of the error term are considered: bivariate normal, bivariate log-normal and bivariate $t(5)$, each with variance matrix $\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$. According to alternative (17), the power of the test is evaluated for a sequence of alternatives indexed by $\theta$:

Figure 1. The power curves for Example 1. Significance level is 5%.

$$H_{\theta}: \Delta_{\theta} = (0.5, 0.0075, 0.08, 0.65, 0.25, 0.75)^T + \frac{\theta}{\sqrt{Th}} G(t/T), \tag{18}$$

where $G(t/T) = \left(\sin(\sqrt{2}\,\pi t/T),\; -0.09\cos(\pi t/T),\; 0.16\sin(\sqrt{3}\,\pi t/T),\; 0.8\sin(\sqrt{2}\,\pi t/T),\; 0.3\sin(t/T),\; -\cos(\sqrt{1.5}\,\pi t/T)\right)^T$ and $\theta = 0, 0.2, 0.4, 0.6, 0.8, 1$. The power function is estimated by the relative rejection frequency of $H_0$ in the above replications.

The significance level is set to 5%, and the critical values in the simulations are calculated by using the conditional bootstrap method of Ref. [26] for each given value of $\theta$. The details of this method are as follows:

Step 1. Compute the estimators of the coefficient matrix $\hat{\Phi}(t/T)$ under both the null and the alternative, setting the bandwidth to the estimated optimal value $\hat{h}_{opt}$.

Step 2. Compute the test statistic $\lambda_T(H_0)$ and the residuals $\{e_t\}$ from the alternative model.

Step 3. For each given $X_t$, draw a bootstrap residual $e_t^*$ from the centered empirical distribution of $e_t$ and compute $y_t^* = \hat{\Phi}(t/T)X_t + e_t^*$. This forms a conditional bootstrap sample $\{X_t, y_t^*\}_{t=1}^{T}$.

Step 4. Compute the test statistic $\lambda_T^*(H_0)$ using the bootstrap sample constructed in Step 3.

Step 5. Repeat Steps 3 and 4 to obtain a sample of the test statistic $\lambda_T^*(H_0)$. The critical value at significance level $\alpha$ is calculated as the $100(1-\alpha)$th percentile of this sample.
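The steps above can be sketched generically: the fitting routine and the test statistic are passed in as callables, standing in for the local linear fit and $\lambda_T$ of the text (both names and the interface below are our own conventions).

```python
import numpy as np

def bootstrap_critical_value(y, X, fit_alt, statistic, B=500, alpha=0.05, seed=0):
    """Conditional bootstrap critical value for a test statistic (Steps 1-5).

    fit_alt(y, X) -> fitted values, shape (T, k), under the alternative;
    statistic(y, X) -> scalar test statistic. Both are hypothetical
    stand-ins for the estimators described in the text.
    """
    rng = np.random.default_rng(seed)
    fitted = fit_alt(y, X)                   # Steps 1-2: fit, get residuals
    resid = y - fitted
    resid = resid - resid.mean(axis=0)       # centered empirical distribution
    T = y.shape[0]
    lam_star = np.empty(B)
    for b in range(B):                       # Steps 3-4, repeated (Step 5)
        idx = rng.integers(0, T, size=T)     # resample centered residuals
        y_star = fitted + resid[idx]         # y*_t = fitted_t + e*_t
        lam_star[b] = statistic(y_star, X)
    return np.quantile(lam_star, 1 - alpha)  # 100(1 - alpha)th percentile
```

Because the bootstrap sample is generated from the fitted alternative model, the resulting critical value approximates the null distribution of the statistic conditional on the observed regressors.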

Figure 1 displays the power curves in the different scenarios. The patterns of the power curves look like half of an inverted normal density: all the curves rise monotonically from a height equal to the significance level of 5% until they eventually reach a maximum height of around 90%. It is evident from Figure 1 that the test is powerful for all three distributions of the error terms. Moreover, the test becomes more powerful as the sample size increases. These results indicate that the proposed test holds its size and is powerful for distinguishing the null from the alternative.
