For the observation matrix **Y** in (1.1), let

$$\mathrm{E}(\mathbf{Y}) = \boldsymbol{\eta} = (\boldsymbol{\eta}\_1, \ldots, \boldsymbol{\eta}\_p).\tag{1.2}$$

A multivariate linear model is defined by requiring that

$$\boldsymbol{\eta}\_i \in \Omega \quad \text{for all } i = 1, \ldots, p,\tag{1.3}$$

where *Ω* is a given subspace of the *n*-dimensional Euclidean space R*<sup>n</sup>*. A typical *Ω* is given by

$$\Omega = \mathcal{R}[\mathbf{X}] = \{\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\theta};\ \boldsymbol{\theta} = (\theta\_1, \ldots, \theta\_k)',\ -\infty < \theta\_i < \infty,\ i = 1, \ldots, k\}.\tag{1.4}$$

Here, ℛ[**X**] is the space spanned by the column vectors of **X**. A general theory for statistical inference on the regression parameter **Θ** can be found in texts on multivariate analysis, e.g., see [1–8]. In this chapter, we discuss an algebraic approach to multivariate linear models.

In Section 2, we consider a multivariate regression model in which the *xi*'s are explanatory variables and *Ω* = ℛ[**X**]. The maximum likelihood estimators (MLEs) and the likelihood ratio criterion (LRC) for **Θ**2 = **O** are derived by using projection matrices. Here, **Θ** = (**Θ**1′ **Θ**2′)′. The distribution of the LRC is discussed via the multivariate Cochran theorem. It is pointed out that projection matrices play an important role. In Section 3, we give a summary of projection matrices. In Section 4, we consider testing an additional information hypothesis on *y*2 in the presence of *y*1, where *y*1 = (*y*1, …, *yq*)′ and *y*2 = (*yq*+1, …, *yp*)′. In Section 5, we consider testing problems in discriminant analysis. Section 6 deals with a generalized multivariate linear model, which is also called the growth curve model. Some related problems are discussed in Section 7.

**2. Multivariate regression model**

In this section, we consider a multivariate regression model on *p* response variables and *k* explanatory variables denoted by *y* = (*y*1, …, *yp*)′ and *x* = (*x*1, …, *xk*)′, respectively. Suppose that we have the observation matrices given by (1.1). A multivariate regression model is given by

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\Theta} + \mathbf{E},\tag{2.1}$$

where **Θ** is a *k* × *p* unknown parameter matrix. It is assumed that the rows of the error matrix **E** are independently distributed as a *p*-variate normal distribution with mean zero and unknown covariance matrix **Σ**, i.e., N*p*(**0**, **Σ**).

Let *L*(Θ, Σ) be the density function or the likelihood function. Then, we have

$$-2\log L(\boldsymbol{\Theta}, \boldsymbol{\Sigma}) = n \log |\boldsymbol{\Sigma}| + \operatorname{tr} \boldsymbol{\Sigma}^{-1} (\mathbf{Y} - \mathbf{X}\boldsymbol{\Theta})' (\mathbf{Y} - \mathbf{X}\boldsymbol{\Theta}) + np \log (2\pi).$$

The maximum likelihood estimators (MLEs) **Θ̂** and **Σ̂** of **Θ** and **Σ** are defined as the maximizers of *L*(**Θ**, **Σ**) or, equivalently, the minimizers of −2 log *L*(**Θ**, **Σ**).

**Theorem 2.1** *Suppose that* **Y** *follows the multivariate regression model in* (2.1)*. Then, the* MLE*s of* **Θ** *and* **Σ** *are given as*

$$\begin{aligned} \hat{\boldsymbol{\Theta}} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}, \\ \hat{\boldsymbol{\Sigma}} &= \frac{1}{n}(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\Theta}})'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\Theta}}) = \frac{1}{n}\mathbf{Y}'(\mathbf{I}\_n - \mathbf{P}\_{\mathbf{X}})\mathbf{Y}, \end{aligned}$$

where **PX** = **X**(**X**′**X**)<sup>−1</sup>**X**′. Further, it holds that

$$-2\log L(\hat{\boldsymbol{\Theta}}, \hat{\boldsymbol{\Sigma}}) = n \log |\hat{\boldsymbol{\Sigma}}| + np \left\{ \log(2\pi) + 1 \right\}.$$

Theorem 2.1 can be shown by a linear algebraic method, which is discussed in the next section. Note that **PX** is the projection matrix on the range space *Ω* = ℛ[**X**]. It is symmetric and idempotent, i.e.,

$$\mathbf{P}\_{\mathbf{X}}' = \mathbf{P}\_{\mathbf{X}}, \qquad \mathbf{P}\_{\mathbf{X}}^{2} = \mathbf{P}\_{\mathbf{X}}.$$
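As a quick numerical illustration (not part of the original development), the MLEs of Theorem 2.1 and the projection matrix **PX** can be computed directly with NumPy. The data and all variable names below are simulated and illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p = 50, 3, 2                       # sample size, regressors, responses (illustrative)

X = rng.standard_normal((n, k))          # observation matrix X (n x k)
Theta_true = rng.standard_normal((k, p))
E = rng.standard_normal((n, p))          # rows of E ~ N_p(0, I_p) here, for simplicity
Y = X @ Theta_true + E                   # model (2.1)

XtX_inv = np.linalg.inv(X.T @ X)
Theta_hat = XtX_inv @ X.T @ Y            # MLE of Theta
P_X = X @ XtX_inv @ X.T                  # projection matrix onto R[X]
Sigma_hat = Y.T @ (np.eye(n) - P_X) @ Y / n   # MLE of Sigma

# P_X is symmetric and idempotent
assert np.allclose(P_X, P_X.T)
assert np.allclose(P_X @ P_X, P_X)
```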

Next, we consider testing the hypothesis

$$H: \mathrm{E}(\mathbf{Y}) = \mathbf{X}\_1 \boldsymbol{\Theta}\_1 \quad \Leftrightarrow \quad \boldsymbol{\Theta}\_2 = \mathbf{O},\tag{2.2}$$

against *K*: **Θ**2 ≠ **O**, where **X** = (**X**1 **X**2), **X**1: *n* × *j*, and **Θ** = (**Θ**1′ **Θ**2′)′, **Θ**1: *j* × *p*. The hypothesis means that the last *k* − *j* variables *x*2 = (*xj*+1, …, *xk*)′ have no additional information in the presence of the first *j* variables *x*1 = (*x*1, …, *xj*)′. In general, the likelihood ratio criterion (LRC) is defined by

$$\lambda = \frac{\max\_{H} L(\boldsymbol{\Theta}, \boldsymbol{\Sigma})}{\max\_{K} L(\boldsymbol{\Theta}, \boldsymbol{\Sigma})}. \tag{2.3}$$

Then we can express

$$\begin{split}-2\log\lambda &= \min\_{H} \left\{-2\log L(\boldsymbol{\Theta}, \boldsymbol{\Sigma})\right\} - \min\_{K} \left\{-2\log L(\boldsymbol{\Theta}, \boldsymbol{\Sigma})\right\} \\ &= \min\_{H} \left\{n\log|\boldsymbol{\Sigma}| + \operatorname{tr}\boldsymbol{\Sigma}^{-1}(\mathbf{Y} - \mathbf{X}\boldsymbol{\Theta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\Theta})\right\} \\ &\quad - \min\_{K} \left\{n\log|\boldsymbol{\Sigma}| + \operatorname{tr}\boldsymbol{\Sigma}^{-1}(\mathbf{Y} - \mathbf{X}\boldsymbol{\Theta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\Theta})\right\}.\end{split}$$

Using Theorem 2.1, this can be expressed as

$$\lambda^{2/n} \equiv \Lambda = \frac{|n\hat{\boldsymbol{\Sigma}}\_{\Omega}|}{|n\hat{\boldsymbol{\Sigma}}\_{\omega}|}.$$

Here, **Σ̂***Ω* and **Σ̂***ω* are the maximum likelihood estimators of **Σ** under the model (2.1), i.e., under *K*, and under *H*, respectively, which are given by

$$\begin{split} n\hat{\boldsymbol{\Sigma}}\_{\Omega} &= (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\Theta}}\_{\Omega})'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\Theta}}\_{\Omega}), \quad \hat{\boldsymbol{\Theta}}\_{\Omega} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \\ &= \mathbf{Y}'(\mathbf{I}\_{n} - \mathbf{P}\_{\Omega})\mathbf{Y} \end{split} \tag{2.4}$$

and

$$\begin{split} n\hat{\boldsymbol{\Sigma}}\_{\omega} &= (\mathbf{Y} - \mathbf{X}\_{1}\hat{\boldsymbol{\Theta}}\_{1\omega})'(\mathbf{Y} - \mathbf{X}\_{1}\hat{\boldsymbol{\Theta}}\_{1\omega}), \quad \hat{\boldsymbol{\Theta}}\_{1\omega} = (\mathbf{X}\_{1}'\mathbf{X}\_{1})^{-1}\mathbf{X}\_{1}'\mathbf{Y} \\ &= \mathbf{Y}'(\mathbf{I}\_{n} - \mathbf{P}\_{\omega})\mathbf{Y}. \end{split} \tag{2.5}$$

Summarizing these results, we have the following theorem.

**Theorem 2.2** *Let λ* = *Λ*<sup>*n*/2</sup> *be the* LRC *for testing H in* (2.2)*. Then, Λ is expressed as*

$$\Lambda = \frac{|\mathbf{S}\_{e}|}{|\mathbf{S}\_{e} + \mathbf{S}\_{h}|},\tag{2.6}$$

*where*

$$\mathbf{S}\_{e} = n \hat{\boldsymbol{\Sigma}}\_{\Omega}, \quad \mathbf{S}\_{h} = n \hat{\boldsymbol{\Sigma}}\_{\omega} - n \hat{\boldsymbol{\Sigma}}\_{\Omega}, \tag{2.7}$$

*and* **Σ̂***Ω* *and* **Σ̂***ω* *are given by* (2.4) *and* (2.5)*, respectively.*
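The statistic *Λ* in (2.6)–(2.7) is straightforward to compute once the two fitted models are available. The following sketch is illustrative only; the function name and the convention that **X**1 consists of the first *j* columns of **X** are assumptions made for the example.

```python
import numpy as np

def wilks_lambda(Y, X, j):
    """Lambda statistic (2.6) for H: Theta_2 = O, where X1 is the first j columns of X."""
    n = Y.shape[0]
    proj = lambda A: A @ np.linalg.solve(A.T @ A, A.T)   # projection onto R[A]
    S_e = Y.T @ (np.eye(n) - proj(X)) @ Y                # n * Sigma_hat_Omega, see (2.4), (2.7)
    S_h = Y.T @ (proj(X) - proj(X[:, :j])) @ Y           # n * (Sigma_hat_omega - Sigma_hat_Omega)
    return np.linalg.det(S_e) / np.linalg.det(S_e + S_h)

# Example call (Y, X as in Theorem 2.2): Lam = wilks_lambda(Y, X, j=1)
```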

The matrices **S***e* and **S***h* in the testing problem are called the sums of squares and products (SSP) matrices due to the error and the hypothesis, respectively. We consider the distribution of *Λ*. If a *p* × *p* random matrix **W** is expressed as

$$\mathbf{W} = \sum\_{j=1}^{n} \mathbf{z}\_{j} \mathbf{z}\_{j}^{\prime}$$

where **z***j* ∼ N*p*(**μ***j*, **Σ**) and **z**1, …, **z***n* are independent, **W** is said to have a noncentral Wishart distribution with *n* degrees of freedom, covariance matrix **Σ**, and noncentrality matrix **Δ** = **μ**1**μ**1′ + ⋯ + **μ***n***μ***n*′. We write **W** ∼ W*p*(*n*, **Σ**; **Δ**). In the special case **Δ** = **O**, **W** is said to have a Wishart distribution, denoted by **W** ∼ W*p*(*n*, **Σ**).

**Theorem 2.3** (multivariate Cochran theorem) *Let* **Y**=(**y**1, …, **y***n*)′ *, where* **y***<sup>i</sup>* ∼ N*p*(**μ***<sup>i</sup>* , **Σ**)*, i* = 1, …, *n and y*1, …, *y<sup>n</sup> are independent. Let* **A***,* **A**1*, and* **A**<sup>2</sup> *be n* × *n symmetric matrices. Then:*

1. **Y**′**AY** ∼ W*p*(*k*, **Σ**; **Δ**) ⇔ **A**<sup>2</sup> = **A**, tr **A** = *k*, **Δ** = E(**Y**)′**A**E(**Y**).

2. **Y′ <sup>A</sup>**1**<sup>Y</sup>** *and* **<sup>Y</sup>′ A**2**Y** *are independent* ⇔ **A**1**A**2 = **O***.*

For a proof of the multivariate Cochran theorem, see, e.g., [3, 6–8]. Let **B** and **W** be independent random matrices following the Wishart distributions W*p*(*q*, **Σ**) and W*p*(*n*, **Σ**), respectively, with *n* ≥ *p*. Then, the distribution of

$$\Lambda = \frac{|\mathbf{W}|}{|\mathbf{B} + \mathbf{W}|}$$

is said to be the *p*-dimensional Lambda distribution with (*q*, *n*)-degrees of freedom and is denoted by *Λp*(*q*, *n*). For distributional results of *Λp*(*q*, *n*), see [1, 3].
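When tables or software for *Λp*(*q*, *n*) are not at hand, the distribution can also be explored by direct simulation. The sketch below is a minimal Monte Carlo illustration assuming **Σ** = **I***p* (the distribution of *Λ* does not depend on **Σ**); all names are illustrative.

```python
import numpy as np

def simulate_lambda(p, q, n, reps=10_000, seed=1):
    """Draws from Lambda_p(q, n): |W| / |B + W| with B ~ W_p(q, I_p), W ~ W_p(n, I_p)."""
    rng = np.random.default_rng(seed)
    out = np.empty(reps)
    for r in range(reps):
        Zb = rng.standard_normal((q, p))   # q rows ~ N_p(0, I_p)
        Zw = rng.standard_normal((n, p))   # n rows ~ N_p(0, I_p), n >= p
        B, W = Zb.T @ Zb, Zw.T @ Zw
        out[r] = np.linalg.det(W) / np.linalg.det(B + W)
    return out

# e.g., np.quantile(simulate_lambda(p=2, q=3, n=40), 0.05) approximates the lower 5% point
```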

By using multivariate Cochran's theorem, we have the following distributional results:

**Theorem 2.4** *Let* **S***e* *and* **S***h* *be the random matrices in* (2.7)*. Let Λ be the Λ-statistic defined by* (2.6)*. Then,*

**1.** **S***e* *and* **S***h* *are independently distributed as a Wishart distribution* W*p*(*n* − *k*, **Σ**) *and a noncentral Wishart distribution* W*p*(*k* − *j*, **Σ**; **Δ**)*, respectively, where*

$$\boldsymbol{\Delta} = (\mathbf{X}\boldsymbol{\Theta})'(\mathbf{P}\_{\mathbf{X}} - \mathbf{P}\_{\mathbf{X}\_{1}})\mathbf{X}\boldsymbol{\Theta}.\tag{2.8}$$

**2.** *Under H, the statistic Λ is distributed as a lambda distribution Λp*(*k* − *j*, *n* − *k*)*.*

*Proof.* Note that **P***Ω* = **PX** = **X**(**X**′**X**)<sup>−1</sup>**X**′, **P***ω* = **PX**1 = **X**1(**X**1′**X**1)<sup>−1</sup>**X**1′, and **P***Ω***P***ω* = **P***ω***P***Ω*. By the multivariate Cochran theorem, the first result (1) follows by checking that

$$\begin{aligned} (\mathbf{I}\_{n} - \mathbf{P}\_{\Omega})^{2} &= \mathbf{I}\_{n} - \mathbf{P}\_{\Omega}, \quad (\mathbf{P}\_{\Omega} - \mathbf{P}\_{\omega})^{2} = \mathbf{P}\_{\Omega} - \mathbf{P}\_{\omega}, \\ (\mathbf{I}\_{n} - \mathbf{P}\_{\Omega})(\mathbf{P}\_{\Omega} - \mathbf{P}\_{\omega}) &= \mathbf{O}. \end{aligned}$$

The second result (2) follows by showing that **Δ**0 = **O**, where **Δ**0 is the **Δ** under *H*. This is seen from

$$\boldsymbol{\Delta}\_{0} = (\mathbf{X}\_{1}\boldsymbol{\Theta}\_{1})'(\mathbf{P}\_{\Omega} - \mathbf{P}\_{\omega})(\mathbf{X}\_{1}\boldsymbol{\Theta}\_{1}) = \mathbf{O},$$

since **P***Ω***X**1 = **P***ω***X**1 = **X**1.

The matrices **S***e* and **S***h* in (2.7) are defined in terms of *n* × *n* matrices **P***Ω* and **P***ω*. It is important to give expressions useful for their numerical computations. We have the following expressions:

$$\mathbf{S}\_{e} = \mathbf{Y}'\mathbf{Y} - \mathbf{Y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}, \quad \mathbf{S}\_{h} = \mathbf{Y}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} - \mathbf{Y}'\mathbf{X}\_{1}(\mathbf{X}\_{1}'\mathbf{X}\_{1})^{-1}\mathbf{X}\_{1}'\mathbf{Y}.$$

Suppose that *x*<sup>1</sup> is 1 for all subjects, i.e., *x*<sup>1</sup> is an intercept term. Then, we can express these in terms of the SSP matrix of (*y*', *x*')′ defined by

$$\mathbf{S} = \sum\_{i=1}^{n} \begin{pmatrix} \mathbf{y}\_{i} - \bar{\mathbf{y}} \\ \mathbf{x}\_{i} - \bar{\mathbf{x}} \end{pmatrix} \begin{pmatrix} \mathbf{y}\_{i} - \bar{\mathbf{y}} \\ \mathbf{x}\_{i} - \bar{\mathbf{x}} \end{pmatrix}' = \begin{pmatrix} \mathbf{S}\_{yy} & \mathbf{S}\_{yx} \\ \mathbf{S}\_{xy} & \mathbf{S}\_{xx} \end{pmatrix}, \tag{2.9}$$

where **ȳ** and **x̄** are the sample mean vectors. Along the partition **x** = (**x**1′, **x**2′)′, we partition **S** as

$$\mathbf{S} = \begin{pmatrix} \mathbf{S}\_{yy} & \mathbf{S}\_{y1} & \mathbf{S}\_{y2} \\ \mathbf{S}\_{1y} & \mathbf{S}\_{11} & \mathbf{S}\_{12} \\ \mathbf{S}\_{2y} & \mathbf{S}\_{21} & \mathbf{S}\_{22} \end{pmatrix}. \tag{2.10}$$

Then,

$$\mathbf{S}\_{e} = \mathbf{S}\_{yy \cdot x}, \quad \mathbf{S}\_{h} = \mathbf{S}\_{y2 \cdot 1} \mathbf{S}\_{22 \cdot 1}^{-1} \mathbf{S}\_{2y \cdot 1}.\tag{2.11}$$

Here, we use the notation **S***yy*⋅*x* = **S***yy* − **S***yx***S***xx*<sup>−1</sup>**S***xy*, **S***y*2⋅1 = **S***y*2 − **S***y*1**S**11<sup>−1</sup>**S**12, etc. These are derived in the next section by using projection matrices.
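The equivalence between the projection-matrix forms of **S***e*, **S***h* and the SSP-matrix expressions (2.11) can be checked numerically. In the sketch below (illustrative only), *x*1 is taken to be the intercept together with a first block of regressors, and the blocks "1" and "2" of the centered SSP matrix correspond to the two groups of non-constant explanatory variables; all sizes and names are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 2
q1, q2 = 2, 3                              # non-constant regressors in the x1- and x2-blocks
Z = rng.standard_normal((n, q1 + q2))      # non-constant part of x
X = np.hstack([np.ones((n, 1)), Z])        # x1 starts with the intercept
X1 = X[:, :1 + q1]                         # design matrix under H
Y = rng.standard_normal((n, p))

proj = lambda A: A @ np.linalg.solve(A.T @ A, A.T)
S_e_proj = Y.T @ (np.eye(n) - proj(X)) @ Y
S_h_proj = Y.T @ (proj(X) - proj(X1)) @ Y

# SSP matrix (2.9) of the centered (y, z) data
W = np.hstack([Y, Z])
D = W - W.mean(axis=0)
S = D.T @ D
Syy, Syz, Szz = S[:p, :p], S[:p, p:], S[p:, p:]

# (2.11): S_e = S_{yy.x}
S_e_ssp = Syy - Syz @ np.linalg.solve(Szz, Syz.T)

# blocks "1" (first q1 regressors) and "2" (last q2), adjusted for block "1"
S11, S12, S22 = Szz[:q1, :q1], Szz[:q1, q1:], Szz[q1:, q1:]
Sy1, Sy2 = Syz[:, :q1], Syz[:, q1:]
Sy2_1 = Sy2 - Sy1 @ np.linalg.solve(S11, S12)            # S_{y2.1}
S22_1 = S22 - S12.T @ np.linalg.solve(S11, S12)          # S_{22.1}
S_h_ssp = Sy2_1 @ np.linalg.solve(S22_1, Sy2_1.T)        # S_{y2.1} S_{22.1}^{-1} S_{2y.1}

assert np.allclose(S_e_proj, S_e_ssp) and np.allclose(S_h_proj, S_h_ssp)
```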

#### **3. Idempotent matrices and max-mini problems**

In the previous section, we have seen that idempotent matrices play an important role in statistical inference for the multivariate regression model. In fact, letting E(**Y**) = **η** = (**η**1, …, **η***p*), consider a model satisfying

$$\boldsymbol{\eta}\_{i} \in \Omega = \mathcal{R}[\mathbf{X}], \quad \text{for all } i = 1, \ldots, p.\tag{3.1}$$

Then, the MLE of **Θ** is **Θ̂** = (**X**′**X**)<sup>−1</sup>**X**′**Y**, and hence the MLE of **η** is given by

$$\hat{\boldsymbol{\eta}}\_{\Omega} = \mathbf{X}\hat{\boldsymbol{\Theta}} = \mathbf{P}\_{\Omega}\mathbf{Y}.$$

Here, **P***Ω* = **X**(**X**′**X**)<sup>−1</sup>**X**′. Further, the residual sums of squares and products (RSSP) matrix is expressed as

$$\mathbf{S}\_{\Omega} = (\mathbf{Y} - \hat{\boldsymbol{\eta}}\_{\Omega})'(\mathbf{Y} - \hat{\boldsymbol{\eta}}\_{\Omega}) = \mathbf{Y}'(\mathbf{I}\_{n} - \mathbf{P}\_{\Omega})\mathbf{Y}.$$

Under the hypothesis (2.2), the spaces to which the *ηi*'s belong are the same and are given by *ω* = ℛ[**X**1]. Similarly, we have

$$\begin{aligned} \hat{\boldsymbol{\eta}}\_{\omega} &= \mathbf{X}\hat{\boldsymbol{\Theta}}\_{\omega} = \mathbf{P}\_{\omega}\mathbf{Y}, \\ \mathbf{S}\_{\omega} &= (\mathbf{Y} - \hat{\boldsymbol{\eta}}\_{\omega})'(\mathbf{Y} - \hat{\boldsymbol{\eta}}\_{\omega}) = \mathbf{Y}'(\mathbf{I}\_{n} - \mathbf{P}\_{\omega})\mathbf{Y}, \end{aligned}$$

where **Θ** ^ *<sup>ω</sup>* =(**Θ** ^ 1*ω* ' **O**)′ and **Θ** ^ <sup>1</sup>*<sup>ω</sup>* =(**X**<sup>1</sup> **′ X**1 **′** )−1 **X**1 **′ Y**. The LR criterion is based on the following decom‐ position of SSP matrices;

$$\begin{split} \mathbf{S}\_{\omega} = \mathbf{Y}'(\mathbf{I}\_{n} - \mathbf{P}\_{\omega})\mathbf{Y} &= \mathbf{Y}'(\mathbf{I}\_{n} - \mathbf{P}\_{\Omega})\mathbf{Y} + \mathbf{Y}'(\mathbf{P}\_{\Omega} - \mathbf{P}\_{\omega})\mathbf{Y} \\ &= \mathbf{S}\_{e} + \mathbf{S}\_{h}. \end{split}$$

The degrees of freedom in the *Λ* distribution *Λp*(*fh*, *fe*) are given by

$$f\_{e} = n - \dim[\Omega], \quad f\_{h} = k - j = \dim[\Omega] - \dim[\omega].$$

In general, an *n* × *n* matrix **P** is called idempotent if **P**<sup>2</sup> = **P**. A symmetric and idempotent matrix is called a projection matrix. Let **R***<sup>n</sup>* be the *n*-dimensional Euclidean space, and let *Ω* be a subspace of **R***<sup>n</sup>*. Then, any *n* × 1 vector **y** can be uniquely decomposed into a direct sum, i.e.,

$$\mathbf{y} = \mathbf{u} + \mathbf{v}, \quad \mathbf{u} \in \Omega, \quad \mathbf{v} \in \Omega^{\perp}, \tag{3.2}$$

where *Ω*⊥ is the orthocomplement space. Using decomposition (3.2), consider a mapping

$$\mathbf{P}\_{\Omega}: \mathbf{y} \to \mathbf{u}, \quad \text{i.e., } \mathbf{P}\_{\Omega}\mathbf{y} = \mathbf{u}.$$

The mapping is linear, and hence it is expressed as a matrix. In this case, *u* is called the orthogonal projection of *y* into *Ω*, and **P***<sup>Ω</sup>* is also called the orthogonal projection matrix to *Ω*. Then, we have the following basic properties:

(P1) **P***Ω* is uniquely defined;

(P2) **I***n* − **P***Ω* is the projection matrix to *Ω*<sup>⊥</sup>;

(P3) **P***Ω* is a symmetric idempotent matrix;

(P4) ℛ[**P***Ω*] = *Ω*, and dim[*Ω*] = *tr***P***Ω*;

Let *ω* be a subspace of *Ω*. Then, we have the following properties:

(P5) **P***Ω***P***ω* = **P***ω***P***Ω* = **P***ω*.

(P6) **P***<sup>Ω</sup>* <sup>−</sup>**P***<sup>ω</sup>* <sup>=</sup>**P***<sup>ω</sup>* ⊥∩Ω, where *ω*<sup>⊥</sup> is the orthocomplement space of *ω*.

(P7) Let **B** be a *q* × *n* matrix, and let N[**B**] = {**y**; **By** = **0**}. If *ω* = N[**B**] ∩ *Ω*, then *ω*<sup>⊥</sup> ∩ *Ω* = ℛ[**P***Ω***B**′].

For more details, see, e.g. [3, 7, 9, 10].
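The properties (P1)–(P7) are easy to verify numerically for a concrete pair *Ω* = ℛ[**X**] and *ω* = ℛ[**X**1] ⊂ *Ω*. The following snippet is only a sanity check of (P3)–(P6), with simulated matrices and illustrative names.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, j = 20, 5, 2
X = rng.standard_normal((n, k))
X1 = X[:, :j]                                   # omega = R[X1] is a subspace of Omega = R[X]

proj = lambda A: A @ np.linalg.solve(A.T @ A, A.T)
P_Om, P_om = proj(X), proj(X1)

assert np.allclose(P_Om, P_Om.T) and np.allclose(P_Om @ P_Om, P_Om)       # (P3)
assert np.isclose(np.trace(P_Om), k)                                      # (P4): dim[Omega] = tr P_Omega
assert np.allclose(P_Om @ P_om, P_om) and np.allclose(P_om @ P_Om, P_om)  # (P5)
Q = P_Om - P_om                         # (P6): projection onto omega-perp intersect Omega
assert np.allclose(Q @ Q, Q) and np.isclose(np.trace(Q), k - j)
```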

The MLEs and the LRC in the multivariate regression model are derived by using the following theorem.

**Theorem 3.1**

1. *Consider the function f*(**Σ**) = log|**Σ**| + tr **Σ**<sup>−1</sup>**S** *of a p* × *p positive definite matrix* **Σ***. Then, f*(**Σ**) *attains its minimum uniquely at* **Σ** = **S***, and the minimum value is given by*

$$\min\_{\boldsymbol{\Sigma} > \mathbf{O}} f(\boldsymbol{\Sigma}) = f(\mathbf{S}) = \log|\mathbf{S}| + p.$$

2. *Let* **Y** *be an n* × *p known matrix and* **X** *an n* × *k known matrix of rank k. Consider the function of a p* × *p positive definite matrix* **Σ** *and a k* × *p matrix* **Θ** = (*θij*) *given by*

$$g(\boldsymbol{\Theta}, \boldsymbol{\Sigma}) = m \log|\boldsymbol{\Sigma}| + \operatorname{tr} \boldsymbol{\Sigma}^{-1} (\mathbf{Y} - \mathbf{X}\boldsymbol{\Theta})' (\mathbf{Y} - \mathbf{X}\boldsymbol{\Theta}),$$

*where m* > 0, − ∞ < *θij* < ∞*, for i* = 1, …, *k*; *j* = 1, …, *p. Then, g*(**Θ**, **Σ**) *takes the minimum at*

$$\boldsymbol{\Theta} = \hat{\boldsymbol{\Theta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}, \quad \boldsymbol{\Sigma} = \hat{\boldsymbol{\Sigma}} = \frac{1}{m}\mathbf{Y}'(\mathbf{I}\_n - \mathbf{P}\_{\mathbf{X}})\mathbf{Y},$$

and the minimum value is given by *m* log|**Σ̂**| + *mp*.

*Proof.* Let *ℓ*1, …, *ℓp* be the characteristic roots of **Σ**−<sup>1</sup> **S**. Note that the characteristic roots of **Σ**−<sup>1</sup> **S** and **Σ**−1/2 **SΣ**−1/2 are the same. The latter matrix is positive definite, and hence we may assume *ℓ*1 ≥ ⋯ ≥ *ℓp* > 0. Then

$$\begin{aligned} f(\boldsymbol{\Sigma}) - f(\mathbf{S}) &= \log |\boldsymbol{\Sigma}\mathbf{S}^{-1}| + \operatorname{tr}(\boldsymbol{\Sigma}^{-1} \mathbf{S}) - p \\ &= -\log |\boldsymbol{\Sigma}^{-1} \mathbf{S}| + \operatorname{tr}(\boldsymbol{\Sigma}^{-1} \mathbf{S}) - p \\ &= \sum\_{i=1}^{p} (-\log \ell\_i + \ell\_i - 1) \ge 0. \end{aligned}$$

The last inequality follows from *x* − 1 ≥ log *x* (*x* > 0). The equality holds if and only if *ℓ*1 = ⋯ = *ℓp* = 1 ⇔ **Σ**=**S**.

Next, we prove 2. We have

$$\begin{split} & \operatorname{tr} \boldsymbol{\Sigma}^{-1} (\mathbf{Y} - \mathbf{X} \boldsymbol{\Theta})' (\mathbf{Y} - \mathbf{X} \boldsymbol{\Theta}) \\ &= \operatorname{tr} \boldsymbol{\Sigma}^{-1} (\mathbf{Y} - \mathbf{X} \hat{\boldsymbol{\Theta}})' (\mathbf{Y} - \mathbf{X} \hat{\boldsymbol{\Theta}}) + \operatorname{tr} \boldsymbol{\Sigma}^{-1} \{\mathbf{X} (\hat{\boldsymbol{\Theta}} - \boldsymbol{\Theta})\}' \mathbf{X} (\hat{\boldsymbol{\Theta}} - \boldsymbol{\Theta}) \\ &\geq \operatorname{tr} \boldsymbol{\Sigma}^{-1} \mathbf{Y}' (\mathbf{I}\_{n} - \mathbf{P}\_{\mathbf{X}}) \mathbf{Y}. \end{split}$$

The first equality follows from the fact that **Y** − **XΘ** = **Y** − **XΘ̂** + **X**(**Θ̂** − **Θ**) and (**Y** − **XΘ̂**)′**X**(**Θ̂** − **Θ**) = **O**. In the last step, the equality holds when **Θ** = **Θ̂**. The required result is obtained by noting that **Θ̂** does not depend on **Σ**, and combining this with the first result 1.
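Theorem 3.1 can also be checked numerically: for random trial values (**Θ**, **Σ**), the function *g* should never fall below *m* log|**Σ̂**| + *mp*. The sketch below is illustrative only; the sizes and the choice *m* = *n* are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, p = 30, 3, 2
m = n                                   # m = n corresponds to the likelihood case
X = rng.standard_normal((n, k))
Y = rng.standard_normal((n, p))

P_X = X @ np.linalg.solve(X.T @ X, X.T)
Sigma_hat = Y.T @ (np.eye(n) - P_X) @ Y / m
g_min = m * np.log(np.linalg.det(Sigma_hat)) + m * p    # claimed minimum value

def g(Theta, Sigma):
    R = Y - X @ Theta
    return m * np.log(np.linalg.det(Sigma)) + np.trace(np.linalg.solve(Sigma, R.T @ R))

for _ in range(1000):
    Theta = rng.standard_normal((k, p))
    A = rng.standard_normal((p, p))
    Sigma = A @ A.T + 0.1 * np.eye(p)                    # a random positive definite Sigma
    assert g(Theta, Sigma) >= g_min - 1e-8
```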

**Theorem 3.2** *Let* **X** *be an n* × *k matrix of rank k, and let Ω* = ℛ[**X**]*, which is also defined by the set* {**y** : **y** = **X***θ*}*, where θ is a k* × 1 *unknown parameter vector. Let* **C** *be a c* × *k matrix of rank c, and define ω by the set* {**y** : **y** = **X***θ*, **C***θ* = **0**}*. Then,*

$$\mathbf{1.} \quad \mathbf{P}\_{\Omega} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'.$$

**2.** **P***Ω* − **P***ω* = **X**(**X**′**X**)<sup>−1</sup>**C**′{**C**(**X**′**X**)<sup>−1</sup>**C**′}<sup>−1</sup>**C**(**X**′**X**)<sup>−1</sup>**X**′.

*Proof.* 1. Let **ŷ** = **X**(**X**′**X**)<sup>−1</sup>**X**′**y**, and consider the decomposition **y** = **ŷ** + (**y** − **ŷ**). Then, **ŷ**′(**y** − **ŷ**) = 0. Therefore, **P***Ω***y** = **ŷ**, and hence **P***Ω* = **X**(**X**′**X**)<sup>−1</sup>**X**′.

2. Since **C***θ* = **C**(**X**′**X**)<sup>−1</sup>**X**′ ⋅ **X***θ*, we can write *ω* = N[**B**] ∩ *Ω*, where **B** = **C**(**X**′**X**)<sup>−1</sup>**X**′. Using (P7),

$$\omega^{\perp} \cap \Omega = \mathcal{R}[\mathbf{P}\_{\Omega}\mathbf{B}'] = \mathcal{R}[\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{C}'].$$

The final result is obtained by using 1 and (P7).

Consider the special case **C** = (**O** **I***k*−*q*). Then, *ω* = ℛ[**X**1], where **X** = (**X**1 **X**2), **X**1 : *n* × *q*. We have the following results:

$$\begin{aligned} \omega^{\perp} \cap \Omega &= \mathcal{R}[(\mathbf{I}\_n - \mathbf{P}\_{\mathbf{X}\_1})\mathbf{X}\_2], \\ \mathbf{P}\_{\omega^{\perp} \cap \Omega} &= (\mathbf{I}\_n - \mathbf{P}\_{\mathbf{X}\_1})\mathbf{X}\_2 \{\mathbf{X}\_2'(\mathbf{I}\_n - \mathbf{P}\_{\mathbf{X}\_1})\mathbf{X}\_2\}^{-1} \mathbf{X}\_2'(\mathbf{I}\_n - \mathbf{P}\_{\mathbf{X}\_1}). \end{aligned}$$
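For this special case, the expression of Theorem 3.2(2) with **C** = (**O** **I***k*−*q*) and the explicit form of **P***ω*⊥∩*Ω* above can be compared numerically. The following sketch is illustrative only, with simulated matrices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, q = 25, 4, 2
X = rng.standard_normal((n, k))
X1, X2 = X[:, :q], X[:, q:]
C = np.hstack([np.zeros((k - q, q)), np.eye(k - q)])     # C = (O  I_{k-q})

proj = lambda A: A @ np.linalg.solve(A.T @ A, A.T)
XtX_inv = np.linalg.inv(X.T @ X)

# Theorem 3.2 (2): P_Omega - P_omega
lhs = X @ XtX_inv @ C.T @ np.linalg.solve(C @ XtX_inv @ C.T, C @ XtX_inv @ X.T)

# Explicit form of the projection onto omega-perp intersect Omega, with omega = R[X1]
M = (np.eye(n) - proj(X1)) @ X2
rhs = M @ np.linalg.solve(M.T @ M, M.T)

assert np.allclose(lhs, proj(X) - proj(X1)) and np.allclose(lhs, rhs)
```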

The expressions (2.11) for **S***e* and **S***h* in terms of **S** can be obtained from projection matrices based on

$$
\Omega = \mathcal{R}[\mathbf{X}] = \mathcal{R}[\mathbf{1}\_n] + \mathcal{R}[(\mathbf{I}\_n - \mathbf{P}\_{\mathbf{1}\_n})\mathbf{X}],
$$

$$\omega^{\perp} \cap \Omega = \mathcal{R}[(\mathbf{I}\_n - \mathbf{P}\_{\mathbf{1}\_n} - \mathbf{P}\_{(\mathbf{I}\_n - \mathbf{P}\_{\mathbf{1}\_n})\mathbf{X}\_1})\mathbf{X}\_2].$$
