**6. Tests in discriminant analysis**

We consider *q* *p*-variate normal populations with common covariance matrix **Σ**, the *i*th population having mean vector **θ**<sub>*i*</sub>. Suppose that a sample of size *n*<sub>*i*</sub> is available from the *i*th population, and let **y**<sub>*ij*</sub> be the *j*th observation from the *i*th population. The observation matrix for all the observations is expressed as

$$\mathbf{Y} = (\mathbf{y}\_{11}, \dots, \mathbf{y}\_{1n\_1}, \mathbf{y}\_{21}, \dots, \mathbf{y}\_{2n\_2}, \dots, \mathbf{y}\_{q1}, \dots, \mathbf{y}\_{qn\_q})'.\tag{6.1}$$

It is assumed that the **y**<sub>*ij*</sub> are independent, and

$$\mathbf{y}\_{ij} \sim \mathrm{N}\_p(\boldsymbol{\theta}\_i, \boldsymbol{\Sigma}), \quad j = 1, \ldots, n\_i; \; i = 1, \ldots, q.\tag{6.2}$$

The model is expressed as

$$\mathbf{Y} = \mathbf{A}\boldsymbol{\Theta} + \mathbf{E},\tag{6.3}$$

where

$$\mathbf{A} = \begin{pmatrix} \mathbf{1}\_{n\_1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1}\_{n\_2} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1}\_{n\_q} \end{pmatrix}, \quad \boldsymbol{\Theta} = \begin{pmatrix} \boldsymbol{\theta}\_1' \\ \boldsymbol{\theta}\_2' \\ \vdots \\ \boldsymbol{\theta}\_q' \end{pmatrix}.$$

Here, the error matrix **E** has the same property as in (2.1).

First, we consider testing the hypothesis

$$H: \boldsymbol{\theta}\_1 = \dots = \boldsymbol{\theta}\_q\,(= \boldsymbol{\theta}),\tag{6.4}$$

against the alternative *K* : **θ**<sub>*i*</sub> ≠ **θ**<sub>*j*</sub> for some *i*, *j*. The hypothesis can be expressed as

$$H: \mathbf{C}\boldsymbol{\Theta} = \mathbf{O}, \quad \mathbf{C} = \left(\mathbf{I}\_{q-1}, -\mathbf{1}\_{q-1}\right). \tag{6.5}$$

The tests, including the LRC, are based on three basic statistics: the within-group SSP matrix **W**, the between-group SSP matrix **B**, and the total SSP matrix **T**, given by

$$\begin{aligned} \mathbf{W} &= \sum\_{i=1}^{q} (n\_i - 1) \mathbf{S}\_i, \quad \mathbf{B} = \sum\_{i=1}^{q} n\_i (\overline{\mathbf{y}}\_i - \overline{\mathbf{y}}) (\overline{\mathbf{y}}\_i - \overline{\mathbf{y}})', \\ \mathbf{T} &= \mathbf{B} + \mathbf{W} = \sum\_{i=1}^{q} \sum\_{j=1}^{n\_i} (\mathbf{y}\_{ij} - \overline{\mathbf{y}}) (\mathbf{y}\_{ij} - \overline{\mathbf{y}})', \end{aligned} \tag{6.6}$$

where **ȳ**<sub>*i*</sub> and **S**<sub>*i*</sub> are the mean vector and sample covariance matrix of the *i*th sample, and **ȳ** is the total mean vector defined by (1/*n*)∑<sub>*i*=1</sub><sup>*q*</sup> *n*<sub>*i*</sub>**ȳ**<sub>*i*</sub>, with *n* = ∑<sub>*i*=1</sub><sup>*q*</sup> *n*<sub>*i*</sub>. In general, **W** and **B** are independently distributed as a Wishart distribution W<sub>*p*</sub>(*n* − *q*, **Σ**) and a noncentral Wishart distribution W<sub>*p*</sub>(*q* − 1, **Σ**; **Δ**), respectively, where

$$\boldsymbol{\Delta} = \sum\_{i=1}^{q} n\_i (\boldsymbol{\theta}\_i - \overline{\boldsymbol{\theta}}) (\boldsymbol{\theta}\_i - \overline{\boldsymbol{\theta}})',$$

where **θ̄** = (1/*n*)∑<sub>*i*=1</sub><sup>*q*</sup> *n*<sub>*i*</sub>**θ**<sub>*i*</sub>. Then, the following theorem is well known.

**Theorem 6.1** *Let λ* = *Λ*<sup>*n*/2</sup> *be the* LRC *for testing H in* (6.4). *Then, Λ is expressed as*

$$
\Lambda = \frac{|\mathbf{W}|}{|\mathbf{W} + \mathbf{B}|} = \frac{|\mathbf{W}|}{|\mathbf{T}|}, \tag{6.7}
$$

where **W**, **B**, and **T** are given in (6.6). Further, under *H*, the statistic *Λ* is distributed as a lambda distribution *Λ*<sub>*p*</sub>(*q* − 1, *n* − *q*).
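As a numerical sanity check, the decomposition (6.6) and the statistic (6.7) can be computed directly with `numpy`. This is a minimal sketch under stated assumptions: the group sizes and dimension below are arbitrary, and the data are simulated under *H* (a common mean), not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: q = 3 groups of p = 4 variables with unequal
# sizes, simulated under H (all groups share the same mean).
sizes = [10, 15, 12]
groups = [rng.normal(size=(n_i, 4)) for n_i in sizes]

Y = np.vstack(groups)          # n x p observation matrix, as in (6.1)
ybar = Y.mean(axis=0)          # total mean vector

# Within-group SSP: W = sum_i (n_i - 1) S_i
W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)

# Between-group SSP: B = sum_i n_i (ybar_i - ybar)(ybar_i - ybar)'
B = sum(n_i * np.outer(g.mean(axis=0) - ybar, g.mean(axis=0) - ybar)
        for n_i, g in zip(sizes, groups))

# Total SSP and the decomposition T = W + B of (6.6)
T = (Y - ybar).T @ (Y - ybar)
assert np.allclose(T, W + B)

# Wilks' Lambda (6.7); 0 < Lambda <= 1 because T - W = B is
# nonnegative definite, so |W| <= |T|.
Lambda = np.linalg.det(W) / np.linalg.det(T)
assert 0.0 < Lambda <= 1.0
```

For larger *p*, `np.linalg.slogdet` is preferable to `det` to avoid overflow in the determinants.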

Now we shall show Theorem 6.1 by an algebraic method. It is easy to see that

$$
\Omega = \mathcal{R}[\mathbf{A}], \quad \omega = \mathrm{N}[\mathbf{C}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'] \cap \Omega = \mathcal{R}[\mathbf{1}\_n].
$$

The last equality can also be checked by noting that, under *H*,

$$\mathrm{E}[\mathbf{Y}] = \mathbf{A}\mathbf{1}\_q\boldsymbol{\theta}' = \mathbf{1}\_n\boldsymbol{\theta}'.$$

We have


$$\begin{aligned} \mathbf{T} &= \mathbf{Y}'(\mathbf{I}\_n - \mathbf{P}\_{\mathbf{1}\_n})\mathbf{Y} \\ &= \mathbf{Y}'(\mathbf{I}\_n - \mathbf{P}\_{\mathbf{A}})\mathbf{Y} + \mathbf{Y}'(\mathbf{P}\_{\mathbf{A}} - \mathbf{P}\_{\mathbf{1}\_n})\mathbf{Y} \\ &= \mathbf{W} + \mathbf{B}. \end{aligned}$$

Further, it is easily checked that

$$\mathbf{1.} \quad (\mathbf{I}\_n - \mathbf{P}\_{\mathbf{A}})^2 = \mathbf{I}\_n - \mathbf{P}\_{\mathbf{A}}, \quad (\mathbf{P}\_{\mathbf{A}} - \mathbf{P}\_{\mathbf{1}\_n})^2 = \mathbf{P}\_{\mathbf{A}} - \mathbf{P}\_{\mathbf{1}\_n}.$$

$$\mathbf{2.} \quad (\mathbf{I}\_n - \mathbf{P}\_{\mathbf{A}})(\mathbf{P}\_{\mathbf{A}} - \mathbf{P}\_{\mathbf{1}\_n}) = \mathbf{O}.$$

$$\mathbf{3.} \quad f\_e = \dim\{\mathcal{R}[\mathbf{A}]^{\perp}\} = \mathrm{tr}(\mathbf{I}\_n - \mathbf{P}\_{\mathbf{A}}) = n - q, \quad f\_h = \dim\{\mathcal{R}[\mathbf{1}\_n]^{\perp} \cap \mathcal{R}[\mathbf{A}]\} = \mathrm{tr}(\mathbf{P}\_{\mathbf{A}} - \mathbf{P}\_{\mathbf{1}\_n}) = q - 1.$$
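These projector identities are easy to verify numerically. The sketch below builds the design matrix **A** of (6.3) for hypothetical group sizes and checks items 1–3 with `numpy`; the sizes are arbitrary illustration choices.

```python
import numpy as np

# Hypothetical group sizes for the one-way design in (6.3).
sizes = [4, 5, 3]
n, q = sum(sizes), len(sizes)

# A is the block matrix of group indicators; P_A = A(A'A)^{-1}A'.
A = np.zeros((n, q))
start = 0
for i, n_i in enumerate(sizes):
    A[start:start + n_i, i] = 1.0
    start += n_i
ones = np.ones((n, 1))

P_A = A @ np.linalg.inv(A.T @ A) @ A.T
P_1 = ones @ ones.T / n
I = np.eye(n)

Q_e = I - P_A          # residual projector
Q_h = P_A - P_1        # hypothesis projector

# Item 1: both are idempotent.
assert np.allclose(Q_e @ Q_e, Q_e) and np.allclose(Q_h @ Q_h, Q_h)
# Item 2: they are mutually orthogonal.
assert np.allclose(Q_e @ Q_h, np.zeros((n, n)))
# Item 3: degrees of freedom f_e = n - q and f_h = q - 1.
assert np.isclose(np.trace(Q_e), n - q)
assert np.isclose(np.trace(Q_h), q - 1)
```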

Related to the test of *H*, we are interested in whether a subset of the variables *y*<sub>1</sub>, …, *y*<sub>*p*</sub> is sufficient for discriminant analysis, that is, whether the set of remaining variables has no additional information and is therefore redundant. Without loss of generality, we consider the sufficiency of the subvector **y**<sub>1</sub> = (*y*<sub>1</sub>, …, *y*<sub>*k*</sub>)′ of **y**, or equivalently the redundancy of the remainder **y**<sub>2</sub> = (*y*<sub>*k*+1</sub>, …, *y*<sub>*p*</sub>)′. Consider testing

$$H\_{2 \cdot 1} \colon \boldsymbol{\theta}\_{1;2 \cdot 1} = \dots = \boldsymbol{\theta}\_{q;2 \cdot 1}\,(= \boldsymbol{\theta}\_{2 \cdot 1}),\tag{6.8}$$

where

$$
\boldsymbol{\theta}\_{i} = \begin{pmatrix} \boldsymbol{\theta}\_{i;1} \\ \boldsymbol{\theta}\_{i;2} \end{pmatrix}, \quad \boldsymbol{\theta}\_{i;1}: k \times 1, \quad i = 1, \dots, q,
$$

and

$$
\boldsymbol{\theta}\_{i;2\cdot 1} = \boldsymbol{\theta}\_{i;2} - \boldsymbol{\Sigma}\_{21} \boldsymbol{\Sigma}\_{11}^{-1} \boldsymbol{\theta}\_{i;1}, \quad i = 1, \ldots, q.
$$

The testing problem was considered by [11]. The hypothesis can be formulated in terms of Mahalanobis distance and discriminant functions; for details, see [12, 13]. To obtain a likelihood ratio for *H*<sub>2⋅1</sub>, we partition the observation matrix as

$$\mathbf{Y} = \begin{pmatrix} \mathbf{Y}\_1 & \mathbf{Y}\_2 \end{pmatrix}, \quad \mathbf{Y}\_1 : n \times k.$$

Then the conditional distribution of **Y**<sub>2</sub> given **Y**<sub>1</sub> is normal: the rows of **Y**<sub>2</sub> are independently distributed with covariance matrix **Σ**<sub>22⋅1</sub> = **Σ**<sub>22</sub> − **Σ**<sub>21</sub>**Σ**<sub>11</sub><sup>−1</sup>**Σ**<sub>12</sub>, and the conditional mean is given by

$$\mathrm{E}(\mathbf{Y}\_2 \mid \mathbf{Y}\_1) = \mathbf{A}\boldsymbol{\Theta}\_{2 \cdot 1} + \mathbf{Y}\_1 \boldsymbol{\Sigma}\_{11}^{-1}\boldsymbol{\Sigma}\_{12},\tag{6.9}$$

where **Θ**<sub>2⋅1</sub> = (**θ**<sub>1;2⋅1</sub>, …, **θ**<sub>*q*;2⋅1</sub>)′. The LRC for *H*<sub>2⋅1</sub> can be obtained by using the conditional distribution and following the steps (D1)–(D4) in Section 5. In fact, each column of E(**Y**<sub>2</sub> | **Y**<sub>1</sub>) lies in the same space; let the spaces under *K*<sub>2⋅1</sub> and *H*<sub>2⋅1</sub> be denoted by *Ω* and *ω*, respectively. Then

$$
\Omega = \mathcal{R}[(\mathbf{A} \;\; \mathbf{Y}\_1)], \quad \omega = \mathcal{R}[(\mathbf{1}\_n \;\; \mathbf{Y}\_1)],
$$

dim(*Ω*) = *q* + *k*, and dim(*ω*) = *q* + 1. The likelihood ratio criterion *λ* can be expressed as

$$\lambda^{2/n} = \Lambda = \frac{|\mathbf{S}\_{\Omega}|}{|\mathbf{S}\_{\omega}|} = \frac{|\mathbf{S}\_{\Omega}|}{|\mathbf{S}\_{\Omega} + (\mathbf{S}\_{\omega} - \mathbf{S}\_{\Omega})|},$$

where **S**<sub>*Ω*</sub> = **Y**<sub>2</sub>′(**I**<sub>*n*</sub> − **P**<sub>*Ω*</sub>)**Y**<sub>2</sub> and **S**<sub>*ω*</sub> = **Y**<sub>2</sub>′(**I**<sub>*n*</sub> − **P**<sub>*ω*</sub>)**Y**<sub>2</sub>. We express the LRC in terms of **W**, **B**, and **T**. Let us partition **W**, **B**, and **T** as

$$\mathbf{W} = \begin{pmatrix} \mathbf{W}\_{11} & \mathbf{W}\_{12} \\ \mathbf{W}\_{21} & \mathbf{W}\_{22} \end{pmatrix}, \quad \mathbf{B} = \begin{pmatrix} \mathbf{B}\_{11} & \mathbf{B}\_{12} \\ \mathbf{B}\_{21} & \mathbf{B}\_{22} \end{pmatrix}, \quad \mathbf{T} = \begin{pmatrix} \mathbf{T}\_{11} & \mathbf{T}\_{12} \\ \mathbf{T}\_{21} & \mathbf{T}\_{22} \end{pmatrix}, \tag{6.10}$$

where **W**<sub>12</sub> : *k* × (*p* − *k*), **B**<sub>12</sub> : *k* × (*p* − *k*), and **T**<sub>12</sub> : *k* × (*p* − *k*). Noting that **P**<sub>*Ω*</sub> = **P**<sub>**A**</sub> + **P**<sub>(**I**<sub>*n*</sub>−**P**<sub>**A**</sub>)**Y**<sub>1</sub></sub>, we have

$$\begin{split} \mathbf{S}\_{\Omega} &= \mathbf{Y}\_{2}' \left\{ \mathbf{I}\_n - \mathbf{P}\_{\mathbf{A}} - (\mathbf{I}\_n - \mathbf{P}\_{\mathbf{A}}) \mathbf{Y}\_{1} \{ \mathbf{Y}\_{1}' (\mathbf{I}\_n - \mathbf{P}\_{\mathbf{A}}) \mathbf{Y}\_{1} \}^{-1} \mathbf{Y}\_{1}' (\mathbf{I}\_n - \mathbf{P}\_{\mathbf{A}}) \right\} \mathbf{Y}\_{2} \\ &= \mathbf{W}\_{22} - \mathbf{W}\_{21} \mathbf{W}\_{11}^{-1} \mathbf{W}\_{12} = \mathbf{W}\_{22 \cdot 1}. \end{split}$$
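The equality of the projection form and the Schur-complement form of **S**<sub>*Ω*</sub> can be checked numerically; the sketch below computes both routes for simulated data with hypothetical sizes (*q* = 3, *p* = 5, *k* = 2, all arbitrary).

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: q = 3 groups, p = 5 variables, first k = 2 kept.
sizes, p, k = [8, 9, 10], 5, 2
groups = [rng.normal(size=(n_i, p)) for n_i in sizes]
Y = np.vstack(groups)
n, q = Y.shape[0], len(sizes)
Y1, Y2 = Y[:, :k], Y[:, k:]

# Design matrix A and the residual projector R = I_n - P_A.
A = np.zeros((n, q))
start = 0
for i, n_i in enumerate(sizes):
    A[start:start + n_i, i] = 1.0
    start += n_i
R = np.eye(n) - A @ np.linalg.inv(A.T @ A) @ A.T

# Route 1: projection form of S_Omega.
S_Omega = Y2.T @ (R - R @ Y1 @ np.linalg.inv(Y1.T @ R @ Y1)
                  @ Y1.T @ R) @ Y2

# Route 2: Schur complement of W_11 in the partitioned W of (6.10);
# since R is idempotent, W = Y'RY = sum_i (n_i - 1) S_i.
W = Y.T @ R @ Y
W11, W12 = W[:k, :k], W[:k, k:]
W21, W22 = W[k:, :k], W[k:, k:]
W22_1 = W22 - W21 @ np.linalg.inv(W11) @ W12

assert np.allclose(S_Omega, W22_1)
```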

Similarly, noting that **P***<sup>ω</sup>* <sup>=</sup>**P1***<sup>n</sup>* <sup>+</sup> **<sup>P</sup>**(**I***n*−**P1***<sup>n</sup>* )**Y**<sup>1</sup> , we have


$$\begin{split} \mathbf{S}\_{\omega} &= \mathbf{Y}\_{2}' \left\{ \mathbf{I}\_n - \mathbf{P}\_{\mathbf{1}\_n} - (\mathbf{I}\_n - \mathbf{P}\_{\mathbf{1}\_n}) \mathbf{Y}\_{1} \{ \mathbf{Y}\_{1}' (\mathbf{I}\_n - \mathbf{P}\_{\mathbf{1}\_n}) \mathbf{Y}\_{1} \}^{-1} \mathbf{Y}\_{1}' (\mathbf{I}\_n - \mathbf{P}\_{\mathbf{1}\_n}) \right\} \mathbf{Y}\_{2} \\ &= \mathbf{T}\_{22} - \mathbf{T}\_{21} \mathbf{T}\_{11}^{-1} \mathbf{T}\_{12} = \mathbf{T}\_{22 \cdot 1}. \end{split}$$

**Theorem 6.2** *Suppose that the observation matrix* **Y** *in* (6.1) *is a set of samples from* N<sub>*p*</sub>(**θ**<sub>*i*</sub>, **Σ**), *i* = 1, …, *q*. *Then the likelihood ratio criterion λ for the hypothesis H*<sub>2⋅1</sub> *in* (6.8) *is given by*

$$
\lambda = \left(\frac{|\mathbf{W}\_{22\cdot 1}|}{|\mathbf{T}\_{22\cdot 1}|}\right)^{n/2},
$$

*where* **W** *and* **T** *are given by* (6.6). *Further, under H*<sub>2⋅1</sub>,

$$\frac{|\mathbf{W}\_{22\cdot 1}|}{|\mathbf{T}\_{22\cdot 1}|} \sim \Lambda\_{p-k}(q-1,\, n-q-k).$$
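The statistic of Theorem 6.2 can be computed directly from the partitioned SSP matrices. Below is a minimal sketch with simulated data under *H*<sub>2⋅1</sub> (a common mean, so the last *p* − *k* variables carry no additional information); the sizes and the helper `schur` are illustrative choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data under H_{2.1}: q = 3 groups, p = 4, k = 2.
sizes, p, k = [15, 20, 18], 4, 2
groups = [rng.normal(size=(n_i, p)) for n_i in sizes]
Y = np.vstack(groups)
ybar = Y.mean(axis=0)

# Within-group and total SSP matrices, as in (6.6).
W = sum((g - g.mean(0)).T @ (g - g.mean(0)) for g in groups)
T = (Y - ybar).T @ (Y - ybar)

def schur(M, k):
    # M_{22.1} = M_22 - M_21 M_11^{-1} M_12 for the partition at k.
    return M[k:, k:] - M[k:, :k] @ np.linalg.inv(M[:k, :k]) @ M[:k, k:]

# |W_{22.1}| / |T_{22.1}|: a Lambda_{p-k}(q-1, n-q-k) variate
# under H_{2.1}; it lies in (0, 1] since T_{22.1} - W_{22.1} is
# nonnegative definite.
stat = np.linalg.det(schur(W, k)) / np.linalg.det(schur(T, k))
assert 0.0 < stat <= 1.0
```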

*Proof.* We consider the conditional distributions of **W**<sub>22⋅1</sub> and **T**<sub>22⋅1</sub> given **Y**<sub>1</sub> by using Theorem 2.3, and then see that they do not depend on **Y**<sub>1</sub>. We have seen that

$$\mathbf{W}\_{22\cdot 1} = \mathbf{Y}\_2'\mathbf{Q}\_1\mathbf{Y}\_2, \quad \mathbf{Q}\_1 = \mathbf{I}\_n - \mathbf{P}\_{\mathbf{A}} - \mathbf{P}\_{(\mathbf{I}\_n - \mathbf{P}\_{\mathbf{A}})\mathbf{Y}\_1}.$$

It is easy to see that **Q**<sub>1</sub><sup>2</sup> = **Q**<sub>1</sub>, rank(**Q**<sub>1</sub>) = tr **Q**<sub>1</sub> = *n* − *q* − *k*, **Q**<sub>1</sub>**A** = **O**, **Q**<sub>1</sub>**Y**<sub>1</sub> = **O**, and

$$\mathrm{E}(\mathbf{Y}\_2 \mid \mathbf{Y}\_1)'\,\mathbf{Q}\_1\,\mathrm{E}(\mathbf{Y}\_2 \mid \mathbf{Y}\_1) = \mathbf{O}.$$

This implies that **W**<sub>22⋅1</sub> | **Y**<sub>1</sub> ∼ W<sub>*p*−*k*</sub>(*n* − *q* − *k*, **Σ**<sub>22⋅1</sub>), and hence **W**<sub>22⋅1</sub> ∼ W<sub>*p*−*k*</sub>(*n* − *q* − *k*, **Σ**<sub>22⋅1</sub>). For **T**<sub>22⋅1</sub>, we have

$$\mathbf{T}\_{22\cdot 1} = \mathbf{Y}\_2'\mathbf{Q}\_2\mathbf{Y}\_2, \quad \mathbf{Q}\_2 = \mathbf{I}\_n - \mathbf{P}\_{\mathbf{1}\_n} - \mathbf{P}\_{(\mathbf{I}\_n - \mathbf{P}\_{\mathbf{1}\_n})\mathbf{Y}\_1},$$

and hence

$$\mathbf{T}\_{22\cdot 1} - \mathbf{W}\_{22\cdot 1} = \mathbf{Y}\_2'(\mathbf{Q}\_2 - \mathbf{Q}\_1)\mathbf{Y}\_2.$$

Similarly, **Q**2 is idempotent. Using **P1***<sup>n</sup>* **P**<sup>A</sup> =**P**A**P1***<sup>n</sup>* =**P1***<sup>n</sup>* , we have **Q**1**Q**<sup>2</sup> =**Q**2**Q**<sup>1</sup> =**Q**1 , and hence

$$(\mathbf{Q}\_2 - \mathbf{Q}\_1)^2 = \mathbf{Q}\_2 - \mathbf{Q}\_1, \qquad \mathbf{Q}\_1(\mathbf{Q}\_2 - \mathbf{Q}\_1) = \mathbf{O}.$$
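The algebra for **Q**<sub>1</sub> and **Q**<sub>2</sub> can also be confirmed numerically; in the sketch below the group sizes, *k*, and the helper `proj` are hypothetical, chosen only to build the two projectors.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical check of the projector algebra for Q_1 and Q_2.
sizes, k = [6, 7, 5], 2
n, q = sum(sizes), len(sizes)
Y1 = rng.normal(size=(n, k))

# Design matrix A of group indicators, and the vector 1_n.
A = np.zeros((n, q))
start = 0
for i, n_i in enumerate(sizes):
    A[start:start + n_i, i] = 1.0
    start += n_i
ones = np.ones((n, 1))

def proj(X):
    # Orthogonal projector onto the column space of X.
    return X @ np.linalg.inv(X.T @ X) @ X.T

I = np.eye(n)
Q1 = I - proj(A) - proj((I - proj(A)) @ Y1)
Q2 = I - proj(ones) - proj((I - proj(ones)) @ Y1)

# Q_1 Q_2 = Q_2 Q_1 = Q_1, since R[(1_n Y_1)] is a subspace of
# R[(A Y_1)].
assert np.allclose(Q1 @ Q2, Q1) and np.allclose(Q2 @ Q1, Q1)

D = Q2 - Q1
assert np.allclose(D @ D, D)                  # (Q_2 - Q_1)^2 = Q_2 - Q_1
assert np.allclose(Q1 @ D, np.zeros((n, n)))  # Q_1 (Q_2 - Q_1) = O
```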

Further, under *H*2 ⋅ 1,

$$\mathrm{E}(\mathbf{Y}\_2 \mid \mathbf{Y}\_1)'(\mathbf{Q}\_2 - \mathbf{Q}\_1)\,\mathrm{E}(\mathbf{Y}\_2 \mid \mathbf{Y}\_1) = \mathbf{O}.$$
