These notes are intended to disclose the nature of principal component analysis and to show some of its possible applications.

Principal component analysis refers to the explanation of the structure of variances and covariances through a few linear combinations of the original variables, without losing a significant part of the original information. In other words, it is about finding a new set of orthogonal axes in which the variance of the data is maximum. Its objectives are to reduce the dimensionality of the problem and, once the transformation has been carried out, to facilitate the interpretation of the data. Although p variables are collected on the analyzed units and all p are required to reproduce the total variability of the system, much of this variability can often be found in a small number, k, of principal components. The origin of the technique lies in the redundancy that frequently exists among the different variables; such redundancy is data, not information. The k principal components can then replace the p initial variables, so that the original data set, consisting of n measures of p variables, is reduced to n measures of k principal components.

The objective pursued by the analysis of principal components is the representation of the numerical measurements of several variables in a space of few dimensions, where our senses can perceive relationships that would otherwise remain hidden in higher dimensions. This representation must be such that, when the higher dimensions are discarded, the loss of information is minimal. A simile illustrates the idea: imagine a large rectangular plate, a three-dimensional object that, for practical purposes, we treat as a flat two-dimensional object. In carrying out this reduction in dimensionality, a certain amount of information is lost since, for example, opposite points located on the two sides of the plate will appear merged into a single one. However, the loss of information is largely compensated by the simplification made, since many relationships, such as the neighborhood between points, are more evident when drawn on a plane than when drawn as a three-dimensional figure that must necessarily be rendered in perspective.

The analysis of principal components can also reveal relationships between variables that are not evident at first sight, which facilitates the analysis of the dispersion of the observations, highlighting possible groupings and detecting the variables responsible for the dispersion.

### 2. Preliminaries

The study of multivariate methods is greatly facilitated by matrix algebra [9–11]. Below, we introduce some basic concepts that are essential for the explanation of the statistical techniques, as well as for their geometric interpretation. In addition, relationships that can be expressed in terms of matrices are easily programmed on computers, so calculation routines can be applied to obtain further quantities of interest. This is a basic introduction to these concepts and relationships.

#### 2.1. The vector of means and the covariance matrix

Let $\mathbf{X} = (X_1, \dots, X_p)^t$ be a random column vector of dimension $p$. Each component, $X_i$, is a random variable (r.v.) with mean $E[X_i] = \mu_i$ and variance $V[X_i] = E[(X_i - \mu_i)^2] = \sigma_{ii}$. Given two r.v.s, $X_i$ and $X_j$, we define the covariance between them as $\operatorname{Cov}(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)] = \sigma_{ij}$. The expected values, variances, and covariances can be grouped into vectors and matrices that we will call the population mean vector, $\boldsymbol{\mu}$, and the population covariance matrix, $\boldsymbol{\Sigma}$:

$$\boldsymbol{\mu} = E[\mathbf{X}] = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix}, \quad \boldsymbol{\Sigma} = \operatorname{Cov}[\mathbf{X}] = E\left[ (\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^t \right] = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{bmatrix} \tag{1}$$

The population correlation matrix is given by $\boldsymbol{\rho} = [\rho_{ij}]$, where $\rho_{ij} = \sigma_{ij}/\sqrt{\sigma_{ii}\sigma_{jj}}$.

When n observed values of the r.v.s are available, we consider estimators of the previous population quantities, which we call sample estimators.

Definition 2.1: Let $\mathbf{X} = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix}$ be a simple random sample of a p-dimensional r.v. arranged in a data matrix, with the values of each r.v. in a column. The p-dimensional sample mean column vector is $\overline{\mathbf{X}} = [\overline{x}_i]$, where $\overline{x}_i = \frac{1}{n}\sum_{m=1}^{n} x_{mi}$. The sample covariance matrix is $\mathbf{S} = [s_{ij}] = \frac{n}{n-1}\mathbf{S}_n$, where $\mathbf{S}_n = \frac{1}{n}\sum_{m=1}^{n}(\mathbf{x}_m - \overline{\mathbf{X}})(\mathbf{x}_m - \overline{\mathbf{X}})^t$ and $\mathbf{x}_m^t$ is the mth row of $\mathbf{X}$. The generalized sample variance is the determinant of $\mathbf{S}$, $|\mathbf{S}|$. The sample correlation matrix is $\mathbf{R} = [r_{ij}]$, where $r_{ij} = s_{ij}/\sqrt{s_{ii}s_{jj}}$, with $i, j = 1, \dots, p$.

Proposition 2.1: Let $\mathbf{X}_1, \dots, \mathbf{X}_n$ be a simple random sample of a p-dimensional r.v. $\mathbf{X}$ with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. The unbiased estimators of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are $\overline{\mathbf{X}}$ and $\mathbf{S}$.
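As an illustration of Definition 2.1 and Proposition 2.1, the following sketch (assuming Python with NumPy; the data matrix is invented for the example) computes $\overline{\mathbf{X}}$, $\mathbf{S}$, $|\mathbf{S}|$, and $\mathbf{R}$:

```python
import numpy as np

# Data matrix: n = 5 observations (rows) of p = 2 variables (columns).
X = np.array([[1.0, 2.1],
              [2.0, 3.9],
              [3.0, 6.2],
              [4.0, 8.1],
              [5.0, 9.8]])
n, p = X.shape

x_bar = X.mean(axis=0)          # sample mean vector
D = X - x_bar                   # deviations from the mean
S = D.T @ D / (n - 1)           # unbiased sample covariance matrix S
gen_var = np.linalg.det(S)      # generalized sample variance |S|
sd = np.sqrt(np.diag(S))        # standard deviations sqrt(s_ii)
R = S / np.outer(sd, sd)        # sample correlation matrix [r_ij]

# Cross-checks against NumPy's built-ins.
assert np.allclose(S, np.cov(X, rowvar=False))
assert np.allclose(R, np.corrcoef(X, rowvar=False))
```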

#### 2.2. Eigenvalues and eigenvectors


One of the problems that linear algebra deals with is the simplification of matrices through methods that produce diagonal or triangular matrices, which are widely used in the resolution of linear systems of the form $\mathbf{A}\mathbf{x} = \mathbf{b}$.

Definition 2.2: Let $\mathbf{A}$ be a square matrix. If $\mathbf{v}^t\mathbf{A}\mathbf{v} \ge 0$ for any vector $\mathbf{v}$, $\mathbf{A}$ is a nonnegative definite matrix. If $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$, with $\mathbf{v} \ne \mathbf{0}$, then $\lambda$ is an eigenvalue associated with the eigenvector $\mathbf{v}$.

Proposition 2.2: Let $\mathbf{A}$ be a symmetric p by p matrix with real-valued entries. $\mathbf{A}$ has p pairs of eigenvalues and eigenvectors, $(\lambda_1, \mathbf{e}_1), \dots, (\lambda_p, \mathbf{e}_p)$, such that:

- a. $\mathbf{A}$ is positive definite if all the eigenvalues are positive.
- b. $\mathbf{A}$ is nonnegative definite if all the eigenvalues are nonnegative.

Moreover:

1. The eigenvalues $\lambda_1, \dots, \lambda_p$ are real.
2. The eigenvectors can be chosen with 2-norm equal to 1.
3. The eigenvectors are mutually perpendicular.
4. The eigenvectors are unique unless two or more eigenvalues are equal.
5. The spectral decomposition of $\mathbf{A}$ is $\mathbf{A} = \lambda_1\mathbf{e}_1\mathbf{e}_1^t + \dots + \lambda_p\mathbf{e}_p\mathbf{e}_p^t$.
6. If $\mathbf{P} = (\mathbf{e}_1, \dots, \mathbf{e}_p)$ is the orthogonal matrix whose columns are the eigenvectors and $\boldsymbol{\Lambda}$ is the diagonal matrix with main diagonal entries $\lambda_1, \dots, \lambda_p$, the spectral decomposition of $\mathbf{A}$ can be written as $\mathbf{A} = \mathbf{P}\boldsymbol{\Lambda}\mathbf{P}^t$ and, when $\mathbf{A}$ is invertible, $\mathbf{A}^{-1} = \mathbf{P}\boldsymbol{\Lambda}^{-1}\mathbf{P}^t = \sum_{i=1}^{p} \frac{1}{\lambda_i}\mathbf{e}_i\mathbf{e}_i^t$.
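Proposition 2.2 can be checked numerically. The following sketch (assuming Python with NumPy; the matrix is an arbitrary example) verifies the orthonormality of the eigenvectors, the spectral decomposition, and the formula for the inverse:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])      # symmetric matrix with real entries

lam, P = np.linalg.eigh(A)      # eigenvalues (ascending) and orthonormal eigenvectors
Lam = np.diag(lam)

assert np.all(lam > 0)                           # so A is positive definite (item a)
assert np.allclose(P.T @ P, np.eye(2))           # unit norm, mutually perpendicular (items 2-3)
assert np.allclose(A, P @ Lam @ P.T)             # A = P Lambda P^t (item 6)

# Rank-one form of item 5: A = lam_1 e_1 e_1^t + ... + lam_p e_p e_p^t.
assert np.allclose(A, sum(lam[i] * np.outer(P[:, i], P[:, i]) for i in range(2)))

# Item 6: A^{-1} = P Lambda^{-1} P^t.
assert np.allclose(np.linalg.inv(A), P @ np.diag(1.0 / lam) @ P.T)
```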


Remark 2.1: Let $\mathbf{X}$ be a matrix with the values of a simple random sample of a p-dimensional r.v. in each column, and let $\mathbf{y}_i^t = (x_{i1}, \dots, x_{in})$, with $i = 1 \dots p$, be the ith row of $\mathbf{X}$. Let $\mathbf{1}_n^t = (1, \dots, 1)$ be the n by one vector with all its coordinates equal to 1. It can be proven that:

1. The projection of the vector $\mathbf{y}_i^t$ on the vector $\mathbf{1}_n$ is the vector $\overline{x}_i\mathbf{1}_n$, whose 2-norm is equal to $\sqrt{n}\,|\overline{x}_i|$.
2. The matrix $\mathbf{S}_n$ is obtained from the residuals $\mathbf{e}_i = \mathbf{y}_i - \overline{x}_i\mathbf{1}_n$; the squared 2-norm of $\mathbf{e}_i$ is equal to $(n-1)s_{ii}$, and the scalar product of $\mathbf{e}_i$ and $\mathbf{e}_j$ is equal to $(n-1)s_{ij}$.
3. The sample correlation coefficient $r_{ij}$ is the cosine of the angle between $\mathbf{e}_i$ and $\mathbf{e}_j$.
4. If $U$ is the volume generated by the vectors $\mathbf{e}_i$, with $i = 1 \dots p$, then $|\mathbf{S}| = U^2(n-1)^{-p}$. Therefore, the generalized sample variance is proportional to the square of the volume generated by the deviation vectors. The volume will increase if the norm of some $\mathbf{e}_i$ is increased.
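The items of Remark 2.1 are easy to verify numerically. In the sketch below (assuming NumPy, and taking the determinant of the Gram matrix of the deviation vectors as the squared volume $U^2$), each claim is checked on randomly generated data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3
X = rng.normal(size=(n, p))              # rows = observations, columns = variables
ones = np.ones(n)

x_bar = X.mean(axis=0)
E = X - np.outer(ones, x_bar)            # column i is the residual vector e_i
S = E.T @ E / (n - 1)                    # sample covariance matrix

# Item 1: the projection of y_i on 1_n is xbar_i * 1_n, with 2-norm sqrt(n)|xbar_i|.
proj = (X[:, 0] @ ones / (ones @ ones)) * ones
assert np.allclose(proj, x_bar[0] * ones)
assert np.allclose(np.linalg.norm(proj), np.sqrt(n) * abs(x_bar[0]))

# Item 2: ||e_i||^2 = (n-1) s_ii and <e_i, e_j> = (n-1) s_ij.
assert np.allclose(E[:, 0] @ E[:, 0], (n - 1) * S[0, 0])
assert np.allclose(E[:, 0] @ E[:, 1], (n - 1) * S[0, 1])

# Item 3: r_ij is the cosine of the angle between e_i and e_j.
r01 = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])
cos01 = E[:, 0] @ E[:, 1] / (np.linalg.norm(E[:, 0]) * np.linalg.norm(E[:, 1]))
assert np.allclose(r01, cos01)

# Item 4: |S| = U^2 (n-1)^{-p}, with U^2 the Gram determinant of the e_i.
assert np.allclose(np.linalg.det(S), np.linalg.det(E.T @ E) / (n - 1) ** p)
```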


#### 2.3. Distances

Many techniques of multivariate statistical analysis are based on the concept of distance. Let $Q = (x_1, x_2)$ be a point in the plane. The Euclidean distance from $Q$ to the origin, $O$, is $d(Q, O) = \sqrt{x_1^2 + x_2^2}$. If $Q = (x_1, \dots, x_p)$ and $R = (y_1, \dots, y_p)$, the Euclidean distance between these two points of $\Re^p$ is $d(Q, R) = \sqrt{(x_1 - y_1)^2 + \cdots + (x_p - y_p)^2}$. All points $(x_1, \dots, x_p)$ whose squared distance to the origin is a fixed quantity, for example, $x_1^2 + \cdots + x_p^2 = c^2$, are the points of the p-dimensional sphere of radius $|c|$.

For many statistical purposes, the Euclidean distance is unsatisfactory, since each coordinate contributes in the same way to the calculation of such a distance. When the coordinates represent measures subject to random changes, it is desirable to assign weights to the coordinates depending on how high or low the variability of the measurements is. This suggests a measure of distance that is different from the Euclidean.

Next, we introduce a statistical distance that will take into account the different variabilities and correlations. Therefore, it will depend on the variances and covariances, and this distance is fundamental in multivariate analysis.


Suppose we have a fixed set of observations in $\Re^p$ and, to illustrate the situation, consider $n$ pairs of measures of two variables, $x_1$ and $x_2$. Suppose that the measurements of $x_1$ vary independently of $x_2$ and that the variability of the measures of $x_1$ is much greater than that of $x_2$. This situation is shown in Figure 1, and our first objective is to define a distance from the points to the origin.

In Figure 1, we see that points with a given deviation from the origin lie farther from the origin in the $x_1$ direction than in the $x_2$ direction, due to the greater variability inherent in the direction of $x_1$. Therefore, it seems reasonable to give more weight to the coordinate $x_2$ than to $x_1$. One way to obtain these weights is to standardize the coordinates, that is, $x_1^* = x_1/\sqrt{s_{11}}$ and $x_2^* = x_2/\sqrt{s_{22}}$, where $s_{ii}$ is the sample variance of the variable $x_i$. Thus, the statistical distance from a point $Q = (x_1, x_2)$ to the origin is $d(Q, O) = \sqrt{\frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}}}$. Therefore, the points located at a constant statistical distance $c$ from the origin lie on an ellipse centered at the origin, whose major axis coincides with the coordinate that has the greatest variability. In the case that the variability of one variable is analogous to that of the other and the coordinates are independent, the Euclidean distance is proportional to the statistical distance.

If $Q = (x_1, \dots, x_p)$ and $R = (y_1, \dots, y_p)$ are two points of $\Re^p$, the statistical distance between them is $d(Q, R) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \cdots + \frac{(x_p - y_p)^2}{s_{pp}}}$, with $s_{ii}$ being the sample variance of the variable $x_i$. The statistical distance defined so far does not cover the important cases in which the variables are not independent. Figure 2 shows a situation where the pairs $(x_1, x_2)$ display an increasing trend, so the sample correlation coefficient will be positive.
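A minimal sketch of this statistical distance for independent variables (Python with NumPy assumed; the function name and values are this illustration's own):

```python
import numpy as np

def statistical_distance_diag(q, r, s_diag):
    """Statistical distance between q and r for independent variables:
    each squared coordinate difference is weighted by 1/s_ii."""
    q, r, s_diag = (np.asarray(a, dtype=float) for a in (q, r, s_diag))
    return float(np.sqrt(np.sum((q - r) ** 2 / s_diag)))

s_diag = [9.0, 1.0]                    # s_11 = 9 (large variability), s_22 = 1
q, origin = [3.0, 1.0], [0.0, 0.0]

print(np.linalg.norm(q))                              # Euclidean: sqrt(10) = 3.162...
print(statistical_distance_diag(q, origin, s_diag))   # statistical: sqrt(2) = 1.414...
```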

Figure 1. Scatter plot with more variability in $x_1$ than in $x_2$. (a) Scatter plot. (b) Ellipse of constant distance.

Figure 2. Scatter plot with positive correlation.

In Figure 2, we see that if we make a rotation of amplitude $\alpha$ and consider the axes $(g_1, g_2)$, we are in conditions analogous to those of Figure 1(a). Therefore, the distance from the point $Q = (g_1, g_2)$ to the origin will be $d(Q, O) = \sqrt{\frac{g_1^2}{\tilde{s}_{11}} + \frac{g_2^2}{\tilde{s}_{22}}}$, where $\tilde{s}_{ii}$ is the sample variance of the variable $g_i$.

The relationships between the original coordinates and the new coordinates can be expressed as

$$\begin{aligned} g_1 &= x_1 \cos(\alpha) + x_2 \sin(\alpha) \\ g_2 &= -x_1 \sin(\alpha) + x_2 \cos(\alpha) \end{aligned} \tag{2}$$
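The following sketch (NumPy assumed; the data are invented) applies the rotation of Eq. (2) to a positively correlated sample, choosing $\alpha$ so that the rotated sample covariance matrix becomes diagonal, after which the distance takes the standardized form used above:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=200)   # increasing trend: positive correlation
X = np.column_stack([x1, x2])
S = np.cov(X, rowvar=False)

# Angle that makes the rotated sample covariance diagonal: tan(2a) = 2 s12 / (s11 - s22).
alpha = 0.5 * np.arctan2(2 * S[0, 1], S[0, 0] - S[1, 1])
G = np.array([[ np.cos(alpha), np.sin(alpha)],
              [-np.sin(alpha), np.cos(alpha)]])      # rotation of Eq. (2)

Xg = X @ G.T                                 # coordinates (g1, g2) of every point
Sg = np.cov(Xg, rowvar=False)
assert abs(Sg[0, 1]) < 1e-10                 # no correlation in the rotated axes

# Statistical distance of the first point to the origin in the rotated axes.
g = Xg[0]
d = np.sqrt(g[0] ** 2 / Sg[0, 0] + g[1] ** 2 / Sg[1, 1])
```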


and, after some algebraic manipulations, $d(Q, O) = \sqrt{a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2}$, where the $a_{ij}$ are values that depend on the angle and the dispersions and must also satisfy the condition that the distance between any two points is positive.

The distance from a point $Q = (x_1, x_2)$ to a fixed point $R = (y_1, y_2)$ in situations where there is a positive correlation is $d(Q, R) = \sqrt{a_{11}(x_1 - y_1)^2 + 2a_{12}(x_1 - y_1)(x_2 - y_2) + a_{22}(x_2 - y_2)^2}$. So, in this case, the coordinates of all points $Q = (x_1, x_2)$ at a constant statistical distance $c$ from $R$ verify the equation $a_{11}(x_1 - y_1)^2 + 2a_{12}(x_1 - y_1)(x_2 - y_2) + a_{22}(x_2 - y_2)^2 = c^2$, which is the equation of an ellipse with center $R = (y_1, y_2)$ and axes parallel to $(g_1, g_2)$. Figure 3 shows ellipses of constant statistical distance.

This distance can be generalized to $\Re^p$ if $a_{11}, \dots, a_{pp}, a_{12}, \dots, a_{p-1,p}$ are values such that the distance from $Q$ to $R$ is given by

$$d(Q, R) = \sqrt{A + B}, \text{ where } \begin{aligned} A &= a_{11}(x_1 - y_1)^2 + \dots + a_{pp}(x_p - y_p)^2 \\ B &= 2a_{12}(x_1 - y_1)(x_2 - y_2) + \dots + 2a_{p-1,p}(x_{p-1} - y_{p-1})(x_p - y_p) \end{aligned} \tag{3}$$

This distance, therefore, is completely determined by the coefficients $a_{ij}$, with $i, j \in \{1, \dots, p\}$, which can be arranged in a matrix given by

Figure 3. Ellipses of constant statistical distance. (a) Point $Q$ at a constant distance from $R$. (b) Ellipse $x^2/3 + 4y^2 = 1$ rotated and moved, with scatter plot.

$$\mathbf{A} = \begin{bmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & \ddots & \vdots \\ a_{p1} & \cdots & a_{pp} \end{bmatrix} \tag{4}$$

The elements of Eq. (4) cannot be arbitrary. In order to define a distance over a vector space, the matrix in Eq. (4) must be square, symmetric, and positive definite. Therefore, the sample covariance matrix of a data matrix, $\mathbf{S}$, is a candidate to define a statistical distance.

Figure 4 shows a cloud of points with center of gravity, $(\overline{x}_1, \overline{x}_2)$, at point $R$. At first glance, it can be seen that the Euclidean distance from point $R$ to point $Q$ is greater than the Euclidean distance from point $R$ to the origin; however, $Q$ seems to have more to do with the cloud of points than the origin does. If we take into account the variability of the points in the cloud and use the statistical distance, then $Q$ will be closer to $R$ than the origin is.

Figure 4. Scatter plot with center of gravity R and a point Q.
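To make the discussion around Figure 4 concrete, the sketch below (NumPy assumed; the cloud, $Q$, and the offsets are invented) takes $\mathbf{A} = \mathbf{S}^{-1}$, the usual Mahalanobis choice, which is symmetric and positive definite whenever $\mathbf{S}$ is. It reproduces the situation described: $Q$ is farther from $R$ than the origin in Euclidean distance, yet statistically closer:

```python
import numpy as np

def statistical_distance(q, r, A):
    """d(Q, R) = sqrt((q - r)^t A (q - r)), for a symmetric positive definite A."""
    diff = np.asarray(q, dtype=float) - np.asarray(r, dtype=float)
    return float(np.sqrt(diff @ A @ diff))

rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
X = np.column_stack([x1, 0.9 * x1 + 0.2 * rng.normal(size=300)])
X += np.array([-0.33, 0.37])     # shift the cloud off the origin, across its main axis

R = X.mean(axis=0)               # center of gravity of the cloud
A = np.linalg.inv(np.cov(X, rowvar=False))   # A = S^{-1}
Q = R + np.array([1.0, 0.9])     # a point displaced along the cloud's main direction
O = np.zeros(2)                  # the origin

# Q is farther from R than the origin in Euclidean distance ...
assert np.linalg.norm(Q - R) > np.linalg.norm(O - R)
# ... but statistically closer, because its displacement from R follows the
# direction of large variability, while the origin lies across the cloud.
assert statistical_distance(Q, R, A) < statistical_distance(O, R, A)
```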


The explanation given above illustrates the need to consider distances other than the Euclidean one.
