**4. Examples**

#### **4.1 An artificial example**

The synthesis/analysis paradigm can be useful for understanding a problem. One synthesizes (simulates) a dataset, so that the model and parameter values are known, and then applies the analysis method to see how well it performs. In the present context, it is instructive to simulate measurements of rectangles, with variables length (L) and width (W), together with functions of them such as perimeter = 2L + 2W and difference = L − W. In one synthesis, L and W were each taken to be Normal with mean 10 and variance 1, PERI was 2L + 2W plus N(0,1) error, and DIFF was L − W plus N(0,1) error. The eigensystem was computed and, as expected, there are two large eigenvalues, with the remaining ones much smaller and close to zero. The eigenvalues of the correlation matrix were 1.91, 1.83, 0.21, and 0.05.
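The simulation above can be sketched in Python with NumPy as follows. The text does not state the sample size, so `n`, the random seed, and the helper names `simulate_rectangles` and `correlation_eigenvalues` are all hypothetical choices for illustration; the sketch also assumes L, W, and the two error terms are drawn independently. The eigenvalues it prints depend on the random draw and will not match 1.91, 1.83, 0.21, and 0.05 exactly.

```python
import numpy as np

def simulate_rectangles(n, seed=0):
    """Simulate n rectangles: L, W ~ N(10, 1); PERI, DIFF carry N(0, 1) error.

    Assumes L, W, and both error terms are mutually independent.
    """
    rng = np.random.default_rng(seed)
    length = rng.normal(loc=10.0, scale=1.0, size=n)  # variance 1 -> sd 1
    width = rng.normal(loc=10.0, scale=1.0, size=n)
    peri = 2 * length + 2 * width + rng.normal(0.0, 1.0, size=n)
    diff = length - width + rng.normal(0.0, 1.0, size=n)
    # Columns in the order L, W, PERI, DIFF
    return np.column_stack([length, width, peri, diff])

def correlation_eigenvalues(data):
    """Eigenvalues of the sample correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)  # variables are columns
    eigvals = np.linalg.eigvalsh(corr)      # symmetric matrix: real, ascending
    return eigvals[::-1]

# Hypothetical sample size and seed; not taken from the text.
data = simulate_rectangles(n=1000, seed=1)
print(np.round(correlation_eigenvalues(data), 2))
```

Under these independence assumptions the population correlation matrix can be worked out exactly: its eigenvalues are 1 ± √(8/9) ≈ 1.94 and 0.06, and 1 ± √(2/3) ≈ 1.82 and 0.18. That pattern of two large and two small eigenvalues (summing to 4, the number of variables) is consistent with the sample values reported above.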

#### **4.2 A real example**

Next, we consider the principal component analysis of a sample from the Los Angeles (LA) Heart Study, a long-term study (1947–1972) of civil servants of Los Angeles County. In it, 2252 randomly selected LA civil servants, aged 21–70, received a battery of examinations for "routine" cardiovascular disease (CVD) risk factors.

The variables include age, systolic blood pressure (SYS), diastolic blood pressure (DIAS), weight (WT), height (HT), and coronary incident, a binary variable indicating whether the individual had a coronary incident during the course of the study. Blood pressure is reported as a bivariate variable, (SYS, DIAS). SYS is the pressure when the heart pumps, and DIAS is the pressure when the heart relaxes.

In the textbook [9], data for a sample of *n* = 100 men were studied. (Data on the same variables for another sample of 100 men are also given in [9], so results can be compared and contrasted between the two samples.) Although the emphasis in the Heart Study was, of course, on explaining and predicting the coronary incident variable, here we focus on the first five variables, their representation in terms of a smaller number of PCs, and the interpretation of the PCs. The PC analysis is our own; it was not part of the LA Heart Study or the textbook.
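As a hedged illustration of what a PC analysis of the five variables computes, here is a minimal NumPy sketch. It is not the authors' procedure (their analysis was done in Minitab); `pca_correlation` is a hypothetical helper name, and the demo data are random placeholders, not the Heart Study sample. Basing the PCA on the correlation matrix is an assumption here, consistent with the correlation matrix reported for these variables and sensible because they are measured in different units.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = subjects, columns = variables).

    Returns eigenvalues (largest first), loadings (one column per PC),
    PC scores for each subject, and the proportion of variance per PC.
    """
    X = np.asarray(X, dtype=float)
    # Standardize columns so that the covariance of Z equals corr(X).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # ascending for symmetric matrices
    order = np.argsort(eigvals)[::-1]        # re-sort largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                     # PC scores; variance of PC k = eigvals[k]
    explained = eigvals / eigvals.sum()      # eigenvalues sum to the number of variables
    return eigvals, eigvecs, scores, explained

# Placeholder data only (NOT the LA Heart Study sample): 100 subjects x 5 variables,
# standing in for AGE, SYS, DIAS, WT, HT.
X_demo = np.random.default_rng(0).normal(size=(100, 5))
vals, loadings, scores, explained = pca_correlation(X_demo)
print(np.round(np.cumsum(explained), 3))
```

The cumulative proportion of variance explained is one common guide for deciding how many PCs are needed to represent the five variables; the loadings (columns of `loadings`) are what one inspects when interpreting each PC.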

We used Minitab statistical software for the analysis. Selected aspects of the analysis are shown below.

The lower-triangular portion of the correlation matrix for the five variables is shown in **Table 1**. The highest correlation is 0.835, between SYS and DIAS. The next highest correlation, 0.426, is between HT and WT.
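Reading the strongest relationships off a lower-triangular correlation table amounts to ranking its below-diagonal entries. The sketch below does this with NumPy; the helper name `ranked_correlation_pairs` is hypothetical, ranking by absolute value is an assumption (so strong negative correlations also surface), and the demo data are random placeholders, so the output will not reproduce the 0.835 or 0.426 values from Table 1.

```python
import numpy as np

def ranked_correlation_pairs(X, names):
    """Return (name_i, name_j, r) for each below-diagonal pair, sorted by |r| descending."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    rows, cols = np.tril_indices(len(names), k=-1)  # strictly lower triangle
    pairs = [(names[i], names[j], float(corr[i, j])) for i, j in zip(rows, cols)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

# Placeholder data only (NOT the LA Heart Study sample).
names = ["AGE", "SYS", "DIAS", "WT", "HT"]
demo = np.random.default_rng(0).normal(size=(100, 5))
for a, b, r in ranked_correlation_pairs(demo, names)[:2]:
    print(f"{a}-{b}: {r:.3f}")
```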
