**3.1 DT and MCA classifiers**

For the DT and MCA classifiers, the dataset is divided so that two-thirds of the samples in each group are used as the training set and one-third as the testing set. The accuracy of the classifier is reported as the sum of the sensitivity and specificity of the testing set.
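As a concrete illustration, this quality score can be computed directly from the four confusion-matrix counts. The function and argument names below are illustrative, not code from the study:

```python
# The quality measure used throughout this section, in percent: the sum of
# the sensitivity and specificity of the testing set (a perfect classifier
# scores 200). Function and argument names are illustrative assumptions.

def quality(tp, fn, tn, fp):
    """Sensitivity + specificity, each expressed as a percentage."""
    sensitivity = 100.0 * tp / (tp + fn)  # Cases correctly classified
    specificity = 100.0 * tn / (tn + fp)  # Controls correctly classified
    return sensitivity + specificity

# e.g. a testing set of 30 Cases and 30 Controls in which 27 Cases and
# 24 Controls are called correctly: 90% + 80% = 170.
print(quality(tp=27, fn=3, tn=24, fp=6))  # 170.0
```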

For the DT algorithm, all samples are used in the construction of the decision tree. After the best decision tree is found by the evolutionary programming search over ordered sets of seven features, one-third of the samples are removed to form the testing set. This is done in a way that does not change the label of any terminal node (i.e. it remains either a healthy or a diseased node) and that leaves the sensitivity and specificity of the training and testing sets approximately equal. This may appear to be cheating, but the goal of this investigation is to determine the minimum accuracy that could be obtained from data that contains no information.
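The biased holdout selection described above can be sketched as a simple search over candidate splits: with the classifier already fixed, draw many one-third holdouts per group and keep the one whose training and testing accuracies agree most closely. The `predict` callable, data layout, and trial count below are assumptions for illustration, not the authors' actual procedure:

```python
import random

# Sketch of the biased holdout: the classifier was built from ALL samples,
# so we repeatedly draw one-third of each group as a testing set and keep
# the split whose training and testing accuracies agree most closely.
# `predict`, the data layout, and `trials` are illustrative assumptions.

def pick_matched_split(samples, labels, predict, trials=500, seed=0):
    """Return (train_idx, test_idx) minimising |train_acc - test_acc|."""
    rng = random.Random(seed)
    groups = {}
    for i, y in enumerate(labels):
        groups.setdefault(y, []).append(i)

    def accuracy(idx):
        return sum(predict(samples[i]) == labels[i] for i in idx) / len(idx)

    best, best_gap = None, float("inf")
    for _ in range(trials):
        test = [i for g in groups.values() for i in rng.sample(g, len(g) // 3)]
        held = set(test)
        train = [i for i in range(len(labels)) if i not in held]
        gap = abs(accuracy(train) - accuracy(test))
        if gap < best_gap:
            best, best_gap = (train, test), gap
    return best

# Toy demonstration on a perfectly separable dataset of 6 + 6 samples:
train, test = pick_matched_split(list(range(12)), [0] * 6 + [1] * 6,
                                 lambda x: 0 if x < 6 else 1)
print(len(train), len(test))  # 8 4
```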

The best quality from the five datasets for each number of Cases and Controls is given in Table 1. For 30 Cases and 30 Controls, a 7-node decision tree was able to correctly classify all 60 samples. In other words, all eight terminal nodes contained samples from only a single group. For 60 Cases and 60 Controls, a decision tree correctly classified over 89% of the testing samples, and for 90 Cases and 90 Controls, over 83% of the testing samples were correctly classified. Only when the number of samples is as large as or larger than the number of features does the DT classifier drop below 80% accuracy on the testing data.

Table 1. Highest quality obtained using absolute differences in un-scaled peak intensities, from a decision tree (EPDT) and the medoid classifier algorithm (MCA). *Note:* Taken from [Luke & Collins, 2008], where the quality is the sum of the sensitivity and specificity.

The classifier constructed by the MCA is order dependent, in that the first training sample automatically becomes the medoid of the first region. Therefore, this analysis uses all samples to construct a classifier, with the requirement that the samples used as medoids cannot exceed two-thirds of either the Cases or the Controls. One-third of the samples from each group are then selected as testing samples, chosen such that the accuracy of the training set is at least as high as that of the testing set.

The results in Table 1 show that an MCA classifier performs remarkably well on datasets that contain no information. If only seven of the 300 features are used, one can find a classifier with an average sensitivity and specificity of over 90%, even when there are 300 Cases and 300 Controls (200 of each in the training set and 100 of each in the testing set). If the number of features is reduced to five or six, the accuracy stays above 90% for all but the largest datasets.

**3.2 SVM and LDA classifiers**

The results for the SVM and LDA classifiers are shown in Table 2. As described above, all samples are used to determine which features will be used in the classifier, but the accuracy of the final classification uses the sum of the sensitivity and specificity of the testing set for 10-fold cross-validation, averaged over 100 runs in which the order of the samples is scrambled before each run. SVM has an average classification accuracy of over 97% when there are 90 or fewer Cases and Controls in the dataset. For LDA, the average accuracy stays above 99.7%.

For the larger datasets, the accuracy decreases significantly. When there are 150 Cases and 150 Controls, the SVM and LDA classifiers have an average accuracy of about 87 and 84%, respectively. This is better than a 7-node decision tree, but not as good as any of the MCA classifiers examined. Datasets that contain 300 Cases, 300 Controls, and only 300 features have SVM and LDA accuracies of 61.3 and 63.7%, respectively, which is below the accuracy of the DT and MCA classifiers.

| Cases and Controls | SVM | LDA |
| --- | --- | --- |
|  | 198.50 | 199.06 |
|  | 199.88 | 199.84 |
|  | 199.02 | 199.68 |
|  | 194.82 | 199.52 |
|  | 174.38 | 167.90 |
|  | 132.66 | 135.44 |

Table 2. Highest quality obtained using normalized feature values from support vector machine (SVM) and linear discriminant analysis (LDA) classifiers. *Note:* The quality is the sum of the sensitivity and specificity for the testing set averaged over 100 runs of 10-fold cross-validation.

**3.3 BMDK classifier**

The accuracy of the BMDK classifier is shown in Table 3. This accuracy is determined from a leave-one-out cross-validation, a procedure that is known to exaggerate the accuracy of a classifier. After each sample is classified, the overall accuracy is the sum of the sensitivity and specificity minus the percentage of samples that were classified as unknown. For the smallest dataset (30 Cases and 30 Controls), a 3-feature DD-KNN classifier correctly classified 78.9% of the samples. In general, the accuracy decreased as the number of samples increased.

| Cases and Controls | BMDK 1-Feat | BMDK 2-Feat | BMDK 3-Feat |
| --- | --- | --- | --- |
|  | 147.4 | 153.3 | 157.8 |
|  | 142.9 | 136.4 | 137.3 |
|  | 151.7 | 140.0 | 140.1 |
|  | 123.3 | 137.9 | 137.3 |
|  | 118.0 | 127.3 | 125.3 |
|  | 115.7 | 121.7 | 122.1 |

Table 3. Highest quality obtained from the BioMarker Development Kit (BMDK) using absolute differences in un-scaled peak intensities. *Note:* The quality is the sum of the sensitivity and specificity minus the percent of samples undetermined, using a leave-one-out cross-validation of the entire set of data.
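The leave-one-out score used for the BMDK results can be sketched as follows: each sample is classified by a rule built from the remaining samples, and the quality is sensitivity plus specificity (in percent) minus the percentage of samples left undetermined. The 1-nearest-neighbour stand-in and its rejection cutoff below are illustrative assumptions, not the BMDK/DD-KNN algorithm itself:

```python
# Minimal sketch of a leave-one-out quality score with an "undetermined"
# category, as described for BMDK. The nearest-neighbour classifier and
# its `cutoff` rejection rule are stand-in assumptions for illustration.

def loo_quality(X, y, classify):
    """classify(train_X, train_y, x) -> 1 (Case), 0 (Control), or None."""
    tp = fn = tn = fp = unknown = 0
    for i in range(len(y)):
        pred = classify(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i])
        if pred is None:
            unknown += 1
        elif y[i] == 1:
            tp, fn = tp + (pred == 1), fn + (pred == 0)
        else:
            tn, fp = tn + (pred == 0), fp + (pred == 1)
    sens = 100.0 * tp / max(tp + fn, 1)
    spec = 100.0 * tn / max(tn + fp, 1)
    return sens + spec - 100.0 * unknown / len(y)

def nearest_neighbour(train_X, train_y, x, cutoff=5.0):
    """Label of the closest training sample; undetermined beyond `cutoff`."""
    dist, label = min((abs(t - x), lab) for t, lab in zip(train_X, train_y))
    return label if dist <= cutoff else None

# Two well-separated groups: every held-out sample is classified correctly,
# nothing is rejected, so the quality reaches the maximum of 200.
X = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [0, 0, 0, 1, 1, 1]
print(loo_quality(X, y, nearest_neighbour))  # 200.0
```

Because every sample is scored by a model trained on nearly the whole dataset, this procedure tends to flatter the classifier, which is exactly the exaggeration noted above.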
