**1.4 Fingerprint-based classifiers**

Informatic analysis has led to a new paradigm for classification known as fingerprinting or pattern matching. In this paradigm, individuals are classified based upon a particular pattern of intensities [Petricoin et al., 2003]. If an untested individual has the same pattern as a known individual, then the two share the same classification. The simplest fingerprint-based classifier is a decision tree (Figure 1). In all known applications of a decision tree to produce a classifier using spectral data [Ho et al., 2006; Liu et al., 2005; Yang et al., 2005; Yu et al., 2005], a single scoring metric (e.g., Gini index or entropy gain) was used to determine the cut point at a given node so that the two daughter nodes were as homogeneous as possible for one or more categories (e.g., diseased versus healthy). Given the general structure of a decision tree in Figure 1, the root node (Node 1) would contain all training samples, and the *m/z* value, or feature, and its cut point would be selected that best separate diseased and healthy individuals between Nodes 2 and 3. If there were still enough of a mixture in Node 2, for example, a second feature would be chosen, based on the same metric, to separate those individuals into Nodes 4 and 5. The same process would be repeated for all heterogeneous nodes until there was sufficient domination of one category over another.
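
To make the splitting criterion concrete, the sketch below scores candidate cut points on a single feature by the weighted Gini index of the two daughter nodes. This is a minimal illustration of the metric described above, not the cited authors' code; the intensities and labels are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini index of a set of class labels (0 = perfectly pure node)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_cut_point(values, labels):
    """Scan candidate cut points on one feature (e.g. the intensity at one
    m/z value) and return the threshold minimizing the weighted Gini index
    of the two daughter nodes."""
    order = np.argsort(values)
    values, labels = values[order], labels[order]
    best_t, best_score = None, np.inf
    # Candidate cuts are midpoints between consecutive distinct values.
    for i in range(1, len(values)):
        if values[i] == values[i - 1]:
            continue
        t = 0.5 * (values[i] + values[i - 1])
        left, right = labels[:i], labels[i:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical intensities at one m/z value: 0 = healthy, 1 = diseased.
intensity = np.array([0.2, 0.3, 0.35, 0.8, 0.9, 1.1])
status = np.array([0, 0, 0, 1, 1, 1])
print(best_cut_point(intensity, status))  # perfect split at t = 0.575
```

Scanning every feature with this routine and recursing on each daughter node yields the greedy tree-growing procedure sketched above.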

Fig. 1. Example of a 7-node decision tree.

Decision Support also uses decision trees, but an independent question is asked at each level in the tree. For example, Node 1 may be used to separate the individuals by gender, race, or other genetic difference, and then different features may be used to separate samples obtained from affected and healthy patients at a given level of stratification. Since the stratifying variables are not known ahead of time, there is no way to know the proper metric that should initially separate the training set. Therefore, the procedure used here is to construct unconstrained decision trees that best classify the training individuals.
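
One plausible reading of this strategy in code: partition the training set on a stratifying variable, then grow an ordinary, unconstrained tree within each stratum. The sketch below uses scikit-learn's DecisionTreeClassifier purely for illustration; the stratifying column and the data layout are assumptions, not details given in the text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_stratified_trees(X, y, strata):
    """Fit one unconstrained decision tree per stratum (e.g. per gender),
    mirroring the idea of asking an independent question at each level."""
    trees = {}
    for s in np.unique(strata):
        mask = strata == s
        trees[s] = DecisionTreeClassifier().fit(X[mask], y[mask])  # no depth limit
    return trees

def predict_stratified(trees, X, strata):
    """Route each sample to the tree trained on its own stratum."""
    y_pred = np.empty(len(X), dtype=int)
    for s, tree in trees.items():
        mask = strata == s
        if mask.any():
            y_pred[mask] = tree.predict(X[mask])
    return y_pred

# Hypothetical spectra: 8 samples x 3 features, stratified by sex.
rng = np.random.default_rng(0)
X = rng.random((8, 3))
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])    # healthy vs diseased
sex = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # hypothetical stratifying variable
models = fit_stratified_trees(X, y, sex)
print(predict_stratified(models, X, sex))
```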

The medoid classification algorithm is a best attempt at reproducing the algorithm used in many of the studies conducted in the laboratories of Emmanuel Petricoin and Lance Liotta [Browers et al., 2005; Conrads et al., 2004; Ornstein et al., 2004; Petricoin et al., 2004; Srinivasan et al., 2006; Stone et al., 2005]. While these authors stated that their algorithm was quite similar to a Self-Organizing Map (SOM), their algorithm, as they described it, has virtually nothing in common with a SOM.

In a SOM [Kohonen, 1988], the layout of the cells is determined *a priori*, as is the number of features, *n*, used in the separation. In general, the cells are placed in a rectangular or hexagonal pattern, with a maximum of four or six adjacent cells, respectively. The cells are seeded with random centroids that represent the *n*-dimensional coordinates of each cell. The first training sample is assigned to the cell with the closest centroid, and the centroids of this and all the other cells are shifted significantly towards this sample. This procedure is repeated for all samples. Once all samples have been processed, the algorithm repeatedly cycles through the list of training samples. In each subsequent cycle, the extent to which the centroid of the assigned cell shifts towards a sample decreases, as does the extent to which the other centroids are affected; the shift also becomes significantly smaller for cells that are further from the selected cell, as defined by the initial mapping. When finished, all samples are assigned to cells, each centroid represents an approximate average of the *n* features for all samples in that cell, and the distance between centroids increases as the cells become further apart in the pre-defined map.
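
For contrast, a minimal SOM training loop might look as follows, assuming a rectangular grid, Euclidean distance, and exponentially decaying learning rate and neighborhood radius (common choices; Kohonen's formulation admits many variants):

```python
import numpy as np

def train_som(samples, grid=(4, 4), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal SOM: a fixed rectangular grid of cells, random initial
    centroids, and many passes over the data with decaying updates."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    n = samples.shape[1]                      # number of features
    centroids = rng.random((rows * cols, n))  # random n-dimensional seeds
    # Grid coordinates of each cell, used for the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for epoch in range(epochs):
        lr = lr0 * np.exp(-epoch / epochs)        # learning rate decays
        sigma = sigma0 * np.exp(-epoch / epochs)  # neighborhood shrinks
        for x in samples:
            winner = np.argmin(np.linalg.norm(centroids - x, axis=1))
            # Cells far from the winner (on the pre-defined map) move less.
            grid_dist = np.linalg.norm(coords - coords[winner], axis=1)
            influence = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))
            centroids += lr * influence[:, None] * (x - centroids)
    return centroids

# Hypothetical data: 30 samples with n = 5 features.
data = np.random.default_rng(1).random((30, 5))
print(train_som(data).shape)  # (16, 5): one centroid per grid cell
```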

In contrast, the algorithm used in the references cited above places the first training sample at the center of the first cell, and this cell is classified with the category of that sample. Since it is sample-centered, each cell has a medoid, not a centroid. Each cell is given a constant trust radius, *r*. If the second sample lies at a distance greater than *r* from the first, it is assigned to a new, second cell, which is then classified with its own category; otherwise it is assigned to the first cell. This process continues until all training samples have been analyzed.
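
A sketch of this one-pass procedure, as described above, might look as follows; Euclidean distance and the value of the trust radius are assumptions, since the cited papers do not fully specify them:

```python
import numpy as np

def build_medoid_cells(samples, labels, r):
    """One pass over the training data: a sample farther than r from
    every existing medoid founds a new cell labeled with its category."""
    medoids, cell_labels = [], []
    for x, y in zip(samples, labels):
        if medoids:
            d = np.linalg.norm(np.array(medoids) - x, axis=1)
            if d.min() <= r:
                continue  # sample falls inside an existing cell
        medoids.append(x)  # the sample itself becomes the cell's medoid
        cell_labels.append(y)
    return np.array(medoids), np.array(cell_labels)

def classify(medoids, cell_labels, x):
    """Assign a new sample the category of the nearest medoid."""
    return cell_labels[np.argmin(np.linalg.norm(medoids - x, axis=1))]

# Hypothetical 2-feature data; 0 = healthy, 1 = diseased.
train = np.array([[0.1, 0.2], [0.15, 0.25], [0.9, 0.8], [1.0, 0.9]])
status = np.array([0, 0, 1, 1])
M, L = build_medoid_cells(train, status, r=0.2)
print(len(M), classify(M, L, np.array([0.95, 0.85])))  # 2 cells, class 1
```

Classifying an untested individual then reduces to finding the nearest medoid, which is the fingerprint-matching step described above.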

Therefore, a SOM has a fixed number of cells, each cell is described by a centroid, and the algorithm cycles through the training data many times to adjust the centroids' coordinates. The algorithm used by the groups of Petricoin and Liotta has an undefined number of cells, each described by a single sample, and processes the training data only once.

Also grouped within the class of fingerprint-based classifiers are Support Vector Machines and Linear Discriminant Analysis. A Support Vector Machine (SVM) [Boser et al., 1992; Vapnik, 1998] is a kernel-based learning system. An SVM searches for the optimal hyperplane that maximizes the margin of separation between the hyperplane and the closest data points on both sides of it. Linear Discriminant Analysis (LDA) [Fukunaga, 1990] is a supervised learning algorithm that finds the linear combination of features that maximizes the between-class scatter while simultaneously minimizing the within-class scatter, achieving maximum discrimination in a dataset. The within-class scatter matrix may become singular if the sample size is smaller than the dimensionality of the search space (the number of features), but several techniques are available to handle this situation.
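
Both classifiers are available off the shelf. The sketch below fits each to hypothetical spectral data with scikit-learn; shrinkage on the LDA is one standard remedy for the singular within-class scatter matrix that arises when features outnumber samples:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical data: 20 samples, 100 features (more features than samples).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
y = np.array([0] * 10 + [1] * 10)
X[y == 1, :5] += 2.0  # separate the classes on the first few features

# Maximum-margin separating hyperplane (linear kernel for simplicity).
svm = SVC(kernel="linear").fit(X, y)

# Shrinkage regularizes the within-class scatter matrix, which would
# otherwise be singular here since n_samples < n_features.
lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)

print(svm.score(X, y), lda.score(X, y))  # training accuracy of each model
```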


**1.5 Biomarker-based classifiers** 

An example of a state-specific marker is shown in Figure 2. Each "+" represents, for example, the blood concentration of a particular biochemical. The individuals in the left column are in a specific disease state, while those in the right column are not and are therefore considered to be in a healthy state, at least with respect to this disease. Individuals in each state have different blood concentrations of this biochemical due to genetic and environmental factors.

