**1. Introduction**


#### **1.1 Early detection of disease**

Early detection of a disease is very important because it greatly improves the individual's chance of responding well to treatment. For example, the 5-year survival rate for prostate cancer is nearly 100% if it is detected early [http://www.toacorn.com/news/2005/1027/ Health\_and\_Wellness/077.html]. Similarly, the 5-year survival rate for ovarian cancer is 95% if caught early, but since 75% of cases are first observed in the later stages of the disease, the overall 5-year survival rate is less than 50% [http://www.information-aboutovarian-cancer.com/]. Ideally, a single test would determine whether an individual had cancer anywhere in the body, but no such test exists. While all cancers have many factors in common, tissue differences and the body's response to different cancers make the test for ovarian cancer (CA125) very different from the test for prostate cancer (PSA). A lack of sufficient sensitivity and specificity has recently led to the recommendation that PSA no longer be used as a potential marker of prostate cancer [http://www.uspreventiveservicestaskforce.org/uspstf/uspsprca.htm].

Even within the same tissue, all cancers are not necessarily the same. It is well known that there are two major types of lung cancer, small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). It is also known that NSCLC has three major subtypes: adenocarcinoma (AC), squamous cell carcinoma (SCC), and large cell undifferentiated carcinoma (LCUC). Each of these differs in the biochemical processes going on within the cancer cell, and one should not expect that the detection, or necessarily the treatment, of these cancers will be the same. Of the four recognized forms of lung cancer (SCLC, AC, SCC and LCUC), the latter three are differentiated strictly by the appearance of the cell under the microscope. It is possible that the underlying biochemical processes of an AC cell in one individual are significantly different from those in another individual with a cancer that appears similar. Therefore, each of these categories of lung cancer may be composed of one or more states. While the disease category represents the name of the disease based on some experimental observation, the disease state represents a grouping based on the underlying biochemical processes within the diseased cell. The detection of a disease and its treatment should be relative to specific disease states, not a disease category or individuals within that category.

A Comparison of Biomarker and Fingerprint-Based Classifiers of Disease 181

#### **1.4 Fingerprint-based classifiers**

Informatic analysis has led to a new paradigm for classification known as fingerprinting or pattern matching. In this paradigm, individuals are classified based upon a particular pattern of intensities [Petricoin et al., 2003]. If an untested individual has the same pattern as a known individual, then these two have the same classification. The simplest fingerprint-based classifier is a decision tree (Figure 1). In all known applications of a decision tree to produce a classifier using spectral data [Ho et al., 2006; Liu et al., 2005; Yang et al., 2005; Yu

#### **1.2 Types of classifiers**

A high-quality classifier would be a great aid in the early detection of disease. The general procedure is to obtain a biological specimen and search for one or more features that correctly classify the individual. The specimen can be blood, urine, mucus, or a tissue sample, for example, and the feature can be the expression level of an mRNA, a protein, or a metabolite. The construction of the classifier starts with obtaining a large number of features from individuals with known phenotypes, known as the training set, and constructing a classifier that sufficiently predicts each sample's phenotype. This classifier is then used on a second set of samples of known phenotype, called the testing set, to determine its overall accuracy. Since the number of features will be much larger than the number of samples in the training set, the construction of a classifier suffers from the "curse of dimensionality" [Bellman, 1957, 1961, 2003]. If the training set contains $N_h$ healthy samples and $N_d$ diseased samples, then virtually any classifier that uses the smaller of $N_h$ and $N_d$ features, such as their social security numbers, can correctly classify all samples in the training set. This extreme example represents a case where the classifier is fitting the individuals in the training set and not their phenotype. The goal is to choose a relatively small number of features that correctly distinguishes the samples.
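The social security number example can be made concrete with a small sketch. The identifiers, labels, and helper names below (`fit_lookup`, `predict`, `accuracy`) are hypothetical and chosen only for illustration: a classifier that memorizes a unique identifier fits the training set perfectly, yet it has nothing to offer for samples in the testing set.

```python
# Toy illustration: a "classifier" keyed on a unique identifier memorizes the
# training set but carries no information about phenotype.

def fit_lookup(ids, labels):
    """Memorize the label attached to each unique identifier."""
    return dict(zip(ids, labels))

def predict(model, ids, default="healthy"):
    """Return the memorized label, or a default guess for unseen identifiers."""
    return [model.get(i, default) for i in ids]

def accuracy(predicted, actual):
    """Fraction of predictions that match the known phenotype."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical identifiers and phenotypes (not taken from any real study).
train_ids = ["id-001", "id-002", "id-003", "id-004"]
train_labels = ["healthy", "diseased", "healthy", "diseased"]
test_ids = ["id-005", "id-006", "id-007", "id-008"]
test_labels = ["diseased", "healthy", "diseased", "healthy"]

model = fit_lookup(train_ids, train_labels)
train_acc = accuracy(predict(model, train_ids), train_labels)
test_acc = accuracy(predict(model, test_ids), test_labels)
print(train_acc, test_acc)  # 1.0 on the training set, 0.5 (chance) on the testing set
```

The gap between the two accuracies is the symptom to watch for whenever the number of available features approaches or exceeds the number of training samples.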

Two extremes in the total range of possible classifiers are fingerprint-based and biomarker-based classifiers. A fingerprint-based classifier uses a collection of features, which is also known as a panel of markers. If two individuals have the same pattern in this set of features (i.e. similar fingerprints), and one is known to have a particular disease, it is assumed that the other has this same disease. A biomarker-based classifier tries to find a very small number of features that distinguishes all healthy samples from all diseased samples. In other words, a biomarker-based classifier tries to cluster all samples of the same phenotype into the smallest possible number of clusters. The optimum biomarker would distinguish all healthy from all diseased samples, resulting in a single healthy cluster and a single disease cluster.

#### **1.3 Selecting features**

A major difference between fingerprint-based and biomarker-based classifiers is how the features are selected. In a fingerprint-based classifier, the actual classifier is used to determine how well a given set of features distinguishes the samples. An overriding heuristic is used to determine what set of features is tested in the classifier, and the quality of the classification can be used to determine which feature set is tried next. This is known as a *wrapper method*. In contrast, a biomarker-based classifier uses one or more procedures to determine which features successfully distinguish some or all of the samples in the training set. This set of putative biomarkers is then used in a different classifier, either individually or a few together, to determine how well the healthy samples can be distinguished from the diseased samples. In other words, the selection of features for a biomarker-based classifier uses a *filter method*.
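As a rough sketch of the filter idea, each feature can be scored on its own, independently of any classifier, and only the features that pass a threshold are kept as putative biomarkers. The text does not name a specific filtering statistic, so the separation score, the threshold, and the feature names (`peak_a`, `peak_b`, `peak_c`) below are assumptions made purely for illustration.

```python
from statistics import mean, stdev

def separation_score(healthy, diseased):
    """Absolute difference in class means divided by the sum of class standard deviations."""
    spread = stdev(healthy) + stdev(diseased)
    if spread == 0:
        return float("inf") if mean(healthy) != mean(diseased) else 0.0
    return abs(mean(healthy) - mean(diseased)) / spread

def filter_features(data, labels, threshold):
    """Return the names of features whose score meets the threshold, best first."""
    scores = {}
    for name, values in data.items():
        healthy = [v for v, y in zip(values, labels) if y == "healthy"]
        diseased = [v for v, y in zip(values, labels) if y == "diseased"]
        scores[name] = separation_score(healthy, diseased)
    kept = [name for name in scores if scores[name] >= threshold]
    return sorted(kept, key=lambda name: scores[name], reverse=True)

# Hypothetical intensities for three features across six training samples.
labels = ["healthy"] * 3 + ["diseased"] * 3
data = {
    "peak_a": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],  # clearly separates the classes
    "peak_b": [2.0, 2.5, 1.8, 2.1, 2.4, 1.9],  # essentially uninformative
    "peak_c": [5.0, 5.5, 4.8, 6.9, 7.4, 7.0],  # separates, but less sharply
}

putative = filter_features(data, labels, threshold=2.0)
print(putative)  # ['peak_a', 'peak_c']
```

The key property of a filter is visible in `filter_features`: no classifier is trained while features are being screened, so every feature can be examined.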

Many different procedures can be used in the wrapper method to find the optimal set of markers. Three major classes of procedures are forward-selection, reverse-selection, and multidimensional searches. The simplest forward-selection method is a Greedy Search. In this procedure all features are individually tested in the classifier and the one that performs the best is retained. All remaining features are then tested in combination with this best feature to find the feature pair that performs the best. This procedure continues until either adding another feature does not improve the classification or a pre-set number of features has been selected. An extension of this Greedy Search is known as Branch-and-Bound. In this latter procedure multiple classifiers are retained at each cycle and the search results in a population of classifiers that have the highest accuracy.
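The Greedy Search can be sketched in a few lines. The wrapped classifier here is a nearest-centroid classifier scored by training-set accuracy, which is an assumption made only to keep the example self-contained; any classifier could be substituted through the `score` argument. The data and function names are hypothetical.

```python
def nearest_centroid_accuracy(rows, labels, features):
    """Training-set accuracy of a nearest-centroid classifier on the chosen features."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [r for r, y in zip(rows, labels) if y == c]
        centroids[c] = [sum(r[f] for r in members) / len(members) for f in features]
    correct = 0
    for r, y in zip(rows, labels):
        def dist(c):
            return sum((r[f] - m) ** 2 for f, m in zip(features, centroids[c]))
        if min(classes, key=dist) == y:
            correct += 1
    return correct / len(rows)

def greedy_forward_selection(rows, labels, n_features, max_features, score):
    """Add the single best feature per cycle until no gain or the size limit is hit."""
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        trial = {f: score(rows, labels, selected + [f]) for f in candidates}
        best_f = max(candidates, key=lambda f: trial[f])
        if trial[best_f] <= best_score:
            break  # adding another feature does not improve the classification
        selected.append(best_f)
        best_score = trial[best_f]
    return selected, best_score

# Hypothetical samples: three healthy, three diseased, three features each.
rows = [
    [0.0, 0.0, 5.0], [0.0, 1.9, 1.0], [2.2, 0.0, 3.0],
    [3.0, 2.0, 2.0], [3.0, 2.0, 4.0], [3.0, 2.0, 3.0],
]
labels = ["healthy"] * 3 + ["diseased"] * 3

selected, best = greedy_forward_selection(rows, labels, 3, 3, nearest_centroid_accuracy)
print(selected, best)  # [0, 1] 1.0
```

In this toy data neither feature 0 nor feature 1 classifies every sample alone, but the pair does, and the third feature is never added because it brings no further improvement.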

Reverse-selection works in the opposite direction. Initially, all features are used in the classifier and features that are not important to the classification are removed. In cases where the number of features is larger than the number of samples, special procedures need to be used to ensure that an important feature is not removed in the early steps of the reduction.
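A minimal sketch of reverse-selection follows. The scoring function is a stand-in for whatever classifier is being wrapped: `toy_score` and the set `INFORMATIVE` are hypothetical and exist only so the example runs on its own. The sketch also does not include the special procedures mentioned above for the case where features outnumber samples.

```python
def backward_elimination(features, score, min_features=1):
    """Drop one feature per cycle for as long as the score does not get worse."""
    current = list(features)
    current_score = score(current)
    while len(current) > min_features:
        # Score every subset obtained by removing exactly one feature.
        trials = [(score([g for g in current if g != f]), f) for f in current]
        best_score, removable = max(trials, key=lambda t: t[0])
        if best_score < current_score:
            break  # every possible removal hurts the classification
        current.remove(removable)
        current_score = best_score
    return current, current_score

# Stand-in for classifier accuracy: only two of the four features matter.
INFORMATIVE = {"peak_a", "peak_c"}

def toy_score(subset):
    """Fraction of the informative features present in the subset."""
    return len(INFORMATIVE & set(subset)) / len(INFORMATIVE)

kept, kept_score = backward_elimination(
    ["peak_a", "peak_b", "peak_c", "peak_d"], toy_score
)
print(kept, kept_score)  # ['peak_a', 'peak_c'] 1.0
```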

Multidimensional searches use a pre-defined number of features and try different combinations of features in the classifier. Examples of multidimensional search techniques are Simulated Annealing, Tabu Search, Gibbs Sampling, Genetic Algorithm, Evolutionary Programming, Ant Colony Optimization, and Particle Swarm Optimization. The first three techniques modify a single set of features while the latter four use a population of sets, where each feature set is changed throughout the search to find one or more optimal sets. It should be noted that in a Genetic Algorithm the number of features in the set can be reduced, so the pre-defined number should be considered a maximum number of allowed features in the final set.
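To illustrate one of the single-set techniques, the sketch below applies Simulated Annealing to a feature set of fixed size. The cooling schedule, the swap move, the default parameters, and the toy scoring function are all assumptions chosen for brevity, not a description of any particular published implementation.

```python
import math
import random

def anneal_feature_set(n_features, k, score, steps=2000,
                       t_start=1.0, t_end=0.01, seed=0):
    """Search for a high-scoring set of k features (requires k < n_features)."""
    rng = random.Random(seed)
    current = rng.sample(range(n_features), k)
    current_score = score(current)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose swapping one selected feature for one unselected feature.
        outside = [f for f in range(n_features) if f not in current]
        proposal = list(current)
        proposal[rng.randrange(k)] = rng.choice(outside)
        proposal_score = score(proposal)
        delta = proposal_score - current_score
        # Always accept improvements; accept worse sets with a probability
        # that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return sorted(best), best_score

# Stand-in for classifier accuracy: overlap with a hidden "ideal" feature set.
TARGET = {2, 5, 11}

def toy_score(subset):
    return len(TARGET & set(subset)) / len(TARGET)

best_set, best_set_score = anneal_feature_set(n_features=20, k=3, score=toy_score)
print(best_set, best_set_score)
```

Population-based techniques such as a Genetic Algorithm follow the same propose-and-evaluate pattern but maintain many feature sets at once instead of a single `current` set.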

The goal of each wrapper search technique is to find the optimal set of features, so each is an approximation to an exhaustive search. If the objective is to find the best set of *k* features from a total set of *K* features, the number of unique combinations is generally given by $\binom{K}{k} = \frac{K!}{k!\,(K-k)!}$. If there are a total of 300 features and the goal is to find the best set of seven features, an exhaustive search would require examining $4.04 \times 10^{13}$ unique sets of features. The situation is slightly more complicated for a decision tree. In this case the order of the features is important, since this order determines whether a feature acts on the entire set of samples or on a particular subset of samples. Here, the number of possible combinations is $\frac{K!}{(K-k)!}$, and an exhaustive search of 300 features to find the best seven-node decision tree would require examining $2.04 \times 10^{17}$ trees. Since this is not computationally feasible, any result from a fingerprint-based classifier should be considered a lower bound on the accuracy of the classification algorithm.
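The two counts quoted above can be checked directly with Python's standard library (`math.comb` and `math.perm`, available from Python 3.8). The variable names are illustrative only.

```python
import math

K, k = 300, 7  # total number of features, size of the feature set

# Unordered sets of k features: K! / (k! (K - k)!)
n_sets = math.comb(K, k)

# Ordered selections of k features, as counted for a decision tree: K! / (K - k)!
n_trees = math.perm(K, k)

print(f"{n_sets:.2e}")   # 4.04e+13
print(f"{n_trees:.2e}")  # 2.04e+17
```

The two counts differ by a factor of 7! = 5040, the number of ways to order seven chosen features.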

In contrast, the search for the best biomarker-based classifier is exhaustive. All features are examined by each filtering method and all combinations of putative biomarkers can be used in the final classifier.
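Because filtering leaves only a handful of putative biomarkers, every combination of them can be evaluated. The sketch below assumes a hypothetical table of classifier accuracies (`ACCURACY`) in place of a real classifier; the feature names and values are invented for illustration.

```python
from itertools import combinations

def best_combination(candidates, score, max_size):
    """Exhaustively score every combination of up to max_size candidates."""
    best_subset, best_score = (), float("-inf")
    for size in range(1, max_size + 1):
        for subset in combinations(candidates, size):
            s = score(subset)
            if s > best_score:  # strict comparison keeps the smaller set on ties
                best_subset, best_score = subset, s
    return best_subset, best_score

# Hypothetical accuracies for each combination of three putative biomarkers.
ACCURACY = {
    frozenset({"peak_a"}): 0.80,
    frozenset({"peak_b"}): 0.70,
    frozenset({"peak_c"}): 0.75,
    frozenset({"peak_a", "peak_b"}): 0.85,
    frozenset({"peak_a", "peak_c"}): 0.95,
    frozenset({"peak_b", "peak_c"}): 0.80,
    frozenset({"peak_a", "peak_b", "peak_c"}): 0.95,
}

def lookup_score(subset):
    return ACCURACY[frozenset(subset)]

winner, winner_score = best_combination(
    ["peak_a", "peak_b", "peak_c"], lookup_score, max_size=3
)
print(winner, winner_score)  # ('peak_a', 'peak_c') 0.95
```

With three candidates there are only seven combinations to score, which is why this search can be exhaustive while a wrapper search over 300 raw features cannot.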
