**2. Role of statistics in HT screening design**

#### **2.1. HT screening process**

A typical HT screening campaign can be divided into five major steps regardless of the assay type and the assay read-out (Fig. 1). Once a target or pathway is identified, assay development is performed to explore the optimal assay conditions and to miniaturize the assay to a microtiter plate format. Performance of an HT assay is usually quantified with statistical parameters such as signal window, signal variability and Z-factor (see definition in section 4). To achieve acceptable assay performance, one should carefully choose the appropriate reagents, experimental controls and numerous other assay variables such as cell density or protein/substrate concentrations.
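As a rough illustration of how such a performance parameter is obtained from control wells, the sketch below computes a Z-factor-style statistic in its commonly used form, 1 - 3(SD_pos + SD_neg) / |mean_pos - mean_neg| (often written Z'-factor when only controls are involved). The function name `z_factor` and the control read-outs are hypothetical, and the chapter's own definition in section 4 takes precedence where it differs.

```python
from statistics import mean, stdev


def z_factor(positive, negative):
    """Return 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg| for two control sets.

    Assumption: sample standard deviations (n - 1 denominator) are used.
    """
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means coincide; the statistic is undefined")
    return 1.0 - 3.0 * (stdev(positive) + stdev(negative)) / separation


# Hypothetical read-outs (arbitrary units) from the control wells of one plate.
positive_controls = [100.0, 102.0, 98.0, 101.0, 99.0]
negative_controls = [10.0, 12.0, 8.0, 11.0, 9.0]

print(f"Z-factor: {z_factor(positive_controls, negative_controls):.3f}")
```

Values close to 1 indicate a wide signal window relative to the control variability; a narrow window or noisy controls push the value down.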

The final distribution of the activities from a screening data set depends highly on the target and pathway (for siRNA) or the diversity of the compound libraries, and continuous efforts have been made to generate more diverse libraries (Entzeroth et al. 2009; Gillet 2008; Kummel and Parker 2011; Zhao et al. 2005). Furthermore, the quality and reliability of the screening data are affected by the stability and purity of the test samples in the screening libraries, so storage conditions should be monitored and validated in a timely manner (Baillargeon et al. 2011; Waybright et al. 2009). For small molecules, certain compounds might interfere with the detection system by emitting fluorescence or by absorbing light, and they should be avoided whenever possible to obtain reliable screening results.

Assay development is often followed by a primary screen, which is carried out at a single concentration (small molecule) or as single-point measurements (siRNA). As the "hits" identified in the primary screen are followed up in a subsequent confirmatory screen, it is crucial to optimize the assay to satisfactory standards. Sensitivity, the ability to identify an siRNA or compound as a "hit" when it is a true "hit", and specificity, the ability to classify an siRNA or compound as a "non-hit" when it is not a true "hit", are the two critical aspects for identifying as many candidates as possible while minimizing false discovery rates. Specificity is commonly emphasized in the confirmatory screens that follow the primary screens. For instance, the confirmatory screen for small molecules often consists of multiple measurements of each compound's activity at various concentrations using different assay formats to assess the compound's potency and selectivity. The confirmatory stage of an RNAi screen using pooled siRNA may be performed in a deconvolution mode, where each well contains a single siRNA. A pooling strategy is also applicable to primary small molecule screens, where a careful pooling design is necessary (Kainkaryam and Woolf 2009). The confirmatory screens of compounds identified from small molecule libraries are followed by lead optimization efforts involving structure-activity relationship investigations and molecular scaffold clustering. Pathway and genetic clustering analyses, on the other hand, are widespread hit follow-up practices for RNAi screens. The processes encompassing hit identification from primary screens and lead optimization methods require powerful software tools with advanced statistical capabilities.
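The definitions of sensitivity and specificity given above can be made concrete with a small sketch. It assumes that the true status of each sample is known (for example, from a validation set), which is rarely the case in a real primary screen; the function name and the example calls are hypothetical.

```python
def sensitivity_specificity(called_hit, true_hit):
    """Return (sensitivity, specificity) from paired boolean sequences.

    called_hit[i] is the screen's call for sample i; true_hit[i] is its
    (assumed known) true status.
    """
    pairs = list(zip(called_hit, true_hit))
    tp = sum(1 for called, truth in pairs if called and truth)
    fn = sum(1 for called, truth in pairs if not called and truth)
    tn = sum(1 for called, truth in pairs if not called and not truth)
    fp = sum(1 for called, truth in pairs if called and not truth)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true hit and one true non-hit")
    sensitivity = tp / (tp + fn)  # fraction of true hits that were called hits
    specificity = tn / (tn + fp)  # fraction of true non-hits called non-hits
    return sensitivity, specificity


# Hypothetical calls for six samples.
calls = [True, True, False, False, True, False]
truth = [True, False, False, False, True, True]

sens, spec = sensitivity_specificity(calls, truth)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the hit threshold typically raises sensitivity at the cost of specificity, which is one reason the confirmatory screen is used to weed out false positives carried over from the primary screen.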

**Figure 1.** The HT screening process.

sive HT operation for subsequent interpretation and annotation of the large data sets. In this chapter, we review statistical data analysis methods developed to meet the needs for handling large datasets generated from HT campaigns. We first discuss the influence of proper assay design on statistical outcomes of the HT screening data. We then highlight similarities and differences among various methods for data normalization, quality assessment and "hit" selection. Information presented here provides guidance to researchers on the major aspects of high throughput screening data interpretation.


Accuracy and precision of an assay are also critical parameters to consider for a successful campaign. Accuracy measures how close a measured value is to its true value, whereas precision is the proximity of the measured values to each other. Accuracy of an assay is therefore highly dependent on the performance of the HT instruments in use. Precision, on the other hand, can be a function of sample size and control performance as well as instrument specifications, indicating that the experimental design has a significant impact on the statistical evaluation of the screening data.
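The distinction between accuracy and precision can be illustrated numerically. The sketch below summarizes accuracy as the bias of the replicate mean relative to a known reference value, and precision as the sample standard deviation of the replicates. Both summaries are common conventions assumed here for illustration rather than measures prescribed by this chapter, and all names and values are hypothetical.

```python
from statistics import mean, stdev


def accuracy_and_precision(measurements, reference):
    """Return (bias, spread) for replicate measurements of one sample.

    bias   = mean(measurements) - reference  (closer to 0 = more accurate)
    spread = sample standard deviation       (smaller = more precise)
    """
    bias = mean(measurements) - reference
    spread = stdev(measurements)
    return bias, spread


# Hypothetical replicate read-outs of a sample whose true value is taken as 10.5.
replicates = [9.8, 10.1, 10.0, 10.3, 9.8]

bias, spread = accuracy_and_precision(replicates, reference=10.5)
print(f"bias={bias:+.2f}, standard deviation={spread:.3f}")
```

In this example the replicates agree closely with one another (good precision) yet sit consistently below the reference (poor accuracy), the pattern one might expect from a systematic instrument offset.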
