**1. Introduction**

With advances in biotechnology, the identification of new therapeutic targets, and a better understanding of human diseases, pharmaceutical companies and academic institutions have accelerated their drug discovery efforts. The pipeline to obtain therapeutics typically involves target identification and validation, lead discovery and optimization, pre-clinical animal studies, and eventually clinical trials to test the safety and effectiveness of the new drugs. In most cases, screening with genome-scale RNA interference (RNAi) technology or diverse compound libraries is the first step of a drug discovery initiative. A small interfering RNA (siRNA) screen (siRNAs are a class of double-stranded RNA molecules, 20-25 nucleotides in length, capable of interfering with the expression of genes bearing a complementary nucleotide sequence) is an effective tool to identify upstream or downstream regulators of a specific target gene, which may themselves serve as drug targets for a more efficient and successful treatment. Screening diverse small molecule libraries against a known target or disease-relevant pathway, on the other hand, facilitates the discovery of chemical tools as candidates for further development.

© 2013 Goktug et al.; licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Conducting either genome-wide RNAi or small molecule screens has become possible with advances in high throughput (HT) technologies, which are indispensable for carrying out massive screens in a timely manner (Macarron 2006; Martis et al. 2011; Pereira and Williams 2007). In screening campaigns, large quantities of data are collected in a considerably short period of time, making rapid data analysis and subsequent data mining a challenging task (Harper and Pickett 2006). Numerous automated instruments and operational steps participate in an HT screening process, requiring appropriate data processing tools for data quality assessment and statistical analysis. In addition to quality control (QC) and "hit" selection strategies, pre- and post-processing of the screening data are essential steps in a comprehensive HT operation for subsequent interpretation and annotation of the large data sets. In this chapter, we review statistical data analysis methods developed to meet the needs of handling large datasets generated from HT campaigns. We first discuss the influence of proper assay design on statistical outcomes of the HT screening data. We then highlight similarities and differences among various methods for data normalization, quality assessment, and "hit" selection. The information presented here provides guidance to researchers on the major aspects of high throughput screening data interpretation.
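The normalization and hit-selection steps mentioned above can be sketched in plain Python. The plate signals, control layout, and the 50%-of-control hit threshold below are invented for illustration and are not taken from the chapter:

```python
import statistics

# Invented raw signals from a single plate; the layout is illustrative only.
neg_ctrl = [100.0, 104.0]   # negative (vehicle) controls: defines 100% signal
pos_ctrl = [10.0, 12.0]     # positive (max-effect) controls: defines 0% signal
samples  = [98.0, 95.0, 40.0, 101.0, 99.0, 37.0, 96.0, 102.0]

mu_neg = statistics.mean(neg_ctrl)
mu_pos = statistics.mean(pos_ctrl)

# Percent-of-control normalization rescales each well between the two controls.
percent_activity = [100.0 * (x - mu_pos) / (mu_neg - mu_pos) for x in samples]

# Z-scores computed from the sample wells themselves are a control-independent
# alternative, valid when most wells can be assumed inactive.
mu, sd = statistics.mean(samples), statistics.stdev(samples)
z_scores = [(x - mu) / sd for x in samples]

# One simple "hit" rule: flag wells whose signal falls below 50% of control.
hits = [i for i, p in enumerate(percent_activity) if p < 50.0]
```

With these values, wells 2 and 5 (signals 40 and 37) fall below the threshold and are flagged as hits.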

Data Analysis Approaches in High Throughput Screening
http://dx.doi.org/10.5772/52508

and Woolf 2009). The confirmatory screens of compounds identified from small molecule libraries are followed by lead optimization efforts involving structure-activity relationship investigations and molecular scaffold clustering. Pathway and genetic clustering analyses, on the other hand, are widespread hit follow-up practices for RNAi screens. The processes encompassing hit identification from primary screens and lead optimization methods require powerful software tools with advanced statistical capabilities.

**Figure 1.** The HT screening process.

Accuracy and precision of an assay are also critical parameters to consider for a successful campaign. Accuracy is a measure of how close a measured value is to its true value, whereas precision is the proximity of repeated measurements to one another. The accuracy of an assay therefore depends strongly on the performance of the HT instruments in use. Precision, on the other hand, can be a function of sample size and control performance as well as instrument specifications, indicating that the experimental design has a significant impact on the statistical evaluation of the screening data.

One of the main assumptions in analyzing HT screening data is that the data are normally distributed, or at least comply with the central limit theorem, whereby the mean of the measured values converges to a normal distribution unless systematic errors are associated with the screen (Coma et al. 2009). Log transformations are therefore often applied to the data in the pre-processing stage: they make the data more symmetrically distributed around the mean, as in a normal distribution; they represent the relationship between variables more linearly, especially for cell growth assays; and they allow efficient use of the assay quality assessment parameters (Sui and Wu 2007).
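The symmetrizing effect of the log transformation can be sketched numerically. The readout values below are invented, and sample skewness is used here only as a convenient symmetry measure (it is not a metric prescribed by the chapter):

```python
import math
import statistics

# Invented right-skewed raw readouts (e.g. from a cell growth assay).
raw = [10, 12, 15, 20, 25, 30, 40, 60, 100, 300]

def skewness(xs):
    # Simple sample skewness: mean cubed deviation over the cubed sample SD.
    mu = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return sum((x - mu) ** 3 for x in xs) / (len(xs) * sd ** 3)

# Log-transforming compresses the long right tail, leaving the values more
# symmetrically distributed around their mean.
logged = [math.log10(x) for x in raw]
```

For these values the skewness drops markedly after the transform (from roughly 1.9 to roughly 0.7), consistent with the more symmetric, near-normal shape the text describes.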

**2.2. Classical versus robust (resistant) statistics**
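As a minimal numeric preview of the contrast this heading names, the sketch below compares classical estimators (mean, standard deviation) with robust, resistant counterparts (median, median absolute deviation). The data values are invented; the 1.4826 factor is the standard consistency constant that makes the MAD estimate the SD for normally distributed data:

```python
import statistics

# Invented readouts with a single strong outlier.
values = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 9.0]

# Classical location and scale estimates: pulled far off by the outlier.
mean = statistics.mean(values)
sd = statistics.stdev(values)

# Robust (resistant) counterparts: the median and the median absolute
# deviation (MAD), scaled by 1.4826 for consistency with the normal SD.
median = statistics.median(values)
mad = 1.4826 * statistics.median([abs(x - median) for x in values])
```

Here the median stays at 1.0 and the scaled MAD near 0.07, while the mean (about 2.1) and SD (about 3.0) are dominated by the single outlying well.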
