Drug Discovery

Data Analysis Approaches in High Throughput Screening

http://dx.doi.org/10.5772/52508

… receiver operating characteristics (ROC) curves were generated to compare the performance of several positional correction algorithms based on sensitivity and "1-specificity" values, and R-score was found to perform best. On the other hand, application of well-correction or a diffusion model to data sets with no spatial effects was shown to have no adverse effect on the final "hit" selection (Carralot et al. 2012; Makarenkov et al. 2007). Additionally, reduction of thermal gradients and associated edge effects in cell-based assays was shown to be possible through simple adjustments to the assay workflow, such as incubating the plates at room temperature for 1 hour immediately after dispensing the cells into the wells (Lundholt et al. 2003).

**4. QC methods**

There are various environmental, instrumental and biological factors that contribute to assay performance in an HT setting. Therefore, one of the key steps in the analysis of HT screening data is the examination of the assay quality. To determine if the data collected from each plate meet the minimum quality requirements, and if any patterns exist before and after data normalization, the distribution of control and test sample data should be examined at experiment-, plate- and well-level. While there are numerous graphical methods and tools available for the visualization of the screening data in various formats (Gribbon et al. 2005; Gunter et al. 2003; Wu and Wu 2010), such as scatter plots, heat maps and frequency plots, there are also many statistical parameters for the quantitative assessment of assay quality. As with the normalization techniques, both controls-based and non-controls-based approaches exist for data QC. The most commonly used QC parameters in HT screening are listed below and summarized in Table 2.

**•** Signal-to-background (S/B): This is a simple measure of the ratio of the positive control mean to the background signal mean (i.e. negative control).

$$\text{S/B} = \frac{\text{mean}(\text{C}\_{\text{pos}})}{\text{mean}(\text{C}\_{\text{neg}})} \tag{13}$$

**•** Signal-to-noise (S/N): This is a similar measure to S/B with the inclusion of signal variability in the formulation. Two alternative versions of S/N are presented below. Both S/B and S/N are considered weak parameters to represent the dynamic signal range of an HT screen and are rarely used.

$$\begin{aligned} \text{S/N} &= \frac{\text{mean}(\text{C}\_{\text{pos}}) - \text{mean}(\text{C}\_{\text{neg}})}{\text{std}(\text{C}\_{\text{neg}})} \quad \text{(a)} \\ \text{S/N} &= \frac{\text{mean}(\text{C}\_{\text{pos}}) - \text{mean}(\text{C}\_{\text{neg}})}{\sqrt{\text{std}(\text{C}\_{\text{pos}})^2 + \text{std}(\text{C}\_{\text{neg}})^2}} \quad \text{(b)} \end{aligned} \tag{14}$$

**•** Signal window (SW): This is a more indicative measure of the data range in an HT assay than the above parameters. Two alternative versions of the SW are presented below, which differ only in the denominator.

$$\begin{aligned} \text{SW} &= \frac{\left| \text{mean}(\text{C}\_{\text{pos}}) - \text{mean}(\text{C}\_{\text{neg}}) \right| - 3 \times \left( \text{std}(\text{C}\_{\text{pos}}) + \text{std}(\text{C}\_{\text{neg}}) \right)}{\text{std}(\text{C}\_{\text{pos}})} \quad \text{(a)} \\ \text{SW} &= \frac{\left| \text{mean}(\text{C}\_{\text{pos}}) - \text{mean}(\text{C}\_{\text{neg}}) \right| - 3 \times \left( \text{std}(\text{C}\_{\text{pos}}) + \text{std}(\text{C}\_{\text{neg}}) \right)}{\text{std}(\text{C}\_{\text{neg}})} \quad \text{(b)} \end{aligned} \tag{15}$$

**•** Assay variability ratio (AVR): As opposed to SW, this parameter captures the data variability in both controls, and can be defined as (1 - Z'-factor), as presented below.

$$\text{AVR} = \frac{3 \times \text{std}(\text{C}\_{\text{pos}}) + 3 \times \text{std}(\text{C}\_{\text{neg}})}{|\text{mean}(\text{C}\_{\text{pos}}) - \text{mean}(\text{C}\_{\text{neg}})|} \tag{16}$$

**•** Z'-factor: Although AVR and Z'-factor have similar statistical properties, the latter is the most widely used QC criterion, where the separation between positive (Cpos) and negative (Cneg) controls is calculated as a measure of the signal range of a particular assay in a single plate. Z'-factor is based on the assumption of normality, and the use of 3 std's from the mean of each group comes from the 99.73% confidence limit (Zhang et al. 1999). While Z'-factor accounts for the variability in the control wells, positional effects or any other variability in the sample wells are not captured. Although Z'-factor is an intuitive method to determine the assay quality, several concerns have been raised about the reliability of this parameter as an assay quality measure. Major issues associated with the Z'-factor method are that the magnitude of the Z'-factor does not necessarily correlate with the hit confirmation rates, and that Z'-factor is not an appropriate measure to compare the assay quality across different screens and assay types (Coma et al. 2009; Gribbon et al. 2005).

$$\text{Z'-factor} = 1 - \frac{3 \times \text{std}(\text{C}\_{\text{pos}}) + 3 \times \text{std}(\text{C}\_{\text{neg}})}{|\text{mean}(\text{C}\_{\text{pos}}) - \text{mean}(\text{C}\_{\text{neg}})|} \tag{17}$$

**•** Z-factor: This is the modified version of the Z'-factor, where the mean and std of the negative control are substituted with the ones for the test samples. Although Z-factor is more advantageous than Z'-factor due to its ability to incorporate sample variability in the calculations, other issues associated with Z'-factor (as discussed above) still apply. Additionally, in a focused library in which many possible "hits" are clustered in certain plates, Z-factor would not be an appropriate QC parameter. While assays with Z'- or Z-factor values above 0.5 are considered to be excellent, one may want to include additional measures, such as visual inspection or more advanced formulations, in the decision process, especially for cell-based assays with inherently high signal variability. The power of the above-mentioned parameters was discussed in multiple studies (Gribbon et al. 2005; Iversen et al. 2006; Macarron and Hertzberg 2009; Stevens et al. 1998).

$$\text{Z-factor=1} - \frac{3 \times \text{std}(\text{C}\_{\text{pos}}) + 3 \times \text{std}(\text{S}\_{\text{all}})}{|\text{mean}(\text{C}\_{\text{pos}}) - \text{mean}(\text{S}\_{\text{all}})|} \tag{18}$$
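As an illustration, the control-based parameters of Eqs. 13 to 18 can be computed for a single plate along the following lines. This is only a minimal sketch: the function name `qc_metrics`, the use of the sample std (n - 1 denominator) and the control and sample values are assumptions made for the example, not part of the original methods.

```python
from statistics import mean, stdev


def qc_metrics(c_pos, c_neg, samples=None):
    """Compute the QC parameters of Eqs. 13-18 for one plate.

    c_pos, c_neg: readings of the positive and negative control wells.
    samples: optional readings of all test sample wells (needed for Z-factor).
    """
    m_pos, m_neg = mean(c_pos), mean(c_neg)
    # Assumption: sample standard deviation (n - 1 denominator).
    s_pos, s_neg = stdev(c_pos), stdev(c_neg)
    diff = abs(m_pos - m_neg)        # |mean(Cpos) - mean(Cneg)|
    spread = 3 * (s_pos + s_neg)     # 3 x std(Cpos) + 3 x std(Cneg)

    metrics = {
        "S/B": m_pos / m_neg,                                        # Eq. 13
        "S/N (a)": (m_pos - m_neg) / s_neg,                          # Eq. 14a
        "S/N (b)": (m_pos - m_neg) / (s_pos**2 + s_neg**2) ** 0.5,   # Eq. 14b
        "SW (a)": (diff - spread) / s_pos,                           # Eq. 15a
        "SW (b)": (diff - spread) / s_neg,                           # Eq. 15b
        "AVR": spread / diff,                                        # Eq. 16
        "Z'-factor": 1 - spread / diff,                              # Eq. 17
    }
    if samples is not None:
        m_all, s_all = mean(samples), stdev(samples)
        # Eq. 18: negative control statistics replaced by the test samples.
        metrics["Z-factor"] = 1 - 3 * (s_pos + s_all) / abs(m_pos - m_all)
    return metrics


# Hypothetical plate: five wells per control and eight test samples.
plate_pos = [100, 102, 98, 101, 99]
plate_neg = [10, 11, 9, 10, 10]
plate_samples = [12, 9, 11, 10, 13, 8, 10, 11]
metrics = qc_metrics(plate_pos, plate_neg, plate_samples)
```

For these made-up values the Z'-factor is well above the 0.5 threshold, and AVR and Z'-factor sum to 1 by construction, which makes the relation AVR = 1 - Z'-factor explicit.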

**•** SSMD: This is an alternative quality metric to Z'- and Z-factor, which was recently developed to assess the assay quality in HT screens (Zhang 2007a; Zhang 2007b). Because it is grounded in probabilistic and statistical theory, SSMD was shown to be a more meaningful parameter than the previously mentioned methods for QC purposes. SSMD differs from Z'- and Z-factor by its ability to handle controls with different effects, which enables the selection of multiple QC criteria for assays (Zhang et al. 2008a). The application of SSMD-based QC criteria was demonstrated in multiple studies in comparison to other commonly used methods (Zhang 2008b; Zhang 2011b; Zhang et al. 2008a). Although SSMD was developed primarily for RNAi screens, it can also be used for small-molecule screens.

$$\text{SSMD} = \frac{\text{mean}(\text{C}\_{\text{pos}}) - \text{mean}(\text{C}\_{\text{neg}})}{\sqrt{\text{std}(\text{C}\_{\text{pos}})^2 + \text{std}(\text{C}\_{\text{neg}})^2}} \tag{19}$$

**Table 2.** Summary of HT screening data QC methods.

**5. "Hit" selection methods**

The main purpose of HT screens is to obtain a list of compounds or siRNAs with desirable activity for further confirmation. Therefore, the ultimate goal of an HT screening campaign is to narrow down a big and comprehensive compound or siRNA library to a manageable number of "hits" with low false discovery rates. While the initial library of test samples undergoes multiple phases of elimination, the most critical factor is to select as many true "hits" as possible. After data normalization is applied as necessary, "hit" selection is performed on the plates that pass the QC criterion. As stated previously in Section 2.1, HT processes in primary and confirmatory screens differ in design. The "hit" selection process following a primary screen is similar for RNAi and small-molecule screens, where the screening run is often performed in single copy, and a single data point (obtained from either endpoint or kinetic reading) is collected for each sample. On the other hand, a confirmatory RNAi screen is typically performed in replicates using pooled or individual siRNA, while the confirmatory small-molecule screens are executed in dose-response mode. Here, we classify the "hit" selection methodologies into two major categories: primary and confirmatory screen analysis.

**5.1. "Hit" selection in primary screen**

Although RNAi and small-molecule assays differ in many ways, a common aim is to classify the test samples with relatively higher or lower activities than the reference wells as "hits". Hence, it is required to select an activity cut-off, where test samples with values above or below the cut-off are identified as "hits". It is crucial to select a sensible cut-off value that is sufficiently different from the noise level in order to reduce false positive rates. Depending on the specific goals of the project, the cut-off might need to be a reasonable value that leads to a manageable quantity of "hits" for follow-up studies. To guide scientists in the process, numerous "hit" selection methods have been developed for HT screens as presented below.

**•** Percent inhibition cut-off: The "hits" from HT screening data that are normalized for percent inhibition (NPI method in Section 3.1) can be selected based on a percent cut-off value that is arbitrarily assigned relative to an assay's signal window. As this method does not have much statistical basis, it is primarily preferred for small-molecule screens with strong controls.

**•** Mean +/- k std: In this method, the cut-off is set to the value that is k std's above or below the sample mean. The cut-off can be applied to the normalized data, and a k value of 3 is typically used, which is associated with a false positive error rate of 0.00135 (Zhang et al. 2006). As this cut-off calculation method is primarily based on the normality assumption, it is also equivalent to a Z-score of 3. Since the use of mean and std makes this method sensitive to outliers, a more robust version is presented next.

**•** Median +/- k MAD: To desensitize the "hit" selection to outliers, a cut-off that is k MADs above or below the sample median was developed, and a study comparing the std- and MAD-based "hit" selection methods showed lower false non-discovery rates with the latter (Chung et al. 2008).
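The two statistical cut-offs described in Section 5.1 (mean +/- k std and median +/- k MAD) can be sketched as follows. The function names, the example values and the MAD scaling constant of 1.4826 (a common choice that makes the MAD comparable to the std for normally distributed data) are assumptions for illustration, not taken from the text.

```python
from statistics import mean, median, stdev


def mean_std_cutoffs(values, k=3.0):
    """Lower and upper cut-offs at k sample std's around the mean."""
    m, s = mean(values), stdev(values)
    return m - k * s, m + k * s


def median_mad_cutoffs(values, k=3.0, scale=1.4826):
    """Lower and upper cut-offs at k MADs around the median.

    Assumption: the MAD is multiplied by `scale`; set scale=1.0 to use
    the raw median absolute deviation instead.
    """
    med = median(values)
    mad = scale * median(abs(v - med) for v in values)
    return med - k * mad, med + k * mad


def select_hits(values, low, high):
    """Indices of wells whose value falls below `low` or above `high`."""
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical normalized readings with one strongly active well (index 9).
readings = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, 8.0]

hits_std = select_hits(readings, *mean_std_cutoffs(readings))
hits_mad = select_hits(readings, *median_mad_cutoffs(readings))
```

In this small made-up data set the single extreme well inflates the std enough that the mean +/- 3 std cut-off does not flag it, whereas the median +/- 3 MAD cut-off does, which illustrates the outlier sensitivity of the std-based method.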
