
In HT screening practice, outliers (data points that do not fall within the range of the rest of the data) are commonly encountered. The distortions that outliers introduce into the distribution of the data negatively impact the results. Therefore, an HT data set containing outliers must be analyzed carefully to avoid an unreliable and inefficient "hit" selection process. Although outliers in control wells are easily identified, outliers among the test samples may be misinterpreted as real "hits" rather than recognized as random errors.

There are two approaches to the statistical analysis of data sets with outliers: classical and robust. One can replace or remove outliers based on the truncated mean or similar approaches and continue the analysis with classical methods. However, robust statistical approaches have gained popularity in HT screening data analysis in recent decades. In robust statistics, the median and the median absolute deviation (MAD) are used as statistical parameters in place of the mean and standard deviation (std), respectively, to diminish the effect of outliers on the final analysis results. Although there are numerous statistical approaches to detect and remove or replace outliers (Hund et al. 2002; Iglewicz and Hoaglin 1993; Singh 1996), robust statistics is preferred for its insensitivity to outliers (Huber 1981). The robustness of an analysis technique can be assessed by two main approaches, influence functions (Hampel et al. 1986) and the breakdown point (Hampel 1971); the latter is the more intuitive in the context of HT screening. The breakdown point of a sample series is defined as the fraction of outlier data points that the statistical parameters can tolerate before they take on drastically different values that no longer represent the distribution of the original data set. In a demonstration on a five-sample data set, robust parameters were shown to perform better than the classical parameters after the data set was contaminated with outliers (Rousseeuw 1991). It was also emphasized that the median and MAD have a breakdown point of 50%, while the mean and std have 0%, indicating that sample sets with up to 50% outlier density can still be successfully handled with robust statistics.
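To make the breakdown point concrete, the short sketch below (a minimal Python/NumPy illustration; the plate size, contamination level and variable names are our assumptions, not taken from the cited studies) shows how the mean and std are dragged away by a few gross outliers while the median and MAD barely move.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated signals for 384 wells, then eight wells corrupted by gross outliers
signals = rng.normal(loc=100.0, scale=10.0, size=384)
signals[:8] = 10000.0  # e.g., a dispenser failure

median = np.median(signals)
mad = 1.4826 * np.median(np.abs(signals - median))  # scaled to match std under normality

print(f"mean = {signals.mean():.1f}, std = {signals.std():.1f}")  # both inflated
print(f"median = {median:.1f}, MAD = {mad:.1f}")                  # still near 100 and 10
```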



**2.3. False discovery rates**

As mentioned previously, depending on the specificity and sensitivity of an HT assay, erroneous assessment of "hits" and "non-hits" is likely. Especially in genome-wide siRNA screens, false positive and false negative results may mislead scientists in the confirmatory studies. While false discoveries may be caused by indirect biological regulation of the gene of interest through pathways outside the scope of the experiment, they may also be due to random errors in the screening process. Although the latter can be easily resolved in follow-up screens, the former may require a better assay design (Stone et al. 2007). Lower false discovery rates can also be achieved by careful selection of assay reagents to avoid inconsistent measurements (outliers) during screening. The biological interference effects of the reagents in RNAi screens fall into two categories, sequence-dependent and sequence-independent (Echeverri et al. 2006; Mohr and Perrimon 2012); accordingly, off-target effects and low transfection efficiencies are the main challenges to be overcome in these screens. Moreover, selection of appropriate controls for either small-molecule or RNAi screens is crucial for screen quality assessment.


**3.1. Normalization for assay variability**

Despite meticulous assay optimization efforts addressing all the factors mentioned previously, variances in the raw data across plates are expected even within the same experiment. Here, we consider these variances "random" assay variability, as distinct from systematic errors that can be linked to a known cause, such as the failure of an instrument. Uneven assay performance may occur unpredictably at any time during screening. Hence, normalization of the data within each plate is necessary to make results comparable across plates and experiments, allowing a single cut-off for the selection of "hits".

When normalizing HT screening data, two main approaches can be followed: controls-based and non-controls-based. In controls-based approaches, the assay-specific in-plate positive and negative controls are used as the upper (100%) and lower (0%) bounds of assay activity, and the activities of the test samples are calculated with respect to these values. Although this is an intuitive and easily interpretable method, there are several concerns with the use of controls for normalization. Variability in the control wells, whether too high or too low, does not necessarily represent the variability in the sample wells, and outliers and biases within the control wells can distort the upper and lower activity bounds (Brideau et al. 2003; Coma et al. 2009). Therefore, non-controls-based normalizations are favored for understanding the overall activity distribution based on the sample activities themselves. In this approach, most of the samples are assumed to be inactive so that they serve as their own "negative controls". However, this assumption may be misleading when the majority of the wells in a plate are true "hits", for example when screening a library of bioactive small molecules or a focused siRNA library. Since the basal activity level shifts upwards under these conditions, a non-controls-based method would result in erroneous decision making.

Plate-wise versus experiment-wise normalization and "hit" picking is another critical point to consider when choosing the best-fitting analysis technique for a screen. Experiment-wise normalizations are advantageous in screens where active samples are clustered within certain plates; in this case, each plate is processed in the context of all plates in the experiment. On the other hand, plate-wise normalizations can effectively correct systematic errors occurring in a plate-specific manner without disrupting the results in other plates (Zhang et al. 2006). Therefore, the normalization method that fits best with one's experimental setting should be carefully chosen to perform efficient "hit" selection with low false discovery rates.

The calculations used in the most common controls-based normalization methods are as follows:

**•** Percent of control (PC): The activity of the ith sample (Si) is divided by the mean of either the positive or the negative control wells (C).

$$\text{PC} = \frac{S\_i}{\text{mean}(C)} \times 100 \tag{1}$$



**•** Normalized percent inhibition (NPI): The activity of the ith sample is normalized to the activities of the positive and negative controls. The sample activity is subtracted from the mean of the high control (Chigh), and this difference is divided by the difference between the means of the high control and the low control (Clow). This parameter may be termed normalized percent activity if the final result is subtracted from 100. The control means may also be replaced with medians if preferred.

$$\text{NPI} = \frac{\text{mean}(C\_\text{high}) - S\_i}{\text{mean}(C\_\text{high}) - \text{mean}(C\_\text{low})} \times 100 \tag{2}$$
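As a concrete illustration of the two controls-based calculations, a minimal sketch follows (the function and variable names are ours; a real pipeline would apply these plate by plate):

```python
import numpy as np

def percent_of_control(s, control):
    """PC (Eq. 1): sample signals as a percentage of the mean control signal."""
    return s / np.mean(control) * 100.0

def normalized_percent_inhibition(s, c_high, c_low):
    """NPI (Eq. 2): where each sample falls between the high and low controls."""
    high, low = np.mean(c_high), np.mean(c_low)
    return (high - s) / (high - low) * 100.0

# Toy plate: six sample wells plus control wells
s = np.array([95.0, 60.0, 15.0, 80.0, 100.0, 40.0])
print(percent_of_control(s, control=np.array([98.0, 102.0, 100.0])))
print(normalized_percent_inhibition(s, c_high=np.array([98.0, 102.0]),
                                    c_low=np.array([5.0, 7.0])))
```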

The calculations used in the most common non-controls-based normalization methods are as follows:

**•** Percent of samples (PS): The mean of the control wells in the PC calculation (appropriate only when the negative control is the control of interest) is replaced with the mean of all samples (Sall).

$$\text{PS} = \frac{S\_i}{\text{mean}(S\_\text{all})} \times 100 \tag{3}$$

**•** Robust percent of samples (RPS): To desensitize the PS calculation to outliers, a robust-statistics version is preferred, in which the mean of Sall in the PS calculation is replaced with the median of Sall.

$$\text{RPS} = \frac{S\_i}{\text{median}(S\_\text{all})} \times 100 \tag{4}$$
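In code, the two sample-based percentages differ from PC only in the reference value; a sketch under the same naming assumptions as above:

```python
import numpy as np

def percent_of_samples(s_all):
    """PS (Eq. 3): each well relative to the mean of all sample wells."""
    return s_all / np.mean(s_all) * 100.0

def robust_percent_of_samples(s_all):
    """RPS (Eq. 4): the plate median as reference, so a handful of
    strong hits cannot shift the reference point."""
    return s_all / np.median(s_all) * 100.0
```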




**•** Z-score: Unlike the above parameters, this method accounts for the signal variability in the sample wells by dividing the difference between Si and the mean of Sall by the std of Sall. The Z-score is a widely used measure that successfully corrects for additive and multiplicative offsets between plates in a plate-wise approach (Brideau et al. 2003).

$$\text{Z-score} = \frac{S\_i - \text{mean}(S\_\text{all})}{\text{std}(S\_\text{all})} \tag{5}$$

**•** Robust Z-score: Since the Z-score calculation is highly affected by outliers, a robust version is available for calculations insensitive to outliers. In this parameter, the mean and std are replaced with the median and MAD, respectively.

$$\text{Robust Z-score} = \frac{S\_i - \text{median}(S\_\text{all})}{\text{MAD}(S\_\text{all})} \tag{6}$$

$$\text{MAD}(S\_\text{all}) = 1.4826 \times \text{median}(|S\_i - \text{median}(S\_\text{all})|) \tag{7}$$
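Both scores are computed plate-wise in a few lines; the sketch below (our naming, not from the original chapter) includes the 1.4826 consistency factor of Eq. (7):

```python
import numpy as np

def z_score(s_all):
    """Z-score (Eq. 5): center by the plate mean, scale by the plate std."""
    return (s_all - np.mean(s_all)) / np.std(s_all)

def robust_z_score(s_all):
    """Robust Z-score (Eqs. 6-7): center by the median, scale by the MAD."""
    med = np.median(s_all)
    mad = 1.4826 * np.median(np.abs(s_all - med))
    return (s_all - med) / mad
```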

| | Method | Calculation |
|---|---|---|
| **Assay variability normalization: controls-based** | Percent of control | PC = Si / mean(C) x 100 |
| | Normalized percent inhibition | NPI = (mean(Chigh) - Si) / (mean(Chigh) - mean(Clow)) x 100 |
| **Assay variability normalization: non-controls-based** | Percent of samples | PS = Si / mean(Sall) x 100 |
| | Robust percent of samples | RPS = Si / median(Sall) x 100 |
| | Z-score | Z-score = (Si - mean(Sall)) / std(Sall) |
| | Robust Z-score | Robust Z-score = (Si - median(Sall)) / MAD(Sall) |
| **Systematic error corrections** | Median polish | rijp = Sijp - µp - rowi - colj (estimated effects, Eq. 8) |
| | B-score | B-score = rijp / MADp |
| | BZ-score | BZ-score = (rijp - mean((rijp)all)) / std((rijp)all) |
| | Background correction | zij = (1/N) Σp S'ijp |
| | Well-correction | see text |
| | Diffusion-state model (can be controls-based too) | see text |

**Table 1.** Summary of HT screening data normalization methods.


**3.2. Normalization for systematic errors**

Besides data variability between plates due to random fluctuations in assay performance, systematic errors are one of the major concerns in HT screening. For instance, plate-wise spatial patterns play a crucial role in cell-based assay failures. As an example, incubation conditions might be adjusted to the exact desired temperature and humidity settings, yet perturbed air circulation inside the incubator unit might cause an uneven temperature gradient, resulting in different cell-growth rates in each well due to evaporation. Therefore, depending on the positions of the plates inside the incubator, column-wise, row-wise or bowl-shaped edge effects may be observed within plates (Zhang 2008b; Zhang 2011b). On the other hand, instrumental failures such as inaccurate dispensing of reagents from individual dispenser channels might cause evident temporal patterns in the final readout. Therefore, experiment-wise patterns should be carefully examined with proper visual tools. Although some of these issues can be fixed at the validation stage, for example by routine checks of instrument performance, numerous algorithms have been developed to diminish these patterns during data analysis; the most common ones are listed below and summarized in Table 1.

**•** Median polish: Tukey's two-way median polish (Tukey 1977) is utilized to calculate the row and column effects within plates using a non-controls-based approach. In this method, the row and column medians are iteratively subtracted from all wells until the maximum tolerance value is reached for the row and column medians as well as for the row and column effects. The residual of the well in the ith row and jth column of the pth plate (rijp) is then calculated by subtracting the estimated plate average (µp), the estimated ith row effect and the estimated jth column effect from the measured sample value (Sijp). Since the median is used in the calculations, this method is relatively insensitive to outliers.

$$r\_{ijp} = S\_{ijp} - \hat{\mu}\_p - \widehat{\text{row}}\_i - \widehat{\text{col}}\_j \tag{8}$$
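A compact implementation of the iteration might look as follows; this is a sketch of Tukey's procedure (the convergence test and iteration cap are our simplifications), returning the residuals rijp used by the B-score below.

```python
import numpy as np

def median_polish(x, max_iter=10, tol=1e-6):
    """Tukey's two-way median polish of one plate matrix x (rows x cols).
    Returns the estimated plate average, row/column effects and residuals."""
    r = np.asarray(x, dtype=float).copy()
    overall = 0.0
    row = np.zeros(r.shape[0])
    col = np.zeros(r.shape[1])
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects
        rm = np.median(r, axis=1)
        r -= rm[:, None]
        row += rm
        m = np.median(row)
        row -= m
        overall += m
        # Same sweep for the columns
        cm = np.median(r, axis=0)
        r -= cm[None, :]
        col += cm
        m = np.median(col)
        col -= m
        overall += m
        if np.abs(rm).max() < tol and np.abs(cm).max() < tol:
            break
    return overall, row, col, r  # residuals r are the r_ijp of Eq. (8)
```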

**•** B-score: This normalization parameter combines the residual values calculated by median polish with the sample MAD to account for data variability. Details of the median polish technique, and of an advanced B-score method that additionally accounts for plate-to-plate variance by smoothing, are provided in (Brideau et al. 2003).

$$\text{B-score} = \frac{r\_{ijp}}{\text{MAD}\_p} \tag{9}$$

$$\text{MAD}\_p = 1.4826 \times \text{median}(|(r\_{ijp})\_\text{all} - \text{median}((r\_{ijp})\_\text{all})|) \tag{10}$$
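Continuing the hypothetical median_polish sketch above, the B-score of a plate then reduces to scaling its residuals:

```python
import numpy as np

def b_score(plate):
    """B-score (Eqs. 9-10): median-polish residuals scaled by the plate MAD.
    Assumes the median_polish function sketched above."""
    _, _, _, r = median_polish(plate)
    mad_p = 1.4826 * np.median(np.abs(r - np.median(r)))
    return r / mad_p
```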

**•** BZ-score: This is a modified version of the B-score method in which the median polish is followed by a Z-score calculation. While the BZ-score has an advantage over the Z-score because of its capability to correct for row and column effects, it is less powerful than the B-score and does not fit the normal distribution model very well (Wu et al. 2008).


$$\text{BZ-score} = \frac{r\_{ijp} - \text{mean}((r\_{ijp})\_\text{all})}{\text{std}((r\_{ijp})\_\text{all})} \tag{11}$$
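The BZ-score differs only in the scaling step, applying the classical Z-score to the same residuals (again assuming the median_polish sketch above):

```python
import numpy as np

def bz_score(plate):
    """BZ-score (Eq. 11): Z-score of the median-polish residuals."""
    _, _, _, r = median_polish(plate)
    return (r - r.mean()) / r.std()
```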

**•** Background correction: In this correction method, the background signal corresponding to each well position is calculated by averaging the normalized activities of that well (S'ijp denoting the normalized signal of the well in the ith row and jth column of the pth plate) across all plates. A polynomial fit is then performed to generate an experiment-wise background surface for a single screening run. The offset of the background surface from the zero plane is considered a consequence of systematic errors, and the correction is performed by subtracting the background surface from the data of each plate in the screen. Background correction performed on pre-normalized data was found to be more efficient, and exclusion of the control wells from the background surface calculation was recommended. A detailed description of the algorithm is found in (Kevorkov and Makarenkov 2005).

$$z\_{ij} = \frac{1}{N} \sum\_{p=1}^{N} S'\_{ijp} \tag{12}$$
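A sketch of the idea follows; the polynomial degree, the least-squares fitting details and all names are our assumptions, and the exact algorithm is given in the cited reference.

```python
import numpy as np

def background_surface(plates, degree=2):
    """Fit an experiment-wise polynomial background surface to the
    well-wise means z_ij of Eq. (12). `plates` has shape (N, rows, cols)
    and is assumed to be pre-normalized; degree 2 is an illustrative choice."""
    z = plates.mean(axis=0)  # z_ij, averaged across all N plates
    rows, cols = z.shape
    i, j = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    # Design matrix of monomials i^a * j^b with a + b <= degree
    terms = [(i ** a * j ** b).ravel()
             for a in range(degree + 1) for b in range(degree + 1 - a)]
    A = np.stack(terms, axis=1)
    coef, *_ = np.linalg.lstsq(A, z.ravel(), rcond=None)
    return (A @ coef).reshape(rows, cols)

def background_correct(plates):
    """Subtract the fitted background surface from every plate."""
    return plates - background_surface(plates)[None, :, :]
```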


**•** Well-correction: This method follows a strategy analogous to the background correction method; however, a least-squares approximation or polynomial fit is performed independently for each well across all plates. The fitted values are then subtracted from each data point to obtain the corrected data set. In a study comparing the systematic error correction methods discussed so far, the well-correction method was found to be the most effective for successful "hit" selection (Makarenkov et al. 2007).

**•** Diffusion-state model: As mentioned previously, the majority of spatial effects are caused by uneven temperature gradients across assay plates due to inefficient incubation conditions. To predict the amount of evaporation in each well in a time- and space-dependent manner, and its effect on the resulting data set, a diffusion-state model was developed by Carralot et al. (2012). As opposed to the correction methods mentioned above, the diffusion model can be generated from the data of a single control column instead of the sample wells. The edge-effect correction is then applied to each plate in the screening run based on the generated model.

Before automatically applying a systematic error correction algorithm to a raw data set, it should be carefully considered whether there is a real need for such data manipulation. Several statistical methods have been developed to detect the presence of systematic errors (Coma et al. 2009; Root et al. 2003). In one study, row and column effects were assessed with a robust linear model, the so-called R score, and it was shown that performing a positional correction using the R score on data with no or very small spatial effects results in lower specificity, whereas correcting a data set with large spatial effects decreases the false discovery rate considerably (Wu et al. 2008). In the same study, receiver operating characteristic (ROC) curves were generated to compare the performance of several positional correction algorithms based on sensitivity and "1-specificity" values, and the R score was found to perform best. On the other hand, application of well-correction or the diffusion model to data sets with no spatial effects was shown to have no adverse effect on the final "hit" selection (Carralot et al. 2012; Makarenkov et al. 2007). Additionally, reduction of thermal gradients and associated edge effects in cell-based assays was shown to be possible with simple adjustments to the assay workflow, such as incubating the plates at room temperature for 1 hour immediately after dispensing the cells into the wells (Lundholt et al. 2003).
