Compound Set Enrichment: A Novel Approach to Analysis of Primary HTS Data.
Thibault Varin, Gubler Hanspeter, Christian Parker, Zhang Ji-Hu, Ertl Peter, Schuffenhauer Ansgar
Novartis Institutes for BioMedical Research
The goal of high-throughput screening (HTS) should be the identification of active chemical series rather than just individual active compounds (Schnecke et al., 2006). Often, only a simple activity cut-off is used to select compounds for progression to potency determination. Only afterwards are the compounds clustered into chemical classes to allow a preliminary structure-activity relationship (SAR) analysis. However, if the aim of HTS is to identify compound clusters showing SAR, then the decision about which compounds to progress should focus on maximizing the number of active classes rather than just the number of active compounds.
Identification of active chemical classes from primary HTS data requires two tasks. First, the structures of the compounds in the data set need to be grouped into chemical classes. Second, it needs to be assessed, for each chemical class, whether membership in the class increases the probability of a compound being active.
For chemical classification, an approach termed the Scaffold Tree (Schuffenhauer et al., 2007) has been found to be a robust means of grouping compounds. The Scaffold Tree can be applied to large compound collections because the method scales linearly with the size of the data set, and it groups compounds by their common chemical scaffolds.
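The grouping step can be sketched as follows. This is a minimal illustration, not the Scaffold Tree algorithm itself: it assumes that each compound's scaffold hierarchy (from its full ring framework down to a single ring) has already been computed elsewhere, and the compound IDs and scaffold labels are hypothetical placeholders. It only shows how compounds can then be assigned to chemical classes in a single pass over the data.

```python
from collections import defaultdict

# Hypothetical, precomputed input: for each compound, its chain of scaffolds
# ordered from the largest scaffold down to a single ring.
# The labels are opaque placeholders, not real structures.
scaffold_chains = {
    "cpd_1": ["S3_a", "S2_a", "S1_a"],
    "cpd_2": ["S3_b", "S2_a", "S1_a"],
    "cpd_3": ["S2_c", "S1_a"],
    "cpd_4": ["S2_d", "S1_b"],
}


def group_by_scaffold(chains):
    """Map every scaffold to the set of compounds that contain it.

    Each compound is visited once, so the work grows linearly with the
    number of compounds (for a bounded chain length).
    """
    classes = defaultdict(set)
    for compound_id, chain in chains.items():
        for scaffold in chain:
            classes[scaffold].add(compound_id)
    return dict(classes)


classes = group_by_scaffold(scaffold_chains)
for scaffold, members in sorted(classes.items()):
    print(scaffold, sorted(members))
```

Because a compound appears in one class per level of its hierarchy, class-level statistics can be evaluated for small, general scaffolds as well as for large, specific ones.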
For the second task, assessing the activity of a whole chemical series, two methods have been compared. The first approach requires data binarisation, that is, the definition of active and inactive compounds based on a threshold. The activity of compound classes can then be determined using the binomial test. However, reducing screening results to an active/inactive designation causes a loss of information, because every activity measurement carries an inherent error. It would be preferable to assess the activity of a chemical class by comparing the activity distribution of the class members with that of all tested compounds. Moreover, primary HTS results cannot be assumed to follow a Gaussian distribution, so a non-parametric method, the Kolmogorov-Smirnov (KS) statistic, has been used to assess compound class activity. The KS statistic compares the distribution of the primary activity readout for the compounds of a chemical class with the activity distribution of all the other compounds. The lower the probability of accepting the null hypothesis (both distributions result from random sampling of the same, unknown, parent distribution), the more likely it is that the class represents a genuinely active scaffold.
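The two tests can be compared on a toy example. The sketch below is an illustration under stated assumptions, not the implementation used in the study: the activity values are invented, the readout is assumed to be a percent-inhibition style value where higher means more active, the 50% threshold is arbitrary, and the KS p-value uses the standard asymptotic Kolmogorov series, which is only approximate for small classes. The function names are hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of at least k actives
    among n class members if the class had only the background hit rate p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def ks_two_sample(sample_a, sample_b):
    """Two-sample KS statistic D and an asymptotic two-sided p-value."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the ECDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # the series converges poorly here; the true value is ~1
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Invented data: a class with consistently moderate activity, below the cut-off.
class_activity = [35, 42, 38, 45, 31, 48, 41, 36]
# Invented background: mostly inactive compounds plus four strong actives.
other_activity = [(i * 37) % 41 - 10 for i in range(200)] + [55, 60, 72, 80]

threshold = 50  # assumed activity cut-off for binarisation
all_activity = class_activity + other_activity
hit_rate = sum(v >= threshold for v in all_activity) / len(all_activity)
class_hits = sum(v >= threshold for v in class_activity)

p_binomial = binomial_upper_tail(class_hits, len(class_activity), hit_rate)
d_stat, p_ks = ks_two_sample(class_activity, other_activity)

print(f"binomial: {class_hits} hits of {len(class_activity)}, p = {p_binomial:.3g}")
print(f"KS: D = {d_stat:.3f}, p = {p_ks:.3g}")
```

In this constructed case no class member passes the threshold, so the binomial test gives no evidence of enrichment, whereas the KS test flags the class because its whole distribution is shifted relative to the other compounds. Since the two-sided KS test does not indicate direction, a real analysis would also need to confirm that the shift is towards higher activity.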
This study will show, first, that active chemical classes identified from primary screening results correlate strongly with the activity of compounds in confirmation testing and, second, that the KS test can identify active classes that are not found when a threshold is used to select active compounds.
Schnecke V et al. Drug Discov. Today 2006, 11, 43-50.
Schuffenhauer A et al. J. Chem. Inf. Model. 2007, 47, 47-58.