Abstract Details

Poster 1: “Prediction Fingerprint”: A New Fingerprint for Virtual Screening Based on Class Predictions of Random Forests

Sereina Riniker¹, Gregory Landrum¹
¹Novartis Institutes for BioMedical Research, CH-4002 Basel, Switzerland
Similarity provides a natural organizing principle for chemical data sets and underlies the standard assumption in drug discovery that similar molecules will have similar properties. A huge variety of similarity descriptors for molecules are described in the literature [1], but no globally optimal similarity descriptor has yet been found [2]. The performance of any given similarity descriptor, usually assessed in a retrospective manner, depends strongly on the data set composition and evaluation method used. We have recently developed a benchmarking platform that employs three publicly available collections of literature data sets [3-5] to provide a broadly based comparison of the performance of similarity methods. The details of the benchmarking platform, together with its source code and a set of results generated for a series of commonly used 2D similarity descriptors, have recently been submitted for publication [6].
Similarity descriptors are usually stored as molecular fingerprints, where structural features are represented by either bits in a bit string or counts in a count vector [7]. This allows the fast and computationally efficient comparison of chemical structures. Four large classes of 2D molecular fingerprints are normally distinguished: (i) dictionary-based fingerprints, (ii) topological or path-based fingerprints, (iii) circular fingerprints, and (iv) pharmacophores. Fingerprints that combine information about multiple active (and potentially inactive) molecules in a single fingerprint represent a fifth class. These “knowledge-based fingerprints” range from simple schemes that keep only, or rescale, the bits common to all actives, e.g. [8,9], to machine-learning algorithms that extract the common features, e.g. [10-12].
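As a rough illustration of why bit-string fingerprints compare quickly, the sketch below stores each fingerprint as a Python integer and scores two of them with the Tanimoto coefficient. The abstract does not name a similarity measure, so the choice of Tanimoto is an assumption made here, as is the function name `tanimoto` and the convention of returning 0.0 when both fingerprints are empty.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two bit-string fingerprints stored as ints."""
    common = bin(fp_a & fp_b).count("1")   # bits set in both fingerprints
    union = bin(fp_a | fp_b).count("1")    # bits set in at least one
    # Convention assumed here: two empty fingerprints score 0.0.
    return common / union if union else 0.0


# Two made-up 4-bit fingerprints, purely for illustration.
fp_query = 0b1101
fp_candidate = 0b1001
score = tanimoto(fp_query, fp_candidate)
print(round(score, 3))
```

The comparison reduces to two bitwise operations and two bit counts, regardless of which structural features the bits encode.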
Here, we present a new knowledge-based fingerprint called a “prediction fingerprint” (PFP) that is generated using random forests. A random forest is trained using a small set of known actives and known or assumed inactives. Using this random forest, PFPs are generated for new molecules by combining the class predictions of each tree in the classifier into a bit string. The poster will provide an overview of the benchmarking platform and datasets as well as a detailed comparison of the performance of PFPs to standard 2D fingerprints and some other knowledge-based fingerprints.
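The idea of turning per-tree class predictions into a bit string can be sketched in a few lines. This is not the authors' implementation: it is a toy stand-in that uses only the Python standard library, replaces full decision trees with one-split stumps trained on bootstrap samples and random bit subsets, and assumes the trees are trained on ordinary binary fingerprints, which the abstract does not specify. All names (`train_stump`, `train_forest`, `prediction_fingerprint`) and the data are hypothetical.

```python
import random


def train_stump(X, y, rng, n_features):
    """Fit a one-split 'tree' on a bootstrap sample and a random bit subset."""
    n = len(X)
    rows = [rng.randrange(n) for _ in range(n)]        # bootstrap sample
    bits = rng.sample(range(len(X[0])), n_features)    # random feature subset
    best, best_errors = None, n + 1
    for bit in bits:
        for value in (0, 1):
            # Rule: predict "active" (1) when the input bit equals `value`.
            errors = sum((X[i][bit] == value) != (y[i] == 1) for i in rows)
            if errors < best_errors:
                best, best_errors = (bit, value), errors
    return best


def train_forest(X, y, n_trees=16, n_features=2, seed=0):
    """Train a small ensemble of stumps (a simplified random forest)."""
    rng = random.Random(seed)
    return [train_stump(X, y, rng, n_features) for _ in range(n_trees)]


def prediction_fingerprint(forest, x):
    """One bit per tree: 1 if that tree predicts 'active' for x, else 0."""
    return [int(x[bit] == value) for bit, value in forest]


# Made-up 4-bit input fingerprints: 1 = active, 0 = inactive.
X = [[1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1],
     [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

forest = train_forest(X, y)
pfp_ones = prediction_fingerprint(forest, [1, 1, 1, 1])
pfp_zeros = prediction_fingerprint(forest, [0, 0, 0, 0])
print(pfp_ones)
print(pfp_zeros)
```

The resulting fingerprint has one bit per tree, so its length is set by the size of the ensemble rather than by the input descriptor, and it can be compared with the same bit-string similarity measures used for standard fingerprints.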

[1] A. Bender, R. C. Glen, Org. Biomol. Chem., 2, 3204 (2004).
[2] R. P. Sheridan, S. K. Kearsley, Drug Discov. Today, 7, 903 (2002).
[3] J. J. Irwin, J. Comput. Aided Mol. Des., 22, 193 (2008).
[4] S. G. Rohrer, K. Baumann, J. Chem. Inf. Model., 49, 169 (2009).
[5] K. Heikamp, J. Bajorath, J. Chem. Inf. Model., 51, 1831 (2011).
[6] S. Riniker, G. Landrum, J. Cheminf., submitted (2013).
[7] R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH: Weinheim, Germany (2000).
[8] N. E. Shemetulskis, D. Weininger, C. J. Blankley, J. J. Yang, C. Humblet, J. Chem. Inf. Comput. Sci., 36, 862 (1996).
[9] L. Xue, F. L. Stahura, J. W. Godden, J. Bajorath, J. Chem. Inf. Comput. Sci., 41, 746 (2001).
[10] N. Stiefl, I. A. Watson, K. Baumann, A. Zaliani, J. Chem. Inf. Model., 46, 208 (2006).
[11] A. Jahn, G. Hinselmann, N. Fechner, A. Zell, J. Cheminf., 1, 14 (2009).
[12] E. Lounkine, F. Nigsch, J. L. Jenkins, M. Glick, J. Chem. Inf. Model., 51, 3158 (2011).
