Poster 46: Comparison of Similarity Coefficients for Ligand-based Virtual Screening using Binary and Weighted Fingerprints

Hua Xiang¹, John Holliday¹, Peter Willett¹
¹Information School, University of Sheffield, United Kingdom

Similarity searching is one of the most intensively used methods for ligand-based virtual screening. It has three principal components: the structure representation, the weighting scheme, and the similarity coefficient [1]. Conventionally, binary fingerprints and the Tanimoto coefficient are, respectively, the most popular molecular representation and similarity coefficient in similarity searching. Previous studies have shown that some other coefficients may be less affected by the compound bit-density than the Tanimoto coefficient [2-5]. More recently, Arif et al. [6, 7] suggested that increases in performance can be achieved by weighting the bits in a fingerprint so as to describe the frequency of occurrence of 2D substructural fragments, rather than using just the fragment presences or absences that are encoded in conventional binary fingerprints.
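
The difference between presence/absence and frequency-based representations can be illustrated with a minimal Python sketch. It is not taken from the work described here: the fragment labels are invented, the function names are hypothetical, and raw occurrence counts are used as the weights purely as an assumption (the weighting schemes of Arif et al. [6, 7] are not reproduced). The count-based form is the usual generalisation of the Tanimoto coefficient to non-negative vectors.

```python
from collections import Counter


def tanimoto_binary(a: set, b: set) -> float:
    """Tanimoto coefficient on fragment presence/absence: c / (a + b - c)."""
    common = len(a & b)
    denom = len(a) + len(b) - common
    return common / denom if denom else 0.0


def tanimoto_counts(x: Counter, y: Counter) -> float:
    """Generalised Tanimoto on occurrence counts:
    sum(x*y) / (sum(x^2) + sum(y^2) - sum(x*y))."""
    dot = sum(x[k] * y[k] for k in x.keys() & y.keys())
    denom = (sum(v * v for v in x.values())
             + sum(v * v for v in y.values())
             - dot)
    return dot / denom if denom else 0.0


# Invented fragment lists for two hypothetical molecules.
query_frags = ["C-C", "C-C", "C=O", "C-N"]
target_frags = ["C-C", "C=O", "C=O", "C=O", "C-O"]

# Binary view: only which fragments occur.
binary_sim = tanimoto_binary(set(query_frags), set(target_frags))

# Weighted view (assumption: weight = raw occurrence count).
count_sim = tanimoto_counts(Counter(query_frags), Counter(target_frags))

print(f"binary Tanimoto:   {binary_sim:.3f}")
print(f"weighted Tanimoto: {count_sim:.3f}")
```

In this toy case the two molecules share two of four distinct fragments, so the binary similarity is 0.5, whereas the count-based similarity is lower because the fragments occur with different frequencies.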

The work reported here seeks to ascertain whether a similarity coefficient can be identified that is superior to the Tanimoto coefficient, either in certain circumstances or more generally, when binary or frequency-based approaches to fragment weighting are used. The work has involved extensive similarity searches of the MDDR, WOMBAT and ChEMBL databases, using Pipeline Pilot extended connectivity fingerprints. Our initial experiments investigated 44 binary coefficients and demonstrated clearly that several of them retrieved greater numbers of active molecules than did the Tanimoto coefficient when averaged over multiple searches and multiple types of bioactivity. Some other coefficients were shown to be monotonic with the Tanimoto coefficient (i.e., they produced identical similarity rankings despite yielding different similarity values). Several of the best-performing coefficients were then selected for use with frequency-weighted fingerprints. Two weighting schemes were adopted, following Arif et al. [6], and the results again showed that the Tanimoto coefficient was by no means the most effective in simulated virtual screening experiments.
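
Monotonicity between two coefficients can be checked with a small sketch. The abstract does not name the monotonic coefficients that were found; the Dice coefficient is used here only as a well-known example, since for binary fingerprints it satisfies D = 2T / (1 + T) and therefore ranks molecules identically to the Tanimoto coefficient T. The fingerprints, molecule names and helper functions below are invented for illustration.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: c / (a + b - c)."""
    common = len(a & b)
    denom = len(a) + len(b) - common
    return common / denom if denom else 0.0


def dice(a: set, b: set) -> float:
    """Dice coefficient: 2c / (a + b)."""
    denom = len(a) + len(b)
    return 2 * len(a & b) / denom if denom else 0.0


def rank(query: set, database: dict, coeff) -> list:
    """Return database molecule names in decreasing order of similarity."""
    return sorted(database,
                  key=lambda name: coeff(query, database[name]),
                  reverse=True)


# Invented fingerprints, each given as the set of bit positions switched on.
query = {1, 2, 3, 4}
database = {
    "mol_a": {1, 2, 3, 4, 5},
    "mol_b": {1, 2, 9},
    "mol_c": {3, 7, 8},
    "mol_d": {1, 2, 3, 4},
}

tanimoto_ranking = rank(query, database, tanimoto)
dice_ranking = rank(query, database, dice)

for name in tanimoto_ranking:
    t = tanimoto(query, database[name])
    d = dice(query, database[name])
    print(f"{name}: Tanimoto={t:.3f}  Dice={d:.3f}")

print("identical rankings:", tanimoto_ranking == dice_ranking)
```

The similarity values differ for every molecule except the exact match, yet the two ranked lists are the same, which is why a monotonic coefficient cannot change the outcome of a ranked similarity search.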

We hence conclude that, whilst the Tanimoto coefficient continues to provide an effective tool for ligand-based virtual screening, there are other coefficients that appear to be worthy of serious consideration for this purpose.

1. Willett, P., Similarity methods in chemoinformatics. Annual Review of Information Science and Technology, 2009. 43: p. 3-71.
2. Holliday, J.D., N. Salim, and P. Willett, Combination of fingerprint-based similarity coefficients using data fusion. Journal of Chemical Information and Computer Sciences, 2003. 43(2): p. 435-442.
3. Holliday, J.D., C.Y. Hu, and P. Willett, Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings. Combinatorial Chemistry & High Throughput Screening, 2002. 5(2): p. 155-166.
4. Holliday, J. and M. Haranczyk, Comparison of similarity coefficients for clustering and compound selection. Journal of Chemical Information and Modeling, 2008. 48(3): p. 498-508.
5. Al Khalifa, A., M. Haranczyk, and J. Holliday, Comparison of Nonbinary Similarity Coefficients for Similarity Searching, Clustering and Compound Selection. Journal of Chemical Information and Modeling, 2009. 49(5): p. 1193-1201.
6. Arif, S.M., J.D. Holliday, and P. Willett, Analysis and use of fragment-occurrence data in similarity-based virtual screening. Journal of Computer-Aided Molecular Design, 2009. 23(9): p. 655-668.
7. Arif, S.M., J.D. Holliday, and P. Willett, Inverse Frequency Weighting of Fragments for Similarity-Based Virtual Screening. Journal of Chemical Information and Modeling, 2010. 50(8): p. 1340-1349.
