Susan Leung Abstract

SuCOS: A Better Alternative to RMSD for Evaluating Fragment Elaboration and Docking Poses

Susan Leung¹, Mike Bodkin², Frank von Delft³, Paul Brennan¹, Garrett Morris¹

¹University of Oxford
²Evotec
³Diamond Light Source
One of the fundamental assumptions of hit-to-lead fragment-based drug discovery is that the binding mode of a fragment is structurally conserved upon synthetic elaboration. This assumption was borne out by a recent survey of X-ray crystal structures of fragments and elaborated fragments by Malhotra and Karanicolas. Hence, during virtual screening of elaborated molecules, it is reasonable to keep only those screened molecules that retain the crystallographically observed binding mode. One of the most common ways of quantifying binding mode similarity is the Root Mean Square Deviation (RMSD) of the positions of corresponding atoms. Protein-Ligand Interaction Fingerprints (PLIFs) are an increasingly used alternative that compares binding modes in terms of the explicit interactions made between the ligand and the protein.

We present SuCOS, an open-source RDKit-based implementation of Malhotra and Karanicolas’ combined overlap score (COS). SuCOS has a Pearson correlation coefficient with COS of 0.92. We compared the performance of RMSD, PLIF-Tversky/PLIF-Tanimoto, and SuCOS on (i) Malhotra and Karanicolas’ dataset of paired larger and smaller molecules bound to the same protein; (ii) redocking of the larger and smaller molecules into their respective proteins using AutoDock Vina; and (iii) cross-docking of the larger molecule into the smaller molecule’s cognate protein structure using AutoDock Vina.
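To make the RMSD baseline concrete, the following is a minimal Python sketch of RMSD over corresponding atom positions. It assumes the atom correspondence has already been established and that both poses are in the same protein frame, so no realignment is applied. The function name `rmsd` and the coordinates are illustrative and are not taken from the SuCOS code.

```python
import math


def rmsd(ref_coords, query_coords):
    """RMSD between two equal-length lists of (x, y, z) positions.

    Assumes atom i in ref_coords corresponds to atom i in query_coords
    and that no superposition is wanted (poses share the protein frame).
    """
    if len(ref_coords) != len(query_coords):
        raise ValueError("RMSD needs a one-to-one atom correspondence")
    if not ref_coords:
        raise ValueError("at least one atom pair is required")
    # Sum of squared distances over all corresponding atom pairs.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(ref_coords, query_coords)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(ref_coords))


# Hypothetical coordinates (in angstroms) for three matched atoms.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
print(rmsd(crystal, docked))  # every atom is displaced by 1 angstrom along z
```

The requirement that both lists have the same length and ordering is what forces a substructure match when the two molecules differ in size.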

From this, we explore the strengths and weaknesses of each metric. When comparing elaborated molecules, RMSD requires a matching substructure, most commonly identified as the Maximum Common Substructure (MCS); however, this is not suitable when there are heteroatom changes to the molecule’s core. RMSD also fails to recognise pseudo-symmetric groups and bioisosteres in a molecule, even though there may be good spatial overlap. PLIFs capture the conservation of interactions between ligand and protein and have an advantage over the other two metrics: by recording whether protein-ligand interactions are conserved, they indirectly contain information about changes in protein conformation. However, PLIFs are highly dependent on the distance and geometry between the atoms or atom groups of the ligand and protein, and therefore a small rotation of the molecule can drastically affect the PLIF similarity score. When computing PLIF similarity, all interaction types are given equal importance; this may not be sensible. For example, should a weak hydrogen bond be as important as a conventional hydrogen bond? SuCOS, for its part, produces low scores where the rings of the reference and query molecules are staggered, owing to poor chemical feature overlap.
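The PLIF behaviour described above can be illustrated with a short Python sketch that treats a fingerprint as a set of interactions and scores it with the standard Tversky index, of which Tanimoto is the special case alpha = beta = 1. The residue labels, interaction types, and the asymmetric weights (alpha = 1, beta = 0) are hypothetical choices for illustration; the abstract does not state which weights were used.

```python
def tversky(ref_bits, query_bits, alpha=1.0, beta=1.0):
    """Tversky index between two sets of interaction 'bits'.

    alpha weights bits found only in the reference, beta weights bits
    found only in the query. Every interaction type counts equally.
    """
    ref_bits, query_bits = set(ref_bits), set(query_bits)
    common = len(ref_bits & query_bits)
    only_ref = len(ref_bits - query_bits)
    only_query = len(query_bits - ref_bits)
    denom = common + alpha * only_ref + beta * only_query
    return common / denom if denom else 0.0


def tanimoto(ref_bits, query_bits):
    """Tanimoto similarity: Tversky with alpha = beta = 1."""
    return tversky(ref_bits, query_bits, alpha=1.0, beta=1.0)


# Hypothetical fingerprints as (residue, interaction type) pairs.
fragment = {("ASP86", "hbond"), ("PHE80", "aromatic")}
elaborated = fragment | {("LYS33", "ionic"), ("LEU134", "hydrophobic")}
# Same elaborated molecule after a small rotation that breaks one hydrogen bond.
rotated = elaborated - {("ASP86", "hbond")}

print(tanimoto(fragment, elaborated))                      # penalised for new contacts
print(tversky(fragment, elaborated, alpha=1.0, beta=0.0))  # new contacts ignored
print(tanimoto(fragment, rotated))                         # one lost bit halves the score
```

With beta = 0 the score asks only whether the fragment's interactions are retained, which suits the fragment-to-elaborated comparison; the last line shows how the loss of a single interaction, whatever its type, changes the score sharply.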

Nevertheless, we show evidence from the three studies that combined volumetric and 3D-pharmacophore-based metrics such as SuCOS are superior to RMSD and PLIF similarity when comparing an elaborated fragment (larger molecule) with its original fragment hit counterpart (smaller molecule). In the redocking study, SuCOS produces fewer false positives and false negatives than RMSD and PLIF similarity. In the cross-docking study, SuCOS gives the best correlation between the score of a larger ligand's docking pose against its own crystal pose and its score against the crystal pose of its smaller counterpart. SuCOS is therefore better than RMSD and PLIF similarity at identifying the experimentally observed binding mode of an elaborated molecule given the pose of its non-elaborated counterpart.

SuCOS ranges from 0 to 1 regardless of molecular size, and is therefore suitable for defining a more universal threshold. This contrasts with RMSD, which has no upper limit. We believe that SuCOS has potential both as a metric of binding mode conservation and as a tool to support structure-based virtual screening.
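The bounded range can be illustrated with a small Python sketch of a combined overlap score. It assumes, purely for illustration, that the score is a weighted average of a chemical feature overlap term and a shape (volume) overlap term, each clipped to [0, 1]; the equal default weighting and the function names are assumptions, since the abstract does not give the exact formula.

```python
def clip01(value):
    """Clamp a value to the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))


def combined_overlap(feature_overlap, shape_overlap, feature_weight=0.5):
    """Weighted average of two overlap terms, each clipped to [0, 1].

    feature_overlap: fraction of reference pharmacophoric features matched.
    shape_overlap:   fraction of reference volume covered by the query.
    feature_weight:  hypothetical weight; 0.5 is an assumed default.
    """
    if not 0.0 <= feature_weight <= 1.0:
        raise ValueError("feature_weight must lie in [0, 1]")
    return (feature_weight * clip01(feature_overlap)
            + (1.0 - feature_weight) * clip01(shape_overlap))


# Hypothetical overlap terms for a docked pose against a fragment crystal pose.
print(combined_overlap(0.8, 0.6))
```

Because both terms are fractions of the reference molecule, the result stays within [0, 1] whatever the size of the query, which is what makes a single threshold meaningful across targets.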