Benoît Baillif Abstract

Applying Atomistic Neural Networks to Bias Conformer Ensembles towards Bioactive-like Conformations

Benoît Baillif1, Jason Cole2, Patrick McCabe2 and Andreas Bender1
1Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, CB2 1EW, Cambridge, United Kingdom
2Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, United Kingdom


Generating chemically plausible conformations of small molecules is a common step in drug design, used to seed docking or pharmacophore searching experiments. While recent conformer generators create bioactive-like conformations for most known ligands, there is currently no general heuristic to identify them; methods that prioritise conformers likely to represent target-bound poses are therefore desirable. We extracted 13,500 bioactive conformations of 10,500 curated ligands from the PDBbind dataset, generated up to 250 conformers for each ligand, and computed the atomic root-mean-square deviation (ARMSD) between each conformer and its closest bioactive conformation. We then trained atomistic neural networks (AtNNs) with various levels of expressiveness to process the 3D information of the generated conformers and predict the pre-computed ARMSD. On a random ligand split of PDBbind, ranking conformers by AtNN-predicted ARMSD leads to early enrichment of bioactive-like (ARMSD < 1 Å) conformations, with a median BEDROC ranging from 0.22 ± 0.03 for the least expressive model, SchNet, to 0.30 ± 0.03 for the most expressive, ComENet, the latter outperforming the MMFF94s energy ranking baseline at 0.18 ± 0.02. AtNN ranking was most effective on ligands of over-represented protein target classes such as proteases, where we observe a median BEDROC of 0.35 ± 0.03 for ComENet compared to 0.10 ± 0.02 for the energy baseline. AtNN ranking also shows a much lower early enrichment (i.e., higher impoverishment) of non-bioactive (ARMSD > 2.5 Å) conformations, with a median BEDROC of 0.03 ± 0.02 using ComENet compared to 0.28 ± 0.03 for the energy baseline. Moreover, we conducted DUD-E virtual screening tasks using GOLD rigid-ligand docking, docking different fractions of the ranked conformations of each molecule and evaluating the enrichment of actives over all tested molecules with the BEDROC metric.
For the most represented kinase and protease in PDBbind, CDK2 and BACE1 respectively, docking only the top 5% of conformers ranked with an AtNN leads to a BEDROC of actives similar to or higher than that obtained by docking all generated conformers. Other ranking baselines required docking more than the top 15% of ranked conformers to reach a similar BEDROC of actives, so AtNN ranking represents a 3-fold screening speedup for these targets. Hence, the approach presented here applies atomistic neural networks to focus conformer ensembles on bioactive-like conformations, offering an opportunity to reduce computational expense in virtual screening applications on known targets that require input conformations. Further directions include conditioning AtNNs on target representations for target-dependent biasing towards bioactive conformations.
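
As an illustration of the evaluation described above, the following minimal Python sketch ranks the conformers of one ligand by predicted ARMSD, scores the early enrichment of bioactive-like conformers with BEDROC, and keeps a top fraction of the ranking for docking. It is not the authors' implementation. The function names and the toy numbers are hypothetical, and the BEDROC parameter alpha = 20 is an assumed, commonly used value that the abstract does not state. BEDROC is computed in the min-max normalised form of the robust initial enhancement (RIE), which equals 1 when all hits are ranked first and 0 when all are ranked last.

```python
import math


def bedroc(is_hit_ranked, alpha=20.0):
    """BEDROC of a ranked list of booleans (best-ranked item first).

    alpha=20.0 is an assumption; the abstract does not give the value used.
    """
    n_total = len(is_hit_ranked)
    n_hits = sum(is_hit_ranked)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("BEDROC needs at least one hit and one non-hit")
    ratio = n_hits / n_total
    # Exponentially weighted sum over the 1-based ranks of the hits.
    weighted = sum(
        math.exp(-alpha * rank / n_total)
        for rank, hit in enumerate(is_hit_ranked, start=1)
        if hit
    )
    # Expected value of the same sum if the hits were uniformly distributed.
    random_sum = ratio * (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1)
    rie = weighted / random_sum
    # RIE when all hits are ranked first (max) or last (min).
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


def rank_by_predicted_armsd(predicted_armsd):
    """Return conformer indices sorted by ascending predicted ARMSD."""
    return sorted(range(len(predicted_armsd)), key=lambda i: predicted_armsd[i])


def top_fraction(ranked_indices, fraction):
    """Keep the first `fraction` of a ranking, always at least one conformer."""
    n_keep = max(1, math.ceil(fraction * len(ranked_indices)))
    return ranked_indices[:n_keep]


# Toy data for a single ligand (made-up numbers, not results from the study).
predicted = [2.9, 0.6, 1.8, 0.9, 3.4, 1.2, 2.2, 0.7]   # model output, in Å
true_armsd = [3.1, 0.8, 1.5, 1.4, 2.8, 0.9, 2.6, 0.5]  # pre-computed labels, in Å

order = rank_by_predicted_armsd(predicted)
# Bioactive-like threshold (ARMSD < 1 Å) taken from the abstract.
hits = [true_armsd[i] < 1.0 for i in order]
score = bedroc(hits)
kept = top_fraction(order, 0.25)

print("ranking:", order)
print("BEDROC of bioactive-like conformers:", round(score, 3))
print("conformers kept for docking:", kept)
```

In a screening setting, only the conformers returned by `top_fraction` would be passed to rigid-ligand docking, and the same `bedroc` function could then be applied to the molecule-level ranking of actives against all tested molecules.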