Deep generative models for 3D compound design from fragment screens
Fergus Imrie1, Anthony Bradley2, Mihaela van der Schaar3,4, Charlotte Deane1
1Department of Statistics, University of Oxford
2Exscientia Ltd
3University of Cambridge
4Alan Turing Institute
Fragment-based drug discovery (FBDD) has become an increasingly important tool for finding hit compounds, in particular for challenging targets and novel protein families. FBDD uses compounds smaller than drug-like molecules to identify low-potency, high-quality leads that are then matured into more potent, drug-like compounds. Once an initial fragment screen has taken place, a key challenge is deciding which hits to follow up, and in what way. We seek to automate the elaboration of initial fragment hits in a data-driven and principled manner using state-of-the-art machine learning techniques.
Typically, decision making in FBDD is highly subjective, and the objective properties used (e.g. ligand efficiency) focus on the starting point rather than the destination: the optimised small molecule. Although this strategy can be successful, it will not always be optimal and is very hard to scale. This is one of the reasons FBDD has failed to live up to its promise of 20 years ago, and novel approaches to fragment elaboration are sorely needed.
Several computational methods for fragment elaboration have been proposed that rely exclusively on database look-up or pre-determined reaction schemes. These are inherently constrained to the set of known rules, limiting exploration of chemical space, and only incorporate additional structural knowledge via filtering mechanisms after an exhaustive search. Previous machine learning methods for molecule generation have typically focused on unconstrained generation or, increasingly, property optimisation. Due to the nature of these tasks, the methods utilised are not readily applicable to the challenges in FBDD. In addition, previous methods have not incorporated structural information in the generation process. Our method addresses both of these limitations in a general and readily extendable fashion.
We have developed graph-based deep generative methods for fragment elaboration that combine state-of-the-art machine learning techniques with structural knowledge. One specific application of our model is fragment linking, where our method takes as input two fragment hits and designs a molecule incorporating both fragments. The generation process is context-dependent and integrates 3D structural information such as the distance between fragments and their relative orientations. This 3D information is of paramount importance to successful compound design, and we have demonstrated the limitations of omitting such information.
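As an illustration of the kind of 3D information described above, the following minimal sketch computes a distance and a relative orientation for a pair of fragments. It is not the model's actual featurisation: the representation of each fragment by an attachment atom and its bonded neighbour, and the function name `linking_geometry`, are assumptions made for this example only.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in v))


def linking_geometry(anchor_a, neighbour_a, anchor_b, neighbour_b):
    """Return (distance, angle_in_degrees) for two fragments.

    Assumption (hypothetical, for illustration): each fragment is
    summarised by the coordinates of its attachment atom ("anchor")
    and of the fragment atom bonded to it ("neighbour"). The exit
    vector points from the neighbour to the anchor, i.e. out of the
    fragment towards where a linker would attach.
    """
    # Distance between the two attachment atoms.
    distance = _norm(_sub(anchor_b, anchor_a))

    # Exit vectors of each fragment.
    exit_a = _sub(anchor_a, neighbour_a)
    exit_b = _sub(anchor_b, neighbour_b)

    # Angle between the exit vectors, clamped against rounding error.
    dot = sum(x * y for x, y in zip(exit_a, exit_b))
    cos_angle = dot / (_norm(exit_a) * _norm(exit_b))
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return distance, math.degrees(math.acos(cos_angle))


# Two fragments facing each other along the x-axis, 5 units apart.
distance, angle = linking_geometry(
    anchor_a=(0.0, 0.0, 0.0), neighbour_a=(-1.5, 0.0, 0.0),
    anchor_b=(5.0, 0.0, 0.0), neighbour_b=(6.5, 0.0, 0.0),
)
```

In this toy case the exit vectors point directly at each other, so the angle between them is 180 degrees. A two-number summary of this kind is one simple way to condition generation on fragment geometry; richer descriptors are of course possible.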
We trained our model using a dataset of molecules extracted from ZINC, and tested our model using the CASF-2016 dataset. The CASF-2016 dataset contains more than 250 binders for a diverse range of proteins together with active conformations derived from high-quality crystal structures. Our method was frequently able to reproduce the original molecule, as well as design de novo linkers that possessed high shape similarity to the original, despite generating a limited number of candidate molecules. In addition, we validated the generated molecules through docking and are exploring further validation.
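The notion of reproducing the original molecule can be made concrete with a simple recovery metric. The sketch below is an assumption about how such a metric might be computed, not the evaluation code used in this work: it presumes every molecule is already represented by a canonical string (for example a canonical SMILES produced by a cheminformatics toolkit), and the name `recovery_rate` and the example identifiers are hypothetical.

```python
def recovery_rate(generated_by_case, reference_by_case):
    """Fraction of test cases whose reference molecule was generated.

    Assumption (for illustration): molecules are compared as
    canonical strings, so string equality implies the same molecule.

    generated_by_case: dict mapping case id -> iterable of candidates
    reference_by_case: dict mapping case id -> reference molecule
    """
    if not reference_by_case:
        raise ValueError("at least one reference molecule is required")

    recovered = 0
    for case_id, reference in reference_by_case.items():
        # A case with no generated candidates counts as not recovered.
        candidates = set(generated_by_case.get(case_id, ()))
        if reference in candidates:
            recovered += 1
    return recovered / len(reference_by_case)


# Toy example with two test cases; only the first is recovered.
references = {"case1": "OCCc1ccccc1", "case2": "CCN"}
generated = {
    "case1": ["OCCc1ccccc1", "OCc1ccccc1"],
    "case2": ["CCO"],
}
rate = recovery_rate(generated, references)
```

Exact-match recovery is a strict criterion, which is why it is complemented in the text above by shape similarity to the original molecule.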
As far as we are aware, this is both the first example of deep learning applied to FBDD and the first molecular generative model to incorporate 3D structural information directly into the generative process. We have demonstrated that our method designs sensible linkers, in both a 2D and 3D sense, and allows fragment elaboration in a principled, data-driven manner, without the limitations of database look-up methods. We believe that our research will prompt a shift in how FBDD is conducted, and we are currently working on extensions of our methods to more challenging scenarios within FBDD.