Generate What You Can Make: De Novo Generation of In-House Synthesizable Drug Candidates
Alan Kai Hassen¹, Martin Sicho²,³, Yorick van Aalst², Sohvi Luukkonen², Anthe Janssen⁴, Djork-Arne Clevert⁵, Gerard van Westen² and Mike Preuss¹
¹Leiden Institute of Advanced Computer Science, Leiden University, The Netherlands
²Leiden Academic Centre of Drug Research, Leiden University, The Netherlands
³CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Czech Republic
⁴Leiden Institute of Chemistry, Leiden University, The Netherlands
De novo drug design methods that generate novel molecular structures [1–5] are a new and promising way to find drug candidates for desired protein targets [e.g., 6]. However, these methods face one practical challenge: because generated molecules must ultimately be evaluated in vitro and not only in silico, the generated structures must be synthesizable. Surprisingly, contemporary research has so far addressed this challenge only from a theoretical perspective, suggesting that molecular structures and synthesis routes should be generated simultaneously [7, 8].
A possible approach for assessing synthesizability in de novo drug design is Computer-Aided Synthesis Planning (CASP) [9, 10], i.e., recursively breaking a molecule down into molecular precursors until a set of commercially available building blocks is reached. Because CASP is resource-intensive, synthetic accessibility scores [11, 12] approximate the CASP result by learning the relationship between a molecule's structure and whether CASP can successfully find a synthesis route for it. Importantly, these approximation methods are not building-block agnostic: their training data are created with a predefined set of generally commercially available building blocks. They thus capture the general synthesizability of a given molecule rather than its synthesizability from a specific building-block stock.
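The recursive decomposition described above can be illustrated with a minimal, self-contained sketch. It is not the search used by AiZynthFinder [10] or any other CASP tool: the `TOY_DISCONNECTIONS` table is a hypothetical stand-in for a learned retrosynthesis model (each entry lists alternative precursor sets, one per candidate reaction), and the molecule names are placeholders rather than real structures. What it shows is that whether a route exists depends on the building-block `stock` passed in, which is exactly the dependency a building-block-specific score has to capture.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy "retrosynthesis model": each molecule maps to alternative
# precursor sets, one tuple per candidate reaction. Real CASP tools derive
# such disconnections from (learned) reaction templates.
TOY_DISCONNECTIONS: Dict[str, List[Tuple[str, ...]]] = {
    "T": [("A", "B"), ("C",)],
    "A": [("A1", "A2")],
    "C": [("C1", "C2")],
}


def find_route(target: str, stock: Set[str],
               disconnections: Dict[str, List[Tuple[str, ...]]],
               max_depth: int = 5) -> Optional[dict]:
    """Depth-first AND-OR search: return a nested route if `target` can be
    broken down into molecules that are all in `stock`, else None."""
    if target in stock:  # leaf: an available building block
        return {"molecule": target, "in_stock": True}
    if max_depth == 0:  # depth limit; also stops cycles in the table
        return None
    for precursors in disconnections.get(target, []):  # OR: try each reaction
        sub_routes = []
        for precursor in precursors:  # AND: every precursor must be solvable
            route = find_route(precursor, stock, disconnections, max_depth - 1)
            if route is None:
                break
            sub_routes.append(route)
        else:  # no break: all precursors of this reaction were solved
            return {"molecule": target, "in_stock": False,
                    "precursors": sub_routes}
    return None


def solve_rate(targets: List[str], stock: Set[str],
               disconnections: Dict[str, List[Tuple[str, ...]]]) -> float:
    """Fraction of targets for which a route down to `stock` is found."""
    solved = sum(find_route(t, stock, disconnections) is not None
                 for t in targets)
    return solved / len(targets)


large_stock = {"A1", "A2", "B", "C1", "C2"}  # stands in for a commercial catalogue
small_stock = {"A1", "A2", "B"}              # stands in for an in-house stock
print(solve_rate(["T", "A", "C"], large_stock, TOY_DISCONNECTIONS))  # 1.0
print(solve_rate(["T", "A", "C"], small_stock, TOY_DISCONNECTIONS))  # 2 of 3 solved
```

In a real setting the stock would be a set of building-block identifiers (for example canonical SMILES), and labelling molecules with `find_route(...) is not None` against a given stock is the kind of CASP outcome a synthetic accessibility score learns to approximate.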
Instead of concentrating on a general notion of synthesizability in de novo drug design, we employ CASP and synthetic accessibility scores to find new molecules that can be synthesized from the in-house building blocks available at the Leiden Early Drug Discovery & Development (Led3) consortium. By suggesting easy-to-synthesize candidate molecules, we reduce costs and lead times in the design-make-test-analyze cycle. We contribute three findings. First, we show that CASP can be conducted using only six thousand in-house building blocks with only an 11% drop in performance compared to using 17 million commercially available building blocks. Second, we introduce an in-house synthesizability score, the Led3Score, that successfully predicts whether a molecule is synthesizable with our in-house building blocks; the score is designed to be easily retrained when the in-house building-block stock changes. Third, we successfully use this score as an objective in de novo drug design, alongside a target-specific QSAR model, to generate candidate molecules that are synthesizable in-house.
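How the Led3Score and the QSAR prediction are combined during generation is not specified here. One common option, used by Pareto-based methods such as DrugEx v2 [4], is to rank molecules by Pareto dominance. The sketch below assumes that both objectives are maximized and uses hypothetical candidate identifiers and scores for illustration only; it keeps the non-dominated candidates, i.e., those for which no other candidate is at least as good on both objectives and strictly better on one.

```python
from typing import List, Tuple

# (candidate id, predicted activity from a QSAR model, in-house synthesizability
# score); all values are hypothetical and only illustrate the ranking logic.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` on both objectives and strictly
    better on at least one (both objectives are maximized)."""
    at_least_as_good = a[1] >= b[1] and a[2] >= b[2]
    strictly_better = a[1] > b[1] or a[2] > b[2]
    return at_least_as_good and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


candidates: List[Candidate] = [
    ("m1", 7.5, 0.90),
    ("m2", 8.1, 0.20),
    ("m3", 6.0, 0.50),
    ("m4", 7.0, 0.95),
]
print([c[0] for c in pareto_front(candidates)])  # ['m1', 'm2', 'm4']
```

Here `m3` is dropped because `m1` scores higher on both objectives, while `m2` (most active, hardest to make in-house) and `m4` (most synthesizable) both remain on the front as different trade-offs.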
[1] Winter, R. et al. Efficient multi-objective molecular optimization in a continuous latent space. Chem. Sci. 2019, 10, 8016–8024, DOI: 10.1039/C9SC01928F.
[2] Mendez-Lucio, O. et al. De Novo Generation of Hit-like Molecules from Gene Expression Signatures Using Artificial Intelligence. Nature Communications 2020, 11, 10, DOI: 10.1038/s41467-019-13807-w.
[3] Winter, R. et al. grünifai: interactive multiparameter optimization of molecules in a continuous vector space. Bioinformatics 2020, 36, 4093–4094, DOI: 10.1093/bioinformatics/btaa271.
[4] Liu, X. et al. DrugEx v2: De Novo Design of Drug Molecules by Pareto-based Multi-Objective Reinforcement Learning in Polypharmacology. Journal of Cheminformatics 2021, 13, 85, DOI: 10.1186/s13321-021-00561-9.
[5] Liu, X. et al. DrugEx v3: Scaffold-Constrained Drug Design with Graph Transformer-based Reinforcement Learning. 2021, DOI: 10.26434/chemrxiv-2021-px6kz.
[6] Moret, M. et al. Leveraging Molecular Structure and Bioactivity with Chemical Language Models for de Novo Drug Design. Nature Communications 2023, 14, 114, DOI: 10.1038/s41467-022-35692-6.
[7] Bradshaw, J. et al. In Advances in Neural Information Processing Systems, ed. by Larochelle, H. et al., Curran Associates, Inc.: 2020; Vol. 33, pp 6852–6866, DOI: 10.48550/ARXIV.2012.11522.
[8] Gao, W. et al. In International Conference on Learning Representations, 2022, DOI: 10.48550/ARXIV.2110.06389.
[9] Segler, M. H. et al. Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI. Nature 2018, 555, 604–610, DOI: 10.1038/nature25978.
[10] Genheden, S. et al. AiZynthFinder: A Fast, Robust and Flexible Open-Source Software for Retrosynthetic Planning. Journal of Cheminformatics 2020, 12, 70, DOI: 10.1186/s13321-020-00472-1.
[11] Thakkar, A. et al. Retrosynthetic Accessibility Score (RAscore)-Rapid Machine Learned Synthesizability Classification from AI Driven Retrosynthetic Planning. Chem. Sci. 2021, 12, 3339–3349, DOI: 10.1039/d0sc05401a.
[12] Liu, C.-H. et al. RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software. Journal of Chemical Information and Modeling 2022, 62, 2293–2300, DOI: 10.1021/acs.jcim.1c01476.