Handling Large Chemical Spaces in Structure-Based Drug Design
Noel M. O’Boyle and Chris de Graaf
Sosei Heptares, Cambridge R&D Facility (U.K.), Steinmetz Building, Granta Park, Great Abington,
Cambridge CB21 6DG United Kingdom
Chemical space is “vastly, hugely, mindbogglingly big” [1], with some estimates suggesting that there are 10^60 drug-like molecules. This is both a cause for optimism (“the right molecule must surely exist!”) and a potential challenge (“how will I ever find the right molecule?”). Within the context of Structure-Based Drug Design, the challenge is to find a match between the chemical space of small molecules and the conformational flexibility of orthosteric and allosteric protein binding sites that can be targeted by small molecules [2-4].
Here we explore the challenges and opportunities of navigating large chemical space to target diverse protein binding sites in the context of GPCR Structure-Based Drug Design (SBDD) at Sosei Heptares. We will present approaches to guide hit identification and optimisation where we are looking for synthesisable molecules with desirable physicochemical properties that modulate the activity of a particular GPCR.
Combinatorial ultra-large virtual libraries such as Enamine REAL are growing exponentially and present a particular challenge to established techniques for virtual screening such as protein-ligand docking [5,6]. Recently, approaches based around the components of the combinatorial library (‘synthons’) have been developed [7], as well as machine-learning approaches [for example, 8]. We will describe the application of a genetic algorithm to this problem and compare this to alternatives.
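To make the idea concrete, the following is a minimal sketch of how a genetic algorithm can search a two-component combinatorial library without enumerating it. It is an illustration under stated assumptions, not the method presented in this work: each product is represented simply as a pair of synthon indices, and `toy_score` is a hypothetical stand-in for a docking score (higher is better). The function names, population size, and mutation rate are likewise assumptions chosen for the example.

```python
import random


def toy_score(product):
    """Hypothetical stand-in for a docking score (higher is better)."""
    a, b = product
    return -abs(a - 37) - abs(b - 912)


def run_ga(n_a, n_b, score, pop_size=20, generations=30,
           mutation_rate=0.3, seed=0):
    """Search an n_a x n_b synthon library; returns (best, score, n_scored)."""
    rng = random.Random(seed)
    # Each individual is one product: (index of synthon A, index of synthon B).
    population = [(rng.randrange(n_a), rng.randrange(n_b))
                  for _ in range(pop_size)]
    evaluated = {}  # cache so that each product is scored at most once

    def fitness(individual):
        if individual not in evaluated:
            evaluated[individual] = score(individual)
        return evaluated[individual]

    for _ in range(generations):
        # Rank the distinct products and keep the better half as parents.
        ranked = sorted(set(population), key=fitness, reverse=True)
        parents = ranked[:max(2, pop_size // 2)]
        children = []
        while len(children) < pop_size:
            p1 = rng.choice(parents)
            p2 = rng.choice(parents)
            # Crossover: synthon A from one parent, synthon B from the other.
            child = [p1[0], p2[1]]
            # Mutation: swap one position for a random synthon of that type.
            if rng.random() < mutation_rate:
                pos = rng.randrange(2)
                child[pos] = rng.randrange(n_a if pos == 0 else n_b)
            children.append(tuple(child))
        # Elitism: carry the two best products into the next generation.
        population = parents[:2] + children[:pop_size - 2]

    best = max(evaluated, key=evaluated.get)
    return best, evaluated[best], len(evaluated)


if __name__ == "__main__":
    best, best_score, n_scored = run_ga(100, 1000, toy_score)
    print(best, best_score, n_scored, "of", 100 * 1000, "products scored")
```

The point of the sketch is that only a small fraction of the 100,000 possible products is ever scored; in a real setting the scoring call would be the expensive docking step, and a realistic search would also need to account for similarity between synthons rather than treating them as unrelated indices.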
In contrast to screening known or virtual compounds, AI generative models allow a bias-free exploration of the space of small-molecule structures towards those that have high docking scores against a particular target [9,10]. This provides a complementary approach to virtual screening, but care must be taken to ensure diversity, drug-likeness, and synthesisability.
Finally, we will present structural cheminformatics approaches, combining structural, pharmacological, and chemical information on GPCR-ligand interactions, to navigate chemically diverse GPCR ligand space targeting a large variety of orthosteric and allosteric binding sites [2-4]. We will describe several applications of these approaches, including the design of chemogenomics-based screening libraries to support hit identification for GPCR structure-based drug discovery.
1. Adams DN. The Hitchhiker’s Guide to the Galaxy. Pan Books, 1979.
2. Vass M. Chemical Diversity in the G Protein-Coupled Receptor Superfamily. Trends Pharmacol Sci. 2018, 39, 494.
3. Congreve M. Applying Structure-Based Drug Design Approaches to Allosteric Modulators of GPCRs. Trends Pharmacol Sci. 2017, 38, 837.
4. Congreve M, de Graaf C, Swain NA, Tate CG. Impact of GPCR Structures on Drug Discovery. Cell. 2020, 181, 81.
5. Ballante F. Structure-Based Virtual Screening for Ligands of G Protein-Coupled Receptors: What Can Molecular Docking Do for You? Pharmacol Rev. 2021, 73, 527.
6. Bender BJ. A practical guide to large-scale docking. Nat Protoc. 2021, 16, 4799.
7. Sadybekov AA. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature. 2022, 601, 452.
8. Gentile F. ACS Cent Sci. 2020, 6, 939.
9. Thomas M. Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study. J Cheminform. 2021, 13, 39.
10. Thomas M. Augmented Hill-Climb increases reinforcement learning efficiency for language-based de novo molecule generation. J Cheminform. 2022, 14, 68.