MolScore: A Semi-automated Platform for Generative Model Molecule Scoring and Evaluation in Drug Design
Morgan Thomas¹, Noel M. O’Boyle², Andreas Bender¹ and Chris de Graaf²
¹Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
²Sosei Heptares, Cambridge, UK
Many generative model architectures and approaches for de novo molecule generation in drug design have emerged in recent years. However, the number of approaches, the speed of advancement, inconsistency in evaluation, and the use of practically irrelevant objectives, such as penalized logP, make it difficult to understand practical utility, compare approaches, and identify the state of the art. Moreover, community benchmarks are either not applicable to goal-directed generative models or have a fixed suite of tasks that are too easy to differentiate between top-performing models.
To facilitate the application of generative models to drug design, and their comparison, we introduce MolScore: an easy-to-implement, open-source Python package for designing flexible, relevant, and difficult objective tasks. The software includes a collection of open-source and licensed scoring functions, for example molecular descriptors, similarity measures, predictive models (~2,300 pre-trained bioactivity models), docking (Smina, Glide, GOLD, PLANTS, FRED), ligand preparation protocols (LigPrep, RDKit, Gypsum-DL) and shape-based matching (ROCS, shape-it). This allows a host of difficult, drug-discovery-relevant objectives to be configured easily. Users can design their own benchmark, which can then be reused reproducibly by the community, or simply apply the package in practice. In addition, the package contains a collection of performance metrics for model evaluation and graphical user interfaces to aid usability.
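As a rough illustration of what configuring such an objective can involve, the sketch below combines several scoring functions into a single value between 0 and 1 per molecule. It is not MolScore's API: the names `Objective`, `linear`, `score_batch`, `size_proxy` and `ring_proxy` are hypothetical, the two scorers are crude string-based stand-ins for real descriptors, predictive models or docking, and the weighted arithmetic mean is only one assumed way of aggregating objectives.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


def size_proxy(smiles: str) -> float:
    """Hypothetical stand-in scorer: counts letters in the SMILES string."""
    return float(sum(ch.isalpha() for ch in smiles))


def ring_proxy(smiles: str) -> float:
    """Hypothetical stand-in scorer: ring-closure digits occur in pairs."""
    return sum(ch.isdigit() for ch in smiles) / 2.0


def linear(low: float, high: float) -> Callable[[float], float]:
    """Return a transform that maps [low, high] linearly onto [0, 1], clamped."""
    def transform(value: float) -> float:
        return min(1.0, max(0.0, (value - low) / (high - low)))
    return transform


@dataclass(frozen=True)
class Objective:
    """One component of a task: a raw scorer, a normalisation and a weight."""
    name: str
    scorer: Callable[[str], float]
    transform: Callable[[float], float]
    weight: float = 1.0


def score_batch(smiles_list: Sequence[str],
                objectives: Sequence[Objective]) -> List[Dict[str, object]]:
    """Score each molecule on every objective and aggregate the results.

    Aggregation is a weighted arithmetic mean of the transformed scores
    (an assumption made for this sketch). Repeated SMILES are flagged so
    that downstream metrics can account for duplicates.
    """
    total_weight = sum(obj.weight for obj in objectives)
    seen = set()
    results = []
    for smiles in smiles_list:
        row: Dict[str, object] = {"smiles": smiles, "unique": smiles not in seen}
        seen.add(smiles)
        weighted_sum = 0.0
        for obj in objectives:
            raw = obj.scorer(smiles)
            row[obj.name] = raw  # keep the raw value for later inspection
            weighted_sum += obj.weight * obj.transform(raw)
        row["score"] = weighted_sum / total_weight
        results.append(row)
    return results


if __name__ == "__main__":
    task = [
        Objective("size", size_proxy, linear(0, 10)),
        Objective("rings", ring_proxy, linear(0, 2)),
    ]
    for row in score_batch(["c1ccccc1", "CCO", "CCO"], task):
        print(row)
```

Keeping the raw scorer, the normalising transform and the weight as separate pieces means that, in this sketch, swapping a stand-in for a real scoring function would leave the aggregation step unchanged.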
We will demonstrate the utility of this package through three cases of de novo molecule evaluation: a comparison between ligand-based and structure-based scoring functions, benchmarking of RNN-based reinforcement learning algorithms, and an evaluation of molecular grammars used in language models.
 D. Polykovskiy, A. Zhebrak, B. Sanchez-Lengeling, S. Golovanov, O. Tatanov, S. Belyaev, R. Kurbanov, A. Artamonov, V. Aladinskiy, M. Veselov, A. Kadurin, S. Johansson, H. Chen, S. Nikolenko, A. Aspuru-Guzik and A. Zhavoronkov, Front. Pharmacol., 2020, 11, 1931.
 N. Brown, M. Fiscato, M. H. S. Segler and A. C. Vaucher, J. Chem. Inf. Model., 2019, 59, 1096–1108.
 D. R. Koes, M. P. Baumgartner and C. J. Camacho, J. Chem. Inf. Model., 2013, 53, 1893–1904.
 R. A. Friesner, J. L. Banks, R. B. Murphy, T. A. Halgren, J. J. Klicic, D. T. Mainz, M. P. Repasky, E. H. Knoll, M. Shelley, J. K. Perry, D. E. Shaw, P. Francis and P. S. Shenkin, J. Med. Chem., 2004, 47, 1739–1749.
 G. Jones, P. Willett, R. C. Glen, A. R. Leach and R. Taylor, J. Mol. Biol., 1997, 267, 727–748.
 O. Korb, T. Stützle and T. E. Exner, Swarm Intell., 2007, 1, 115–134.
 M. McGann, J. Chem. Inf. Model., 2011, 51, 578–596.
 Schrödinger Release 2019-4.
 RDKit, Open-source cheminformatics, http://www.rdkit.org.
 P. J. Ropp, J. O. Spiegel, J. L. Walker, H. Green, G. A. Morales, K. A. Milliken, J. J. Ringe and J. D. Durrant, J. Cheminform., 2019, 11, 1–13.
 OpenEye Scientific Software.
 J. Taminau, G. Thijs and H. De Winter, J. Mol. Graph. Model., 2008, 27, 161–169.
 M. Thomas, R. T. Smith, N. M. O’Boyle, C. de Graaf and A. Bender, J. Cheminform., 2021, 13, 39.
 M. Thomas, N. M. O’Boyle, A. Bender and C. de Graaf, J. Cheminform., 2022, 14, 68.