Abstract

An in-silico Benchmarking Platform for Generative de novo Drug Design

James Webster1, Lionel Colliandre2, Christophe Muller2, Ferruccio Palazzesi3, James E. A. Wallace1 and Dimitar Hristozov1

1Evotec (U.K.) Ltd, 95 Park Drive, Milton Park, Abingdon, OX14 4RY, United Kingdom

2Evotec (France) SAS, Campus Curie, 195 route d'Espagne, 31036 Toulouse CEDEX, France

3Aptuit (Verona) Srl, Campus Levi-Montalcini, Via Alessandro Fleming 4, 37135 Verona, Italy


In-silico generative de novo design is principally concerned with the generation of novel molecules that satisfy a user-defined property profile [1]. As the field has developed, a rich ecosystem of tools and in-silico approaches to molecule generation has emerged [2]. A core aspect of research in de novo design is the ability to assess the performance of a de novo design tool. The gold standard of assessment is to experimentally validate proposed molecules as part of a design-make-test-analyze (DMTA) cycle. Nevertheless, experimental validation is often expensive, hard to replicate, and not coupled with reliable baselines. In contrast, in-silico validation assesses the performance of a de novo design method based on the computed properties of the generated molecules, coupled with an assessment of the underlying algorithm's performance.

Recently, molecular generation benchmarks have become popular for assessing the performance of de novo design tools in silico. These benchmarks are collections of tasks with standardized datasets and performance metrics [3,4,5,6,7]. The tasks span a range of problems, from rediscovery of known drugs based on 2D similarity to scaffold hopping and, more recently, molecular docking.

A common critique is that the tasks themselves are not representative of the problems regularly faced by industrial practitioners, such as ADMET issues, synthetic tractability limitations, 3D structural constraints, and off-target effects. Moreover, the problems practitioners face are inherently multiobjective and temporal in nature, spanning multiple DMTA cycles. Finally, molecular design benchmarks often lack the rigorous statistical analysis of generated molecules needed for reliable method comparison.

Herein, we describe the development of an in-silico benchmarking platform that prioritizes the requirements of industrial practitioners, coupled with rigorous statistical evaluation. From an initial cross-sectional survey of end users of generative de novo design methods and an analysis of our internal molecular design platform [8], we construct several new classes of benchmarking problems based on hit finding, hit-to-lead, and lead optimization. Finally, we describe a new approach to numerically rating the performance of a de novo design method, based on methods commonly used to assess player skill in games [9,10].
We demonstrate that this new platform enables the identification of the best method for a specific task and provides clear guidance on each model's strengths and weaknesses.
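To illustrate the game-inspired rating idea, the sketch below applies a standard Elo update [9] to head-to-head comparisons between generative methods: each benchmark task on which one method outperforms another (e.g. by a statistical test on the scores of their generated molecules) is treated as a "game". This is a minimal, hypothetical illustration, not the platform's actual implementation; the method names, K-factor, and win/loss outcomes are invented for the example.

```python
# Hypothetical sketch: Elo-style skill ratings for generative de novo
# design methods, where each pairwise benchmark-task comparison counts
# as one "game". Method names and outcomes below are illustrative only.

def expected_score(r_a: float, r_b: float) -> float:
    """Elo model probability that the method rated r_a beats the one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    """Return updated (r_a, r_b); outcome is 1.0 if A wins, 0.0 if A loses, 0.5 for a draw."""
    e_a = expected_score(r_a, r_b)
    return (r_a + k * (outcome - e_a),
            r_b + k * ((1.0 - outcome) - (1.0 - e_a)))

# Toy usage: three task outcomes between two (illustrative) methods.
ratings = {"method_a": 1500.0, "method_b": 1500.0}
results = [("method_a", "method_b"), ("method_a", "method_b"), ("method_b", "method_a")]
for winner, loser in results:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser], 1.0)
# After two wins and one loss, method_a ends with the higher rating,
# and the total rating mass (3000 points) is conserved by the update.
```

Glicko-style extensions [10] additionally track a rating deviation per method, which naturally supports the statistical confidence statements the platform aims for when few task outcomes are available.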

[1] Meyers, J. et al. Drug Discovery Today 26, 2707–2715 (2021)
[2] Palazzesi, F. et al. in Artificial Intelligence in Drug Design (ed. Heifetz, A.) 273–299 (Springer US, 2022)
[3] Brown, N. et al. J. Chem. Inf. Model. 59, 1096–1108 (2019)
[4] Polykovskiy, D. et al. Front. Pharmacol. 11, 565644 (2020)
[5] García-Ortegón, M. et al. J. Chem. Inf. Model. 62, 3486–3502 (2022)
[6] Huang, K. et al. arXiv:2102.09548 (2021)
[7] Gao, W. et al. arXiv:2206.12411 (2022)
[8] Wallace, J. E. A. presented in part at 3rd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Symposium, Virtual, September, 2020.
[9] Elo, A. E. The Rating of Chessplayers, Past and Present (Arco Pub., 1978)
[10] Glickman, M. E. J. R. Stat. Soc., C: Appl. 48, 377–394 (1999)