Reaction Vector Based Monte Carlo Tree Search for De Novo Design
James Webster1, James E. A. Wallace2, Dimitar Hristozov2, Beining Chen3, Michael J. Bodkin2, Valerie J. Gillet1
1Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, United Kingdom
2Evotec (U.K.) Ltd, 114 Innovation Drive, Milton Park, Abingdon, OX14 4RZ, United Kingdom
3Department of Chemistry, University of Sheffield, Dainton Building, Brook Hill, Sheffield, S3 7HF, United Kingdom
De novo design [1] is an approach to rationally designing molecules that fit a desired property profile. A de novo design method has three components: construction, scoring and search. These components must be carefully balanced to chart an effective path through the vastness [2] of chemical space to areas of interest.
An active area of de novo design research is the use of reaction vectors [3]. Reaction vector based de novo design utilises knowledge-based reaction transforms to build novel molecules systematically. Benefits of the approach include an increased likelihood that the designed molecules are synthetically accessible and an effective constraint on the size of the chemical space to be explored. Moreover, reaction vectors provide a prospective synthesis route to the designed molecules, saving time on synthetic planning.
Previous work on search utilising reaction vectors has primarily explored greedy enumeration, whereby all applicable transforms are applied in a breadth-first manner. This approach is severely limited in depth because of the combinatorial explosion of products. A further refinement selected the molecules to enumerate based on desirability criteria. This, however, increased the probability of becoming trapped in local optima: “the intermediate problem”. Hence, in this poster we explore the use of the Monte Carlo tree search (MCTS) algorithm [4,5] as an alternative method of reaction vector based search.
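The depth limit of breadth-first enumeration and the selection rule commonly used in MCTS can be illustrated with a minimal Python sketch. It is not taken from the poster: the function names, the uniform branching factor and the exploration constant `c` are assumptions chosen for illustration, and the score shown is the standard UCB1 form used by UCT [5].

```python
import math


def breadth_first_products(branching, depth):
    """Products generated by exhaustive enumeration to a given depth.

    Assumes (hypothetically) that every molecule has the same number of
    applicable transforms, `branching`, and each yields one product.
    """
    return sum(branching ** d for d in range(1, depth + 1))


def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: mean reward plus an exploration bonus.

    `c` is an assumed exploration constant, not a value from the poster.
    """
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Pick the child with the highest UCB1 score.

    `children` is a list of (name, total_reward, visits) tuples.
    """
    return max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits, c))
```

With 10 applicable transforms per molecule, `breadth_first_products(10, 3)` already gives 1,110 products, which is why exhaustive enumeration stalls at shallow depth. The exploration bonus is what lets MCTS revisit rarely sampled branches instead of committing to the currently best-scoring one.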
Herein we describe our implementation of reaction vector based Monte Carlo tree search (RV-MCTS) for searching chemical space effectively. To test our approach, we build a small, focused rediscovery benchmark tailored to reaction-based de novo design. We demonstrate that the RV-MCTS approach can rediscover the majority of target compounds in the benchmark while simultaneously proposing a valid synthetic route.
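The idea of searching for a target while recording the route that leads to it can be sketched with a toy MCTS loop. This is not the authors' RV-MCTS implementation: here a "molecule" is a plain string, each hypothetical "transform" appends one letter, and `score`, `TRANSFORMS`, `MAX_STEPS` and the constant `c` are all illustrative assumptions. Real reaction vectors operate on molecular graphs.

```python
import math
import random

# Hypothetical stand-ins: each "transform" appends one fragment letter.
TRANSFORMS = ["A", "B", "C"]
MAX_STEPS = 4  # assumed limit on the number of transforms in a route


def score(molecule, target):
    """Toy similarity in [0, 1]: fraction of positions matching the target."""
    matches = sum(a == b for a, b in zip(molecule, target))
    return matches / max(len(molecule), len(target))


class Node:
    def __init__(self, molecule, parent=None, transform=None):
        self.molecule = molecule
        self.parent = parent
        self.transform = transform  # transform that produced this node
        self.children = []
        self.untried = list(TRANSFORMS) if len(molecule) < MAX_STEPS else []
        self.visits = 0
        self.total_reward = 0.0

    def best_child(self, c):
        # UCB1: mean reward plus an exploration bonus.
        return max(
            self.children,
            key=lambda ch: ch.total_reward / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def rollout(molecule, target, rng):
    """Apply random transforms up to the step limit, then score the result."""
    while len(molecule) < MAX_STEPS:
        molecule += rng.choice(TRANSFORMS)
    return score(molecule, target)


def mcts(start, target, iterations=500, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(start)
    best = root
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = node.best_child(c)
        # 2. Expansion: apply one untried transform.
        if node.untried:
            t = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.molecule + t, parent=node, transform=t)
            node.children.append(child)
            node = child
        if score(node.molecule, target) > score(best.molecule, target):
            best = node
        # 3. Simulation: random playout from the new node.
        reward = rollout(node.molecule, target, rng)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    # Walk back from the best node to recover the sequence of transforms.
    route, node = [], best
    while node.parent is not None:
        route.append(node.transform)
        node = node.parent
    return best.molecule, route[::-1]
```

Calling `mcts("", "ABCA")` returns the best-scoring string found together with the ordered list of transforms that produced it, mirroring how a tree search over reaction transforms yields a candidate route alongside each designed molecule.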
1 M. Hartenfeller and G. Schneider, Wiley Interdiscip. Rev. Comput. Mol. Sci., 2011, 1, 742–759.
2 P. G. Polishchuk, T. I. Madzhidov and A. Varnek, J. Comput.-Aided Mol. Des., 2013, 27, 675–679.
3 H. Patel, M. J. Bodkin, B. Chen and V. J. Gillet, J. Chem. Inf. Model., 2009, 49, 1163–1184.
4 R. Coulom, in Computers and Games, 2006, pp. 72–83.
5 L. Kocsis and C. Szepesvári, in Machine Learning: ECML, 2006, pp. 282–293.