
Planning Chemical Syntheses with Deep Neural Networks and Monte Carlo Tree Search

Marwin H. S. Segler¹

¹BenevolentAI
Computer-aided retrosynthesis, also known as computer-aided synthesis planning (CASP), is one of the oldest research topics in chemoinformatics [1,2]. CASP would be a highly valuable tool for finding better synthetic routes and for assessing the synthesizability of virtual, de novo designed compounds. However, despite several waves of research, CASP has never been widely accepted by chemists, because the systems were slow and the results were considered to be of unsatisfactory quality [3,4,5].
Here, we present our recent findings on retrosynthesis using deep learning and modern search algorithms [6,7]. First, we show that deep neural networks can be trained overnight on very large reaction datasets (here, the complete Reaxys database) to predict and rank the most suitable, automatically extracted transforms to apply to a molecule [6]. Training in this way also allows the machine to learn implicitly which functional groups a transform tolerates and which conflict with it [6]. In earlier approaches, this information had to be entered manually by experts. Second, to perform the search, we employ Monte Carlo Tree Search (MCTS). MCTS efficiently handles problems with very large branching factors and does not rely strongly on hand-designed search heuristics, which makes it very well suited for retrosynthesis [7].
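
The interplay of a transform-ranking policy and MCTS can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the implementation from [6,7]: the molecule names, the `TRANSFORMS` table (which stands in for the priors a trained network would output), the `BUILDING_BLOCKS` set, the greedy rollout, and the PUCT-style selection score are all hypothetical choices made for illustration.

```python
import math

# Hypothetical toy data: each molecule maps to (prior, precursors) options.
# In a real system the priors would come from a trained policy network.
TRANSFORMS = {
    "target": [(0.6, ("A", "B")), (0.4, ("C",))],
    "A": [(1.0, ("bb1", "bb2"))],
    "B": [(1.0, ("dead_end",))],
    "C": [(0.7, ("bb1", "bb3")), (0.3, ("dead_end",))],
}
BUILDING_BLOCKS = {"bb1", "bb2", "bb3"}  # assumed purchasable compounds


class Node:
    def __init__(self, state, prior=1.0, parent=None):
        self.state, self.prior, self.parent = state, prior, parent
        self.children, self.visits, self.value = [], 0, 0.0


def apply_transform(state, precursors):
    """Replace the first open molecule by its precursors, dropping building blocks."""
    rest = state[1:] + tuple(m for m in precursors if m not in BUILDING_BLOCKS)
    return tuple(dict.fromkeys(rest))  # remove duplicates, keep order


def rollout(state, max_depth=10):
    """Greedy rollout: always apply the highest-prior transform."""
    for _ in range(max_depth):
        if not state:              # nothing left to make: route solved
            return 1.0
        options = TRANSFORMS.get(state[0])
        if not options:            # no known transform: dead end
            return 0.0
        state = apply_transform(state, max(options, key=lambda o: o[0])[1])
    return 0.0


def select(node, c=1.0):
    """PUCT-style score: mean value plus a prior-weighted exploration bonus."""
    def score(child):
        q = child.value / child.visits if child.visits else 0.0
        return q + c * child.prior * math.sqrt(node.visits) / (1 + child.visits)
    return max(node.children, key=score)


def mcts(target, iterations=50):
    root = Node((target,))
    for _ in range(iterations):
        node = root
        while node.children:                       # 1. selection
            node = select(node)
        options = TRANSFORMS.get(node.state[0], []) if node.state else []
        for prior, precursors in options:          # 2. expansion
            child_state = apply_transform(node.state, precursors)
            node.children.append(Node(child_state, prior, node))
        reward = rollout(node.state)               # 3. rollout
        while node is not None:                    # 4. backpropagation
            node.visits += 1
            node.value += reward
            node = node.parent
    return root


def best_route(root):
    """Follow the most-visited child at each step and list the visited states."""
    route, node = [root.state], root
    while node.children:
        node = max(node.children, key=lambda child: child.visits)
        route.append(node.state)
    return route


if __name__ == "__main__":
    print(best_route(mcts("target")))
```

In this toy setting the transform with the highest prior for `target` leads to a dead end, so a purely greedy walk fails, while the tree search shifts its visits to the lower-ranked transform once rollouts through it succeed. Each state is the tuple of molecules that still have to be made; the empty tuple means every leaf of the route is a building block.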

Compared with the established search technique, best-first search with hand-coded heuristics [4], our approach solves twice as many molecules and is almost two orders of magnitude faster [7]. To assess the quality of the predicted routes, we conducted double-blind tests. Here, we found, for the first time, that organic chemists could not distinguish between real routes taken from the literature and predicted routes [7]. Our results also indicate limitations and potential future lines of research, which will be discussed in detail.

Literature:

[1] G. Vleduts, Information Storage and Retrieval, 1963, 117
[2] E. J. Corey, W. T. Wipke, Science, 1969, 166, 178
[3] W. D. Ihlenfeldt, J. Gasteiger, Angew. Chem. Int. Ed., 1996, 34, 2613
[4] S. Szymkuć et al., Angew. Chem. Int. Ed., 2016, 55, 5904
[5] A. Cook et al., Wiley Interdiscip. Rev. Comput. Mol. Sci., 2012, 79
[6] M. Segler, M. P. Waller, Chem. Eur. J., 2017, 23, 5966
[7] M. Segler, M. Preuß, M. P. Waller, Nature, 2018, 555, 604