Ingvar Lagerstedt Abstract

RTSA – a retrosynthetic analysis method, tools and implementation

Ingvar Lagerstedt¹, Zhen (Gordon) Huang¹, Jibo Wang¹, Christos A. Nicolaou¹

¹Eli Lilly

Synthetic route design is a frequent task in discovery-oriented chemistry organizations. Traditionally, finding solutions to this problem has been the domain of human experts. Recently, there have been several advances in computational approaches, aided by improved algorithms and the availability of large reaction collections. We present here both the tools used and our implementation of a retrosynthetic analysis method, and demonstrate its capabilities in an attempt to identify synthetic routes for a collection of approved drugs [1]. Our results indicate that the method, which leverages reaction transformation rules learned from a large patent reaction dataset, can identify multiple theoretically feasible synthetic routes and thus support research chemists' everyday efforts. The reactions can be classified at different radii from the reaction center; a larger radius gives more detailed information about the chemical environment around the center. Counting how often each reaction transform occurs in the collection gives a way to rank the likelihood that the reaction will be successful, with more accurate ranking at larger radius. We have made our tools available in an open source repository, LillyMol [2]. LillyMol was set up with the aim of sharing resources with other institutes, both to provide and receive tools useful in drug discovery, and to improve the code quality through collaboration. In addition to the tools for retrosynthetic analysis presented here, LillyMol contains tools for several cheminformatics tasks, such as file format conversion, compound standardization, feature perception, reaction transforms to manipulate structures, and assessing the viability of a structure as a drug candidate.

In this poster we describe: (i) our efforts to mine corporate reaction data, stored in electronic laboratory notebooks (eLN) and automated synthesis system databases, and to compile a corporate synthetic knowledge repository; (ii) steps to develop a data-driven retrosynthetic analysis (RA) engine that aims to provide feasible synthetic routes for input chemical structures; (iii) LillyMol, our open source software repository, containing the tools used here.
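
The frequency-based ranking mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the LillyMol implementation: the signature strings, the toy reaction collection and the helper names `count_transforms` and `rank_candidates` are all hypothetical, and the sketch assumes each known reaction has already been reduced to one transform signature per radius.

```python
from collections import Counter

# Hypothetical toy collection: each known reaction is described by a
# transform signature at radius 0, 1 and 2 around the reaction center.
# The strings are placeholders, not output of any LillyMol tool.
known_reactions = [
    {0: "C-N", 1: "C(=O)-N", 2: "cC(=O)-NC"},
    {0: "C-N", 1: "C(=O)-N", 2: "CC(=O)-NC"},
    {0: "C-N", 1: "C(=O)-N", 2: "cC(=O)-NC"},
    {0: "C-N", 1: "cC-N", 2: "ccC-NC"},
    {0: "C-O", 1: "C(=O)-O", 2: "cC(=O)-OC"},
]


def count_transforms(reactions, radius):
    """Count how often each transform signature occurs at the given radius."""
    return Counter(rxn[radius] for rxn in reactions if radius in rxn)


def rank_candidates(candidates, reactions, radius):
    """Sort candidate signatures by descending frequency in the collection."""
    counts = count_transforms(reactions, radius)
    # Counter returns 0 for signatures never seen in the collection.
    return sorted(candidates, key=lambda sig: counts[sig], reverse=True)


if __name__ == "__main__":
    for radius in (0, 1, 2):
        print(radius, count_transforms(known_reactions, radius).most_common())
    print(rank_candidates(["cC-N", "C(=O)-N"], known_reactions, radius=1))
```

In this toy data, radius 0 cannot tell the four C-N transforms apart, while radius 1 separates them into two distinct signatures with different counts. This mirrors the point that a larger radius carries more information about the environment of the reaction center.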

1. Watson, I. A.; Wang, J.; Nicolaou, C. A. A retrosynthetic analysis algorithm implementation. J. Cheminform. 2019, 11:1, https://doi.org/10.1186/s13321-018-0323-6
2. LillyMol: https://github.com/EliLillyCo/LillyMol