Abstract Details


Extraction, Analysis, Atom Mapping, Classification and Naming of Reactions from Pharmaceutical ELNs.

Roger Sayle1, Daniel Lowe1, Noel O'Boyle1, Michael Kappler2, Anna Paola Pelliccioli3, Nick Tomkinson4, Daniel Stoffler5
1NextMove Software, Cambridge, UK
2Hoffmann-La Roche, Nutley, USA
3Novartis, Basel, Switzerland
4AstraZeneca, Alderley Park, UK
5Hoffmann-La Roche, Basel, Switzerland
Electronic Laboratory Notebooks (ELNs) are widely used in the pharmaceutical industry for recording the details of chemical synthesis experiments. The primary use of this information is often the capture of intellectual property for future patent filings; however, this data can also be used in a number of additional applications, including synthetic accessibility calculations, reaction planning, and reaction yield prediction/optimization. Not only does a pharmaceutical ELN capture those classes of reactions suitable for small-scale medicinal chemistry, but it is also a unique source of information on failed and poor-yield reactions, an important class of data rarely found in the scientific literature or commercial reaction databases.

This talk describes several of the technical chemoinformatics challenges in exploiting the wealth of synthetic chemistry information in ELNs. Starting with the hand-drawn sketches stored in relational databases, we describe the steps required to transform and normalize this data into a clean and annotated reaction database in an "open" file format such as MDL's RD and RXN formats, or reaction SMILES. This process includes the tricky steps of reaction atom mapping, role assignment of reactants, reagents, catalysts and solvents, and the recognition of a reaction as an example of a known named reaction (Suzuki coupling, Diels-Alder cyclization, nitro reduction, chiral separation, etc.). Novel (and improved) algorithms for each of these tasks will be described and, where appropriate, compared to and benchmarked against previous methods and implementations.

For reaction naming and classification, we describe the efficient pattern matching of a large database of SMIRKS-like transformations against a candidate reaction to recognize and assign the mechanism of action. When successful, this associates the reaction with its RXNO identifier in the RSC's reaction ontology and, as an added benefit, also provides the corresponding atom mapping. Statistics will be presented on the high usage of a limited number of robust reactions typical in the pharmaceutical industry.

For the general case of reaction atom mapping, we describe a novel “consensus” method that combines the results from two or more third-party atom mapping algorithms, producing results which are shown to be superior to any single implementation.
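The abstract does not say how the individual mappings are combined, so the following is only a minimal sketch of one possible consensus rule: a plain vote over complete mappings. The function name `consensus_mapping` and the representation of a mapping as a dictionary from reactant atom index to product atom index are assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def consensus_mapping(mappings):
    """Return (mapping, votes) for the most frequently proposed atom mapping.

    Each mapping is a dict {reactant_atom_index: product_atom_index}, as
    might be produced by one third-party atom mapper (assumed representation).
    Returns (None, 0) when no mappings are supplied.
    """
    if not mappings:
        return None, 0
    # Freeze each dict into a hashable key that ignores insertion order.
    keys = [tuple(sorted(m.items())) for m in mappings]
    winner, votes = Counter(keys).most_common(1)[0]
    return dict(winner), votes


# Three hypothetical mappers: two agree, one swaps atoms 1 and 2.
mapper_a = {0: 0, 1: 1, 2: 2}
mapper_b = {0: 0, 1: 2, 2: 1}
mapper_c = {2: 2, 1: 1, 0: 0}
mapping, votes = consensus_mapping([mapper_a, mapper_b, mapper_c])
```

A whole-mapping vote treats symmetry-equivalent mappings as different answers, so a practical implementation would probably need to normalize for molecular symmetry, or vote per atom, before counting.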

For reaction role assignment, we use the above high-quality atom mapping to determine which components of the reaction mixture contribute atoms to the product or products, marking these as reactants. Those components that don't contribute atoms are “agents”, which are subsequently further classified as solvents and catalysts by dictionary methods. A consistent automatic assignment of reaction roles avoids the problem where in-house business rules fail to instruct bench chemists on which molecules should be drawn above (rather than to the left of) a reaction arrow.
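The role assignment rule described above can be sketched on an atom-mapped reaction SMILES using only string handling. This is an illustrative sketch, not the authors' implementation: the function names, the tiny `AGENT_ROLES` dictionary, and the choice to key that dictionary on raw SMILES strings are all assumptions, and splitting on "." ignores grouped components such as salts.

```python
import re

# Atom-map numbers appear inside bracket atoms, e.g. [CH3:1].
MAP_NUMBER = re.compile(r":(\d+)\]")

# Hypothetical dictionary of known agents; a real one would be far larger
# and keyed on canonical identifiers rather than raw SMILES strings.
AGENT_ROLES = {"Cc1ccccc1": "solvent", "OS(=O)(=O)O": "catalyst"}


def map_numbers(smiles):
    """Return the set of atom-map numbers present in a SMILES string."""
    return {int(n) for n in MAP_NUMBER.findall(smiles)}


def assign_roles(mapped_reaction):
    """Classify every non-product component of 'left>middle>right'."""
    left, middle, right = mapped_reaction.split(">")
    product_atoms = map_numbers(right)
    roles = {}
    for component in (left + "." + middle).split("."):
        if not component:
            continue
        if map_numbers(component) & product_atoms:
            # The component contributes at least one atom to a product.
            roles[component] = "reactant"
        else:
            # No contributed atoms: an agent, refined by dictionary lookup.
            roles[component] = AGENT_ROLES.get(component, "agent")
    return roles


# Illustrative esterification with every component drawn left of the arrow.
rxn = ("[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6].OS(=O)(=O)O.Cc1ccccc1"
       ">>[CH3:1][C:2](=[O:3])[O:6][CH3:5]")
roles = assign_roles(rxn)
```

Because the decision depends only on the atom mapping, the result is the same whether the chemist drew a component above the arrow or to the left of it.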

The annotated roles are then used to generate canonical reaction InChI identifiers or reaction SMILES for duplicate identification and variation tracking. Additionally, canonical reactants and products can be used to construct and visualize multi-step syntheses (associating sequences, trees and directed graphs of intermediate reaction steps).
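As a rough illustration of how role-annotated, canonical components support duplicate identification and variation tracking, the sketch below builds an order-independent reaction SMILES key. It assumes each component SMILES has already been canonicalized by a chemistry toolkit; the function names, the record layout, and the experiment identifiers are hypothetical.

```python
from collections import defaultdict


def reaction_key(reactants, agents, products, include_agents=True):
    """Build a 'reactants>agents>products' key that ignores drawing order.

    Component SMILES are assumed to be canonical already, so sorting them
    within each role is enough to make the key order-independent.
    """
    middle = agents if include_agents else []
    return ">".join(".".join(sorted(group))
                    for group in (reactants, middle, products))


def group_records(records, include_agents=True):
    """Group (record_id, reactants, agents, products) tuples by key."""
    groups = defaultdict(list)
    for record_id, reactants, agents, products in records:
        key = reaction_key(reactants, agents, products, include_agents)
        groups[key].append(record_id)
    return dict(groups)


# Hypothetical ELN records: exp-1 and exp-2 differ only in drawing order,
# while exp-3 uses a different agent for the same transformation.
records = [
    ("exp-1", ["CO", "CC(=O)O"], ["OS(=O)(=O)O"], ["COC(C)=O"]),
    ("exp-2", ["CC(=O)O", "CO"], ["OS(=O)(=O)O"], ["COC(C)=O"]),
    ("exp-3", ["CC(=O)O", "CO"], ["Cl"], ["COC(C)=O"]),
]
duplicates = group_records(records)
variations = group_records(records, include_agents=False)
```

Grouping with agents included finds exact duplicates, while dropping the agents groups condition variations of the same transformation together.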


