Stephen Pickett Abstract

Validating Automated Design and Active Learning

Stephen Pickett¹, Darren Green¹, David Marcus¹, Jacob Bush¹, Chris Luscombe¹

¹GlaxoSmithKline
Small-molecule drug discovery brings multidisciplinary teams together to find a molecule with the appropriate profile. The task is often cast as a complex multi-parameter optimisation problem, tackled through cycles of design, make and test. Recent advances in generative algorithms and active learning are enabling a more automated, data-driven approach to the design cycle. In this presentation we describe how data-driven cheminformatics methods may automate much of what has historically been done by a medicinal chemist. We also explore what it is reasonable to expect “AI” approaches to achieve, and what is best left to a human expert.

This data-driven approach involves a number of key technologies:
1) Automated design of compounds in the relevant chemical space.
2) Automated methods to build and update models and to filter and score compounds.
3) Algorithms for the selection of compounds for the next iteration.
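
The three technologies above can be read as one iterative loop. The following is a minimal sketch of such a loop, not the BRADSHAW implementation: the one-dimensional "compounds", the `hidden_assay` function, the nearest-neighbour model and the exploration bonus are all assumptions chosen only to keep the example self-contained.

```python
import math
import random


def hidden_assay(x):
    """Stand-in for the 'make and test' step: a toy activity landscape (assumption)."""
    return math.exp(-((x - 0.7) ** 2) / 0.02)


def predict(x, tested, k=3):
    """Toy k-nearest-neighbour model.

    Returns the mean activity of the k closest tested points and their mean
    distance from x, used here as a crude proxy for model uncertainty.
    """
    nearest = sorted(tested, key=lambda p: abs(p[0] - x))[:k]
    mean = sum(y for _, y in nearest) / len(nearest)
    distance = sum(abs(px - x) for px, _ in nearest) / len(nearest)
    return mean, distance


def select_batch(candidates, tested, batch_size, explore=1.0):
    """Rank untested candidates by predicted activity plus an exploration bonus."""
    def score(x):
        mean, uncertainty = predict(x, tested)
        return mean + explore * uncertainty
    return sorted(candidates, key=score, reverse=True)[:batch_size]


def run_cycles(n_cycles=5, batch_size=4, seed=0):
    rng = random.Random(seed)
    pool = [rng.random() for _ in range(200)]       # 1) designed candidates
    tested = [(x, hidden_assay(x)) for x in pool[:4]]  # initial measured set
    pool = pool[4:]
    for _ in range(n_cycles):
        # 2) the model is rebuilt from all tested data on every call to predict
        # 3) the next batch is selected from the remaining pool
        batch = select_batch(pool, tested, batch_size)
        tested.extend((x, hidden_assay(x)) for x in batch)
        chosen = set(batch)
        pool = [x for x in pool if x not in chosen]
    return tested


results = run_cycles()
print(len(results), "compounds tested")
```

The `explore` weight controls the balance between testing compounds the model already rates highly and testing compounds far from existing data; setting it to zero gives purely greedy selection.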

We will illustrate how GSK is implementing these technologies with examples from BRADSHAW, GSK’s experimental automated design environment. We will focus on the validation of the methodologies, particularly for molecular generation. Here we have combined traditional cheminformatics approaches, based on Matched Molecular Pair transforms and rule-based fragmentation (BRICS), with the latest generative algorithms from deep neural networks. These include a novel application that solves the one-to-many mapping problem of converting from a higher-level representation, such as Reduced Graphs, to individual molecules.
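
To make the one-to-many mapping problem concrete, the sketch below expands a single reduced-graph-like description into every concrete combination it could stand for. It is an illustration under strong assumptions: an exhaustive lookup stands in for the deep generative model, the reduced graph is simplified to a linear list of node types, and the `NODE_TO_FRAGMENTS` vocabulary is hypothetical.

```python
from itertools import product

# Hypothetical vocabulary: each abstract node type can be realised by several
# concrete fragments. The names are illustrative and not taken from BRADSHAW.
NODE_TO_FRAGMENTS = {
    "aromatic_ring": ["benzene", "pyridine", "thiophene"],
    "linker": ["amide", "ether"],
    "donor": ["hydroxyl", "primary amine"],
}


def expand(reduced_graph):
    """Enumerate every fragment sequence consistent with the given node types."""
    options = [NODE_TO_FRAGMENTS[node] for node in reduced_graph]
    return list(product(*options))


candidates = expand(["aromatic_ring", "linker", "donor"])
print(len(candidates))  # 3 * 2 * 2 = 12 concrete realisations of one abstract description
```

Even this small vocabulary yields twelve realisations of one three-node description, which is why converting from a higher-level representation back to individual molecules calls for a method that can propose many valid outputs for one input.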
Using several examples of lead compounds, we have compared the output of the different approaches with compound suggestions from experienced medicinal chemists. We have also asked the same panel of chemists to try to distinguish between machine-generated and chemist-generated ideas. The results show that the molecule generation approaches are comprehensive in their coverage and produce compounds that a given chemist judges to be as valid as other chemists’ ideas.
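
One way to judge whether a panel can tell machine-generated from chemist-generated ideas is to compare the number of correct calls with what guessing alone would give. The sketch below uses a one-sided binomial test for this. The choice of test is an assumption rather than the analysis used in the study, and the tallies are made up for illustration.

```python
from math import comb


def binomial_p_value(correct, total, chance=0.5):
    """One-sided probability of at least `correct` right answers from guessing alone."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Made-up tallies for illustration only; these are not results from the study.
p = binomial_p_value(correct=27, total=50)
print(f"p-value against chance: {p:.3f}")
```

A large p-value means the panel's accuracy is consistent with guessing, which is the outcome expected if the machine-generated ideas are indistinguishable from the chemists' own.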
We shall also present results of the application of these methodologies in the context of active drug discovery programs where active learning has been applied in both lead generation and lead optimisation scenarios.
The results suggest that the traditional approach to drug discovery should be rethought and the design process itself be redesigned around appropriately validated technologies.