Artificial Intelligence in De Novo Drug Design
Christoph Grebner1, Hans Matter1, Gerhard Hessler1
1Sanofi, R&D, Integrated Drug Discovery
Artificial intelligence already plays an important role in daily life. Significant advances have been achieved in very different areas, such as image and speech recognition, natural language processing, self-driving cars, and playing complex games like Go or StarCraft. Many of these technologies have been entering drug discovery for years now and offer promising new opportunities. They are expected to make the search for new drugs quicker and more effective. In early drug discovery, applications such as target identification, screening analysis, and lead generation and optimization are benefiting from new developments in artificial intelligence.
In this presentation, we focus on two main fields where artificial intelligence plays a crucial role: a) generating novel molecules, and b) scoring design suggestions. To gain deeper insight into the practical aspects of these applications, we compare different settings, such as chemical spaces, scoring functions, and network architectures, using project examples.
For generating novel molecules, we investigate generative methods such as reinforcement learning techniques combined with recurrent neural networks, which are evolving in the field of de novo design. These techniques allow generating molecules tailored to specific project needs, for example physicochemical property profiles, activities, or substructures.
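The reinforcement learning loop described above (sample candidates, score them, shift the generator towards higher scores) can be sketched in a few lines. This is only a toy illustration under strong assumptions and not the setup used in this work: the recurrent network is replaced by a single position-independent categorical policy over four tokens, the alphabet is not a valid SMILES grammar, and `toy_score` is a hypothetical stand-in for a project-specific scoring function. The update rule is plain REINFORCE with a batch-mean baseline.

```python
import math
import random

VOCAB = ["C", "N", "O", "F"]  # toy token alphabet, NOT a real SMILES grammar
SEQ_LEN = 8                   # fixed length of each sampled token string


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(logits, rng):
    """Sample SEQ_LEN token indices from the current policy."""
    probs = softmax(logits)
    return rng.choices(range(len(VOCAB)), weights=probs, k=SEQ_LEN)


def toy_score(tokens):
    """Hypothetical project score in [0, 1]: fraction of 'N' tokens."""
    n_idx = VOCAB.index("N")
    return sum(1 for t in tokens if t == n_idx) / len(tokens)


def reinforce(steps=200, batch_size=16, lr=0.5, seed=0):
    """Run REINFORCE on the toy policy and return the final token probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(VOCAB)  # start from a uniform policy
    for _ in range(steps):
        batch = [sample_sequence(logits, rng) for _ in range(batch_size)]
        scores = [toy_score(seq) for seq in batch]
        baseline = sum(scores) / len(scores)  # batch mean reduces variance
        probs = softmax(logits)
        grad = [0.0] * len(VOCAB)
        for seq, score in zip(batch, scores):
            advantage = score - baseline
            for tok in seq:
                for k in range(len(VOCAB)):
                    # d log p(tok) / d logit_k = 1[k == tok] - p_k
                    indicator = 1.0 if k == tok else 0.0
                    grad[k] += advantage * (indicator - probs[k])
        logits = [l + lr * g / batch_size for l, g in zip(logits, grad)]
    return softmax(logits)


if __name__ == "__main__":
    final_probs = reinforce()
    for token, p in zip(VOCAB, final_probs):
        print(f"{token}: {p:.3f}")
```

With the toy score rewarding 'N' tokens, the sampled distribution drifts towards that token over the course of training. In a realistic setting, the policy would be a recurrent network over SMILES tokens and the score a project-specific function.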
In this context, the chemical space used to train the initial networks also influences the distribution of sampled and accessible molecules as well as their chemical complexity. Therefore, we compare results of reinforcement learning runs using three different chemical spaces: publicly accessible compounds (ChEMBL), the Sanofi compound collection, and virtual chemical spaces.
In addition, there are several options for choosing a suitable scoring function. To gain deeper insight into the influence of scoring functions, we explore results from simple scoring functions (physicochemical properties and 2D fingerprints), classical machine learning QSAR models, deep neural network QSAR models, and combinations of these.
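One way such a combination of scoring components could look is sketched below. All names, ranges, and weights are illustrative assumptions, not the scoring functions used in this work: fingerprints are represented as plain sets of on-bit indices, the property term is a simple range-based desirability, and the QSAR prediction is passed in as an already computed number in [0, 1].

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def range_desirability(value, low, high, tolerance):
    """Return 1.0 inside [low, high], falling linearly to 0.0 over `tolerance` outside."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return max(0.0, 1.0 - distance / tolerance)


def combined_score(components, weights):
    """Weighted arithmetic mean of component scores that each lie in [0, 1]."""
    total_weight = sum(weights[name] for name in components)
    weighted_sum = sum(weights[name] * components[name] for name in components)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical values for a single design suggestion.
    components = {
        "logp": range_desirability(3.4, low=1.0, high=3.0, tolerance=2.0),
        "similarity": tanimoto({1, 5, 9, 12}, {1, 5, 7}),
        "qsar": 0.7,  # placeholder for a model-predicted activity probability
    }
    weights = {"logp": 1.0, "similarity": 1.0, "qsar": 2.0}
    print(round(combined_score(components, weights), 2))  # roughly 0.65
```

A weighted arithmetic mean is only one possible aggregation. A weighted geometric mean or a product would penalize a single poor component more strongly, which may be preferable when every criterion has to be met.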
Depending on the current stage of the drug design project, either more explorative settings are required (hit/lead identification) or stricter and narrower settings have to be used (lead optimization). In this context, we explore the performance of reinforcement learning combined with prior transfer learning, i.e., pre-training of the initial networks on a set of known active molecules.
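The idea of transfer learning, i.e. continuing training from a general model on a small focused set, can be illustrated with a deliberately simplified model. The sketch below is an assumption-laden toy and not the networks used in this work: the generator is a single categorical distribution over four tokens instead of a recurrent network, the two corpora are invented strings rather than real molecules, and `fit` is a hypothetical helper that performs gradient ascent on the mean token log-likelihood.

```python
import math

VOCAB = ["C", "N", "O", "F"]  # toy token alphabet, NOT a real SMILES grammar


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_frequencies(corpus):
    """Relative frequency of each vocabulary token in a list of strings."""
    counts = [0] * len(VOCAB)
    for string in corpus:
        for ch in string:
            counts[VOCAB.index(ch)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def fit(logits, corpus, steps, lr):
    """Gradient ascent on the mean token log-likelihood, starting from `logits`."""
    freqs = token_frequencies(corpus)
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        # Gradient of the mean log-likelihood w.r.t. logit_k is freq_k - p_k.
        logits = [l + lr * (f - p) for l, f, p in zip(logits, freqs, probs)]
    return logits


if __name__ == "__main__":
    general_set = ["CCCCOC", "CCNCCC", "CCCCCF", "CCOCCC"]  # invented "broad" corpus
    active_set = ["CNNCNN", "NNCNCN"]                        # invented "known actives"

    # Pre-train from scratch on the broad corpus.
    prior = fit([0.0] * len(VOCAB), general_set, steps=500, lr=1.0)
    # Fine-tune for only a few steps on the focused set, starting from the prior.
    focused = fit(prior, active_set, steps=20, lr=1.0)

    for name, params in (("prior", prior), ("focused", focused)):
        print(name, [round(p, 3) for p in softmax(params)])
```

The number of fine-tuning steps acts as a dial between the two regimes mentioned above: few steps keep the model close to the broad prior (more explorative), while many steps pull it tightly towards the focused set (stricter and narrower).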
The goal of the project is to explore and analyze generative and reinforcement learning approaches and scoring functions for de novo drug design, and to define the best settings for a typical drug design project.