Drug Molecule De novo Design by Multi-Objective Reinforcement Learning for Polypharmacology

Xuhan Liu1, Kai Ye2, Herman W. T. van Vlijmen1,3, Adriaan P. IJzerman1, Gerard J. P. van Westen1

1 Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden, The Netherlands
2 Omics and Omics Informatics, Xi'an Jiaotong University, 28 Xianning W Rd, Xi'an, China
3 Janssen Pharmaceutica NV, Beerse, Belgium
Over the last five years, deep learning has progressed tremendously in both image recognition and natural language processing [1], and it is increasingly being applied to other data-rich fields. In drug discovery, recurrent neural networks (RNNs) have been shown to be an effective method for generating novel chemical structures in the form of SMILES strings [2, 3]. Our group previously proposed a method named DrugEx, which integrates an exploration strategy into RNN-based reinforcement learning to improve the diversity of the generated molecules [4].
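To illustrate how such an RNN generator can be set up, the sketch below shows a character-level SMILES generator in PyTorch that samples tokens autoregressively. The vocabulary handling, network sizes, and sampling details are simplified assumptions made for illustration and do not reproduce the actual DrugEx implementation [4].

```python
# Minimal sketch of a character-level RNN SMILES generator (illustrative only).
import torch
import torch.nn as nn

class SmilesRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, num_layers=3, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, hidden=None):
        x = self.embed(tokens)             # (batch, seq, embed)
        x, hidden = self.rnn(x, hidden)    # (batch, seq, hidden)
        return self.out(x), hidden         # logits over the next token

    def sample(self, batch_size, start_idx, max_len=100):
        """Autoregressively sample token sequences (SMILES strings)."""
        tokens = torch.full((batch_size, 1), start_idx, dtype=torch.long)
        hidden, generated = None, []
        for _ in range(max_len):
            logits, hidden = self.forward(tokens, hidden)
            probs = torch.softmax(logits[:, -1], dim=-1)
            tokens = torch.multinomial(probs, 1)   # stochastic next-token choice
            generated.append(tokens)
        return torch.cat(generated, dim=1)         # decode and truncate at the end token downstream
```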
Most current deep learning-based methods focus on a single target when generating drug-like active molecules. In reality, however, drug molecules often interact with more than one target, and unintended drug-target interactions can cause adverse effects [5]. Here, we extend our DrugEx model to multi-objective optimization in order to generate synthesizable drug molecules with selectivity toward multiple targets (e.g. the adenosine receptors A1, A2A, A2B, and A3). In this model, two deep neural networks (DNNs) interact with each other within a reinforcement learning framework: an RNN serves as the agent and a multi-task fully connected DNN as the environment. Ligands annotated in bioactivity assays on the adenosine receptors were collected from ChEMBL [6], and the environment was trained on these data to predict, for each protein target, the probability that a generated molecule is active. The agent was first pre-trained for molecular library generation and then trained under the guidance of a reward given by the weighted sum of the per-target prediction scores, so that increasingly more desired molecules are produced during the training loop until the reinforcement learning algorithm converges. As a proof of concept, we generated compounds with diverse predicted selectivity profiles toward multiple targets. Hence, our model can generate molecules with potentially high efficacy and reduced toxicity caused by off-target effects.
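A minimal sketch of this weighted-sum reward is given below, assuming a multi-task fully connected network that maps a molecular fingerprint to one activity probability per adenosine receptor subtype. The architecture, fingerprint size, weight values, and function names are illustrative assumptions, not the exact DrugEx environment.

```python
# Hedged sketch of a multi-task environment and the weighted-sum reward.
import torch
import torch.nn as nn

class MultiTaskScorer(nn.Module):
    """Fully connected multi-task network: one activity probability per target
    (e.g. A1, A2A, A2B, A3); layer sizes here are assumptions."""
    def __init__(self, n_features=2048, n_targets=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 1024), nn.ReLU(),
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, n_targets), nn.Sigmoid(),
        )

    def forward(self, fingerprints):
        return self.net(fingerprints)          # (batch, n_targets) activity probabilities

def multi_objective_reward(fingerprints, scorer, weights):
    """Reward for the RL agent: weighted sum of per-target prediction scores."""
    with torch.no_grad():
        scores = scorer(fingerprints)          # (batch, n_targets)
    return (scores * weights).sum(dim=1)       # (batch,) scalar reward per molecule

# Usage with placeholder fingerprints and equal weights; in practice the weights
# would be tuned to encode the desired selectivity profile across the targets.
scorer = MultiTaskScorer()
weights = torch.tensor([0.25, 0.25, 0.25, 0.25])
fingerprints = torch.rand(8, 2048)             # placeholder molecular fingerprints
print(multi_objective_reward(fingerprints, scorer, weights))
```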
References