Protein Structure-informed Molecular Fragment Replacement with InteractionDB
Giulio Mattedi, Andrew Potterton and Yi Mok
BenevolentAI, 4-8 Maple St, Bloomsbury, London W1T 5HD
Matched molecular pair analysis (MMPA) is a well-established, knowledge-based approach to generating novel compound ideas. Medicinal chemistry modifications and their effect on an endpoint are learned from a set of related molecules, and the resulting molecular transformations are applied to optimise input molecules [1,2].
While MMPA has been used successfully for a wide range of applications, from improving ADMET profiles to enhancing on-target potency [3,4], generating rule sets often involves averaging over multiple distinct molecular series and binding modes. This can produce transformations that are incompatible with the binding mode of the molecules of interest in the target protein structure.
Here we report the development of InteractionDB, a protein structure-informed molecular replacement tool. InteractionDB focuses on transformations that are expected to retain an interaction with a specific residue in the binding site of the target protein. By mining protein-ligand contacts across PDB data for the target protein family with Arpeggio, InteractionDB identifies ligand molecular fragments that engage the residues of interest in related proteins and proposes them as replacements to optimise input molecules.
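As a rough illustration of the contact-mining step, the sketch below groups ligand fragments by the protein residue they contact. The record layout, the entry and protein names, and the function `fragments_by_residue` are assumptions made for this example; they do not reflect Arpeggio's output format or the InteractionDB implementation.

```python
from collections import defaultdict

# Hypothetical, already-parsed contact records (NOT Arpeggio's real output format):
# (structure entry, protein, residue number, ligand fragment SMILES, interaction type)
CONTACTS = [
    ("entry_a", "kinase_A", 85, "c1ccncc1", "hbond"),
    ("entry_b", "kinase_A", 85, "c1ccc2[nH]ccc2c1", "hbond"),
    ("entry_b", "kinase_A", 120, "c1ccccc1", "aromatic"),
    ("entry_c", "kinase_B", 91, "c1ccncc1", "hbond"),
    ("entry_c", "kinase_B", 91, "c1ccccc1", "hydrophobic"),
]


def fragments_by_residue(contacts, interaction_types=None):
    """Map (protein, residue number) to the set of fragments observed contacting it.

    If interaction_types is given, only contacts of those types are kept.
    """
    index = defaultdict(set)
    for _entry, protein, resnum, fragment, itype in contacts:
        if interaction_types is not None and itype not in interaction_types:
            continue
        index[(protein, resnum)].add(fragment)
    return dict(index)


if __name__ == "__main__":
    hbond_index = fragments_by_residue(CONTACTS, interaction_types={"hbond"})
    for key, fragments in sorted(hbond_index.items()):
        print(key, sorted(fragments))
```

Keying the index on the protein and residue number keeps the lookup independent of individual structures, so fragments seen in different entries of the same protein are pooled.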
Structurally equivalent residues are tracked across the target family using multiple sequence alignments from the published literature, accounting for gaps and shifts in the sequences. The tool can therefore identify molecular fragments that have been experimentally shown to engage residues aligned to those in the binding site of interest. The identified replacement fragments are then ranked by their similarity to the fragment in the input molecule.
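The residue tracking and ranking described above can be sketched as follows, under stated assumptions. Residues are mapped between proteins through shared alignment columns, and candidate fragments are ranked by Tanimoto similarity over sets of features. The toy alignment, the sequential residue numbering starting at 1, the feature sets and all function names are hypothetical; the abstract does not specify the similarity measure used by InteractionDB, so Tanimoto is only a stand-in.

```python
def column_maps(aligned_seq, first_residue_number=1):
    """Return (residue number -> column, column -> residue number) for one aligned sequence.

    Gap characters ('-') occupy a column but do not consume a residue number.
    Assumes residues are numbered sequentially, which real PDB numbering may not be.
    """
    res_to_col, col_to_res = {}, {}
    resnum = first_residue_number
    for col, char in enumerate(aligned_seq):
        if char == "-":
            continue
        res_to_col[resnum] = col
        col_to_res[col] = resnum
        resnum += 1
    return res_to_col, col_to_res


def equivalent_residue(alignment, source, resnum, target):
    """Residue number in `target` aligned to `resnum` in `source`, or None at a gap."""
    res_to_col, _ = column_maps(alignment[source])
    _, col_to_res = column_maps(alignment[target])
    col = res_to_col.get(resnum)
    return None if col is None else col_to_res.get(col)


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_similarity(query_features, candidates):
    """Candidate names sorted from most to least similar to the query fragment."""
    return sorted(
        candidates,
        key=lambda name: tanimoto(query_features, candidates[name]),
        reverse=True,
    )


# Toy alignment: the related protein has an insertion after position 2.
ALIGNMENT = {"target": "MK-LVE", "related": "MKALV-"}

# Hypothetical feature sets standing in for real fragment fingerprints.
QUERY = {"aromatic", "hbond_acceptor", "ring6"}
CANDIDATES = {
    "frag_a": {"aromatic", "hbond_acceptor", "ring5"},
    "frag_b": {"aliphatic", "hbond_donor"},
    "frag_c": {"aromatic", "hbond_acceptor", "ring6", "halogen"},
}

if __name__ == "__main__":
    print(equivalent_residue(ALIGNMENT, "target", 3, "related"))
    print(rank_by_similarity(QUERY, CANDIDATES))
```

In this toy alignment, residue 3 of the target aligns to residue 4 of the related protein because of the insertion, while residue 5 of the target faces a gap and has no equivalent. Going through alignment columns rather than raw residue numbers is what accounts for such shifts.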
Structural data were mined for target protein families including human kinases and phosphodiesterases. Using medicinal chemistry series reported in the literature, we demonstrate that InteractionDB can reproduce molecular replacements that were successfully applied in series optimisation.
In this work we show that incorporating protein structural information into molecular replacements is a useful strategy for generating molecular design ideas that account for compatibility with the binding site of the target protein.
 Dalke, A., Hert, J. & Kramer, C. Mmpdb: An Open-Source Matched Molecular Pair Platform for Large Multiproperty Data Sets. J. Chem. Inf. Model. 58, 902–910 (2018).
 Awale, M., Riniker, S. & Kramer, C. Matched Molecular Series Analysis for ADME Property Prediction. J. Chem. Inf. Model. 60, 2903–2914 (2020).
 O’Boyle, N. M., Boström, J., Sayle, R. A. & Gill, A. Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity. J. Med. Chem. 57, 2704–2713 (2014).
 Jubb, H. C. et al. Arpeggio: A Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures. J. Mol. Biol. 429, 365–371 (2017).