Illuminating the Chemical Space of Untargeted Proteins
Maria J. Falaguera and Jordi Mestres
Research Group on Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute, Parc de Recerca Biomèdica (PRBB), Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain
According to the Illuminating the Druggable Genome (IDG) initiative, 90% of the proteins encoded by the human genome still lack an identified biologically active ligand (a small molecule with high binding potency). In this scenario, there is an urgent need for new approaches to chemically address these as-yet untargeted proteins. It is widely recognized that the best starting point for generating novel small molecules for a protein is to exploit the expected polypharmacology of known active ligands across phylogenetically related proteins, following the paradigm that similar proteins are likely to interact with similar ligands. Here, we introduce a computational strategy to identify privileged structures that, when chemically expanded, are highly likely to contain active small molecules for untargeted proteins. The protocol was first tested on a set of 576 currently targeted proteins that had at least one protein family sibling in the year before their first active ligand was reported. For 214 (37%) of these proteins, the protocol correctly anticipated a privileged structure contained in active ligands identified in subsequent years. Given data completeness issues, this recall should be regarded as a lower-bound estimate. When applied to a set of 1,184 untargeted, potentially druggable genes in cancer, the identification of privileged structures from the known bioactive ligands of protein family siblings allowed us to extract a prioritized list of diverse, commercially available small molecules for 960 of them. Assuming a minimum success rate of 37%, these chemical library selections should be able to deliver active ligands for at least 355 currently untargeted cancer-associated proteins.
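The figures quoted in the abstract can be reproduced with simple arithmetic: a retrospective recall of 214/576 (about 37%), and a projected lower bound obtained by applying the rounded 37% rate to the 960 proteins with library selections (960 × 0.37 = 355.2, rounded down to 355). The sketch below shows that arithmetic, together with a deliberately simplified, hypothetical stand-in for privileged-structure identification. It assumes each sibling protein's active ligands have already been reduced to scaffold identifiers, and it treats a scaffold as "privileged" when it appears in ligands of at least `min_siblings` distinct siblings. The names `privileged_scaffolds` and `min_siblings` and the frequency rule itself are illustrative assumptions, not the authors' actual protocol.

```python
import math
from collections import Counter


def recall(hits: int, total: int) -> float:
    """Fraction of retrospectively tested proteins for which a privileged
    structure in later-reported active ligands was correctly anticipated."""
    return hits / total


def expected_minimum_hits(n_proteins: int, success_rate: float) -> int:
    """Lower-bound number of proteins expected to yield an active ligand,
    rounded down to a whole protein."""
    return math.floor(n_proteins * success_rate)


def privileged_scaffolds(sibling_ligands: dict[str, list[str]],
                         min_siblings: int = 2) -> list[str]:
    """Hypothetical heuristic (not the published method): rank scaffolds by
    the number of distinct family siblings whose active ligands contain them.

    sibling_ligands maps a sibling protein ID to the scaffold identifiers of
    its known active ligands (assumed to be precomputed elsewhere).
    """
    counts: Counter[str] = Counter()
    for scaffolds in sibling_ligands.values():
        counts.update(set(scaffolds))  # count each scaffold once per sibling
    # Sort by descending sibling count, then by name for a deterministic order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [scaffold for scaffold, n in ranked if n >= min_siblings]


if __name__ == "__main__":
    rate = round(recall(214, 576), 2)
    print(rate)                               # 0.37
    print(expected_minimum_hits(960, rate))   # 355
    siblings = {"P1": ["A", "B"], "P2": ["A", "C"], "P3": ["A", "B", "B"]}
    print(privileged_scaffolds(siblings))     # ['A', 'B']
```

Note that the 355 figure follows from the rounded 37% rate. Using the unrounded recall (214/576) would give 960 × 214/576 ≈ 356.7, so the abstract's bound is, if anything, slightly conservative.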