Andras Stracz Abstract

Design hub for early phase drug discovery

Andras Stracz¹, Akos Tarcsay¹, Ivan Solt¹, Andras Volford¹

¹ChemAxon
Drug discovery is an iterative process in which hypotheses are constructed from observations and validated by triggering new observations, mainly through the synthesis of new chemical entities. As an idea evolves toward selection for synthesis, evidence and prediction results are collected, then assessed and scrutinized by the project group. The success of modern drug design therefore depends on how data is turned into information and how much knowledge is extracted from it. Accordingly, efforts to connect data sources, or to make an even broader spectrum of data available in centralized data lakes with access engines operating on top, drive contemporary development and represent a key trend. Technologies that support data preprocessing, such as matched molecular pair (MMP) analysis, or that provide instant access to large amounts of chemical information are in high demand. Depending on the volume and quality of the raw data, model building approaches play a role in the preprocessing steps. Data analytics platforms apply supervised or unsupervised methods such as linear fitting, clustering, pattern recognition, or neural networks. These models go beyond the raw information: the extracted correlations can be applied to novel, hypothetical structures to judge them in a triaging phase, before deciding on synthesis.

Effective coordination of hypotheses and compound series in projects where multiple groups collaborate requires access to optimized, dynamically changing information. The major challenge is therefore the collection, grouping, management, and overview of the relevant information (ideas, calculated properties, related data from databases, graphics, comments, attachments, etc.) within a single application.

The goal of this presentation is to introduce the Marvin Live platform, which integrates a wide variety of data sources and services to augment real-time design. Marvin Live offers a vendor-agnostic, real-time plugin system that can be configured to current information needs. This allows the seamless integration of in-house databases, local models, and workflow tools (KNIME, Pipeline Pilot). We present two use cases. First, we show how an MMP analysis based on ChEMBL data can support designing out hERG liability. Second, we demonstrate a simultaneous, instant search of various databases such as SureChEMBL, PubChem, and vendor catalogs (eMolecules, Mcule, MolPort, Enamine). Built on a novel search engine, this service returns results within seconds from a compound collection of more than 500 million molecules. It supports the estimation of freedom to operate and novelty, and provides a quick view of reagent and purchasable compound availability.
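
To illustrate the idea of a simultaneous search across several sources, the following is a minimal sketch in Python. It is not Marvin Live's plugin API or its search engine. The source names, the in-memory data, and the functions `search_source`, `search_all`, and `summarize` are hypothetical, and exact string matching on SMILES stands in for a real structure search.

```python
import asyncio

# Hypothetical in-memory stand-ins for remote sources. Real services
# (patent, literature, and vendor databases) would be queried over the network.
SOURCES = {
    "patents": {"CCO": ["PAT-1"]},
    "literature": {"CCO": ["LIT-7", "LIT-9"], "c1ccccc1": ["LIT-2"]},
    "vendors": {"c1ccccc1": ["VEND-3"]},
}

async def search_source(name, query):
    """Look up exact matches for `query` in one (hypothetical) source."""
    await asyncio.sleep(0)  # placeholder for network latency
    return name, SOURCES[name].get(query, [])

async def search_all(query, timeout=5.0):
    """Query every source concurrently and collect the hits per source."""
    tasks = [asyncio.wait_for(search_source(n, query), timeout) for n in SOURCES]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Failed or timed-out sources are skipped instead of failing the whole search.
    return dict(r for r in results if not isinstance(r, Exception))

def summarize(hits):
    """Derive simple triage flags from the per-source hits."""
    return {
        "known": any(hits.values()),  # any hit means the structure is not novel
        "purchasable": bool(hits.get("vendors")),
    }
```

Calling `asyncio.run(search_all("CCO"))` returns one list of hits per source, and `summarize` reduces them to the kind of quick novelty and availability view described above. Skipping failed sources is one possible design choice: a slow or unavailable database then does not block the remaining results.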