Lauren Reid Abstract

SARkush®: Automated Markush-like Structure Generation using Matched Pairs and Generic Atom Scaffolds

Lauren Reid, Jess Stacey, Phillip de Sousa, Bashy Khan, Dan James, David Cousins, Jacqueline Clarkson, Andrew Leach, Ed Griffen and Al Dossetter.

MedChemica Ltd. The Motorworks, Chestergate, Macclesfield, SK11 6DU


The COVID-Moonshot consortium was formed via social media at the height of the COVID-19 pandemic, when medicinal and computational chemists offered their services (often for free) for the greater good. This environment pushed the limits of many 3D-modelling, artificial intelligence (AI) and cheminformatics techniques, and the fully open data gave many groups the opportunity to properly trial their technologies. One tool that was developed and tested using COVID-Moonshot data is SARkush®, an SAR analysis tool that automatically clusters compounds into Markush-like (SARkush®) structures.

Markush structures and R-group tables are a simple and intuitive way of summarising the structure-activity relationships (SAR) of chemical series. Applied to large datasets, they provide ideal input for QSAR modelling, a technique that uses measured data to build models that predict compound properties. Despite their widespread use, producing Markush structures from large compound datasets is tedious and time-consuming: a human must manually curate compounds into chemical series or cores and pass them to R-group decomposition algorithms. MedChemica’s Markush-like structure generator, SARkush®, automatically clusters compounds based on matched molecular pair (MMP) networks and generic-atom scaffolds, and generates Markush-like depictions and decomposition tables. Importantly, the output of SARkush® provides an effective summary of the SAR and can readily be used as input to further modelling techniques, making the software a valuable tool for both medicinal and computational chemists. The talk will describe the cheminformatics processes behind SARkush® and will present the COVID-Moonshot lead optimisation story that led to the candidate drug, DNDi-6510, through the use of SARkush® structures.
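
To make the idea of clustering on a matched-pair network concrete, the following is a minimal sketch and not the SARkush® implementation. It assumes the matched molecular pairs have already been identified by some upstream step, and it simply groups compounds into the connected components of the resulting graph. The function name `cluster_by_mmp_network`, the compound identifiers and the example pairs are all hypothetical, and the generic-atom scaffold step is not shown.

```python
from collections import defaultdict


def cluster_by_mmp_network(compound_ids, matched_pairs):
    """Group compounds into connected components of a matched-pair graph.

    compound_ids:  iterable of compound identifiers (hypothetical IDs here)
    matched_pairs: iterable of (id_a, id_b) tuples, assumed to come from an
                   upstream matched molecular pair search (not shown)
    """
    # Union-find: every compound starts as its own cluster root.
    parent = {c: c for c in compound_ids}

    def find(c):
        # Walk up to the root, shortening the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Each matched pair is an edge that merges two clusters.
    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each cluster under its root.
    clusters = defaultdict(list)
    for c in compound_ids:
        clusters[find(c)].append(c)
    return sorted(sorted(members) for members in clusters.values())


# Toy input: five hypothetical compounds and three hypothetical matched pairs.
compounds = ["cpd1", "cpd2", "cpd3", "cpd4", "cpd5"]
pairs = [("cpd1", "cpd2"), ("cpd2", "cpd3"), ("cpd4", "cpd5")]
clusters = cluster_by_mmp_network(compounds, pairs)
print(clusters)
```

In this toy case the compounds fall into two groups, each of which would be a candidate chemical series for R-group decomposition. A real workflow would need further rules, for example to split very large components or to handle compounds with no matched pairs.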