Peter Ertl Abstract

The encyclopedia of functional groups

Peter Ertl¹

¹Novartis Institutes for BioMedical Research

The concept of functional groups – sets of connected atoms that determine the properties and reactivity of the parent molecule – forms a cornerstone of organic chemistry, medicinal chemistry, toxicity assessment, spectroscopy and chemical nomenclature. Surprisingly little attention, however, has been paid to the study of functional groups from the cheminformatics point of view. The few programs currently available for this purpose rely on a set of predefined substructures that describe only the most common functionalities. Although this approach may work well for standard organic molecules, it fails for special data sets containing molecules with many less common functional groups, such as natural products or ligands that must interact with complex targets that have specific pharmacophoric requirements.
This clearly indicates the need for a new method of functional group analysis that identifies all functional groups in a molecule without relying on a predefined set of substructures. In this presentation a novel algorithm to identify all functional groups in organic molecules will be presented and its various applications discussed. The method is based on a recursive march through the atoms of the molecule, collecting all clusters of heteroatoms together with the relevant connected carbons. Some special functionalities, such as multiple carbon-carbon bonds and small strained rings, are also collected. The method provides detailed statistics about the groups identified. It allows functional groups in large chemical databases to be analysed in a way that was not possible with previous approaches based only on a predefined set of substructures.
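The marking-and-merging idea described above can be illustrated with a minimal sketch. It is not the published implementation: the molecule representation (a list of element symbols with implicit hydrogens and a list of `(i, j, bond_order)` tuples), the function name `find_functional_groups`, and the simplified marking rules are assumptions made for illustration. In particular, the sketch handles only heteroatoms and carbons in double or triple bonds; small strained rings and other special cases mentioned in the abstract are left out.

```python
from collections import defaultdict


def find_functional_groups(atoms, bonds):
    """Return functional groups as sorted lists of atom indices.

    atoms: list of element symbols, hydrogens implicit (assumed toy format).
    bonds: list of (i, j, order) tuples; order 1.5 may stand for aromatic.
    """
    neighbors = defaultdict(list)
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))

    # Step 1: mark atoms that can belong to a functional group.
    marked = set()
    for idx, symbol in enumerate(atoms):
        if symbol != "C":
            marked.add(idx)  # every heteroatom is marked
        elif any(order in (2, 3) for _, order in neighbors[idx]):
            marked.add(idx)  # carbon in a double or triple bond (C=O, C=C, C#N)

    # Step 2: recursively walk over marked neighbours and merge them.
    groups, seen = [], set()

    def collect(idx, group):
        seen.add(idx)
        group.append(idx)
        for nbr, _ in neighbors[idx]:
            if nbr in marked and nbr not in seen:
                collect(nbr, group)

    for idx in sorted(marked):
        if idx not in seen:
            group = []
            collect(idx, group)
            groups.append(sorted(group))
    return groups


# Toy input corresponding to the SMILES CC(=O)NCCCl
atoms = ["C", "C", "O", "N", "C", "C", "Cl"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1), (4, 5, 1), (5, 6, 1)]
print(find_functional_groups(atoms, bonds))  # [[1, 2, 3], [6]]
```

In this toy molecule the carbonyl carbon, the oxygen and the nitrogen merge into one group (the amide), while the chlorine forms a group of its own. No predefined substructure list is consulted at any point.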
The main part of the presentation will focus on applications of this new methodology. Results of an analysis of functional groups in a large database of bioactive molecules will be presented, including a discussion of the most common groups. The most frequent functional groups in drug-like molecules are amides, esters, tertiary amines, and fluoro and chloro substituents. The diversity of available functionalities is very high: altogether, more than 3000 unique functional groups have been identified. Their frequency statistics show a typical power-law (long-tail) distribution, with a few very common groups and a large number of infrequent ones. Molecules of different origins show considerably different distributions of functional groups, which is illustrated by comparing results for common synthetic molecules, drug-like molecules and natural products.
Detailed statistics on the distribution of functional groups make it possible to examine various interesting structural characteristics of molecules that have not been analysed previously. For example, the concept of functional group density is defined (as the ratio between the number of atoms that are part of a functional group and the number of all atoms in the molecule) and discussed. A new way to calculate similarity between molecular collections, based on the similarity between vectors characterising the frequencies of functional groups in these data sets, will also be introduced. Finally, several examples of the use of functional group statistics in the drug discovery process will be discussed.
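The two quantities mentioned above can be sketched in a few lines. The abstract does not say which atoms are counted or which similarity measure is used, so the choices here are assumptions: atom counts are taken as heavy-atom counts, and cosine similarity stands in for the unspecified vector similarity. The function names and the frequency counts are invented for illustration only.

```python
import math


def functional_group_density(n_atoms_in_groups, n_atoms_total):
    """Ratio of atoms that belong to a functional group to all atoms."""
    if n_atoms_total <= 0:
        raise ValueError("molecule must contain at least one atom")
    return n_atoms_in_groups / n_atoms_total


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity of two {group: frequency} dicts (assumed measure)."""
    keys = set(freq_a) | set(freq_b)
    dot = sum(freq_a.get(k, 0) * freq_b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty collection is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# A molecule with 7 heavy atoms, 4 of which sit in functional groups
density = functional_group_density(4, 7)

# Invented frequency counts for two hypothetical collections
collection_a = {"amide": 40, "ester": 25, "tertiary amine": 20}
collection_b = {"amide": 10, "ester": 30, "epoxide": 5}
similarity = cosine_similarity(collection_a, collection_b)
```

Representing each collection as a sparse dictionary avoids fixing a global list of groups in advance, which suits a method that discovers groups rather than looking them up.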

Details about the methodology:
P. Ertl, An algorithm to identify functional groups in organic molecules, Journal of Cheminformatics, 9:36 (2017)

Open source implementation of this algorithm in RDKit:
R. Hall, https://github.com/rdkit/rdkit/tree/master/Contrib/IFG