Calculating More Property Distributions of Chemical Fragment Spaces
Justin Lübbers¹, Uta Lessel², Louis Bellmann¹ and Matthias Rarey¹
¹ Universität Hamburg, ZBH – Center for Bioinformatics
² Boehringer Ingelheim Pharma GmbH & Co. KG
Recently developed make-on-demand compound catalogs have increased the size of searchable compound collections with a high likelihood of synthetic availability by orders of magnitude. These catalogs are described in a combinatorial fashion, and in many cases they are too large to be enumerated. Many cheminformatics tasks therefore require new algorithms that operate directly on these combinatorial descriptions without enumeration. Here, we address the problem of calculating property distributions, in the form of histograms, for an entire catalog.
For this purpose, Bellmann et al. developed SpaceProp, an algorithm for efficiently calculating physicochemical property distributions of chemical fragment spaces [1]. SpaceProp enables the description and comparison of non-enumerable chemical fragment spaces with respect to the distributions of heavy atom count, molecular weight, number of hydrogen bond acceptors and donors, and the octanol-water partition coefficient. Bellmann et al. applied the algorithm to the three commercial make-on-demand spaces REAL Space (Enamine), GalaXi (WuXi), and CHEMriya (Otava), and to the open-source-based KnowledgeSpace [2].
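To illustrate why a histogram can be obtained without enumeration, the following minimal sketch treats a strictly additive property such as the heavy atom count. It is not the SpaceProp implementation described in [1]; it only assumes that every product takes exactly one fragment from each pool and that the property of a product is the sum of its fragments' values (atoms lost or gained during linking are ignored). The pools and their values are hypothetical.

```python
from collections import Counter
from itertools import product


def convolve(hist_a, hist_b):
    """Histogram of a + b over all pairs, given the histograms of a and b."""
    out = Counter()
    for value_a, count_a in hist_a.items():
        for value_b, count_b in hist_b.items():
            out[value_a + value_b] += count_a * count_b
    return out


def product_histogram(fragment_pools):
    """Distribution of an additive property over all one-per-pool combinations."""
    total = Counter({0: 1})  # neutral element: one empty product with value 0
    for pool in fragment_pools:
        total = convolve(total, Counter(pool))
    return total


# Hypothetical heavy atom counts of the fragments in three pools.
pool_a = [5, 6, 6, 8]
pool_b = [3, 4, 4]
pool_c = [2, 7]

fast = product_histogram([pool_a, pool_b, pool_c])

# Brute-force enumeration, feasible only because this toy space is tiny.
brute = Counter(sum(combo) for combo in product(pool_a, pool_b, pool_c))
```

The cost of the convolution depends on the number of distinct property values per pool rather than on the number of products, which is what makes the approach attractive for spaces that are too large to enumerate.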
In this work, we present recent extensions of the SpaceProp algorithm. All molecular properties considered so far are atom-based. We show that the algorithm can be extended to arbitrary molecular properties that fulfill clearly defined criteria. As a first example, we show the calculation of molecular polar surface area distributions based on the widely used TPSA algorithm [3]. Furthermore, we demonstrate the calculation of bond-based property distributions, using the number of rotatable bonds as an example, to estimate the flexibility of the products. Finally, we introduce a topological property based on molecular substructures. For a set of user-defined query structures, SpaceProp computes a substructure distribution that counts, for every product, how many of the query structures it contains.
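A bond-based property differs from an atom-based one in that the bond formed when two fragments are joined can itself contribute. The sketch below shows one way such a property can still be handled without enumeration; it is an illustration under stated assumptions, not the method used in SpaceProp. Each fragment is assumed to carry its internal rotatable bond count plus a hypothetical boolean flag, and the newly formed bond is assumed to be rotatable only if both flags are set. All values are made up.

```python
from collections import Counter
from itertools import product


def rotatable_bond_histogram(pool_a, pool_b):
    """Histogram of rotatable bond counts over all products of two pools.

    Each fragment is a tuple (internal_count, link_flag). Both the flag and
    the rule for the new bond are hypothetical simplifications.
    """
    hist = Counter()
    # Group identical fragment states first, so the work depends on the
    # number of distinct states rather than on the number of fragments.
    for (rot_a, flag_a), n_a in Counter(pool_a).items():
        for (rot_b, flag_b), n_b in Counter(pool_b).items():
            link = 1 if (flag_a and flag_b) else 0  # contribution of the new bond
            hist[rot_a + rot_b + link] += n_a * n_b
    return hist


# Hypothetical fragments: (internal rotatable bonds, link flag).
pool_a = [(1, True), (1, True), (2, False), (0, True)]
pool_b = [(0, True), (3, False), (3, False)]

hist = rotatable_bond_histogram(pool_a, pool_b)

# Brute-force check on the enumerated toy space.
brute = Counter(
    ra + rb + (1 if fa and fb else 0)
    for (ra, fa), (rb, fb) in product(pool_a, pool_b)
)
```

The point of the example is that the property remains computable per fragment state as long as the contribution of the connecting bond depends only on a small amount of local information at the attachment points.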
The extended SpaceProp algorithm allows further insights into the composition of non-enumerable chemical fragment spaces. The substructure distributions enable the analysis of fragment spaces with regard to project-specific target scaffolds. We use the extended SpaceProp algorithm to analyze and compare the above-mentioned REAL Space (Enamine), GalaXi (WuXi), CHEMriya (Otava), and KnowledgeSpace, as well as eXplore (eMolecules) and FreedomSpace (ChemSpace) [2]. Elucidating the contents of these otherwise opaque chemical libraries is an essential step toward a guided exploration of chemical space and the construction of optimized combinatorial libraries.
References
[1] Bellmann, L.; Klein, R.; Rarey, M. Calculating and Optimizing Physicochemical Property Distributions of Large Combinatorial Fragment Spaces. Journal of Chemical Information and Modeling. 2022, 62 (11), 2800-2810.
[2] BioSolveIT GmbH. Chemical Spaces. https://www.biosolveit.de/infiniSee/#chemical_spaces (accessed 30.01.2023).
[3] Ertl, P.; Rohde, B.; Selzer, P. Fast Calculation of Molecular Polar Surface Area as a Sum of Fragment-Based Contributions and Its Application to the Prediction of Drug Transport Properties. Journal of Medicinal Chemistry. 2000, 43 (20), 3714-3717.