Christian Meyenburg Abstract

Exploring Fragment Spaces with Genetic Algorithms

Christian Meyenburg^1,2, Jens Sadowski³, Erik Malmerberg³, Christoph Grebner³, Matthias Rarey^1,2

¹Universität Hamburg
²ZBH – Center for Bioinformatics
³AstraZeneca plc
To this day, the search for lead structures during the drug design process heavily relies on the screening of in-house compound libraries and large databases of commercially available molecules. However, these libraries are too small to properly cover the space of chemically reasonable structures in a sufficient way. Fragment-based drug discovery was developed as a complementary approach. Fragment-based methods attempt to combine several small molecular fragments into a larger drug with a high affinity. They are strongly related to combinatorial chemistry [1] and can result in large compound libraries.

Fragment spaces [2] were developed as a way to easily encode a large space of compounds in silico. They consist of a set of molecular fragments that contain special linker atoms as building blocks. Two fragments can be combined by merging two linker atoms into a chemical bond according to a set of compatibility rules. Fragment spaces are combinatorial in nature and therefore require efficient search methods to retrieve structures of interest. Current methods are mostly based on topological or structural similarity models [3-5].

We introduce a novel method which uses a Genetic Algorithm to sample a given fragment space and optimizes molecules according to an arbitrary fitness function. The fitness function is handled as a black box which allows the method to use any external scoring method for the optimization process by offering a simple interface. Other similar methods enforce a heavy constraint on the layout of the molecules by predefining a blueprint for the fragment locations and their connectivity or by only allowing a pre-selected set of fragments for specific locations instead of sampling the whole space. The aim of this method is to sample big parts of the fragment space and further optimize good solutions that were found during this sampling. The optimization relies on the removal, addition, and replacement of single fragments in the molecule or the recombination of the fragments of two molecules. These are implemented as mutation and crossover operations for the genetic algorithm respectively. Applying one of these operations to a molecule can be considered equivalent to the molecule moving through the space of encoded molecules. It is possible that not all molecules can reach every part of the space of encoded molecules by applying these operations alone. This is the case if the combination of molecular fragments and connection rules results in isolated subspaces in the fragment space. In order to sample all such subspaces, the initialization of the population ensures that molecules from all subspaces are created. Molecules are created by randomly combining compatible fragments to each other until either a set maximum number of fragments is reached or until no further fragments can be connected to a molecule due to missing linker atoms.

To evaluate the algorithm we performed several proof of concept computations with fragment spaces of various sizes and a variety of fitness functions and compared the results to random sampling of the fragment spaces. We also analyzed how well the algorithm samples the fragment space compared to simple exhaustive enumeration.

1. Terrett N. K. (1998) Combinatorial Chemistry. Oxford University Press Inc.
2. Lauck F., Rarey M. (2013) Coping with Combinatorial Space in Molecular Design. De novo Molecular Design, Chapter 14: 325-347
3. Rarey M., Stahl M. (2001) Similarity searching in large combinatorial chemistry spaces. Journal of Computer-Aided Molecular Design 15: 497-520
4. Degen J., Rarey M. (2006) FlexNovo: Structure-Based Searching in Large Fragment Spaces. ChemMedChem 1: 854-868
5. Lippert T., Schulz-Gasch T., Roche O., Guba W., Rarey M. (2011) De novo design by pharmacophore-based searches in fragment spaces. Journal of Computer-Aided Molecular Design 25: 931-945