Virtual Screening of Virtual Libraries using a Genetic Algorithm
Rajarshi Guha
Vertex Pharmaceuticals, Boston, Massachusetts, United States
Reaction-based virtual libraries have expanded our access to readily synthesizable chemical spaces, with sizes ranging from millions to billions of virtual molecules. Searching such spaces for molecules of interest can be performed in multiple ways, ranging from explicit fragment-based approaches to sampling-based approaches. We present the use of a genetic algorithm (GA) to screen virtual libraries of arbitrary size using 2D fingerprints or shape similarity. The GA employs simple mutation and cross-over operators. We apply it to the Enamine REAL Space and Chemriya virtual libraries and show that it is effective in finding the optimal result for a number of queries. We perform benchmarks using queries from ChEMBL to compare the GA with exhaustive search and to assess the diversity of its results and its execution time. When used with 2D circular fingerprints, the GA is able to screen the entire Enamine REAL Space in less than 7 min on an M1 MacBook using 10 threads, sampling just 0.01% of the entire 33B molecule collection.
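To make the idea concrete, the following is a minimal sketch of how a GA with simple mutation and cross-over operators might search a reaction-based library without enumerating it. It is not the implementation described in this work. It assumes a toy library in which each product is a tuple of reagent indices (one per reaction slot), a product "fingerprint" that is simply the union of random reagent feature sets, and Tanimoto similarity to a query as the fitness. All function names, parameters, and default values (`make_toy_library`, `ga_search`, `pop_size`, `mutation_rate`, and so on) are hypothetical and chosen only for illustration.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity between two sets of feature ids."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def make_toy_library(n_slots=3, n_reagents=200, n_bits=64, bits_per_reagent=6, seed=0):
    """Toy stand-in for a reaction-based library (an assumption, not real data):
    each slot holds reagents, and each reagent is a random set of feature ids."""
    rng = random.Random(seed)
    return [
        [frozenset(rng.sample(range(n_bits), bits_per_reagent)) for _ in range(n_reagents)]
        for _ in range(n_slots)
    ]


def product_fp(library, genome):
    """Toy product fingerprint: union of the chosen reagents' feature sets."""
    fp = set()
    for slot, idx in enumerate(genome):
        fp |= library[slot][idx]
    return frozenset(fp)


def ga_search(library, query_fp, pop_size=50, generations=40,
              mutation_rate=0.2, n_elite=5, seed=1):
    """Return (best_genome, best_score, n_products_scored)."""
    rng = random.Random(seed)
    sizes = [len(slot) for slot in library]
    cache = {}  # genome -> score, so each product is scored at most once

    def score(genome):
        if genome not in cache:
            cache[genome] = tanimoto(product_fp(library, genome), query_fp)
        return cache[genome]

    def crossover(p1, p2):
        # Uniform cross-over: take each slot's reagent from either parent.
        return tuple(rng.choice(pair) for pair in zip(p1, p2))

    def mutate(genome):
        # Point mutation: swap a slot's reagent for a random one from that slot.
        return tuple(rng.randrange(n) if rng.random() < mutation_rate else i
                     for i, n in zip(genome, sizes))

    population = [tuple(rng.randrange(n) for n in sizes) for _ in range(pop_size)]
    for _ in range(generations):
        # Deduplicate (order-preserving), then rank by fitness.
        ranked = sorted(dict.fromkeys(population), key=score, reverse=True)
        elite = ranked[:n_elite]                  # elitism: best genomes survive
        parents = ranked[:max(2, pop_size // 2)]  # truncation selection
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    for genome in population:  # score the final generation as well
        score(genome)
    best = max(cache, key=cache.get)
    return best, cache[best], len(cache)


if __name__ == "__main__":
    library = make_toy_library()
    query = product_fp(library, (3, 17, 42))  # a product known to be in the library
    best, best_score, n_scored = ga_search(library, query)
    total = 1
    for slot in library:
        total *= len(slot)
    print(f"best genome {best}, similarity {best_score:.3f}")
    print(f"scored {n_scored} of {total} products ({100 * n_scored / total:.4f}%)")
```

The point of the sketch is that the search only ever scores the products it samples, so its cost depends on the population size and number of generations rather than on the size of the library. For scale, 0.01% of the 33B molecule collection mentioned above corresponds to roughly 3.3 million scored molecules.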