Abstract Details


Poster 51: Virtual Screening on Billions of Small Molecules from the Chemical Universe Database GDB-17

Lars Ruddigkeit¹, Lorenz C. Blum¹, Jean-Louis Reymond¹
¹Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, CH-3012 Bern, Switzerland
During the last decade, the high attrition rate of drug candidates due to lack of efficacy, side effects and toxicity issues has been a major challenge for the pharmaceutical industry. Identifying novel chemotypes, for example by scaffold hopping, has therefore become a pressing problem for many drug discovery projects.[1] De novo drug design may help to address this problem through in silico methods such as scaffold analysis, breeding of molecules by genetic algorithms and exhaustive enumeration of chemical space.[2] The small-molecule database GDB-13[3] has already been applied successfully to ligand-based and structure-based virtual screening projects.[4] We recently reported a completely new and faster enumeration algorithm that yielded GDB-17, a database of more than 160 billion molecules containing up to 17 atoms. Compared to molecules of up to 17 atoms in reference databases such as DrugBank, ChEMBL or PubChem, GDB-17 shows a different topology and category distribution. It has a much higher content of nonplanar molecules, which makes it a rich source of sp3-configured centres. It might serve as inspiration for designing new chemotype series with, hopefully, better properties for drug discovery projects.[5]

Here we present our solution for performing virtual screening on this large number of molecules. To our knowledge, this is the only example of virtual screening of more than 160 billion molecules performed on one CPU in less than a day. Since we already know that bioactive molecules can be identified by Molecular Quantum Numbers (MQN)[6], we classified the entire database by MQN for rapid browsing. In a first optimization step, we reordered the 42 MQN values and added further features to obtain a faster and more flexible virtual screening method. The resulting XMQN fingerprints were then organized into a hash table, which speeds up the search by a factor of 5,000 compared with a linear search, and into a file structure that allows rapid extraction of the corresponding molecules. The time needed to extract the resulting structures depends almost entirely on reading and writing the data on the hard drive. If the number of hits is limited to 100,000 molecules, they can be displayed directly within a few minutes. This extremely fast virtual screening strategy should, in principle, also be applicable to other fingerprints. The whole data set can easily be stored and handled on one computer. More advanced virtual screening methods can then be applied to the hits in batch mode.
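The benefit of a hash table over a linear scan can be illustrated with a minimal Python sketch. It is not the implementation used for GDB-17: the function names, the molecule identifiers and the four-value toy fingerprints are all hypothetical (real MQN fingerprints have 42 values), and the on-disk file structure is not modelled.

```python
from collections import defaultdict

def build_index(fingerprints):
    """Group molecule ids by fingerprint so that an exact match is one hash lookup."""
    index = defaultdict(list)
    for mol_id, fp in fingerprints.items():
        index[tuple(fp)].append(mol_id)
    return index

def lookup(index, fp):
    """Return all molecule ids whose fingerprint equals fp (hash-table access)."""
    return index.get(tuple(fp), [])

def linear_search(fingerprints, fp):
    """Baseline: compare the query against every stored fingerprint."""
    target = tuple(fp)
    return [mol_id for mol_id, other in fingerprints.items() if tuple(other) == target]

# Toy data: hypothetical 4-value fingerprints standing in for 42-value MQN vectors.
fingerprints = {
    "mol_a": (6, 1, 0, 2),
    "mol_b": (6, 1, 0, 2),
    "mol_c": (7, 0, 1, 1),
}
index = build_index(fingerprints)
hits = lookup(index, (6, 1, 0, 2))
```

Both functions return the same hits; the difference is that the lookup cost of the hash table does not grow with the number of stored fingerprints, whereas the linear search visits every entry.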

For 15 example drugs, we performed “scaffold hopping”, defined by a ROCS score of 1.6 or higher and a substructure fingerprint Tanimoto similarity of 0.5 or lower. To avoid trivial hits, we excluded carbon-carbon double-bond unsaturation by deselecting this feature in the browser during the selection process. Some scaffold-hopping hits contain heterocyclic analogs or isosteric replacements of substituents, but others are very different, as they do not even share the graph of the parent molecule. Over 97% of the scaffold-hopping hits occur within a city-block distance of less than 13.[6] This constraint will be used in future applied virtual screening projects.
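The two numerical criteria in this paragraph can be written down as a short Python sketch. The function names and the three-value toy fingerprints are hypothetical; the ROCS score and the substructure Tanimoto value are assumed to have been computed elsewhere, and only the thresholds (1.6, 0.5 and a distance below 13) come from the text.

```python
def city_block_distance(a, b):
    """City-block (Manhattan) distance: sum of absolute differences per position."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

def is_scaffold_hop(rocs_score, substructure_tanimoto):
    """Scaffold hop as defined above: high shape score, low substructure similarity."""
    return rocs_score >= 1.6 and substructure_tanimoto <= 0.5

def within_radius(query, candidates, max_distance=13):
    """Keep candidates whose distance to the query is strictly below max_distance."""
    return [
        mol_id
        for mol_id, fp in candidates.items()
        if city_block_distance(query, fp) < max_distance
    ]

# Toy example with hypothetical 3-value fingerprints.
query = (0, 0, 0)
candidates = {"near": (5, 5, 2), "edge": (5, 5, 3), "far": (10, 10, 10)}
selected = within_radius(query, candidates)
```

Restricting a search to this radius before running the more expensive shape comparison is one way the distance constraint could be used as a prefilter.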


[1] I. Kola and J. Landis, Nat. Rev. Drug Discov. 2004, 3, 711.
[2] J.-L. Reymond et al., WIREs Comput. Mol. Sci. 2012, 2, 717-733.
[3] L. C. Blum and J.-L. Reymond, J. Am. Chem. Soc. 2009, 131, 8732.
[4] E. Luethi et al., J. Med. Chem. 2010, 53, 7236.
[5] L. Ruddigkeit et al., J. Chem. Inf. Model. 2012, 52, 2864.
[6] L. C. Blum et al., J. Chem. Inf. Model. 2011, 51, 3105.
[7] L. Ruddigkeit et al., J. Chem. Inf. Model. 2013, ASAP.
