A Challenging Dataset to Validate Pharmacophore Programs – Automated Protocol to Select and Overlay Structures from the RCSB Protein Data Bank
Ilenia Giangreco1, David A. Cosgrove1, Martin J. Packer1
1AstraZeneca, Mereside, Alderley Park, Macclesfield SK10 4TG, UK
Pharmacophore hypotheses play a central role in both the design and optimisation of drug-like ligands;1 they are used to explain the binding affinity of ligands and to assist in the design of chemically distinct scaffolds which show affinity for a target of interest. The importance of pharmacophores in rationalising ligand affinity has led to numerous algorithms which seek to overlay ligands based on their pharmacophoric features.2 All such algorithms must be validated with respect to known ligand overlays, usually by extracting them from the Protein Data Bank (PDB). The large number of structures and protein families in the PDB makes it difficult to establish a definitive overlay set; as a result, validation studies have rarely employed the same data sets.3-6
We have therefore undertaken an exhaustive analysis of the RCSB PDB to identify 121 distinct ligand overlay sets spanning a broad range of difficulty, from overlays which any algorithm should be able to reproduce to some for which there is only very weak evidence for a conserved pharmacophore of any sort.
We have also defined a protein overlay protocol which is free from subjective decisions about which residues to include. The normal approach to such overlays is to define the protein binding site as those residues within a given distance of the ligand atoms. However, if a binding site defined in this way contains flexible regions, their mobility can degrade the quality of the final overlay, particularly of the ligands, since the superposition will be an average over the different protein conformations. Where possible, we have instead taken a PROSITE motif located within or near the ligand binding site, and used the corresponding residues for the structure superimposition. PROSITE motifs are known to be biologically significant for the protein's function, and are conserved during evolution. This approach produces better ligand alignments in cases of active site flexibility.
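The abstract does not say how the motif residues were located, so the following is only a minimal sketch of one way it could be done: translate a PROSITE pattern into a regular expression and report which sequence positions it covers, so that those residues could then be passed to a superposition routine. The function names, the example pattern and the sequence fragment are illustrative assumptions, and the translation handles only the common pattern elements (`x`, `[..]`, `{..}`, repeat counts and terminus anchors).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as '[AG]-x(4)-G-K-[ST]' into a regex.

    Simplified sketch: covers x, [..], {..}, (n), (n,m), '<' and '>'.
    """
    regex = pattern.rstrip(".").replace("-", "")        # drop element separators
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} = any residue but P
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) = repeat count
    regex = regex.replace("x", ".")                     # x = any residue
    regex = regex.replace("<", "^").replace(">", "$")   # sequence termini
    return regex


def motif_residue_indices(sequence: str, pattern: str) -> list[list[int]]:
    """Return 0-based sequence indices covered by each (non-overlapping) match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [list(range(m.start(), m.end())) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Illustrative inputs only (hypothetical, not taken from the data set).
    pattern = "[AG]-x(4)-G-K-[ST]"
    fragment = "MTEYKLVVVGAGGVGKSALTIQ"
    for indices in motif_residue_indices(fragment, pattern):
        # These residues would be the ones used to superimpose the proteins.
        print(indices, fragment[indices[0]:indices[-1] + 1])
```

In a real protocol the indices would still need to be mapped from sequence positions to the residue numbering of each PDB entry before any superposition; that step is omitted here.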
1. Yang, S. Y. Pharmacophore modeling and applications in drug discovery: challenges and recent advances. Drug Discov. Today 2010, 15, 444-450.
2. Leach, A. R.; Gillet, V. J.; Lewis, R. A.; Taylor, R. Three-dimensional pharmacophore methods in drug discovery. J. Med. Chem. 2010, 53, 539-558.
3. Patel, Y.; Gillet, V. J.; Bravi, G.; Leach, A. R. A comparison of the pharmacophore identification programs: Catalyst, DISCO and GASP. J. Comput. Aided Mol. Des. 2002, 16,
4. Jones, G. GAPE: an improved genetic algorithm for pharmacophore elucidation. J. Chem. Inf. Model. 2010, 50, 2001-2018.
5. Taylor, R.; Cole, J. C.; Cosgrove, D. A.; Gardiner, E. J.; Gillet, V. J.; Korb, O. Development and validation of an improved algorithm for overlaying flexible molecules. J. Comput. Aided Mol. Des. 2012, 26, 451-472.
6. Cross, S.; Ortuso, F.; Baroni, M.; Costa, G.; Distinto, S.; Moraca, F.; Alcaro, S.; Cruciani, G. GRID-Based Three-Dimensional Pharmacophores II: PharmBench, a Benchmark Data Set for Evaluating Pharmacophore Elucidation Methods. J. Chem. Inf. Model. 2012, 52, 2599-