2nd Joint Sheffield Conference on Chemoinformatics:
9th-11th April, 2001
Molecular similarity is a measure of the degree of overlap between a pair of molecules in some property space. It can be calculated for a wide range of molecular properties, and is an important tool in the generation of quantitative structure-activity relationships (QSARs). QSARs aim to link the chemical or biological properties of molecules systematically to their structures, and are widely used in rational drug design. With the advent of high-throughput synthesis and combinatorial chemistry techniques, extremely large sets of potentially bioactive molecules can be created, making the efficient storage of data and the rapid computation of molecular similarity values crucial.
This poster explains a three-step process developed for the calculation of molecular similarity indices. First, the target molecules are reduced to two-dimensional forms. Next, the optimum value of the similarity index is obtained for each pair of molecules by systematic rotation in relative configurational space. This is the most commonly used technique for three-dimensional similarity calculations. For highly similar sets of molecules this method is more time-consuming than pre-alignment, but for diverse sets it provides much better results. Finally, the resulting matrix of two-dimensional similarity values is optimised using a neural net. This technique appears to provide significant speed increases with minimal loss of accuracy for molecular similarity calculations on systems with more than 30 atoms.
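The rotation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity index, so a Carbó-type index over atom-centred Gaussians is assumed here, and the names `overlap`, `carbo_index`, `rotate`, `best_similarity`, the exponent `alpha` and the toy coordinates are all hypothetical. Only rotation about a shared origin is scanned; translation is ignored.

```python
import math

def overlap(a, b, alpha=0.5):
    """Sum of pairwise Gaussian overlaps between two sets of 2D points (assumed model)."""
    return sum(
        math.exp(-alpha * ((xa - xb) ** 2 + (ya - yb) ** 2))
        for xa, ya in a
        for xb, yb in b
    )

def carbo_index(a, b, alpha=0.5):
    """Carbo-type similarity: overlap normalised by the two self-overlaps."""
    return overlap(a, b, alpha) / math.sqrt(overlap(a, a, alpha) * overlap(b, b, alpha))

def rotate(points, theta):
    """Rotate 2D points about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def best_similarity(a, b, steps=360):
    """Scan evenly spaced rotations of b; return (best index, angle in degrees)."""
    best_value, best_step = max(
        (carbo_index(a, rotate(b, 2 * math.pi * k / steps)), k) for k in range(steps)
    )
    return best_value, 360.0 * best_step / steps

# Hypothetical 2D "molecule" and a copy rotated by 90 degrees.
mol_a = [(0.0, 0.0), (1.5, 0.0), (1.5, 1.2), (-1.0, 0.8)]
mol_b = rotate(mol_a, math.pi / 2)
value, angle = best_similarity(mol_a, mol_b)
print(f"best similarity {value:.3f} at {angle:.0f} degrees")
```

An exhaustive scan like this needs no pre-alignment, which is why it suits diverse sets, but its cost grows with the number of rotation steps and with the square of the number of atoms.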
A new computational methodology for hybrid quantum mechanical/molecular mechanical (QM/MM) calculations is introduced. The non-covalent boundary between the QM and MM parts of the system is treated according to the Effective Fragment Potential (EFP) method.
The basic idea of the EFP method is to perform a regular ab initio calculation only for the «active» part of the system, while the chemically «inert» part is replaced by model potentials that incorporate the most important non-bonded interaction energy terms, namely Coulomb interaction, polarization and exchange repulsion. The effective fragment potentials are generated by separate ab initio calculations. All energy terms of the EFPs are treated as one-electron terms in the quantum mechanical Hamiltonian of the «active» part of the system.
In the new approach the «inert» part of the system, such as an «MM-sized» biochemical molecule, is divided into a specified number of small fragments with fixed internal structures, typical for biochemical systems. The fragments obtained are replaced by EFPs. As in the original EFP method, the interactions between the ab initio system and the fragments enter via one-electron operators in the ab initio Hamiltonian, while the fragment-fragment interactions are described by a molecular mechanics force field. The most important advantages of this methodology are the flexibility of the «inert» biochemical molecule and significant time savings compared to full ab initio calculations on the same system. The new approach has been implemented in the PC-GAMESS package.
The new methodology has been tested on calculations of the relative conformational energies and geometries of alanine dipeptide, treated as six EFPs, with four water molecules, which represent the ab initio part of the system. Restricted Hartree-Fock calculations with different basis sets were applied for the QM part of the system and for the construction of the effective fragment potentials. The MM part was treated with the MM3 force field implemented in the TINKER program. The results obtained are in good agreement with pure QM calculations and experimental results.
The de novo molecular structure design program SPROUT has been under development at the University of Leeds since 1992. A new variant, SynSprout, builds synthetic constraints into the structure generation process. This helps to overcome a standard problem in de novo ligand design, where hypothetical ligands, including those predicted to bind very strongly, have no practical value unless they can be readily synthesised.
A key feature of this new version is an automatic method for the generation of large starting 3D fragment libraries. In one variant drug-like structures are taken back to their starting materials using a user-defined retro-synthetic knowledge base and these starting materials can then be used as building blocks in the initial docking and building up process. The library generation process also provides perception of essential atom properties such as hydrogen bonding and hybridisation with functional group detection.
The poster will provide an overview of the main concepts and then focus on the methodology involved in the automatic process of building a fragment library from a database of drug-like structures. Future work to build fragment libraries in a hierarchical manner in order to speed up the structure generation process will also be included.
The chemical hyperstructure is a single structure representation of a molecular library generated by the sequential overlapping of each of the library’s compounds to the current hyperstructure, retaining the molecule’s atom and bond information as part of the hyperstructure. Vladutz and Gould [1] originally proposed the hyperstructure representation as a method for improving the retrieval capabilities of chemical databases.
The problem of determining the greatest overlap between the graphs is equivalent to locating the set of disconnected maximal edge-induced subgraphs that are common to both the current molecule and the hyperstructure. Since this is an NP-complete graph-matching problem, it is computationally expensive to locate an optimal mapping in a realistic time frame without the application of heuristics. A heuristic that has been used with success in this particular problem is the Genetic Algorithm (GA), an optimisation heuristic inspired by natural evolution in which a population of chromosomes (candidate solutions) is iteratively evaluated, perturbed and sampled. The GA heuristic was originally applied to this particular problem by Brown et al. [2].
The work presented here improves on the existing hyperstructure generation GA, most notably in the optimisation of the parameter set governing the genetic operator probabilities, number of generations and population size, along with the introduction of atom type as a mapping constraint rather than the elemental type.
[1] Vladutz, G. and Gould, S. R. (1988). Joint compound/reaction storage and possibilities of hyperstructure-based solution. In Warr, W. A. (ed.). Chemical structures: the international language of chemistry. Springer-Verlag, Berlin.
[2] Brown, R. D., Jones, G. and Willett, P. (1994). Matching two-dimensional chemical graphs using genetic algorithms. J. Chem. Inf. Comput. Sci. 34, 63-70.
The tremendous increase in protein-related sequence and structural information raises the question of how this information can be utilised in drug discovery and development. Proteins with adenine-containing substrates can be used as a test case for what is to come. Thousands of adenine-binding proteins have been identified and several hundred crystal structures are available, and many of these proteins are attractive targets for drug development. Because adenine has a relatively drug-like structure, most inhibitors for these proteins were designed to bind the adenine pocket, but specificity remains a major problem. Many authors have analyzed different sets of proteins in order to define adenine recognition motifs [1,2].
We have analyzed crystal structures of proteins containing an adenylic fragment (excluding DNA and RNA) from the point of view of the ligand. We will show that proteins adopting the same fold (classified according to SCOP [3]) or having the same function have significantly different binding sites for adenine. Thus, in order to exploit structural information for drug development, it is necessary to analyse the ligand binding site. The large amount of structural information available for adenylate-binding proteins will be used to distinguish between conserved interactions, which we believe to be responsible for binding affinity, and differences in the binding site, which can be exploited to generate ligand specificity. A set of parameters describing the interactions between ligand and protein is used to classify adenylate-binding proteins in terms of their ligand binding site. Such a database could be useful for similarity analysis and for the design of specific inhibitors. For example, proteins that have a similar binding site, and are therefore likely to cause specificity problems, can be selected for counter-screening.
Keywords: adenine binding pocket; ligand design; specificity; database of binding sites.
[1] S.L. Moodie, J. Mitchell and J.M. Thornton, J. Mol. Biol., 243, 486-50 (1996).
[2] K.A. Denessiouk and M.S. Johnson, PROTEINS, 38, 310-26 (2000).
[3] A. Murzin, S. Brenner, T. Hubbard and C. Chothia, J. Mol. Biol., 247, 536-40 (1995).
Electron transfer is a fundamental biological process. Medium chain acyl-CoA dehydrogenase (MCAD) is the most prominent of the fatty acyl-CoA dehydrogenases found in the kidney cortex (Thorpe et al., 1979). It is involved in mitochondrial fatty acid oxidation, which provides up to 40% of the total human energy requirement (Sherratt, 1988). Its natural electron acceptor is an electron transferring flavoprotein (ETF). Defects in human ETF result in glutaric acidemia type II (GA II), an often fatal disease resulting in the inability to oxidize various fatty acyl-CoAs (Loehr et al., 1990). Models of the human ETF:MCAD complex, consistent with our X-ray scattering data, were produced using an automated protein-protein docking program (Vakser, 1996), and electron transfer rates were calculated (Page et al., 1999). The electron transfer data indicate that optimal electron transfer requires domain II of ETF to rotate by ~30° to 50° (towards domain I) relative to its position in the X-ray structure. This creates a new regime in which the intrinsic electron transfer rate is elevated well above typical values observed in physiological electron transfer complexes (10² to 10³ s⁻¹). Thus, domain motion establishes a new ‘robust engineering principle’ for electron transfer complexes, tolerating multiple configurations of the complex whilst retaining efficient electron transfer.
Loehr, J. P., Goodman, S. I., and Frerman, F. E. (1990) Pediatr. Res. 27, 311-315.
Page, C. C., Moser, C. C., Chen, X. X., and Dutton, P. L. (1999) Nature 402, 47-52.
Sherratt, H. S. A. (1988) Biochem. Soc. Trans. 16, 409.
Thorpe, C., Matthews, R. G., and Williams, C. H. (1979) Biochemistry 18, 331-337.
Vakser, I.A. (1996) Biopolymers 39, 455-464.
An empirical scoring function and a flexible molecular docking method based on it are being developed at the University of Leeds.
The scoring function contains terms describing van der Waals, hydrogen bonding, metal ion bonding and hydrophobic interactions, rotatable bond entropy and dihedral strain energy. The coefficients of the different terms are obtained by regression analysis on a training set of 50 protein-ligand complexes from the Protein Data Bank. The scoring function also includes a novel procedure that builds a hydrogen and metal bonding framework within the receptor-ligand complex by rotating terminal rotatable bonds and considering protonation states to achieve optimal hydrogen and metal bonding geometries.
The docking method is based on a novel simulated annealing minimization algorithm called Systematic Population Annealing (SPA) which has been developed in our laboratory and applied to the scoring function above. During the optimization, flexibility in the ligand is treated by rotating around internal single bonds and the receptor is kept rigid, apart from the terminal bonds and protonation states mentioned above.
The poster will provide an overview of the scoring and docking methods of SPA, show some results, and compare SPA with other flexible docking methods.
In recent years, considerable efforts have been directed towards predicting the biodegradability of chemicals. Thanks to a collaboration with the French environmental authorities, we have at our disposal a set of molecules for which both structure and ecotoxicity data have been notified. As a first step, we focused on the information related to easy biodegradability. In order to measure the structural similarity between compounds, we implemented a program which calculates a largest common subgraph (LCS) between two molecular graphs. With this tool, we are able to arrange a set of compounds in order of descending LCS-based similarity.
We then decided to evaluate the usefulness of such arrangements. First, we conducted a series of experiments to determine whether our calculated orderings differ from a random ordering, as represented by the hypergeometric law. Then we aimed to assess the degree of correlation between our similarity measure and activity (biodegradability in our study). We will explain our experiments and present the most significant results.
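One way to read the comparison against the hypergeometric law is as an enrichment test on the top of a similarity-ranked list. The sketch below is an interpretation rather than the authors' procedure; the function names and the counts in the example are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k actives among n compounds drawn at random from N, of which K are active)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_tail(k, N, K, n):
    """P(at least k actives among the n drawn compounds) under a random ordering."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))

# Hypothetical numbers: 20 compounds, 8 of them biodegradable; the 5 compounds
# ranked most similar to a biodegradable query contain 4 biodegradable ones.
p_value = hypergeom_tail(4, N=20, K=8, n=5)
print(f"probability of 4 or more by chance: {p_value:.4f}")
```

A small tail probability suggests that the similarity ordering concentrates biodegradable compounds near the top more than a random ordering would.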
These research studies have been financed both by the Conseil Régional of Basse-Normandie and the company ATOFINA.
Acid Proteinase (Rhizopuspepsin) belongs to the large group of enzymes named hydrolases. Hydrolases are the main enzymes that catalyze the initial phase of the breakdown of proteins and complex carbohydrates into simple substances. When a hydrolase attacks its substrate, bonds within the substrate are broken. This process takes place in the presence of water molecules.
Hydrolases are divided into several groups, peptidases being one of them. These enzymes catalyze the cleavage of the peptide bond in peptide molecules. It is assumed that this reaction depends strongly on pH, and for this reason we model the stage of proton transfer from the Asp residue of Rhizopuspepsin to the water molecule. It is also assumed that the reaction depends on the presence of a Ser residue, the geometry of the protein and other details of the entire system. We apply hybrid ab initio QM/MM molecular simulations based on the EFP methodology, using the GAMESS and TINKER packages. This method allows us to separate the ab initio (“active”) part of the system and replace the remaining (chemically “inert”) part by model potentials.
In the present work, the water molecule is treated as the ab initio part, while the MM-sized enzyme molecule is subdivided into a series of fixed-geometry fragments.
Bioisosteric replacement can be defined as the replacement of a functional group in a bioactive molecule by another functionality having similar size and physicochemical properties. Bioisosteric transformations are used in the pharma industry to optimize properties of drug candidates (activity, selectivity, transport), to remove unwanted side effects, to design molecules easier to synthesize or to avoid patented structural features. The manual identification of proper bioisosteric groups, however, is not easy because it requires the location of a delicate minimum on a complex hypersurface of hydrophobic, electronic and steric properties.
To enable bench chemists at Novartis to perform drug design based on the bioisosteric principle, we developed an easy to use web-based system written in Java which allows automatic identification of substituents physicochemically equivalent to the given target. The system uses a database of about 10000 functional groups characterized by calculated hydrophobicity, quantum chemical parameters compatible with the Hammett sigma constants and by hydrogen bonding and steric properties. The program also allows automatic design of analogs by replacing functional groups in a molecule by their bioisosteric equivalents, while keeping global molecular properties (logP, size, polar surface area, pharmacophoric pattern) similar to those of the parent structure.
P. Ertl, Simple quantum chemical parameters as an alternative to the Hammett sigma constants in QSAR studies, Quant. Struct.-Act. Relat. 16, 377-382 (1997).
P. Ertl, World Wide Web-based system for the calculation of substituent parameters and substituent similarity searches, J. Mol. Graph. Model. 16, 11-13 (1998).
The protein docking problem involves predicting the mode of interaction between two proteins. We have developed a genetic algorithm (GA) for protein-protein docking in which the proteins are represented by dot surfaces calculated using the Connolly program. The Connolly program associates with each dot the vector normal to the surface at that point and also a shape descriptor. We use the GA to move the surface dots of one protein relative to the other to locate the area of greatest surface complementarity between the two. Dots are matched if their normals are opposed, their Connolly shape types are not the same and their hydrogen bonding or hydrophobic potential is fulfilled. The fitness function also contains a penalty for overlap of the proteins’ interiors. We tested the GA on 34 large protein-protein complexes where one or both proteins have been crystallised separately. Thirty of the complexes have at least one near-native solution ranked in the top 100. Additionally, we have successfully reassembled a 1400-residue heptamer based on the top-ranking GA solution obtained when docking two bound subunits.
Early efforts in combinatorial library design were directed towards diversity analysis on the assumption that maximising the range of structural types within a library will result in a broad coverage of bioactivity types. However, many combinatorial libraries either failed to deliver the improved hit rates that were expected or resulted in hits that did not have “drug-like” characteristics. The emphasis in library design has now shifted towards designing libraries which are optimised on a number of criteria, such as diversity, cost and drug-like characteristics. We have previously described the program SELECT, which is a genetic algorithm (GA) approach to the design of libraries in product-space [1]. In SELECT, multiple objectives are optimised simultaneously via a weighted-sum fitness function. Here, we discuss the limitations of using a weighted-sum fitness function and describe the development of MoSELECT, which is based on a MOGA (MultiObjective Genetic Algorithm) [2] and represents a significant improvement in multiobjective library design.
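The contrast between a weighted-sum fitness and a Pareto-based ranking can be sketched as follows. This is an illustration only, not code from SELECT or MoSELECT: the candidate libraries, their two objective values (both to be minimised) and the weights are hypothetical, and the ranking rule shown (one plus the number of dominating candidates) is the dominance-count scheme usually associated with MOGA.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_ranks(scores):
    """Rank of each candidate = 1 + number of candidates that dominate it."""
    return [1 + sum(dominates(other, s) for other in scores) for s in scores]

def weighted_sum(score, weights):
    """Collapse several objectives into one number using fixed weights."""
    return sum(w * x for w, x in zip(weights, score))

# Hypothetical libraries scored as (1 - diversity, cost); lower is better for both.
libraries = [(0.2, 9.0), (0.5, 4.0), (0.9, 1.0), (0.6, 5.0), (0.95, 8.0)]

ranks = pareto_ranks(libraries)
best_a = min(range(len(libraries)), key=lambda i: weighted_sum(libraries[i], (10.0, 1.0)))
best_b = min(range(len(libraries)), key=lambda i: weighted_sum(libraries[i], (1.0, 1.0)))
print("Pareto ranks:", ranks)
print("weighted-sum winners:", best_a, best_b)
```

The weighted sum returns a single library whose identity changes with the chosen weights, whereas the Pareto ranking keeps every non-dominated trade-off (rank 1) available to the designer.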
[1] Gillet, V.J., Willett, P., Bradshaw, J., Green, D.V.S. Selecting combinatorial libraries to optimise diversity and physical properties. J. Chem. Inf. Comput. Sci. 1999; 39, 169-177.
[2] Fonseca, C.M., Fleming, P.J. Multiobjective optimization and multiple constraint handling with evolutionary algorithms - Part 1: A unified formulation. IEEE Trans. Syst., Man Cybern. 1998; 28(1), 26-37.
In recent years, studying the variability of molecular descriptors has attracted a lot of interest as a means of comparing compound databases. These studies may help reveal trends in the characteristics of the compounds forming different databases, and they are relevant for activities such as library design, diversity selection, property filtering, and compound acquisition. However, such analyses are often limited to visually inspecting the binned distributions of molecular descriptors. Although visual impressions are qualitatively highly valuable, a more quantitative measure of the variability of molecular descriptors in compound databases would be desirable. In this respect, the use of Shannon entropy calculations to compare compound databases quantitatively has recently been proposed [1,2]. The present contribution aims at exploring its advantages and limitations in several application examples.
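As a minimal sketch of the idea, the Shannon entropy of a binned descriptor distribution is SE = -Σ p_i log2 p_i, where p_i is the fraction of compounds falling in bin i. The function name, the bin settings and the descriptor values below are hypothetical; the cited papers may use different binning or scaling.

```python
import math

def shannon_entropy(values, n_bins, lo, hi):
    """Shannon entropy (in bits) of descriptor values histogrammed into n_bins equal bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so that values on or beyond the range edges fall in the end bins.
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

# Hypothetical descriptor values for two small "databases".
spread_out = [0.5, 1.5, 2.5, 3.5]      # one compound per bin
concentrated = [1.2, 1.3, 1.4, 1.6]    # all compounds in one bin

print(shannon_entropy(spread_out, n_bins=4, lo=0.0, hi=4.0))
print(shannon_entropy(concentrated, n_bins=4, lo=0.0, hi=4.0))
```

The entropy is largest (log2 of the number of bins) when the compounds are spread evenly over the bins and zero when they all fall in one bin, which gives a single number in place of a visual comparison of histograms.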
[1] J.W. Godden, F.L. Stahura and J. Bajorath. Variability of molecular descriptors in compound databases revealed by Shannon entropy calculations. J. Chem. Inf. Comput. Sci. 2000, 40, 796.
[2] G.M. Maggiora and V. Shanmugasundaram. Similarity-based Shannon-like diversity measure. 219th American Chemical Society National Meeting. Division of Computers in Chemistry. Abstract No. 119, 2000.
The Ligand Design (LUDI/Insight_2000) program [1] was applied in order to design new inhibitors of leucine aminopeptidase (LAP, E.C. 3.4.11.1), predict their activity, determine their binding mode and analyze their interactions with the enzyme. The investigation was based on the crystal structure of bovine lens leucine aminopeptidase complexed with its inhibitor LeuP, the phosphonic analogue of leucine (1lcp in PDB). The new inhibitors were designed by modifying LeuP, incorporating new substituents found in the Ludi_Link library into the LeuP structure [2]. This resulted in phosphonic analogues of amino acids, whose side chains are bound in the S1 pocket of the enzyme, as well as in dipeptide analogues, bound in the S1 and S1’ pockets of LAP. Several of the new potential inhibitors, containing different side chains in the P1 position (e.g. Leu, Tyr, hPhe - homophenylalanine, hTyr - homotyrosine) and several substituents in the P1’ position (Leu, Phe, Tyr, Gly), were synthesized and their activity towards the enzyme was measured [3]. All of the tested compounds proved to be strong LAP inhibitors, with inhibition constants in the micromolar and nanomolar range. Most of them are significantly more active than previously known phosphorus-containing inhibitors of the enzyme. Reasonable agreement between binding affinities calculated by several computational approaches and experimental Ki values has been observed for most of the studied inhibitors. Our results confirm the efficiency of the LUDI program for the design of enzyme inhibitors as well as for predicting their activity.
In addition, it was possible to improve binding energy estimates for inhibitors differing in the electronic structure of their functional groups, whose interactions with the LAP active site were dominated by electrostatic effects [4]. For this purpose we applied another method, developed in our laboratory, which is based on ab initio calculations of the interaction energy in the ligand-receptor system [5]. This permitted us to obtain precise inhibitory activity estimates for several known LAP inhibitors that are transition state analogues [6].
[1] Gubernator, K. and H.J. Böhm, eds. Structure-based ligand design. Methods and Principles in Medicinal Chemistry, ed. R. Mannhold, Kubinyi, H., Timmerman, H. Vol. 6. (1998), Wiley-VCH: Weinheim, 153.
[2] Grembecka, J., W.A. Sokalski, and P. Kafarski, Computer-aided design and activity prediction of leucine aminopeptidase inhibitors. J. Comp. Aided Molec. Design 14, 531-544 (2000).
[3] Grembecka, J., Mucha, A., Cierpicki, T., Kafarski, P., New potent phosphinate and phosphonamidate inhibitors of leucine aminopeptidase - structure-based design, synthesis and activity - submitted.
[4] Grembecka, J., P. Kedzierski, and W.A. Sokalski, Non-empirical analysis of the nature of the inhibitor-active site interactions in leucine aminopeptidase. Chem. Phys. Lett. 313, 385-392 (1999).
[5] Sokalski, W.A., Kedzierski, P., Grembecka, J., Dziekonski, P., Strasburger, K., in Computational Molecular Biology, J. Leszczynski, Editor. (1999), Elsevier Science: Amsterdam. p. 369-396.
[6] Grembecka, J., Sokalski, W.A., Kafarski, P., Quantum chemical analysis of the interactions of transition state analogues with leucine aminopeptidase, Int. J. Quant. Chem. (2001) in press.
The destruxins are cyclic hexadepsipeptide mycotoxins produced by the entomogenous fungus Metarhizium anisopliae. These natural products are considered "vivotoxins" because they are produced during the fungal infection of a host. The lethal effect of destruxins in lepidopteran insects is associated with a paralysis that results from depolarization of the muscle cell membrane. This depolarization results from the opening of endogenous calcium channels (Samuels et al., 1988). Significant differences in biological response to a set of destruxins have been reported, in spite of apparently minor differences in the structural chemistry of this group (Dumas et al., 1994). However, the relationship between structure and activity among these cyclic peptides has not yet been elucidated. The principal objective of the project is to identify the mode of action of destruxins in order to make an accurate comparison of the cytotoxic action of the different destruxins using the modelling package Quanta. The modelling studies have been validated by experimentation using liposomes and dye indicators. Another aim is to establish SARs for the set of molecules reported by Dumas et al. (1994) and to identify the structural features that determine the toxicity. Armed with this knowledge, a programme of chemical synthesis is planned to discover more effective materials for use as agrochemicals.
Nowadays pharmaceutical research is characterized by screening growing compound pools with increasing percentages of combinatorial libraries for more biological targets. Using combinatorial chemistry, it is no problem to increase the sizes of screening pools enormously in a short period of time, but in order to be efficient it is useful to find criteria to decide which compounds are worth being synthesized and screened.
With the hypothesis that active compounds are characterized by target-specific properties and that these properties are more or less distributed in the whole chemistry space for different targets, screening pools should be as diverse as possible.
In this context, at Boehringer Ingelheim, we analyzed the composition of our compound pool. First, we learnt that combinatorial libraries are more selective towards biological targets than samples of conventionally synthesized substances. We wanted to find reasons for this, leading to a strategy for planning new combinatorial libraries. For this purpose we used the software Diverse Solutions developed by Pearlman and Smith at the University of Texas. We applied their BCUT values as descriptors to create a chemistry space and analyzed the distribution of different combinatorial libraries in it. This way we could evaluate different templates of combinatorial libraries.
In this presentation the results of our study shall be shown and discussed.
The bacterial ribosome contains the target sites of many clinically used antibiotics, including macrolides, ketolides, lincosamides, streptogramins, aminoglycosides, tetracyclines and oxazolidinones. We present the application of a novel molecular docking software, RiboDock®, specifically designed for efficient docking to both RNA and protein targets. In the search for new antibiotic classes that avoid the bacterial resistance issues associated with many current agents, we have undertaken high-throughput screening of a ribosomal target by computer. Three-dimensional structures of over 1.2 million compounds, representing the available compound libraries of 12 suppliers, were generated and then docked using RiboDock® into the X-ray structural coordinates of GAR, an RNA/protein interface. GAR, located on the 50S ribosomal subunit, is the binding site of thiostrepton, a poorly soluble and structurally complex antibiotic unexploited for human use. The top 1145 in silico hits were then selected for purchase. Initially, 197 compounds were tested for interaction with GAR in vitro. Thirteen compounds were then assayed for translation inhibition and three were active. Data are presented on one of these compounds, which is a selective inhibitor of bacterial translation (IC50 = 7.6 µg/ml) and is active against Staphylococcus aureus at 50 µg/ml. This is the first report of the use of high-throughput docking software to screen for new classes of anti-ribosomal inhibitors. Running on only ten 550 MHz Pentium III PCs, the process took just 6 weeks from initiation to experimental validation of the first hit. The process is highly scalable, speed being proportional to processor power (current performance on 100 1 GHz Intel PIII processors is about 1M compounds docked per day).
The publication of the detailed crystal structures of ribosomal sub-units opens the way for a concerted in-silico screening program of various sites on the bacterial ribosome, an approach that could revolutionise the discovery of new antibacterial protein synthesis inhibitors.
Nicotinic acetylcholine receptors (nAChRs) play a critical role in signal transmission between cells at the nerve/muscle synapses and influence important higher brain functions, as well as neurodegenerative pathologies such as Alzheimer's and Parkinson's diseases [1]. To identify the main physicochemical interactions responsible for high-affinity binding, eleven nAChR agonists were selected for pharmacophore development on the basis of the following criteria: i) high binding affinity (pKi ≥ 8.15); ii) regular distribution of data over the whole range of affinity explored; iii) presence of the same molecules already investigated in previous pharmacophores [2]; iv) high structural diversity and limited molecular flexibility. A consistent pharmacophore model [3] was derived using, in a parallel and independent fashion, several automated computational approaches. Convergent results from DISCO [4] (Distance Comparison), QXP [5] (Quick Explore), Catalyst/HipHop [6] and MIPSIM [7] (Molecular Interaction Potential Similarity) allowed us to identify and locate, in a well-defined spatial arrangement, three significant and geometrically independent structural features: i) a positively charged nitrogen atom for ionic or hydrogen bond interactions, ii) a lone pair of the pyridine nitrogen or a specific lone pair of a carbonyl oxygen, as hydrogen bond site, and iii) a dummy atom indicating the centre of mass of hydrophobic areas generally occupied by aliphatic cycles.
The results from the present study are in full agreement with, and complement, those obtained in our exhaustive 2/3D QSAR analysis [8] of a large array (269 molecules) of nAChR agonists. Overall, our findings may be profitably used for the design of new, potent and possibly selective nAChR agonists.
[1] Levin, E.D.; Drug Dev. Res., 1996, 38, 188.
[2] Glennon, R.; Dukat, M.; In Neuronal Nicotinic Receptors; A Wiley-Liss Publication, Ed.; Arneric, P.S.; Brioni, J.D.: New York, 1999.
[3] Nicolotti, O.; Pellegrini-Calace, M.; Carrieri, A.; Altomare, C.; Centeno, N.B.; Sanz, F.; Carotti, A.; J. Comp. Aid. Mol. Des., submitted.
[4] Brint, A.; Willett, P.; J. Chem. Inf. Comput. Sci., 1987, 27, 152.
[5] McMartin, C.; Bohacek, R.S.; J. Comp. Aid. Mol. Des., 1997, 11, 333.
[6] Barnum, D.; Greene, J.; Smellie, A.; Sprague, P.; J. Chem. Inf. Comput. Sci., 1996, 36, 563.
[7] Càcere, M.; Villà, J.; Lozano, J.; Sanz, F.; Bioinformatics, 2000, 16, 568.
[8] Nicolotti, O.; Pellegrini-Calace, M.; Altomare, C.; Carotti, A.; Sanz, F.; Med. Chem. Res., in press.
The application of the maximum common subgraph (MCS) problem to the area of chemoinformatics has been well established. The use of the MCS approach has historically been limited due to its combinatorially explosive nature and it has thus not been possible to apply the technique to complex and/or large structural databases. The proposed research seeks to improve this situation through the development of a new MCS algorithm.
The proposed algorithm is based on the well known reduction of the MCS problem to the problem of determining the maximum clique in a compatibility graph [1, 3, 4], also known as the association graph or modular product. However it differs from previous efforts in the manner in which clique detection is performed. The substantial improvements in runtime are achieved by first simplifying the compatibility graph using graph theoretic and chemical knowledge based heuristics and then using improved bounding methods in the maximum clique detection procedure.
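The reduction mentioned above can be sketched in its textbook form: pair up compatible atoms, connect two pairs when the bonding between them agrees in both molecules, and search for the largest clique. This is only the baseline reduction (giving a maximum common induced subgraph, possibly disconnected), not the proposed algorithm with its simplification heuristics and improved bounds. All names are hypothetical, bond orders are ignored, and the clique search is a plain Bron-Kerbosch procedure with pivoting and a simple size bound.

```python
from itertools import combinations

def modular_product(atoms1, bonds1, atoms2, bonds2):
    """Compatibility graph: nodes are same-element atom pairs; edges join mutually consistent pairs."""
    nodes = [(i, j) for i, a in enumerate(atoms1) for j, b in enumerate(atoms2) if a == b]
    adj = {n: set() for n in nodes}
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # an atom may be mapped only once
        # Consistent if the two atoms are bonded in both molecules or in neither.
        if (frozenset((i1, i2)) in bonds1) == (frozenset((j1, j2)) in bonds2):
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj

def max_clique(adj):
    """Largest clique found by Bron-Kerbosch with pivoting and a size bound."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        if len(r) + len(p) <= len(best):
            return  # cannot beat the best clique found so far
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best

# Heavy-atom graphs of ethanol (C-C-O) and 1-propanol (C-C-C-O).
ethanol = (["C", "C", "O"], {frozenset(b) for b in [(0, 1), (1, 2)]})
propanol = (["C", "C", "C", "O"], {frozenset(b) for b in [(0, 1), (1, 2), (2, 3)]})

mapping = sorted(max_clique(modular_product(*ethanol, *propanol)))
print(mapping)  # list of (ethanol atom index, propanol atom index) pairs
```

The compatibility graph has up to one node per atom pair, so it grows quickly with molecule size; this is the combinatorial explosion that pruning the graph before clique detection is meant to reduce.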
[1] Barrow, H. and Burstall, R., Subgraph Isomorphism, Matching Relational Structures and Maximal Cliques. Inf. Proc. Lett., 1976. 4(4), 83-84.
[2] Bessonov, Y.E., On the Solution of a Problem on the Search for the Best Intersection of Graphs on the Basis of an Analysis of the Projections of the Subgraphs of the Modular Product (in Russian). Vychisl. Sistemy, 1985. 112, 3-22, 121.
[3] Durand, P., Pasari, R., Baker, J., and Tsai, C., An Efficient Algorithm for Similarity Analysis of Molecules. Internet J. Chem., 1999. 2, 1-12.
[4] Levi, G., A Note on the Derivation of Maximal Common Subgraphs of Two Directed or Undirected Graphs. Calcolo, 1972. 9, 341-352.
Chemical suppliers have long made their catalogues available in machine-readable form and these increasingly contain only fingerprint representations and not full structures. Pharmaceutical companies are expected to decide whether or not to acquire a compound purely on the basis of the fingerprints and this would typically involve consideration of similarities to in-house compounds of known structure and properties.
The problem of whether to acquire or reject a compound is essentially one of two-class pattern recognition, a category to which many machine learning algorithms have been applied with considerable success. The work described assumes the existence of two datasets, one of “keepers” and the other of “rejects”, the latter containing molecules with some property (such as toxicity, reactivity or number of rotatable bonds) that renders them unattractive as potential drugs. These data are used to train a Support Vector Machine to discriminate between “keepers” and “rejects”, and the effectiveness of the discriminator on out-of-sample data is assessed.
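As a minimal sketch of the two-class setting described above, the following trains a linear soft-margin SVM on binary fingerprints by dual coordinate descent. The abstract does not state the kernel, solver or fingerprint type used, so the linear kernel, the function names, the parameter values and the four-bit toy fingerprints are all assumptions.

```python
def train_linear_svm(X, y, C=10.0, epochs=200):
    """Dual coordinate descent for a linear soft-margin (hinge-loss) SVM.

    X: list of fingerprints (lists of 0/1), y: labels +1 (keeper) / -1 (reject).
    A constant feature is appended so that the last weight acts as a bias.
    """
    Xb = [list(x) + [1.0] for x in X]
    n, d = len(Xb), len(Xb[0])
    alpha = [0.0] * n
    w = [0.0] * d
    for _ in range(epochs):
        for i in range(n):
            xi, yi = Xb[i], y[i]
            grad = yi * sum(wj * xj for wj, xj in zip(w, xi)) - 1.0
            qii = sum(xj * xj for xj in xi)
            # Newton step on one dual variable, clipped to the box [0, C].
            new_alpha = min(max(alpha[i] - grad / qii, 0.0), C)
            delta = (new_alpha - alpha[i]) * yi
            alpha[i] = new_alpha
            w = [wj + delta * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Return +1 (keeper) or -1 (reject) for one fingerprint."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical toy data: bit 0 marks keepers, bit 1 marks rejects.
keepers = [[1, 0, 1, 0], [1, 0, 0, 0]]
rejects = [[0, 1, 1, 0], [0, 1, 0, 0]]
X = keepers + rejects
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, [1, 0, 1, 1]), predict(w, [0, 1, 0, 1]))
```

In practice the discriminator would be assessed on held-out fingerprints, as the abstract indicates, rather than on the training set.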
High-throughput estimation of ADMET properties is critical for early lead optimization. Towards this goal, we propose to use a modified Free-Wilson equation:
$$\log X = \sum_{i} Q_i + \sum_{i,j,s} F_{i,j,s} + \sum_{k} f_k$$
where X is the ADMET property, Q_i is the additive increment of the i-th pharmacophore, F_{i,j,s} is the increment of the j-th fragment in the s-th side radical (specific binding), and f_k is the increment of the k-th fragment, which is site-independent (non-specific binding). This equation differs from the classic Free-Wilson approach in two respects. First, we simultaneously analyze multiple pharmacophores. Second, we differentiate specific and non-specific binding of side radicals. This equation leads to the following scheme for creating ADMET estimation methods:
All of these stages have been implemented in a PC program, Advanced Algorithm Builder, which can handle up to 100,000 compounds. The current study demonstrates its application to restricted data sets of P-glycoprotein affinities, LD50 toxicities and CYP2D6 inhibition. Advantages and limitations of the proposed method are briefly discussed.
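The additive form of the modified Free-Wilson equation can be sketched as a simple sum of increments. The increment tables, fragment names and numerical values below are invented for illustration only; in the method described, the increments would be obtained by fitting to measured data.

```python
def predict_log_x(pharmacophores, site_fragments, free_fragments, Q, F, f):
    """Sum the three groups of increments for one compound.

    pharmacophores: pharmacophore ids i present in the compound
    site_fragments: (i, j, s) triples, fragment j in side radical s of pharmacophore i
    free_fragments: site-independent fragment ids k
    Q, F, f:        increment tables (hypothetical values in this sketch)
    """
    total = sum(Q[i] for i in pharmacophores)
    total += sum(F[key] for key in site_fragments)
    total += sum(f[k] for k in free_fragments)
    return total


# Hypothetical increment tables.
Q = {"P1": 1.5}
F = {("P1", "Cl", 1): 0.5}
f = {"OH": -0.25}

log_x = predict_log_x(["P1"], [("P1", "Cl", 1)], ["OH"], Q, F, f)
print(log_x)
```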
A. Petrauskas and E. Kolovanov, Persp. Drug Disc. and Design, 19, 99-116 (2000).
C. Hansch and A. Leo, Substituent Constants for Correlation Analysis, Wiley, N.Y. (1979).
J.A. Platts, D. Butina, M.H. Abraham and A. Hersey, J. Chem. Inf. Comp. Sci., 39, 835 (1999).
While simple van der Waals potentials often perform adequately in protein-ligand docking studies, DNA-ligand recognition is a tougher challenge. It is now well established that the accurate prediction of the DNA binding affinity of a ligand requires the correct treatment of hydration and long-range electrostatic effects. While the ‘gold standard’ remains modeling studies with explicit treatment of water molecules and counterions, new implicit hydration models are being advanced as suitable choices for the much more rapid evaluation of ligand-target interactions required for virtual high-throughput screening.
In this study we have evaluated the ability of one such model, the generalized Born/surface area (GB/SA) model, to reproduce accurately the structure and dynamics of a drug-DNA recognition system. The system we have chosen is the 2:1 complex between the minor groove binding drug Hoechst 33258 and the DNA dodecamer d(CTTTTGCAAAAG)2. We have extensive NMR and explicitly hydrated modeling data on this system, which shows the interesting property of highly cooperative binding of the drug, such that the 1:1 complex is never observed. We find that, with a suitable choice of parameters, the GB/SA model can perform very well, reproducing to good accuracy the structural and dynamic information we have on this system.
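For orientation, the electrostatic part of a generalized Born model is commonly written in the functional form introduced by Still and co-workers, sketched below. The parameter set used in the study is not given in the abstract, so the dielectric constants, the Born radii and the function name are assumptions, and the surface-area (SA) term, usually taken as proportional to the solvent-accessible area, is omitted.

```python
import math

COULOMB = 332.0637  # kcal * Angstrom / (mol * e^2)


def gb_polarization_energy(atoms, eps_in=1.0, eps_out=78.5):
    """Generalized Born polarization energy in kcal/mol.

    atoms: list of (charge, born_radius, (x, y, z)), with distances in Angstrom.
    The double sum includes i == j, which yields the Born self-energy terms.
    """
    total = 0.0
    for qi, ri, xi in atoms:
        for qj, rj, xj in atoms:
            r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
            f_gb = math.sqrt(r2 + ri * rj * math.exp(-r2 / (4.0 * ri * rj)))
            total += qi * qj / f_gb
    return -0.5 * COULOMB * (1.0 / eps_in - 1.0 / eps_out) * total


# A single unit charge with a hypothetical Born radius of 2 Angstrom.
ion = [(1.0, 2.0, (0.0, 0.0, 0.0))]
print(gb_polarization_energy(ion))
```

For a single ion the expression reduces to the classical Born solvation energy, which gives a quick sanity check on any implementation.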
Analysis of protein structures at the tertiary level has many useful applications, for example, in the prediction of function from structure, in investigating trends in protein structure to aid the development of structure prediction methods, in designing novel functional sites in proteins and, because structure is more conserved than sequence, in tracing protein evolution.
The ASSAM and ASPROTE programs were developed in Sheffield to search and compare the three-dimensional tertiary structure of proteins using graph matching methods. ASSAM uses a subgraph isomorphism algorithm to locate all occurrences of a specific three-dimensional pattern of amino acid residues within the structures of the Protein Data Bank. ASPROTE uses a maximal common subgraph isomorphism algorithm, with the same graph representation, to compare two three-dimensional protein structures and highlight the largest section of tertiary structure in common between them.
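The kind of search performed by ASSAM can be illustrated with a much-simplified sketch. Here each residue is reduced to a single labelled point and matches are found by plain backtracking on inter-residue distances; the actual graph representation and subgraph isomorphism algorithm used by the programs are not reproduced, and the function name, tolerance and coordinates are hypothetical.

```python
from math import dist


def find_pattern(pattern, structure, tol=1.0):
    """Find all placements of a residue pattern within a structure.

    pattern, structure: lists of (residue_type, (x, y, z)).
    A match assigns each pattern residue to a distinct structure residue of
    the same type so that every pairwise distance agrees within tol.
    """
    matches = []

    def extend(assignment):
        k = len(assignment)
        if k == len(pattern):
            matches.append(tuple(assignment))
            return
        p_type, p_xyz = pattern[k]
        for idx, (s_type, s_xyz) in enumerate(structure):
            if s_type != p_type or idx in assignment:
                continue
            # Check distances to every residue already placed.
            if all(abs(dist(p_xyz, pattern[m][1])
                       - dist(s_xyz, structure[assignment[m]][1])) <= tol
                   for m in range(k)):
                extend(assignment + [idx])

    extend([])
    return matches


# Hypothetical three-residue query and a five-residue "structure".
query = [("SER", (0, 0, 0)), ("HIS", (3, 0, 0)), ("ASP", (3, 4, 0))]
protein = [("GLY", (10, 10, 10)), ("SER", (1, 1, 1)), ("HIS", (4, 1, 1)),
           ("ASP", (4, 5, 1)), ("HIS", (20, 0, 0))]
print(find_pattern(query, protein))
```

Because only distances are compared, this sketch cannot distinguish a pattern from its mirror image.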
This work aims to improve these two programs so that the user can input more specific queries, leading to more efficient interrogation of known protein structures. For example, the ASSAM user can now specify the type of secondary structure in which a residue lies, how accessible it is to the solvent, and whether cysteine residues are involved in disulphide bonds. A further development allows queries to be specified using the backbone atoms of the protein in addition to the side-chain positions used in the original programs. Considerable effort has also been made to ensure that the biologically relevant multimer of the protein is used in determining the solvent accessibility of the constituent residues.
Searching structural databases for occurrences of a query pharmacophore is one of the oldest techniques used in rational drug design, dating back to the mid-1970s. Early methods were limited by the use of only one rigid, usually low-energy, conformation, and it took 20 years to develop flexible conformational search techniques. These latter methods can carry out searches that are noticeably more effective, but at the expense of a significant increase in the computer time required. However, even these methods, which in most cases operate by adjusting torsional angles to match the query, cannot guarantee that a matching conformation of a particular compound is found, since they do not consider all possibilities (e.g. because of their use of local optimization).
A method that uses an exhaustive analysis of the conformational space of a ligand molecule, and that is capable of delivering a guarantee of this type, will be presented. This method, based on interval analysis (an evolving field of mathematics), uses a continuous representation and search technique without the need for particular discrete sample points. As a result, failure to find a given pharmacophoric pattern in a ligand means that no such conformation exists. On the other hand, if there are suitable conformations, then all of them are found.
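The idea of a guaranteed search can be illustrated with a toy sketch, which should not be read as the method presented. It assumes a single torsion angle phi in [0, pi] and a squared distance between two pharmacophore points of the form d^2 = A - B cos(phi), with hypothetical constants A and B > 0 fixed by bond lengths and angles. Because cos is monotonic on this range, the range of d^2 over any sub-interval is known exactly, so a sub-interval can be discarded with certainty when that range misses the query window. A rigorous implementation would also use outward rounding of floating-point bounds, which is omitted here.

```python
import math


def dist_sq_range(lo, hi, A, B):
    """Range of d^2 = A - B*cos(phi) for phi in [lo, hi] within [0, pi].

    cos is decreasing on [0, pi], so d^2 is increasing when B > 0.
    """
    return A - B * math.cos(lo), A - B * math.cos(hi)


def search(lo, hi, A, B, d_min, d_max, width=1e-3):
    """Return torsion intervals that may contain a distance in [d_min, d_max].

    An empty result means that no torsion value in [lo, hi] satisfies the query.
    """
    stack = [(lo, hi)]
    found = []
    while stack:
        a, b = stack.pop()
        s_lo, s_hi = dist_sq_range(a, b, A, B)
        if s_hi < d_min ** 2 or s_lo > d_max ** 2:
            continue  # the whole interval misses the query window
        inside = s_lo >= d_min ** 2 and s_hi <= d_max ** 2
        if inside or b - a < width:
            found.append((a, b))  # certain match, or too narrow to split further
            continue
        mid = 0.5 * (a + b)
        stack.extend([(a, mid), (mid, b)])
    return sorted(found)


# Hypothetical constants: d ranges from 2 to 4 Angstrom as phi goes from 0 to pi.
A, B = 10.0, 6.0
print(search(0.0, math.pi, A, B, 5.0, 6.0))
print(len(search(0.0, math.pi, A, B, 2.9, 3.1)))
```

The first query asks for a distance that the model cannot reach and returns an empty list; the second returns a set of intervals that together cover every matching torsion value.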
The effectiveness of this novel method will be demonstrated using three and four point pharmacophores.
Many pharmaceutical companies have accumulated large amounts of pharmacological data over time. These pharmacological data may originate from broad-spectrum screening initiatives, project bound HTS campaigns or lead optimisation projects. The number of compounds that have been submitted for testing as well as the type and quality of the recorded pharmacological activities may vary significantly between the assays: % effect or % concentration data, IC50, ED50, Ki-values, binary (active-not active) or ordinal data schemes.
We present a novel method that allows queries in diverse pharmacological databases. We introduce discrete score values for the activity of a compound to allow queries on the database that represent complex pharmacological profiles. The method retrieves compounds whose pharmacological profile fits the target profile closely, but not necessarily exactly, as well as compounds that have the potential to fit the target profile but need further testing. An example is presented that compares the results to those of a classical database search for compounds similar to doxorubicin in the database from the National Cancer Institute (NCI), which contains more than 30,000 compounds with their 50% growth-inhibitory concentrations (GI50) against a panel of 60 human cancer cell lines.
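A profile query of this kind might be sketched as follows. The scoring scale, the mismatch measure, the threshold and all compound and assay names are assumptions made for illustration; the abstract does not specify how scores are assigned or compared. Untested assays are counted separately, so that compounds with the potential to fit the profile are retained rather than rejected.

```python
def profile_match(scores, target):
    """Compare one compound with a target profile.

    scores: dict assay -> discrete activity score (missing or None if untested)
    target: dict assay -> desired discrete score
    Returns (total mismatch over tested assays, number of untested assays).
    """
    mismatch = 0
    untested = 0
    for assay, wanted in target.items():
        observed = scores.get(assay)
        if observed is None:
            untested += 1
        else:
            mismatch += abs(observed - wanted)
    return mismatch, untested


def query(database, target, max_mismatch=1):
    """Return compound names ranked by mismatch, then by untested assays."""
    hits = []
    for name, scores in database.items():
        mismatch, untested = profile_match(scores, target)
        if mismatch <= max_mismatch:
            hits.append((mismatch, untested, name))
    return [name for _, _, name in sorted(hits)]


# Hypothetical scores: 0 = inactive, 1 = weakly active, 2 = active.
target = {"A": 2, "B": 0, "C": 2}
database = {
    "cmpd1": {"A": 2, "B": 0, "C": 2},
    "cmpd2": {"A": 2, "B": 1, "C": 2},
    "cmpd3": {"A": 2, "C": None},
    "cmpd4": {"A": 0, "B": 2, "C": 0},
}
print(query(database, target))
```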
Random search has proved to be an inefficient method for lead discovery. Alternatively, virtual screening algorithms enable a guided search in the high-dimensional chemical space. 2D and 3D pharmacophore models, neural network concepts, and new bioinformatic approaches lead to fast and efficient virtual screening tools. PHACIR® (PHArmaCophore Identification Routine), an algorithm based on a 3D pharmacophore model, generates highly enriched focused compound libraries, as demonstrated in several retrospective screenings. The database screening speed exceeds 7000 compounds per second on average workstations. Even a single piece of topological query information is sufficient for PHACIR screening, i.e. the 2D structure of only one active compound can be used for scanning large compound libraries. ClassyFire®, CallistoGen's artificial neural network diversity analyser, produces high-quality focused compound libraries to identify potential lead candidates with different scaffolds.
For the de novo design of biologically active peptides, evolutionary algorithms have proved to be very useful. PepHarvester® and Darwinizer® allow a guided search through the high-dimensional sequence space. Compared to known isofunctional sequences, the peptides found are highly diverse.
Similarity searching usually involves the specification of a target molecule that exhibits some form of biological activity and results in the retrieval of bioisosterically similar molecules. The search itself compares user-defined characteristics of the target molecule with those of the other molecules within the database and calculates a measure of similarity based on those properties. The retrieval of bioisosterically similar molecules has become a key stage in the rational development of new pharmaceuticals, and as such much interest is shown in the development of new methods and descriptors for their retrieval.
Protein-ligand interactions are fundamental to any sort of biological activity. Therefore, calculation of the similarity between groups based upon non-bonded contacts is a useful measure of the ability of one functional group to act as a bioisosteric replacement for another.
IsoStar is the definitive database of experimental and theoretical information on non-bonded interactions. The experimental information contained within the database is derived from the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB). Information is presented in the form of scatterplots that show the spatial distribution of non-bonded contacts between two chemical groupings.
Using the IsoStar scatterplots, the three-dimensional similarity of the functional groups (central groups) contained within the database is calculated based upon the spatial distribution and density of the contact groups. Scatterplots can be converted to propensity maps for specific probe atoms in a contact group. The propensity maps are compared by superimposing the central groups in a chemically relevant fashion. The similarity is calculated based upon the degree of overlap of the propensity maps. This procedure has been carried out for all the central groups in IsoStar.
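One way to express the degree of overlap of two maps is sketched below. It assumes that the contact-atom coordinates are already expressed in a common frame after superposition of the central groups, bins them into cubic voxels, and compares the normalised densities with a min/max (Tanimoto-like) ratio. The voxel size, the normalisation and the overlap measure are assumptions, not the IsoStar propensity definition or the similarity index used in this work.

```python
from collections import Counter


def density_map(points, cell=1.0):
    """Bin contact-atom coordinates into voxels and normalise to fractions."""
    counts = Counter(tuple(int(c // cell) for c in p) for p in points)
    n = sum(counts.values())
    return {voxel: k / n for voxel, k in counts.items()}


def map_overlap(map_a, map_b):
    """Overlap of two density maps: sum of minima over sum of maxima."""
    voxels = set(map_a) | set(map_b)
    shared = sum(min(map_a.get(v, 0.0), map_b.get(v, 0.0)) for v in voxels)
    total = sum(max(map_a.get(v, 0.0), map_b.get(v, 0.0)) for v in voxels)
    return shared / total if total else 0.0


# Hypothetical contact positions around two superimposed central groups.
contacts_a = [(0.2, 0.1, 0.3), (1.4, 0.2, 0.1)]
contacts_b = [(0.5, 0.5, 0.5), (0.7, 0.3, 0.9)]
print(map_overlap(density_map(contacts_a), density_map(contacts_b)))
```

The ratio is 1 for identical maps and 0 for maps with no occupied voxel in common.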
The results of these similarity calculations are validated using the Bioster database by comparing the similarities for known bioisosteric functional groups with those for random pairs of functional groups. The results show a marked difference between the similarities of the random pairs and those of the known bioisosteric pairs, indicating that the IsoStar propensity maps are a good descriptor of three-dimensional similarity.
Rigid ring systems can be used to position receptor-binding functional groups in 3D space and they thus play an increasingly important role in the design of combinatorial libraries. This poster presents a method using shape-similarity to identify ring systems that are structurally similar to a user-defined target ring system and that can thus be used to identify alternative scaffolds for the construction of a combinatorial library.