Poster 14: Algorithms for Automatic Tautomer Generation and Their Applications

Nikolay T. Kochev¹, Vesselina H. Paskaleva¹, Nina Jeliazkova²
¹University of Plovdiv, Department of Analytical Chemistry and Computer Chemistry
²Ideaconsult Ltd, 4 A. Kanchev str., Sofia 1000, Bulgaria
We present several algorithms for the automatic generation of all tautomers of a given chemical compound, implemented in the Ambit-Tautomer library. Ambit-Tautomer is an open source Java library built on top of the Chemistry Development Kit (CDK). It includes three main algorithms: a pure combinatorial method, an improved combinatorial method and an incremental algorithm. The tautomer generator uses a predefined database of rules that can be customized if needed. The rules are defined in Daylight SMILES/SMARTS line notation and support 1-3, 1-5 and 1-7 proton shifts, which cover the basic types of tautomerism. The pure combinatorial method generates all tautomeric forms by considering every possible combination of the states of the matched rules. The improved combinatorial method uses sub-combinations based on rule clustering. The incremental algorithm applies depth-first search to handle complex cases of overlapping rules. In addition, rule pre-filtering and tautomer post-filtering are applied to fine-tune the generation process. The tautomer generator also ranks tautomers using empirical rules defined in terms of relative energy differences.
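
The idea behind the pure combinatorial method can be illustrated with a minimal sketch. It is not the Ambit-Tautomer implementation or its API: the rule records, the field names (`name`, `atoms`, `n_states`) and the function `enumerate_state_combinations` are hypothetical, and the sketch assumes that each matched rule instance can independently take one of a small number of states (for example keto or enol). Every combination of states then corresponds to one candidate tautomeric form.

```python
from itertools import product

# Hypothetical toy data (not taken from the Ambit-Tautomer rule database):
# each matched rule instance records the atoms it covers and how many
# states it can take, e.g. 2 for a keto/enol pair.
matched_rules = [
    {"name": "keto-enol", "atoms": (0, 1, 2), "n_states": 2},
    {"name": "amide-imidic acid", "atoms": (5, 6, 7), "n_states": 2},
    {"name": "imine-enamine", "atoms": (9, 10, 11), "n_states": 2},
]


def enumerate_state_combinations(rules):
    """Yield one tuple of state indices per combination of rule states."""
    state_ranges = [range(rule["n_states"]) for rule in rules]
    # The Cartesian product visits every possible assignment of states.
    for combination in product(*state_ranges):
        yield combination


combinations = list(enumerate_state_combinations(matched_rules))
# Each tuple, e.g. (0, 1, 0), selects a state for every matched rule.
print(len(combinations), "candidate state combinations")
```

The number of combinations is the product of the state counts, so it grows exponentially with the number of matched rules. In this toy example the rules cover disjoint atoms; when matched rules share atoms, their states are no longer independent, which is the situation the abstract assigns to rule clustering and the incremental algorithm.
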
The Ambit-Tautomer library is used to improve the storage of chemical structures in the Ambit database and, accordingly, to implement search procedures that take tautomerism into account. We also studied the influence of tautomerization on QSAR/QSPR models for various end points. Each QSAR/QSPR model is built twice: first without considering the ability of the molecules to tautomerize, and then with the tautomer information included in the training data set. For each compound in the training data set, all possible tautomers are generated exhaustively by Ambit-Tautomer. The tautomers are used to calculate modified values of the original descriptors: the new value of a descriptor is the average of the descriptor values calculated for all tautomers of the compound. The average is calculated with different weighting schemes that take the tautomer ranking into account. The modified descriptor values are then used to build improved QSAR/QSPR models that account for tautomerism. The approach has an effect only on models that use descriptors which depend on the tautomeric form.
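
The descriptor averaging described above can be sketched as follows. The abstract does not specify the weighting schemes, so both schemes here are assumptions: `uniform` gives every tautomer the same weight, and `boltzmann` is a Boltzmann-like weighting in which a tautomer with a higher relative energy rank contributes less. The function name `averaged_descriptor` and the parameter `scale` are hypothetical.

```python
import math


def averaged_descriptor(values, rel_energies, scheme="uniform", scale=1.0):
    """Weighted average of one descriptor over the tautomers of a compound.

    values       -- descriptor value computed for each tautomer
    rel_energies -- relative energy rank of each tautomer (lower = more stable)
    scheme       -- "uniform" or "boltzmann" (both are illustrative assumptions)
    scale        -- energy scale used by the "boltzmann" scheme
    """
    if not values or len(values) != len(rel_energies):
        raise ValueError("values and rel_energies must be non-empty and equal in length")
    if scheme == "uniform":
        weights = [1.0] * len(values)
    elif scheme == "boltzmann":
        # Less stable tautomers (higher rank) receive exponentially smaller weights.
        weights = [math.exp(-energy / scale) for energy in rel_energies]
    else:
        raise ValueError("unknown weighting scheme: " + scheme)
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Toy example: two tautomers with descriptor values 2.0 and 4.0.
uniform_value = averaged_descriptor([2.0, 4.0], [0.0, 1.0], scheme="uniform")
boltzmann_value = averaged_descriptor([2.0, 4.0], [0.0, 1.0], scheme="boltzmann")
```

With the uniform scheme the result is the plain mean of the two values, while the Boltzmann-like scheme pulls the average toward the value of the more stable tautomer. A descriptor that is identical for all tautomers is unchanged by either scheme, consistent with the remark that only tautomer-dependent descriptors are affected.
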
Ambit-Tautomer is part of the Ambit2 project, open source software for chemoinformatics data management distributed under the LGPL license. Ambit2 consists of functional modules and a MySQL database. Ambit2 services are available as online web services and as a downloadable application. A web page providing online tautomer generation by Ambit-Tautomer and several different software packages is available at http://apps.ideaconsult.net:8080/ambit2/depict/tautomer.

