3D QSAR Model for Binding Affinity Prediction
Shyamal K. Nath, Jingyi Chen, Ali Mozaffari and Riddhish Pandharkar,
OpenEye Cadence Molecular Sciences
A new 3D QSAR methodology has been developed that uses the similarity between aligned ligands, in both physical and chemical space, to build a predictive model. The model is a composite of multiple models that combine orthogonal sets of descriptors and machine learning techniques. A prediction is reported as the consensus of the predictions from the individual models. The consensus predictions are statistically superior to those of any individual model. The models are built using a combination of Gaussian Process Regression (GPR) and Kernel Partial Least Squares (KPLS) machine learning methods, with similarity measures estimated from ROCS and EON serving as 3D molecular descriptors.
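As a rough illustration of the consensus idea, the sketch below fits one GPR model per similarity kernel and averages the two predictions. It is a minimal sketch under stated assumptions, not the authors' implementation: the similarity matrices stand in for ROCS-like (shape) and EON-like (electrostatic) scores, all numeric values are made up, the helper names `solve` and `gpr_predict` are hypothetical, KPLS is not implemented (GPR is used for both models), and a plain mean is only one possible way to form a consensus.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gpr_predict(K_train, y_train, k_query, k_self=1.0, noise=0.1):
    """GPR posterior mean and standard deviation for one query molecule,
    using a precomputed similarity matrix as the kernel (an assumption)."""
    n = len(y_train)
    mean_y = sum(y_train) / n
    # Regularized kernel matrix K + noise * I
    A = [[K_train[i][j] + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(A, [y - mean_y for y in y_train])
    v = solve(A, k_query)
    mu = mean_y + sum(k * a for k, a in zip(k_query, alpha))
    # Clamp at zero: similarity matrices are not guaranteed to be valid kernels
    var = max(k_self - sum(k * w for k, w in zip(k_query, v)), 0.0)
    return mu, math.sqrt(var)

# Hypothetical pairwise similarities among four aligned training ligands
K_shape = [[1.0, 0.8, 0.3, 0.2],
           [0.8, 1.0, 0.4, 0.3],
           [0.3, 0.4, 1.0, 0.7],
           [0.2, 0.3, 0.7, 1.0]]
K_elec = [[1.0, 0.6, 0.2, 0.1],
          [0.6, 1.0, 0.3, 0.2],
          [0.2, 0.3, 1.0, 0.5],
          [0.1, 0.2, 0.5, 1.0]]
y = [6.1, 6.4, 7.8, 8.0]  # hypothetical measured affinities (pIC50)

# Hypothetical similarities of one query ligand to each training ligand
q_shape = [0.25, 0.35, 0.75, 0.65]
q_elec = [0.15, 0.25, 0.60, 0.55]

preds = [gpr_predict(K_shape, y, q_shape), gpr_predict(K_elec, y, q_elec)]
consensus = sum(mu for mu, _ in preds) / len(preds)

for label, (mu, sd) in zip(["shape", "electrostatics"], preds):
    print(f"{label}: mean={mu:.2f}, std={sd:.2f}")
print(f"consensus: {consensus:.2f}")
```

Because the two models see different descriptors, their errors need not be correlated, which is one common motivation for averaging them; whether that holds for a given dataset has to be checked by validation.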
The new 3D QSAR method is validated on several publicly available datasets, against a baseline 2D QSAR model and against multiple commercially available 3D QSAR models. The validation studies show that the new methodology outperforms both the baseline 2D QSAR model and the third-party 3D QSAR models.
Models built using KPLS are used to interpret the active site, identifying regions where groups capable of specific interactions, such as a hydrogen-bond donor or acceptor, or an anion or cation, are preferred. Such interpretations can be used to generate new ideas and allow the model to serve as a potential generative design tool in the drug discovery cycle. The ability of the 3D QSAR model with KPLS to provide this design intuition is a clear advantage over other 3D or 2D models. Predictions from the new model also explain the results by reporting the contribution of each functional group in the molecule, together with an estimate of the model's confidence in the prediction. This confidence estimate can help indicate when a molecule or a class of molecules lies outside the model's domain of applicability and more experimental data are needed to improve it.
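One simple way a confidence estimate could be used to screen for domain of applicability is sketched below: predictions whose reported uncertainty exceeds a cutoff are flagged for experimental follow-up. This is an illustrative assumption rather than the method described here; the `Prediction` class, the `outside_domain` helper, the 0.5 cutoff and all values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    name: str
    mean: float  # predicted affinity (e.g. pIC50)
    std: float   # model-reported uncertainty, same units as mean

def outside_domain(preds, max_std=0.5):
    """Return names of molecules whose uncertainty exceeds the
    (hypothetical) cutoff, i.e. likely outside the model's domain."""
    return [p.name for p in preds if p.std > max_std]

# Hypothetical predictions for three molecules
predictions = [
    Prediction("mol_A", 7.2, 0.2),
    Prediction("mol_B", 6.1, 0.9),
    Prediction("mol_C", 8.0, 0.4),
]

flagged = outside_domain(predictions)
print("needs experimental data:", flagged)
```

In practice the cutoff would be calibrated against held-out data, for example by checking how prediction error grows with the reported uncertainty.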