Ben Honore Poster

Machine Learning Prediction of Conformationally Averaged NMR Chemical Shifts

Ben Honore, Calvin Yiu and Craig Butts

University of Bristol

Nuclear magnetic resonance (NMR) spectroscopy is the most important analytical tool in 3D chemical structure elucidation. Density functional theory (DFT) is a quantum mechanical computational method that models the electronic structure of a system. Among other properties, it can calculate the shielding tensors of the nuclei in a molecule, which can in turn be converted to NMR chemical shifts by linear scaling against experimental data.
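
The linear scaling step can be illustrated with a minimal sketch. It assumes isotropic shieldings (one number per nucleus) and an ordinary least-squares fit of experimental shift against calculated shielding; the function names and all numbers below are hypothetical and are not taken from IMPRESSION or from any real dataset.

```python
def fit_linear_scaling(shieldings, exp_shifts):
    """Least-squares fit of exp_shift = slope * shielding + intercept."""
    n = len(shieldings)
    mean_s = sum(shieldings) / n
    mean_d = sum(exp_shifts) / n
    cov = sum((s - mean_s) * (d - mean_d)
              for s, d in zip(shieldings, exp_shifts))
    var = sum((s - mean_s) ** 2 for s in shieldings)
    slope = cov / var
    intercept = mean_d - slope * mean_s
    return slope, intercept


def shielding_to_shift(shielding, slope, intercept):
    """Convert a calculated isotropic shielding (ppm) to a chemical shift (ppm)."""
    return slope * shielding + intercept


# Made-up illustrative values: calculated 1H shieldings and the
# experimental shifts assigned to the same nuclei.
calc_shieldings = [31.0, 29.5, 27.0, 24.5]
exp_shifts = [0.9, 2.3, 4.8, 7.2]

slope, intercept = fit_linear_scaling(calc_shieldings, exp_shifts)
scaled = [shielding_to_shift(s, slope, intercept) for s in calc_shieldings]
```

The fitted slope is negative because a more shielded nucleus appears at a lower chemical shift. In practice the fit would be made once on a reference set and then reused for new molecules computed at the same level of theory.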

Predicting the NMR spectrum of a molecule provides a reference for comparison when analysing experimental spectra, and this is generally done using DFT. However, DFT calculations are computationally intensive, and their cost rises steeply for larger molecules and for metal-containing inorganic complexes. Machine learning can predict the same information to a similar level of accuracy in a matter of seconds, rather than the hours or days that DFT can take.

IMPRESSION (Intelligent Machine PREdiction of Shift and Scalar Information Of Nuclei) is an NMR chemical shift and coupling constant predictor trained on DFT-calculated parameters. Generation 1 was published in 2020 and uses kernel ridge regression, a shallow learning algorithm. Generation 2 is currently in development and uses a deep learning graph transformer architecture. IMPRESSION has been shown to predict 1H and 13C chemical shifts with only small errors relative to the DFT-calculated shifts.
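
For readers unfamiliar with kernel ridge regression, the following is a generic toy sketch of the technique. It is not the IMPRESSION implementation: the Gaussian kernel, the two-component atomic descriptors, the hyperparameters and every function name are assumptions made purely for illustration.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Similarity between two descriptor vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def krr_fit(features, targets, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    n = len(features)
    kernel = [[gaussian_kernel(features[i], features[j], sigma)
               + (lam if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    return solve_linear_system(kernel, targets)


def krr_predict(x_new, features, alphas, sigma):
    """Predict as a kernel-weighted sum over the training environments."""
    return sum(a * gaussian_kernel(x_new, f, sigma)
               for a, f in zip(alphas, features))


# Made-up descriptors for four atomic environments and the
# DFT-calculated shifts (ppm) they would be trained against.
train_features = [[0.1, 1.2], [0.4, 0.9], [0.8, 0.3], [1.1, 0.1]]
train_shifts = [1.1, 2.4, 5.0, 7.3]

alphas = krr_fit(train_features, train_shifts, sigma=0.5, lam=1e-3)
new_shift = krr_predict([0.6, 0.6], train_features, alphas, sigma=0.5)
```

Training reduces to solving one linear system, which is why the method counts as shallow learning; the cost of that solve grows with the number of training environments.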

Recently, IMPRESSION has been compiled into a user-friendly package to automate chemical shift predictions. This allows the user to start from a molecular input file or SMILES string and make conformationally averaged chemical shift predictions that better match the NMR properties of a molecule in solution. At present, however, the conformational averaging is based on molecular mechanics (MM).
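
Conformational averaging of this kind is commonly done by Boltzmann weighting the per-conformer predictions by relative energy. The text does not spell out the exact scheme used in the package, so the sketch below is an assumption; the function names, the temperature and the numbers are hypothetical.

```python
import math

GAS_CONSTANT_KJ = 8.314462618e-3  # kJ / (mol K)


def boltzmann_weights(energies_kj_mol, temperature=298.15):
    """Population of each conformer from its energy relative to the minimum."""
    e_min = min(energies_kj_mol)
    factors = [math.exp(-(e - e_min) / (GAS_CONSTANT_KJ * temperature))
               for e in energies_kj_mol]
    total = sum(factors)
    return [f / total for f in factors]


def average_shifts(conformer_shifts, weights):
    """Population-weighted mean shift for each atom across all conformers."""
    n_atoms = len(conformer_shifts[0])
    return [sum(w * shifts[i] for w, shifts in zip(weights, conformer_shifts))
            for i in range(n_atoms)]


# Made-up example: three conformers, two atoms each.
energies = [0.0, 2.0, 5.0]                  # relative energies, kJ/mol
shifts = [[1.20, 3.40],                     # predicted shifts, ppm
          [1.35, 3.10],
          [1.60, 2.90]]

weights = boltzmann_weights(energies)
averaged = average_shifts(shifts, weights)
```

Because the weights depend exponentially on the energies, errors of a few kJ/mol in the conformer energies can noticeably change the populations, and with them the averaged shifts.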

Previous work has shown that the prediction performance of IMPRESSION does not decrease substantially when it is given MM-optimised structures rather than the DFT-optimised structures on which it was trained. However, the MM energy of each conformer is likely to be insufficiently accurate for conformational averaging, since it comes from a classical force field rather than from the more rigorous electronic structure treatment of DFT. Optimising every conformer with DFT, on the other hand, would take long enough to undermine the purpose of using machine learning for speed. If IMPRESSION, or a derivative of it, could be adapted to predict molecule-level properties in a realistic time frame, then in principle one of those properties could be the DFT-calculated total energy. This would add an extra step to the current IMPRESSION workflow and improve the accuracy of conformationally averaged chemical shift predictions at a relatively small cost in prediction time.