Evaluation of normalised prediction intervals for ADME predictions
Christina Founti1, Val J. Gillet1, Jonathan Vessey2
1University of Sheffield
Standard practices for the development and use of QSAR models highlight the importance of reporting reliability estimates for the models' predictions. Yet many state-of-the-art machine learning algorithms do not directly provide this information, and other methods, such as applicability domain filters, statistical techniques, or error modelling, may need to be applied. Another option is to adopt the conformal prediction (CP) framework, a confidence estimation method that uses calibration data to calculate prediction intervals for regression models. Training a conformal predictor involves optimising several parameters, such as the size of the calibration set and the technique for sampling it, as well as the normalising method. The normalising method scales individual prediction intervals so that each reflects the uncertainty associated with that particular prediction; any relevant reliability estimation method may therefore be used for normalisation. Despite the framework's validity guarantees, the prediction intervals obtained by CP may not always be useful, particularly if the model's accuracy is low or the normalising method does not correlate well with prediction error. In addition, CP metrics focus primarily on evaluating the validity and efficiency of the conformal predictor rather than the utility of the prediction intervals obtained.
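The normalised conformal workflow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses synthetic data, a linear model as a stand-in for the underlying learner, and a second linear model on log absolute residuals as the error model used to scale each interval.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data with input-dependent noise (hypothetical
# stand-in for an ADME endpoint and molecular descriptors).
X = rng.normal(size=(300, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]) \
    + rng.normal(scale=0.5 + np.abs(X[:, 0]), size=300)

# Split: proper training set, calibration set, test set.
X_tr, y_tr = X[:150], y[:150]
X_cal, y_cal = X[150:250], y[150:250]
X_te, y_te = X[250:], y[250:]

def fit_linear(X, y):
    Xb = np.c_[X, np.ones(len(X))]
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    return np.c_[X, np.ones(len(X))] @ w

# Underlying model (the abstract uses a random forest instead).
w = fit_linear(X_tr, y_tr)

# Error model: predicts log absolute residuals from the same descriptors.
res_tr = np.abs(y_tr - predict(w, X_tr))
w_err = fit_linear(X_tr, np.log(res_tr + 1e-8))

def sigma(X):
    # Per-compound difficulty estimate used to normalise interval sizes.
    return np.exp(predict(w_err, X))

# Normalised nonconformity scores on the calibration set.
alpha = np.abs(y_cal - predict(w, X_cal)) / sigma(X_cal)

# Conformal quantile at significance level eps (90% confidence here).
eps = 0.1
n = len(alpha)
k = int(np.ceil((n + 1) * (1 - eps)))
q = np.sort(alpha)[min(k, n) - 1]

# Normalised intervals: wide where the error model expects large error.
y_hat = predict(w, X_te)
half = q * sigma(X_te)
lower, upper = y_hat - half, y_hat + half

coverage = np.mean((y_te >= lower) & (y_te <= upper))
print(f"empirical coverage at 90% confidence: {coverage:.2f}")
```

Because the nonconformity scores are divided by the per-compound difficulty estimate, the resulting intervals vary in width across compounds while the calibration step preserves the overall coverage guarantee.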
Random forest conformal predictors were trained on ADME data and normalised using error model estimates. Different sets of descriptors were used to train the error models, including 1) molecular descriptors and 2) applicability domain variables, which are unseen by the underlying model. As the performance of error models is generally poor, the minimum error model performance required to obtain useful normalised prediction intervals is investigated. The utility of the normalised prediction intervals is evaluated based on their ability to rank prediction accuracy, as well as their size relative to the experimental error of the data and to non-normalised prediction intervals.