Sébastien Guesné Abstract

Conformal calibration of probabilistic predictions

Sébastien Guesné¹, Stéphane Werner¹, Thierry Hanser¹

¹Lhasa Limited
The decision domain framework developed at Lhasa Limited provides a more formal definition of the applicability domain [1]. In this framework, the class probabilities estimated by an in silico classifier are used to quantify the assertiveness of its predictions. This quantity, termed the decidability score of a prediction, is of great importance when assessing the potential liability of a single chemical to cause toxic or adverse effects, and when promoting the use of in silico predictions in risk assessment. The decidability score must therefore be calibrated so that users of the in silico classifier can rely on the model with confidence.

Training sets with an imbalanced class distribution, together with the nature of the in silico classifier algorithm, often lead to non-calibrated probability estimates. Because the datasets available for many important endpoints are imbalanced, it is extremely important to calibrate the probabilistic predictions of the in silico classifier.
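To make the notion of calibration concrete, the following is a minimal sketch of how predicted probabilities can be compared with observed class frequencies. It is not taken from the work described here: the function names `reliability_table` and `expected_calibration_error`, the equal-width binning, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def reliability_table(y_true, p_pred, n_bins=5):
    """Per non-empty bin: (mean predicted probability, observed positive fraction, count).

    Hypothetical helper: equal-width bins on [0, 1] are an assumption,
    not a choice stated in the abstract.
    """
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so that p == 1.0 falls into the last bin.
    idx = np.digitize(p_pred, edges[1:-1])
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((p_pred[mask].mean(), y_true[mask].mean(), int(mask.sum())))
    return rows

def expected_calibration_error(y_true, p_pred, n_bins=5):
    """Count-weighted mean gap between predicted and observed probability."""
    rows = reliability_table(y_true, p_pred, n_bins)
    total = sum(count for _, _, count in rows)
    return sum(count / total * abs(pred - obs) for pred, obs, count in rows)

# Toy, made-up data: four compounds, two inactive (0) and two active (1).
labels = [0, 0, 1, 1]
probs = [0.1, 0.1, 0.9, 0.9]
for pred, obs, count in reliability_table(labels, probs):
    print(f"predicted={pred:.2f} observed={obs:.2f} n={count}")
print("ECE:", expected_calibration_error(labels, probs))
```

For a well-calibrated classifier, the predicted and observed columns agree within each bin and the summary error is close to zero.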

The conformal algorithmic framework complements an underlying in silico algorithm so that the resulting system produces predictions together with information on their confidence. In a classification problem, this information includes p-values, which delimit prediction regions [2]. This presentation will describe how a modified Mondrian cross-conformal classifier algorithm [3] allows the modeller to calibrate a probabilistic in silico binary classifier. In this novel calibration approach, the p-values generated by the conformal prediction algorithm are converted into calibrated class probabilities while avoiding information loss or distortion. The presentation will show how accurately the probabilities predicted by the calibrated in silico classifier map to the observed probabilities for training sets with varying degrees of class imbalance.
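As background, the class-conditional (Mondrian) p-values mentioned above can be sketched as follows. This is a simplified single-split illustration under stated assumptions, not the modified algorithm of this presentation: the nonconformity score (one minus the predicted probability of the class), the function names, and the toy numbers are hypothetical. A cross-conformal variant would repeat the calculation over several folds and combine the resulting p-values, and the conversion of p-values into calibrated probabilities is not described in this abstract, so it is not attempted here.

```python
import numpy as np

def mondrian_p_values(cal_scores, cal_labels, test_scores):
    """Class-conditional conformal p-values for one test object.

    cal_scores:  nonconformity score of each calibration example for its TRUE class
    cal_labels:  true class of each calibration example
    test_scores: dict {class: nonconformity score of the test object under that class}
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels)
    p_values = {}
    for cls, alpha in test_scores.items():
        # Mondrian condition: compare only against calibration examples of the same class.
        same_class = cal_scores[cal_labels == cls]
        p_values[cls] = (np.sum(same_class >= alpha) + 1) / (len(same_class) + 1)
    return p_values

def prediction_region(p_values, significance):
    """Classes whose p-value exceeds the chosen significance level."""
    return {cls for cls, p in p_values.items() if p > significance}

# Toy, made-up calibration set: four inactive (0) and two active (1) compounds.
# Assumed nonconformity score: 1 - predicted probability of the class in question.
cal_labels = [0, 0, 0, 0, 1, 1]
cal_scores = [0.1, 0.2, 0.3, 0.6, 0.3, 0.6]
# A test compound with predicted probability 0.5 for each class.
test_scores = {0: 0.5, 1: 0.5}

p = mondrian_p_values(cal_scores, cal_labels, test_scores)
print(p, prediction_region(p, significance=0.2))
```

Because each class is compared only with calibration examples of that class, the minority class is not swamped by the majority class, which is why the Mondrian variant is of interest for imbalanced data.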

[1] – Hanser T., Barber C., Marchaland J.F., Werner S. Applicability domain: towards a more formal definition. SAR QSAR Environ Res. 2016, 27, 893–909.

[2] – Vovk V., Gammerman A., Shafer G. Algorithmic Learning in a Random World. New York, NY: Springer Science & Business Media, 2005.

[3] – Sun J., Carlsson L., Ahlberg E., Norinder U., Engkvist O., Chen H. Applying Mondrian cross-conformal prediction to estimate prediction confidence on large imbalanced bioactivity data sets. J Chem Inf Model. 2017, 57, 1591–1598.