Andrea Morger Abstract

A Case Study of Toxicity Prediction including Reliability and Confidence Estimation

Andrea Morger¹, Janosch Achenbach², Miriam Mathea², Antje Wolf², Roland Buesen², Robert Landsiedel², Klaus-Jürgen Schleifer², Andrea Volkamer¹

¹Charité Universitätsmedizin Berlin
²BASF SE, Ludwigshafen

With new chemicals being synthesized every year, assessing their toxicological potential, i.e. their harmful effects on humans and the environment, is a prerequisite for production and marketing. Most of the toxicological testing required by regulations still relies on animal studies. In this context, in silico methods have great potential to reduce time, cost, and ultimately animal testing, as they make use of the ever-growing amount of available toxicity data.[1]

In our KnowTox project, we are developing a toxicity prediction tool that draws on external and in-house data to provide rational support for read-across. The tool incorporates modern machine learning (ML) techniques, bringing us closer to the vision of transforming toxicology into a predictive science.

Our major data source is the freely available ToxCast[2] dataset, consisting of ~8300 compounds, such as pesticides, pharmaceuticals, and industrial chemicals, tested on up to 1000 different endpoints, e.g. effects on cell cycle, cytotoxicity, or steroid receptor interactions. We will present a workflow, together with a case study, that searches the entire ToxCast dataset for the substances most similar to any query compound. Information about these substances’ properties relevant to risk assessment can be generated automatically. Furthermore, previously identified substructures associated with toxic effects[3,4] are highlighted to warn the user and either guide the design of less toxic compounds or target subsequent in vitro and in vivo testing.

For ML application, we adapted an open source standardisation workflow to remove duplicates, salts, and mixtures, yielding a reduced set of ~7500 clean compounds. One prevalent challenge in ML is the transferability of well-performing models to new chemical space. To define the applicability domain of our models, we have adopted the concept of conformal prediction (CP)[5,6]. CP builds on an ML framework but includes an additional calibration step: predictions made by the trained random forests for a new compound are compared with those for the calibration set. We validated such a CP model with an external and an in-house dataset. Given a certain significance level, the model recognizes whether it has enough data to make a confident prediction. Nearly a hundred such models were trained, and we will illustrate the application of the combined tool in a case study.
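The calibration step of conformal prediction can be illustrated with a minimal sketch. It assumes a class-wise (Mondrian) inductive setup and uses one minus the predicted class probability as the nonconformity score; the abstract does not specify either choice, and the class labels, function names, and probability values below are hypothetical. In practice the probabilities would come from the trained random forests.

```python
from typing import Dict, List, Set


def nonconformity(probability: float) -> float:
    """Assumed nonconformity score: low predicted probability = strange."""
    return 1.0 - probability


def conformal_p_values(
    calibration: Dict[str, List[float]],
    query_probs: Dict[str, float],
) -> Dict[str, float]:
    """Class-wise p-values for one query compound.

    calibration[c] holds the model's probability for class c on the
    calibration compounds whose true class is c.
    query_probs[c] is the model's probability of class c for the query.
    """
    p_values = {}
    for label, cal_probs in calibration.items():
        alpha_query = nonconformity(query_probs[label])
        # Count calibration compounds at least as nonconforming as the query.
        n_at_least = sum(
            1 for p in cal_probs if nonconformity(p) >= alpha_query
        )
        p_values[label] = (n_at_least + 1) / (len(cal_probs) + 1)
    return p_values


def prediction_set(p_values: Dict[str, float], significance: float) -> Set[str]:
    """Keep every class whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Hypothetical calibration probabilities and query prediction.
calibration = {
    "toxic": [0.9, 0.8, 0.7, 0.6, 0.3],
    "nontoxic": [0.95, 0.85, 0.75, 0.4, 0.2],
}
query_probs = {"toxic": 0.75, "nontoxic": 0.25}

p_values = conformal_p_values(calibration, query_probs)
for level in (0.2, 0.4, 0.7):
    print(level, sorted(prediction_set(p_values, level)))
```

With these numbers the p-values are 4/6 for "toxic" and 2/6 for "nontoxic". At a significance level of 0.2 both classes remain in the set, so the prediction is inconclusive. At 0.4 only "toxic" remains, and at 0.7 the set is empty, which flags the query as lying outside what the calibration data supports.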

Identification of sufficiently similar chemicals will support rationales for read-across[7], and accurate predictions of toxic mechanisms or endpoints will guide further toxicity testing or the deselection of compounds that are most likely harmful at an early stage of the often lengthy research and development process. Estimating the reliability of each prediction indicates how much weight it should carry in a decision.
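The similarity search that underpins read-across can be sketched as a nearest-neighbour ranking. The abstract does not state which similarity measure or molecular representation is used; the sketch below assumes the Tanimoto coefficient on binary fingerprints, represented here as sets of "on" bit indices. The compound names and fingerprints are hypothetical, and a real workflow would derive fingerprints with a cheminformatics toolkit.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set in a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def most_similar(
    query: Fingerprint,
    library: Dict[str, Fingerprint],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Rank library compounds by decreasing similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical fingerprints for three library compounds and one query.
library = {
    "compound_A": frozenset({1, 2, 3, 4}),
    "compound_B": frozenset({1, 2, 5}),
    "compound_C": frozenset({7, 8}),
}
query = frozenset({1, 2, 3})

neighbours = most_similar(query, library, top_n=2)
print(neighbours)  # [('compound_A', 0.75), ('compound_B', 0.5)]
```

The known experimental results of the top-ranked neighbours would then serve as the starting point for a read-across argument, subject to expert judgement on whether the compounds are sufficiently similar.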

Our combined prediction tool can, together with the experience of toxicologists, help to improve efficiency and reduce the need for animal testing for toxicological assessments in development projects and regulatory product registration.

1. Mayr, A.; Klambauer, G.; Unterthiner, T.; Hochreiter, S. DeepTox: Toxicity Prediction using Deep Learning. Front. Environ. Sci. 2016, 3, 80.
2. Richard, A. M.; Judson, R. S.; Houck, K. A.; Grulke, C. M.; Volarath, P.; Thillainadarajah, I.; Knudsen, T. B. et al. ToxCast chemical landscape: paving the road to 21st century toxicology. Chem. Res. Toxicol. 2016, 29(8), 1225-1251.
3. Sushko, I.; Salmina, E.; Potemkin, V. A.; Poda, G.; Tetko, I. V. ToxAlerts: a web server of structural alerts for toxic chemicals and compounds with potential adverse reactions. J. Chem. Inf. Model. 2012, 52(8), 2310-2316.
4. Baell, J. B.; Holloway, G. A. New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J. Med. Chem. 2010, 53(7), 2719-2740.
5. Norinder, U.; Carlsson, L.; Boyer, S.; Eklund, M. Introducing conformal prediction in predictive modeling. A transparent and flexible alternative to applicability domain determination. J. Chem. Inf. Model. 2014, 54(6), 1596-1603.
6. Svensson, F.; Norinder, U.; Bender, A. Modelling compound cytotoxicity using conformal prediction and PubChem HTS data. Toxicol. Res. 2017, 6(1), 73-80.
7. Teubner, W.; Landsiedel, R. Read-across for hazard assessment: The ugly duckling is growing up. Altern. Lab. Anim. 2015, 43, 67-71.
