Consensus QSAR modeling for the toxicity of organic chemicals against Pseudokirchneriella subcapitata using 2D descriptors
Kabiruddin Khan¹, Kunal Roy¹
¹Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, 700032, Kolkata, India
Organic chemicals are used in our day-to-day life in various forms, such as food substances, cosmetics and other personal care products, medicines, and biocides in various household products. In the last decade, there has been a sharp rise in the attention paid to organic chemicals as potential environmental pollutants. The two measurements most commonly used in ecological risk assessment are the no-observed-effect concentration (NOEC) and the x% effective concentration (ECx), where x can range from 5 to 100. Determining NOEC and ECx values through laboratory testing involves considerable cost and time, and such experiments cannot be performed for all possible endpoints. Computational tools such as quantitative structure-activity/toxicity relationship (QSAR/QSTR) modeling can help fill this data gap [1,2]. The use of QSAR for the risk assessment of chemical compounds is recommended by regulatory agencies such as the European Centre for the Validation of Alternative Methods (ECVAM), the European Union Commission's Scientific Committee on Toxicity, Ecotoxicity and the Environment (CSTEE) and the United States Environmental Protection Agency (US EPA). Microscopic algae constitute a major class of primary producers and thus form the ecological base for higher species. Any unnoticed toxic effects of organic chemicals on algae could harm the whole ecosystem, causing secondary effects at higher trophic levels. This makes it necessary to develop predictive QSAR models for the toxicity of organic chemicals towards algae. The current report proposes robust, externally validated consensus QSAR models, developed from 334 organic chemicals, for predicting the effective concentrations of chemicals that cause 50% and 10% inhibition of algal growth. 2D descriptors with definite physicochemical meaning were calculated using the Dragon and PaDEL-Descriptor software tools.
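QSAR toxicity endpoints such as pEC50 are conventionally expressed as the negative base-10 logarithm of the molar effective concentration, so that larger values indicate higher toxicity. The abstract does not state the units of the raw test data; the minimal sketch below assumes ECx values reported in mg/L (typical of algal growth-inhibition tests) and converts them to pECx using the compound's molecular weight. The function name `pec_from_mg_per_l` is hypothetical.

```python
import math


def pec_from_mg_per_l(ec_mg_per_l: float, molecular_weight: float) -> float:
    """Convert an ECx value in mg/L into pECx = -log10(ECx in mol/L).

    Assumption: concentration is given in mg/L and molecular weight in g/mol.
    """
    if ec_mg_per_l <= 0 or molecular_weight <= 0:
        raise ValueError("concentration and molecular weight must be positive")
    molar = ec_mg_per_l / 1000.0 / molecular_weight  # mg/L -> g/L -> mol/L
    return -math.log10(molar)


# Hypothetical example: an EC50 of 94.11 mg/L for a compound with a molecular
# weight of 94.11 g/mol corresponds to 1 mmol/L, i.e. a pEC50 of 3.
print(round(pec_from_mg_per_l(94.11, 94.11), 6))
```

Working in log molar units places compounds of different molecular size on a common per-molecule scale, which is the usual basis for comparing toxic potency across a data set.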
The calculated descriptor pool consists of constitutional indices, atom-type E-state indices, 2D atom pairs, ring descriptors, functional group counts, atom-centered fragments, topological indices, molecular properties, CrippenLogP, XLogP and extended topochemical atom (ETA) indices. These descriptor classes were chosen so that the models would have definite physicochemical meaning and would help identify the chemical features responsible for the toxicity of organic chemicals. Model development, validation and interpretation followed the strict guidelines of the Organisation for Economic Co-operation and Development (OECD). Features were selected using a genetic algorithm together with stepwise selection, and the final models were built with partial least squares (PLS) regression, which avoids problems arising from intercorrelation among descriptors by regressing on orthogonal latent variables. Variables such as MLOGP, MR and LogKow (experimental lipophilicity) make the largest positive contributions to algal toxicity, whereas polar features such as sulfonic acid groups (nSO2OH descriptor) and hydrogens attached to an alpha-carbon (H-051 descriptor) are inversely correlated with it. The applicability domain was assessed using the DModX technique available in the SIMCA-P software, which defines the chemical space within which predictions for new organic chemicals can be considered reliable. The pEC50 models were then used to predict the toxicity of 64 organic chemicals lacking a definite observed response (i.e., for which only a range of values is available). The models predicted 53 (83%) of the 64 compounds within ±2 log units of the lower bound of the range and 51 (80%) of the 64 compounds within ±2 log units of the upper bound. Finally, the prediction reliability indicator tool was used to assess the confidence with which these compounds were predicted.
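The abstract does not give implementation details for the PLS models or the DModX applicability domain. As a minimal sketch under stated assumptions, the code below fits a single-response PLS model with the NIPALS algorithm in NumPy and computes a simplified, normalized DModX (the residual standard deviation of a compound's descriptors after projection onto the model, divided by the pooled training residual standard deviation). It omits SIMCA-P's correction factor for training observations, and the function names `fit_pls1` and `predict_with_dmodx` are hypothetical; this is not SIMCA-P's implementation.

```python
import numpy as np


def fit_pls1(X, y, n_components):
    """Fit a single-response PLS model with NIPALS and store the training residual SD."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean  # mean-centred descriptors and response
    W, P, c = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)  # descriptor weights for this latent variable
        t = E @ w  # scores (latent variable values per compound)
        tt = t @ t
        p = E.T @ t / tt  # descriptor loadings
        c_a = f @ t / tt  # response loading
        E = E - np.outer(t, p)  # deflate descriptors
        f = f - c_a * t  # deflate response
        W.append(w)
        P.append(p)
        c.append(c_a)
    n, k = X.shape
    a = n_components
    # Pooled residual SD of the training set (one extra degree of freedom for centring).
    s0 = np.sqrt((E ** 2).sum() / ((n - a - 1) * (k - a)))
    return {"x_mean": x_mean, "y_mean": y_mean, "W": np.array(W),
            "P": np.array(P), "c": np.array(c), "s0": s0}


def predict_with_dmodx(model, X_new):
    """Return predictions and a simplified normalized DModX for each row of X_new."""
    E = np.atleast_2d(np.asarray(X_new, dtype=float)) - model["x_mean"]
    y_hat = np.full(E.shape[0], model["y_mean"])
    for w, p, c_a in zip(model["W"], model["P"], model["c"]):
        t = E @ w
        y_hat += c_a * t
        E = E - np.outer(t, p)  # descriptor residual left after this component
    k, a = E.shape[1], len(model["c"])
    s_i = np.sqrt((E ** 2).sum(axis=1) / (k - a))  # per-compound residual SD
    return y_hat, s_i / model["s0"]
```

A compound whose normalized DModX exceeds a chosen limit would be treated as outside the applicability domain and its prediction flagged as less reliable. SIMCA-P derives its critical limit from an F-distribution; any fixed cut-off used with this sketch is an assumption.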
The results also highlight the value of consensus modeling in reducing prediction errors. The QSAR models obtained can serve as useful tools for identifying and prioritizing chemicals of highest concern and for designing safer alternatives within the scope of the REACH regulation on hazardous chemicals.
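The abstract does not detail how the individual models' predictions are combined. As a minimal sketch, assuming a simple arithmetic-mean consensus (one common scheme; the authors' actual scheme may differ, for example by weighting or selecting models), the code below averages per-compound predictions across models and provides a mean absolute error for comparing the consensus with the individual models. The function names are hypothetical.

```python
from statistics import mean


def consensus_predict(model_predictions):
    """Average per-compound predictions across several individual models.

    model_predictions: one list of predicted values (e.g. pEC50) per model,
    each listing the same compounds in the same order.
    """
    n = len(model_predictions[0])
    if any(len(p) != n for p in model_predictions):
        raise ValueError("every model must predict the same compounds")
    # zip(*...) groups the predictions of all models for each compound.
    return [mean(values) for values in zip(*model_predictions)]


def mean_absolute_error(observed, predicted):
    """Mean absolute deviation between observed and predicted values."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))
```

Averaging reduces error when the individual models' errors partly cancel; it is not guaranteed to outperform the best single model for every compound.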
1. J. C. Dearden, Int. J. Quant. Struct.-Prop. Relat., 2016, 1, 1-44.
2. P. M. Khan and K. Roy, Expert Opin. Drug Discov., 2018, 13, 1075-1089.
3. K. O. Kusk, A. M. Christensen and N. Nyholm, Chemosphere, 2018, 204, 405-412.
4. K. Roy, P. Ambure and S. Kar, ACS Omega, 2018, 3, 11392-11406.
5. K. Roy, P. Ambure, S. Kar and P. K. Ojha, J. Chemom., 2018, 32, e2992.