
Privacy preserving knowledge transfer from corporate data to federative models

Thierry Hanser1

1Lhasa Limited
Recent progress in the field of artificial intelligence (AI) has dramatically amplified the potential of applying machine learning to many important tasks in the drug discovery process. To maximise the value of AI applications, it is critical to access enough good-quality data to allow machine learning algorithms to extract relevant knowledge and produce useful, predictive models. One of the main challenges in AI is therefore to compile such valuable datasets, and this task is particularly difficult in the domain of drug discovery owing to the confidential nature of the primary information: the chemical structure. Although limited public data are available, the most valuable knowledge is embedded in corporate data, which cannot easily be shared without disclosing private information. As a consequence, valuable information is kept isolated in private silos for practical reasons, despite the willingness of industry to share non-competitive knowledge. To overcome this obstacle, Lhasa Limited has developed a methodology to transfer knowledge from corporate data to federative models whilst preserving the privacy of the original data. The method is based on the Teacher-Student approach [1], adapted to the domain of molecular informatics. Teacher-Student is a two-step transfer method: first, a private teacher model is trained on the proprietary data; the teacher is then used to label public data, which in turn is used to train a public student model. This indirection protects the original proprietary information, since no structures or descriptors of the proprietary data are ever disclosed, and it also guards against query-based attacks. The result is knowledge sharing without privacy leakage. In this presentation we will show how this methodology can be successfully applied to transfer knowledge from confidential hERG data contributed by pharmaceutical companies into a useful and accurate model, without disclosing any chemical structures.
We will demonstrate that the student model is able to outperform any of the individual teacher models by learning the knowledge distributed across the ensemble of teachers.
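The two-step transfer described above can be sketched as follows. This is a minimal illustration only: the datasets are synthetic stand-ins for molecular descriptors, the hERG-style labelling rule is invented, and the model choices (random forest teachers, logistic regression student, majority-vote aggregation across teachers) are assumptions, since the abstract does not specify the learning algorithms used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_features = 16

# Public, non-confidential structures (random stand-ins for descriptors).
X_public = rng.random((500, n_features))

def true_label(X):
    # Hypothetical ground truth standing in for e.g. hERG activity.
    return (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Step 1: each organisation trains a private teacher on its own
# proprietary data; that data never leaves the silo.
teachers = []
for seed in range(3):
    local_rng = np.random.default_rng(seed)
    X_private = local_rng.random((200, n_features))
    teacher = RandomForestClassifier(n_estimators=50, random_state=seed)
    teacher.fit(X_private, true_label(X_private))
    teachers.append(teacher)

# Step 2: the teachers label the public data; a majority vote aggregates
# their knowledge without exposing any private structure or descriptor.
votes = np.stack([t.predict(X_public) for t in teachers])
y_pseudo = (votes.sum(axis=0) > len(teachers) / 2).astype(int)

# Step 3: train the shareable student model on the pseudo-labelled
# public data; only the student is ever distributed.
student = LogisticRegression(max_iter=1000)
student.fit(X_public, y_pseudo)
```

Only the public descriptors and the teachers' aggregated labels reach the student, which is the indirection that blocks query-based recovery of the proprietary structures.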

[1] N. Papernot, M. Abadi, Ú. Erlingsson, I. Goodfellow and K. Talwar. Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data. International Conference on Learning Representations (ICLR), 2017.