Self Organising Hypotheses Networks: A New QSAR Approach to Automatically Discover and Organize Knowledge
Thierry Hanser1, Chris G. Barber1, Edward Rosser1, Richard J. Sherhod1, Samuel J. Webb1, Stéphane Werner1
1Lhasa Ltd, 22-23 Blenheim Terrace, Woodhouse Lane, Leeds, LS2 9HD, UK
Structure-activity relationship (SAR) elucidation is widely used in drug development and risk assessment. Many methodologies and tools have been developed to help predict activity or toxicity. Among these approaches are expert systems, which produce accurate and highly interpretable predictions, and machine learning techniques, which, while accurate, offer less transparency. Whereas expert systems require experts to compile rules manually, machine learning techniques are fully automated and can digest large datasets efficiently. Lhasa Limited aims to combine the best of both approaches to mine datasets and automatically extract knowledge that yields accurate and transparent predictive models. To this end, we developed a new knowledge discovery and organisation methodology called SOHN (Self Organising Hypotheses Network).
The key concept underlying the SOHN approach is a hypothesis. A hypothesis can be seen as a local model that can be more or less general. A hypothesis simply expresses a predictive relationship between a given property of a compound (structural, physico-chemical, pharmacophoric, etc.) and the studied endpoint (activity, toxicity, etc.). For instance, a hypothesis could be in the form “If the structure contains a given structural motif then it is potentially mutagenic” or “If the structure’s molecular weight is greater than 550 then it is unlikely to be mutagenic”. A hypothesis is therefore a simple and interpretable prediction element. Individual examples in the analysed dataset are either covered by a given hypothesis (i.e. the hypothesis applies to that example) or they are outside the scope of the hypothesis. Covered examples form the supporting examples for the given hypothesis and the activity distribution of these examples determines the predictive value for the hypothesis.
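As a rough illustration of the hypothesis concept, the sketch below models a hypothesis as a predicate over compounds and derives its supporting examples and the fraction of active examples among them. It is not the authors' implementation: the `Compound` and `Hypothesis` classes, the motif names, and the five toy compounds are all invented for illustration, and a real system would work on chemical structures rather than pre-computed motif labels.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Compound:
    # Hypothetical, simplified compound record (not a real chemistry type).
    name: str
    mol_weight: float
    motifs: frozenset
    active: bool  # observed outcome for the studied endpoint


@dataclass(frozen=True)
class Hypothesis:
    # A hypothesis is a label plus a test saying whether it applies.
    label: str
    covers: Callable[[Compound], bool]


def supporting_examples(h: Hypothesis, dataset: List[Compound]) -> List[Compound]:
    """Return the examples that fall inside the scope of the hypothesis."""
    return [c for c in dataset if h.covers(c)]


def active_fraction(h: Hypothesis, dataset: List[Compound]) -> Optional[float]:
    """Fraction of active supporting examples, or None if nothing is covered."""
    support = supporting_examples(h, dataset)
    if not support:
        return None
    return sum(c.active for c in support) / len(support)


# Invented toy data, used only to exercise the functions above.
dataset = [
    Compound("c1", 180.0, frozenset({"aromatic_nitro"}), True),
    Compound("c2", 210.0, frozenset({"aromatic_nitro"}), True),
    Compound("c3", 250.0, frozenset({"aromatic_nitro", "sulfonate"}), False),
    Compound("c4", 600.0, frozenset(), False),
    Compound("c5", 320.0, frozenset({"epoxide"}), True),
]

has_nitro = Hypothesis("contains aromatic_nitro",
                       lambda c: "aromatic_nitro" in c.motifs)
heavy = Hypothesis("molecular weight > 550",
                   lambda c: c.mol_weight > 550)

for h in (has_nitro, heavy):
    names = [c.name for c in supporting_examples(h, dataset)]
    print(h.label, names, active_fraction(h, dataset))
```

In this toy setting, the activity distribution of the covered examples is reduced to a single active fraction; a richer summary (counts per class, for example) would serve the same role.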
The first important step of the SOHN approach is to identify the most relevant hypotheses for a given endpoint; this can be achieved through statistical analysis of a dataset using machine learning techniques and information theory. Once the key hypotheses have been mined from the dataset, they are organised according to their degree of abstraction, from the most generic to the most specific. This step leads to a network of hypotheses in which a path between two nodes expresses the “is more generic than” relationship. To decide whether a hypothesis H1 is more generic than a hypothesis H2, we compare their respective sets of supporting examples S1 and S2. If S2 is a subset of S1, then H1 is more generic than H2 and will therefore be an ancestor of H2 in the SOHN. Because this definition relies only on supporting examples, it allows us to combine hypotheses of different natures (structural, physico-chemical, pharmacophoric, etc.) into a single network.
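The subset-based ordering can be sketched as follows. This is a minimal illustration under stated assumptions, not the published algorithm: each hypothesis is represented only by the set of identifiers of its supporting examples, a strict subset is required (how hypotheses with identical support are handled is not specified here), and only direct parent-to-child links are stored so that longer "is more generic than" relationships appear as paths. The function name `build_network` and the hypothesis names are hypothetical.

```python
from typing import Dict, FrozenSet, List


def build_network(support: Dict[str, FrozenSet[str]]) -> Dict[str, List[str]]:
    """Map each hypothesis to its direct (most general) children.

    H1 is an ancestor of H2 when S2 is a strict subset of S1. A direct
    link is kept only when no third hypothesis sits strictly between them.
    """
    names = list(support)
    children: Dict[str, List[str]] = {h: [] for h in names}
    for parent in names:
        for child in names:
            if not support[child] < support[parent]:
                continue  # child is not strictly more specific than parent
            has_intermediate = any(
                support[child] < support[mid] < support[parent]
                for mid in names
            )
            if not has_intermediate:
                children[parent].append(child)
    return children


def roots(children: Dict[str, List[str]]) -> List[str]:
    """Hypotheses with no parent, i.e. the most generic nodes."""
    has_parent = {c for kids in children.values() for c in kids}
    return [h for h in children if h not in has_parent]


# Invented supporting-example sets, keyed by hypothetical hypothesis names.
support = {
    "nitro": frozenset({"c1", "c2", "c3", "c6"}),
    "nitro_no_sulfonate": frozenset({"c1", "c2"}),
    "small_nitro": frozenset({"c1"}),
    "heavy": frozenset({"c4"}),
}

network = build_network(support)
print(network)
print(roots(network))
```

Because the comparison uses only example sets, the same code applies whether a hypothesis was derived from a structural, physico-chemical, or pharmacophoric property.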
The resulting network of hypotheses (SOHN) becomes a powerful and versatile knowledge organisation. SOHNs can be used as transparent predictive models as well as information-rich organisations to support knowledge discovery (expert rule compilation, activity cliff detection, lead optimisation).
When a SOHN is used as a predictive model, the query compound is first compared to the most generic hypotheses. Hypotheses that cover the query structure are then investigated further by searching their more specific child hypotheses, and so on. This process is repeated until the most specific hypotheses applicable to the query compound have been found. These final hypotheses act as very local and transparent models and can provide good prediction accuracy. Additionally, the examples covered by these specific hypotheses serve as relevant examples to support the outcome of the prediction and can be used to refine the confidence in a prediction using a similarity measure. SOHN models have an associated domain of applicability directly linked to the way hypotheses are defined.
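The top-down search described above might look like the following sketch. It assumes the network is already available as a list of root hypotheses and a parent-to-children map, and that each hypothesis exposes a coverage test; the names `most_specific`, `covers`, and the example hypotheses are hypothetical. Treating a query that no root covers as outside the applicability domain is one plausible reading, not a statement of how the authors define it.

```python
from typing import Callable, Dict, List

Query = Dict[str, object]  # hypothetical, simplified query representation


def most_specific(query: Query,
                  root_names: List[str],
                  children: Dict[str, List[str]],
                  covers: Dict[str, Callable[[Query], bool]]) -> List[str]:
    """Descend from the generic hypotheses to the most specific ones
    that still cover the query. An empty result means no hypothesis applies."""
    result = []
    seen = set()
    stack = [h for h in root_names if covers[h](query)]
    while stack:
        h = stack.pop()
        if h in seen:
            continue  # a node can be reached through several parents
        seen.add(h)
        matching = [c for c in children.get(h, []) if covers[c](query)]
        if matching:
            stack.extend(matching)  # keep refining
        else:
            result.append(h)  # no more specific hypothesis covers the query
    return sorted(result)


# Invented hypotheses and network, for illustration only.
covers = {
    "nitro": lambda q: "aromatic_nitro" in q["motifs"],
    "nitro_no_sulfonate": lambda q: ("aromatic_nitro" in q["motifs"]
                                     and "sulfonate" not in q["motifs"]),
    "heavy": lambda q: q["mol_weight"] > 550,
}
children = {"nitro": ["nitro_no_sulfonate"]}
root_names = ["nitro", "heavy"]

q1 = {"motifs": {"aromatic_nitro"}, "mol_weight": 200.0}
q2 = {"motifs": {"aromatic_nitro", "sulfonate"}, "mol_weight": 600.0}
q3 = {"motifs": set(), "mol_weight": 100.0}

for q in (q1, q2, q3):
    print(most_specific(q, root_names, children, covers))
```

The supporting examples of the returned hypotheses would then be the natural candidates for the similarity-based confidence refinement mentioned in the text, which this sketch does not attempt.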
In conclusion, SOHNs are a new approach to QSAR modelling that provides accurate and transparent predictions along with confidence and applicability domain assessment, thus meeting the OECD guidelines.
This presentation will describe the science of Self Organising Hypotheses Networks and give a practical example of a mutagenicity prediction model based on structural hypotheses. The fragment-based approach used to produce structural hypotheses will be explained. We will also present the results of this model and demonstrate that it performs comparably to other approaches while also providing interpretable conclusions. Finally, we will introduce the notion of confidence in SOHN models and describe how the applicability domain is assessed.