Poster 6: Use of Finite Mixture Models for Clustering Chemical Compounds and Novelty Detection.

Damjan Krstajic¹, Simon Thomas²
¹Research Centre for Cheminformatics, Serbia
²Cyprotex Discovery Ltd, UK
The finite mixture model (FMM) provides a statistical approach to clustering. In an FMM, each cluster is associated with a component probability distribution, so selecting the number of clusters and the appropriate clustering method can be treated as a statistical model-selection problem. Outliers are handled by adding one or more components that represent a different distribution for the outlying data.
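As a rough illustration of these ideas, the sketch below fits Gaussian mixtures with one extra uniform "noise" component for outliers by expectation-maximization (EM), and compares different numbers of clusters using the Bayesian information criterion (BIC). It is a minimal sketch under stated assumptions, not the authors' implementation: the single descriptor, the synthetic data, the uniform outlier component spanning the observed range, and BIC as the model-selection score are all choices made here for illustration. Real compound data would use multivariate descriptors or fingerprints.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_fmm(xs, k, n_iter=100):
    """EM for k Gaussian components plus one uniform outlier component.

    Returns (weights, means, variances, log_likelihood); the last weight
    belongs to the uniform outlier component.
    """
    n = len(xs)
    u = 1.0 / (max(xs) - min(xs))          # uniform density over the observed range
    srt = sorted(xs)
    mean_all = sum(xs) / n
    # Deterministic start: means at evenly spaced quantiles, shared variance
    mus = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    vars_ = [sum((x - mean_all) ** 2 for x in xs) / n] * k
    ws = [1.0 / (k + 1)] * (k + 1)

    def joint(x):
        # Weighted component densities; index k is the outlier component
        return [ws[j] * normal_pdf(x, mus[j], vars_[j]) for j in range(k)] + [ws[k] * u]

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in xs:
            d = joint(x)
            tot = sum(d)
            resp.append([v / tot for v in d])
        # M-step: update mixing weights, means and variances
        for j in range(k + 1):
            nj = max(sum(r[j] for r in resp), 1e-12)
            ws[j] = nj / n
            if j < k:
                mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
                vars_[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    loglik = sum(math.log(sum(joint(x))) for x in xs)
    return ws, mus, vars_, loglik

def bic(loglik, k, n):
    """BIC (lower is better). Free parameters: k means, k variances and
    k + 1 weights constrained to sum to 1, i.e. 3k in total."""
    return -2 * loglik + 3 * k * math.log(n)

# Synthetic data: two compact clusters plus a few scattered outliers
random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(100)]
        + [random.gauss(8.0, 1.0) for _ in range(100)]
        + [random.uniform(-30.0, 30.0) for _ in range(5)])

fits = {k: fit_fmm(data, k) for k in range(1, 5)}
bics = {k: bic(f[3], k, len(data)) for k, f in fits.items()}
best_k = min(bics, key=bics.get)
print("BIC by k:", {k: round(b, 1) for k, b in bics.items()})
print("selected number of clusters:", best_k)
```

With two well-separated synthetic clusters, the BIC for two Gaussian components should be clearly lower than for one, while the uniform component absorbs the scattered points; the exact scores depend on the random seed.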

We use FMMs with various clustering methods to analyse diversity and to explore ways of forming groups of similar compounds. We also use them for novelty detection: in this mode, we calculate the probability that a new compound belongs to each of the clusters. Finally, we use FMMs as a pre-processing step in the generation of QSAR models, in order to decide the level of similarity at which to operate (e.g. whether to develop models at a 'global' or a 'local' level).
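The novelty-detection mode can be sketched as computing posterior membership probabilities for a new compound from an already fitted mixture. The parameters below are hypothetical placeholders (two Gaussian components with diagonal covariances over two made-up descriptors, plus a uniform outlier component over an assumed descriptor box), not values from the authors' datasets. Flagging a compound as novel when the outlier component has the highest posterior is one simple rule chosen here for illustration; a threshold on the mixture density would be another.

```python
import math

# Hypothetical fitted mixture over two scaled descriptors.
# Each Gaussian component: (weight, mean vector, diagonal variances).
COMPONENTS = [
    (0.55, (0.0, 0.0), (1.0, 1.0)),
    (0.40, (6.0, 2.0), (0.5, 2.0)),
]
NOISE_WEIGHT = 0.05          # weight of the uniform outlier component
BOX_VOLUME = 40.0 * 40.0     # assumed 40 x 40 descriptor box for the uniform density

def diag_normal_pdf(x, mean, var):
    """Density of a multivariate normal with diagonal covariance."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * math.log(2 * math.pi * vi) - (xi - mi) ** 2 / (2 * vi)
    return math.exp(log_p)

def membership(x):
    """Posterior probability of each component; the last entry is the outlier component."""
    joint = [w * diag_normal_pdf(x, m, v) for w, m, v in COMPONENTS]
    joint.append(NOISE_WEIGHT / BOX_VOLUME)
    total = sum(joint)
    return [p / total for p in joint]

def is_novel(x):
    """Flag x as novel when the outlier component is the most probable one."""
    probs = membership(x)
    return probs[-1] == max(probs), probs

for compound in [(0.2, -0.4), (6.1, 1.5), (20.0, -15.0)]:
    novel, probs = is_novel(compound)
    print(compound, "novel" if novel else "known", [round(p, 3) for p in probs])
```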

We will illustrate these concepts with clusters of compounds generated by FMMs on various datasets, and discuss the possibilities of using finite mixtures in QSAR.
