Poster 6: Use of Finite Mixture Models for Clustering Chemical Compounds and Novelty Detection
Damjan Krstajic¹, Simon Thomas²
¹Research Centre for Cheminformatics, Serbia
²Cyprotex Discovery Ltd, UK
The finite mixture model (FMM) provides a statistical approach to clustering. In an FMM, each cluster is associated with a component probability distribution. Selecting the number of clusters and the appropriate clustering method can therefore be treated as a statistical model selection problem. Outliers are handled by adding one or more components that represent a different distribution for outlying data.
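As a minimal sketch of how choosing the number of clusters becomes model selection, the Python below fits one-dimensional Gaussian mixtures by expectation-maximisation and compares them with the Bayesian information criterion (BIC). The single synthetic descriptor, the Gaussian components, the use of BIC and every function name are assumptions made for illustration; the abstract does not specify the descriptors, component distributions or selection criterion actually used.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, weights, means, variances):
    """Density of the whole mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))

def fit_mixture(data, k, n_iter=200, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM (fixed number of iterations)."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: means at evenly spaced quantiles, one shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    spread = max(sum((x - centre) ** 2 for x in data) / n, min_var)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            resp.append([j / total for j in joint])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # floor avoids degenerate components
    loglik = sum(math.log(max(mixture_density(x, weights, means, variances), 1e-300))
                 for x in data)
    return weights, means, variances, loglik

def bic(loglik, k, n):
    """BIC with 3k - 1 free parameters (k means, k variances, k - 1 weights)."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)

def select_number_of_clusters(data, candidates):
    """Return the candidate k with the lowest BIC and the BIC of every candidate."""
    scores = {k: bic(fit_mixture(data, k)[3], k, len(data)) for k in candidates}
    return min(scores, key=scores.get), scores

# Synthetic example: two well-separated groups on one hypothetical descriptor.
group_a = [NormalDist(0.0, 1.0).inv_cdf((i + 0.5) / 60) for i in range(60)]
group_b = [NormalDist(10.0, 1.0).inv_cdf((i + 0.5) / 60) for i in range(60)]
best_k, bic_scores = select_number_of_clusters(group_a + group_b, [1, 2, 3])
```

Lower BIC is better here, so `best_k` is the number of components that best trades goodness of fit against the number of parameters. Real compound data would need multivariate descriptors and a mixture family suited to them.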
We use FMMs with various clustering methods to analyse the diversity of compound sets and to explore ways of grouping similar compounds. We also use them for novelty detection: given a new compound, we calculate the probability that it belongs to each cluster. Finally, we use FMMs as a pre-processing step in the generation of QSAR models, in order to decide the level of similarity at which to operate (e.g. whether to develop models at a 'global' or a 'local' level).
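The membership calculation can be sketched as follows, assuming a mixture that has already been fitted. The Python below computes the posterior probability that a new compound belongs to each cluster or to a uniform outlier component, which is one possible way of representing outlying data. The Gaussian clusters, the uniform outlier component, the descriptor range and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def membership(x, clusters, noise_weight, low, high):
    """Return posterior probabilities [cluster_1, ..., cluster_k, outlier] for x.

    clusters is a list of (weight, mean, sd) tuples; the cluster weights plus
    noise_weight are assumed to sum to one.
    """
    joint = [w * normal_pdf(x, m, s) for w, m, s in clusters]
    # Uniform outlier component over the assumed descriptor range [low, high].
    noise_density = 1.0 / (high - low) if low <= x <= high else 0.0
    joint.append(noise_weight * noise_density)
    total = sum(joint)
    if total == 0.0:
        raise ValueError("x lies outside the range covered by the model")
    return [j / total for j in joint]

# Hypothetical fitted model: two clusters plus a 5% uniform outlier component.
clusters = [(0.57, 0.0, 1.0), (0.38, 5.0, 1.0)]
familiar = membership(0.2, clusters, noise_weight=0.05, low=-10.0, high=30.0)
novel = membership(20.0, clusters, noise_weight=0.05, low=-10.0, high=30.0)
```

In this sketch the last entry of each result is the probability of the outlier component. It is small for `familiar`, which lies close to the first cluster, and close to one for `novel`, which lies far from both clusters, so it can serve as a simple novelty score.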
We will illustrate these concepts with clusters of compounds generated by FMMs on various datasets, and discuss the possibilities for using finite mixtures in QSAR.