TITLE: Hierarchical Clustering for Datamining

AUTHORS: Anna Szymkowiak, Jan Larsen, Lars Kai Hansen
Informatics and Mathematical Modelling, Building 321
Technical University of Denmark, DK-2800 Lyngby, Denmark
emails: asz,jl,lkhansen@imm.dtu.dk
www: http://eivind.imm.dtu.dk


This paper presents hierarchical probabilistic clustering methods for unsupervised and supervised learning in datamining applications. The probabilistic clustering is based on the previously suggested Generalizable Gaussian Mixture model. A soft version of the Generalizable Gaussian Mixture model is also discussed. The proposed hierarchical scheme is agglomerative and based on an ${\cal L}_2$ distance metric. Unsupervised and supervised schemes are successfully tested on artificial data and on segmentation of e-mails.

Appears in the special session on neural networks and datamining at KES-2001, Fifth International Conference on Knowledge-Based Intelligent Information Engineering Systems & Allied Technologies, Osaka and Nara, Japan, September 6-8, 2001.