TITLE: Probabilistic Hierarchical Clustering with Labeled and Unlabeled Data
AUTHORS: Jan Larsen, Anna Szymkowiak, Lars Kai Hansen
Informatics and Mathematical Modelling, Building 321
Technical University of Denmark, DK-2800 Lyngby, Denmark
emails: asz,jl,lkhansen@imm.dtu.dk
www: http://eivind.imm.dtu.dk
ABSTRACT:
This paper presents hierarchical
probabilistic clustering methods for unsupervised and supervised
learning in data mining applications, where supervised learning is
performed using both labeled and unlabeled examples.
The probabilistic clustering is based on the previously
suggested Generalizable Gaussian Mixture model and is extended
using a modified Expectation Maximization procedure for learning with both
unlabeled and labeled examples. The proposed hierarchical scheme
is agglomerative and based on probabilistic similarity measures. Here, we
compare an L2 dissimilarity measure, an error confusion similarity, and
an accumulated posterior cluster probability measure. The unsupervised and
supervised schemes are successfully tested on
artificial data and on e-mail segmentation.
Invited submission for the International Journal of Knowledge-Based Intelligent Engineering
Systems, 2001.
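
As a rough illustration of the agglomerative step mentioned in the abstract, the sketch below computes an L2 dissimilarity between two Gaussian cluster densities and picks the pair of clusters that would be merged first. It assumes each cluster is a single Gaussian with known mean and covariance, and it uses the integrated squared difference between the two densities, which has a closed form. The function names (gauss_overlap, l2_dissimilarity, closest_pair) are hypothetical, and the exact normalization of the measure used in the paper may differ.

```python
import numpy as np

def gauss_overlap(mu1, cov1, mu2, cov2):
    """Closed form of the integral over x of N(x; mu1, cov1) * N(x; mu2, cov2).

    The result equals a Gaussian with covariance cov1 + cov2 evaluated
    at the difference of the two means.
    """
    d = mu1 - mu2
    s = cov1 + cov2
    k = len(mu1)
    norm = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(s))
    return float(np.exp(-0.5 * d @ np.linalg.solve(s, d)) / norm)

def l2_dissimilarity(mu1, cov1, mu2, cov2):
    """Integrated squared difference between two Gaussian densities.

    Assumption: unnormalized L2 distance; not necessarily the paper's form.
    """
    return (gauss_overlap(mu1, cov1, mu1, cov1)
            + gauss_overlap(mu2, cov2, mu2, cov2)
            - 2.0 * gauss_overlap(mu1, cov1, mu2, cov2))

def closest_pair(means, covs):
    """Return indices (i, j), i < j, of the two least dissimilar clusters."""
    best, best_pair = np.inf, None
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            dist = l2_dissimilarity(means[i], covs[i], means[j], covs[j])
            if dist < best:
                best, best_pair = dist, (i, j)
    return best_pair

# Toy example: two nearby clusters and one distant cluster.
means = [np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.eye(2), np.eye(2)]
pair = closest_pair(means, covs)
```

In a full agglomerative scheme the selected pair would be merged and the dissimilarities recomputed until a single cluster remains; the merged cluster is then a mixture rather than a single Gaussian, which this sketch does not handle.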