TITLE: Modeling Text with Generalizable Gaussian Mixtures

AUTHORS: Lars Kai Hansen, Sigurdur Sigurdsson, Thomas Kolenda, Finn Årup Nielsen, Ulrik Kjems and Jan Larsen
Department of Mathematical Modelling, Building 321
Technical University of Denmark, DK-2800 Lyngby, Denmark
emails: lkhansen,siggi,thko,fn,uk,jl@imm.dtu.dk
www: http://eivind.imm.dtu.dk


We apply and discuss generalizable Gaussian mixture (GGM) models for text mining. The model automatically adapts its complexity to a given text representation. We show that the generalizability of these models depends on the dimensionality of the representation and on the sample size. We discuss the relation between supervised and unsupervised learning for text data. Finally, we implement a novelty detector based on the density model.

Appears in Proc. of ICASSP-2000, Istanbul, Turkey, June 5-9, 2000.