TITLE: Modeling Text with Generalizable Gaussian Mixtures
AUTHORS: Lars Kai Hansen, Sigurdur Sigurdsson, Thomas Kolenda,
Finn Årup Nielsen, Ulrik Kjems and Jan Larsen
Department of Mathematical Modelling, Building 321
Technical University of Denmark, DK-2800 Lyngby, Denmark
emails: lkhansen,siggi,thko,fn,uk,jl@imm.dtu.dk
www: http://eivind.imm.dtu.dk
ABSTRACT:
We apply and discuss generalizable Gaussian mixture (GGM) models for text mining.
These models automatically adapt their complexity to a given
text representation. We show that the generalizability of
these models depends on the dimensionality of the
representation and the sample size. We discuss the relation between
supervised and unsupervised learning in text data. Finally,
we implement a novelty detector based on the density model.
Appears in Proc. of ICASSP-2000, Istanbul, Turkey, June 5-9, 2000.