TITLE: Webmining: Learning from the World Wide Web

AUTHORS: J. Larsen, L.K. Hansen, A. Szymkowiak, T. Christiansen and T. Kolenda
Informatics and Mathematical Modelling, Building 321
Technical University of Denmark, DK-2800 Lyngby, Denmark
emails: {lkhansen,jl,thko}@imm.dtu.dk
www: http://eivind.imm.dtu.dk


Automated analysis of the World Wide Web is a new and challenging area relevant to many applications, e.g., retrieval, navigation, and organization of information, automated information assistants, and e-commerce. This paper discusses the use of unsupervised and supervised learning methods for user behavior modeling and for content-based segmentation and classification of web pages. The modeling is based on independent component analysis and hierarchical probabilistic clustering techniques.

Appears in a special issue of Computational Statistics and Data Analysis, 2001.