The Internet is growing at an increasing rate, and searching this gigantic digital library for information is becoming correspondingly difficult. An estimate from February 1999 puts the size of the World-Wide Web at about 800 million pages on about 3 million servers [1].

Retrieving text information is a difficult task. The information may be misinterpreted because of ambiguities in natural language, or the user may define the information need imprecisely or vaguely [2]. This calls for improved automatic methods for searching and organizing text documents, so that information of interest can be accessed quickly and accurately.

This introduction gives a short overview of methods in Information Retrieval (IR). We start with the widely used Boolean retrieval method. We then discuss the vector space model, followed by its extension, Latent Semantic Indexing. Finally, we discuss the clustering and classification of documents.