Community Research and Development Information Service - CORDIS

Abstract

In this paper we present three clustering and visualization techniques for large document collections that have been developed at the JRC: a) an implementation of the neural network application Websom, b) a hierarchical clustering method, and c) a new method for presenting collections in two-dimensional space, which builds on a prior hierarchical clustering. To put these three techniques into context, we describe the general design of a document retrieval, information extraction and visualization system that is being developed at the JRC to support specific users within the European Commission. The description covers the individual components of the system, i.e. a tool to retrieve documents safely from the Internet, a text pre-processing tool that also extracts some basic entities, a language recognizer, a keyword identification tool, a subject domain identifier and a word clustering tool.
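
The abstract does not say how the hierarchical clustering method is implemented. As a rough illustration of the general idea only, the sketch below groups documents bottom-up by repeatedly merging the two most similar clusters. The bag-of-words representation, the cosine similarity measure, the average-link merge rule, and the names `term_vector`, `cosine` and `agglomerate` are all assumptions made for this example. They are not taken from the paper.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case bag-of-words vector for one document (illustrative only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(docs, n_clusters):
    """Average-link agglomerative clustering; returns lists of document indices."""
    vectors = [term_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > n_clusters:
        # Find the most similar pair of clusters and merge it.
        _, left, right = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[left] = clusters[left] + clusters[right]
        del clusters[right]
    return [sorted(c) for c in clusters]


docs = [
    "neural network maps for document collections",
    "a neural network organises document maps",
    "fishing quotas in the north sea",
    "north sea fishing policy and quotas",
]
print(agglomerate(docs, 2))
```

Recording the order of the merges instead of stopping at a fixed number of clusters would give the full hierarchy, which is the kind of structure a two-dimensional presentation could then be derived from.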

Additional information

Authors: ISIS, JRC-ISPRA (IT)
Bibliographic Reference: Paper presented at the 16th International Joint Conference on Artificial Intelligence (1999)
Availability: Available from Public Relations and Publications Unit, Ispra (IT)
Record Number: 199911172 / Last updated on: 1999-08-19
Category: PUBLICATION
Original language: en
Available languages: en