Community Research and Development Information Service - CORDIS


This position paper reports on ongoing work in which three clustering and visualization techniques for large document collections, developed at the JRC, are applied to textual data to support the European Commission's investigation of suspected fraud cases. The techniques are (a) an implementation of the neural network application WEBSOM, (b) a hierarchical clustering method, and (c) a method that presents collections in two-dimensional space based on a prior hierarchical clustering. To put these three techniques into context, we describe the general design of a document retrieval, information extraction and visualization system that is being developed at the Joint Research Centre to support the European Commission's anti-fraud office (OLAF) in its fight against fraud. The description covers the individual components of the system: a tool to retrieve documents safely from the internet, a text preprocessing tool that also extracts some basic entities, a language recognizer, a keyword identification tool, a subject domain identifier and a word clustering tool.
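The record does not say how the JRC hierarchical clustering method (b) works. As a generic illustration only, the following is a minimal sketch of average-linkage agglomerative clustering over documents represented as term-frequency vectors and compared by cosine similarity. The whitespace tokenization, the similarity measure, the linkage criterion and the function names (`tf_vector`, `cosine`, `agglomerative`) are all assumptions made for this example, not the method described in the paper.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lower-cased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def agglomerative(docs):
    """Average-linkage agglomerative clustering (hypothetical helper).

    Returns the merge history as (cluster_a, cluster_b, similarity) tuples,
    where each cluster is a tuple of document indices.
    """
    vecs = [tf_vector(d) for d in docs]
    n = len(vecs)
    # Precompute all pairwise document similarities once.
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    clusters = [(i,) for i in range(n)]  # start with one cluster per document
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                a, b = clusters[x], clusters[y]
                # Average pairwise similarity between members of the two clusters.
                s = sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or s > best[0]:
                    best = (s, x, y)
        s, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((a, b, s))
        # Replace the two most similar clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(tuple(sorted(a + b)))
    return merges
```

For example, `agglomerative(["fraud customs import", "customs import fraud duty", "football match goal", "goal match league"])` first merges the two customs documents, then the two football documents, and finally joins the two groups. The resulting merge tree is the kind of structure on which a two-dimensional presentation such as technique (c) could be built.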

Additional information

Authors: HAGMAN J, JRC-Ispra, ISIS, RMDS, AIM; STEINBERGER R, JRC-Ispra, ISIS, RMDS, AIM; PERROTTA D, DG Infso, Information Society Techn.; VARFIS A, JRC-Ispra, ISIS, MIA
Bibliographic Reference: Paper presented: 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, Lyon (FR), 13-16 September 2000
Availability: JRC, ISIS, RMDS T.P. 361 21020 Ispra (VA), IT
Record Number: 200012003 / Last updated on: 2000-06-05
Original language: en
Available languages: en