Community Research and Development Information Service - CORDIS

LAURIN

The project focused on newspaper clipping archives and libraries, that is, collections of articles cut from current newspapers. This work is normally done with scissors and glue, files and copying machines. The aim of the project was to replace this manual work with electronic tools. This objective has been fully achieved, and the LAURIN software package is now in use under real-world conditions.

A second objective was to set up the prototype of a distributed database and network comprising several partners from Europe. This objective has also been achieved, but the network is still at the prototype stage and has therefore not yet been put into daily use.

Results:
1. LAURIN Software I; the image capturing tool “libClip”: LibClip 3.2, developed by the Austrian company Improx on the basis of layout analysis technology from the German company CCS-GmbH, is an image capturing tool built specifically for the clipping procedure. Newspaper pages up to A0 format can be scanned; individual articles are then automatically separated from the surrounding articles, pasted onto an A4 (or any other format) page, OCR processed, corrected and exported to a database via an ODBC gateway. The automatic extraction also allows semi-automatic recording of bibliographic index data such as the title, subtitle, caption line or author of an article. Other data, such as the rubric (e.g. “foreign affairs”), have to be allocated manually.
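
The split between semi-automatically captured and manually allocated index data can be pictured as a simple record. The following is a minimal sketch only: the class name, the field names and the rule for when a record counts as complete are assumptions made for illustration and are not taken from libClip itself.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: a record is treated as exportable once these fields are filled.
REQUIRED_FIELDS = ("title", "rubric")


@dataclass
class ClippingRecord:
    """Hypothetical index record for one extracted article."""

    # Captured semi-automatically during extraction, per the description above
    title: Optional[str] = None
    subtitle: Optional[str] = None
    caption: Optional[str] = None
    author: Optional[str] = None
    # Allocated manually by the operator
    rubric: Optional[str] = None
    # Corrected OCR text of the article
    ocr_text: str = ""

    def missing_fields(self) -> List[str]:
        """Return the required fields that are still empty."""
        return [name for name in REQUIRED_FIELDS if not getattr(self, name)]

    def ready_for_export(self) -> bool:
        """True when no required field is missing."""
        return not self.missing_fields()
```

In this sketch a record with a recognised title but no rubric would report `rubric` as missing until an operator fills it in, which mirrors the manual step described above.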

2. LAURIN Software II; Local Database Tools and Indexing Tool:
The Index Tool, together with some other tools for administering the database, has been developed by the Italian software vendor CM-Sistemi. The software runs under Windows NT or 98; the relational database is implemented with the Oracle 8i DBMS. The main functions of the indexing interface are to list all articles that still have to be indexed, to allow quality control of the basic index data and to allocate thesaurus terms to the articles.
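
The three functions of the indexing interface can be sketched against a small relational schema. This is an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for Oracle 8i, and the table names, column names and function names are hypothetical rather than those of the LAURIN database.

```python
import sqlite3
from typing import List, Tuple


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE article (
            id      INTEGER PRIMARY KEY,
            title   TEXT NOT NULL,
            checked INTEGER NOT NULL DEFAULT 0  -- 1 once quality control is done
        );
        CREATE TABLE article_term (
            article_id INTEGER NOT NULL REFERENCES article(id),
            term       TEXT NOT NULL,
            PRIMARY KEY (article_id, term)
        );
        """
    )
    return conn


def articles_to_index(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """List articles that have no thesaurus term allocated yet."""
    cursor = conn.execute(
        """
        SELECT a.id, a.title
        FROM article a
        LEFT JOIN article_term t ON t.article_id = a.id
        WHERE t.article_id IS NULL
        ORDER BY a.id
        """
    )
    return cursor.fetchall()


def mark_checked(conn: sqlite3.Connection, article_id: int) -> None:
    """Record that the basic index data of an article passed quality control."""
    conn.execute("UPDATE article SET checked = 1 WHERE id = ?", (article_id,))


def allocate_term(conn: sqlite3.Connection, article_id: int, term: str) -> None:
    """Attach a thesaurus term to an article; repeated allocations are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO article_term (article_id, term) VALUES (?, ?)",
        (article_id, term),
    )
```

Defining the work list as "articles without any allocated term" is one possible design choice; a real system might instead track an explicit indexing status per article.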

3. LAURIN Software III; Thesaurus and Thesaurus Management Tools:
The heart of the indexing system is the LAURIN thesaurus, which has been developed by the University of Innsbruck. Both the thesaurus and the software are designed throughout for multilingual usage. The LAURIN thesaurus has a number of aspects which will be described in detail in a separate paper. Its two most notable characteristics are, firstly, that authority data such as names of persons, institutions and organisations, as well as geographical names, are handled together with the “classic” subject headings; and secondly, that these authority data are connected with the subject headings by a number of special relation types which extend the set of relations commonly provided by the ISO standard on thesauri. Currently the LAURIN thesaurus comprises some 6,000 subject headings (many of them multilingual), some 30,000 persons and some 200,000 geographical entries. Altogether, around 540,000 relations are currently used to connect these concepts.
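
The idea of keeping authority data and subject headings in one structure, linked by typed relations, can be sketched as follows. This is a minimal illustration, not the LAURIN data model: BT, NT and RT are the usual broader, narrower and related term relations of thesaurus standards, while the special relation names, the concept kinds and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Relations commonly provided by thesaurus standards
STANDARD_RELATIONS = {"BT", "NT", "RT"}
# Hypothetical special relations linking authority data to subject headings
SPECIAL_RELATIONS = {"LOCATED_IN", "MEMBER_OF", "ACTIVE_IN_FIELD"}


@dataclass
class Concept:
    concept_id: str
    kind: str  # assumed kinds: "subject", "person", "organisation", "place"
    labels: Dict[str, str] = field(default_factory=dict)  # language code -> label


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}
        # Each relation is stored as (source id, relation type, target id)
        self.relations: Set[Tuple[str, str, str]] = set()

    def add(self, concept: Concept) -> None:
        self.concepts[concept.concept_id] = concept

    def relate(self, source: str, rel_type: str, target: str) -> None:
        """Connect two known concepts with a standard or special relation."""
        if rel_type not in STANDARD_RELATIONS | SPECIAL_RELATIONS:
            raise ValueError(f"unknown relation type: {rel_type}")
        for concept_id in (source, target):
            if concept_id not in self.concepts:
                raise KeyError(concept_id)
        self.relations.add((source, rel_type, target))

    def related(self, concept_id: str, rel_type: Optional[str] = None) -> List[str]:
        """Return target ids related to a concept, optionally by one type."""
        return sorted(
            target
            for source, rel, target in self.relations
            if source == concept_id and (rel_type is None or rel == rel_type)
        )

    def label(self, concept_id: str, lang: str, fallback: str = "en") -> str:
        """Return the label in the requested language, else the fallback."""
        labels = self.concepts[concept_id].labels
        return labels.get(lang) or labels.get(fallback) or concept_id
```

Because persons, places and subject headings share one concept table, a single lookup over the relations can move from a person to the subject fields connected with that person.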

4. LAURIN Software IV; Search Interface:
The HTML-based search interface is currently designed for internal use only. An advanced version will be developed once the implementation phase has been finished, because a user-friendly search interface can only be developed on the basis of real-world data. Once the IZA database comprises some ten thousand articles, it will be much easier to define requirements and specifications for this interface.

Contact

Guenter MUEHLBERGER
Tel.: +43-512-5079050
Fax: +43-512-5072607
Record Number: 26423 / Last updated on: 2003-01-17
Collaboration sought: Further research or development support