A proposal for cross language information retrieval and organisation of text and audio documents

Objective


The project produced all its deliverables and, with one exception, achieved its planned outcomes, resulting in a highly usable cross language information retrieval system that was extensively tested at the project's user sites. The exception concerned the original intention to marry cross language retrieval with cross language speech retrieval, producing a system that could retrieve both documents in a language different from the query and documents in a different medium (i.e. spoken words) from the query. As described in deliverable D4-2 (a mid-term project re-evaluation deliverable), the intention was to start exploring speech retrieval in the second half of the project. Before starting that work, however, it was judged prudent to review the state of the art in current research and the requirements of the user groups.

The review concluded that speech retrieval was a lower priority for Clarity partners than anticipated, and that recent advances in research on the retrieval of spoken documents were producing results that called the necessity of the proposed research into question. Alternative work was therefore described in D4-2 and pursued in the remaining time of the project. Clarity research was extended to the recognition and classification of Named Entities (NEs), using an NE identification system for one language to match and classify names in another. Named entity identification was judged central to the reporting functionalities of Clarity, i.e. the multi-document reports for filtering the document collections. The aim was to utilise an existing system that performed well at NE identification in one language (preferably English) and to devise techniques for mapping the identified names across languages. This was achieved and integrated into the Clarity reporting system created by Clarity partner SICS.

Funding Scheme

CSC - Cost-sharing contracts

Coordinator

THE UNIVERSITY OF SHEFFIELD
Address
Firth Court, Western Bank
S10 2TN Sheffield
United Kingdom

Participants (6)

ALMA MEDIA CORPORATION
Finland
Address
Etelaeesplanadi 14
00101 Helsinki
BRITISH BROADCASTING CORPORATION
United Kingdom
Address
Broadcasting House
W1A 1AA London
SABIEDRIBA AR IEROBEZOTU ATBILDIBU "TILDE"
Latvia
Address
Vienibas Gatve 75 A
1004 Riga
SICS, SWEDISH INSTITUTE OF COMPUTER SCIENCE AB
Sweden
Address
Isafjordsgatan 22
1263 Kista
SWEDISH INSTITUTE OF COMPUTER SCIENCE
Sweden
Address
Isafjordsgatan 22
164 29 Kista
TAMPEREEN YLIOPISTO
Finland
Address
Kalevantie 4
33014 Tampere