
Cross-Language Evaluation Forum

Objective

The project aims to support multilingual information access to European digital libraries by providing a platform for the evaluation of monolingual and cross-language information retrieval systems. The technical infrastructure will include an evaluation protocol and metrics, and a testbed of multilingual training and testing data. Two evaluation campaigns will be organised and the results will be analysed and discussed at annual workshops. Collaborative links will be established with similar system evaluation initiatives in the US and Asia that work on other sets of languages. The end-product will be test-suites of multilingual data that system developers can use for benchmarking purposes. The goal is to assist European cross-language system development in order to guarantee its competitiveness in the global market.

Objectives:
The objective of the CLEF proposal is to support global digital library applications by:
(i) developing an infrastructure for the evaluation, testing and tuning of information retrieval systems operating on European languages in both monolingual and cross-language contexts, and
(ii) creating test-suites of reusable data which can be employed by system developers for benchmarking purposes.

Through the organisation of system evaluation campaigns, the aim is to create a community of researchers and developers studying the same problems and to facilitate future collaborative initiatives between groups with similar interests. CLEF will also establish strong links, exchanging ideas and sharing results, with similar cross-language evaluation initiatives in the US and Asia that work on other sets of languages. The final goal is to assist and stimulate the development of European cross-language retrieval systems in order to guarantee their competitiveness in the global marketplace.

Work description:
The CLEF work schedule is planned in three stages: creation of the evaluation framework; organisation of two successive annual evaluation campaigns; formulation of exploitation and exit plans, and recommendations for future evaluation actions. The activity will begin with a survey of the needs of system developers and end users. The results will provide input for the final definition of the evaluation framework and the specific tasks to be offered by the CLEF campaigns.

These will include:
monolingual (non-English) information retrieval system evaluation;
cross-language text retrieval evaluation;
cross-language domain-specific evaluation.

A second survey will be performed at the end of the first campaign to identify new and emerging requirements.

A technical infrastructure will be implemented supporting:
an evaluation protocol and metrics;
a testbed of multilingual training and testing data;
rules for creating query sets in multiple languages;
procedures to assess the runs submitted by participating systems;
procedures to generate precision and recall measures and produce comparable results (see the sketch following this list);
a discussion forum.
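
As a purely illustrative aside (not part of the CLEF specification), the sketch below shows how precision, recall and average precision might be computed for a single topic from a ranked result list and a set of relevance judgments; the function and data names are hypothetical.

# Illustrative sketch only: set-based precision/recall and average
# precision for one topic, assuming TREC/CLEF-style binary relevance
# judgments (qrels) and a ranked list of retrieved document IDs.

def precision_recall(ranked_docs, relevant_docs):
    """Return (precision, recall) over the full retrieved list."""
    hits = sum(1 for d in ranked_docs if d in relevant_docs)
    precision = hits / len(ranked_docs) if ranked_docs else 0.0
    recall = hits / len(relevant_docs) if relevant_docs else 0.0
    return precision, recall

def average_precision(ranked_docs, relevant_docs):
    """Mean of the precision values at the rank of each relevant document."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant_docs:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_docs) if relevant_docs else 0.0

# Hypothetical example: one topic, five retrieved documents, three relevant.
run = ["doc3", "doc7", "doc1", "doc9", "doc4"]
qrels = {"doc1", "doc3", "doc8"}
print(precision_recall(run, qrels))   # (0.4, 0.666...)
print(average_precision(run, qrels))  # (1/1 + 2/3) / 3 = 0.555...

Comparable results across systems are then typically obtained by averaging such per-topic scores over the full topic set.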

The core set of languages in the multilingual collection will be English, French, German, Italian and Spanish; criteria will be defined for the addition of other European languages during the project lifetime. Two evaluation campaigns will be organised with the aim of investigating different types of mono- and cross-language system issues. The results will be discussed at annual workshops. The end-product will be reusable electronic resources in the form of test-suites that can be used by system developers for future benchmarking activities. Distribution agreements will be negotiated with the data providers and an exploitation plan will be studied to make the test-suites generally available to the interested R&D community. An exit plan proposing mechanisms that could render future evaluation activities self-sustaining will be formulated.

Milestones:
User needs reports;
An evaluation infrastructure for mono- and cross-language information retrieval systems operating on European languages;
Multilingual comparable corpora;
Testing and training data;
Two system evaluation campaigns;
Annual workshops with Proceedings;
Test-suites for system developers;
Exploitation and exit plans.

Funding Scheme

ACM - Preparatory, accompanying and support measures

Coordinator

CONSIGLIO NAZIONALE DELLE RICERCHE
Address
Piazzale Aldo Moro 7
00185 Roma
Italy

Participants (5)

EUROSPIDER INFORMATION TECHNOLOGY AG
Switzerland
Address
Schaffhauserstrasse 18
8006 Zuerich
EVALUATIONS AND LANGUAGE RESOURCES DISTRIBUTION AGENCY
France
Address
55/57 Rue Brillat Savarin
75013 Paris
FUNDACION UNIVERSIDAD EMPRESA
Spain
Address
Serrano Jover 5, 7
28015 Madrid
INFORMATIONSZENTRUM SOZIALWISSENSCHAFTEN DER ARBEITSGEMEINSCHAFT SOZIALWISSENSCHAFTLICHER INSTITUTE E.V.
Germany
Address
Lennestrasse 30
53113 Bonn
NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY
United States
Address
100 Bureau Dr, Stop 08940
20899-8940 Gaithersburg, Maryland