European Multilingual Information Retrieval

Objective

The aim of the EMIR project is to validate, by means of a prototype, a linguistic and statistical approach to the indexing of free text and the multilingual querying of text databases. The final goal is to let users query, in their own language, text databases written in various languages. It will also make it possible to query, in a single language, databases that contain texts in several different languages.
A feasibility study is being carried out into the automatic indexing of free text and the multilingual querying of text databases. By the end of the study, the tools and utilities designed for these purposes will have been embodied in a demonstration prototype. To develop it, existing tools will be used for tasks such as automatic indexing, which is based on statistical methods and on a linguistic treatment that employs morphological and syntactic analysis. As part of the formatted database, the automatic indexing method produces a statistical model that can be used during the query answering phase to sort documents by relevance. Monolingual queries in natural language can use a reformulation expert system that has a large vocabulary at its disposal. Work has started from an existing English/French prototype, which is being extended to an English/German pair; this requires the development of an analyzer for German. The French/German pair will follow, resulting in a trilingual query system. The methods and tools could then be applied to other languages. Multilingual text databases will be employed.
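The idea of using index-time statistics to sort documents at query time can be illustrated with a minimal sketch. The source does not say which statistical model EMIR uses, so this example assumes a simple TF-IDF style weighting as a stand-in; the function names (`tokenize`, `build_index`, `rank`) and the sample documents are hypothetical, and the tokenizer is only a crude placeholder for real morphological and syntactic analysis.

```python
import math
from collections import Counter

def tokenize(text):
    # Crude placeholder for linguistic analysis: lowercase and strip punctuation.
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]

def build_index(docs):
    """Indexing phase: collect term counts per document and an IDF weight per term."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(docs)
    # Assumed weighting (not from the source): rarer terms get higher weight.
    idf = {term: math.log(1 + n_docs / df) for term, df in doc_freq.items()}
    return term_counts, idf

def rank(query, term_counts, idf):
    """Query phase: return ids of matching documents, most relevant first."""
    terms = tokenize(query)
    scores = {}
    for doc_id, counts in term_counts.items():
        score = sum(counts[t] * idf.get(t, 0.0) for t in terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical sample collection.
docs = {
    "d1": "Automatic indexing of free text databases.",
    "d2": "Multilingual query of text databases in several languages.",
    "d3": "A German parser for splitting compounds.",
}
term_counts, idf = build_index(docs)
ranking = rank("multilingual query", term_counts, idf)
```

Here `build_index` plays the role of the indexing phase and `rank` the role of the query answering phase; a real system would replace the scoring formula with whatever model the indexer actually produces.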

A first prototype of the bilingual French-English query system has been developed. It is based on word-for-word translations.
A second prototype, capable of taking multi-word units and expressions into account, is currently in the experimental stage.
The final version of the bilingual prototype, integrating both kinds of translation, will be ready at the end of 1993. At the same time, a first version of the German monolingual prototype has been developed. Its linguistic analysis comprises a morphological analysis, which handles single-word compounds and relies on the full term dictionary, and a syntactic analysis, which includes grammatical disambiguation and a simplified recognition of dependency relations.
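The two translation strategies of the bilingual prototypes, word-for-word lookup and lookup of multi-word expressions, can be combined roughly as follows. This is only a sketch under stated assumptions: the dictionaries `WORD_DICT` and `PHRASE_DICT`, their entries, and the greedy longest-match strategy in `translate_query` are all hypothetical and are not taken from the project.

```python
# Hypothetical toy French-to-English dictionaries (not the project's lexicons).
WORD_DICT = {
    "base": ["base"],
    "de": ["of"],
    "données": ["data"],
    "recherche": ["search", "research"],
}
PHRASE_DICT = {
    ("base", "de", "données"): ["database"],
}

def translate_query(tokens, max_len=3):
    """Translate a tokenized query.

    Multi-word entries of up to max_len tokens are tried first (longest match);
    otherwise the word is translated on its own. Each output item is the list
    of candidate translations for one source unit.
    """
    result = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in PHRASE_DICT:
                result.append(PHRASE_DICT[phrase])
                i += n
                break
        else:
            # No multi-word entry matched: fall back to word-for-word lookup.
            # Words missing from the dictionary are kept untranslated.
            result.append(WORD_DICT.get(tokens[i], [tokens[i]]))
            i += 1
    return result

query = ["recherche", "base", "de", "données"]
combined = translate_query(query)             # multi-word entries enabled
word_only = translate_query(query, max_len=1)  # pure word-for-word behaviour
```

With `max_len=1` the function behaves like a purely word-for-word translator, so "base de données" becomes three separate words; with the default setting it is translated as the single unit "database".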
The system developed within the project must be domain independent. When a new domain is processed, little work is needed to adapt the dictionaries, and tools developed within the project help the user to perform this adaptation. More specifically, a semi-automatic method has been developed to extract compounds and their translations from texts that have already been translated.

To demonstrate the generality of the approach, experiments are being carried out on three languages: English, French, and German. Work on the English-French and French-German pairs is currently in progress. The German parser has been developed within the framework of the project. It specifically handles the splitting of compounds, which is crucial for information retrieval systems.
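To show why compound splitting matters for retrieval, here is a minimal sketch of a dictionary-based splitter. The source does not describe how the EMIR German parser splits compounds, so the recursive lookup below, the tiny `LEXICON`, and the `LINKERS` list of linking elements are assumptions made purely for illustration.

```python
# Hypothetical mini-lexicon of known German word forms (lowercased).
LEXICON = {"daten", "bank", "abfrage", "information", "system", "text"}

# Assumed linking elements that may appear between compound parts.
LINKERS = ("", "s", "n")

def split_compound(word, lexicon=LEXICON):
    """Split a compound into known parts, or return None if no split is found."""
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try the longest known head first, then split the remainder recursively.
    for i in range(len(word) - 1, 0, -1):
        head = word[:i]
        if head not in lexicon:
            continue
        rest = word[i:]
        for linker in LINKERS:
            if rest.startswith(linker):
                tail = split_compound(rest[len(linker):], lexicon)
                if tail:
                    return [head] + tail
    return None

parts = split_compound("Datenbankabfrage")
linked = split_compound("Informationssystem")
```

Once "Datenbankabfrage" is indexed under its parts, a query containing only "Abfrage" or "Datenbank" can still retrieve the document, which is the motivation for handling compounds at indexing time.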

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the EU’s priorities for research and innovation.

FP2-ESPRIT 2 - European strategic programme (EEC) for research and development in information technologies (ESPRIT), 1987-1992

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

Data not available

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

Data not available

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

Data not available

Coordinator

Commissariat à l'Energie Atomique (CEA)

EU contribution

No data

Address

Centre d'Études de Saclay
91191 Gif-sur-Yvette
France

Total cost

No data

Participants (3)

SYSTEX

France

EU contribution

No data

Address

FERME DU MOULON
91190 GIF-SUR-YVETTE

Total cost

No data

Transmodul Software GmbH

Germany

EU contribution

No data

Address

Am Staden 18
66121 Saarbrücken

Total cost

No data

UNIVERSITE DE LIEGE

Belgium

EU contribution

No data

Address

PLACE DU AOUT, 7
4000 LIEGE

Total cost

No data
