CORDIS - EU research results
Content archived on 2024-05-27

Bilingual Automatic Parallel Indexing and Classification

Objective

Most scientific and technical articles published today are written in English, and the number of English documents collected and maintained by information brokers and providers such as bibliographic database producers, libraries and publishers has increased rapidly. However, a significant number of documents remain available only in the native language of the author. Smart indexing processes offer one way to provide reliable and accurate access to this information: they ensure consistent indexing in multiple languages and allow the information to be presented multilingually.

The objective of the BINDEX project is to support information producers by integrating an existing generic solution, the AUTINDEX system, which automatically indexes and classifies documents in English and in German, into the production process of two user organisations. AUTINDEX draws on sophisticated language processing technologies and on existing special-purpose language resources such as thesauri, classification schemes and large lexicons, which have to be adapted to the specific user requirements. Owing to its modular design, the outcome of BINDEX will comprise software utilities for monolingual indexing and classification in English and German and for parallel bilingual indexing and classification, together with appropriate APIs to facilitate the integration of AUTINDEX into existing workflow environments.

Objectives:
The objective of BINDEX is to support information producers by integrating a generic solution, the AUTINDEX system, which automatically indexes and classifies documents in English and in German, into the production process of the two users. The benefit is a quicker, cheaper and more consistent population of information repositories. AUTINDEX draws on sophisticated language processing technologies and on existing special-purpose language resources such as thesauri, classification schemes and large lexicons, which have to be adapted to the specific user requirements. Owing to its modular design, the outcome of the project will be a set of mature, directly applicable software utilities for monolingual indexing and classification in English and German and for parallel bilingual indexing and classification, together with appropriate APIs to facilitate the integration of AUTINDEX into existing workflow environments.

Work description:
The aim of the BINDEX trial is to adopt the prototype of the AUTINDEX system, which automatically indexes and classifies bilingual documents, into the production process of the two users involved. The outcome will be a mature, directly applicable software utility for bilingual (English and German) indexing and classification. Thanks to a modular approach and well-defined APIs, the system could easily be extended to cover other languages as well. The AUTINDEX approach is based on a controlled vocabulary and advanced natural language processing technologies. The controlled vocabulary is provided by a classical thesaurus together with a specialised bilingual dictionary, which merges the IAI German-English and English-German transfer dictionaries with a so-called conversion dictionary that maps different descriptor types in one language into the other language. The linguistic processing provides all the information necessary to assign the thesaurus concepts to the words of the documents, including multiword units (that is, to perform the indexing). It does so by means of a morpho-syntactic analysis and a term recognition component based on shallow parsing combined with statistical techniques. Classification of documents is likewise based on the output of the linguistic processing and on the classification schemes already in use on the users' side. Within this trial, the AUTINDEX system will be adapted to the requirements of the two users involved: in a first step the monolingual modules of the complete system will be adapted and improved, and in a second phase the bilingual component will be enhanced, so that all modules then have the same functional level. The whole system will be implemented as a web service; appropriate multilingual user interfaces will therefore be developed, as well as APIs to integrate the system into the production cycle of the potential users.
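The indexing idea described above (assigning thesaurus descriptors to words and multiword units, then mapping the descriptors into the other language) can be illustrated with a minimal Python sketch. It is not the AUTINDEX implementation: the thesaurus entries, the conversion dictionary, and the function names `tokenize`, `index_document` and `parallel_index` are all hypothetical, and plain lowercasing with longest-match lookup and frequency counting stands in for the morpho-syntactic analysis, shallow parsing and statistical techniques mentioned in the text.

```python
import re
from collections import Counter

# Hypothetical toy thesaurus: surface term (lowercased) -> English descriptor.
# A real controlled vocabulary would be far larger and domain specific.
THESAURUS_EN = {
    "machine tool": "Machine tools",
    "machine tools": "Machine tools",
    "welding": "Welding",
    "heat treatment": "Heat treatment",
}

# Hypothetical conversion dictionary: English descriptor -> German descriptor.
CONVERSION_EN_DE = {
    "Machine tools": "Werkzeugmaschinen",
    "Welding": "Schweißen",
    "Heat treatment": "Wärmebehandlung",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens.

    Assumption: this naive step replaces real morpho-syntactic analysis.
    """
    return re.findall(r"[a-zäöüß]+", text.lower())


def index_document(text, thesaurus, max_len=3):
    """Count thesaurus descriptors found in the text.

    At each position the longest matching unit (up to max_len tokens)
    is preferred, so multiword terms win over their single words.
    """
    tokens = tokenize(text)
    counts = Counter()
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in thesaurus:
                counts[thesaurus[candidate]] += 1
                i += n  # skip past the matched unit
                break
        else:
            i += 1  # no term starts here; move on
    return counts


def parallel_index(text, thesaurus, conversion, top_k=5):
    """Return (source descriptor, target descriptor, frequency) triples."""
    counts = index_document(text, thesaurus)
    return [(d, conversion.get(d), c) for d, c in counts.most_common(top_k)]
```

Calling `parallel_index(text, THESAURUS_EN, CONVERSION_EN_DE)` on an English abstract would yield each matched English descriptor paired with its German counterpart and a raw frequency. A descriptor missing from the conversion dictionary is paired with `None`, which makes gaps in the bilingual resource visible rather than hiding them.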

Milestones:
Three milestones can be identified: the first marks the further improved German indexing and classification component of the AUTINDEX system, the second the elaborated English component, and the third the bilingual (English and German) component. All three components will be integrated into the workflow of the two users involved in the trial and will be intensively evaluated. The expected result is a usable, near-market software package. A demonstrator will also be available.

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the EU's priorities for research and innovation.

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

Call for proposals

Procedure for inviting applicants to submit project proposals with the aim of receiving EU funding.

Data not available

Funding scheme

Funding scheme (or "type of action") inside a programme with common features. It specifies the scope of what is funded, the reimbursement rate, the specific evaluation criteria to qualify for funding, and the use of simplified forms of costs such as lump sums.

ACM - Preparatory, accompanying and support measures

Coordinator

FACHINFORMATIONSZENTRUM TECHNIK E.V.
EU contribution
No data
Address
OSTBAHNHOFSTRASSE 13
60314 FRANKFURT
Germany

Total cost

The total costs incurred by this organisation to participate in the project, including direct and indirect costs. This amount is a subset of the overall project budget.

No data

Participants (2)