Objective
Most scientific and technical articles published today are written in English, and the number of English documents collected and maintained by information brokers and providers such as bibliographic database producers, libraries and publishers has increased rapidly. However, a significant number of documents remain available only in the native language of the author. One way to provide reliable and accurate access to this information is through smart indexing processes, which ensure consistent indexing in multiple languages and also allow the information to be presented multilingually.
The objective of the BINDEX project is to support information producers by integrating an existing generic solution, the AUTINDEX system, which automatically indexes and classifies documents in English and German, into the production processes of two user organisations. AUTINDEX takes advantage of sophisticated language processing technologies and of existing special-purpose language resources such as thesauri, classification schemes and large lexicons, which have to be adapted to the specific user requirements. Owing to its modular design, the outcome of BINDEX will comprise various software utilities for monolingual indexing and classification in English and German as well as for parallel bilingual indexing and classification, together with appropriate APIs to facilitate the integration of AUTINDEX into existing workflow environments.
Objectives:
The objective of BINDEX is to support information producers by integrating a generic solution, the AUTINDEX system, which automatically indexes and classifies documents in English and German, into the production processes of the two users, with the advantage of quicker, cheaper and more consistent population of information repositories. AUTINDEX takes advantage of sophisticated language processing technologies and of existing special-purpose language resources such as thesauri, classification schemes and large lexicons, which have to be adapted to the specific user requirements. Owing to its modular design, the outcome of the project will be a set of mature, applicable software utilities for monolingual indexing and classification in English and German as well as for parallel bilingual indexing and classification, together with appropriate APIs to facilitate the integration of AUTINDEX into existing workflow environments.
Work description:
The aim of the BINDEX trial is to adapt the prototype of the AUTINDEX system, which automatically indexes and classifies bilingual documents, to the production processes of the two users involved. The outcome will be a mature, applicable software utility for bilingual (English and German) indexing and classification. Owing to its modular approach and well-defined APIs, the system could easily be extended to cover other languages as well. The AUTINDEX approach is based on a controlled vocabulary and advanced natural language processing technologies. The controlled vocabulary is provided by a classical thesaurus together with a specialised bilingual dictionary, which merges the IAI German-English and English-German transfer dictionaries with a so-called conversion dictionary that maps different descriptor types in one language into the other. The linguistic processing provides all the information necessary for indexing, that is, for assigning thesaurus concepts to the words of a document, including multiword units. It does so by performing a morpho-syntactic analysis and a term recognition step based on shallow parsing combined with statistical techniques. Classification of documents is likewise based on the output of the linguistic processing and on the classification schemes already in use on the users' side. Within this trial, the AUTINDEX system will be adapted to the requirements of the two users involved: in a first step the monolingual modules of the complete system will be adapted and improved, and in a second phase the bilingual component will be enhanced, so that all modules then have the same functional level. The whole system will be implemented as a web service; appropriate multilingual user interfaces will therefore be developed, as well as APIs to integrate the system into the production cycle of potential users.
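The controlled-vocabulary indexing described above can be illustrated with a minimal sketch. It is not the AUTINDEX implementation: the thesaurus entries, the lemma table, the conversion dictionary and the function names are all hypothetical, a small lookup table stands in for morpho-syntactic analysis, and plain frequency counts stand in for the statistical techniques.

```python
import re
from collections import Counter

# Hypothetical lemma table, standing in for a real morpho-syntactic analysis.
LEMMAS = {"databases": "database", "thesauri": "thesaurus"}

# Hypothetical toy thesaurus: lemmatised term (possibly multiword) -> English descriptor.
THESAURUS = {
    ("information", "retrieval"): "Information retrieval",
    ("thesaurus",): "Thesauri",
    ("database",): "Databases",
}

# Hypothetical conversion dictionary: English descriptor -> German descriptor.
CONVERSION = {
    "Information retrieval": "Information Retrieval",
    "Thesauri": "Thesaurus",
    "Databases": "Datenbanken",
}


def tokenize(text):
    """Lowercase the text, split it into words and map each word to its lemma."""
    words = re.findall(r"[a-zäöüß]+", text.lower())
    return [LEMMAS.get(word, word) for word in words]


def index_document(text, max_len=2):
    """Count thesaurus descriptors found in the text, preferring longer matches."""
    tokens = tokenize(text)
    counts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multiword units win over their parts.
        for n in range(max_len, 0, -1):
            key = tuple(tokens[i:i + n])
            if len(key) == n and key in THESAURUS:
                counts[THESAURUS[key]] += 1
                i += n
                break
        else:
            i += 1  # no thesaurus entry starts at this token
    return counts


def bilingual_index(text):
    """Return (English descriptor, German descriptor, frequency), most frequent first."""
    return [
        (english, CONVERSION.get(english), freq)
        for english, freq in index_document(text).most_common()
    ]


if __name__ == "__main__":
    sample = "Thesauri support information retrieval in bibliographic databases. Databases grow."
    for row in bilingual_index(sample):
        print(row)
```

Matching the longest candidate first is one simple way to let a multiword unit such as "information retrieval" take precedence over its individual words; a real system would rely on shallow parsing rather than fixed-length windows.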
Milestones:
Three milestones can be identified: the first is the further improved German indexing and classification component of the AUTINDEX system, the second is the elaborated English component, and the third is a bilingual (English and German) component. All three components will be integrated into the workflow of the two users involved in the trial and will be evaluated intensively. The expected result is a usable, near-market software package. A demonstrator will also be available.
Fields of science (EuroSciVoc)
CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.
- natural sciences computer and information sciences software
- natural sciences computer and information sciences databases
- natural sciences computer and information sciences data science natural language processing
Programme(s)
Multi-annual funding programmes that define the EU's priorities for research and innovation.
Topic(s)
Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.
Call for proposals
Procedure by which applicants are invited to submit project proposals with a view to receiving EU funding.
Data not available
Funding scheme
Funding scheme (or "type of action") within a programme with common features. The funding scheme specifies the scope of what is funded, the reimbursement rate, the specific evaluation criteria to qualify for funding, and the simplified forms of cost coverage, such as lump sums.
Coordinator
60314 FRANKFURT
Germany
The total costs incurred by the organisation concerned to participate in the project, including direct and indirect costs. This amount is a subset of the overall project budget.