Content archived on 2024-05-14

MULTILINGUAL DOCUMENT PROCESSING


Exploitable results

The project bridges the gap between current academic research into natural language processing (NLP) and machine translation (MT) and the needs of industrial users. Users should benefit from the advantages that modern NLP techniques offer in maintenance, variability and extensibility, but for them the user interface and the usability of the results ultimately matter more than the underlying linguistic techniques. Through intensive dialogue between both sides, the steps necessary for industrial validation have been taken without losing the advantages of a modern NLP system.

The CAT2 system is an official EUROTRA sideline that runs under SICStus Prolog on a UNIX or DOS platform. The formalism is unification-based and realizes translation with a system of tree-to-tree transducers. The grammars developed within this formalism are inspired by the most recent linguistic formalisms.

The user produces a text in German and sends it by e-mail to IAI, where the incoming mail triggers the CAT2 system for automatic translation into English or French. The translation is sent back to the user, who can comment on it via a hotline and, where possible, introduce new terminology.

An extensive morphology has been developed and integrated into the CAT2 MT system for German, English and French. For each of the three languages, a medium-sized, MT-oriented dictionary has been developed, in which every lexical item specifies, among other things, the word-formation processes it may enter into (compounding and derivation) and a set of lexical functions. Language modules have been developed for the three languages based on a set of 'common' rules and language-specific parameters, allowing for the rapid inclusion of additional languages and parametrization according to the users' requirements.
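To make the idea of a unification-based formalism with tree-to-tree transfer more concrete, the following is a minimal illustrative sketch in Python. It is not the CAT2 formalism (which runs under SICStus Prolog) and does not reproduce any CAT2 rule. The feature names (`cat`, `lex`, `num`), the two transfer rules and the functions `unify` and `transfer` are all hypothetical, chosen only to show how feature structures can be unified and how a source tree can be mapped node by node onto a target tree.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts or atomic values).

    Returns the merged structure, or None if the two structures clash.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # conflicting values for the same feature
                result[key] = merged
            else:
                result[key] = b_val
        return result
    # Atomic values unify only if they are identical.
    return a if a == b else None


# Hypothetical German-to-English lexical transfer rules:
# (source pattern, target features). Not taken from CAT2.
TRANSFER_RULES = [
    ({"lex": "Haus", "cat": "n"}, {"lex": "house", "cat": "n"}),
    ({"lex": "klein", "cat": "adj"}, {"lex": "small", "cat": "adj"}),
]


def transfer(tree):
    """Map a source tree onto a target tree, node by node."""
    feats = tree["feats"]
    target = feats
    for pattern, replacement in TRANSFER_RULES:
        # The rule applies when unifying the pattern adds no new
        # information, i.e. the node already carries every feature
        # the pattern asks for.
        if unify(feats, pattern) == feats:
            # Features the rule does not mention (e.g. number) carry over.
            target = {**feats, **replacement}
            break
    children = [transfer(child) for child in tree.get("children", [])]
    return {"feats": target, "children": children}


# Toy source tree for the German noun phrase "kleines Haus".
source_tree = {
    "feats": {"cat": "np", "num": "sg"},
    "children": [
        {"feats": {"cat": "adj", "lex": "klein"}},
        {"feats": {"cat": "n", "lex": "Haus", "num": "sg"}},
    ],
}

target_tree = transfer(source_tree)
```

In this sketch the phrase-level node is copied unchanged, while the two lexical nodes are rewritten by the matching rule and keep features the rule does not mention. A real system would also need rules that restructure the tree, which this toy example leaves out.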
