Multilingual Text Tools and Corpora

Project information

MULTEXT

Grant agreement ID: LRE62050

Closed project

Start date: 1 January 1994

End date: 1 March 1996

Funded under

Specific programme of research and technological development (EEC) in the field of telematic systems in areas of general interest - Linguistic research and engineering -, 1990-1994

Total cost

No data

EU contribution

No data

Coordinated by

Université de Provence

Exploitable results

The project has developed a set of generally usable software tools to manipulate and analyse text corpora, together with lexicons and multilingual corpora in seven European languages. It has established conventions for the encoding of corpora and harmonized specifications for computational lexicons, building on and contributing to the preliminary recommendations of the relevant international and European standardization initiatives. MULTEXT has developed the first set of publicly available large-scale resources and tools for use in corpus-based language engineering applications.

The project's specific achievements fall into three areas: lexical specifications for seven European languages (English, French, Spanish, Italian, German, Dutch, Swedish), comprising the first large-scale application of and contribution to the EAGLES work in this area; specifications for encoding corpora in Standard Generalized Markup Language (SGML), comprising one of the first large-scale applications of the Text Encoding Initiative Guidelines; specification of a data architecture for linguistic corpora, providing the first hypertext view of such corpora.

The tools include: a language-independent, parameterizable text tokenizer; a modular and language-independent part-of-speech tagger; a text aligner; a complete speech workbench; a public SGML query-language interpreter; a set of SGML-aware corpus exploitation tools.

Text-oriented methods and software tools have come to be of primary interest to the natural language processing (NLP) community. The availability of basic multilingual tools and data will improve and extend research and development across a wide range of disciplines, including not only the various areas of language engineering, but also fields such as speech technology, language learning, lexicography and lexicology, information retrieval, etc. The project's methodologies and results are being used in a related project, thus extending the application to 13 western and eastern European languages.
