CORDIS - EU research results
Content archived on 2024-06-18

Statistical Machine Translation Using Monolingual Corpora

Objective

As a number of machine translation competitions have shown, statistical machine translation produces encouraging results for language pairs where large corpora of previously translated texts are available for training. In practice, however, the availability of such data is often a severe bottleneck. We therefore propose a methodology that requires only a bilingual dictionary and monolingual text corpora of the source and target languages, which should considerably ease the data acquisition problem. We suggest a two-stage procedure. In the first stage, we create a database of translation equivalents by extracting them from a pair of comparable monolingual corpora, using a bilingual dictionary in combination with automatically generated thesauri of related words. In the second stage, we translate new sentences by retrieving appropriate translation equivalents from the database and merging them using a combinatorial approach.
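
The two-stage idea can be illustrated with a deliberately small sketch. It is not the project's implementation: the dictionary, the corpus, and all function names below are hypothetical, the thesaurus component is left out, and the first stage is reduced to keeping dictionary candidates that actually occur in the target-language corpus. The second stage enumerates all combinations of the retained candidates and picks the one whose adjacent word pairs are most frequent in the target corpus, which is one simple way a combinatorial merge might be scored.

```python
from itertools import product

# Hypothetical toy bilingual dictionary: source word -> candidate target words.
DICTIONARY = {
    "das": ["the", "that"],
    "haus": ["house", "home"],
    "ist": ["is"],
    "alt": ["old"],
}

# Hypothetical monolingual target-language corpus.
TARGET_CORPUS = [
    "the house is old",
    "the old house is big",
    "that is the house",
]


def bigram_counts(sentences):
    """Count adjacent word pairs in the target-language corpus."""
    counts = {}
    for sentence in sentences:
        words = sentence.split()
        for pair in zip(words, words[1:]):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def build_equivalents(dictionary, target_sentences):
    """Stage 1 (simplified assumption): keep only those dictionary
    candidates that are attested in the target-language corpus."""
    vocab = {word for s in target_sentences for word in s.split()}
    return {
        src: [t for t in candidates if t in vocab]
        for src, candidates in dictionary.items()
    }


def translate(sentence, equivalents, bigrams):
    """Stage 2 (simplified assumption): enumerate every combination of
    candidates and return the one with the highest bigram score."""
    # Unknown words, or words with no attested candidate, pass through unchanged.
    options = [equivalents.get(w) or [w] for w in sentence.lower().split()]

    def score(candidate):
        return sum(bigrams.get(pair, 0) for pair in zip(candidate, candidate[1:]))

    return " ".join(max(product(*options), key=score))


equivalents = build_equivalents(DICTIONARY, TARGET_CORPUS)
bigrams = bigram_counts(TARGET_CORPUS)
print(translate("das haus ist alt", equivalents, bigrams))  # the house is old
```

Exhaustive enumeration is only workable for short sentences with few candidates per word; a realistic system would need a search strategy and would have to handle word order, which this word-by-word sketch ignores.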

Field of science

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on natural language processing techniques.

Call for proposal

FP7-PEOPLE-2007-2-1-IEF

Coordinator

UNIVERSITAT ROVIRA I VIRGILI
EU contribution
€ 207 884.12
Address
CARRER DE ESCORXADOR
43003 Tarragona
Spain

Region
Este Cataluña Tarragona
Activity type
Higher or Secondary Education Establishments
Administrative contact
M. Dolores Jimenez Lopez (Dr.)
Total cost
No data