Testing the portability of techniques to handle dissimilar source and target languages in MT

Enhancing machine translation

EU research has advanced the study of machine translation (MT), promising major impact on society and industry. The project produced a system with an enhanced MT architecture, offering a powerful tool for researchers, lecturers and students of natural language processing.

The EU-funded project 'Testing the portability of techniques to handle dissimilar source and target languages in MT' (ENEUS) combined expertise from the fields of linguistics, computer science and translation. The work is important for MT users as well as for the study of interactions between computers and human languages. ENEUS measured how well the Matxin MT architecture can be ported to different language pairs. It also assessed how the system performs with an analytic language (e.g. English) as the source and an agglutinative language (e.g. Basque) as the target. Matxin proved suitable for translation between dissimilar languages because it can handle deep analysis, with an emphasis on morphosyntax.

A rule-based machine translation (RBMT) prototype was built by porting the existing Spanish–Basque system to the English–Basque direction. The prototype covers 35 000 entries. It can handle simple affirmative, negative and interrogative sentences in indicative tenses for all four subject–object paradigms, in both active and passive voice, as well as imperatives.

ENEUS studied the agglutinative features and word order profiles of English and of three agglutinative languages: Basque, Finnish and Hungarian. Project work clearly showed that statistical machine translation (SMT) systems cannot address all agglutinative languages equally well, and that a more source language-oriented approach might be possible and more beneficial. Research on alignment for the English–Finnish, English–Hungarian and English–Basque pairs resulted in ENEUS SMT systems being built for all three pairs.

As part of the ENEUS outreach programme, over 500 users contributed to the human evaluation campaign. They compared four English–Basque MT systems developed by the project with Google's state-of-the-art translator. Results showed that the morphologically savvy SMT system was on a par with Google's translator, and that these two systems outperformed all the others.

ENEUS' best system has been integrated within the Bologna Translation Service at Elhuyar, and users will be able to access ENEUS prototypes through the Matxin website (powered by Elhuyar). The RBMT system is the first open-source English–Basque MT system. It is available to developers through SourceForge and makes it possible to build and research systems that use English or Spanish as the source language for translation into any other language.


