"The aim of the ENEUS project is to develop an English-Basque machine translation system which responds to two main objectives: (1) it contributes to the MT research field by testing the portability of techniques to handle dissimilar source and target languages; and (2) it meets the social demand for a good quality English-Basque translation system.
Both rule-based and statistical approaches will be explored, building on the systems and results obtained in the OpenMT (2006-2009) and EUSMT (Labaka, 2009) projects, which were developed by the Ixa Group to translate from Spanish to Basque.
Matxin, an open rule-based machine translation system, will be extended to the English-Basque language pair. The hybrid constituent- and dependency-based scheme used for the analysis of Spanish will be transferred to English, integrating the successful Freeling Suite of Language Analyzers; the structural differences of the language pair will be thoroughly explored to fine-tune the existing, limited transfer rules; and reuse of resources from the existing language pair will be maximised.
Based on EUSMT, a customised Moses baseline that improves segmentation and reordering for dissimilar languages, and for Basque in particular, the ENEUS project will investigate the statistical independence of morphemes, measured as χ² on a large monolingual corpus, to adapt the optimal segmentation option to the English-Basque language pair. Simultaneously, the set of reordering rules will be adapted to the new language pair.
During the project, an effort to extend the bilingual and monolingual corpora is also planned, given their importance for exploiting the latest corpus-based approaches.
The ENEUS project brings together expertise from three different fields (linguistics, computer science and translation), allowing the fellow to restart her research career within a renowned NLP research group, receiving specialised computational training while contributing a specialised linguistic and translation background."
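As a rough illustration of the χ² independence measure mentioned in the objective, the sketch below scores adjacent morpheme pairs with Pearson's χ² on a 2x2 contingency table. It is not the EUSMT or ENEUS implementation: the function name `chi_square_bigrams`, the toy segmented corpus and the reading that a high score marks strongly associated morphemes are all assumptions made for the example.

```python
from collections import Counter


def chi_square_bigrams(sentences):
    """Pearson chi-square score for every adjacent morpheme pair.

    `sentences` is a list of morpheme lists (hypothetical input format).
    Each pair (a, b) is scored on the 2x2 table:
        o11 = a followed by b      o12 = a followed by something else
        o21 = b preceded by not-a  o22 = all remaining pairs
    """
    pair_counts = Counter()
    left_counts = Counter()
    right_counts = Counter()
    total = 0
    for morphemes in sentences:
        for a, b in zip(morphemes, morphemes[1:]):
            pair_counts[(a, b)] += 1
            left_counts[a] += 1
            right_counts[b] += 1
            total += 1

    scores = {}
    for (a, b), o11 in pair_counts.items():
        o12 = left_counts[a] - o11
        o21 = right_counts[b] - o11
        o22 = total - o11 - o12 - o21
        denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
        # A zero denominator means a degenerate table; score it as 0.0.
        scores[(a, b)] = total * (o11 * o22 - o12 * o21) ** 2 / denom if denom else 0.0
    return scores


# Toy, hand-segmented input (illustrative only, not project data).
corpus = [
    ["etxe", "a", "n"],
    ["etxe", "a", "k"],
    ["mendi", "a", "n"],
    ["mendi", "ra"],
]
for pair, score in sorted(chi_square_bigrams(corpus).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Under this reading, pairs with a high score co-occur far more often than independence would predict and might be kept as a single token, while low-scoring pairs are candidates for a split. How the score actually drives the choice of segmentation is not stated in the objective.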
Field of science
- /natural sciences/computer and information sciences
- /humanities/languages and literature/linguistics