The goal of the project is to create a translation service for the educational material of Massive Open Online Courses (MOOCs). The vision of this project is to overcome language barriers and thus help connect the world through great education. The results are showcased for 11 European and BRIC target languages, most of which are only weakly or fragmentarily supported by machine translation (MT) solutions.
To achieve its goal, the project pursues the following challenging scientific and technological objectives:
- High-quality machine translation of MOOC text. High translation quality will be achieved through a hybrid translation schema that combines automatic processes (i.e. text processing tools and MT resources adapted to cope with multi-genre MOOC educational text) with limited, focused human intervention (i.e. crowdsourcing, in a novel time- and cost-efficient setup, for evaluating the generated translations).
- Novel translation evaluation schemata and metrics. The second goal is to establish a new, appropriate, standardized, multi-level evaluation schema for determining the value of the produced translations. Taking into account the idiosyncrasies of this particular text type and its multi-genre character, the evaluation schema will comprise a human and an automatic aspect, as well as a traditional explicit (direct) mode and an innovative implicit mode; the latter will be facilitated by a separate text mining application, namely topic identification. An analysis of the results of a first-phase evaluation will be used to provide the translation engines with more accurate, wider-coverage data for retraining, and thereby to improve translation in a second phase.
- Infrastructure bootstrapping. In line with the need for high-quality translation, even for poorly equipped languages, and with the automatic nature of the proposed translation approach, an important objective of TraMOOC is the automatic bootstrapping of new resources for languages that are only fragmentarily or weakly equipped with infrastructure.
- Language independence. The machine translation process will be language-independent. The translation approach will be statistical and will, in principle, rely on no language-dependent resources or thesauri. Eleven languages are targeted to demonstrate the language-independent aspect of TraMOOC, i.e. nine European and two BRIC languages, namely German (DE), Italian (IT), Portuguese (PT), Dutch (NL), Bulgarian (BG), Greek (EL), Polish (PL), Czech (CZ), Croatian (HR), Russian (RU) and Chinese (ZH). These particular languages were selected because they constitute strong use cases (i.e. there is a market need for MOOC translation into them), they are of significant importance to the political and commercial agendas of the European Commission, and they form challenging translation pairs, including both languages that are weakly equipped with tools and resources and languages that have proven difficult to translate into.