Community Research and Development Information Service - CORDIS


QT21 Report Summary

Project ID: 645452
Funded under: H2020-EU.

Periodic Reporting for period 1 - QT21 (QT21: Quality Translation 21)

Reporting period: 2015-02-01 to 2016-07-31

Summary of the context and overall objectives of the project

A European Digital Single Market free of barriers, including language barriers, is a stated EU objective to be achieved by 2020. The findings of the META-NET Language White Papers show that currently only 3 of the EU-27 languages enjoy moderate to good support from machine translation technologies, with either weak (at best fragmentary) or no support for the vast majority of the EU-27 languages. This lack is a key obstacle impeding the free flow of people, information and trade in the European Digital Single Market. Many of the languages not supported by current technologies show common traits: they are morphologically complex, with free and diverse word order, and often lack sufficient training resources and/or processing tools. Together this results in drastic drops in translation quality. The combined challenges of linguistic phenomena and resource scenarios have created a large and under-explored grey area in the language technology map of European languages. With support from key stakeholders, QT21 addresses this grey area by developing
(1) substantially improved statistical and machine-learning based translation models for challenging languages and resource scenarios,
(2) improved evaluation and continuous learning from mistakes, guided by a systematic analysis of quality barriers, informed by human translators,
(3) all with a strong focus on scalability, to ensure that learning and decoding with these models is efficient and that reliance on data (annotated or not) is minimised.
To continuously measure progress, and to provide a platform for sharing and collaboration (QT21 internally and beyond), the project revolves around a series of Shared Tasks, for maximum impact co-organised with WMT.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

1) Work performed
During this period, the project produced 11 deliverables and wrote 93 scientific papers, of which 27 are system papers.
The scientific work performed by QT21 has been carried out along three axes: a) semantics (WP1), b) morphology and low resources (WP2), and c) continuous learning from humans (WP3).
In order to measure progress and compare with the state of the art, QT21 co-organises and sponsors WMT (the Workshop on Machine Translation), whose goal is to benchmark and measure Machine Translation (MT) on different tasks (WP4).

In WP1, we develop models that overcome the problems occurring in syntactically and semantically divergent languages that cannot be adequately addressed by purely statistical models by:
• Structuring the translation differently, not along shallow phrases but rather along the semantics of the sentence (Task 1.1).
• Improving handling of semantics (the expressed relations between events, participants and other elements in the sentence) in existing shallow models (Task 1.2).
• Better learning in MT, both for existing models as well as by introducing novel models that learn full structural prediction (Task 1.3).
WP1 produced 3 deliverables in this reporting period: D1.1, D1.3 and D1.5.
The work in WP1 led to the publication of 31 papers.

Work package 2 (WP2) addresses the problem of translating under-resourced and morphologically rich languages. Even though tested only on the 6 QT21 language pairs, the methodology developed in this work package will make it possible to use MT for more languages and to deploy new MT-based applications.
Many methods addressing the challenges of this WP have already been evaluated. Some of these techniques have already been presented at major conferences in Natural Language Processing (NLP) and MT, and were also part of submissions to open evaluation campaigns such as the Conference on Statistical Machine Translation (WMT) shared tasks, partly organised by QT21.
The work package is organised around four tasks:
• T2.1: Morphology-aware word representations. We addressed the problem of large vocabularies in morphologically rich languages by investigating new word representations. Instead of using words as atomic units, morphemes and factored representations were used to represent words during word alignment in phrase-based and syntax-based machine translation as well as in neural network models.
• T2.2: Modelling morphological agreement. In this task, we addressed the problem of modelling morphological agreement in statistical machine translation. In this period, new approaches addressing intra-sentence agreement between words as well as inter-sentence agreement phenomena such as anaphora were investigated.
• T2.3: Improved usage of parallel data through machine learning. In order to make better use of parallel data for under-resourced languages, a tighter integration of neural network models into the overall machine translation system was investigated. Furthermore, discriminative training methods to assess specific translation needs were investigated.
• T2.4: Exploiting other data sources for under-resourced languages. To improve the translation of under-resourced languages, new data sources have to be exploited. In this period we achieved this by using cross-lingual transfer of dependency annotations. Furthermore, looking at language-independent approaches, we explored the use of a pivot language to learn sentence representations in a language-neutral way, as well as the use of deciphering algorithms.
WP2 produced one deliverable: D2.1.
The work in WP2 led to the publication of 25 articles.
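The morpheme- and factor-based representations investigated in T2.1 can be illustrated with a toy sketch: an inflected word is split into surface, stem and suffix factors so that inflected forms of the same lemma share a stem factor during alignment. The suffix list and length threshold below are hypothetical illustrations, not the segmentation actually used in the project.

```python
# Toy factored representation: split off a known inflectional suffix so that
# inflected forms share a stem factor. The suffix inventory and the length
# threshold are illustrative assumptions only.
SUFFIXES = ('es', 'er', 'e', 's')

def factorize(word):
    """Return a (surface, stem, suffix) factor triple for a word."""
    for suf in SUFFIXES:
        # Only strip a suffix if a reasonably long stem remains.
        if word.endswith(suf) and len(word) > len(suf) + 3:
            return (word, word[:-len(suf)], suf)
    return (word, word, '-')

# German-like toy forms of "Haus": the stem factor 'haus'/'haeus' is shared.
for w in ['haus', 'hauses', 'haeuser']:
    print(factorize(w))
```

With such factors, an aligner can link `hauses` and `haeuser` through their stem factor even when the surface forms never co-occur with the same translation.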

The objective of WP3 is to access and use the information provided by human feedback in various forms (human translation, MT error annotation and post-edits) for various MT systems and language pairs. This will make it possible to profile issues arising from machine translation and apply this knowledge to automatically improve the output of the systems in question.
WP3 is organised into three main tasks: T3.1, dealing with data collection; T3.2, dealing with the use of human-annotated data for diagnostic purposes; and T3.3, focusing on the use of human annotations to improve MT.
The activities carried out on all tasks during the reporting period are substantially in line with the project agenda. In particular:
• The work on preparing data collection has led to the definition of all the aspects relevant to the acquisition of reliable and high-quality human annotations (selection and adaptation of suitable annotation tools, definition of guidelines, selection of the domain and type of data, training of dedicated MT models).
• The work done on the translation diagnosis front led to the definition of a harmonised error metric that unifies the MQM and DQF frameworks and provides a subset recommended for detailed analysis of MT output. Other aspects addressed during the reporting period are the development of error test suites, reference-based metrics featuring high correlation with human annotations, and the automatic estimation of MT quality.
• The work on exploiting human annotations has led to the first results on automatic post-editing (APE) as a way to improve MT quality by learning from human corrections.
WP3 produced 2 deliverables: D3.1 and D3.2.
These activities led to the publication of 34 scientific papers.

In WP4, QT21 organises three annual shared task campaigns, in which all research partners participate. The annual shared tasks are also open to outside machine translation research and development groups.
The evaluation campaign comprises five distinct tasks:
• a translation task
• a quality estimation task
• a metrics task
• a tuning task
• an automatic post-editing task
The QT21 Quality Translation Shared Tasks comprise the core testing and validation activity within the project. New research methods developed in QT21 are evaluated with respect to their impact on translation quality. The shared tasks also provide a platform for sharing ideas and expertise and for driving research through competition.
Each sub-task in WP4 involves the organisation of a single shared task campaign; the campaigns are scheduled as follows:
• Task 4.1: First Quality Translation Shared Task [M01-M12]
• Task 4.2: Second Quality Translation Shared Task [M13-M24]
• Task 4.3: Third Quality Translation Shared Task [M25-M36]
Each shared task involves creation and distribution of training data, creation of test data, definition of an evaluation protocol, infrastructure to collect participant submissions, as well as automatic and manual scoring of submissions and publication of results.
WP4 produced 1 deliverable: D4.1.
The work in WP4 led to the publication of 6 articles plus 27 system papers.

QT21 has been active in disseminating its results to different communities: scientific, industry, the general public and decision makers.
• The website has been developed and is regularly maintained.
• 90 scientific articles have been published, of which 27 are system papers.
• GALA, a QT21 subcontractor, organised talks at 2 conferences (the GALA Annual Conference and TEKOM) and hosted 4 webinars.
• QT21 is present on the websites of GALA and of FIT, a second subcontractor to QT21, which also hosts a blog.
• DFKI organised a talk at LocWorld in Dublin.
• QT21 was present at various events (LocWorld Tokyo, LT-Innovate).
• TAUS, Tilde and DFKI have produced 5 press releases for the industry as well as for the general public. One press release in Germany led to 4 radio interviews.
• The Data Management Plan has been produced and updated.
• The MQM-DQF unified quality framework produced by QT21 has gained traction in the industry, with a dedicated Enterprise User Group (https:\\events\user-calls\dqf-enterprise-solution-user-call) and several large LSPs having integrated it into their platforms.
WP5 produced 4 deliverables: D5.1, D5.2, D5.5 and D5.6.

2) Main Objectives
QT21 addresses this grey area with the following objectives:
- (1) To substantially improve statistical and machine-learning based translation models for challenging languages and resource scenarios;
- (2) To improve evaluation and continuous learning from mistakes, guided by a systematic analysis of quality barriers, informed by human translators;
- (3) To ensure that learning and decoding with these models is efficient and that reliance on data (annotated or not) is minimised;
- (4) To continuously measure progress, and to provide a platform for sharing and collaboration (QT21 internally and beyond), the project revolves around a series of Shared Tasks, for maximum impact co-organised with WMT;
- (5) To support early technology transfer, QT21 proposes a Technology Bridge linking ICT-17(a) and (b) projects and opening up the possibility of showing technical feasibility of early research outputs in near to operational environments.
The project works on 6 language pairs, 4 having English as source language (English→German, English→Czech, English→Latvian, English→Romanian) and 2 having English as target language (Czech→English, German→English).

3) Main Results:
- Objective (1): QT21 has made substantial contributions to the field of Neural Machine Translation (NMT), which has emerged over the last two years and is now moving the State Of The Art (SOTA) ahead. Those contributions include, among others, “back translation”, which makes it possible to augment the training data volume synthetically, Byte Pair Encoding (BPE), which compresses the vocabularies of Morphologically Rich Languages (MRL), and to some extent System Combination (Ensemble). They all proved essential at WMT16, with QT21 systems winning 2/3 of all WMT16 tasks and outperforming the known online MT systems on En↔De, En↔Cz and En→Ro, the core languages of QT21.
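The BPE technique mentioned above can be sketched as follows: starting from character-split words, the most frequent adjacent symbol pair is merged repeatedly, so frequent morphemes of a morphologically rich language become single vocabulary units. This is a heavily simplified sketch; the toy corpus and the naive string replacement are illustrative assumptions, not the reference subword-nmt implementation.

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Learn BPE merges from a word-frequency corpus.

    `corpus` maps space-separated symbol sequences (initially characters,
    with an end-of-word marker) to frequencies; at each step the most
    frequent adjacent symbol pair is merged into a single symbol.
    """
    corpus = dict(corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in corpus.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Naive merge by string replacement -- a simplification that works
        # for this toy corpus but is not boundary-safe in general.
        merged = ' '.join(best)
        corpus = {w.replace(merged, ''.join(best)): f for w, f in corpus.items()}
    return merges

# Toy German-like corpus: Haus / Hauses / Haeuser, character-split.
corpus = {'h a u s </w>': 5, 'h a u s e s </w>': 3, 'h a e u s e r </w>': 2}
print(learn_bpe(corpus, 4))
```

After a few merges the shared stem `haus` becomes a single vocabulary unit, so inflected variants no longer each need their own full-word entry.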

- Objective (2): QT21 submissions won the WMT16 Quality Estimation (QE) Task 3 on “predicting document level quality” (and ranked 3rd of 13 submissions on Task 1 at sentence level). QT21 systems also won all WMT16 Metrics tasks. In collaboration with TAUS, QT21 developed the harmonised MQM-DQF error classification, which is beginning to be used in the translation industry. In the WMT16 Automatic Post Editing (APE) task, which aims at “learning from mistakes” (learning from post-edits by professional translators), a QT21 system improved over the baseline by 2.64 BLEU points, achieving the 2nd best performance. Furthermore, an online APE system that interacts with human post-editors has been developed within a continuous learning scenario, improving APE over current results by 1 to 2 BLEU points.
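The “learning from mistakes” idea behind APE can be illustrated with a deliberately minimal sketch: mining single-token correction rules from (source, MT output, post-edit) triples and applying them to new MT output. The project's actual APE systems are statistical/neural sequence models; the token-by-token alignment below is a toy assumption.

```python
from collections import Counter

def learn_corrections(triples, min_count=1):
    """Learn single-token correction rules from (source, mt, post_edit) triples.

    Toy alignment: MT and post-edit are compared token-by-token only when
    they have the same length; real APE systems learn a full seq2seq model.
    """
    subs = Counter()
    for _, mt, pe in triples:
        mt_toks, pe_toks = mt.split(), pe.split()
        if len(mt_toks) == len(pe_toks):
            for m, p in zip(mt_toks, pe_toks):
                if m != p:
                    subs[(m, p)] += 1
    return {m: p for (m, p), c in subs.items() if c >= min_count}

def apply_corrections(mt, rules):
    """Post-edit new MT output by applying the learned substitutions."""
    return ' '.join(rules.get(tok, tok) for tok in mt.split())

# Hypothetical post-editing triples: translators repeatedly fix 'home' -> 'house'.
triples = [
    ('das haus', 'the home', 'the house'),
    ('ein haus', 'a home', 'a house'),
]
rules = learn_corrections(triples, min_count=2)
print(apply_corrections('my home', rules))  # -> 'my house'
```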

- Objective (3): QT21 introduced back-translation (see Objective (1)), which reduces the dependency on bilingual data. Furthermore, BPE (see Objective (1)) effectively tackles the problem of MRLs by significantly compressing their vocabularies.
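How back-translation reduces the dependency on bilingual data can be sketched as follows: monolingual target-language text is translated into the source language by a reverse (target-to-source) model, and the resulting synthetic pairs are added to the real parallel training data. The dictionary-lookup “reverse model” below is a stand-in for a real NMT system, not anything used in the project.

```python
def back_translate(mono_target, reverse_translate):
    """Create synthetic (source, target) pairs from monolingual target text.

    `reverse_translate` is any target->source MT system; each monolingual
    target sentence is paired with its machine-translated source.
    """
    return [(reverse_translate(t), t) for t in mono_target]

# Stand-in reverse model: word-by-word De->En dictionary lookup (illustrative only).
toy_dict = {'das': 'the', 'haus': 'house', 'ist': 'is', 'alt': 'old'}
reverse = lambda s: ' '.join(toy_dict.get(w, w) for w in s.split())

real_parallel = [('the house', 'das haus')]
synthetic = back_translate(['das haus ist alt'], reverse)
training_data = real_parallel + synthetic
print(training_data)
```

The forward (source-to-target) model is then trained on `training_data`; because the human-written side is the target, the synthetic noise sits on the source side, which is what makes the technique work in practice.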

- Objective (4): The organisation of WMT (co-organised with CRACKER-Horizon2020 # 645357) is at the core of this objective. The 48% increase in the number of submissions from 2015 to 2016 on the main task (News Task) demonstrates the community's interest in benchmarking, discussing methods and sharing ideas.

- Objective (5): QT21 is working together with HimL (Horizon2020 # 644402), with a joint QT21-HimL submission to WMT and a joint workshop. TraMOOC (Horizon2020 # 644333) is now using NMT from QT21. QTLeap (FP7 # 610516) is using the QT21 test suites.

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

QT21 leads MT technology development. The results from WMT16, the reference benchmark event in Machine Translation, show that QT21 won 2/3 of all tasks and was even significantly better than the known online MT systems on En↔De, En↔Cz and En→Ro, the core languages of QT21. This is a first.
The traction that the QT21 harmonised MQM-DQF error annotation paradigm is gaining in industry also demonstrates the impact of QT21's work in the translation industry.

New State Of The Art (SOTA) results have been obtained with “back translation”, Byte Pair Encoding (BPE) and to some extent System Combination (Ensemble).
Automatic Post Editing (APE) has for the first time proven to be a very promising approach, improving BLEU scores by more than 2.5 points. Furthermore, QT21's APE system improves translations by 1 to 2 BLEU points on data that is not annotated (i.e. without using annotation information).

Automatic Post Editing has been improved by 2 BLEU points with class-based Language Models and factored models.

Related information

Record Number: 194853 / Last updated on: 2017-02-16