Periodic Reporting for period 2 - CRACKER (Cracking the Language Barrier: Coordination, Evaluation and Resources for European MT Research)
Reporting period: 2016-07-01 to 2017-12-31
In 2014, the European Machine Translation (MT) research community faced exciting challenges but also increasing pressure to succeed. Both originated from the frameworks and schedules of the EU, especially regarding the Digital Single Market, the commitment to the preservation of cultural and linguistic diversity, the commitment to personal mobility and the goal of a fair society. MT research also faced increasing expectations and demand from the business world, where globalisation has multiplied linguistic markets. The only way to address these issues was to make research more efficient and effective through new modes of collaborative research that include shared challenges, intermediate targets and success metrics. CRACKER built upon existing efforts to meet its goals: community building, networking and roadmapping; organising benchmarking and evaluation campaigns; extending, administering and promoting resource infrastructures; coordination between H2020 MT research and innovation actions; coordination between the European MT research community and European user organisations and deployment actions; promotion of MT; and skills training and education. CRACKER consisted of a carefully selected consortium: DFKI (DE), Charles University Prague (CUNI, CZ), ELDA (FR), FBK (IT), Athena Research and Innovation Center (ATHENA RC, GR), University of Edinburgh (UEDIN, UK) and University of Sheffield (USFD, UK). The language resources community was represented through ELDA. ATHENA RC led the META-SHARE development, with DFKI, FBK and ELDA as partners that contributed significantly to its design and implementation. The MT evaluation campaigns were represented through UEDIN (WMT) and FBK (IWSLT). USFD worked on the interface between MT, Quality Translation and metrics-based evaluations. CUNI continued the QT/MT Marathon events.
Our community building activities included all organisations and projects working on technologies for Multilingual Europe; this group is now known as the Cracking the Language Barrier federation. CRACKER built upon META-NET and continued the successful META-FORUM series of events in 2015, 2016 and 2017. CRACKER conducted surveys to assess the impact of MT on society and the economy, including the Human Language Project. CRACKER built bridges between the MT research and innovation activities carried out in H2020 and the deployment and service-oriented work in CEF AT. Evaluation campaigns were organised by continuing two series of dedicated workshops: WMT (Workshop on Statistical Machine Translation) for written text translation in 2016 and 2017, and IWSLT (International Workshop on Spoken Language Translation) for spoken language translation in 2015, 2016 and 2017. CRACKER supported resource sharing activities by building upon and extending META-SHARE. The organisation of two MT Marathons served as a means of building skills, kicking off open-source tool development and communicating externally.
Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far
The work performed in CRACKER has been fully in line with the project plan. In the work package Quality Translation – Planning and Coordination (WP1), we carried out community building and coordination tasks and identified synergies with other projects. CRACKER initiated the Cracking the Language Barrier federation of organisations and projects working on technologies for multilingual Europe. It is organised around a multilateral Memorandum of Understanding and has, at the time of writing, almost 40 members. We created a website for the federation and are continuously working on enlarging its membership. The goal of the federation is to enable the European language and language technology community to speak with one voice when addressing stakeholders in politics and administration, especially with regard to our vision of the Human Language Project. CRACKER organised three large conferences: META-FORUM 2015 (including the Riga Summit 2015) in Riga, META-FORUM 2016 in Lisbon and META-FORUM 2017 in Brussels. We also prepared several iterations of the Strategic Research and Innovation Agenda; the final document, "Language Technologies for Multilingual Europe: Towards a Human Language Project", was published in December 2017.
The goal of the work package Quality Translation – Evaluation (WP2) was to organise and coordinate several evaluation campaigns. In this work package, as in its work plan as a whole, CRACKER built upon established evaluation campaigns and workshops. CRACKER organised the International Workshop on Statistical Machine Translation (WMT 2016) in Berlin and WMT 2017 in Copenhagen, as well as the International Workshop on Spoken Language Translation (IWSLT 2015) in Da Nang, IWSLT 2016 in Seattle and IWSLT 2017 in Tokyo. Several evaluation data sets were created.
The work package Resource Sharing (WP3) continued work on META-SHARE, the open resource infrastructure and sharing facility designed, implemented, deployed and populated by META-NET. We took care of maintenance: the portal was re-designed and re-structured, material was added, the licensing package was revised, the metadata scheme was updated and the search functionality was improved. We also built bridges and identified synergies with other projects.
In WP4, Communication and User Relations, CRACKER carried out surveys to identify the role and market penetration of technologies for High-Quality Machine Translation in industry and among Language Service Providers. The first study was a collaboration between CRACKER and Common Sense Advisory (USA). A follow-up survey was run in April/May 2017. CRACKER also established a bridge to CEF AT, carried out dissemination activities and organised two QT Marathon events in Prague (2015, 2016).
The Management work package (WP5) took care of managing the project, including its website. Representatives of CRACKER gave multiple presentations on project-related topics at international events; the work carried out resulted in approximately 50 scientific publications.
Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)
CRACKER has reached all of its goals. The Cracking the Language Barrier federation has brought together the fragmented Multilingual Europe community to speak with one voice when addressing trans-national bodies and stakeholders in politics and administration. CRACKER has prepared several versions of its Strategic Research and Innovation Agenda; the final edition was published in December 2017. CRACKER’s goal has been to bring the topic of Language Technologies (LT) back onto the list of priorities of the EC. With the re-insertion of the LT topic into the Horizon 2020 Work Programme (ICT-29) and other positive developments (STOA workshop/report; Human Language Project), it is evident that the topic is gaining visibility and traction in the EC and in the EP. What is needed now is the political will to establish the Human Language Project as a long-term research programme on LT. With regard to MT research, CRACKER’s activities have contributed to new breakthroughs, especially in Neural MT (see the reports of WMT 2016 and 2017 and of IWSLT 2015, 2016 and 2017). Neural MT was also presented and discussed at the QT/MT Marathon events organised by CRACKER. In terms of infrastructures, CRACKER has supported the advancement of META-SHARE with new features and functionalities.