Community Research and Development Information Service - CORDIS

CRACKER Report Summary

Project ID: 645357
Funded under: H2020-EU.2.1.1.4.

Periodic Reporting for period 1 - CRACKER (Cracking the Language Barrier: Coordination, Evaluation and Resources for European MT Research)

Reporting period: 2015-01-01 to 2016-06-30

Summary of the context and overall objectives of the project

The European machine translation (MT) research community is facing exciting challenges but is also experiencing increased pressure for rapid success. Both originate from the legal and political frameworks and schedules of the EU, especially regarding the Digital Single Market (DSM), the commitment to the preservation of cultural and linguistic diversity, the commitment to personal mobility and the goal of a fair society based on equality and inclusiveness. But MT research also faces steeply rising expectations and economic demand from the business world, where globalisation has multiplied not only markets but also the number of languages companies have to deal with. On the other hand, the research community has to cope with a striking disproportion between the scope of the challenge and the available resources. The scope of the challenge is determined by the number of languages to be covered, the host of open problems and the variety of applications to be delivered. The level of demand and the size of the problem stand in stark contrast to the limited resources available for solving it, in terms of both the budgets approved for R&D and the data representing the relevant languages.
The only way to counter this discrepancy is a considerable improvement of research in terms of efficiency and effectiveness. Spectacular breakthroughs were recently achieved in various scientific disciplines through new modes of massively collaborative research, where international research communities agreed on major challenges, intermediate targets and success metrics. By sharing data, results and evaluation instruments they managed to evoke the power of evolutionary processes with effective instruments of collaborative cross-fertilisation and competitive selection.
Such successful new research schemes require a considerable overhead of support and coordination, especially for agreeing on priorities, planning of tasks, organising of evaluations and maintaining and sharing of research data and results. How can the needed support and coordination be provided, how can the required infrastructures be built up without depleting the tight research budgets? For the area of European MT research there is only one strategy available: to build on existing resources, competencies and infrastructures to the largest degree possible. In EU-funded MT research, a development toward massively collaborative research was already initiated through a tradition of evaluation campaigns such as the ones of the WMT and IWSLT workshops. The community also developed instruments for selecting priorities and strategic planning such as the META-FORUM conferences. We can also build on existing infrastructures for language resources such as META-SHARE, the META-NET platform for resource sharing.
CRACKER builds upon all of these efforts to meet its central goals: community building, networking, roadmapping; organising benchmarking and evaluation campaigns; extending, administering and promoting resource infrastructures; coordination between H2020 MT research and innovation actions; coordination between the European MT research community and large European user organisations and deployment actions; promotion of MT; support and promotion of standards for metadata, interfaces, quality assessment; training of skills and promoting education.
One of the key underlying concepts of CRACKER is to maximise its effectiveness and ensure the quality of its activities through the continuation of established networks, infrastructures, evaluation campaigns, etc. and through a carefully selected consortium:
• DFKI: Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Germany
• CUNI: Charles University in Prague, Czech Republic
• ELDA: Evaluations and Language Resources Distribution Agency, France
• FBK: Fondazione Bruno Kessler, Italy
• ATHENA RC: Athena Research and Innovation Center in Information, Communication and Knowledge Technologies, Greece
• UEDIN: University of Edinburgh, UK
• USFD: University of Sheffield, UK
CRACKER is endorsed and conducted by the relevant communities including, crucially, META-NET. META-SHARE, the open resource exchange facility designed and implemented by META-NET between 2010 and 2013, is proposed to be the resource and sharing platform of the Cracking the Language Barrier federation. The language resources community is represented in the consortium through ELDA, who decided in 2012 to continue operating their own META-SHARE node and to adopt META-SHARE as their main in-house platform for resource description and maintenance. ATHENA RC led the META-SHARE development arm in META-NET (T4ME), with DFKI and FBK being two partners that contributed significantly to its design and implementation. The MT evaluation initiatives are represented through UEDIN, the leading research centre behind Moses, who also organise (in cooperation with other CRACKER partners) the WMT evaluations, as well as FBK, who organise the IWSLT workshops. USFD works on the interface between MT, Quality Translation (QT) and metrics-based evaluations. Partner CUNI is continuing the QT/MT Marathon events.
One of our key goals concerns coordination, community building and communication. In the first months of the project, the scope of these activities was extended to all organisations and projects working on technologies for Multilingual Europe; this group is now known as the Cracking the Language Barrier federation. CRACKER builds upon META-NET and continues the successful META-FORUM series of events in 2015, 2016, and 2017. Furthermore, CRACKER is conducting surveys to find out the true impact of MT on society and the economy, especially that of recent EU research results. Additionally, CRACKER seeks to build bridges between the MT research and innovation activities carried out in Horizon 2020 and the deployment and service-oriented work in Connecting Europe Facility Automated Translation (CEF AT).
Open, transparent and application-driven assessment and benchmarking is essential for effective search, selection and cross-fertilisation processes in technology evolution and, therefore, constitutes a cornerstone of the envisaged mode of massive collaboration in research and evaluation. CRACKER caters to Horizon 2020, to all European MT research and development and finally also to the worldwide research community. Driven by European research needs and perceived quality barriers, evaluation campaigns have been and will be organised. CRACKER takes into account the demands of the H2020 projects as well as European social and economic needs. CRACKER continues two established series of dedicated workshops: WMT (Workshop on Statistical Machine Translation) for written text translation and IWSLT (International Workshop on Spoken Language Translation) for spoken language translation. WMT and IWSLT have each been running successfully for about ten years and are organised by CRACKER for two (WMT) and three (IWSLT) additional years. We build upon the success of the previous editions of these initiatives and focus on more challenging language pairs.
CRACKER coordinates and supports the resource sharing activities underpinning the research and development activities described above by building upon and extending the META-SHARE infrastructure. CRACKER also provides tools to support translation evaluation. Translate5 is an open-source translation data management system, to be extended by CRACKER in terms of features and functionalities. The organisation of two MT Marathons serves as a means for skill building, kicking off open-source tool development and external communication.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

The work performed in CRACKER in the first 18 months of the action has been fully in line with the project plan.
In the work package “Quality Translation – Planning and Coordination” (WP1), we carry out community building and coordination tasks and identify synergies between the projects funded under Horizon 2020 ICT-17 and also several other projects and organisations. CRACKER initiated the Cracking the Language Barrier federation of organisations and projects working on technologies for multilingual Europe; the federation is organised around a multilateral Memorandum of Understanding and has, at the time of writing (August 2016), 34 members. We created a website and other communication channels for the federation and are continuously working on enlarging its membership and visibility in order to broaden its reach. The overall goal of the federation is to enable the European language and language technology community to speak with one voice when addressing stakeholders in politics and administration. Together with the project QT21 (also a member of the federation) we organised the LREC 2016 Workshop “Translation Evaluation: From Fragmented Tools and Data Sets to an Integrated Ecosystem”. We organised two large-scale conferences, META-FORUM 2015 (including the Riga Summit on the Multilingual Digital Single Market 2015) in Riga, Latvia, and META-FORUM 2016 in Lisbon, Portugal. META-FORUM 2017 will take place in Budapest, Hungary. META-FORUM is a series of annual conferences initiated by META-NET in 2010. An important component of reaching out to politics and to funding agencies is the preparation of two strategy papers, Versions 0.5 and 0.9 of the Strategic Agenda for the Multilingual Digital Single Market. Version 0.5 was unveiled at META-FORUM 2015 and Version 0.9 was presented at META-FORUM 2016.
The document presents recommendations to the EC on how to make the Digital Single Market multilingual: if the language component is not taken into account in the EC’s Digital Single Market strategy, the Digital Single Market will continue to be a set of fragmented islands, divided by their respective languages.
The goal of the work package “Quality Translation – Evaluation” (WP2) is to organise and to coordinate several evaluation campaigns, collaborating closely with other ICT-17 projects (and beyond). In this work package, and indeed in its whole work plan, CRACKER builds upon already established evaluation campaigns and workshops. In that regard, CRACKER organised the International Workshop on Statistical Machine Translation (WMT 2016) in Berlin, Germany, with several different shared tasks and evaluations, and also the International Workshop on Spoken Language Translation 2015 in Da Nang, Vietnam. IWSLT 2016 (Seattle, USA) and IWSLT 2017 as well as WMT 2017 will also be organised by CRACKER. For the preparation and successful running of these workshops, several evaluation data sets were created in collaboration with other researchers and projects such as QT21.
The work package “Resource Sharing” (WP3) focuses upon META-SHARE, the open resource infrastructure and sharing facility designed, implemented, deployed and populated by META-NET, funded through the EU projects T4ME, CESAR, METANET4U and META-NORD, and now continued in CRACKER. The project mainly takes care of several maintenance aspects of the META-SHARE software. In that regard, the portal was re-designed and re-structured, material was added, the licensing package received a makeover, the metadata scheme was updated and the search functionality was improved. We promote resource sharing and also carry out dissemination activities. Furthermore, we built bridges and identified synergies with other projects and members of the Cracking the Language Barrier federation, most importantly MLi and LIDER. CRACKER also prepared an initial and an updated version of its Data Management Plan (DMP), which was also offered to the members of the federation as a template for their own DMPs. Moreover, through a subcontract for extending the open-source software translate5, CRACKER is supporting tool development and provides mechanisms for data creation and curation, specifically directed at data sets for MT research, evaluation and innovation.
Within WP4, “Communication and User Relations”, CRACKER carries out two surveys, realised through a subcontract, in order to identify the role and market penetration of technologies for High-Quality Machine Translation (HQMT) in industry and also among Language Service Providers (LSPs). The first study, published on the CRACKER website on 1 July 2016, was a pilot collaboration between CRACKER and the think tank Common Sense Advisory (CSA), who themselves ran two broad MT surveys at the beginning of 2016. The second, more thorough follow-up study and survey, fully tailored to CRACKER’s needs, will be prepared and run for CRACKER by CSA in the second half of the project. In WP4 we also established a bridge between CRACKER (and the wider federation) and the EC’s Connecting Europe Facility programme (CEF), especially its Automated Translation building block (AT). The work package has also carried out dissemination activities with regard to industry, politics and research (for example, a networking session at the EC’s ICT 2015 conference in Lisbon), organised and planned through two communication plans. This work package has also organised one QT Marathon training and tool-building event in Prague (September 2015). The second QT Marathon is taking place in September 2016.
The final work package, “Project Management and Coordination” (WP5), takes care of managing CRACKER itself. This activity also includes the creation and continuous maintenance of the CRACKER website.
In addition to the results briefly mentioned above, representatives of CRACKER gave multiple presentations on several different project-related topics at international events; the work carried out so far has resulted in circa 30 scientific publications.

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

Within the first half of the project, CRACKER has made significant progress in several areas of activity.
While still in an emerging state, the Cracking the Language Barrier federation has the potential to truly bring together the highly fragmented European language and language technology communities and to enable them to speak with one voice when addressing trans-national bodies and stakeholders in politics and administration. In order to establish the federation more firmly within the different communities and to build its standing and reputation, more activities and measures are planned for the second half of CRACKER.
CRACKER has prepared two strategy papers, the Strategic Agenda for the Multilingual Digital Single Market Version 0.5 and the updated Version 0.9 (presented at META-FORUM 2015 and 2016 respectively). One of the main strategic goals of CRACKER, META-NET and the Cracking the Language Barrier federation is to bring the topic of language technologies as well as multilingual technologies back onto the list of priorities of the EC. This set of technologies can provide multiple benefits for the European economy, European politics, European administration and the European citizen. In that regard, one important result and success of our project is the strengthening of the bridge between the language technology (LT) community and the Big Data Value Association (BDVA), establishing a solid link between the two communities. In September and October 2016 the collaboration between the LT community and BDVA will be intensified in order to best align the forthcoming version 1.0 of the Multilingual Digital Single Market SRIA with the BDVA SRIA document.
To sum up the current situation, it is evident that the topic of language and multilingualism is gaining more and more visibility and traction in the European Commission and also in the European Parliament. CRACKER will continue to help bring about a change in order to establish a firm role for sophisticated language technologies, not only for the Multilingual DSM but also for other application areas. Political will is needed to establish a language policy change at the level of the Member States and the EU. CRACKER will continue to coordinate and intensify the push, and to keep up the pressure on the Member States, the EP and the EC to initiate a concerted action, maybe even a shared programme between the European Union and the Member States.
With regard to the Machine Translation research supported by CRACKER, our activities have helped to provide new breakthroughs in the field, especially with regard to the emerging predominant paradigm of Neural Machine Translation and also combinations of systems (see the reports of the WMT 2016 as well as IWSLT 2015 and 2016 workshops). Furthermore, CRACKER is continuing to push the aspect of quality translation with novel shared tasks organised in the above-mentioned evaluation campaigns. The topic of Neural MT is also being presented and discussed at the successful QT/MT Marathon events organised and supported by CRACKER, which keeps the programme and agenda of the marathon events up to date.
In terms of infrastructures, CRACKER has supported the advancement of the META-SHARE software with new features and functionalities; the same goes for the translation data management software translate5. Furthermore, we have helped and will continue to help regarding community building between different stakeholder groups, especially research, industry, politics and administration.

Related information

Record Number: 192954 / Last updated on: 2016-12-16