Periodic Reporting for period 2 - LT_Observatory (LT_OBSERVATORY - OBSERVATORY FOR LR and MT in EUROPE)
Reporting period: 2016-01-01 to 2016-12-31
It is widely agreed that an effective Digital Single Market (DSM) will require automated translation services (henceforth MT, for machine translation) to overcome the language barrier, covering all possible combinations of languages within the EU. This means that language data, usually in the form of parallel corpora containing two language versions of the same content, will be used to train and operate MT systems, while research will no doubt continue into improving the semantic accuracy and stylistic fluency of the output.
The LT_Observatory project focused, in a feasible, practical and collaborative manner, (1) on improving those language resources (LRs) likely to serve the machine translation (MT) needs of the Digital Service Infrastructures (DSIs) of the Connecting Europe Facility (CEF) and (2) on the MT needs of key European MT user segments identified in LTi’s European Language Cloud (ELC) Programme.
The LT_Observatory project aimed to establish the basis for developing new multilingual solutions that will avoid the fragmentation of the DSM.
The project achieved this overall goal through 5 key objectives:
-O1: Contribute to the advanced use and improved quality of MT through the identification of Language Resources (LR) in existing pools and in other national resources.
-O2: Make these LR practically usable by offering different entry points and proposing criteria and methodologies for their qualitative improvement.
-O3: Identify national language strategies and policies that could support a synergetic approach towards improved MT services; identify national/regional funding sources as well as European Structural and Investment Funds that could be used to foster the improvement of language resources.
-O4: Bring together different stakeholders from the language community to define a practical MT deployment roadmap for the future of Europe's multilingual landscape, in particular with regard to the MT-related digital services of CEF.
-O5: Create the on-line LT_Observatory as a sustainable infrastructure to optimise the use of LR through federation and benchmarking, and to bridge the gap between H2020 (and former FP7) Project results and CEF in the area of MT.
Figure 2 presents the LT_Observatory concept.
The main results are:
-An online LR catalogue (www.lt-innovate.org/lt-observe/language-resources-observatory) and an observatory for national/regional strategies and funding opportunities: LT-Observe www.lt-innovate.org/lt-observe/public-policy-observatory
-An LT/MT EcoGuide
-Contributions to a Strategic Research and Innovation Agenda for LT
-A one-stop shop for LT stakeholders
WP 1 LANGUAGE RESOURCE/CATALOGUE STOCK-TAKING was devoted to identifying existing LR catalogues and repositories, national parallel corpora, private LRs from LTi members and other sources on the web, making them collectively accessible via a user-friendly interface on the LT_Observatory website, and adding useful information (valorisation) such as metadata and user feedback on usability.
WP 2 LT DIALOGUE EVENTS supported the ongoing dialogue between all stakeholders and actors in the area of language resources, machine translation and language technologies. To bridge the communication gap between them, the LT_Observatory organised 2 high-level cross-sector and cross-stakeholder events (“MT Dialogue Days”) and 4 “MT Charrettes” with the objectives of providing new insights into what “makes MT tick”, accelerating cooperation amongst actors and facilitating the emergence of synergies and practical solutions.
WP 3 LT NATIONAL SUPPORT FOR LT/LR/MT investigated the national/regional support given to languages, language technology and language strategies in the EU Member States. It enquired about national and regional language strategies and the financial means available from all possible sources (national/regional budgets or European Structural and Investment Funds, e.g. ERDF (including INTERREG), Cohesion Fund, European Social Fund) to finance necessary or desirable improvements to language corpora.
WP 4 LT_OBSERVATORY (ONLINE ACCESS POINT) was devoted to implementing a single online access point for the information gathered throughout the LT_Observatory project, including qualified information on language resources, best-practice methodologies and national/regional funding sources. The LT_Observatory is hosted on the existing LTi platform: www.lt-innovate.eu/lt-observe
WP 5 JOINT SRIA AND MT ECOGUIDE focused on the elaboration of a Joint Strategic Research and Innovation Agenda (SRIA), based on past achievements of LT projects. The work on the SRIA was conducted jointly with the CSA project CRACKER and with the involvement of other stakeholders through the initiative Cracking the Language Barrier. Additionally, WP5 involved the preparation of a practical “MT EcoGuide”, based on the output of the previous WPs (WP1, 2 and 3). The MT EcoGuide works as a pathfinder through the European LT/MT ecosystem for different stakeholder groups and presents each group with content tailored to its needs.
WP 6 DISSEMINATION, EXPLOITATION AND OUTREACH provided a strategy for disseminating project results and making them sustainable. In order to create an impact, it was crucial to define this strategy at an early stage and to have dissemination activities accompany all other project activities, so as to build up increasing awareness of the project and its results. At the same time, dissemination activities encouraged stakeholders to become active members of the LT_Observatory.
WP 7 PROJECT MANAGEMENT was devoted to management duties.
The new online LR catalogue (http://www.lt-innovate.eu/lt-observe) goes beyond the usual LR repositories in that it adds value to the LRs (metadata, tags) and has them commented on and rated by expert practitioners. This increases the usability of the LRs, in particular for the benefit of LT developers (mostly SMEs). In the long term, it will therefore contribute to the removal of language barriers in the DSM. Improved MT systems and easier access to critical language resources will also benefit user companies attempting to grow.
Furthermore, through the contacts with national/regional authorities, a large number of public sector officials became aware of LT/MT and realised its potential for their respective country or region. This also revealed a stark need for awareness raising, as LT is not yet a “household” name. This led in turn to the “LT in Context” paper, which explains to non-technical readers what LT can do for them; these findings also found their way into the MT EcoGuide as recommendations to decision makers.
Further detail on the impact achieved can be found in section 1.3 of Part B.