CORDIS - EU research results

Global Under-Resourced MEedia Translation



Horizon 2020 project empowers biggest MT experiment in global news

Responding to the global media need for fast and accurate translations in languages with scarce data resources, an EU-funded project pushed the boundaries of research and newsroom solutions.

Digital Economy

Even though neural machine translation is advancing in leaps and bounds, for many languages the results are not yet robust enough for newsroom workflows, where accuracy is the paramount concern. The EU-funded GoURMET project aimed to improve neural machine translation for low-resource language pairs and domains. GoURMET focused on media monitoring, content creation and domain enhancements for health content in 16 low-resource language pairs. These languages have a potential weekly audience of over 120 million for the two broadcasters involved in the project, making it the largest international experiment in machine translation (MT) by global news broadcasters.

Three leading European universities in the field (the University of Alicante, the University of Amsterdam and the University of Edinburgh, which coordinated the project) worked closely with the innovation teams of the British Broadcasting Corporation (BBC) and Deutsche Welle (DW). Together they identified languages of interest, compiled training and test data from the broadcasters' content portfolios and had the output evaluated by native-speaking journalists.

However, the 42-month project faced unanticipated challenges. “Over the last few years, we have had an incredible run of bad news, from COVID to the coup in Myanmar to the invasion of Ukraine,” confesses project coordinator Alexandra Birch. “It has been a challenge to engage journalists and language specialists who were under huge pressure due to the massive impact of COVID-19 on day-to-day production workflows.”

“It has not just been the media who are under pressure. Our collaborators in Myanmar disappeared during the phase of the project when we were developing the Burmese translation models, as researchers and academics were at the forefront of protests against the military,” reveals Birch.

Enhancing content and data in under-resourced languages

When GoURMET started, papers on low-resource machine translation typically reported results either for lower-resource European languages such as Romanian or Finnish, or for small amounts of data in high-resource languages such as German. “The field has progressed enormously, and we have been able to play a small part in this,” notes Birch. The researchers ran three shared tasks on the low-resource languages Gujarati, Tamil and Hausa, which attracted wide participation from both industry and academia. The team has produced over 80 publications and contributed better methods for collecting data, as well as the data itself and the trained models, all of which are freely available.

Paving the way for the future of machine translation in newsrooms

GoURMET’s models are high-quality, open-source, easily accessible, locally installable and low-cost, which makes them competitive with commercial systems; commercial systems are also integrated alongside them. The framework that GoURMET provided for exploring and experimenting with genuine media use cases helped BBC News Labs develop a multilingual journalism toolkit. The toolkit comprises a monitoring platform for following developing news stories in any language, which proved particularly valuable during major global events such as the Russia-Ukraine war; a discovery tool for identifying the best original pieces and reversioning them into other languages; and a semi-autonomous graphics generator powered by MT models. News Labs was shortlisted for a BBC News Innovation Award and for a Computing Technology Product Award for this work.

DW has successfully integrated machine translation into ‘plain X’, a production service for transcription, translation, subtitling and voice-over, which is soon to be rolled out to its journalists.

Keywords

GoURMET, machine translation, media, low resource language, neural machine translation
