CORDIS - EU research results

Scalable Understanding of Multilingual Media

News monitoring platform makes the job of media professionals easier and more efficient than ever

Media monitoring has become a challenge because it involves handling the massive growth in the number of broadcast and internet media channels worldwide. An EU initiative has met this challenge by developing a platform that deals with large volumes of data across many languages and different media types.

Digital Economy
Society

“The exponential growth of TV, radio, text and online news sources means that current media monitoring approaches can’t cope with the scale of the problem any longer,” says Prof. Steve Renals, coordinator of the EU-funded SUMMA project. Media monitoring is intricate: it involves data in many languages and the automatic processing of huge amounts of audio and video content.

Integrating sophisticated speech and language technologies

To assist journalists and media monitors, SUMMA has developed a scalable, multilingual monitoring platform that incorporates media processing tools and natural language processing technologies. The SUMMA team designed, developed and deployed the platform, and then tested several prototypes with journalists at the BBC and Deutsche Welle, both partners in the project.

Specifically, project partners developed state-of-the-art speech recognition and machine translation systems for German, English, Spanish, Latvian, Portuguese, Arabic, Persian (Farsi), Russian and Ukrainian. The platform currently processes these languages, but it can cover virtually all major languages by integrating off-the-shelf tools.

The open source platform’s media processing tools, including speech recognition, automated transcription and machine translation, can scale to hundreds of audio and video streams and can be extended as the number of media streams grows. The platform is flexible: it can cope with changes in user needs and smoothly integrate new technologies.

Monitoring developments and searching trending topics made easy

The platform’s fully automated monitoring system ingests content via an application programming interface. After ingestion, it automatically transcribes all audio from video, turning speech into text. It also automatically translates all text, whether from original text articles or from transcribed speech, into English. It then uses the English text to build a cross-lingual overview of the content: clustering related items into stories, summarising stories and individual items, and adding topical keywords, named entities and sentiment analysis.

The BBC is exploiting SUMMA’s outputs through a prototype transcription engine that makes material ingested by BBC Monitoring searchable in a user-friendly way for monitoring journalists. The British international public service broadcaster is also employing a system that uses the platform to alert BBC World Service teams to published stories that would make ideal candidates for translation. In addition, Deutsche Welle is using SUMMA components in the European Broadcasting Union project Eurovox, which is developing standards for automated language processing such as translation, transcription, subtitling and voice-over for broadcasting.

Two spin-out companies have been formed as a result of SUMMA. Based on the platform, Mindflux has developed a one-stop solution for automation-assisted content localisation that translates media at production quality. It will enable users to transcribe, translate and subtitle any audio, video or text in one place. Hatch AI has developed artificial intelligence and machine learning solutions for the financial services industry, building on the platform’s components.

“With the SUMMA platform, it’s easier than ever to aggregate, structure and analyse language data,” concludes Prof. Renals. “Media professionals and newsrooms around the world can simply filter content to match their needs.”
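
For readers who want a concrete picture of the processing order described above (ingest, transcribe audio, translate into English, then group related items), the following is a minimal Python sketch. It is not SUMMA's code or API: every name in it (`MediaItem`, `process`, `tag_keywords`, `cluster_by_keyword`) is hypothetical, the speech recognition and machine translation engines are replaced by stub functions, and the keyword grouping is a toy stand-in for the platform's story clustering.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class MediaItem:
    """One ingested item. All field names are hypothetical."""
    item_id: str
    language: str                   # language code, e.g. "de" or "en"
    text: Optional[str] = None      # original article text, if any
    audio: Optional[bytes] = None   # audio from a video or radio stream
    english: Optional[str] = None   # filled in by the translation step
    keywords: list[str] = field(default_factory=list)


def process(item: MediaItem,
            transcribe: Callable[[bytes, str], str],
            translate: Callable[[str, str], str]) -> MediaItem:
    """Apply the two steps in the order the article gives them."""
    # Step 1: turn speech into text when the item has audio but no text.
    if item.text is None and item.audio is not None:
        item.text = transcribe(item.audio, item.language)
    # Step 2: translate everything that is not already English.
    if item.text is not None:
        if item.language == "en":
            item.english = item.text
        else:
            item.english = translate(item.text, item.language)
    return item


def tag_keywords(item: MediaItem, min_len: int = 7) -> MediaItem:
    """Toy keyword tagging: keep long words from the English text."""
    words = (w.strip(".,;:!?\"'").lower() for w in (item.english or "").split())
    item.keywords = sorted({w for w in words if len(w) >= min_len})
    return item


def cluster_by_keyword(items: list[MediaItem]) -> dict[str, list[str]]:
    """Toy stand-in for story clustering: group item ids by shared keyword."""
    stories: dict[str, list[str]] = {}
    for item in items:
        for keyword in item.keywords:
            stories.setdefault(keyword, []).append(item.item_id)
    return stories


# Stub engines, assumed purely for illustration.
def fake_transcribe(audio: bytes, language: str) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, language: str) -> str:
    return {"Wahlen in Lettland": "Elections in Latvia"}.get(text, text)


items = [
    MediaItem("a1", "de", audio="Wahlen in Lettland".encode("utf-8")),
    MediaItem("a2", "en", text="Latvia holds elections"),
]
processed = [tag_keywords(process(i, fake_transcribe, fake_translate)) for i in items]
stories = cluster_by_keyword(processed)
print(stories)
```

Passing the transcription and translation engines in as functions mirrors the article's point that new or off-the-shelf tools can be swapped in without changing the rest of the flow; a real deployment would also need summarisation, named entity recognition and sentiment analysis, which are omitted here.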

Keywords

SUMMA, media, content, media monitoring, journalists, monitoring platform, media professionals, machine translation
