The goal of MediaGist is to make significant advances in multilingual summarisation and sentiment analysis research so as to extract and present the GIST of online multilingual news and the corresponding commentaries.

During the first period, the researcher created an online system for crosslingual analysis of aggregated news and commentaries (the MediaGist system). It uses the output of both research objectives: summarization and sentiment analysis. It is designed to help journalists detect and explore news topics that are reported or discussed controversially in different countries. News articles from the current week are clustered separately in each of the 5 currently supported languages, and the clusters are then linked across languages. Sentiment analysis provides a basis for computing controversy scores, and summaries help to explore the differences. Recognized entities play an important role in most of the system’s modules and provide another way to explore the data.
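
As a rough illustration of how sentiment could feed a controversy score for one cross-lingual cluster, the sketch below measures how far the average sentiment differs between languages. This is an assumption made for illustration only: the report does not state the formula MediaGist uses, and the function name `controversy_score`, the sentiment range of -1 to 1, and the sample values are all hypothetical.

```python
from statistics import mean, pstdev


def controversy_score(sentiment_by_language):
    """Hypothetical score: spread of the mean sentiment across languages.

    sentiment_by_language maps a language code to a list of sentiment
    values (assumed here to lie in [-1, 1]) for the articles of one
    cross-lingually linked cluster. This is NOT the MediaGist formula.
    """
    # Average the sentiment within each language, skipping empty lists.
    means = [mean(scores) for scores in sentiment_by_language.values() if scores]
    # With fewer than two languages there is nothing to compare.
    if len(means) < 2:
        return 0.0
    # Population standard deviation of the per-language averages.
    return pstdev(means)


# Made-up sentiment values for one linked cluster in three languages.
cluster = {
    "en": [0.4, 0.6],
    "de": [-0.5, -0.3],
    "cs": [0.0, 0.2],
}
score = controversy_score(cluster)
print(f"controversy score: {score:.3f}")
```

A higher value means the languages disagree more in tone about the same topic; a cluster covered in only one language scores zero under this assumption.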

The system runs at this URL: the site contains details about the project as well.

A video describing its functionality can be found at:

Progress towards the multilingual summarisation objective was made in cooperation with international communities (the multilingual summarization community, MultiLing, and the FP7/SENSEI consortium). The main activity was organising and participating in the MultiLing 2015 shared tasks, which concluded with a workshop associated with ACL/SigDial’15. In the OnForumS task, the submitted system achieved the best precision, and in the multi-document multilingual task, the submitted system ranked first in most of the languages, repeating its success from 2013. The researcher’s own summarization approach progressed along two lines: adapting the summarizer to process more languages, and automatic event extraction. Four papers describe the progress (conferences: SigDial’15, FIRE’15, Data a znalosti’15 and RANLP’15).

Progress towards the sentiment analysis objective was mainly driven by successful participation in SemEval shared tasks. The researcher worked on the submissions with other group members at the host institution. In 2014, the aspect-based SA system was ranked 4th (out of 32), and in 2016, the tweet detection system was ranked in the top 4 for most of the analysed topics (out of 19). The approaches were presented at ACL/WASSA’14 and SemEval’14, and submitted to SemEval’16. A side project on using brain data for SA, investigated by an international group led by the University of Trento, was published in the Journal for Language Technology and Computational Linguistics.

The researcher’s involvement in international research mainly includes organising and participating in the efforts of the multilingual summarization community and collaborating with the FP7/SENSEI consortium (mainly the University of Essex, UK) and the EC’s Joint Research Centre, Italy. The researcher also reviewed 5 journal papers and 30 conference papers, and served on 7 conference program committees.

As part of the integration activities, the researcher led 6 courses during the first two years of the project, teaching 385 students in total. These include the Information Retrieval course, which was prepared by the researcher and started during the current term. He supervised 7 undergraduate students, 2 master’s students and 1 PhD student.

The researcher published 1 journal paper and 8 conference papers; 2 papers are currently in press and 2 new papers were recently submitted.

He carried out 4 collaboration missions (JRC and University of Essex) and 4 conference missions (ACL’14, RANLP’15, Data a znalosti’15, SigDial’15), and participated in the MCAA meeting that launched its Czech chapter.

The MediaGist system was presented and discussed at the JRC because of its relevance to the real-time media monitoring done there. It was also demonstrated at the host institution’s Open Day 2016.

Planned activities include publishing a demo paper about MediaGist (the paper is currently under review at the top NLP conference NAACL/HLT’16, held in San Diego in June). MediaGist will be presented at the Language and computation meeting at the University of Essex in March.

The total cost of the project’s phase I was EUR 97k (with an EU contribution of EUR 50k and a host institution contribution of EUR 47k).


