Final Report Summary - MEDIAGIST (Summarisation and Sentiment Analysis for Evolving Multilingual Media Content)

The integration project of Josef Steinberger started in March 2014. Since the beginning, he has been involved in the teaching, supervision and research activities of the Natural Language Processing group at the host institution.

His research goal was to make significant advances in multilingual summarisation and sentiment analysis research so as to extract and present the GIST of online multilingual news and the corresponding commentaries. He created an online system for crosslingual analysis of aggregated news and commentaries (the MediaGist system). It uses the output of both research objectives: summarisation and sentiment analysis. It is designed to help journalists detect and explore news topics that are reported or discussed controversially in different countries. News articles from the current week are clustered separately in each of the currently 5 supported languages, and the clusters are then linked across languages. Sentiment analysis provides a basis for computing controversy scores, and summaries help to explore the differences. Recognized entities play an important role in most of the system’s modules and provide another way to explore the data.
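The report does not state how the controversy scores are computed. The following is only a minimal sketch of one possible approach, assuming that each linked cross-lingual cluster carries per-article sentiment values in the range [-1, 1] and that controversy is measured as the spread of the average sentiment across languages. The function name `controversy_score`, the input layout and the sample values are hypothetical and are not taken from the MediaGist system.

```python
from statistics import mean, pstdev


def controversy_score(sentiment_by_language):
    """Return the spread of average sentiment across languages for one topic.

    sentiment_by_language maps a language code to a list of per-article
    sentiment values in [-1, 1]. This layout is an assumption made for
    illustration, not the actual MediaGist data model.
    """
    # Average the sentiment within each language, skipping empty lists.
    per_language_means = [
        mean(scores) for scores in sentiment_by_language.values() if scores
    ]
    # With fewer than two languages there is nothing to compare.
    if len(per_language_means) < 2:
        return 0.0
    # Population standard deviation of the per-language averages:
    # 0.0 when all languages agree, larger when they diverge.
    return pstdev(per_language_means)


# Hypothetical topic reported positively in English and negatively in German.
topic = {
    "en": [0.6, 0.4, 0.5],
    "de": [-0.5, -0.3, -0.4],
    "cs": [0.1, -0.1, 0.0],
}
score = controversy_score(topic)
```

Under this assumption, a topic covered with the same tone in every language scores 0.0, while a topic covered with opposite tone in two languages scores higher, so topics could be ranked by the score before their summaries are compared.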

He presented a demo of the developed system at the ACL conference, the most important event in the field in 2016. The system runs at: and a video describing its functionality can be found at:

He published 2 journal papers, 1 book chapter and 14 conference papers during the project. He currently has an H-index of 11 and 352 citations in Scopus, an H-index of 6 and 154 citations in Web of Science, and an H-index of 18 and 1 265 citations in Google Scholar.

He taught 10 courses during the project, with 608 students in total, and prepared a new course on Information Retrieval. He supervised 7 undergraduate students, 7 master's students and 2 PhD students.

He was involved in many international activities. He reviewed 9 journal papers and 75 conference papers and took part in 29 conference program committees. He co-organized 4 workshops at major NLP conferences: two workshops on Balto-Slavic NLP (at RANLP’2015 and at EACL’2017) and two multilingual summarisation workshops (Multiling at SigDial’2015 and EACL’2017). He coordinated several shared tasks, the experiments that preceded the workshops. His involvement in international research includes joint work with the FP7/SENSEI consortium (mainly the University of Essex, UK) and the EC’s Joint Research Centre, Italy. He also contributed to systems that took part in the community's shared tasks (SemEval’14, Multiling’15, SemEval’16, BSNLP’17, Multiling’17 and SemEval’18). In many cases these systems ranked among the best. He also chaired a conference of a Czech and Slovak data community with more than 60 attendees.

He went on 16 missions, 6 collaboration missions and 9 conferences, and took part in the Czech chapter of MCAA as well.

The total cost of the project was EUR 202k (with an EU contribution of EUR 100k and a host institution contribution of EUR 102k).

