Community Research and Development Information Service - CORDIS

H2020

RENOIR Report Summary

Project ID: 691152
Funded under: H2020-EU.1.3.3.

Periodic Reporting for period 1 - RENOIR (Reverse EngiNeering of sOcial Information pRocessing)

Reporting period: 2016-01-01 to 2017-12-31

Summary of the context and overall objectives of the project

In today’s world, access to information is a decisive factor in advancing industry, society and even culture. It is therefore of great importance to understand why and how some information (e.g. some memes) spreads virally with great ease, while other information is met with indifference and ignored. Uncovering the reasons may allow promoting important information, such as warnings about cyber-attacks, while stifling harmful rumors, such as the claim that vaccines cause autism. The aim of the project is to tackle the vast complexity of such information dynamics in social systems by involving researchers in social sciences, journalism, computing, data mining and complexity science. The Project’s objectives are:
- Discovering and reverse-engineering the mechanisms of information spreading in social media, such as the dynamics of news releases, blogs, Twitter messages, e-mails etc.,
- Training and exchange of knowledge between partners in different domains,
- Bidirectional knowledge transfer between academia and the media industry by exposing researchers to real-life problems and giving businesses access to innovative methods and tools for information analysis.
The project is based on three pillars: data acquisition, data mining/machine learning and complex systems modeling. The specific problems addressed include understanding the rules of information spreading and predicting it in different media and for different topics, finding information sources and uncovering hidden information channels. The secondments will accelerate the individual careers of the researchers involved, especially early-stage ones. The project will lay the foundations for long-term collaboration by strengthening existing links between partners and creating new ones.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

The research done so far can be divided into three areas: investigation of the spreading of news, information spreading on Twitter and a theoretical approach to finding information sources.
For research on the spreading of news, one of the most basic issues is the ability to track information. This has been tackled in a study of news released by the Slovenian Press Agency and the use of this news by publishers such as newspapers, web information portals, prominent blogs, etc. The study showed that it is possible to track news with high accuracy by using text similarity measures, and also revealed that most of the time news is transmitted "as is", without large changes. As a practical result, a tool for tracking the use of Slovenian Press Agency articles has been created, allowing the obtained results to be exploited directly.
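As an illustration of how text similarity can flag reuse of an agency release, here is a minimal sketch. The report does not say which similarity measures were used, so the choice of Jaccard similarity over three-word shingles, the function names and the 0.5 threshold are all assumptions made for this example, not the project's actual tool.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts yield a single shingle made of all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_reuses(release, articles, threshold=0.5):
    """Return (name, score) pairs for articles similar to the release.

    `articles` maps a hypothetical publisher name to article text.
    The threshold is an illustrative value, not one taken from the study.
    """
    reference = shingles(release)
    hits = []
    for name, text in articles.items():
        score = jaccard(reference, shingles(text))
        if score >= threshold:
            hits.append((name, score))
    # Most similar articles first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Because news is mostly transmitted "as is", a copied article shares nearly all of its shingles with the release and scores close to 1, while unrelated articles score close to 0. This is why even a simple set-overlap measure can plausibly separate the two cases.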
When news spreads, it is rarely transmitted in an unbiased way. Usually there is some bias, whether in the coverage of topics, in geographical interests or in the sentiment expressed in the article. A comprehensive study based on the EventRegistry data has been performed, checking what kinds of reporting biases exist and to what degree. The study has shown significant biases in geographical coverage by different news agencies, and consequently in publishers relying on specific agencies as their main sources, along with other, less pronounced biases. The results allow a better understanding of the processes of news spreading and make more complex descriptions possible.
News reports often form chains describing certain events, or describe chains of events. The availability of aggregated data in EventRegistry makes it possible to attempt to answer the question: given the current news, what will we hear next? Research into predicting future events, or at least news reports, has been started, looking at causality between events at different levels of detail.
Twitter is nowadays an important channel of news spreading. The pathways for spreading are mostly dictated by the network of follows, i.e. who is observing whose tweets. In collaboration with Stanford University, the mechanisms of that process have been researched. The research shows that similarity of interests is a crucial factor in information transmission, not only directly, but also through shaping the network responsible for the spreading.
In parallel to the data-driven studies, the problem of information spreading has been investigated from a more theoretical point of view. Given a specific microscopic model, how can the most likely source of information be determined if we only have fragmentary knowledge of where and when the information arrived? Research to improve existing methods has been carried out, resulting in methods that not only work significantly faster, but also show somewhat higher accuracy than their predecessors.
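To make the source-finding question concrete, the sketch below ranks candidate sources on a small network from a few observed arrival times. It is a deliberately simplified illustration, not the project's method: it assumes that information travels along shortest paths with exactly one time unit of delay per hop, and all names in it are hypothetical. Under that assumption, the true source is the node for which "arrival time minus hop distance" is the same at every observer, so candidates are ranked by the variance of that quantity.

```python
from collections import deque
from statistics import pvariance


def bfs_distances(graph, start):
    """Hop distances from `start` to every reachable node of an adjacency-list graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_sources(graph, arrival_times):
    """Rank candidate sources, most consistent with the observations first.

    `arrival_times` maps observer node -> time the information arrived there.
    Assumption: one time unit of delay per hop along shortest paths.
    """
    scores = []
    for candidate in graph:
        dist = bfs_distances(graph, candidate)
        if any(observer not in dist for observer in arrival_times):
            continue  # candidate cannot reach every observer
        # Implied start time at each observer; identical values mean a perfect fit.
        offsets = [t - dist[observer] for observer, t in arrival_times.items()]
        scores.append((candidate, pvariance(offsets)))
    return sorted(scores, key=lambda score: score[1])
```

For example, on a path graph a-b-c-d-e with observers at a, d and e reporting arrival times 2, 1 and 2, only node c gives identical implied start times at all three observers, so it is ranked first. Real spreading has random delays, which is why practical methods replace this exact-match criterion with a probabilistic one.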
Overall, there are significant results for understanding and tracking news spreading, models of information spreading on Twitter and improvements in algorithms to be exploited in further work. Almost all work done in the project is a result of collaborations, forming new ties between institutions as well as exposing researchers to new topics in different disciplines and preparing them to tackle more complex interdisciplinary issues in the future.

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

Since it is impossible to fully and accurately describe a process as complex as information spreading, simplifications and approximations are always used when it is considered in a rigorous study. Typically it is treated as a simple diffusive or epidemic-like process, especially if the model is to be used for a further task, such as locating the information source. We are going a step further and combining more realistic models with the problem of finding sources of information. With methods that assume a realistic model of spreading, it may be possible to find the most probable sources of false rumors or disinformation with significantly higher accuracy.
A potential commercial application of the project's results on how news spreads is a tool that allows tracking the use of news released by a press agency and thus assessing the impact of the various kinds of news produced. Tests showed that this tool can be used to quantify information flows in the space of public media, at least within the same language.
The research into mechanisms of information spreading performed so far also has another motivation. By investigating the vast amount of news data provided by EventRegistry it is possible to investigate chains of events on a global scale. Tracking statistical relations between news items not only brings information about patterns in news reporting, but possibly also about patterns in the events themselves. In this way, it may be possible to predict future events in a proverbial case of "history repeating itself". Statistical data-driven methods may uncover signals of forthcoming events that are non-obvious or hidden in the thicket of other information. Through the research in this project we expect to be able to tell if such predictions are possible and how accurate they could be.