
Reverse EngiNeering of sOcial Information pRocessing

Periodic Reporting for period 2 - RENOIR (Reverse EngiNeering of sOcial Information pRocessing)

Reporting period: 2018-01-01 to 2019-12-31

In today’s world, access to information is a decisive factor advancing industry, society and even culture. It is therefore of great importance to understand why and how some information (e.g. some memes) spreads virally with great ease, while other information is met with disinterest and omission. Uncovering the reasons may allow promoting important information, such as warnings about cyber-attacks, while stifling harmful rumors, such as the claim that vaccines cause autism. The aim of the project is to tackle the vast complexity of such information dynamics in social systems by involving researchers in social sciences, journalism, computing, data mining and complexity science. The project’s objectives are:
- discovery and reverse-engineering of the mechanisms of information spreading in social media, such as the dynamics of news releases, blogs, Twitter messages, e-mails, etc.,
- training and exchange of knowledge between partners in different domains,
- bidirectional knowledge transfer between academia and media industry by exposing researchers to real-life problems and giving business access to innovative methods and tools for information analysis.
The project is based on three pillars: data acquisition, data mining/machine learning and complex systems modeling. The specific problems addressed include understanding rules of and predicting information spreading in different media and about different topics, finding information sources and uncovering hidden information channels. The secondments will accelerate individual careers of involved researchers, especially early stage ones. The project will lay foundations for long-term collaboration by strengthening existing links between partners and creating new ones.
The research in the project has been focused on three areas: understanding how news spreads, predicting which stories will follow from current news, and finding where information circulating in a network originated.
To research the spreading of news, one of the most basic requirements is the ability to track it. This has been tackled in a study of news released by the Slovenian Press Agency and the reuse of that news by publishers such as newspapers, web information portals, prominent blogs, etc. The study showed that it is possible to track news with high accuracy using text similarity measures, also revealing that most of the time news is transmitted "as is", without large changes. As a practical result, a tool for tracking the use of Slovenian Press Agency articles has been created; it is being used by the Slovenian Press Agency and is offered to other news agencies. We have also investigated, both in theory and in actual data gathered by EventRegistry, what biases and barriers exist and what effect they ultimately have on the spreading of news. We find that barriers and reporting biases exist between geographical regions: for example, Europe and North America follow each other's news closely, while events in Africa gather little interest outside it.
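The core idea behind tracking article reuse with text similarity can be illustrated with a minimal bag-of-words cosine-similarity sketch. This is a toy illustration of the general technique, not the project's actual tool; the article strings and the similarity threshold are hypothetical, and a production system would add stemming, stop-word removal and TF-IDF weighting.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; a real pipeline would add stemming and stop words
    return re.findall(r"\w+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Hypothetical example articles
agency_article = "The government announced new funding for renewable energy projects today."
republished = "The government announced new funding for renewable energy projects."
unrelated = "Local team wins the championship after dramatic final match."

print(cosine_similarity(agency_article, republished))  # high: likely reuse "as is"
print(cosine_similarity(agency_article, unrelated))    # low: unrelated story
```

A tracker built on this idea would flag a published article as derived from an agency release whenever the similarity exceeds a tuned threshold.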
News reports often form chains describing certain events, or describe chains of events. The availability of aggregated data in EventRegistry makes it possible to attempt to answer an interesting question: given current news, what will we hear next? By treating the media as an echo of events that happen around the world, it may be possible to find characteristic patterns and chains of events and use the principle of "history repeating itself" in practice to predict some future events. Throughout the project we have formulated methods to represent and measure the so-called causal templates found in news worldwide, as well as to build data-driven predictive models from them. One of the challenges is the sheer scope of worldwide news publishing and the enormous number of possibilities of what could happen, which require new ways to handle data in an entirely scalable way and methods to reduce the complexity of the problem. The global event predictive model was not fully constructed, but some of the methods were tested or applied at a smaller scale.
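The "history repeating itself" principle can be sketched, in a heavily simplified form, as a first-order model over event types: count which type of event tends to follow which in historical streams, then predict the most frequent follower. This is only a toy analogue of the causal-template approach described above; the event labels and sequences below are entirely hypothetical.

```python
from collections import Counter, defaultdict

def train_transition_counts(event_sequences):
    """Count how often each event type follows each other type."""
    counts = defaultdict(Counter)
    for seq in event_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, current_event):
    """Return the most frequent follower of current_event, or None if unseen."""
    followers = counts.get(current_event)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical historical streams of event types
history = [
    ["earthquake", "rescue-operation", "aid-pledge"],
    ["earthquake", "rescue-operation", "reconstruction"],
    ["election", "protest", "negotiation"],
]
counts = train_transition_counts(history)
print(predict_next(counts, "earthquake"))  # → rescue-operation
```

The real difficulty, as the report notes, lies in doing something like this at the scale of worldwide news, where the space of possible event patterns explodes and scalable complexity-reduction methods become essential.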
In parallel to the data-driven studies, the problem of information spreading has been investigated from a more theoretical point of view. If we know how information spreads on the microscopic level and are able to observe some users, is it possible to determine where the information originally came from? It turns out that it is, albeit with an accuracy that depends on the amount of data, the topology of the connections between users and the amount of noise in the spreading process. We have formulated improved methods to locate the true source given only limited information about where and when the information arrived. The improvement allows the processing of large networks, comparable in size to online social networks, and lets us exploit knowledge of where the information has not yet arrived to make a much more accurate estimate early in the spreading process.
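The flavor of such source-localization methods can be conveyed with a minimal sketch: if a few observer nodes report when information reached them, the true source should be the candidate node for which (arrival time − hop distance) is nearly the same constant across all observers. This is a simplified, deterministic illustration under the assumption of one hop per time step, not the project's improved algorithm; the toy network and observations are hypothetical.

```python
from collections import deque

def bfs_distances(graph, start):
    """Hop distances from start to all reachable nodes (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def estimate_source(graph, observations):
    """Pick the candidate whose offsets (arrival time - distance) vary least:
    for the true source they should all equal the unknown start time."""
    best, best_score = None, float("inf")
    for cand in graph:
        dist = bfs_distances(graph, cand)
        offsets = [t - dist[obs] for obs, t in observations.items() if obs in dist]
        if len(offsets) < len(observations):
            continue  # candidate cannot reach every observer
        mean = sum(offsets) / len(offsets)
        score = sum((o - mean) ** 2 for o in offsets)  # variance-like mismatch
        if score < best_score:
            best, best_score = cand, score
    return best

# Hypothetical network: path a-b-c-d-e with a branch c-f
graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d", "f"],
    "d": ["c", "e"], "e": ["d"], "f": ["c"],
}
# Information started at "c"; three observers report hop-time arrivals
observations = {"a": 2, "e": 2, "f": 1}
print(estimate_source(graph, observations))  # → c
```

Real spreading is noisy and the networks are huge, which is exactly why the project's contribution of scalable estimators that also use the absence of arrivals matters.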
The project also yielded a few other results not directly related to the three focus areas, such as new innovation-seeding methods that maximize the final reach at the cost of time, and new methods to model belief propagation.
In summary, we have learned much about how news spreads, developed ways to build new large-scale data-driven predictive models for events, and created algorithms to locate the source of information spreading in social media given only a little data. Almost all of the work in the project is the result of collaborations, forming new ties between institutions, exposing researchers to new topics in different disciplines and preparing them to tackle more complex interdisciplinary issues in the future.
The project's results were published in 40 papers, with 2 more accepted and awaiting publication, 10 submitted and 10 in preparation, and were presented in numerous conference talks and posters throughout the project's duration.
The project pushes the boundaries of the state of the art both in the development of theoretical models and in practical applications.
Textual similarity is not a new concept, so our research using it did not advance the methods themselves. However, by applying these methods to a real challenge that news agencies face, we have developed a practical tool able to answer a vital question: who uses our output, and how? This allows news agencies to make much better-informed decisions about how to change their service.
The idea of looking for repeating patterns to build a prediction is at the core of all machine learning methods. But using all news-publisher content as a proxy for actual events in society in order to predict future events goes beyond the state of the art. The global scale of the system we attempt to predict and the high complexity of both the production and the spreading of news required new methods and approaches. While the system to predict the future to some degree based on the "history repeats itself" principle has not been finalized, the research on how to deal with the complexity and scale of the data yielded new methods beyond the previous state of the art.
Finally, unlike previous similar methods, the new, improved methods for locating the source of spreading information are fast enough to be applied to large real social networks, and they can use the lack of information as information itself, locating the source very early, when little data has actually been gathered. This advances the state of the art in methods for finding information sources in social networks.