The research in the project has focused on three areas: understanding how news spreads, predicting the stories that the news builds, and finding where information circulating in a network originated.
A basic prerequisite for researching how news spreads is the ability to track it. We addressed this in a study of news released by the Slovenian Press Agency and its use by publishers such as newspapers, web information portals and prominent blogs. The study showed that news can be tracked with high accuracy using text similarity measures, and it also revealed that news is mostly transmitted "as is", without large changes. As a practical result, a tool for tracking the use of Slovenian Press Agency articles was created; it is being used by the Slovenian Press Agency and is offered to other news agencies. We have also investigated, both in theory and on actual data gathered by EventRegistry, what biases and barriers exist and what effect they ultimately have on the spreading of news. We find that barriers and reporting biases exist between geographical regions: for example, Europe and North America follow each other's news closely, while what happens in Africa attracts little interest outside it.
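As a rough illustration of tracking by text similarity, the sketch below compares an agency article with published texts using Jaccard similarity over three-word shingles. The choice of measure, the threshold of 0.5, the function names and the sample texts are all assumptions made for this example; the summary does not state which similarity measures the actual tool uses.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lowercased, punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_reuse(agency_text, published, threshold=0.5):
    """Return (publisher, score) pairs whose similarity reaches the threshold,
    most similar first. The threshold is an illustrative assumption."""
    source = shingles(agency_text)
    hits = []
    for name, text in published.items():
        score = jaccard(source, shingles(text))
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical sample data, not taken from the study.
agency = "The government announced a new budget plan on Monday after long talks."
published = {
    "portal_a": "The government announced a new budget plan on Monday after long talks.",
    "blog_b": "The government announced a new budget plan on Monday following long talks.",
    "paper_c": "Local football club wins the national championship for the first time.",
}
matches = find_reuse(agency, published)
```

Here the verbatim copy and the lightly edited copy are both flagged, while the unrelated article is not, which mirrors the observation that most news is passed on nearly unchanged.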
News reports often form chains describing certain events, or describe chains of events. The availability of aggregated data in EventRegistry makes it possible to attempt to answer an interesting question: given current news, what will we hear next? By treating the media as an echo of events that happen around the world, it may be possible to find characteristic patterns and chains of events and to use the principle of "history repeating itself" in practice to predict some future events. Throughout the project we have formulated methods to represent and measure the so-called causal templates found in news worldwide, and to build data-driven predictive models from them. One of the challenges is the sheer scope of worldwide news publishing and the enormous number of possible developments, which require new, fully scalable ways to handle data and methods to reduce the complexity of the problem. The global event predictive model was not fully constructed, but some of the methods were tested or applied on a smaller scale.
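To make the idea of "given current news, what will we hear next?" concrete, the sketch below counts which event type followed which in past chains and predicts the most frequent follower. This first-order counting model is a deliberately simple stand-in chosen for illustration, not the project's causal-template method; the event labels and function names are hypothetical.

```python
from collections import Counter, defaultdict


def learn_transitions(chains):
    """Count how often each event type is directly followed by another."""
    counts = defaultdict(Counter)
    for chain in chains:
        for current, following in zip(chain, chain[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, current):
    """Return (most frequent follower, its relative frequency),
    or None if the event type was never seen with a follower."""
    followers = counts.get(current)
    if not followers:
        return None
    event, n = followers.most_common(1)[0]
    return event, n / sum(followers.values())


# Hypothetical event chains, invented for this example.
chains = [
    ["earthquake", "rescue_effort", "aid_pledge"],
    ["earthquake", "rescue_effort", "reconstruction"],
    ["earthquake", "tsunami_warning"],
    ["election", "protest", "recount"],
]
model = learn_transitions(chains)
prediction = predict_next(model, "earthquake")
```

Even this toy version hints at the scaling problem described above: the number of distinct event types and transitions grows quickly with the volume of news, so a realistic model would need some way of reducing that space.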
In parallel to the data-driven studies, the problem of information spreading has been investigated from a more theoretical point of view. If we know how information spreads at the microscopic level and are able to observe some users, is it possible to determine where the information originally came from? It turns out that it is, although the accuracy depends on the amount of data, the topology of the connections between users and the amount of noise in the spreading process. We have formulated improved methods to locate the true source given only limited information about where and when the information arrived. The improvements allow the processing of large networks, comparable in size to online social networks, and let us exploit knowledge of where information has not yet arrived to make a much more accurate estimate early in the spreading process.
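The basic idea of locating a source from a few arrival times can be sketched as follows. The example assumes a noise-free process in which information takes exactly one time unit per hop, and it scores each candidate node by how consistent the observed arrival times are with its hop distances to the observers. This is a simplified illustration under those assumptions, not the improved method developed in the project; the graph, observers and function names are hypothetical.

```python
from collections import deque
from statistics import pvariance


def bfs_distances(graph, start):
    """Hop distance from start to every reachable node (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def estimate_source(graph, arrivals):
    """Return the node whose implied start times (arrival time minus hop
    distance) agree best across observers, i.e. have minimal variance.
    Assumes one time unit per hop; the true start time may be unknown."""
    best, best_score = None, float("inf")
    for candidate in sorted(graph):
        dist = bfs_distances(graph, candidate)
        if any(observer not in dist for observer in arrivals):
            continue  # candidate cannot reach every observer
        offsets = [t - dist[observer] for observer, t in arrivals.items()]
        score = pvariance(offsets)
        if score < best_score:
            best, best_score = candidate, score
    return best


# Hypothetical network: adjacency lists of a small undirected tree.
graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4, 5], 4: [3], 5: [3]}
# Observers 0, 4 and 5 report when the information reached them.
arrivals = {0: 12, 4: 11, 5: 11}
source = estimate_source(graph, arrivals)
```

With noisy delays the variance would no longer be exactly zero at the true source, and the estimate would depend on the number and placement of observers, in line with the dependence on data, topology and noise noted above. The sketch also ignores observers that have not yet received the information.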
The project also yielded a few other results not directly related to any of the three focus areas, such as new innovation seeding methods that maximize the final reach at the cost of time, and new methods to model belief propagation.
In summary, we have learned a great deal about how news spreads, developed ways to build new large-scale data-driven predictive models for events, and created algorithms to locate the source of information spreading in social media given only a little data. Almost all work done in the project is the result of collaborations, which formed new ties between institutions, exposed researchers to new topics in different disciplines and prepared them to tackle more complex interdisciplinary issues in the future.
The project's results were published in 40 papers, with 2 additional papers accepted and awaiting publication, 10 submitted and 10 in preparation, and were presented in numerous conference talks and posters throughout the project's duration.