CORDIS - EU research results

Social Networks: Algorithms, Privacy, and Security

Final Report Summary - SNAPS (Social Networks: Algorithms, Privacy, and Security)


This report covers the entire period of the SNAPS project, from its beginning (June 2010) to its end (March 2012). Note that the project terminated two months earlier than the expected termination date, since the researcher obtained a permanent full-time faculty position at the hosting department, with which (apparently) the project was incompatible. Nevertheless, the objectives of the project were achieved.

Given the prevalence of social networks (online and offline) in our lives, and with the recent appearance of multiple online social networks, the objective of SNAPS was to develop algorithmic techniques for the analysis, exploration and exploitation of large online social networks, and to study issues of privacy and security.

The researcher, Dr Aris Anagnostopoulos, worked on the project in the Department of Computer, Control, and Management Engineering (formerly known as Department of Computer and System Sciences) of the Sapienza University of Rome, while the scientist in charge was Prof. Stefano Leonardi, Full Professor in the same department.

Regarding its objectives, the project has been successful. We next describe the technical contributions and their significance. Along with collaborators from Brown University, Rhode Island, United States of America (USA), and from Yahoo! Research, California, USA, the researcher has extended his previous work on dynamic data, presented at ICALP 2009. In that model the goal is to study algorithms that operate on data while the data are changing over time. There is a bound on the number of queries that can be performed on the data in a given amount of time, so the goal of an algorithm is to perform the right queries so as to maintain a solution that remains as close to the real solution as possible. In addition, they have applied the model to graphs and studied two problems: the minimum spanning tree problem and the problem of connectivity.

He has also studied information diffusion processes on social networks. Existing models of information diffusion assume that peer influence is the main reason for the observed propagation patterns. The researcher, along with collaborators from Boston University, examined the role of authority pressure on the observed information cascades. They modeled this intuition by characterising some nodes in the network as 'authority' nodes. These are nodes that can influence a large number of peers, while they themselves cannot be influenced by peers. They proposed a model that associates with every item two parameters that quantify the impact of the peer and the authority pressure on the item's propagation. Given a network and the observed diffusion patterns of the item, they learn these parameters from the data and characterise the item as peer- or authority-propagated. They also developed a randomisation test that evaluates the statistical significance of their findings and makes their item characterisation robust to noise. Their experiments with real data from online media and scientific-collaboration networks indicated that there is a strong signal of authority pressure in these networks. In another line of work, he studied the application of models from statistical physics to issues of diffusion of opinions in social networks. He has been able to extend results of interest to physicists to more general problems related to influence maximisation on social networks.

On the more application-oriented side, he has collaborated with researchers from the Sapienza University of Rome, Italy, and from Yahoo! Research, Barcelona, Spain, on the development of a collaboration framework over the internet. The internet has enabled the collaboration of groups at a scale that was unseen before. A key problem for large collaboration groups is to be able to allocate tasks effectively. An effective task assignment method should consider both how fit teams are for each job and how fair the assignment is to team members, in the sense that no one should be overloaded or unfairly singled out. The assignment has to be done automatically or semi-automatically, given that it is difficult and time-consuming to keep track of the skills and the workload of each person. The method for making this assignment must also be computationally efficient. They presented a general framework for task-assignment problems. They provided a formal treatment of how to represent teams and tasks. They proposed alternative functions for measuring the fitness of a team performing a task and discussed desirable properties of those functions. Then they focused on one class of task-assignment problems: they characterised the complexity of the problem and provided algorithms with provable approximation guarantees, as well as lower bounds. They also presented experimental results showing that their methods are useful in practice in several application scenarios. Furthermore, they took social structure into account: depending on factors such as geographic proximity, history of past collaborations, and so on, some teams might be more fit than others. They therefore incorporated in their model the coordination difficulty of the team, which can be quantified with various methods. This introduces an extra goal to optimise for, which requires the techniques of multicriteria optimisation. They presented approximation algorithms for various versions of the problem, which they evaluated analytically and experimentally.
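
The trade-off between team fitness and fairness described above can be illustrated with a minimal sketch. It is not the algorithm from the project: it assumes, purely for illustration, that a team is fit for a task when the union of its members' skills covers the task's required skills, and that fairness means keeping the maximum per-person load low. The names `assign_tasks`, `skills` and `candidate_teams` are hypothetical, and every candidate team is assumed to be non-empty.

```python
def assign_tasks(tasks, skills, candidate_teams):
    """Greedily assign each task to a fit team while keeping loads balanced.

    tasks:           list of sets of required skills, processed in order
    skills:          dict person -> set of skills (illustrative input format)
    candidate_teams: list of non-empty tuples of people that may work together
    Returns (assignment, load); assignment[i] is the chosen team or None.
    """
    load = {person: 0 for person in skills}
    assignment = []
    for required in tasks:
        best = None
        for team in candidate_teams:
            # Assumed fitness rule: the team must jointly cover every skill.
            covered = set().union(*(skills[p] for p in team))
            if not required <= covered:
                continue
            # Assumed fairness rule: minimise the highest load any member
            # would have after the assignment, then prefer smaller teams.
            cost = (max(load[p] for p in team) + 1, len(team))
            if best is None or cost < best[0]:
                best = (cost, team)
        if best is None:
            assignment.append(None)  # no candidate team has the skills
            continue
        for p in best[1]:
            load[p] += 1
        assignment.append(best[1])
    return assignment, load
```

A coordination-cost term, as mentioned in the text, could be added as a further component of `cost`; how to weigh it against load is exactly the multicriteria question the paragraph raises.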

Regarding security on social networks, the researcher studied the possibility of using ideas from game theory with the goal of taking into account the cost of an attack to the attacker. He defined a model for attacking and for users to protect themselves against attacks, in the case that a user can be attacked either directly or through some neighbor in his network. The user has a cost to protect himself and a cost in the case that he becomes infected. The strategy of whether to protect or not depends on the various costs, on the network structure, and on the strategy of an adversary, who in turn tries to maximise his utility, which increases with the number of sites infected but decreases with his effort (the number of sites that he attacks). The researcher studied the Nash equilibria in this setting for some graph structures, which indicate the best strategies for the users against profit-maximising adversaries. The researcher also studied techniques for obtaining information on the social relationships of users that is supposed to be private, in the case that one can combine a variety of anonymous sources and exploit some of their similarities. He has been able to give techniques that in theory are able to reveal a significant part of private information, under some theoretical assumptions. While these techniques are not efficient for large graphs with current computational power, they signal the attention required when releasing data.
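
The user side of such a game can be sketched as follows. This is only an illustration under stated assumptions, not the model defined in the project: it assumes that an infection starts at every attacked, unprotected node and spreads along edges through unprotected nodes, that the attacker's choice is held fixed, and that each user pays either a protection cost or, if infected, an infection cost. All function and parameter names are hypothetical.

```python
def infected_nodes(graph, protected, attacked):
    """Unprotected nodes reachable from an attacked, unprotected node.

    graph: dict node -> list of neighbours (undirected, illustrative format).
    Assumed spreading rule: infection passes only through unprotected nodes.
    """
    infected = set()
    stack = [v for v in attacked if v not in protected]
    while stack:
        v = stack.pop()
        if v in infected:
            continue
        infected.add(v)
        stack.extend(u for u in graph[v]
                     if u not in protected and u not in infected)
    return infected


def user_cost(v, graph, protected, attacked, protect_cost, infection_cost):
    """Cost paid by user v: protection cost, infection cost, or nothing."""
    if v in protected:
        return protect_cost
    hit = v in infected_nodes(graph, protected, attacked)
    return infection_cost if hit else 0


def users_at_equilibrium(graph, protected, attacked,
                         protect_cost, infection_cost):
    """True if no user can lower their own cost by flipping their decision.

    Only the users' side is checked; the attacker's set is held fixed, so
    this is a necessary condition for a Nash equilibrium, not a full test.
    """
    for v in graph:
        current = user_cost(v, graph, protected, attacked,
                            protect_cost, infection_cost)
        flipped = set(protected) ^ {v}  # v switches protect / not protect
        alternative = user_cost(v, graph, flipped, attacked,
                                protect_cost, infection_cost)
        if alternative < current:
            return False
    return True
```

On a three-node path where only the first node is attacked and protection is cheaper than infection, the profile in which just that node protects itself passes this check, while the profile in which nobody protects does not.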

The researcher also studied related problems on query-log mining. Along with colleagues from the Sapienza University of Rome, he introduced the problem of query covering as a means to efficiently cache query results. The general idea is to populate the cache with documents that contribute to the result pages of a large number of queries, as opposed to caching the top documents for each query. It turns out that the problem is hard, and solving it requires knowledge of the structure of the queries and the results space, as well as knowledge of the input query distribution. They formulated the problem under the framework of stochastic optimisation; theoretically, it can be seen as a stochastic universal version of set multicover. While the problem is NP-hard to solve exactly, they showed that for any distribution it can be approximated using a simple greedy approach. The theoretical findings were complemented by experimental activity on real datasets, showing the feasibility and potential interest of query-covering approaches in practice.
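
The greedy idea can be illustrated with a minimal sketch of a deterministic, weighted multicover heuristic. It is a simplification, not the project's stochastic formulation: it assumes the query distribution is summarised by a fixed frequency weight per query, and that a query counts as served once `need` of its result documents are cached. The function name and all parameters are hypothetical.

```python
from collections import defaultdict


def greedy_query_cover(results, weights, need, capacity):
    """Choose up to `capacity` documents to cache, greedily.

    results:  dict query -> set of documents on its result page
    weights:  dict query -> estimated frequency of the query (assumed known)
    need:     cached documents a query requires before it counts as served
    """
    # How many more cached documents each query still requires.
    remaining = {q: min(need, len(docs)) for q, docs in results.items()}
    # Inverted index: document -> queries whose result page contains it.
    queries_of = defaultdict(set)
    for q, docs in results.items():
        for d in docs:
            queries_of[d].add(q)

    def gain(d):
        # Total weight of still-unserved queries that document d helps.
        return sum(weights[q] for q in queries_of[d] if remaining[q] > 0)

    cache = []
    candidates = set(queries_of)
    while candidates and len(cache) < capacity:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=gain)
        if gain(best) == 0:
            break  # no remaining document helps any unserved query
        cache.append(best)
        candidates.remove(best)
        for q in queries_of[best]:
            if remaining[q] > 0:
                remaining[q] -= 1
    return cache
```

The contrast with per-query caching shows up in the selection rule: a document shared by several frequent queries is chosen before a document that is the top result of only one query.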

We expect the research results to have significant socio-economic impact in the years to come. We steadily observe an increasing use of human and social capital through collaborative projects such as tagging and geotagging systems, or simple crowdsourcing services such as Amazon Mechanical Turk. We expect that such systems will increase in sophistication as crowdsourcing services become more elaborate, as online marketplaces start emerging (e.g. oDesk), and as users start communicating ubiquitously with smart handheld devices. SNAPS has initiated a rigorous modeling and algorithmic analysis of such problems, in the process formalising the emerging requirements in a variety of application scenarios.