
GOSSPLE: A Radically New Approach to Navigating the Digital Information Universe

Final Report Summary - GOSSPLE (GOSSPLE: A Radically New Approach to Navigating the Digital Information Universe)

The Web 2.0 has been taken over by users: they are the greediest bandwidth consumers, the ultimate deciders of which applications are actually adopted and, perhaps most importantly, the most prolific content generators. The Web 2.0 is a dynamic gold mine of information. Every single event, anywhere in the world, is likely to be instantly commented on and debated online by some community of users. This has drastic consequences: everything is out there, yet some of it remains extremely difficult to dig out. In addition, we spend a substantial amount of time browsing news on the Web, but the quest is not always effective: the stream of available news is huge and keeps growing exponentially. How can we filter out unwanted content while still discovering new and relevant information? How can we find an answer when looking for ultra-specific content? The GOSSPLE project takes up this challenge by personalizing the Web 2.0 experience in a scalable and efficient way. Indeed, we argue that personalization is crucial, and that decentralization is a strong candidate for achieving it, for two reasons. The first is scalability: personalization requires storing and maintaining a large amount of information per user, which only a few companies can afford. The second is privacy: faced with the apparent eagerness of companies to exploit user-generated content, users are increasingly reluctant to make their interests explicit.
While explicit approaches (e.g. social networks) dominate on the Web, in GOSSPLE we introduced the notion of an implicit social network and showed that it can be effective in a number of applications. More specifically, we designed a fully decentralized system that assigns each user an implicit social network, connecting her to a set of users who share similar interests. This relies on novel, application-specific metrics that measure the similarity between users while accounting for multiple interests, including emerging ones. The GOSSPLE social network is built using two gossip protocols: (1) the first protocol constructs, in a fully decentralized manner, a dynamic random topology; (2) the second protocol relies on the first to sample the network and ensures that every user gets connected to a set of other nodes sharing her interests with respect to a given application (her k closest neighbours). Among the many applications that can benefit from the GOSSPLE social network, recommenders are very promising and can be applied to almost any Web content publisher today. We developed a fully decentralized news recommender called WhatsUp, in which a user expresses her interests through a like/dislike interface. In turn, WhatsUp associates each user with other users who share similar interests, in a privacy-aware manner, so that each user receives news items that increasingly match her interests. The sampling-based approach of GOSSPLE can also be used to implement scalable recommenders based on user-based collaborative filtering. We also developed a hybrid architecture that offloads CPU-intensive recommendation tasks to front-end client browsers while retaining storage and orchestration tasks on back-end servers.
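The two-layer gossip construction described above can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not GOSSPLE's actual protocol: profiles are hypothetical like/dislike maps (item id to +1 or -1), plain cosine similarity stands in for GOSSPLE's application-specific metrics, the random topology layer is a simplified view shuffle, and the names `Node`, `rps_step`, `clustering_step`, `simulate`, `recommend` and the parameters `view_size` and `k` are all illustrative. The last function shows how the resulting neighbourhood could feed a user-based collaborative filtering recommender in the spirit of WhatsUp.

```python
import math
import random

# Hypothetical profile format (assumption): item id -> +1 (like) or -1 (dislike).
Profile = dict[str, int]


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity of two like/dislike vectors (stand-in metric)."""
    dot = sum(v * b[item] for item, v in a.items() if item in b)
    if dot == 0:  # also covers empty profiles, avoiding division by zero
        return 0.0
    # Every entry is +1 or -1, so each squared norm equals the profile size.
    return dot / math.sqrt(len(a) * len(b))


class Node:
    def __init__(self, node_id: int, profile: Profile, view_size: int = 5, k: int = 3):
        self.id = node_id
        self.profile = profile
        self.view_size = view_size  # size of the random view (layer 1)
        self.k = k                  # number of closest neighbours kept (layer 2)
        self.rps_view: list["Node"] = []
        self.neighbours: list["Node"] = []

    def rps_step(self) -> None:
        """Layer 1: simplified shuffle with a random peer; both views stay random."""
        if not self.rps_view:
            return
        peer = random.choice(self.rps_view)
        merged = {n.id: n for n in self.rps_view + peer.rps_view + [self, peer]}
        for node in (self, peer):
            pool = [n for nid, n in merged.items() if nid != node.id]
            node.rps_view = random.sample(pool, min(node.view_size, len(pool)))

    def clustering_step(self) -> None:
        """Layer 2: keep the k most similar nodes among current neighbours,
        their neighbours, and the random sample provided by layer 1."""
        candidates = {n.id: n for n in self.neighbours + self.rps_view}
        for n in self.neighbours:
            for m in n.neighbours:
                candidates[m.id] = m
        candidates.pop(self.id, None)
        ranked = sorted(candidates.values(),
                        key=lambda n: cosine(self.profile, n.profile),
                        reverse=True)
        self.neighbours = ranked[: self.k]


def simulate(profiles: list[Profile], rounds: int = 30, seed: int = 0) -> list[Node]:
    """Run gossip rounds over all nodes in a single process."""
    random.seed(seed)
    nodes = [Node(i, p) for i, p in enumerate(profiles)]
    for node in nodes:  # bootstrap: arbitrary initial random views
        others = [n for n in nodes if n is not node]
        node.rps_view = random.sample(others, min(node.view_size, len(others)))
    for _ in range(rounds):
        for node in nodes:
            node.rps_step()
            node.clustering_step()
    return nodes


def recommend(node: Node, top: int = 3) -> list[str]:
    """User-based CF over the gossip neighbourhood: score unseen items by
    similarity-weighted neighbour opinions and keep positive scores only."""
    scores: dict[str, float] = {}
    for n in node.neighbours:
        w = cosine(node.profile, n.profile)
        for item, opinion in n.profile.items():
            if item not in node.profile:
                scores[item] = scores.get(item, 0.0) + w * opinion
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, score in ranked[:top] if score > 0]


if __name__ == "__main__":
    # Two hypothetical interest communities; user 0 has not yet seen "transfer".
    sports = [{"match": 1, "election": -1}] + [
        {"match": 1, "transfer": 1, "election": -1} for _ in range(4)]
    politics = [{"election": 1, "debate": 1, "match": -1} for _ in range(5)]
    nodes = simulate(sports + politics)
    for node in nodes:
        print(node.id, sorted(n.id for n in node.neighbours))
    print("recommended to user 0:", recommend(nodes[0]))
```

The clustering layer only ever replaces a neighbour with a more similar candidate, so each node's neighbourhood improves monotonically, while the random layer keeps injecting fresh candidates so that emerging interests can still be discovered.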
While the GOSSPLE approach, being fully decentralized, addresses the privacy issue with respect to a central authority, it raises the issue of privacy with respect to other users in the system, as well as vulnerabilities to selfish or even malicious behaviors. We introduced several approaches to cope with these issues: (1) a privacy-preserving mechanism for computing the similarity between users without revealing their profiles, which also addresses differential privacy; (2) a novel approach to misbehavior in distributed systems, based on the argument that behind each machine lies a user who cares about her reputation and therefore misbehaves only to the extent that the misbehavior goes undetected. This is a radical change compared with traditional approaches to handling misbehavior in distributed computing; (3) an approach to distributed computation that addresses scalability, privacy and accuracy at once.
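To illustrate how two users can estimate their similarity without one of them revealing her exact profile, the sketch below uses randomized response, a standard local differential privacy technique. It is swapped in here as a generic stand-in, not the project's own mechanism. Each user is assumed to hold a binary interest vector over a shared list of items (1 means interested); before sharing, she reports each bit truthfully with probability e^ε / (1 + e^ε) and flips it otherwise, which gives ε-differential privacy for each item. A peer can then compute an unbiased estimate of their interest overlap, i.e. the numerator of a set-cosine similarity. The function names, the vector encoding and the value of ε are hypothetical.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting a bit truthfully under randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize(bits: list[int], epsilon: float, rng: random.Random) -> list[int]:
    """Report each 0/1 bit truthfully with probability e^eps / (1 + e^eps),
    flipped otherwise: epsilon-local differential privacy for each bit."""
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_overlap(own: list[int], noisy_other: list[int], epsilon: float) -> float:
    """Unbiased estimate of the number of items both users are interested in.

    For a true bit t, E[noisy] = q + (p - q) * t with q = 1 - p,
    so (noisy - q) / (p - q) has expectation t. Requires epsilon > 0.
    """
    p = keep_probability(epsilon)
    q = 1.0 - p
    return sum((n - q) / (p - q) for o, n in zip(own, noisy_other) if o == 1)


if __name__ == "__main__":
    rng = random.Random(42)
    n_items = 1000
    alice = [rng.randint(0, 1) for _ in range(n_items)]
    bob = [rng.randint(0, 1) for _ in range(n_items)]
    true_overlap = sum(a & b for a, b in zip(alice, bob))
    shared = randomize(bob, epsilon=1.0, rng=rng)  # all that Bob actually reveals
    print("true overlap:", true_overlap)
    print("estimate from noisy profile:", round(estimate_overlap(alice, shared, 1.0), 1))
```

A smaller ε gives stronger privacy but noisier individual bits; because the per-item noise averages out over long profiles, the estimate can remain accurate enough to rank candidate neighbours.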

Finally, we are now in the process of creating a startup based on this technology: the objective is to provide a scalable, user-centric recommendation engine that any Web content provider can easily integrate into a Web page. The proposed feature is scalable, cheap and easy to integrate, provides high-quality recommendations, and copes extremely well with the highly dynamic behavior of Web users today.