The 3Ps of Distributed Information Delivery: Preferences, Privacy and Performance

Periodic Report Summary - DIP3 (The 3Ps of distributed information delivery: Preferences, privacy and performance)

Today there is an abundance of data online. The grand challenge is turning this huge amount of data into knowledge that is useful to individual users of the Internet. DIP3 addresses this challenge by tackling one form of data processing, often referred to as 'push' data delivery. In push data delivery, instead of explicitly searching for information, users are notified when relevant information becomes available. Examples of such systems include RSS feeds, news alerts and aggregators. The scientific objective of the project is to derive models, algorithms and techniques to control both the amount and the quality of information received by users. To this end, we propose incorporating user preferences in data delivery to rank data items based on their relevance to the users. Although preference specification has been extensively studied, there is little previous research on incorporating preferences in internet-scale data delivery. Furthermore, DIP3 will exploit the inherent social connections between users in Web 2.0, as expressed through social networks, social tagging and other community-based features, to enhance preference specification and ranked information delivery.

Research work and results

The researcher extended her work on preferences in databases. This work focuses on preferences in conjunction with keyword-based search in relational databases. Query results are ranked based on both their relevance to the query and their preference degree for the user. Results of this line of research were published at EDBT 2010. Furthermore, a demo implementation that extends this research to add recommendation functionality to databases was presented at HDMS 2010.
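As a rough illustration of ranking on both criteria, the sketch below combines a query-relevance score and a preference degree into a single value. The summary does not say how the two are actually combined, so the linear weighting, the `alpha` parameter, the `Result` class and the sample scores are all assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class Result:
    item: str
    relevance: float   # assumed query-relevance score in [0, 1]
    preference: float  # assumed user preference degree in [0, 1]


def combined_score(result: Result, alpha: float = 0.5) -> float:
    """Weighted sum of relevance and preference (an assumed combination rule)."""
    return alpha * result.relevance + (1 - alpha) * result.preference


def rank(results: list[Result], alpha: float = 0.5) -> list[Result]:
    """Return results ordered from highest to lowest combined score."""
    return sorted(results, key=lambda r: combined_score(r, alpha), reverse=True)


# Hypothetical query results with made-up scores.
results = [
    Result("tuple A", relevance=0.9, preference=0.2),
    Result("tuple B", relevance=0.6, preference=0.8),
    Result("tuple C", relevance=0.4, preference=0.5),
]

ranked = [r.item for r in rank(results)]
relevance_only = [r.item for r in rank(results, alpha=1.0)]
```

With equal weights, "tuple B" moves ahead of the more relevant "tuple A" because the user prefers it strongly; setting `alpha=1.0` falls back to ranking by relevance alone.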

New research results were obtained on database selection for XML document collections, in cooperation with students of Prof. C. Pu at Georgia Tech. This research focuses on keyword queries that use lowest common ancestor (LCA) semantics to define query results: the relevance of each document to a query is determined by properties of the LCA of the nodes in the XML document that contain the query keywords. Results of this line of research were published at WWW 2010.
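To make LCA semantics concrete, the following minimal sketch finds the deepest elements of an XML document whose subtrees contain every query keyword (often called smallest LCAs). It is not the project's algorithm: the function name, the sample document and the whitespace-based keyword matching on element text are assumptions chosen for illustration, and text that follows a child element (tail text) is ignored.

```python
import xml.etree.ElementTree as ET


def smallest_lcas(root, keywords):
    """Return elements whose subtree covers all keywords while no child subtree does."""
    wanted = {k.lower() for k in keywords}
    found = []

    def visit(node):
        # Keywords that appear in this element's own text.
        covered = wanted & set((node.text or "").lower().split())
        child_complete = False
        for child in node:
            child_covered, child_done = visit(child)
            covered |= child_covered
            child_complete = child_complete or child_done
        complete = covered == wanted
        # Keep only the deepest covering elements, not their ancestors.
        if complete and not child_complete:
            found.append(node)
        return covered, complete

    visit(root)
    return found


# Hypothetical document used only for this example.
DOC = """
<library>
  <book><title>privacy models</title><author>smith</author></book>
  <book><title>database ranking</title><author>jones</author></book>
  <article><title>privacy</title><note>ranking survey</note></article>
</library>
""".strip()

root = ET.fromstring(DOC)
tags_two = [n.tag for n in smallest_lcas(root, ["privacy", "ranking"])]
tags_one = [n.tag for n in smallest_lcas(root, ["privacy"])]
```

For the query "privacy ranking", only the `article` element qualifies, since neither `book` contains both keywords and `library` is merely an ancestor of a covering element. A relevance measure could then be computed from properties of the returned elements, such as their depth or subtree size.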

Two new lines of work on privacy were initiated. Both focus on the theoretical underpinnings of preferences and privacy preservation in large-scale distributed systems. The first addresses the problem of privacy through data anonymisation in a distributed setting. Details of this line of research have been submitted for publication. The second addresses the problem of enforcing privacy in topic-based publish/subscribe systems. The main idea is to model the problem using item-set related privacy.

Final results and impact

Privacy and large-scale internet systems are central to the digital economy and are areas of potential competitiveness for Europe. Many research labs (most notably those of Yahoo and Microsoft) now have offices in Europe. Privacy in particular has been an important concern in modern society and a major consideration with regard to the widespread use of internet services. The goal of DIP3 is to provide a novel perspective on distributed information delivery by exploiting preferences, respecting privacy and increasing efficiency, thus making it suitable for modern large-scale internet systems.

For more information please refer to the project website: http://dmod.cs.uoi.gr/dip3