CORDIS - EU research results


Final Report Summary - DAPPER (Doing Anonymization Practically, Privately, Effectively and Reusably)

DAPPER: Delivering Anonymization Practically, Privately, Effectively and Reusably

There is currently a tug-of-war over data releases. On one side, there are many strong reasons to release data to other parties: business factors, freedom of information rules, and scientific data-sharing agreements. On the other side, concerns about individual privacy pull back and seek to limit releases. Privacy technologies such as differential privacy have been proposed to resolve this deadlock, and there has been much study of how to release data of various forms privately. The focus of such work has been largely on the data owner: what process should they apply to ensure that the released data preserves privacy whilst still accurately capturing the distribution of the input data? Less attention has been paid to the needs of the data user, who wants to use the released data within their existing suite of tools and data. The difficulty of making use of released data is a major stumbling block to the widespread adoption of data privacy technologies.

This Marie Curie career integration project considered the whole data release process, from the data owner to the data user. It laid out a set of principles for privacy tool design that highlight the requirements for interoperability, extensibility and scalability. The aim of the project was to deliver anonymization practically, privately, effectively and reusably (DAPPER). It produced published results under the following four themes:

· **Synthetic Private Data.** New methods were developed for providing synthetic data in the form of (social) networks, based on anonymized versions of real data under the strong privacy guarantee of differential privacy [SIGMOD16]. The fellow also proposed a new privacy definition called personalized differential privacy (PDP), a generalization of differential privacy in which users specify a personal privacy requirement for their data, and introduced several novel mechanisms for achieving PDP [ICDE15].

· **Correlated Data Modelling.** Many analysis and machine learning tasks require marginal statistics on multidimensional datasets, computed while providing strong privacy guarantees for the data subjects. Applications of these statistics range from finding correlations in the data to fitting sophisticated prediction models. The fellow provided a set of algorithms for materializing marginal statistics under the strong model of local differential privacy [SIGMOD18], and also developed PrivBayes, a differentially private method for releasing high-dimensional data [SIGMOD14,TODS17].

· **Data Utility Enhancement.** The fellow worked on the core problem of count queries, and designed randomized mechanisms to release count data associated with a group of individuals [ICDE18]. The fellow also gave new algorithms to provide statistical information about graphs based on a ‘ladder’ distribution [SIGMOD15].

· **Trajectory Data.** The fellow presented DPT, a system to synthesize mobility data from the raw GPS trajectories of individuals while ensuring strong privacy protection in the form of ε-differential privacy.
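
The summary does not describe the internals of these mechanisms. As background only, the following Python sketch shows two textbook building blocks for ε-differentially private counting; it is not the fellow's algorithms. The first is the Laplace mechanism, in which a trusted curator adds noise to an exact count; the second is randomized response, in which each user perturbs their own value before it is collected, as in the local model used in [SIGMOD18]. The function names (`laplace_count`, `keep_probability`, `randomized_response`, `estimate_count`), the single-bit synthetic data and the choice ε = 1 are hypothetical illustrations.

```python
import math
import random


def laplace_count(records, predicate, epsilon, rng):
    """Central model: a trusted curator releases an exact count plus Laplace noise.

    A count query has sensitivity 1 (adding or removing one person changes it
    by at most 1), so noise drawn from Laplace(0, 1/epsilon) gives epsilon-DP.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    scale = 1.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is
    # Laplace-distributed with that scale (expovariate takes the rate 1/scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


def keep_probability(epsilon):
    """Probability that randomized response reports the true bit."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return math.exp(epsilon) / (math.exp(epsilon) + 1.0)


def randomized_response(bit, epsilon, rng):
    """Local model: each user perturbs their own bit before it is collected.

    Reporting the true bit with probability e^eps / (e^eps + 1), and the
    flipped bit otherwise, satisfies epsilon-local differential privacy.
    """
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def estimate_count(reports, epsilon):
    """Unbiased estimate of how many users hold bit 1, from the noisy reports."""
    p = keep_probability(epsilon)
    n = len(reports)
    observed = sum(reports) / n
    # E[observed] = p*f + (1 - p)*(1 - f); solve for the true fraction f.
    true_fraction = (observed - (1.0 - p)) / (2.0 * p - 1.0)
    return n * true_fraction


# Hypothetical example: 100,000 users, 30,000 of whom hold bit 1, epsilon = 1.
rng = random.Random(0)
data = [1] * 30_000 + [0] * 70_000
central = laplace_count(data, lambda x: x == 1, epsilon=1.0, rng=rng)
reports = [randomized_response(b, 1.0, rng) for b in data]
local = estimate_count(reports, 1.0)
print(f"central-model estimate: {central:.1f}")
print(f"local-model estimate:   {local:.1f}")
```

For the same ε, both estimates are unbiased, but the local-model estimate is far noisier: its error grows roughly with the square root of the number of users, whereas the Laplace noise in the central model does not depend on it. This illustrates why improving utility is a central concern for locally private data collection.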

The work of the fellow has had a real impact on the state of the art: it is used within the private data collection software developed by Apple and deployed to millions of iOS and macOS users around the world. The fellow is now a Professor at the University of Warwick in the UK, and leads a group of researchers working on topics related to privacy and data analysis.

**Website and Contact Details.**
Activities and news about the project are posted by the fellow on the project website. For more information, contact