Periodic Reporting for period 3 - HYPATIA (Privacy and Utility Allied)
Reporting period: 2022-10-01 to 2024-03-31
The objective of this project is to develop the theoretical foundations, methods and tools to protect the privacy of individuals while allowing their data to be collected and used for statistical purposes. In particular, we aim at developing mechanisms that can be applied and controlled directly by the user (thus avoiding the need for a trusted party), that are robust with respect to the combination of information from different sources, and that provide an optimal trade-off between privacy and utility.
We have investigated Differential Privacy (DP) and further developed the theory of metric differential privacy (metric DP). This line of research builds on the notion of distance-based privacy proposed in our seminal paper “Geo-indistinguishability: Differential privacy for location-based systems”, which has almost 1300 citations on Google Scholar. Metric DP generalizes the property of DP on datasets to generic metric domains. A natural field of application is location privacy, where the metric is the geographical distance. The resulting property, which we call geo-indistinguishability, has been implemented by adding planar Laplace noise to the user's location. The tool, called Location Guard, has more than 60,000 users. Geo-indistinguishability and the planar Laplace mechanism have been adopted as components of several other tools and frameworks for location privacy, including LP-Guardian, LP-Doctor, Secure Nearby-Friends and the SpatialVision QGIS plugin, and geo-indistinguishability is one of the input methods in STAC.
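As a concrete illustration, the following is a minimal Python sketch of the planar Laplace mechanism, using the inverse-CDF sampling method described in the geo-indistinguishability paper (function and parameter names are ours, for illustration only):

    import numpy as np
    from scipy.special import lambertw

    def planar_laplace(x, y, epsilon, rng=None):
        # Report a location perturbed with planar Laplace noise, whose
        # density decays as exp(-epsilon * distance) from the true point.
        rng = rng or np.random.default_rng()
        theta = rng.uniform(0.0, 2.0 * np.pi)   # direction: uniform angle
        p = rng.uniform(0.0, 1.0)
        # Radius: invert the CDF p = 1 - (1 + eps*r) * exp(-eps*r)
        # using branch -1 of the Lambert W function.
        r = -(lambertw((p - 1.0) / np.e, k=-1).real + 1.0) / epsilon
        return x + r * np.cos(theta), y + r * np.sin(theta)

    # Example: planar coordinates in metres, epsilon in 1/metres.
    print(planar_laplace(0.0, 0.0, epsilon=0.01))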
The main new contributions on DP and metric DP produced thanks to HYPATIA are the following:
• Generation of optimal mechanisms for metric DP via machine learning. To the best of our knowledge, we were the first to propose the use of machine learning to generate optimal privacy mechanisms. This is a particularly challenging task since the underlying optimization problem is not convex (it has local minima); a toy sketch of the idea is given after this list.
• A logical characterization of metric DP. We have developed a probabilistic logic to reason about metric privacy and DP in general. To the best of our knowledge, ours was the first proposal of a logical formalism for DP.
• An analysis of the trade-off between privacy and utility that has refined the state of the art and clarified some misconceptions about universal optimality, namely the property of a mechanism of providing an optimal trade-off with utility for all notions of utility that are anti-monotonic with respect to the distance, and for all prior distributions (this property is stated formally after this list).
• A method for reconstructing the original distribution from individually sanitized data collections. This method, which we call the Generalised Bayesian Update, is based on the Expectation-Maximization technique developed in statistics, and it allows different individuals to use different sanitization mechanisms. We have experimented with the k-Randomized-Response and the Geometric mechanisms, validating the method from both the correctness and the performance standpoints; a sketch of the single-mechanism case is given below.
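As a toy illustration of the machine-learning generation of mechanisms (first item above), one can parametrize the channel matrix with softmax logits and minimize the expected distortion plus a penalty for violating the metric-DP constraint C(y|x) <= exp(eps*d(x,x')) * C(y|x'). This is only a sketch of the general idea, not the construction developed in the project; all names and constants are hypothetical:

    import torch

    n = 10                                            # locations on a line
    idx = torch.arange(n, dtype=torch.float32)
    d = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs()   # d[x, x'] = |x - x'|
    eps, lam = 0.5, 100.0
    pi = torch.full((n,), 1.0 / n)                    # uniform prior

    logits = torch.zeros(n, n, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=0.05)

    for step in range(2000):
        C = torch.softmax(logits, dim=1)              # channel C[x, y], rows sum to 1
        util_loss = (pi.unsqueeze(1) * C * d).sum()   # expected distortion
        # metric-DP penalty: C[x, y] <= exp(eps * d[x, x']) * C[x', y]
        bound = torch.exp(eps * d).unsqueeze(2) * C.unsqueeze(0)
        violation = torch.relu(C.unsqueeze(1) - bound).sum()
        loss = util_loss + lam * violation
        opt.zero_grad(); loss.backward(); opt.step()

The softmax parametrization makes the objective non-convex, which is precisely why gradient-based training can get trapped in local minima.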
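For reference, universal optimality (third item above) can be stated as follows, in standard notation: a mechanism $M^\ast$ satisfying $\varepsilon$-metric DP is universally optimal if

    \sum_x \pi(x) \sum_y M^\ast(y \mid x)\, \ell(d(x,y))
      \;\le\; \sum_x \pi(x) \sum_y M(y \mid x)\, \ell(d(x,y))

for every $\varepsilon$-metric-DP mechanism $M$, every prior $\pi$, and every loss $\ell$ that is non-decreasing in the distance (equivalently, every utility that is anti-monotonic in the distance).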
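The following is a minimal Python sketch of the single-mechanism case of this update (often called the iterative Bayesian update; the generalised version developed in the project additionally handles different mechanisms for different individuals), instantiated here with k-Randomized-Response:

    import numpy as np

    def krr_channel(k, eps):
        # k-Randomized-Response: report the true value with probability
        # e^eps / (e^eps + k - 1), otherwise a uniformly random other value.
        C = np.full((k, k), 1.0 / (np.exp(eps) + k - 1))
        np.fill_diagonal(C, np.exp(eps) / (np.exp(eps) + k - 1))
        return C

    def bayesian_update(q, C, iters=500):
        # Reconstruct the original distribution from sanitized reports.
        #   q : empirical distribution of the reports (length k)
        #   C : channel matrix, C[x, y] = P(report y | true value x)
        k = C.shape[0]
        theta = np.full(k, 1.0 / k)            # start from the uniform distribution
        for _ in range(iters):
            joint = theta[:, None] * C         # joint[x, y] = theta(x) * C(y | x)
            post = joint / joint.sum(axis=0)   # posterior P(x | y)
            theta = post @ q                   # EM step: theta(x) = sum_y q(y) P(x | y)
        return theta

For instance, with q computed from many reports sanitized by krr_channel(4, 1.0), the returned theta converges to an estimate of the true distribution as the number of reports grows.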
Furthermore, we have investigated the quantitative information flow (QIF) framework, an alternative to DP for reasoning about privacy, based on information-theoretic and decision-theoretic notions. This work builds on our seminal papers “Measuring information leakage using generalized gain functions” and “Additive and multiplicative notions of leakage, and their capacities”, the second of which won the prestigious NSA award for the Best Scientific Cybersecurity Paper of 2014.
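For reference, the core notions of those papers are the prior and posterior g-vulnerability of a secret with prior $\pi$ observed through a channel $C$, and the corresponding additive and multiplicative leakages:

    V_g(\pi) = \max_{w} \sum_{x} \pi(x)\, g(w, x)
    V_g(\pi, C) = \sum_{y} \max_{w} \sum_{x} \pi(x)\, C(y \mid x)\, g(w, x)
    \mathcal{L}^{+}_g(\pi, C) = V_g(\pi, C) - V_g(\pi)
    \mathcal{L}^{\times}_g(\pi, C) = V_g(\pi, C) / V_g(\pi)

where g(w, x) is the gain of the adversary for taking action w when the secret is x.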
The main new contributions on QIF produced thanks to HYPATIA are:
• A game-theoretic approach to the notions of leakage and privacy. To the best of our knowledge, ours is the first such approach. We have shed light on the use of randomized strategies, showing that the optimal strategy, for both the defender and the adversary, is usually randomized. This is the first formal proof that the adversary has an interest in using randomization (for the defender this was already known); a toy example is given after this list.
• A machine-learning method to estimate the leakage of information. To the best of our knowledge, we were the first to propose a black-box machine-learning approach to measuring QIF leakage (see the sketch after this list).
• We have also written a book on the foundations of QIF.
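As a toy illustration of the game-theoretic point (first item above; the game and its numbers are hypothetical), consider a zero-sum leakage game in which the defender chooses between two channels and the adversary between two attacks, each channel being safe against exactly one attack. Any pure defender strategy leaks 1 against the right attack, while the 50/50 mix leaks only 0.5; solving the game as a linear program recovers this randomized optimum:

    import numpy as np
    from scipy.optimize import linprog

    # A[i, j]: leakage when the defender deploys channel i and the
    # adversary mounts attack j (each channel is safe against one attack).
    A = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
    m, n = A.shape

    # Variables: defender's mixed strategy p (m entries) and the value v.
    c = np.zeros(m + 1); c[-1] = 1.0                # minimize v
    A_ub = np.hstack([A.T, -np.ones((n, 1))])       # A^T p <= v for every attack
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])   # sum p = 1
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * m + [(None, None)]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    print(res.x[:m], res.x[-1])   # p = [0.5, 0.5], value 0.5 < 1 (any pure strategy)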
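A minimal sketch of the black-box estimation idea (second item above), on a hypothetical toy system: the posterior Bayes vulnerability of a system equals the best achievable accuracy in guessing the secret from the observable, so the test accuracy of a classifier trained on input/output samples gives a (lower-bound) estimate of it:

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Hypothetical black-box system: secret bit x, observable y = noisy reading.
    x = rng.integers(0, 2, size=20000)
    y = (x + rng.normal(scale=0.8, size=x.size)).reshape(-1, 1)

    y_tr, y_te, x_tr, x_te = train_test_split(y, x, test_size=0.5, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300).fit(y_tr, x_tr)

    post_V = clf.score(y_te, x_te)            # posterior Bayes vulnerability
    prior_V = max(x.mean(), 1 - x.mean())     # prior: best blind guess
    print("multiplicative Bayes leakage ~", post_V / prior_V)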
I have been invited to give various keynote talks (at CONCUR 2020 and several workshops) and lectures on these topics (e.g. at the IDESSAI Summer Schools 2021 and 2022 and the Elsa Summer School 2022). Furthermore, the thesis of my PhD student Marco Romanelli has received one of the four prizes for the best co-supervised PhD thesis (all domains) awarded by the Université Franco-Italienne (Prix de thèse en cotutelle UFI 2021), and my student Natasha Fernandez, whose thesis is in large part the result of our work on QIF, has received the John Makepeace Bennett Award for the best PhD thesis in Australasia in 2022. My collaborator Mario Alvim has been appointed by the Brazilian National Institute for Educational Studies and Research (INEP) to sanitize the Brazilian educational census, which contains the longitudinal micro-data of all Brazilian students since 2007, and he is applying the QIF framework for this task.
Concerning the future period of the project, we plan to further investigate the optimality issue in local mechanisms for privacy protection. More specifically, we plan to develop a compositional method for privacy-preserving federated learning that also optimizes the trade-off between privacy and two kinds of utility, namely quality of service and preservation of statistical information (three-way optimality). We also plan to bring fairness into the equation: we intend to study mechanisms for data sanitization that achieve privacy and, at the same time, remove bias from the training data, while preserving the accuracy of the model.