Periodic Reporting for period 3 - HYPATIA (Privacy and Utility Allied)
Reporting period: 2022-10-01 to 2024-03-31
The objective of this project is to develop the theoretical foundations, methods and tools to protect the privacy of individuals while allowing their data to be collected and used for statistical purposes. In particular, we aim at developing mechanisms that can be applied and controlled directly by the user (thus avoiding the need for a trusted party), that are robust with respect to the combination of information from different sources, and that provide an optimal trade-off between privacy and utility.
We have investigated Differential Privacy (DP) and further developed the theory of metric differential privacy (metric DP). This line of research builds on the notion of distance-based privacy proposed in our seminal paper “Geo-indistinguishability: Differential privacy for location-based systems”, which has almost 1300 citations on Google Scholar. Metric DP generalizes the property of DP on datasets to generic metric domains. A natural field of application is location privacy, where the metric is the geographical distance. The resulting property, which we call geo-indistinguishability, has been implemented by adding planar Laplace noise to the user's location. The tool, called Location Guard, has more than 60,000 users. Geo-indistinguishability and the planar Laplace mechanism have been adopted as components of several other tools and frameworks for location privacy, including LP-Guardian, LP-Doctor, Secure Nearby-Friends and the SpatialVision QGIS plugin, and geo-indistinguishability is one of the input methods in STAC.
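As a concrete illustration, the following is a minimal Python sketch of the planar Laplace mechanism, using the inverse-CDF sampling method described in the geo-indistinguishability paper (function and parameter names are ours, for illustration only):

    import numpy as np
    from scipy.special import lambertw

    def planar_laplace(x, y, epsilon, rng=None):
        # Report a location perturbed with planar Laplace noise, whose
        # density decays as exp(-epsilon * distance) from the true point.
        rng = rng or np.random.default_rng()
        theta = rng.uniform(0.0, 2.0 * np.pi)   # direction: uniform angle
        p = rng.uniform(0.0, 1.0)
        # Radius: invert the CDF p = 1 - (1 + eps*r) * exp(-eps*r)
        # using branch -1 of the Lambert W function.
        r = -(lambertw((p - 1.0) / np.e, k=-1).real + 1.0) / epsilon
        return x + r * np.cos(theta), y + r * np.sin(theta)

    # Example: planar coordinates in metres, epsilon in 1/metres.
    print(planar_laplace(0.0, 0.0, epsilon=0.01))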
The main new contributions on DP and metric DP produced thanks to HYPATIA are the following:
• Generation of optimal mechanisms for metric DP via machine learning. To the best of our knowledge, we were the first to propose the use of machine learning to generate optimal privacy mechanisms. This is a particularly challenging task since the underlying optimization problem is not convex (it has local minima); a toy sketch of the idea is given after this list.
• A logical characterization of metric DP. We have developed a probabilistic logic to reason about metric privacy and DP in general. To the best of our knowledge, ours was the first proposal of a logical formalism for DP.
• An analysis of the trade-off between privacy and utility that has refined the state of the art and clarified some misconceptions about universal optimality, namely the property of a mechanism of providing an optimal trade-off with utility for all notions of utility that are anti-monotonic with respect to the distance, and for all prior distributions (this property is stated formally after this list).
• A method for reconstructing the original distribution from individually sanitized data collections. This method, which we call the Generalised Bayesian Update, is based on the Expectation-Maximization technique developed in statistics, and it allows different individuals to use different sanitization mechanisms. We have experimented with the k-Randomized-Response and the Geometric mechanisms, validating the method from both the correctness and the performance standpoints; a sketch of the single-mechanism case is given below.
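As a toy illustration of the machine-learning generation of mechanisms (first item above), one can parametrize the channel matrix with softmax logits and minimize the expected distortion plus a penalty for violating the metric-DP constraint C(y|x) <= exp(eps*d(x,x')) * C(y|x'). This is only a sketch of the general idea, not the construction developed in the project; all names and constants are hypothetical:

    import torch

    n = 10                                            # locations on a line
    idx = torch.arange(n, dtype=torch.float32)
    d = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs()   # d[x, x'] = |x - x'|
    eps, lam = 0.5, 100.0
    pi = torch.full((n,), 1.0 / n)                    # uniform prior

    logits = torch.zeros(n, n, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=0.05)

    for step in range(2000):
        C = torch.softmax(logits, dim=1)              # channel C[x, y], rows sum to 1
        util_loss = (pi.unsqueeze(1) * C * d).sum()   # expected distortion
        # metric-DP penalty: C[x, y] <= exp(eps * d[x, x']) * C[x', y]
        bound = torch.exp(eps * d).unsqueeze(2) * C.unsqueeze(0)
        violation = torch.relu(C.unsqueeze(1) - bound).sum()
        loss = util_loss + lam * violation
        opt.zero_grad(); loss.backward(); opt.step()

The softmax parametrization makes the objective non-convex, which is precisely why gradient-based training can get trapped in local minima.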
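For reference, universal optimality (third item above) can be stated as follows, in standard notation: a mechanism $M^\ast$ satisfying $\varepsilon$-metric DP is universally optimal if

    \sum_x \pi(x) \sum_y M^\ast(y \mid x)\, \ell(d(x,y))
      \;\le\; \sum_x \pi(x) \sum_y M(y \mid x)\, \ell(d(x,y))

for every $\varepsilon$-metric-DP mechanism $M$, every prior $\pi$, and every loss $\ell$ that is non-decreasing in the distance (equivalently, every utility that is anti-monotonic in the distance).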
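The following is a minimal Python sketch of the single-mechanism case of this update (often called the iterative Bayesian update; the generalised version developed in the project additionally handles different mechanisms for different individuals), instantiated here with k-Randomized-Response:

    import numpy as np

    def krr_channel(k, eps):
        # k-Randomized-Response: report the true value with probability
        # e^eps / (e^eps + k - 1), otherwise a uniformly random other value.
        C = np.full((k, k), 1.0 / (np.exp(eps) + k - 1))
        np.fill_diagonal(C, np.exp(eps) / (np.exp(eps) + k - 1))
        return C

    def bayesian_update(q, C, iters=500):
        # Reconstruct the original distribution from sanitized reports.
        #   q : empirical distribution of the reports (length k)
        #   C : channel matrix, C[x, y] = P(report y | true value x)
        k = C.shape[0]
        theta = np.full(k, 1.0 / k)            # start from the uniform distribution
        for _ in range(iters):
            joint = theta[:, None] * C         # joint[x, y] = theta(x) * C(y | x)
            post = joint / joint.sum(axis=0)   # posterior P(x | y)
            theta = post @ q                   # EM step: theta(x) = sum_y q(y) P(x | y)
        return theta

For instance, with q computed from many reports sanitized by krr_channel(4, 1.0), the returned theta converges to an estimate of the true distribution as the number of reports grows.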
Furthermore, we have investigated the quantitative information flow (QIF) framework, an alternative to DP for reasoning about privacy, based on information-theoretic and decision-theoretic notions. This work builds on our seminal papers “Measuring information leakage using generalized gain functions” and “Additive and multiplicative notions of leakage, and their capacities”, the second of which won the prestigious NSA award for the Best Scientific Cybersecurity Paper of 2014.
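For reference, the core notions of those papers are the prior and posterior g-vulnerability of a secret with prior $\pi$ observed through a channel $C$, and the corresponding additive and multiplicative leakages:

    V_g(\pi) = \max_{w} \sum_{x} \pi(x)\, g(w, x)
    V_g(\pi, C) = \sum_{y} \max_{w} \sum_{x} \pi(x)\, C(y \mid x)\, g(w, x)
    \mathcal{L}^{+}_g(\pi, C) = V_g(\pi, C) - V_g(\pi)
    \mathcal{L}^{\times}_g(\pi, C) = V_g(\pi, C) / V_g(\pi)

where g(w, x) is the gain of the adversary for taking action w when the secret is x.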
The main new contributions on QIF produced thanks to HYPATIA are:
• A game-theoretic approach to the notions of leakage and privacy. To the best of our knowledge, ours is the first such approach. We have shed light on the use of randomized strategies, showing that the optimal strategy, for both the defender and the adversary, is usually randomized. This is the first formal proof that the adversary has an interest in using randomization (for the defender this was already known); a toy example is given after this list.
• A machine-learning method to estimate the leakage of information. To the best of our knowledge, we were the first to propose a black-box machine-learning approach to measuring QIF leakage (see the sketch after this list).
• We have also written a book on the foundations of QIF.
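As a toy illustration of the game-theoretic point (first item above; the game and its numbers are hypothetical), consider a zero-sum leakage game in which the defender chooses between two channels and the adversary between two attacks, each channel being safe against exactly one attack. Any pure defender strategy leaks 1 against the right attack, while the 50/50 mix leaks only 0.5; solving the game as a linear program recovers this randomized optimum:

    import numpy as np
    from scipy.optimize import linprog

    # A[i, j]: leakage when the defender deploys channel i and the
    # adversary mounts attack j (each channel is safe against one attack).
    A = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
    m, n = A.shape

    # Variables: defender's mixed strategy p (m entries) and the value v.
    c = np.zeros(m + 1); c[-1] = 1.0                # minimize v
    A_ub = np.hstack([A.T, -np.ones((n, 1))])       # A^T p <= v for every attack
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])   # sum p = 1
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * m + [(None, None)]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    print(res.x[:m], res.x[-1])   # p = [0.5, 0.5], value 0.5 < 1 (any pure strategy)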
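A minimal sketch of the black-box estimation idea (second item above), on a hypothetical toy system: the posterior Bayes vulnerability of a system equals the best achievable accuracy in guessing the secret from the observable, so the test accuracy of a classifier trained on input/output samples gives a (lower-bound) estimate of it:

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Hypothetical black-box system: secret bit x, observable y = noisy reading.
    x = rng.integers(0, 2, size=20000)
    y = (x + rng.normal(scale=0.8, size=x.size)).reshape(-1, 1)

    y_tr, y_te, x_tr, x_te = train_test_split(y, x, test_size=0.5, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300).fit(y_tr, x_tr)

    post_V = clf.score(y_te, x_te)            # posterior Bayes vulnerability
    prior_V = max(x.mean(), 1 - x.mean())     # prior: best blind guess
    print("multiplicative Bayes leakage ~", post_V / prior_V)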
I have been invited to give various keynote talks (at CONCUR 2020 and several workshops) and lectures on these topics (e.g. at the IDESSAI Summer Schools 2021 and 2022 and the Elsa Summer School 2022). Furthermore, the thesis of my PhD student Marco Romanelli has received one of the four prizes for the best co-supervised PhD thesis (all domains) awarded by the Université Franco-Italienne (Prix de thèse en cotutelle UFI 2021), and my student Natasha Fernandez, whose thesis is in large part the result of our work on QIF, has received the John Makepeace Bennett Award for the best PhD thesis in Australasia in 2022. My collaborator Mario Alvim has been appointed by the Brazilian National Institute for Educational Studies and Research (INEP) to sanitize the Brazilian educational census, which contains the longitudinal micro-data of all Brazilian students since 2007, and he is applying the QIF framework for this task.
Concerning the future period of the project, we plan to further investigate the optimality issue in local mechanisms for privacy protection. More specifically, we plan to develop a compositional method for privacy-preserving federated learning that also optimizes the trade-off between privacy and two kinds of utility, namely quality of service and preservation of statistical information (three-way optimality). We also plan to bring fairness into the equation: we intend to study mechanisms for data sanitization that achieve privacy and, at the same time, remove bias from the training data, while preserving the accuracy of the model.