Periodic Reporting for period 4 - HYPATIA (Privacy and Utility Allied)
Reporting period: 2024-04-01 to 2024-09-30
The objective of this project was to develop the theoretical foundations, methods, and tools to protect the privacy of individuals while allowing their data to be collected and used for statistical purposes. We aimed, in particular, at developing mechanisms that can be applied and controlled directly by the user, thus avoiding the need for a trusted party, are robust with respect to the combination of information from different sources, and provide an optimal trade-off between privacy and utility.
We have investigated differential privacy (DP) and further developed the theory of metric differential privacy (metric DP). This line of research builds on the notion of distance-based privacy proposed in our seminal paper “Geo-indistinguishability: Differential privacy for location-based systems”, which has more than 1500 citations on Google Scholar and received the ACM CCS Test-of-Time award in 2023. Metric DP generalizes differential privacy from datasets to generic metric domains. A natural field of application is location privacy, where the metric is the geographical distance. The resulting property, which we call geo-indistinguishability, has been implemented using planar Laplace noise in a tool called Location Guard, which has more than 300,000 users.
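As an illustration of the planar Laplace idea mentioned above, the following is a minimal sketch and not the code of Location Guard. It assumes locations are given in planar coordinates (for example meters in a local projection) and that `epsilon` is expressed per unit of distance. The noise density is proportional to exp(-epsilon * d), where d is the distance from the true location; in polar coordinates this means a uniform angle and a radius following a Gamma distribution with shape 2 and scale 1/epsilon. The function names are hypothetical.

```python
import math
import random


def planar_laplace_noise(epsilon, rng=random):
    """Sample an offset (dx, dy) whose density is proportional to
    exp(-epsilon * sqrt(dx^2 + dy^2))."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The direction is uniform on the circle.
    theta = rng.uniform(0.0, 2.0 * math.pi)
    # The radial density is epsilon^2 * r * exp(-epsilon * r),
    # i.e. a Gamma distribution with shape 2 and scale 1/epsilon.
    r = rng.gammavariate(2.0, 1.0 / epsilon)
    return r * math.cos(theta), r * math.sin(theta)


def sanitize_location(x, y, epsilon, rng=random):
    """Return the reported location: the true point plus planar Laplace noise."""
    dx, dy = planar_laplace_noise(epsilon, rng)
    return x + dx, y + dy
```

With this parameterization the expected displacement is 2/epsilon, so a smaller `epsilon` gives stronger privacy at the cost of a less accurate reported location. A practical tool would also have to handle the conversion between latitude/longitude and planar coordinates, which this sketch leaves out.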
The main new contributions on DP and metric DP produced thanks to Hypatia are the following:
• Generation of optimal mechanisms for metric DP via machine learning.
• A logical characterization of metric DP: We have developed a probabilistic logic to reason about metric privacy and DP in general.
• An analysis of the trade-off between privacy and utility that has refined the state of the art and clarified some misconceptions about universal optimality.
• A method for the reconstruction of the original distribution from individually sanitized data collections.
• The design and theoretical foundations of PRIVIC, a method to collect data incrementally and anonymously that optimizes the trade-off between privacy, quality of service, and accuracy of statistical analysis.
• A study of the interaction between local differential privacy and various notions of fairness.
• A study of how local differential privacy affects various mechanisms for causal discovery.
• A method to optimize the sensitivity in differentially private learning.
• A method to improve privacy, personalization, and fairness in federated learning.
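To make the idea of reconstructing the original distribution from individually sanitized data more concrete, here is a minimal sketch under stated assumptions. It assumes each user reports a value through k-ary randomized response, a standard local DP mechanism, and that the collector estimates the original distribution with an iterative Bayesian update (an expectation-maximization procedure). Both choices are illustrative and are not claimed to be the exact method developed in the project; the function names are hypothetical.

```python
import math


def krr_channel(k, epsilon):
    """k-ary randomized response as a matrix: channel[x][y] = P(report y | true x)."""
    denom = math.exp(epsilon) + k - 1
    keep = math.exp(epsilon) / denom   # probability of reporting the true value
    flip = 1.0 / denom                 # probability of each other value
    return [[keep if x == y else flip for y in range(k)] for x in range(k)]


def iterative_bayesian_update(channel, observed, iterations=500):
    """Estimate the distribution on true values from the empirical
    distribution `observed` of the sanitized reports."""
    n_in = len(channel)
    n_out = len(channel[0])
    estimate = [1.0 / n_in] * n_in     # start from the uniform distribution
    for _ in range(iterations):
        # Distribution on reports predicted by the current estimate.
        marginal = [
            sum(estimate[x] * channel[x][y] for x in range(n_in))
            for y in range(n_out)
        ]
        # Reweight each true value by its posterior probability under each report.
        estimate = [
            sum(
                observed[y] * estimate[x] * channel[x][y] / marginal[y]
                for y in range(n_out)
                if marginal[y] > 0
            )
            for x in range(n_in)
        ]
    return estimate
```

In use, `observed` would be the normalized histogram of the reports received from the users. The estimate stays a probability distribution at every step, which is one reason this kind of update is often preferred over directly inverting the channel matrix.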
Furthermore, we have investigated the quantitative information flow (QIF) framework, an alternative to DP for reasoning about privacy that is based on information-theoretic and decision-theoretic notions. This work builds on our seminal papers “Measuring information leakage using generalized gain functions”, which received the IEEE CSF Test-of-Time award in 2024, and “Additive and multiplicative notions of leakage, and their capacities”, which won the NSA award for the Best Scientific Cybersecurity Paper of 2014. The main new contributions produced thanks to Hypatia are:
• A game-theoretic approach to the notions of leakage and privacy.
• A machine-learning method to estimate the leakage of information. To the best of our knowledge, we have been the first to propose a black-box machine-learning approach to measure QIF.
• A book on the foundations of QIF.
• An analysis of the information leakage of various mechanisms implementing differential privacy and metric differential privacy.
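The notions of leakage referred to above can be illustrated with a small computation. The sketch below assumes the standard QIF definitions: a mechanism is a channel matrix `channel[x][y] = P(y | x)`, a gain function is a matrix `gain[w][x]` giving the adversary's reward for guess `w` when the secret is `x`, the prior g-vulnerability is the best expected gain before observing the output, and the posterior g-vulnerability is the best expected gain after observing it. The function names are hypothetical, and this computes leakage exactly from a known channel; it is not the black-box machine-learning estimator mentioned in the list.

```python
def prior_g_vulnerability(prior, gain):
    """Best expected gain of an adversary who knows only the prior."""
    return max(sum(p * g for p, g in zip(prior, row)) for row in gain)


def posterior_g_vulnerability(prior, channel, gain):
    """Best expected gain of an adversary who also observes the channel output."""
    total = 0.0
    for y in range(len(channel[0])):
        # Joint probability of each secret x together with output y.
        joint = [prior[x] * channel[x][y] for x in range(len(prior))]
        # The adversary picks the best guess for this particular output.
        total += max(sum(j * g for j, g in zip(joint, row)) for row in gain)
    return total


def multiplicative_leakage(prior, channel, gain):
    """Ratio between posterior and prior g-vulnerability."""
    return posterior_g_vulnerability(prior, channel, gain) / prior_g_vulnerability(prior, gain)


def additive_leakage(prior, channel, gain):
    """Difference between posterior and prior g-vulnerability."""
    return posterior_g_vulnerability(prior, channel, gain) - prior_g_vulnerability(prior, gain)


def identity_gain(n):
    """Gain 1 for guessing the secret exactly, 0 otherwise (Bayes vulnerability)."""
    return [[1.0 if w == x else 0.0 for x in range(n)] for w in range(n)]
```

For example, with a uniform prior on two secrets, the channel `[[0.75, 0.25], [0.25, 0.75]]`, and the identity gain function, the adversary's chance of guessing the secret in one try rises from 0.5 to 0.75, giving a multiplicative leakage of 1.5 and an additive leakage of 0.25.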
Concerning exploitation and dissemination:
• The PI has been invited to give various keynote talks: CONCUR 2020, FLOC 2022, CODASPY 2023, MobiliT.AI 2023, ITASEC 2024, MFPS 2024, WISE 2024, and several workshops.
• The PI has also been invited to give lectures in advanced schools: the IDESSAI Summer Schools 2021 and 2022, the Elsa Summer School 2022, the GDR Summer School 2023, and the Hi!Paris Summer School 2024.
• The thesis of the PI's PhD student Marco Romanelli received the Prix de thèse en cotutelle UFI 2021.
• The thesis of the PI's PhD student Natasha Fernandez received the John Makepeace Bennett Award for the best PhD thesis in Australasia in 2022.
• The PI has received the Grand Prix of the French Academy of Sciences for the results achieved in Hypatia.
• Mario Alvim, who has been working on the Hypatia project, has been appointed by the Brazilian National Institute for Statistics (INEP) to sanitize the Brazilian educational census, which contains longitudinal micro-data on all Brazilian students since 2007. He is applying the QIF framework to this task.
• The PI's team is collaborating with the French National Institute for Demographic Studies on the anonymization of the databases resulting from its data collections, applying the concepts and methods developed in Hypatia.
PRIVIC constitutes the main contribution of the PhD thesis of Sayan Biswas, which has been awarded the second prize from IP Paris for the best PhD thesis in 2023 in Computer Science, Data, and Artificial Intelligence.
One of the main uses of PRIVIC is the anonymization of tabular data. On this topic, we have been collaborating with the French National Institute for Demographic Studies on the anonymization of the databases resulting from the data they collect from surveys. An advantage of our method is that it allows the data to be anonymized locally (i.e. it does not require a trusted server), while approaching the performance (in terms of privacy-utility trade-off) of the state-of-the-art methods based on a trusted server.
Beyond PRIVIC, most of the other work listed above is also innovative. In particular, we have opened new lines of research, such as the study of the trade-off between privacy and causal discovery, and the estimation of leakage via machine learning.