Periodic Reporting for period 2 - MoDynStruct (The design and evaluation of modern fully dynamic data structures)
Reporting period: 2023-03-01 to 2024-12-31
More specifically, the algorithms that we design and analyze operate on so-called combinatorial inputs such as tables of numbers, networks, and feature vectors of objects. Depending on the input, the goal is to compute various statistics over the numbers, analyze properties of the network, or group the feature vectors by “similarity” into sets, called clusters. Algorithms that perform the latter task are called clustering algorithms, and the sensitive information they protect is usually the value of each individual feature vector. As an example of a network application, consider a social network, where nodes correspond to people and connections correspond to relationships between them. In this case, the “private” information that should be protected might include which connections exist in a given network, or even which nodes exist. Under this “privacy condition”, an algorithm might need to count how many highly connected subnetworks of between five and twenty nodes exist in the network.
This project designs two types of algorithms: first, algorithms where the data set is static, and second, algorithms where the data set changes over time. The focus is mostly on the latter setting, which creates additional challenges. For instance, prior answers might become wrong and new answers need to be output, but computing these new answers requires additional computation. Moreover, outputting new answers might also disclose more information about the input.
We also designed new algorithms to protect the privacy of the input data for fundamental problems. One example is the so-called prefix sum problem: Given a sequence of (positive or negative) numbers, output the sum of all the numbers so far every time a new number arrives, while not disclosing any individual number. This is a basic problem, as it can be used, for example, to count the number of elements in a dynamic set, where elements are added and removed, or to maintain the average of a sequence of numbers. We designed an algorithm based on a novel matrix factorization with provably smaller privacy loss (for any given accuracy) than any prior algorithm, and we also showed through an empirical evaluation on various data sets that it performs better than all prior algorithms. As we learnt recently, this algorithm has already been sped up by a large IT company, which shows the relevance of this work for better protecting the privacy of our data in daily life.
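To make the prefix sum setting concrete, the following is a minimal sketch of a generic matrix factorization mechanism. It is not the project's algorithm: the factorization used here (the square root of the lower-triangular all-ones matrix, whose entries are the Taylor coefficients of (1 - x)^(-1/2)) is one well-known choice picked for illustration, and the function names, the Gaussian noise, and the parameter sigma are assumptions. Calibrating sigma to a target privacy level depends on the largest column norm of the right factor and is deliberately left out.

```python
import random


def sqrt_coefficients(n):
    """First n Taylor coefficients of (1 - x)^(-1/2).

    c_0 = 1 and c_k = c_{k-1} * (2k - 1) / (2k).
    """
    coeffs = [1.0]
    for k in range(1, n):
        coeffs.append(coeffs[-1] * (2 * k - 1) / (2 * k))
    return coeffs


def toeplitz_lower(coeffs):
    """Lower-triangular Toeplitz matrix with entry coeffs[i - j] at (i, j)."""
    n = len(coeffs)
    return [[coeffs[i - j] if j <= i else 0.0 for j in range(n)]
            for i in range(n)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def private_prefix_sums(xs, sigma, rng):
    """Noisy prefix sums via the factorization M = L * L (illustrative only).

    M is the lower-triangular all-ones matrix, so M @ xs is the vector of
    exact prefix sums. Noise is added to L @ xs and the result is mapped
    back through L, which gives M @ xs + L @ z. Because L is lower
    triangular, the t-th output depends only on the first t inputs.
    """
    n = len(xs)
    factor = toeplitz_lower(sqrt_coefficients(n))
    # Hypothetical noise choice: independent Gaussian noise per coordinate.
    noise = [rng.gauss(0.0, sigma) for _ in range(n)]
    hidden = [value + z for value, z in zip(matvec(factor, xs), noise)]
    return matvec(factor, hidden)
```

With sigma set to 0 the function returns the exact prefix sums, which is a convenient sanity check that the two factors multiply to the all-ones lower-triangular matrix. Feeding in +1 for each insertion and -1 for each deletion turns the same routine into a noisy counter for the size of a dynamic set.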
We further developed a clustering algorithm for a dynamic set of feature vectors that protects the privacy of these feature vectors. This is the first ever algorithm of this type.
Another goal is to invent algorithms that accept a dynamically changing network as input and can compute properties of the network while still protecting the privacy of the network data, i.e., either the existence of connections or of whole nodes. Some algorithms that do so already exist and, even though their privacy guarantees are relatively weak, outperforming them seems challenging. Still, we will try to do so for certain network properties, with the aim of improving performance in practice.