CORDIS - EU research results

The design and evaluation of modern fully dynamic data structures

Periodic Reporting for period 2 - MoDynStruct (The design and evaluation of modern fully dynamic data structures)

Reporting period: 2023-03-01 to 2024-12-31

Every computer program is based on an algorithm, which is a detailed description of each step that the program has to execute. Not all algorithms are equally good at solving a computational problem. Some take more steps (and thus need more resources, such as time, which in turn increases the electricity the computer consumes), while others take fewer; some “leak” more information about the underlying data set, while others leak less. Both resource consumption and the privacy protection of confidential data are key issues for our society. This research project thus has two goals: to design algorithms that are as efficient as possible, and to design algorithms that leak as little sensitive information as possible.

More specifically, the algorithms that we design and analyze are based on so-called combinatorial inputs such as tables of numbers, networks, and feature vectors of objects. Depending on the input, the goal is to compute various statistics over the numbers, analyze properties of the network, or group the feature vectors by “similarity” into sets, called clusters. Algorithms that perform the latter task are called clustering algorithms, and the sensitive information they protect is usually the value of each individual feature vector. As an example of a network application, consider a social network, where nodes correspond to humans and connections correspond to relationships between humans. In this case, the “private” information that should be protected might include which connections exist in a given network, or even which nodes exist. Under this “privacy condition”, an algorithm might need to count how many highly connected subnetworks of between five and twenty nodes exist in the network.

This project designs two types of algorithms: first, algorithms where the data set is static, and second, algorithms where the data set changes over time. The focus is mostly on the latter setting, which creates additional challenges. For instance, prior answers might become wrong and new answers need to be output, but computing these new answers requires additional computation. Moreover, outputting new answers might also disclose more information about the input.
We developed novel algorithms for dynamically changing networks that analyze a large variety of network properties. These properties include the number of network subpatterns and the maximum amount that can “flow” through the network from a given “source” (starting point) to a given “sink” (ending point), where each connection in the network has a certain “capacity”. We also studied the setting with a dynamic feature set, i.e. where feature vectors are added or removed, and a clustering fulfilling certain conditions must be maintained. We designed two new algorithms for this problem, each for a different definition of similarity. Both of these algorithms need fewer steps to complete their task than any previously known algorithm and are thus more efficient.
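To make the notion of maximum flow concrete, the following is a minimal sketch of a textbook algorithm (Edmonds-Karp) for a static network. It is not one of the project's dynamic algorithms; the function name `max_flow` and the dictionary-based input format are assumptions chosen for illustration.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Textbook Edmonds-Karp maximum flow on a static network.

    capacity: dict mapping node -> {neighbour: capacity} (illustrative format).
    Returns the largest total amount that can be routed from source to sink.
    """
    if source == sink:
        raise ValueError("source and sink must differ")

    # Residual capacities, including reverse edges that start at 0.
    residual = {source: {}, sink: {}}
    for u, neighbours in capacity.items():
        for v, cap in neighbours.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for a shortest path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path is left

        # Find the smallest spare capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push that amount along the path and update the residual network.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


network = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
print(max_flow(network, "s", "t"))
```

With a static algorithm like this one, every change to the network requires a full recomputation. Dynamic algorithms aim to update the answer with far less work per change.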

We also designed new algorithms to protect the privacy of the input data for fundamental problems. One example is the so-called prefix sum problem: given a sequence of (positive or negative) numbers, output the sum of all the numbers so far every time a new number arrives, without disclosing any individual number. This is a basic problem, as it can be used, for example, to count the number of elements in a dynamic set where elements are added and removed, or to maintain the average of a sequence of numbers. We designed an algorithm based on a novel matrix factorization with provably smaller privacy loss (for any given accuracy) than any prior algorithm, and we showed through an empirical evaluation on various data sets that it performs better than all prior algorithms. As we learnt recently, this algorithm has already been sped up by a large IT company, which shows the relevance of this work to better protecting the privacy of our data in daily life.
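The general idea of a factorization-based mechanism for prefix sums can be sketched as follows. The prefix sum matrix A (lower triangular, all ones) is written as a product A = L·R, noise is added to R·x, and the result is multiplied by L. The sketch below uses the matrix square root of A as both factors, which is one known factorization of this kind; the text above does not specify which factorization the project uses, so this choice is an assumption. The function names and the `noise_scale` parameter are illustrative as well, and calibrating the noise to a concrete privacy guarantee is deliberately left out.

```python
import random


def sqrt_coefficients(n):
    """First n power-series coefficients of (1 - x)^(-1/2).

    The lower-triangular Toeplitz matrix built from these coefficients
    squares to the all-ones lower-triangular (prefix sum) matrix.
    """
    coeffs = [1.0]
    for k in range(1, n):
        coeffs.append(coeffs[-1] * (2 * k - 1) / (2 * k))
    return coeffs


def toeplitz_apply(coeffs, values):
    """Multiply the lower-triangular Toeplitz matrix given by coeffs with values."""
    return [
        sum(coeffs[t - j] * values[j] for j in range(t + 1))
        for t in range(len(values))
    ]


def noisy_prefix_sums(stream, noise_scale, rng=None):
    """Return L(Rx + z), where L = R is the square root of the prefix sum matrix.

    noise_scale is the standard deviation of the Gaussian noise z. Choosing
    it to meet a given privacy guarantee is outside the scope of this sketch.
    """
    rng = rng or random.Random()
    coeffs = sqrt_coefficients(len(stream))
    encoded = toeplitz_apply(coeffs, stream)                     # R x
    noisy = [v + rng.gauss(0.0, noise_scale) for v in encoded]   # R x + z
    return toeplitz_apply(coeffs, noisy)                         # L (R x + z)


stream = [3, -1, 4, 1, -5]
print(noisy_prefix_sums(stream, noise_scale=1.0, rng=random.Random(0)))
```

Because both factors are lower triangular, the output at step t depends only on the inputs and noise up to step t, so the answers could be released one at a time as the numbers arrive. With `noise_scale=0.0` the function returns the exact prefix sums, which is a quick check that the factorization is correct.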

We further developed a clustering algorithm for a dynamic set of feature vectors that protects the privacy of these feature vectors. This is the first ever algorithm of this type.
All our algorithms improve over prior algorithms. Apart from the first privacy-protecting clustering algorithm for dynamic input mentioned above, we also developed the fastest known algorithm for finding “bottlenecks”, called cuts, in a network, which are relevant to the maximum flow problem described above. This algorithm is special in that the number of steps it needs to compute the result is provably close to optimal, and it is the first such algorithm. We received the Best Paper Award at the leading conference on combinatorial algorithms for this research result. For this algorithm, the input needs to remain static, i.e. unchanged. Thus, the next project goal is to turn this algorithm into one that can handle dynamically changing networks.

Another goal is to invent algorithms that accept a dynamically changing network as input and can compute properties of the network while still protecting the privacy of the network data, i.e. the existence of either individual connections or whole nodes. Some algorithms that do so already exist and, even though their privacy guarantees are relatively weak, outperforming them seems challenging. Still, we will try to do so for certain network properties, with the aim of improving performance in practice.
PI Monika Henzinger (Copyright Peter Rigaud)