
Engineering Scalable Algorithms for the Basic Toolbox ScAlBox

Periodic Reporting for period 2 - ScAlBox (Engineering Scalable Algorithms for the Basic Toolbox ScAlBox)

Reporting period: 2022-03-01 to 2023-08-31

ScAlBox develops basic algorithmic tools that can be used in a wide spectrum of applications and
that scale orders of magnitude better than the state of the art with respect to input size or number of
processors.
In recent decades, we have witnessed a transition into the information age, with profound effects on
science, technology, and our daily life. This transition is driven by a growing spectrum of computer
applications that process larger and larger data sets using more and more complex algorithms. A
major roadblock to this progress is the scalability challenge, which stems from the collision of two
revolutionary developments. On the one hand, we observe an explosion of the amount of data to be
processed (big data). On the other hand, the performance of a single processor core is stagnating (the
power wall). The widening gap between required and available performance can be closed by
efficiently exploiting many parallel processors.
ScAlBox addresses this roadblock by developing scalable algorithmic building blocks.
It bridges gaps between theory and practice by using the methodology of
algorithm engineering that integrates modeling, design, analysis, implementation, and experimental
evaluation. The goal is to provide algorithms and software libraries that scale to a huge number of processors
and give hard performance guarantees for arbitrary inputs.
Wide impact is achieved by focusing on the basic toolbox of algorithms and data structures that
are needed in many applications.
In the first reporting period, a particular focus
has been basic operations on very large graphs.
Graphs model relations (edges) between objects
(nodes) and are thus a universal modelling tool
for computer science. For example, graphs
model such diverse things as road networks, social
networks, discretized models in numerical
simulations, (real and artificial) neural networks,
or control flow in computer programs. As one example, we investigated
the minimum spanning tree problem where we look
for the minimum cost selection of edges that
connect all nodes. Our algorithm scales to many
thousands of processing cores and trillions of
edges, with performance and scalability
several orders of magnitude better than the
previous state-of-the-art. Similarly, a case study
in graph analysis considered the fundamental task
of finding triangles which are indicative of
densely connected parts of the graph. This
problem requires communication and
computation that grows faster than the network
size. Nevertheless, the designed code is able to
process huge graphs in reasonable time even if
they only fit into the combined memory of many
thousands of processors.
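
To make the minimum spanning tree problem concrete, here is a minimal sequential sketch using Kruskal's classic algorithm with a union-find structure. It illustrates only the problem itself, not the project's distributed algorithm; the Edge type and function names are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Sequential sketch of the minimum spanning tree problem (Kruskal's algorithm):
// sort edges by weight and greedily keep an edge whenever it connects two
// previously disconnected components, tracked with a union-find structure.
struct Edge { uint32_t u, v; double w; };

struct UnionFind {
    std::vector<uint32_t> parent;
    explicit UnionFind(uint32_t n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0u);
    }
    uint32_t find(uint32_t x) {
        while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
        return x;
    }
    bool unite(uint32_t a, uint32_t b) {
        a = find(a); b = find(b);
        if (a == b) return false;      // already in the same component
        parent[a] = b;
        return true;
    }
};

// Returns the edges of a minimum spanning forest of an n-node graph.
std::vector<Edge> kruskalMST(uint32_t n, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.w < b.w; });
    UnionFind uf(n);
    std::vector<Edge> mst;
    for (const Edge& e : edges)
        if (uf.unite(e.u, e.v))        // keep the edge iff it joins two components
            mst.push_back(e);
    return mst;
}
```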
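For the triangle-finding task, the following is likewise a minimal sequential sketch rather than the distributed code developed in the project: every edge is oriented from the lower to the higher node id, and triangles are counted by intersecting the sorted out-neighbour lists of the two endpoints of each edge. The adjacency-list representation is an assumption for this illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sequential triangle count: orient every edge from the lower to the higher
// node id, then intersect the sorted out-neighbour lists of the two endpoints
// of each oriented edge. Every triangle is counted exactly once.
uint64_t countTriangles(const std::vector<std::vector<uint32_t>>& adj) {
    const uint32_t n = static_cast<uint32_t>(adj.size());
    std::vector<std::vector<uint32_t>> out(n);       // neighbours with higher id
    for (uint32_t u = 0; u < n; ++u)
        for (uint32_t v : adj[u])
            if (v > u) out[u].push_back(v);
    for (auto& nb : out) std::sort(nb.begin(), nb.end());

    uint64_t triangles = 0;
    for (uint32_t u = 0; u < n; ++u)
        for (uint32_t v : out[u]) {
            // each common out-neighbour w of u and v closes a triangle u < v < w
            auto a = out[u].begin(), b = out[v].begin();
            while (a != out[u].end() && b != out[v].end()) {
                if (*a < *b) ++a;
                else if (*b < *a) ++b;
                else { ++triangles; ++a; ++b; }
            }
        }
    return triangles;
}
```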

Another success story is our Mallob SAT solver.
SAT solving is about deciding whether a formula in
propositional logic is satisfiable, i.e. whether
it has an assignment of truth values to logical variables
that makes the formula true. This is a
prototypical hard computational problem that
underlies many applications such as hardware and
software verification, automatic theorem proving,
cryptographic problems, etc. Mallob is able to
effectively coordinate hundreds to thousands of
sequential SAT solvers to jointly solve complex
SAT problems. It does so by effectively sharing
learned information (clauses) between the
individual solvers. Another factor of
parallelization is achieved by flexibly assigning
computing resources to multiple streams of problem
instances. Mallob is malleable, i.e. within
milliseconds it can adapt the amount of used
resources based on the difficulty of the input and
the available processing power. In contrast,
traditional supercomputers manage their resources
on the time scale of hours, which would be useless
for SAT solving, where it is difficult to predict
whether a problem requires seconds or weeks of
computation.
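
The clause-sharing pattern described above can be sketched as follows. This is a hypothetical skeleton, not Mallob's actual code or API: several solver threads alternate between a bounded slice of search and exchanging learned clauses through a shared pool. SolverStub and its methods are placeholders invented for this sketch.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical skeleton of portfolio SAT solving with clause sharing.
// SolverStub stands in for a real sequential SAT solver.
using Clause = std::vector<int>;                 // a clause as a list of literals

struct SolverStub {
    int steps = 0;
    bool step() { return ++steps >= 1000; }      // placeholder for a bounded slice of search
    std::vector<Clause> exportLearned() { return {}; }   // clauses learned in the last slice
    void importClauses(const std::vector<Clause>&) {}    // integrate clauses from peers
};

int main() {
    const int numSolvers = 4;
    std::vector<Clause> sharedPool;              // learned clauses shared by all solvers
    std::mutex poolMutex;
    std::atomic<bool> solved{false};

    std::vector<std::thread> workers;
    for (int i = 0; i < numSolvers; ++i)
        workers.emplace_back([&] {
            SolverStub solver;                   // each worker runs its own sequential solver
            while (!solved.load()) {
                if (solver.step()) { solved = true; break; }
                std::lock_guard<std::mutex> lock(poolMutex);
                auto learned = solver.exportLearned();
                sharedPool.insert(sharedPool.end(), learned.begin(), learned.end());
                solver.importClauses(sharedPool); // pick up what other solvers learned
            }
        });
    for (auto& t : workers) t.join();
}
```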

Crucial tools for storing and processing large
data sets are compression techniques that allow
processing the data without decompressing
it. We obtained several surprising results in this
direction by showing that theoretical lower bounds
on space consumption can actually be approached by
practical solutions. This includes a dictionary
data structure that can store a set of compressed
objects with virtually no space overhead for
making it searchable. Other such components are
perfect hash functions that assign unique yet
compact identifiers to objects. This functionality
helps to improve databases and various
applications, e.g. in bioinformatics.
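
As a toy illustration of what a perfect hash function provides, the sketch below brute-forces a seed under which a simple hash maps a handful of keys to distinct compact identifiers. Real constructions, including those developed in the project, scale to huge key sets with far less work; the key set and hash function here are assumptions made for the example.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Toy minimal perfect hash: search for a seed under which a simple hash maps
// every key of a small, fixed set to a distinct slot in [0, n). Brute-force
// seed search only works for tiny sets; practical constructions are far more refined.
static uint64_t hashWithSeed(const std::string& key, uint64_t seed) {
    uint64_t h = seed ^ 0x9e3779b97f4a7c15ULL;
    for (unsigned char c : key) h = (h ^ c) * 0x100000001b3ULL;   // FNV-style mixing
    return h;
}

int main() {
    std::vector<std::string> keys = {"graph", "sat", "hash", "mst"};  // illustrative keys
    const uint64_t n = keys.size();

    uint64_t seed = 0;
    for (;; ++seed) {                        // try seeds until the mapping is injective
        std::vector<bool> used(n, false);
        bool perfect = true;
        for (const auto& k : keys) {
            uint64_t slot = hashWithSeed(k, seed) % n;
            if (used[slot]) { perfect = false; break; }
            used[slot] = true;
        }
        if (perfect) break;
    }

    for (const auto& k : keys)               // every key now gets a unique compact id
        std::cout << k << " -> " << hashWithSeed(k, seed) % n << '\n';
}
```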
We will further improve the results described
above, for example by looking at more graph problems,
even larger machines, and large data
sets from grand challenge applications. The
techniques used to parallelize SAT solving might
turn out to be useful also for solving other
computationally hard problems: how can multiple
solvers for a problem exchange information that
helps them to collectively make progress? The
results on compressed data structures are unfolding
right now and will produce further interesting
results in the coming years.
Figure: Illustration of the interactions in the Mallob distributed SAT solver.