CORDIS - EU research results
Symmetry and Similarity

Periodic Reporting for period 1 - SymSim (Symmetry and Similarity)

Reporting period: 2022-10-01 to 2025-03-31

This project aims to develop an algorithmic theory of similarity between graphs. Graphs are versatile models for representing complex data, ranging from chemical molecules to social interactions. A fundamental task in dealing with graphical data, and in enabling modern data analysis techniques, is to compare graphs and measure their similarity, preferably in a way that is both semantically meaningful and algorithmically efficient. However, it is far from clear how to achieve this. In many application areas, for example computer vision, database systems, and formal verification, researchers have proposed (often ad-hoc) solutions tailored to the specific application, but a general theory is missing. We will develop such a theory in this project.

Similarity of graphs has many different facets. We will identify the common core of different approaches to similarity, but also exhibit their differences. We will design methods for comparing different similarity measures and for obtaining a semantic understanding of similarity. We will develop criteria for the suitability of various similarity measures for different types of applications.

A particular focus of our research will be on efficient algorithms for computing similarity. A perfect similarity measure is of little use if we do not have an efficient way of determining how similar two graphs are.

A classic algorithmic problem in this context is the graph isomorphism problem, which involves deciding whether two graphs are structurally identical. Determining the precise computational complexity of this problem, or of the equivalent problem of computing all symmetries of a graph, is regarded as one of the most important open questions in theoretical computer science. Building on recent progress, we will design new algorithms that break barriers towards a polynomial-time algorithm for the isomorphism problem.
We made significant progress towards our goals in the project's first two years. We formulated a conceptual framework for graph similarity and, within it, compared different similarity measures and studied their computational complexity. An important distinction we introduced is between operational and declarative similarity measures. Under the operational view, two graphs are similar if one can easily be transformed into the other. Under the declarative view, two graphs are similar if they have similar properties. There are natural and well-known examples of both types of similarities. It is one of our central goals to connect these two views and show that, in some sense, they are dual.
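The two views can be illustrated with a toy sketch. The measures below (edge edit distance with fixed vertex labels, and a sorted degree sequence as a "property") are illustrative stand-ins of our own choosing, not the measures studied in the project:

```python
# A graph as a set of undirected edges over vertices 0..n-1.
G = {(0, 1), (1, 2), (2, 0)}          # triangle
H = {(0, 1), (1, 2), (2, 3)}          # path on 4 vertices

def edit_distance(g, h):
    """Operational view (toy): number of edge insertions/deletions
    needed to turn g into h, with vertex labels held fixed."""
    return len(g ^ h)  # symmetric difference of the edge sets

def degree_profile(g, n):
    """Declarative view (toy): the sorted degree sequence,
    a simple 'property' on which two graphs can be compared."""
    deg = [0] * n
    for u, v in g:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg)

print(edit_distance(G, H))                          # edges to change
print(degree_profile(G, 3), degree_profile(H, 4))   # property vectors
```

Under the operational measure the graphs are two edge edits apart; under the declarative one they differ in their degree profiles. The project's question is how such pairs of views relate in general.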

A natural declarative approach to similarity is comparing the frequencies of patterns in the two graphs. Technically, this leads to similarities based on homomorphism embeddings, which have been a focus of our attention. By developing novel mathematical machinery drawing on areas such as representation theory and functional analysis, we were able to connect them to natural operational similarity measures based on matrix norms.
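The idea behind a homomorphism embedding can be sketched by brute force: represent a graph by its vector of homomorphism counts over a fixed family of small patterns. The exhaustive enumeration below is exponential and purely illustrative:

```python
from itertools import product

def hom_count(pattern, graph):
    """Count homomorphisms from `pattern` to `graph`: vertex maps that
    send every pattern edge to a graph edge (brute-force enumeration)."""
    p_vertices = sorted({v for e in pattern for v in e})
    adj = {frozenset(e) for e in graph}
    g_vertices = sorted({v for e in graph for v in e})
    count = 0
    for image in product(g_vertices, repeat=len(p_vertices)):
        phi = dict(zip(p_vertices, image))
        if all(frozenset((phi[u], phi[v])) in adj for u, v in pattern):
            count += 1
    return count

edge = [(0, 1)]
triangle = [(0, 1), (1, 2), (2, 0)]
G = [(0, 1), (1, 2), (2, 0), (2, 3)]  # triangle with a pendant vertex

# Homomorphism-count "embedding" of G over a small pattern family:
embedding = [hom_count(p, G) for p in (edge, triangle)]
```

Comparing two graphs then reduces to comparing such vectors; the project's results concern which operational distances this comparison corresponds to.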

In practice, we typically learn features and similarities of graphs from data. Graph Neural Networks (GNNs) are the method of choice. It is known that they are related to homomorphism embeddings. We studied the expressivity and generalisation properties of GNNs. Our main result is a precise characterisation of GNNs in terms of logic and classical circuit complexity.
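As an illustration of the kind of model in question (a minimal sketch, not the construction from our results): a message-passing layer in which each vertex sums its neighbours' features and combines the sum with its own feature. Real GNN layers apply learned weight matrices and nonlinearities; this unweighted toy version shows only the aggregation scheme, whose expressive power is known to be bounded by colour refinement (1-WL):

```python
def gnn_layer(adj, features):
    """One message-passing round: each vertex combines its own feature
    with the sum of its neighbours' features (toy sum-aggregation GNN,
    no learned weights or nonlinearity)."""
    out = []
    for v in range(len(features)):
        msg = sum(features[u] for u in adj[v])  # aggregate neighbours
        out.append(features[v] + msg)           # combine with own state
    return out

# Path 0-1-2, given as adjacency lists, all features initialised to 1.0:
adj = [[1], [0, 2], [1]]
x = [1.0, 1.0, 1.0]
h1 = gnn_layer(adj, x)   # [2.0, 3.0, 2.0]
h2 = gnn_layer(adj, h1)  # [5.0, 7.0, 5.0]
```

After two rounds the endpoint and midpoint vertices carry different values, mirroring how iterated aggregation distinguishes vertices with different neighbourhood structure.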

Towards the graph isomorphism problem, we gave a new isomorphism algorithm for the class of tournaments, a class of directed graphs that has played an interesting role in the study of the graph isomorphism problem. The runtime of our algorithm is parameterised by the twin-width of the input tournaments, and the algorithm is very efficient as long as the twin-width is small.

The combinatorial Weisfeiler-Leman (WL) algorithm plays a central role in both the theory and practice of the graph isomorphism problem, and it has applications beyond that, most notably in machine learning. Going back to Fürer (2001), it was a long-standing open question how many iterations the k-dimensional WL algorithm requires in the worst case. We settled this question by establishing a strong lower bound, complemented by a non-trivial upper bound. Our novel proof technique for the lower bound has since found further applications.
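For concreteness, the 1-dimensional case of the algorithm (colour refinement) can be sketched as follows; the k-dimensional version colours k-tuples of vertices instead, and the iteration count returned here is, for this simplest case, the quantity whose worst-case growth the question above concerns:

```python
def wl_refine(adj):
    """1-dimensional Weisfeiler-Leman (colour refinement): repeatedly
    recolour each vertex by its current colour together with the multiset
    of its neighbours' colours, until the colour partition stabilises.
    Returns the stable colouring and the number of refining iterations."""
    n = len(adj)
    colours = [0] * n
    iterations = 0
    while True:
        signatures = [(colours[v], tuple(sorted(colours[u] for u in adj[v])))
                      for v in range(n)]
        palette = {s: i for i, s in enumerate(sorted(set(signatures)))}
        new = [palette[s] for s in signatures]
        # Each round refines the partition, so an unchanged number of
        # colours means the partition is stable.
        if len(set(new)) == len(set(colours)):
            return new, iterations
        colours = new
        iterations += 1

# Path on 5 vertices 0-1-2-3-4, as adjacency lists:
adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
colours, rounds = wl_refine(adj)   # colours [0, 1, 2, 1, 0] after 2 rounds
```

On the path, two rounds suffice to separate the endpoints, their neighbours, and the midpoint; the lower-bound result shows that in the worst case the number of rounds needed by the k-dimensional algorithm grows far beyond such small examples.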
Our results have been published in leading journals and conferences and mark significant progress beyond the state of the art.

Two results stand out in my mind.

The first is the lower bound on the number of iterations of the Weisfeiler-Leman algorithm because it solved a long-standing and well-known open problem by a novel technique that has also found other applications since then.

The second is the characterisation of the expressiveness of graph neural networks in terms of logic and circuit complexity. It establishes a surprising and very clean connection between the “analogue” computation model of graph neural networks, operating with real numbers of unbounded precision, and classical computation models based on Boolean logic.