Periodic Reporting for period 1 - RELAX (Relaxed Semantics Across the Data Analytics Stack)
Reporting period: 2023-03-01 to 2025-02-28
For graph data, we initially focused our research on the Single-Source Shortest Path (SSSP) problem, a fundamental problem in graph analysis. DC12 has shown that, by relaxing the synchronous execution model, it is possible to achieve a better trade-off between parallelism-induced redundant work and efficient parallelism. The proposed algorithm achieves competitive or better performance compared to the state of the art. Combining results from this research with those of another DC project, we formed the RELAXed Traversers team, which competed in and won the best-solution award of the FastCode Programming Challenge for the fastest SSSP solver. This challenge was hosted by the 30th ACM Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2025). Our solution, "Relax and don’t Stop: Graph-aware Asynchronous SSSP", processed 150.89 million edges per second, significantly outperforming the second-best solution, which processed 52.56 million edges per second.
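To illustrate the kind of relaxation involved, the sketch below shows a minimal worklist-based, label-correcting SSSP in Python. It is not the competition solution: distances are simply re-examined whenever they improve, rather than being processed in strictly synchronised rounds, which is the idea that an asynchronous parallel variant exploits. The function name sssp_label_correcting and the graph encoding are hypothetical choices for the example.

```python
# Minimal illustrative sketch (not the RELAXed Traversers code): a sequential
# label-correcting SSSP where vertices are re-examined whenever a shorter
# distance is found, without enforcing round-by-round synchronisation.
import heapq

def sssp_label_correcting(graph, source):
    """graph: dict mapping vertex -> list of (neighbour, weight) pairs."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0.0
    # The priority queue approximates "process the most promising vertex first";
    # an asynchronous parallel variant would let workers pull from a shared,
    # only loosely ordered worklist instead, accepting some redundant work.
    work = [(0.0, source)]
    while work:
        d, u = heapq.heappop(work)
        if d > dist[u]:          # stale entry: distance already improved
            continue
        for v, w in graph[u]:
            if d + w < dist[v]:  # relaxation step
                dist[v] = d + w
                heapq.heappush(work, (dist[v], v))
    return dist

# Example usage on a tiny weighted graph:
g = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": []}
print(sssp_label_correcting(g, "a"))  # {'a': 0.0, 'b': 1.0, 'c': 3.0}
```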
For neural networks (NNs) we obtained several results that show how relaxation can improve efficiency. DC7 has presented a stochastic weight-sharing quantisation technique specifically tailored to Bayesian NNs (BNNs) that can significantly reduce (relax) the effective number of parameters of a BNN while obtaining results on par with the state of the art on large datasets and architectures. Complementary to this result, DC6 has developed methods to analyse the behaviour of NNs under small changes in inputs, parameters, and activation values when compression techniques such as quantisation and pruning are applied, covering an infinite family of quantisation schemes. For deep NNs (DNNs), DC2 has shown that by relaxing the synchronization requirements during the parallel training phase in a controllable way, one can achieve faster training without losing accuracy.
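The sketch below gives a deterministic, simplified view of weight sharing: all weights of a layer are mapped onto a small set of shared values, so only the shared values and per-weight indices need to be stored. It is only an assumption-laden illustration of why sharing reduces the effective parameter count; DC7's technique is stochastic and tailored to Bayesian NNs, and the function weight_share and its parameters are hypothetical.

```python
# Hypothetical sketch of weight-sharing quantisation via 1-D k-means:
# k shared centres replace the individual weight values, reducing the
# effective number of distinct parameters (DC7's method is stochastic
# and BNN-specific; this is only an illustration of the principle).
import numpy as np

def weight_share(weights, k=16, iters=20, seed=0):
    """Cluster a weight tensor into k shared values (simple k-means)."""
    rng = np.random.default_rng(seed)
    flat = weights.ravel()
    centres = rng.choice(flat, size=k, replace=False)
    for _ in range(iters):
        # Assign every weight to its nearest shared centre.
        assign = np.abs(flat[:, None] - centres[None, :]).argmin(axis=1)
        # Move each centre to the mean of its assigned weights.
        for j in range(k):
            members = flat[assign == j]
            if members.size:
                centres[j] = members.mean()
    assign = np.abs(flat[:, None] - centres[None, :]).argmin(axis=1)
    # Only the k centre values plus per-weight indices need to be stored.
    return centres[assign].reshape(weights.shape), centres, assign

w = np.random.default_rng(1).normal(size=(64, 64))
w_q, centres, idx = weight_share(w, k=16)
print("unique shared values:", np.unique(w_q).size)  # at most 16
```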
For data streams, among other topics, we investigated data summarization, data compression, and explainable AI (XAI). Data summarization has emerged as a useful technique for extracting insights from massive data sets into compact synopsis structures, typically requiring much less space and computation. DC1 has proposed a novel concurrent data structure for finding heavy hitters, a fundamental operator in data analysis. The proposed parallel method sustains significantly higher throughput than state-of-the-art methods while also supporting higher accuracy. DC9 has developed a feature importance method that finds the model’s important features for the task of discriminating between a pair of classes; the technique can be computed efficiently in a streaming scenario. Feature importance is one of the most popular techniques in XAI.
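For context, the sketch below shows what a heavy-hitter synopsis computes, using the classic sequential Misra-Gries summary: a bounded set of counters that is guaranteed to contain every item occurring more than n/k times in a stream of n items. It is only a textbook baseline for illustration; DC1's contribution is a concurrent data structure with higher throughput and accuracy, which is not reproduced here.

```python
# Sequential textbook Misra-Gries heavy-hitter summary, shown only to
# illustrate the operator; DC1's concurrent structure is not reproduced here.
def misra_gries(stream, k):
    """Track up to k-1 candidate heavy hitters over a stream of items."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement all counters; drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    # Every item with frequency > n/k is guaranteed to appear in the result.
    return counters

stream = list("aababcabdaae")
print(misra_gries(stream, k=3))  # 'a' (6 of the 12 items) is reported
```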
In addition to the scientific and research outcomes, by the end of the project the RELAX European Doctoral Network is expected to have produced a cohort of 12 highly mobile and adaptable researchers who are experts in the design of scalable and efficient data-intensive software systems, addressing a critical skills gap in data analytics expertise that is needed to support innovation and employment in a fast-growing European data economy.