SparCity: An Optimization and Co-design Framework for Sparse Computation

Periodic Reporting for period 1 - SPARCITY (SparCity: An Optimization and Co-design Framework for Sparse Computation)

Reporting period: 2021-04-01 to 2022-09-30

SparCity aims to create a supercomputing framework that will provide efficient algorithms and coherent tools specifically designed for sparse computations on emerging High Performance Computing (HPC) systems, while also opening up new usage areas for sparse computations in data analytics and deep learning. The objectives of the project are:

• to develop a comprehensive characterization mechanism for sparse computations based on analytical and ML-based performance and energy models,
• to develop advanced node-level optimizations for sparse computation on modern parallel architectures,
• to devise topology-aware partitioning algorithms and communication optimizations for system-level parallelism,
• to create digital SuperTwins of supercomputers to evaluate what-if hardware scenarios,
• to demonstrate the effectiveness and usability of the SparCity framework by enhancing the efficiency of challenging real-life applications,
• to deliver a robust, well-supported and documented SparCity framework into the hands of end-users from industry and academia.

Overall, SparCity is a forward-looking project with a significant contribution to building Europe’s strengths in the application of HPC and related software tools, in the adoption of low energy processing technologies, and in the development of advanced software and services for its citizens.

Work performed in WP1 includes: (i) feature extraction for sparse computation, in which more than 100 core features of sparse matrices, tensors, and graphs were identified; (ii) machine learning (ML) based prediction models for sparse matrix-vector multiplication (SpMV), built on these features; (iii) sparse computation-aware modeling, for which several approaches were proposed, including the Mansard Roofline Model (MaRM), scaling the roofs of the Cache-Aware Roofline Model (CARM) based on the specifics of sparse kernels and input matrices, and a roofline model for Graphcore IPUs; and (iv) communication modeling, for which we leveraged the ComDetective and ReuseTracker tools, which were extended to support AMD machines.
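For readers unfamiliar with roofline modeling, the following is a minimal sketch of the classic roofline bound applied to CSR SpMV. It is not MaRM or CARM, whose details are not given here. The function names, the 8-byte values, the 4-byte indices, the square matrix, and the optimistic assumption that each vector is moved through memory exactly once are all illustrative assumptions.

```python
def spmv_csr_intensity(n_rows, nnz, value_bytes=8, index_bytes=4):
    """Arithmetic intensity (flops per byte) of y = A @ x for a square CSR matrix.

    Assumption: x is read once and y is written once, which ignores cache misses
    caused by irregular accesses to x.
    """
    flops = 2 * nnz  # one multiply and one add per stored nonzero
    # CSR storage: nnz values, nnz column indices, n_rows + 1 row pointers
    matrix_bytes = nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes
    vector_bytes = 2 * n_rows * value_bytes
    return flops / (matrix_bytes + vector_bytes)


def roofline(peak_gflops, bandwidth_gbs, intensity):
    """Classic roofline: attainable GFLOP/s is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * intensity)


if __name__ == "__main__":
    ai = spmv_csr_intensity(n_rows=1_000_000, nnz=10_000_000)
    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth
    print(f"intensity = {ai:.4f} flop/byte, bound = {roofline(1000.0, 100.0, ai):.2f} GFLOP/s")
```

With an intensity well below one flop per byte, the bound is set by memory bandwidth rather than peak compute, which is why sparse kernels motivate adapted roofline variants.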

In WP2, several groups of node-level optimizations were carried out. (i) To discover mixed-precision opportunities in sparse computations, we proposed row-wise mixed- and multi-precision SpMV methods that are suitable for both the CSR and ELLPACK-R storage formats. (ii) The proposed ML-based models were evaluated, and the results demonstrate their effectiveness on SpMV. (iii) We proposed a compiler and runtime system that takes advantage of the shared underlying processing pattern and data in order to reduce redundant computation and data access. (iv) For memory access regularization, we developed fast and high-quality CPU- and GPU-based influence maximization tools, HyperFuser and SuperFuser, which exploit fast vectorized instruction patterns on distributed multi-CPU and multi-GPU systems. (v) For the data and computation reordering task, an extended version of the Reverse Cuthill-McKee (RCM) algorithm was implemented to handle all sparse matrices, including non-symmetric and non-square ones.
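As background for item (v), here is a minimal sketch of the classic RCM ordering, not the project's extended version. To accept a non-symmetric square pattern it symmetrizes the pattern (an edge is added for every entry (i, j) and its transpose), which is a common workaround and an assumption here; non-square matrices are not covered. All function names are illustrative.

```python
from collections import deque


def symmetrized_adjacency(n, entries):
    """Build an undirected adjacency list from (row, col) nonzero positions."""
    adj = [set() for _ in range(n)]
    for i, j in entries:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def reverse_cuthill_mckee(n, entries):
    """Return a permutation of 0..n-1: order[k] is the old index placed at position k."""
    adj = symmetrized_adjacency(n, entries)
    degree = [len(a) for a in adj]
    visited = [False] * n
    order = []
    # Start each connected component from an unvisited vertex of minimum degree
    for start in sorted(range(n), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search, visiting neighbors by increasing degree
            for w in sorted(adj[v], key=lambda u: (degree[u], u)):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    order.reverse()  # the "reverse" step of RCM
    return order


def bandwidth(entries, order):
    """Largest distance of a nonzero from the diagonal under the given ordering."""
    pos = {v: k for k, v in enumerate(order)}
    return max((abs(pos[i] - pos[j]) for i, j in entries), default=0)
```

For example, on a path-shaped pattern with scrambled labels, `entries = [(0, 3), (3, 1), (1, 4), (4, 2)]`, comparing `bandwidth(entries, range(5))` with `bandwidth(entries, reverse_cuthill_mckee(5, entries))` shows the reordering pulling the nonzeros toward the diagonal.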

WP3 is concerned with system-level static and dynamic optimizations for sparse computations, based on the principles of balancing the computational load and minimizing the impact of communication operations. We address these issues in two ways, first by developing and applying novel partitioning algorithms and second by ensuring that the available computation and communication resources can be used most efficiently by a given application.
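To make the load-balancing principle concrete, the sketch below splits the rows of a CSR matrix into contiguous blocks with roughly equal numbers of nonzeros. This is a simple baseline for illustration only, not one of the project's partitioning algorithms: it ignores communication volume and network topology, and the function name is an assumption.

```python
from bisect import bisect_left


def partition_rows_by_nnz(row_ptr, parts):
    """Return parts + 1 row boundaries; block p covers rows bounds[p]..bounds[p+1]-1.

    row_ptr is the CSR row-pointer array, so row_ptr[r] is the number of
    nonzeros stored before row r and row_ptr[-1] is the total nonzero count.
    """
    nnz = row_ptr[-1]
    n_rows = len(row_ptr) - 1
    bounds = [0]
    for p in range(1, parts):
        target = nnz * p / parts  # ideal cumulative nonzero count at this cut
        cut = bisect_left(row_ptr, target)
        # Keep boundaries monotone and inside the matrix
        cut = min(max(cut, bounds[-1]), n_rows)
        bounds.append(cut)
    bounds.append(n_rows)
    return bounds


def block_loads(row_ptr, bounds):
    """Nonzeros per block, a rough proxy for SpMV work."""
    return [row_ptr[b] - row_ptr[a] for a, b in zip(bounds, bounds[1:])]
```

On a matrix whose first row is much denser than the rest, splitting by row count alone would overload the first block, whereas splitting by nonzero count evens out the work per block.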

WP4 focuses on the design and implementation of the digital twin, SuperTwin, and the preprocessing library, SparseBase. SparseBase has reached release v0.2; users can download and use it from https://github.com/sparcityeu/sparsebase. The main functionalities of SuperTwin, i.e. probing (see D3.1), benchmarking (see D4.1/D4.2), metric-data storage, and monitoring, are now implemented in the single-node setting.

WP5 focuses on four real-life applications that use sparse data structures and/or perform sparse computations. For the simulation of cardiac electrophysiology, we created new realistic meshes, carried out code optimization, ported a simple version of the code to Graphcore IPUs, and implemented an improved numerical algorithm. For detecting wildfires on social networks, we developed methodologies to build huge networks that capture acquaintance relations among social network users; one example is a large interaction network with more than 1.6 billion edges, built from COVID-19 related conversations. For epistasis detection, we developed the novel search algorithms needed for high-order detection, together with high-performance implementations for CPUs, GPUs and Graphcore IPUs. For autonomous driving, we implemented multiple-object tracking using TensorFlow 2, involving graph neural network (GNN) layers. Furthermore, the initial version of an open repository of sparse problem instances has been created.

Research conducted in the first half of the project has already resulted in eleven peer-reviewed scientific publications, which demonstrate the capability of the proposed work to surpass the state of the art. In particular, we compiled more than 100 core features for sparse matrices, graphs and sparse tensors and developed lightweight extraction methods. The identified features and lightweight extraction methods are publicly available for wide adoption, and they are already used for ML-based performance prediction. By following the guidelines of the proposed performance models, we accelerated the execution of epistasis detection (an important application in bioinformatics) by about 18.5x, while the models provide increased accuracy compared to existing models in the literature.

Since SpMV is one of the most widely used sparse computation kernels, the node-level optimizations focused on improving its performance. We have developed a simple mixed-precision method for SpMV with the CSR and ELLPACK-R storage formats. In addition to developing ML models that can predict and optimize SpMV execution performance, we have identified the combinations of SpMV algorithms and CPU architectures that work well together and provided a theoretical explanation for the observed performance. For graph algorithms, we developed two CPU- and GPU-based influence maximization tools, HyperFuser and SuperFuser, which are several times faster than existing implementations and scale well to multiple GPUs.
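The idea of row-wise mixed precision for CSR SpMV can be sketched as follows. This is an illustrative example, not the project's method: the classification rule (a row is stored in single precision when all of its entry magnitudes lie in a fixed range), the thresholds, and the function names are assumptions. Single-precision storage is emulated by rounding values to float32, while the accumulation stays in double precision.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def classify_rows(row_ptr, values, lo=1e-3, hi=1e3):
    """Hypothetical rule: a row uses float32 if every entry magnitude is in [lo, hi]."""
    flags = []
    for r in range(len(row_ptr) - 1):
        row = values[row_ptr[r]:row_ptr[r + 1]]
        flags.append(all(lo <= abs(v) <= hi for v in row))
    return flags


def spmv_csr_mixed(row_ptr, col_idx, values, x, single_rows):
    """Compute y = A @ x, reading rows flagged in single_rows as float32 values."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            a = values[k]
            if single_rows[r]:
                a = to_float32(a)  # emulate float32 storage of this row's values
            acc += a * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    row_ptr, col_idx = [0, 2, 4], [0, 1, 0, 1]
    values = [0.1, 0.2, 1e6, 0.3]
    flags = classify_rows(row_ptr, values)
    print(flags, spmv_csr_mixed(row_ptr, col_idx, values, [1.0, 1.0], flags))
```

In a real implementation the flagged rows would be stored in a separate float32 array, so that the memory traffic for those rows is roughly halved; the emulation here only shows the numerical effect of the per-row choice.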

The four real-world applications have all adopted Graphcore IPUs, which greatly expands the current application horizon of this "AI processor". Moreover, novel algorithms were developed in the contexts of social network analysis and epistasis detection, and these could lead to scientific breakthroughs. In particular, the efficiency of fourth-order epistasis detection has considerably exceeded the state of the art, outperforming it by 12.4x.

Lastly, we are advancing the state of the art by developing SparseBase, the most comprehensive sparse preprocessing library, and SuperTwin, a performance monitoring, analysis, and visualization tool. We believe these tools will advance the state of the art in their respective areas and help HPC engineers and researchers working on sparse data to do the same in their own domains.

