Periodic Reporting for period 1 - TICOH (Taming Irregular Computations On Heterogeneous processors)
Reporting period: 2017-05-01 to 2019-04-30
The objective of the project Taming Irregular Computations On Heterogeneous processors (TICOH) is to address the currently unsatisfactory utilization of heterogeneous computing for irregular problems such as graph and sparse matrix processing. Following a multi-level approach that bridges performance measurement, benchmark data analysis, modeling, data structure construction, algorithm design and application integration, TICOH will explore best practices that lead to the best performance for irregular computations on the best hardware selection. Specifically, the main focus of the project will be to (a) identify and understand bottlenecks of current heterogeneous computing (e.g. latency and bandwidth of synchronization and communication in heterogeneity-aware parallel kernels), (b) benchmark and model heterogeneous processors composed of CPU, GPU and high-bandwidth memories (e.g. AMD Bristol Ridge, Intel Skylake and NVIDIA Tegra), (c) design and evaluate new data structures and algorithms for irregular problems that aim to fully use the computing and memory resources provided by heterogeneous processors, and (d) integrate and apply the newly designed approaches in high-level applications (e.g. scientific software, graph databases and sparse convolutional neural networks). By empirically investigating these issues, the ultimate goal of the project is to allow a broad range of real-world applications to further benefit from heterogeneous hardware in the new era.
More and more modern supercomputers are being equipped with heterogeneous processors. Unfortunately, despite the progress in hardware infrastructure, the utilization of heterogeneous computing is still relatively low in practice. The objective of the project TICOH is to address the currently unsatisfactory utilization of heterogeneous computing for irregular problems such as graph and sparse matrix processing. Achieving this requires a multi-level approach to identify best practices that lead to the best performance for irregular computations on the best hardware selection.
The Fellow constructed a benchmark suite containing sparse matrix and graph problems (D2.1-M02), using around 1000 matrices/graphs for benchmarking, and collected a rich set of experimental data (D2.2-M05). Most of the data have been published in a joint paper "Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels" (SC '17) and demonstrated in the Fellow's talk "Scalability Analysis of Sparse Matrix Computations on Many-core Processors" (Sparse Days '17 etc.). A performance model (D2.3-M09) named the "stepping model" is presented in the above SC '17 paper as well. This paper was nominated for the best paper award at the SC '17 conference. Another execution model (D2.3-M09) named "Warp-Consolidation" has also been developed and published as a paper "Warp-Consolidation: A Novel Execution Model for GPUs" at the ICS '18 conference.
The Fellow also developed several parallel algorithms for sparse matrix multiplication (D3.2-M17) and published the papers "Register-based Implementation of the Sparse General Matrix-matrix Multiplication on GPUs" at PPoPP '18 and "Register-Aware Optimizations for Parallel Sparse Matrix-Matrix Multiplication" in the journal IJPP. As for parallel sparse triangular solve (D3.3-M21), the Fellow published the papers "Fast Synchronization-Free Algorithms for Parallel Sparse Triangular Solves with Multiple Right-Hand Sides" in the journal CCPE and "swSpTRSV: A Fast Sparse Triangular Solve with Sparse Level Tile Layout on Sunway Architectures" at PPoPP '18. The Fellow also researched the depth-first search algorithm (D3.1-M13) but has not yet produced an efficient implementation or published a paper on it.
To summarize, from M01 to M17, the Fellow has published a total of nine technical papers and given 13 invited talks (six at conferences/workshops and seven at institutions; D5.2-M03, M06, M07, M10, M12, M14) with the support of the MSCA TICOH project. The website of the TICOH project has been online since July 2017 (D5.1-M01). The Fellow has also co-organized two minisymposia at international conferences, served as a technical program committee member for four international conferences and two workshops, and reviewed for a number of international journals.