The A64FX CPU powers the current #1 supercomputer on the Top500 list. Although it is a traditional cache-based multicore processor, its peak performance and memory bandwidth rival accelerator devices.
Generating efficient code for such a new architecture requires a good understanding of its performance features. Using these features, the Erlangen Regional Computing Center (RRZE) team will detail how they construct the Execution-Cache-Memory (ECM) performance model for the A64FX processor in the FX700 supercomputer and validate it using streaming loops. They will describe how the machine model points to peculiarities in the microarchitecture to keep in mind when optimizing applications, and how, applying the ECM model to sparse matrix-vector multiplication (SpMV), they motivate why the CRS matrix storage format is inappropriate and how the SELL-C-sigma format can achieve bandwidth saturation for SpMV. In this context, they will also look into some code optimization strategies that are relevant for A64FX and compare SpMV performance with AMD Rome, Intel Cascade Lake and NVIDIA V100.
Registration url: https://attendee.gotowebinar.com/register/3926945771611115789
webinar, hpc, energy, performance, energyefficiency, kernel