Periodic Reporting for period 1 - MaX (MAterials design at the eXascale)
Reporting period: 2023-01-01 to 2023-12-31
Objective I. Lighthouse applications
Selected community codes for quantum materials modelling, widely adopted by a broad community of end users and already running on pre-exascale systems, will be turned into lighthouse applications that are ready to run on new HPC exascale platforms and whose scientific capabilities are developed further. Codes have been ported to and enabled on new heterogeneous systems from multiple vendors and based on different architectures, and their scalability and performance will be enhanced by taking advantage of the hardware soon to be available.
Objective II. Exascale workflows and data
The evaluation of complex materials properties will be encoded in exascale-oriented workflows, allowing lighthouse applications to be driven automatically and orchestrated to exploit exascale capabilities, deliver resilience and fault tolerance, and ensure the dissemination of the entire simulation protocol, results, and data. These exascale workflows will tackle the complex logic, data movement, and possibly concurrent execution of multiple flagship codes, as needed by the targeted simulations.
Objective III. Addressing technical challenges
The technical challenges faced by the flagship codes at exascale are identified and addressed, to provide technology insight and solutions to the code developers. This will start from performance analysis, to discover bottlenecks and directions for optimisation at scale. We will exploit innovative features of programming models to help maximise performance on novel architectures (from data vectorisation to the removal of communication synchronisation). The integration of programming solutions will allow codes to run efficiently on heterogeneous systems with different accelerators. The interaction with runtime middleware will be taken into account, together with I/O optimisation (or its replacement with run-time inter-process communication), down to the actual deployment of the codes onto EuroHPC machines.
Objective IV. Co-design and technology exploitation
HW and SW innovations are monitored and included in the MaX co-design cycle, while optimising the exploitation of heterogeneous architectures and delivering a set of validated co-design vehicles and best practices to be shared with other HPC stakeholders. MaX co-design acts at different levels (chip, node, system), targeting HW relevant for the development of EuroHPC exascale machines. In addition to optimal performance, we will explore techniques to reduce the energy to solution. Energy efficiency will be tackled from the hardware side, by deploying the codes on cutting-edge processors and accelerators, and from the software side, by using European energy-aware runtime systems.
Obj I. Key results year 1:
- DeviceXlib, the MaX library for performance portability, was restructured and demonstrated on NVIDIA, AMD, and Intel GPU-accelerated architectures, exploiting multiple programming models (CUDA-Fortran, OpenACC, and OpenMP-offload) and multiple accelerated libraries (cuSOLVER, rocBLAS, MKL-GPU).
- In addition to NVIDIA GPUs, already supported in production releases, MaX codes were also ported to accelerated architectures based on AMD and Intel GPUs.
- The parallel performance of all codes was benchmarked and targeted for improvement, with special emphasis on GPU-accelerated machines. A milestone single run on 90% of the Leonardo machine (12000 A100 next GPUs, 3000 nodes) was achieved by the Yambo code (see figure).
Obj II. Key results year 1:
- A set of target scientific grand challenges to be addressed by means of MaX exascale workflows was identified, refined, and selected. The workflows involved have been rationalised and mapped to a few archetype workflow structures. Code interoperability requirements were identified.
- Technical tools aimed at workflow implementation, such as the AiiDA orchestrator and the HyperQueue meta-scheduler, were further developed and adapted. At the same time, more software tools aimed at encoding complex workflows were identified and field tested (including, e.g. ZeroMQ for socket management and the BigDFT RemoteManager for task automation).
- Early execution of selected workflow steps demonstrated the feasibility of the calculations and provided the data needed to identify technical issues.
Obj III. Key results year 1:
- All MaX codes were benchmarked and profiled on Leonardo using the JUBE software, following an internally designed procedure. This data collection allows for wide-scope performance analysis and scalability assessment of the MaX codes.
- Bottlenecks were identified, and technical solutions based on advanced features and/or programming models were attempted. An example is the use and assessment of GPUDirect communication on LUMI-G, as exploited in the FFTXlib of Quantum ESPRESSO.
- A procedure for CI/CD of MaX codes on EuroHPC machines was designed jointly with CASTIEL. All technical requirements that have arisen so far were addressed.
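A scalability assessment of the kind mentioned above typically reduces benchmark timings to speedup and parallel efficiency. The following Python sketch shows that arithmetic for a strong-scaling series, assuming wall times are available per GPU count. The function name strong_scaling and the timings are made up for illustration; they are not MaX benchmark data and not part of JUBE.

```python
def strong_scaling(timings):
    """Compute speedup and parallel efficiency for a strong-scaling series.

    timings: dict mapping number of GPUs to wall time in seconds
    Returns a list of (n_gpus, speedup, efficiency) tuples, measured
    relative to the smallest run in the series.
    """
    base_n = min(timings)
    base_t = timings[base_n]
    rows = []
    for n in sorted(timings):
        speedup = base_t / timings[n]
        # Ideal speedup is n / base_n; efficiency is the fraction achieved.
        efficiency = speedup / (n / base_n)
        rows.append((n, speedup, efficiency))
    return rows


# Made-up timings purely for illustration.
example = {4: 1000.0, 8: 520.0, 16: 280.0}
table = strong_scaling(example)

for n_gpus, speedup, efficiency in table:
    print(f"{n_gpus:>6d} GPUs  speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
```

An efficiency that falls as the GPU count grows, as in these invented numbers, is the usual sign of a communication or serial bottleneck worth profiling.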
Obj IV. Key results year 1:
- The parts of each application to be optimised for the Rhea processor or EUPEX hardware have been identified and extracted as mini-apps from the MaX codes. Preliminary versions of the mini-apps have been made available and used for co-design.
Concerning FAIR data and storage, we implemented a mirroring mechanism that makes the data stored at CSCS also available from CINECA (and soon from the Jülich HPC centre).
Deployment of the MaX flagship codes on EuroHPC machines was achieved on all of the distinct technologies adopted.