MAterials design at the eXascale

Project information

MaX

Grant agreement ID: 101093374

DOI

10.3030/101093374

EC signature date: 9 December 2022

Start date: 1 January 2023

End date: 31 December 2026

Funded under

Digital, Industry and Space

Total cost

€ 8 496 392.50

EU contribution

€ 4 248 196.00
Coordinated by

CONSIGLIO NAZIONALE DELLE RICERCHE
Italy

Periodic Reporting for period 2 - MaX (MAterials design at the eXascale)

Reporting period: 2024-01-01 to 2025-06-30

The overall goal of the third phase of MaX is to enable the European materials simulation community to make effective use of massively parallel heterogeneous computing systems, and to tackle scientific challenges so far considered out of reach. The first two MaX phases focused on porting, maintaining, and scaling up complex state-of-the-art computer codes on multiple heterogeneous architectures; in this phase we will take those achievements to the exascale level. We will turn flagship codes into lighthouse applications running on thousands of accelerated nodes, and will enable them to work cooperatively within tightly coupled exascale workflows. In this process, we will deploy co-design actions that contribute to the evolution of a strong European technology base and ecosystem. The main objectives are listed below, followed by the key results for months 13 to 30 (M13-30).
Objective I. Lighthouse applications
Selected community codes for quantum materials modelling, widely adopted by a broad base of end users and already running on pre-exascale systems, will be turned into lighthouse applications ready to run on new exascale HPC platforms, while their scientific capabilities are developed further.
Objective II. Exascale workflows and data
The evaluation of complex materials properties will be encoded in exascale-oriented workflows, allowing lighthouse applications to be driven automatically and orchestrated to exploit exascale capabilities, deliver resilience and fault tolerance, and ensure the dissemination of the entire simulation protocol, results, and data.
Objective III. Addressing technical challenges
The technical challenges faced by the flagship codes at exascale are identified and addressed, to provide technology insight and solutions to the code developers. This starts from performance analysis, to identify bottlenecks and the directions for optimisation at scale. We will exploit innovative features of programming models to help maximise performance on novel architectures (from data vectorisation to the removal of communication synchronisation).
Objective IV. Co-design and technology exploitation
Hardware (HW) and software (SW) innovations are monitored and fed into the MaX co-design cycle, while the exploitation of heterogeneous architectures is optimised and a set of validated co-design vehicles and best practices is delivered for sharing with other HPC stakeholders. MaX co-design acts at different levels (chip, node, system), targeting HW relevant for the development of EuroHPC exascale machines. In addition to optimal performance, we will explore techniques to reduce energy to solution. Energy efficiency will be tackled from the hardware side, by deploying the codes on cutting-edge processors and accelerators, and from the software side, by using European energy-aware runtime systems.

Obj I. Key results M13-30:
- Work has continued on extending and further optimising the porting of MaX codes to accelerated architectures based on NVIDIA, AMD, and Intel GPUs. Benchmark data from runs on all pre-exascale machines have been collected.
- The intra- and inter-node performance of all codes has been continuously benchmarked and significantly optimised. Large-scale calculations exploiting hundreds of GPU-accelerated nodes have been demonstrated multiple times.
- A large number of external optimized libraries from the ecosystem (from linear algebra to advanced algorithms and solvers) have been tested on MaX codes and can be exploited in production runs.
- All codes have produced public stable releases (often more than one per year), with developments at an earlier stage made available to users via dedicated branches in public git repositories.
- While undergoing this technical development, the codes have been equipped with a number of new features (scientific capabilities, performance and robustness algorithms), aimed at addressing selected scientific grand challenges via exascale workflows, as developed by WP2.

Obj II. Key results M13-30:
- The set of identified target scientific grand challenges to be addressed by MaX exascale workflows has been consolidated and further extended with new use cases.
- Hero runs exploiting AiiDA on the Leonardo supercomputer (up to 3000 nodes) have been executed to train ML interatomic potentials.
- Previously identified code interoperability requirements have been addressed and implemented.
- Technical tools aimed at workflow implementation such as the AiiDA orchestrator and the HyperQueue meta-scheduler have been further developed and adapted.
- FAIR data and storage: the mirroring mechanism implemented between CSCS and CINECA has been extended to the Jülich Supercomputing Centre.

Obj III. Key results M13-30:
- Work focused on deploying the MaX flagship codes on many other EuroHPC systems, monitoring and reporting their efficiency and performance.
- Notably, benchmark data on all the accelerated partitions of the EuroHPC pre-exascale machines, as well as preliminary data on Jupiter, have been collected and analysed.
- Technical solutions based on advanced features and/or programming models have been tested in realistic runs of the flagship codes.
- Deployment of MaX flagship codes on EuroHPC machines has been achieved, covering 98% of the supported architectures.

Obj IV. Key results M13-30:
- MaX mini-apps have been fully extracted, made available in a dedicated repository, and exploited for co-design activities.
- Co-design: 11 technical co-design reports covering analysis and optimisation, and 3 general reports providing information on EUPEX and SiPearl technology to the MaX consortium, have been released.
- Advanced Hardware: MaX flagship codes have been tested on advanced hardware, relevant for the European HPC roadmap.
- Energy efficiency has been evaluated for all MaX codes on multiple production systems, now including GPU-accelerated partitions, using the MERIC runtime system. The results show a reduction in energy to solution of 10-19% on CPUs and 6-9% on GPUs.
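Energy to solution is the total energy a run consumes until it completes, i.e. the integral of power draw over wall-clock time. A run tuned for energy (for instance at lower clock frequencies) can therefore save energy even if it takes slightly longer. The sketch below is a minimal illustration under stated assumptions: power is assumed to be available as sampled (time, watts) pairs, the function names are hypothetical, and the numbers are invented. It does not reflect MERIC's interface or actual MaX measurements.

```python
# Hypothetical sketch: energy to solution as the time integral of sampled power,
# and the relative saving of a tuned run against a baseline run.
# All sample values are invented for illustration; they are not MaX data.

def energy_to_solution(times_s, power_w):
    """Trapezoidal integral of power (W) over time (s), returned in joules."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need matching time and power samples (at least two)")
    energy = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


def relative_saving(e_baseline, e_tuned):
    """Fractional reduction of energy to solution (0.10 means 10%)."""
    return 1.0 - e_tuned / e_baseline


# Hypothetical runs: the tuned run draws less power but takes 10% longer.
baseline = energy_to_solution([0, 50, 100], [400.0, 400.0, 400.0])  # 40000 J
tuned = energy_to_solution([0, 55, 110], [320.0, 320.0, 320.0])     # 35200 J
print(f"saving: {relative_saving(baseline, tuned):.1%}")
```

With these made-up samples the tuned run is 10% slower yet uses 12% less energy. This is why energy to solution, rather than instantaneous power, is the metric of interest.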

Concerning FAIR data and storage, we implemented a mirroring mechanism that makes the data stored at CSCS also available from CINECA and from the Jülich Supercomputing Centre.
Deployment of MaX flagship codes on EuroHPC machines was achieved on all the distinct hardware technologies adopted.

Figures:
- Summary of energy savings on EuroHPC systems and of co-design progress
- Comparison of strong scalability of the newly released Yambo v5.3
- Yambo code on Leonardo (M12)
- Hero runs with AiiDA on the Leonardo supercomputer at CINECA, Booster partition
- PoC for GPU exploitation (M12)
- Deployment table of MaX codes on EuroHPC machines
