CORDIS - EU research results

Targeting Real chemical accuracy at the EXascale

Periodic Reporting for period 1 - TREX (Targeting Real chemical accuracy at the EXascale)

Reporting period: 2020-10-01 to 2022-03-31

The CoE TREX “Targeting Real chemical accuracy at the EXascale” commenced its activities on 1 October 2020.
The focus of TREX is the development and application of quantum mechanical simulations in the framework of stochastic quantum Monte Carlo (QMC) methods. This methodology encompasses various techniques at the high end of the accuracy ladder of electronic-structure approaches and is uniquely positioned to fully exploit the massive parallelism of upcoming exascale architectures.
The objective of TREX is to develop an open-source, high-performance software platform that integrates the TREX community codes (QMC: TurboRVB, QMC=Chem, CHAMP, and NECI; quantum chemistry: GammCor and Quantum Package; machine learning: QML) in an interoperable manner. This will be achieved by:
● Building two libraries:
1) an I/O library, written in C, for exchanging information among codes;
2) a QMC kernel library (QMCkl) for high-performance QMC simulations,
with two implementations:
a) an easy-to-read, easy-to-modify implementation in Fortran;
b) a high-performance library in C tailored to different architectures.
● Modernizing TREX codes and refactoring them to make use of these libraries.
● Integrating TREX codes in AiiDA for workload management and high-throughput computing (HTC).
● Integrating the platform with machine learning tools (QML) to “teach” TREX accuracy to more approximate electronic structure approaches.

The flagship codes of TREX are run on the following supercomputers:
QMC=Chem: TGCC (FR), CINES (FR), CALMIP (FR), FZJ (DE), Argonne (USA)
CHAMP: SURFSARA (NL), FZJ(DE)
TurboRVB: CINECA (IT), FZJ (DE), Fugaku (Kobe, Japan), Summit (Oak Ridge, USA)
NECI: FZJ (DE), DPG (DE)
GammCor: FZJ (DE)
Quantum Package: TGCC (FR), CINES (FR), CALMIP (FR), CRIANN (FR), FZJ (DE), SURFSARA (NL), Argonne (USA)
All our codes are under open-source licenses.
a) Most TREX codes (CHAMP, TurboRVB, NECI, and GammCor) have been profiled with the MAQAO profiling tools developed by one of the TREX partners.
Bottlenecks have been identified, and modification of the codes has started, already yielding sizeable performance improvements:
● GammCor: 20% speedup on single core;
● CHAMP: 25% speedup on single core.
b) Construction and optimization of the QMCkl kernel library has started.
We have worked on a software component related to the so-called Jastrow correlation factor, an important part of the many-body wave functions used in QMC calculations. The relevant software has been extracted from the CHAMP code, rewritten in an easy-to-read fashion (via implicit-reference-to-parameters, IRP, programming tools), algorithmically improved, and further optimized by TREX computer scientists.
This has resulted in a speedup of up to 20x for the computation of the Jastrow factor on a single core, as shown in the figure below, where we plot the speedup as a function of the number of electrons in the system. This implementation is also parallelized within shared memory and is currently being ported to GPUs.
c) The alpha-version of the I/O library for handling communication between TREX flagship codes has been released.
The I/O library has a developer-friendly front-end interface and can handle various types of files (text and HDF5 binary formats) as a back end.
The current release focuses on handling large files containing information on the many-body wave function. We are currently testing its interfacing with one of the flagship codes (CHAMP).
d) A user-friendly Fortran-based parser has been designed to parse the input files of multiple flagship TREX codes. This implementation can handle multiple data types and file formats.
Our TREX parser is based on the libfdf library developed within the E-CAM CoE.
e) Recently, the flagship code TurboRVB has been further optimized during a GPU hackathon. The code can now run using the NVIDIA HPC compiler on many different platforms (x86, POWER, and Arm) and is ready to run on the upcoming Leonardo pre-exascale supercomputer. The gain with respect to the CPU version is so far a factor of 8x, but further optimizations are possible.
QMC codes can achieve almost ideal scaling by spawning independent trajectories on different computing units, independently of the size of the problem. This parallelism will of course be exploited. The challenge we face is to take advantage of parallelism within a trajectory in order to reach the ergodic regime faster. Our goal is to be able to compute a single trajectory within a compute node, taking advantage of accelerators and shared-memory parallelism. In large-scale simulations, the time to compute one Monte Carlo step of a trajectory is on the order of 0.01 CPU seconds, so memory transfers to/from the GPU and synchronization barriers have a huge impact on performance. The algorithms need to be re-expressed to allow an asynchronous implementation that hides the latencies of data transfers.
Our new implementation of the Jastrow factor reaches 80% of the peak node performance on a dual-CPU Skylake node. We expect to reach comparable performance measures for the most time-consuming kernels implemented in our library.