Final Report Summary - PICSSAR (Development of a new generation of highly scalable and accurate 3D Particle-In-Cell codes)
Context and goal of the project: The success of the petawatt (PW) laser facilities presently under construction, which aim at producing promising particle and light sources from relativistic laser-plasma interactions, will rely on a strong coupling between experiments and large-scale simulations with Particle-In-Cell (PIC) codes. Standard PIC codes currently in use fail to accurately describe these new interaction regimes, partly because the finite-difference time-domain (FDTD) Maxwell solver used to compute the electromagnetic fields generates strong numerical artefacts and instabilities when particles move at relativistic velocities (e.g. numerical dispersion, numerical Cerenkov instability). At present, mitigating these effects requires very high resolution, which dramatically increases the computation time and prevents realistic 3D modeling. Our project aims at building a new generation of highly accurate PIC codes that will enable realistic 3D simulations of these as yet unexplored interaction regimes. These PIC codes will use very high order or pseudo-spectral methods to solve Maxwell's equations. Despite their accuracy, such methods have hardly been used so far, because the global Fourier transform they rely on scales only to tens of thousands of cores under MPI parallelization, which is not enough to take advantage of the supercomputer architectures required for 3D modeling. In this context, the goal of the project was to design massively parallel pseudo-spectral solvers (scalable to a million cores) to enable 3D PIC simulations of PW laser-plasma interactions previously out of reach of standard codes.
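As an illustration of the numerical dispersion mentioned above, the following minimal Python sketch evaluates the textbook dispersion relation of the 1D Yee (leapfrog FDTD) scheme in vacuum, sin(omega*dt/2) = (c*dt/dx) * sin(k*dx/2), and returns the resulting phase velocity in units of c. The function name and the parameter values are illustrative assumptions and are not taken from any PIC code. An ideal pseudo-spectral solver would give a phase velocity equal to c for all resolved wavelengths, whereas the FDTD value approaches c only as the resolution increases.

```python
import math

def fdtd_phase_velocity(k_dx, cfl):
    """Phase velocity v/c of a vacuum plane wave on a 1D Yee (FDTD) grid.

    k_dx : wavenumber times cell size (k * dx)
    cfl  : Courant number c * dt / dx (must be <= 1 for stability in 1D)
    """
    # 1D Yee dispersion relation: sin(omega*dt/2) = cfl * sin(k*dx/2)
    omega_dt = 2.0 * math.asin(cfl * math.sin(0.5 * k_dx))
    # v/c = omega / (c*k) = (omega*dt) / (cfl * k*dx)
    return omega_dt / (cfl * k_dx)

# Illustrative scan: waves travel slower than c on coarse grids.
# (cfl = 1 is dispersion-free, but only in this 1D setting.)
for cells_per_wavelength in (10, 20, 40):
    k_dx = 2.0 * math.pi / cells_per_wavelength
    print(cells_per_wavelength, fdtd_phase_velocity(k_dx, cfl=0.5))
```

Under these assumptions the numerical wave lags behind a particle moving close to c, which is the kind of mismatch that has to be suppressed with high resolution in standard codes.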
Objectives of the project: The two main objectives of the project were to (i, outgoing phase, LBNL/USA) design, implement and test a new parallelization technique for pseudo-spectral Maxwell solvers that would enable scaling of pseudo-spectral PIC codes to up to a million cores, as required for 3D simulations, and (ii, return phase, CEA Saclay/France) benchmark these new solvers by comparing 3D pseudo-spectral PIC simulations of relativistic laser-plasma interactions with real laser-plasma experiments performed at CEA Saclay on the 100 terawatt (TW) UHI100 laser.
Results of the project: the two objectives of the project were successfully met, which has resulted in the production of several important deliverables:
- The design, implementation and benchmarking of the open source exascale library PICSAR ('Particle-In-Cell Scalable Application Resource'). PICSAR contains the highly optimized, massively parallel pseudo-spectral solvers developed by the fellow during the project, which are now accessible to the entire laser-plasma community. PICSAR also contains highly optimized versions (for current petascale and future exascale computers) of the most time-consuming parts of the PIC algorithm. PICSAR is fully documented on its website (https://www.picsar.net), which lists all publications related to PICSAR as well as the relevant code documentation. PICSAR has now been successfully coupled to the 3D PIC codes WARP (http://warp.lbl.gov) and SMILEI (http://www.maisondelasimulation.fr/smilei/), which now benefit from all the cutting-edge implementations already available in PICSAR.
- A large number of publications (9) in high impact physical/computational journals (cf. publication list) as well as two book chapters.
Below, we further detail the achievements made as part of objectives (i) and (ii) listed above.
I. Achievements made as part of objective 1 - To enable parallelization of pseudo-spectral solvers on up to a million cores, J-L Vay et al (collaborator at the outgoing host) recently proposed a Cartesian domain decomposition technique. In this technique, the simulation grid is split into Cartesian subdomains (each handled by a different MPI process), each with guard regions at its edges that hold copies of the fields from adjacent subdomains. At each time step, every process solves Maxwell's equations locally (using local rather than global FFTs), and the guard cells are then updated with the fields from adjacent subdomains. Because it involves only local operations and exchanges between neighboring subdomains, this technique should be highly scalable. However, very high order solvers or pseudo-spectral solvers (infinite order) would in principle require a very large number of guard cells to avoid truncation errors at subdomain edges when Maxwell's equations are solved with local FFTs. Here, we demonstrated analytically that these errors actually remain very low even for a moderate number of guard cells, which enables scaling of this technique to a large number of cores. The physical argument underpinning this analytical result is the following: truncation errors produced in the guard cells propagate at finite speed (the speed of light), so if enough guard cells are used, these errors remain within the guard regions during one time step and do not enter the simulation domain, since the guard regions are overwritten at each exchange. In the following, we provide a detailed list of achievements related to this objective.
- Achievement #1 (publication #1 - cf. list of publications): we first demonstrated that truncation errors do not grow during the simulations and are actually lower than machine precision for a large enough number of guard cells. To this end, the fellow developed an analytical model that predicts the amplitude and phase of truncation errors as a function of physical and numerical parameters (angle of propagation of light, light wavelength, time step, mesh size, solver order, number of guard cells). With this model, it is now possible to predict the number of guard cells required to keep truncation errors below the level needed to avoid spurious numerical artefacts in the simulations.
- Achievement #2 (publication #5): we then implemented this technique in the high performance library PICSAR. We benchmarked our parallel implementation on the largest clusters available in the US (MIRA/ALCF and CORI-KNL/NERSC). We notably demonstrated that our implementation enables scaling of pseudo-spectral solvers on up to a million cores.
- Achievement #3 (publication #2): As the parallelization technique may employ a large number of guard cells, MPI subdomains need to be larger (i.e. the total number of MPI processes needs to be reduced for a given problem), which calls for an efficient implementation of guard-cell exchanges between subdomains as well as very good shared-memory parallelization of the whole PIC algorithm. To this end, considerable work was performed to obtain a highly efficient OpenMP shared-memory implementation of the hotspot routines of the PIC algorithm. The optimizations exploit the three levels of parallelism available on modern architectures and include: new data structures that allow much better memory locality (particle tiling) and dynamic load balancing between OpenMP threads; portable vectorization algorithms for particle routines on emerging architectures; and improved MPI stencil communications for guard-cell exchanges. Thanks to these cutting-edge optimizations, PICSAR is now at the forefront of advanced computing and is ready for the transition to exascale computing. It is starting to be widely used by the PIC community to help transition PIC codes to future exascale computers.
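The guard-cell mechanics described under objective 1 can be illustrated with a minimal Python/NumPy sketch under strong simplifying assumptions: a 1D periodic grid, a single spatial derivative instead of a full Maxwell time step, and finite-order centered stencils applied through local FFTs. All function names (fft_derivative, local_derivative, max_error) and the test field are hypothetical and are not taken from PICSAR. The sketch shows that a stencil of order p needs p/2 guard cells for the local result to match the global one to round-off, which is why the infinite-order (pseudo-spectral) limit nominally calls for a very large number of guard cells. It does not reproduce the analytical result of the project on how small the errors stay when the guard regions are truncated.

```python
import numpy as np

# Centered finite-difference coefficients for d/dx, keyed by order of accuracy.
# The stencil half-width in cells is len(coeffs) = order // 2.
STENCILS = {2: [1 / 2], 4: [2 / 3, -1 / 12], 6: [3 / 4, -3 / 20, 1 / 60]}

def fft_derivative(f, order, dx=1.0):
    """d/dx of a periodic array via FFT, using the finite-order modified wavenumber."""
    theta = 2.0 * np.pi * np.fft.fftfreq(f.size)  # k*dx of each Fourier mode
    k_mod = sum(2.0 * c * np.sin((l + 1) * theta)
                for l, c in enumerate(STENCILS[order])) / dx
    return np.fft.ifft(1j * k_mod * np.fft.fft(f)).real

def local_derivative(f_global, n_sub, n_guard, order):
    """Same derivative, computed subdomain by subdomain with local FFTs."""
    n = f_global.size
    n_loc = n // n_sub
    out = np.empty_like(f_global)
    for s in range(n_sub):
        start = s * n_loc
        # Guard cells: copies of the neighbouring subdomains' field values.
        idx = np.arange(start - n_guard, start + n_loc + n_guard) % n
        d_loc = fft_derivative(f_global[idx], order)
        # Keep only the interior cells; guard-cell results are discarded.
        out[start:start + n_loc] = d_loc[n_guard:n_guard + n_loc]
    return out

def max_error(n_guard, order, n=256, n_sub=4):
    """Largest difference between the local-FFT and global-FFT derivatives."""
    x = np.arange(n)
    f = np.sin(2 * np.pi * 3 * x / n) + 0.5 * np.cos(2 * np.pi * 5 * x / n)
    reference = fft_derivative(f, order)
    return np.max(np.abs(local_derivative(f, n_sub, n_guard, order) - reference))

for order in (2, 4, 6):
    print(order, [max_error(n_guard, order) for n_guard in range(0, 4)])
```

In this toy setting the error drops to round-off as soon as the number of guard cells reaches the stencil half-width; in a real PIC code the guard cells would be filled by MPI exchanges between neighboring processes rather than by indexing a global array.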
II. Achievements made as part of objective 2 - During the return phase, PICSAR was used to perform the first realistic 3D simulations of laser-plasma mirror interactions in the relativistic regime. This made it possible to benchmark PICSAR against standard codes as well as against laser-plasma mirror experiments performed at CEA Saclay with the 100 TW laser UHI100.
- Achievement #1 (publications #3,4,5,7): First, we benchmarked PICSAR against standard codes on 3D simulations of the production of electrons and high order harmonics on plasma mirrors. These benchmarks revealed that the new massively parallel pseudo-spectral solvers implemented in PICSAR bring a considerable speed-up over FDTD solvers for a given accuracy. In particular, PICSAR brings a speed-up of x700 in time-to-solution compared to standard codes to reach convergence.
- Achievement #2 (publications #4,5): Then, we benchmarked PICSAR against several experiments on electron and harmonic generation performed at CEA Saclay. In each case, PICSAR was able to reproduce with high fidelity very fine features observed in the experiments that could not be reproduced before with standard codes. This will considerably help the laser-plasma community in understanding and modeling experimental results obtained at high power laser facilities.
Project website: https://www.picsar.net