Programming models get ready for exascale computing
Imagine 100 million processes running in parallel on a single computer, or 50 million laptops connected to each other by a super-fast network. That is what exascale computing is. Whilst current computers can be programmed with the likes of Java or Python, special software is required to exchange data between all these processes or laptops. This is where message-passing and PGAS (Partitioned Global Address Space) come into play. ‘We believe these two programming models will be key to enabling this exchange of data, as they are already working very well on today’s largest supercomputers,’ says Stefano Markidis, EPIGRAM project manager and Assistant Professor in high-performance computing at KTH, Sweden.

Running these programming models on exascale computers, however, is a different story. The amount of memory needed to keep track of so many processes would be enormous, to the point where the memory footprint is bound to become a serious problem. Making these processes operate collectively or in sync would require algorithms far more advanced than the ones currently available. At the same time, a single system combining the existing programming models still cannot run efficiently today. Researchers have been addressing these key challenges – extreme parallelism and interoperability – under a single project named EPIGRAM (Exascale Programming Models). The project focused on message-passing and PGAS, and more specifically on improving two of their associated programming systems, MPI and GPI.

‘Scientific applications often have several synchronisation points at which the fastest laptop waits for the slowest one to catch up. That works with a few laptops, but when you have 50 million of them, you need to start thinking about how the algorithms synchronise. This is what we did under EPIGRAM. We improved the performance of communication operations on a very large number of processes by decreasing their memory consumption, improving collective operations and introducing emerging computing models,’ Markidis explains. The team also enhanced the interoperability of MPI and GPI by integrating them into a single MPI implementation called EMPI4Re.
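Markidis’ point about synchronisation can be made concrete with a small piece of MPI code. The sketch below is a generic illustration, not EPIGRAM code: it uses MPI_Iallreduce, a standard non-blocking collective introduced in MPI-3, so that fast processes can keep computing instead of idling at a synchronisation point. The variable names and the reduction itself are purely illustrative.

```c
/* Minimal sketch (not EPIGRAM code): a non-blocking collective lets each
 * process keep computing instead of idling at a synchronisation point.
 * Build with an MPI compiler wrapper, e.g. mpicc sketch.c -o sketch */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank;   /* stand-in for a locally computed value */
    double global = 0.0;
    MPI_Request req;

    /* Start the reduction without blocking: fast processes are not forced
     * to wait here for the slowest one. */
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ... overlap: do independent local work while the collective progresses ... */

    /* Only synchronise when the reduced value is actually needed. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("sum over %d ranks = %f\n", size, global);

    MPI_Finalize();
    return 0;
}
```

Non-blocking collectives of this kind are one standard building block for reducing the time lost at synchronisation points; EPIGRAM’s contributions on collective algorithms and memory footprint go beyond this basic pattern.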
From space weather to fluid dynamics

Once developed, the EPIGRAM concepts had to be tested in large-scale applications with exascale potential. The consortium opted for space weather and fluid dynamics, developing iPIC3D, a Particle-in-Cell code for space physics simulations, and a kernel for Nek5000, a computational fluid dynamics (CFD) code. Both applications can scale up to 1 million cores and rely respectively on C/C++ and Fortran, the most widely used programming languages in high-performance computing. ‘We are extremely satisfied with our pilot applications,’ Markidis says. ‘iPIC3D improved performance by a factor of three compared to the previous implementation, and we managed to develop a new, simplified Nek5000 communication kernel that can be used by other projects.’

Two HPC projects under Horizon 2020 are already using the work done under EPIGRAM: the INTERTWINE project, which builds upon EPIGRAM’s interoperability work and has iPIC3D as a pilot application, and the EXAFLOW project, which uses Nek5000 and adopted EPIGRAM’s new Nek5000 communication kernel.

‘We would like to have a direct follow-up project in the near future,’ Markidis says. ‘But in the meantime, we can already see how our work on MPI is having an impact on HPC application developers. We presented our concepts to application developers from different domains, and we expect that some of them, like an isomorphic collective, will be picked up and adopted by the community in the near future.’

The project’s work on standardisation was critical for the adoption of new MPI and GPI features by people developing application codes for supercomputers. ‘Application developers are assured that new features are going to be implemented and are not likely to be modified,’ Markidis explains. The EPIGRAM concepts, which were tested using a micro-benchmark, not only speed up the execution of various MPI and GPI operations but also considerably reduce the amount of memory those operations need. The project highlighted the limitations of some concepts in MPI, and strongly influenced the work of the GPI-2 implementers through the development of scalable dynamic connections.
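For context on what testing with a micro-benchmark typically means in this setting, the sketch below shows a minimal MPI micro-benchmark that times a collective operation over many repetitions. It is a generic illustration written for this article, not the project’s benchmark suite; the choice of MPI_Allreduce, the repetition count and the way results are reported are all arbitrary, and memory footprint would be measured separately.

```c
/* Generic micro-benchmark sketch (not EPIGRAM's benchmark suite):
 * time an MPI collective over many repetitions, the usual way the
 * execution time of communication operations is measured. */
#include <mpi.h>
#include <stdio.h>

#define REPS 1000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 1.0, global = 0.0;

    /* Make sure all ranks start timing together. */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < REPS; i++)
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    double t1 = MPI_Wtime();

    /* Report the slowest rank's average time per operation. */
    double elapsed = (t1 - t0) / REPS, worst = 0.0;
    MPI_Reduce(&elapsed, &worst, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("MPI_Allreduce: %.3f microseconds per call (max over ranks)\n",
               worst * 1e6);

    MPI_Finalize();
    return 0;
}
```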