CORDIS - EU research results

Co-designed Innovation and System for Resilient Exascale Computing in Europe: From Applications to Silicon

Periodic Reporting for period 3 - EuroEXA (Co-designed Innovation and System for Resilient Exascale Computing in Europe: From Applications to Silicon)

Reporting period: 2019-12-01 to 2021-12-31

HPC has been identified as one of the key pillars of the Digital Single Market (DSM) strategy adopted by the European Commission, recognising its capacity to promote science, industrial innovation and ultimately social prosperity. Societal challenges, human curiosity and industrial innovation demand solutions to problems with higher quality, in shorter time, and at larger scales. New and complex challenges must be solved in global climate change, air pollution, high-frequency trading, huge social networks, personalised healthcare, energy-efficient combustion engines, optimised design of new materials and many other areas. Several critical research areas and problem classes, such as fine-grained weather prediction, climate change, large eddy simulation for turbulence modeling in aeronautics and the challenges in fusion research, to name a few, are beyond current computing capabilities and need exaflop-level performance or more. Unfortunately, scaling up to exascale is far from trivial and can by no means rely upon the conventional scaling approaches embraced during the last fifteen years. Current practices for building supercomputers rely on architectures, microprocessors, accelerators and memory modules that were designed and optimised to cope with the demands of different markets (desktop, server, graphics, gaming, etc.), glued together with high-performance interconnection networks. While this was a viable and cost-effective approach to drive systems up to the current scale, it cannot lead to the exascale in a straightforward way.

HPC is well known for the gap between the theoretical peak performance of an actual platform and the performance achieved when running real applications. To reduce this disparity, a platform must be designed based on a thorough understanding of the applications and the system software, while the applications themselves must leverage the full capabilities of the underlying hardware and software stack. EuroEXA targeted the design and implementation of a new system architecture that better balances the required computing resources compared to today’s systems, supporting the acceleration of key applications. To accomplish this, we followed a system-level co-design approach, drawing on a wide range of typical and non-conventional HPC codes.
EuroEXA is delivering innovative elements across the stack, in particular:

HPC architecture: EuroEXA proposed and implemented a novel, open HPC architecture that supports multi-node accelerator connection based on the UNIMEM technology, an overlay interconnect (called Trifecta) that optimises at three levels, bringing data and communication together at various levels of the peer hierarchy, and a new compute board (called CRDB) that directly connects network and storage with the accelerator.

Compute node: EuroEXA designed an enhanced General Purpose Processor with transaction-native communication and integrated memory compression in the memory subsystem. Although the node did not reach the implementation phase within the frame of the project, the developments were solid enough to support further exploitation.

Interconnection network: An integral part of the EuroEXA architecture, the Trifecta network is a multi-layered, scalable interconnect with a heterogeneous topology, incorporating the best topology at each layer. Trifecta is implemented with commodity hardware (FPGAs) and based on European IP developed within the project.

Infrastructure: EuroEXA designed and implemented a meta-modular infrastructure with high density of equipment, high efficiency and high effectiveness. The infrastructure demonstrates innovations in Total Liquid Cooling and Consolidated Power delivery, whilst maintaining interoperability with Open Compute and COM Express. Key performance indicators are demonstrated on a large thermal proxy that has evolved to commercial need, whilst delivering a modular data centre to support EuroEXA test beds.

System software and runtime systems: EuroEXA validates the viability of its approach through an end-to-end integration of its components and IP. In addition, it adapts and enhances a number of key runtime systems for HPC including a EuroEXA-optimized MPI implementation, OmpSs@FPGA, Maxeler and UNIMEM-enhanced HLS for FPGA acceleration and the integration of FPGA support in GPI.

Applications: EuroEXA supported a large campaign of application porting and optimization on top of the proposed architecture and supported runtime systems. The goal is twofold: on the one hand, to validate the underlying technology with real applications and, on the other hand, to port and optimize the applications themselves to utilize the resources of a EuroEXA-based HPC system.
EuroEXA targeted the development of competitive European technology across the entire HPC stack, including data center infrastructure, system architecture, compute node and acceleration, programming models and runtime systems, and optimization of applications from key sectors. Among other results, EuroEXA contributes to the European HPC ecosystem the UNIMEM system architecture, the CRDB compute node, the Trifecta interconnect, a meta-modular infrastructure with high density, and enhanced versions of key programming frameworks (OmpSs@FPGA, Maxeler, GPI), together with the human skills further developed to implement and advance these technologies.