DEEP - Extreme Scale Technologies

Periodic Reporting for period 3 - DEEP-EST (DEEP - Extreme Scale Technologies)

Reporting period: 2020-07-01 to 2021-03-31

With its results in the hardware (HW), software (SW) and application fields, the DEEP-EST project contributes strongly to the convergence of High Performance Computing (HPC) and High Performance Data Analytics (HPDA). The benefit for European society is twofold: i) new scientific and technical results are achieved that bring advances in fields like neuroscience, medicine, or climate and weather forecasting; ii) new HPC technologies and products are developed in Europe, contributing to increased digital sovereignty and the jobs associated with it.

Project Objectives
1 Develop an energy efficient system architecture that fits HPC and HPDA workloads, and satisfies the requirements of end-users and e-infrastructure operators -> Achieved
2 Build a fully working Modular Supercomputing Architecture (MSA) system prototype -> Achieved
3 Foster European technologies -> Achieved
4 Build a resource management and scheduling system fully supporting the MSA -> Achieved
5 Enhance and optimise programming models -> Achieved
6 Validate the full HW/SW stack with relevant HPC and extreme data workloads, and demonstrate the benefits of the MSA -> Achieved
Six European HPC application teams participated in DEEP-EST providing a total of 15 codes:
• NMBU: Neuroscience: NEST, Arbor, Elephant
• NCSA: Molecular dynamics: GROMACS
• Astron: Radio astronomy: (FPGA and GPU) Imager, GPU Correlator
• KULeuven: Space weather: xPic, GMM, DLMOS
• UoI: Data analytics in Earth science: NextDBSCAN, NextSVM, deep learning frameworks
• CERN: High energy physics: CMS Reconstruction, CMS classification

These applications contributed to DEEP-EST both through co-design and validation of the HW and SW developed in the project, and are now themselves in a much better position to exploit the performance provided by Exascale modular/heterogeneous supercomputers. With these codes and synthetic benchmarks, the DEEP-EST prototype has been modelled and regularly benchmarked to evaluate and measure its performance, scheduling, scaling characteristics, and energy efficiency.

The DEEP-EST HW prototype consists of three compute modules: Cluster Module (CM), Extreme Scale Booster (ESB) and Data Analytics Module (DAM). These are complemented by two service modules: Scalable Storage Service Module (SSSM) and AllFlash Storage Module (AFSM). The construction of the DEEP-EST prototype, and of its ESB module in particular, involved a number of technology innovations. In terms of size, performance, power efficiency, and overall quality, the DEEP-EST prototype has become a leading-edge system with production-quality features. In fact, it is operated as a production system and used by project partners and external users.

The lower layers of the DEEP-EST SW stack (e.g. network bridging or resource management) have been adapted to provide the best support for the underlying HW, while hiding these modifications from the end user. The DCDB and Wintermute frameworks from BADW-LRZ collect a wide variety of system-monitoring information and make it accessible to users and operators through Grafana. The programming environment abstracts the MSA HW complexity behind the interfaces and parallel programming paradigms that have become de facto standards in HPC: MPI and OpenMP, in the specific implementations ParaStation MPI and OmpSs-2. MPI and OpenMP are complemented by parallel programming tools for acceleration devices, and frameworks for machine learning and deep learning applications. Resiliency functionality has been implemented in the FTI library, and IO functionality in BeeGFS and SIONlib.

Exploitation: The DEEP-EST consortium has advanced the development of key European technologies, which can now be exploited by the respective partners:
• Megware: energy efficient integration, amplified portfolio with new accelerated node designs.
• EXTOLL: with its Fabri³ integrated fabric switch, can acquire strategic customers to realize its next-gen network chip. The network-attached technologies NAM and GCE will be exploited as Fabri³-attached or as standalone products.
• ParTec: brings the new features into future versions of ParaStation Modulo, for which ParTec offers commercial support.
• FHG-ITWM: with its spin-off ThinkParQ, offers commercial support for BeeGFS and its new features.
• BSC: improved OmpSs-2 (promoting its features in the OpenMP standardization committee) plus Extrae/Paraver/Dimemas performance analysis and modelling tools, and scheduling models.
• BADW-LRZ: system monitoring tool DCDB used on its Tier-0 and Tier-1 systems.
• JUELICH: Open Source JUBE benchmarking environment and SIONlib I/O concentrator library.
• Application developers (NMBU, KU Leuven, Astron, UoI, CERN, and NCSA) exploit their improved codes to advance science in their respective research fields.

Dissemination: Scientific results have been disseminated through 42 open access articles and events, including several training events. The website is the main hub of project content, and social media channels (Twitter, LinkedIn, Facebook) have increased attention to the project events and materials. A podcast channel on SoundCloud has recently been added. DEEP-EST supports in particular initiatives that foster diversity in HPC. An Early Access Program opened the DEEP-EST prototype and its SW to the wider HPC community. The DEEP-EST dissemination and communication strategy will continue in the upcoming DEEP-SEA project.
The varied results created in DEEP-EST contribute directly to the achievement of the EuroHPC JU objectives and the ETP4HPC Strategic Research Agenda.

DEEP-EST introduced and demonstrated the MSA, which is already being adopted by the international HPC community, e.g. by the EuroHPC Petascale system MeluXina in Luxembourg and the pre-Exascale system Leonardo in Italy (both with installation scheduled in 2021). Worldwide, the trend towards modularity can be observed in the planned Exascale system Tianhe-3 in China and in the Japanese system Wisteria.

DEEP-EST has advanced key EU technologies: the energy efficient integration by Megware; the EXTOLL network; the cluster management and middleware SW by ParTec; the file system BeeGFS by FHG-ITWM; the OmpSs programming environment and the Extrae/Paraver/Dimemas tools by BSC; the JUBE benchmarking environment and SIONlib I/O library by JUELICH; and the DCDB energy monitoring and analysis software toolset by LRZ.

DEEP-EST has also improved 15 European application codes, with a positive impact on neuroscience, molecular dynamics, radio astronomy, space weather, Earth science, and high energy physics. These codes now run more efficiently, achieve better performance, and scale better, which increases their potential to contribute to the advancement of fields with high societal impact, such as medicine, drug design, or Earth observation.

The MSA development roadmap continues within the EuroHPC JU "SEA-" and "Pilot" projects: DEEP-SEA will make MSA programming more dynamic; IO-SEA will improve the IO capabilities; RED-SEA will develop network technologies for better intra- and inter-module communication; the EUPEX pilot will build a modular pilot system integrating EU technologies; and the HPCQS project will bring a Quantum module into the MSA.

With this detailed development roadmap, the results of the DEEP-EST project are in a very good position to reach their target of inclusion in the first European Exascale platforms.
DEEP-EST Project Logo
DEEP-EST Modular Supercomputing Architecture
DEEP-EST Software
DEEP-EST Prototype