By 2023, High Performance Computing (HPC) systems should be able to perform 10^18 operations per second, i.e. reach Exascale.
RECIPE addresses a crucial problem: developing a software layer that sits between the hardware and the applications and keeps the system reliable despite the increasing number of resources and the resulting decrease in the time between failures.
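As a rough illustration of why more resources mean more frequent failures, the sketch below uses a common first-order model. It assumes identical nodes that fail independently with exponentially distributed times to failure, so the system-level mean time between failures is the per-node value divided by the node count. The function name and the figures (a 5-year node MTBF, the node counts) are hypothetical and are not taken from RECIPE.

```python
def system_mtbf_hours(node_mtbf_hours: float, node_count: int) -> float:
    """Approximate system MTBF for node_count identical, independent nodes.

    Assumption: exponentially distributed failures, so the failure rates
    of the nodes add up and the system MTBF is node MTBF / node count.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return node_mtbf_hours / node_count


# Hypothetical figures: each node fails on average once every 5 years.
NODE_MTBF_HOURS = 5 * 365 * 24  # 43,800 hours

for nodes in (1_000, 10_000, 100_000):
    mtbf = system_mtbf_hours(NODE_MTBF_HOURS, nodes)
    print(f"{nodes:>7} nodes: one failure every {mtbf:.2f} hours on average")
```

Under these assumptions, growing the machine by a factor of ten cuts the expected time between failures by the same factor, which is why reliability has to be handled by the resource management layer rather than left to individual applications.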
RECIPE provides:
- A hierarchical runtime resource management infrastructure able to optimise energy efficiency and to minimise the occurrence of thermal hotspots. Such infrastructure will also enforce the time constraints imposed by the application, ensuring reliability for both time-critical and throughput-oriented computations;
- A predictive reliability methodology to support QoS in the face of both transient and long-term hardware failures;
- A set of integration layers to allow the resource manager to interact with both the application and the underlying deeply heterogeneous architecture;
- A simulation-based platform for validating the resource management policies at large scale.
RECIPE’s goals
1. To increase the energy efficiency of HPC systems by 25%, with a 15% improvement in mean time to failure (MTTF);
2. To improve the energy-delay product by up to 25%;
3. To reduce the occurrence of faulty executions by 20%, with recovery times compatible with real-time performance and full exploitation of available resources under non-saturated conditions.
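The energy-delay product named in the second goal is conventionally defined as the energy consumed by a run multiplied by its execution time. The sketch below shows how a relative improvement in that metric could be computed. The function names and the baseline and improved figures are hypothetical, chosen only to make the arithmetic easy to follow, and do not represent RECIPE measurements.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product (EDP): energy consumed times execution time."""
    return energy_joules * delay_seconds


def relative_improvement(baseline: float, improved: float) -> float:
    """Fractional reduction of a lower-is-better metric."""
    return 1.0 - improved / baseline


# Hypothetical run: 1000 J in 10 s before, 800 J in 9.375 s after.
baseline_edp = energy_delay_product(1000.0, 10.0)   # 10000 J*s
improved_edp = energy_delay_product(800.0, 9.375)   # 7500 J*s

print(f"EDP improvement: {relative_improvement(baseline_edp, improved_edp):.0%}")
```

Because the metric multiplies energy by time, a saving in either factor counts, but an energy saving obtained by slowing the run down can cancel itself out.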
RECIPE assessed its results against real-world use cases, addressing key application domains:
1. Geophysical exploration: thanks to the efficient implementation of the runtime resource manager (RTRM), the resulting Full Waveform Inversion tool reduces the uncertainty of current seismic exploration surveys;
2. Environmental monitoring and meteorology: the developed RTRM will improve the ability to keep both the status of water basins and the behaviour of power plants exploiting renewable energy sources (RES), such as wind turbines, under control;
3. Bio-medical machine learning and big data analytics: the developed software infrastructure will enable the deployment of the epileptic seizure detection algorithms in a prototype platform able to manage a large-scale population while meeting the real-time requirements of the application.
To enact this ambitious research and innovation program, the RECIPE project relies on a consortium of leading academic partners: POLIMI, the largest technical university in Italy, providing expertise on resource management and programming models as well as scientific coordination; EPFL, the leading provider of thermal models for HPC; UPV, one of the key innovators in optimized interconnection networks; and CeRICT, providing expertise on accelerators. The consortium also includes two supercomputing centers: BSC, one of the leading HPC providers in Europe, whose MareNostrum ranked 13th in the Top 500 in June 2017, and PSNC, another Top 500 HPC center, in Poland. It is completed by CHUV, a research hospital in Switzerland, and IBTS, an SME active in product design and development, which provides effective exploitation avenues through industry-based use cases.