
Runtime Exploitation of Application Dynamism for Energy-efficient eXascale computing

Periodic Reporting for period 1 - READEX (Runtime Exploitation of Application Dynamism for Energy-efficient eXascale computing)

Reporting period: 2015-09-01 to 2017-02-28

High Performance Computing (HPC) is an indispensable tool for researchers and industry. It is used to simulate and predict climate change, analyse tectonic movements, or create weather forecasts. Like any other computing device (e.g. laptops or smartphones), HPC systems are constantly becoming faster, which enables more fine-grained simulations, e.g. for better weather forecasts. However, energy efficiency, another optimization target, was outside the focus of HPC for a long time. This has changed in recent years. For example, the ETP4HPC Strategic Research Agenda lists energy as a technical research priority.
The READEX project, funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 671657, targets the energy efficiency of HPC systems. The project partners come from different fields: embedded systems and HPC, academia and industry.
READEX aims to establish a software tool suite that can be used to increase the energy efficiency of HPC applications. Energy efficiency can be increased by tuning different parameters. One example is reducing the processor clock frequency when the application does not profit from a high processor frequency. Other parameters include lowering the parallelism of programs or selecting different implementations of software routines.
The software tool suite works in several steps. First, the application that is to be tuned, i.e. the target application, is instrumented to distinguish different regions. Second, the instrumentation is used to test different parameter configurations before the target application is executed in production. Here, individual regions and phases of a program are classified into scenarios, where one scenario defines a set of optimal parameters for the region. These scenarios are stored in an application tuning model (ATM). This process is called design-time analysis (DTA). Afterwards, the instrumentation is reused for runtime tuning when the application is executed in production. Whenever a known scenario is observed, parameters are set according to the ATM. These measures can reduce the runtime of the target application or the power consumption of the HPC system, and either effect can increase energy efficiency. An example is shown in Figure 1.
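The two tuning steps described above, design-time analysis and runtime tuning, can be sketched roughly as follows. This is a minimal illustration and not the READEX implementation: the region names, the candidate frequencies and thread counts, and the `measure_energy` toy model are all hypothetical stand-ins for real instrumentation and measurements.

```python
from itertools import product

# Hypothetical tuning parameters: core frequency (GHz) and thread count.
CORE_FREQS_GHZ = [1.2, 2.0, 2.5]
THREAD_COUNTS = [12, 24]


def measure_energy(region, freq_ghz, threads):
    """Toy energy model in joules (an assumption, not measured data).

    Energy = power * time. The compute-bound region speeds up with
    frequency and thread count; every other region is treated as
    memory-bound with a fixed runtime.
    """
    power_w = 50.0 + 2.0 * threads * freq_ghz
    if region == "compute_kernel":
        time_s = 240.0 / (freq_ghz * threads)
    else:
        time_s = 10.0
    return power_w * time_s


def design_time_analysis(regions):
    """Try every configuration per region and keep the cheapest one.

    The returned dict plays the role of the application tuning model.
    """
    atm = {}
    for region in regions:
        configs = product(CORE_FREQS_GHZ, THREAD_COUNTS)
        atm[region] = min(configs, key=lambda c: measure_energy(region, *c))
    return atm


def runtime_tuning(atm, region, default=(2.5, 24)):
    """Look up the stored configuration; fall back to a default setting."""
    return atm.get(region, default)


atm = design_time_analysis(["compute_kernel", "memory_copy"])
for name in ["compute_kernel", "memory_copy", "unknown_region"]:
    print(name, runtime_tuning(atm, name))
```

In this toy model the compute-bound region keeps the highest frequency and thread count, while the memory-bound region is assigned the lowest ones. A region that was not seen during design-time analysis falls back to the default configuration.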

Summary of the context and overall objectives of the project
The rising power consumption of HPC systems increases costs for operators of HPC centres. However, manually optimizing program code and analysing energy-efficient software and hardware configurations for parallel applications is time-consuming and ineffective. Hence, software that automates this process is preferable. The READEX project implements such a process with scalability and runtime efficiency in mind.
While the average performance of HPC systems increases over time, their power consumption also rises. This increases air pollution and contributes to global warming. Furthermore, the power bill of HPC systems hosted at academic sites is paid via taxes. Here, taxpayers' money can be saved when energy efficiency is increased. Likewise, energy savings on computing simulations executed at industrial sites can reduce product costs or increase the competitiveness of European manufacturers.

READEX Objectives
Objective 1: Static Energy Efficiency Tuning
Static energy efficiency tuning explores the effects of optimizing parameter settings for whole application runs. While this approach can cover applications that exhibit uniform behaviour over the whole program run, it cannot cope with changing application requirements. However, some components implemented for dynamic tuning can also be used for this approach and vice versa.
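The difference between one setting for the whole run and a setting per region can be illustrated with a small sketch. The region names and energy values below are invented for illustration only, and the sketch assumes that switching between settings costs nothing, which a real system would have to account for.

```python
# Hypothetical energy (joules) per region at each core frequency (GHz).
ENERGY_J = {
    "assemble": {1.2: 90.0, 2.5: 60.0},
    "exchange": {1.2: 40.0, 2.5: 65.0},
}


def static_tuning(energy):
    """Pick the single frequency that minimises energy over the whole run."""
    freqs = list(next(iter(energy.values())))
    best = min(freqs, key=lambda f: sum(r[f] for r in energy.values()))
    return best, sum(r[best] for r in energy.values())


def dynamic_tuning(energy):
    """Pick the best frequency for each region separately.

    Assumption: switching between frequencies has no overhead.
    """
    choice = {name: min(r, key=r.get) for name, r in energy.items()}
    return choice, sum(energy[name][f] for name, f in choice.items())


print(static_tuning(ENERGY_J))
print(dynamic_tuning(ENERGY_J))
```

With these invented numbers, the best single frequency leaves one region at a setting that is poor for it, whereas per-region selection gives each region the setting that suits it and lowers the total.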

Objective 2: Manual Energy Efficiency Tuning
Manual energy efficiency tuning needs user interaction and depends on prior knowledge of the code. Even though higher energy savings can be achi
Within the first 18 project months, the project partners implemented most of the basic functionality of the tool suite, tested hardware and software mechanisms to increase energy efficiency, reached out to other researchers, and published their findings in scientific papers.

The project partners defined interfaces between the individual software libraries that had to be developed. They also investigated hardware and software parameters that can be targeted for energy efficiency tuning. One example is given in Figure 2, which shows the impact of processor core and uncore frequency on energy consumption. These parameters are then changed during DTA and runtime tuning. Furthermore, the project partners implemented an alpha prototype that covers both tuning steps, DTA and runtime tuning. The project partners used the individual parts of the READEX tool suite and, for comparison, manual effort to evaluate the dynamism and energy saving potential of HPC applications (see Figure 3).
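A sweep over core and uncore frequencies, of the kind that underlies a heatmap such as Figure 2, could look roughly like the sketch below. The frequency grid and the `toy_energy_j` model are assumptions made for illustration and do not reproduce the measured data: the model assumes a memory-bound kernel whose runtime depends only on the uncore frequency.

```python
# Hypothetical frequency grid in GHz.
CORE_GHZ = [1.2, 1.8, 2.4]
UNCORE_GHZ = [1.2, 2.0, 3.0]


def toy_energy_j(core, uncore):
    """Toy energy model in joules (an assumption, not a measurement).

    Runtime falls with uncore frequency only; power grows with both
    frequencies. Energy is power multiplied by time.
    """
    time_s = 30.0 / uncore
    power_w = 60.0 + 10.0 * core ** 2 + 5.0 * uncore ** 2
    return power_w * time_s


def sweep():
    """Evaluate every (core, uncore) pair; these are the heatmap cells."""
    return {(c, u): toy_energy_j(c, u) for c in CORE_GHZ for u in UNCORE_GHZ}


def best_setting(grid):
    """Return the (core, uncore) pair with the lowest energy."""
    return min(grid, key=grid.get)


grid = sweep()
for (core, uncore), energy in sorted(grid.items()):
    print(f"core={core} GHz  uncore={uncore} GHz  energy={energy:.0f} J")
print("lowest energy at", best_setting(grid))
```

In this toy model the lowest energy is reached at the lowest core frequency combined with the highest uncore frequency. On real hardware the optimum depends on the application and has to be measured.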

The project partners reach out to the general public and the scientific communities via the project website, Twitter, ResearchGate, YouTube, scientific publications, and active participation in workshops and conferences.
Given that the target audience of the READEX project is the HPC community, there are only indirect effects on the general public. These are described above. Furthermore, the READEX project created interfaces that can be used for purposes beyond energy efficiency tuning. For example, they have also been used for debugging and for modifying existing codes to lower the overhead of performance monitoring.
Figure 1: Default (top) and tuned (bottom) execution of OpenMP parallel Block Tri-diagonal solver.
Figure 3: Evaluation of the tuning potential for selected applications and software kernels
Figure 2: Heatmap of the energy consumption of a stream benchmark for different core and uncore freq