Middleware for memory and data-awareness in workflows

Periodic Reporting for period 1 - MAESTRO (Middleware for memory and data-awareness in workflows)

Reporting period: 2018-09-01 to 2020-02-29

The Maestro project will build a data- and memory-aware middleware framework for High Performance Computing (HPC). The framework will act as a bridge between the operating system and applications. The goal is to give better control over the flow of data across the many layers of memory in HPC systems.

The problem

High Performance Computing (HPC) and High Performance Data Analytics (HPDA) open up the opportunity to address a wide variety of questions and challenges. The number and complexity of the challenges that HPC and HPDA can help with are limited by the performance of computer software and hardware. Increasingly, performance is limited by how fast data can be moved within the memory and storage of the hardware. So far, little work has been done to improve data movement.

How will Maestro help?

Maestro will develop a new framework to improve the performance of data movement in HPC and HPDA. The framework will consider two key components: data and memory.

Data movement awareness: Moving data in computer memory has not always been a performance bottleneck. Until recently, performance was limited by the number of calculations that could be completed. Great improvements have been made in computational performance, but the software that manages memory has not changed during this time. Maestro will develop a better understanding of the performance barriers of data movement.
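
The shift from compute-bound to data-movement-bound performance can be illustrated with a simple roofline-style estimate. The sketch below is not part of Maestro. The function name, the example kernel and the hardware figures are hypothetical, chosen only to show how the slower of the two costs (calculating or moving data) determines the run time.

```python
def estimated_runtime(flops, bytes_moved, peak_flops_per_s, bandwidth_bytes_per_s):
    """Return (seconds, limiting_factor) for a simple roofline-style bound.

    The run time is approximated by whichever takes longer: doing the
    arithmetic at peak rate or moving the data at peak bandwidth.
    """
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / bandwidth_bytes_per_s
    if memory_time > compute_time:
        return memory_time, "data movement"
    return compute_time, "computation"


# Hypothetical kernel: y = a * x + y on n double-precision values.
n = 10**8
flops = 2 * n            # one multiply and one add per element
bytes_moved = 3 * 8 * n  # read x, read y, write y (8 bytes each)

# Hypothetical machine: 1 TFLOP/s peak compute, 100 GB/s memory bandwidth.
seconds, limit = estimated_runtime(
    flops, bytes_moved, peak_flops_per_s=1e12, bandwidth_bytes_per_s=1e11
)
print(f"estimated {seconds:.4f} s, limited by {limit}")
```

With these assumed figures the arithmetic would take about 0.0002 s, while moving the data takes about 0.024 s, so the kernel is limited by data movement rather than by calculation.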

Memory awareness: Memory in computer hardware is becoming increasingly complex. Historically, applications have been unaware of the particulars of memory layout. However, as memory becomes more complex, software performance is limited by data movement across the layers of memory. To improve performance, software now needs an 'awareness' of memory and of how to optimise data movement.

Societal impact

Maestro will develop a framework to improve the performance of data movement in applications. By improving the ease of use of complex memory hierarchies, Maestro will help by:
● improving the performance of software, and thereby reducing the energy consumption and CPU hours it uses;
● encouraging the uptake of parallel computing and HPC systems by new communities by lowering the memory performance barrier.

Maestro has the potential to influence a broad range of human discovery and knowledge, as every computational application relies on data movement.
The project adopted a co-design process for designing the new Maestro middleware. Requirements have been identified and documented for a relevant set of applications, all of which require high-performance computing (HPC) capabilities. On this basis, the core middleware and several system software components were designed, and initial versions were implemented. The system software components include tools for code analysis, a data-aware run-time, a prototype of a Maestro-aware workflow framework, a framework for guided I/O, telemetry tools, and tools for dynamic provisioning of storage.

The main technical achievements of the initial part of the Maestro project are the following:
● Establishment of detailed requirements, justified through relevant HPC use cases, to influence the design decisions for the Maestro middleware
● Specification of the core middleware and its first implementations
● Design of the execution framework architecture as well as development of the access semantics
● Realisation of the first working prototype implementation of the MIO interface
● Specification of demonstrators for the Maestro technology as well as some early prototypes
Maestro is based on a completely new approach to creating a data- and memory-aware middleware layer. The holistic approach to addressing the lack of data- and memory-awareness is in itself novel and represents progress beyond the current state of the art. More specifically, the project worked as planned on advancing the state of the art in the following areas:
● Data models: A new approach has been specified and implemented that provides a much higher abstraction level than existing approaches
● Workflow management: A concept for introducing data- and memory-awareness in existing workflow frameworks has been formulated
● Dynamic Provisioning: A new solution for dynamic provisioning of storage has been implemented and demonstrated
Maestro architecture overview