
Cross-Layer Abstractions and Run-time for I/O Software Stack of Extreme-scale systems

Final Report Summary - CLARISSE (Cross-Layer Abstractions and Run-time for I/O Software Stack of Extreme-scale systems)

In recent decades, scientific discovery has relied heavily on computational simulations, mostly executed as parallel applications on large-scale supercomputers. As more and more data becomes available through better instruments, the growing ubiquity of sensors, and increasing connectivity, the requirements of these applications have shifted. Today, we need to fundamentally redesign the infrastructure and the operating system software for the new age of data-intensive science.

The CLARISSE research project (Cross-Layer Abstractions and Run-time for I/O Software Stack of Extreme-scale systems) investigates how to lay new foundations for data management in the large-scale supercomputers of data centers around the world. University Carlos III of Madrid (UC3M) coordinates the project in cooperation with Argonne National Laboratory (USA), one of the key actors in research and development of system software for large-scale parallel supercomputers. The technologies developed in this project could be applied to significantly advance the performance and scalability of parallel scientific and engineering applications in areas such as climate modeling, new material design, astrophysics, genetics, and bioengineering.

The goal of the CLARISSE project is to increase the performance, scalability, programmability, and robustness of data management of parallel scientific applications in order to actively support the current interest in designing large-scale parallel computing infrastructures that are two orders of magnitude faster than today’s supercomputers. The key challenge is that the software for managing today’s supercomputers has been historically designed by several actors in an uncoordinated manner; today, this approach is a major obstacle to increasing the scale of current systems. The CLARISSE project is exploring a new solution to this problem by designing novel mechanisms and abstractions for coordinating data management at different system layers.

To achieve this broad goal, the CLARISSE project focused on three main research objectives: 1) to investigate, design, and implement control mechanisms for cross-layer dissemination of application hints, run-time feedback, notifications, and shipping of I/O functionality throughout the I/O software stack; 2) to explore algorithms and to design and implement mechanisms and policies for the adaptive control of the storage I/O data path, in order to improve the scalability and resilience of the I/O software stack; 3) to study and develop techniques for exposing and exploiting data locality throughout the I/O software stack, in order to reduce storage I/O traffic and improve performance.

In an initial exploratory phase, we identified, studied, and executed a number of benchmarks and applications from various scientific domains that generate or access large amounts of data through the common HPC I/O storage stack. To capture metrics at different layers of the I/O software stack, we leveraged Darshan, a tool for application-level I/O characterization developed at Argonne. Using domain knowledge and correlation analysis, we significantly reduced the number of performance counters that had to be considered. We analyzed the performance of the current I/O stack and identified a series of key problems that impact the scalability of existing solutions. Based on this study, we drew a set of conclusions that guided the design of novel abstractions and a novel run-time for the I/O storage stack. We group these conclusions into four main categories: interference, file system access semantics, network topology, and storage hierarchy performance.

We developed both analytical and machine learning models of data movement in the existing storage I/O stack. We concentrated our efforts on collective I/O operations, as they already represent state-of-the-art optimizations in the storage I/O stack. In particular, we studied the data flow in the most widely used collective I/O implementation, two-phase I/O. We were interested in understanding the following aspects: 1) the performance bottlenecks on current large machines; 2) the performance limits when tuning for best-effort configurations; 3) the impact of interference from other applications on performance.
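To make the data flow of two-phase I/O concrete, the following is a minimal in-memory sketch of a collective write. It is an illustration only, not code from the project: the function name `two_phase_write`, the equal-sized file domains, and the dictionary of per-rank requests are all assumptions chosen for brevity. Real implementations exchange data over the network and write to a parallel file system.

```python
from typing import Dict, List, Tuple

# A write request: (file offset, payload bytes) issued by one process.
Request = Tuple[int, bytes]


def two_phase_write(requests_by_rank: Dict[int, List[Request]],
                    num_aggregators: int,
                    file_size: int) -> bytearray:
    """Simulate a collective write with two-phase I/O in memory.

    Assumes every request lies entirely within [0, file_size).
    """
    # Split the file into contiguous, equally sized file domains,
    # one per aggregator (ceiling division so the whole file is covered).
    domain_size = -(-file_size // num_aggregators)

    # Phase 1 (shuffle): each process sends every piece of its data to the
    # aggregator that owns the file domain containing that piece.
    aggregator_buffers: List[List[Request]] = [[] for _ in range(num_aggregators)]
    for requests in requests_by_rank.values():
        for offset, payload in requests:
            pos = 0
            while pos < len(payload):
                start = offset + pos
                agg = start // domain_size
                domain_end = (agg + 1) * domain_size
                # Cut the payload at the domain boundary if it crosses it.
                chunk = payload[pos:pos + (domain_end - start)]
                aggregator_buffers[agg].append((start, chunk))
                pos += len(chunk)

    # Phase 2 (I/O): each aggregator orders its pieces by offset and writes
    # them, so storage sees accesses confined to one contiguous domain.
    file_image = bytearray(file_size)
    for pieces in aggregator_buffers:
        for start, chunk in sorted(pieces):
            file_image[start:start + len(chunk)] = chunk
    return file_image
```

In this sketch, four processes that each write small strided pieces end up producing only two domain-local write streams, which is the effect the shuffle phase is meant to achieve. The shuffle is also the step where the communication cost and the sensitivity to interference arise.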

To address the first objective, we worked on the design, implementation, and evaluation of a novel cross-layer control backplane. After investigating various options for the overall software architecture, we opted for a design with three separate layers: a control backplane, a data backplane, and a policy layer. The CLARISSE control backplane acts as a coordination framework that aims to support the global improvement of key aspects of data staging, including load balance, I/O scheduling, and resilience. A prototype of the CLARISSE backplane has been implemented as a publish/subscribe system. Furthermore, we have designed and implemented a distributed monitoring infrastructure that can be used for disseminating, filtering, and aggregating an arbitrary number of metrics system-wide.
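The publish/subscribe pattern behind such a control backplane can be sketched in a few lines. The class below is a toy, single-process illustration and not the CLARISSE API: the names `ControlBackplane`, `subscribe`, `publish`, the topic string, and the event fields are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

# A handler receives the topic name and the published event.
Handler = Callable[[str, Any], None]


class ControlBackplane:
    """Toy in-process publish/subscribe bus (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every event on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> int:
        """Deliver an event to all subscribers; return how many were notified."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, event)
        return len(handlers)


# Usage: a policy subscribes to load reports published by a monitor.
backplane = ControlBackplane()
overloaded: List[str] = []


def load_policy(topic: str, event: Any) -> None:
    # Hypothetical policy: remember servers whose reported load exceeds 90%.
    if event["load"] > 0.9:
        overloaded.append(event["server"])


backplane.subscribe("io_server_load", load_policy)
backplane.publish("io_server_load", {"server": "ios3", "load": 0.95})
backplane.publish("io_server_load", {"server": "ios1", "load": 0.40})
```

The point of the pattern is decoupling: the monitoring component that publishes load reports does not need to know which policies consume them, so new policies can be added without touching the layers that produce the events.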

To address the second objective, we have designed and implemented a buffering substrate for the data plane, which can be deployed at various stack layers as an adaptive buffering mechanism. Additionally, we have proposed a set of novel abstractions for the data plane: targets, data distributions, I/O contexts, I/O tasks, and task queues. Based on these abstractions and on the buffering substrate, we have implemented two collective I/O methods whose data staging can be controlled by the CLARISSE backplane. On top of the data and control planes, we have illustrated the capabilities of CLARISSE through two policy case studies: an elastic collective I/O policy and a parallel I/O scheduling policy.

To address the third objective, we followed a co-design approach to show that, by trading off data locality against computational load balance, we can substantially reduce I/O traffic and improve performance and scalability over existing practice. This involved the parallel, coordinated development of both Swift, a workflow management system, and CLARISSE data locality management, in collaboration with Argonne researchers. The result of these efforts was a set of novel locality-aware scheduling policies for scientific workflows. These new policies allow for the control of both data and task placement with various degrees of strictness: hard placement, soft placement (best effort), and no control. In addition to the contributions for the scientific computing community, our approach also represents a useful step toward the expected convergence between HPC and big data.
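The three degrees of strictness can be illustrated with a small placement function. This is a sketch under stated assumptions, not the policy implemented in Swift or CLARISSE: the `Placement` enum, the `place_task` function, the per-node task counts, and the `max_load` threshold are all hypothetical.

```python
from enum import Enum
from typing import Dict


class Placement(Enum):
    HARD = "hard"   # the task must run on the node that holds its data
    SOFT = "soft"   # prefer the data's node, fall back if it is too busy
    NONE = "none"   # ignore locality and balance load only


def place_task(data_node: str, load: Dict[str, int], mode: Placement,
               max_load: int = 4) -> str:
    """Pick a node for a task whose input data resides on `data_node`.

    `load` maps node names to their current number of queued tasks;
    `max_load` is an assumed threshold above which a node counts as busy.
    """
    least_loaded = min(load, key=load.get)
    if mode is Placement.NONE:
        return least_loaded
    if mode is Placement.HARD:
        return data_node
    # SOFT: keep locality unless the data's node is at or above the threshold.
    return data_node if load[data_node] < max_load else least_loaded
```

With hard placement the task never moves, which avoids I/O traffic but can leave other nodes idle. With no control the scheduler balances load but may have to move the data. Soft placement sits in between, which is the trade-off between data locality and load balance described above.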

Research infrastructures are key components of Europe’s competitiveness in research, as research excellence requires excellent infrastructures. One of the main European infrastructure-related priorities is addressing the challenges of performing scientific computing on Exaflop machines. The results of the CLARISSE project are expected to smooth the way toward Exaflop scalability by providing cross-layer mechanisms and a run-time for storage I/O. These will facilitate the redesign of I/O stack layers for scalability by addressing the expected increases in concurrency and storage hierarchy depth. CLARISSE has provided a framework intended to facilitate the redesign of the I/O software stack and the development of highly scalable cross-layer optimizations.

The CLARISSE project created valuable knowledge of high relevance for research and industrial collaboration networks such as ETP4HPC, HiPEAC, and the NESUS COST Action. The participation of the fellow’s group at University Carlos III in these initiatives will help ensure that this knowledge is adequately applied in designing the High Performance Computing systems of tomorrow.

More details: