Novel system software approach for more powerful supercomputers
Software for managing today’s supercomputers has historically been designed by several actors in an uncoordinated manner, which is an obstacle to increasing the scale of current systems. The EU-funded CLARISSE (Cross-layer abstractions and run-time for I/O software stack of extreme-scale systems) project sought to boost the performance, scalability, programmability and robustness of data management for parallel scientific applications. This work will contribute to the design of large-scale parallel computing infrastructures that are two orders of magnitude faster than current supercomputers.

Project partners investigated, designed and implemented control mechanisms for the cross-layer dissemination of application hints, run-time feedback and notifications, and for shipping input/output (I/O) functionality throughout the I/O software stack. They built a prototype control backplane for use as a publish/subscribe system, and a distributed monitoring infrastructure that disseminates, filters and aggregates arbitrary numbers of metrics system-wide.

To improve the scalability and resilience of the I/O software stack, researchers explored algorithms, and designed and implemented mechanisms and policies, for the adaptive control of the storage I/O data path. These include a buffering substrate and a set of novel abstractions for the data plane. They also implemented two collective I/O methods whose data staging can be controlled by the backplane.

The CLARISSE team studied and developed techniques for exposing and exploiting data locality throughout the I/O software stack in order to reduce storage I/O traffic and improve performance. Results show that trading off data locality against computational load balance can substantially reduce I/O traffic while improving performance and scalability over existing practice. The project also introduced locality-aware scheduling policies for scientific workflows that allow both data and task placement to be controlled with various degrees of strictness.
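The article does not describe how these scheduling policies are implemented. As a purely illustrative sketch of the general idea, the Python snippet below places each task on the node that holds its input data unless that node has become too loaded. The function `place_tasks`, the `slack` parameter and the node names are hypothetical and are not taken from CLARISSE. A `slack` of `None` stands for strict locality, while smaller integer values relax locality in favour of load balance.

```python
def place_tasks(tasks, nodes, slack=None):
    """Assign tasks to nodes, trading data locality against load balance.

    tasks: list of (task_id, data_node) pairs; data_node holds the task's input.
    nodes: list of node names.
    slack: None keeps every task on its data node (strict locality).
           An integer allows the data node to hold at most `slack` more
           tasks than the least-loaded node before a task is moved away.
    Returns (placement, load, remote_reads).
    """
    load = {node: 0 for node in nodes}
    placement = {}
    remote_reads = 0  # tasks that must fetch their input from another node

    for task_id, data_node in tasks:
        target = data_node
        if slack is not None:
            least_loaded = min(nodes, key=lambda n: load[n])
            # Give up locality only when the imbalance exceeds the slack.
            if load[data_node] - load[least_loaded] > slack:
                target = least_loaded
        if target != data_node:
            remote_reads += 1
        load[target] += 1
        placement[task_id] = target

    return placement, load, remote_reads


if __name__ == "__main__":
    # Hypothetical workload: most input data sits on node n0.
    nodes = ["n0", "n1", "n2"]
    tasks = [("t0", "n0"), ("t1", "n0"), ("t2", "n0"),
             ("t3", "n0"), ("t4", "n1"), ("t5", "n2")]
    for slack in (None, 2, 0):
        _, load, remote = place_tasks(tasks, nodes, slack)
        print(f"slack={slack}: load={load}, remote_reads={remote}")
```

In this toy workload, strict locality causes no remote reads but leaves one node with four of the six tasks, whereas a slack of 0 balances the load perfectly at the cost of two remote reads. A real policy would also have to weigh data sizes, task durations and network cost, none of which are modelled here.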
Coordinating data management across different system layers has become feasible thanks to CLARISSE. The technologies developed could help advance the performance and scalability of parallel scientific and engineering applications in fields such as climate modelling, material design, astrophysics, genetics and bioengineering.
Keywords
CLARISSE, cross-layer abstractions, run-time, software stack, extreme-scale systems