Exascale is characterised not just by exaflop computational capability, but also by the massive volumes of data generated by simulations running on such systems and, increasingly, by large scientific experiments, crowdsourcing, and expanding sensor networks. Such data must be analysed to derive the insights that drive innovation and understanding across a wide spectrum of domains, including physics, computational biology, neuroscience, pharmaceutics, energy, and industrial manufacturing, all of which are critical for scientific and technological progress in society. The SAGE project, which incorporates research and innovation in hardware and enabling software, will significantly improve the performance of data access and enable computation and analysis to be performed closer to the data, wherever it resides in the architecture, drastically minimising data movement between compute and data storage infrastructures. By presenting a seamless view of data throughout the platform, across multiple tiers of storage from memory to disk to long-term archives, it will allow Application Programming Interfaces and programming models to use such a platform easily and to apply the data analytics techniques best suited to the problem space.
The following are the overall objectives of the SAGE project:
• Provides a next-generation multi-tiered object-based data storage system (hardware and enabling software) supporting current and future-generation persistent storage media (solid-state and disk) within an I/O hierarchy. We term this “Percipient Storage”.
• The project:
o Redefines the storage subsystem as an integral part of the computational infrastructure.
o Provides integrated computational capability anywhere in the storage system.
• Significantly improves overall scientific output through advances in system-wide I/O performance and latency, and drastically reduces data movement, thereby improving energy efficiency, by:
o providing the ability to flexibly move appropriate computational workloads to where the data resides
o providing a storage architecture built from the ground up to handle Exascale I/O
o providing the potential to use resources in the computational cluster as part of the storage system
• Provides a roadmap of technologies supporting data access for both Exascale/Exabyte and High Performance Data Analytics (HPDA) requirements:
o Targeting scalability to 500-1000 PB, with bandwidth on the order of 60 TB/s and a storage system energy footprint of less than approximately 5 kW per petabyte;
o Enabling flexible and efficient use of HPDA application environments regardless of the compute node’s architecture and implementation.
• Investigates and documents the requirements of relevant HPC applications and their storage use cases as part of a co-design approach.
• Provides programming models and access methods for the SAGE architecture and validates their usability, including (but not limited to) legacy applications and ‘Big-Data’ data access and analysis methods.
• Validates the full system in a relevant environment, for a relevant set of applications and benchmarks on a SAGE prototype integrated into an HPC data centre, assessing performance, scalability, energy efficiency and the reduction in data transport requirements.
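As a rough illustration of what the scalability targets listed above imply, the following Python sketch works out two derived figures: the upper bound on storage power at the stated footprint, and the idealised minimum time to read the full capacity once at the stated bandwidth. The function names and the decimal unit convention (1 PB = 1000 TB) are assumptions made for this example and are not part of the SAGE specification; the inputs are only the target values quoted in the objectives.

```python
# Back-of-the-envelope figures derived from the stated SAGE targets.
# Assumptions (not from the source): decimal units (1 PB = 1000 TB),
# and the hypothetical helper names below.

def storage_power_kw(capacity_pb, kw_per_pb=5.0):
    """Upper bound on storage power (kW) at the target footprint."""
    return capacity_pb * kw_per_pb


def full_sweep_seconds(capacity_pb, bandwidth_tb_per_s=60.0):
    """Idealised time (s) to read the whole capacity once at full bandwidth."""
    return capacity_pb * 1000.0 / bandwidth_tb_per_s


if __name__ == "__main__":
    for capacity_pb in (500, 1000):
        power_mw = storage_power_kw(capacity_pb) / 1000.0
        hours = full_sweep_seconds(capacity_pb) / 3600.0
        print(f"{capacity_pb} PB: <= {power_mw:.1f} MW, full read >= {hours:.1f} h")
```

Under these assumptions, the 500-1000 PB range corresponds to a storage power budget of at most roughly 2.5 to 5 MW, and a single full pass over the data would take at least about 2.3 to 4.6 hours even at 60 TB/s, which illustrates why performing computation close to the data matters at this scale.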
Once accomplished, these objectives will firmly establish European excellence in the areas of Exascale storage, data centric computing, HPDA, and the emerging field of Big Data Extreme Computing (BDEC), and significantly impact computational scientific research.