CORDIS - EU research results

IO Software for Exascale Architecture

Periodic Reporting for period 1 - IO-SEA (IO Software for Exascale Architecture)

Reporting period: 2021-04-01 to 2022-09-30

IO-SEA aims to provide a novel data management and storage platform for exascale computing based on hierarchical
storage management (HSM) and on-demand provisioning of storage services. The platform will efficiently make use of
storage tiers spanning NVMe and NVRAM at the top all the way down to tape-based technologies. System requirements are
driven by data intensive use-cases, in a very strict co-design approach. The concepts of ephemeral data nodes and data
accessors are introduced, allowing users to flexibly operate the system using various well-known data access paradigms,
such as POSIX namespaces, S3/Swift interfaces, MPI-IO and other data formats and protocols. These ephemeral resources
eliminate the problem of treating storage resources as static and unchanging system components – which is not a tenable
proposition for data intensive exascale environments. The methods and techniques are applicable to exascale class data
intensive applications and workflows that need to be deployed in highly heterogeneous computing environments.
Critical aspects of intelligent data placement are considered for extreme volumes of data. This ensures that the right
resources among the storage tiers are used and accessed by data nodes as close as possible to compute nodes –
optimising performance, cost, and energy at extreme scale. Advanced IO instrumentation and monitoring features will be
developed to that effect, leveraging the latest advancements in AI and machine learning to systematically analyse the
telemetry records and make smart decisions on data placement. These ideas, coupled with in-storage computation, remove
unnecessary data movements within the system.

Problem/issue being addressed:
IO-SEA provides a novel software stack, since the paradigms currently in use may not scale up to exascale. Relying strongly on object store and HSM features,
it aims to provide solutions to the following forthcoming issues:
- Data Scalability: managing an ever-increasing amount of data and metadata
- System Scalability: managing ever larger client systems
- Data Heterogeneity: managing the wide variety of data types and providing well-adapted solutions for each
- Data Placement: as different kinds of data and different kinds of storage media exist, it is important to place each piece of data in the right place
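
To make the data placement issue concrete, the following is a minimal sketch of a rule-based tier selection, not the IO-SEA implementation. NVRAM, NVMe and tape are the tiers named in this summary; the intermediate "disk" tier, the `DataObject` fields, the `choose_tier` function and every threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Tier names, fastest first. NVRAM, NVMe and tape come from the summary;
# "disk" is an assumed intermediate tier added for illustration only.
TIERS = ["nvram", "nvme", "disk", "tape"]


@dataclass
class DataObject:
    name: str
    size_gib: float        # object size in GiB
    reads_per_day: float   # recent access frequency (hypothetical metric)


def choose_tier(obj: DataObject) -> str:
    """Pick a storage tier with toy thresholds; all numbers are assumptions."""
    if obj.reads_per_day >= 100 and obj.size_gib <= 1:
        return "nvram"     # small and very hot
    if obj.reads_per_day >= 10:
        return "nvme"      # hot
    if obj.reads_per_day >= 0.1:
        return "disk"      # warm
    return "tape"          # cold, archive


if __name__ == "__main__":
    for obj in [
        DataObject("checkpoint-latest", 0.5, 500),
        DataObject("mesh-input", 40, 25),
        DataObject("run-archive", 900, 0.01),
    ]:
        print(obj.name, "->", choose_tier(obj))
```

A fixed rule table like this is only a baseline; the project summary describes replacing such static decisions with recommendations derived from collected telemetry.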

Why is it important for society?
Exascale is critical to society. It will bring the compute power to simulate and anticipate the coming challenges
that our society will face. Global warming is a well-known example. In order to work correctly, exascale systems require efficient storage: this is IO-SEA's objective.

What are the overall objectives?
Provide an efficient and flexible I/O software stack for future exascale systems.
Work done from M1 to M18

The IO-SEA project is on track. The overall design of the solution is now established; it is the concrete result of active collaboration between the work packages, including all the technical WPs and WP1, which embeds representatives of the user community. The design therefore reflects the users' requirements as addressed by the system designers.

Interfaces between the users and the IO-SEA stack have been defined. The way data will be organized and exposed to the user, via the ephemeral services, is clear. A few ephemeral services are now implemented and ready to use: in particular, we have an NFS server with full POSIX compliance and a burst buffer tool. These ephemeral services can be associated with a compute job via an IO framework that allocates data nodes and starts the ephemeral services. The HSM feature in the object stores involved, Phobos and MOTR, is managed by HESTIA, a new API dedicated to that purpose. The robinhood tool has been updated to help the HSM feature place data at the right location. Regarding the user interface, a first version of DASI, a new flexible and advanced middleware library, has been provided. The whole system is monitored by an integrated supervision system designed for that purpose. This system has started to collect data, which will be used in the second half of the project by a "recommendation system", based on AI technologies, to help optimize the systems and improve data placement.

The basic building blocks are now ready, and we are starting the complete integration. The early integration steps are done and are quite promising.

The team members have contributed to scientific publications and to major HPC events, including BoF sessions at SC21, ISC22 and SC22.
IO-SEA now provides a few items far beyond the state of the art.
The ephemeral service feature has no equivalent at the moment. Used with burst buffers, it extends this feature and makes it more flexible.
Work on data placement using AI is about to start within IO-SEA. Considering the I/O domain as a whole, this kind of technology, at the crossroads of several domains, may be a real game changer.

By the end of the project, we expect to provide a fully integrated IO software stack to be used on any exascale system. Deploying IO-SEA on the EUPEX pilot is one of these objectives; it is part of the EUPEX project.
As the integration of the IO-SEA stack will be performed at FZJ, on the DEEP test cluster, we'll have a concrete implementation and usage of IO-SEA on this platform.

Potential impacts: the impacts of IO-SEA overlap with those of exascale computing in general. Bigger supercomputers mean that larger problems can be solved. This includes many of the issues that our society will face in the near future. Some economic domains are impacted (finance, logistics, industrial process optimization), while others connect directly to everyday life (medicine and ecology, for example).