Periodic Reporting for period 2 - ASPIDE (ASPIDE: exAScale ProgramIng models for extreme Data procEssing)
Reporting period: 2019-12-15 to 2021-06-14
The ASPIDE project aims to provide programming models that assist developers in building data-intensive applications for Exascale systems, while ensuring compliance with the requested data management and performance requirements.
• O1. Design and develop new Exascale programming models for extreme data applications.
• O2. Build new tools for monitoring extreme data analytics algorithms and applications.
• O3. Adapt the data management techniques to the extreme scale applications.
• O4. Validate the usefulness of the concepts and tools through extreme data applications.
The project’s main indirect social benefit will come from the development of an energy-aware support system for extreme data processing that will reduce the energy consumed by supercomputing centers (health benefits from less pollution, less CO2 and hence less global warming, and reduced public costs). The project is also expected to provide user-friendly APIs and tools for extreme data application development in the supercomputing field. This offers an excellent opportunity to extend supercomputing to communities that are currently constrained by its complexity, allowing new challenges to be solved and potentially reducing unemployment in the application sectors.
The project activities include experimentation with neuroimaging. The human brain is the least understood organ of the human body: its morphology changes continuously across the life span, and the biological and functional implications of those changes remain unclear. After decades of developments in Magnetic Resonance Imaging (MRI), neuroimaging has become a key technique for the in vivo evaluation and quantification of brain maturation changes related to neurodevelopment, aging, learning or disease. Moreover, the project activities include the development of an urban computing application that is aligned with the efforts to evolve towards smart cities and will provide a demonstrator of how urban data can be exploited for social good.
We provided a model and tool for efficient monitoring data gathering and for the selection of aggregation points, a novel tool for data pre-processing, data filtering and dimensionality reduction, and an engine for analysing monitoring data to detect events. We performed an in-depth evaluation of the tools and models and discussed their performance and behaviour. The above-mentioned tools have been developed and provided as independent modules.
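As an illustration of the kind of pipeline these modules implement, the following minimal sketch (the function names and thresholds are illustrative assumptions, not the project's actual API) reduces the dimensionality of raw monitoring samples and flags events with a simple statistical detector:

```python
import numpy as np

def reduce_dimensionality(samples: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project (time x metric) monitoring samples onto their main principal components."""
    centred = samples - samples.mean(axis=0)
    # SVD-based PCA: keep only the strongest directions of variation.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

def detect_events(series: np.ndarray, threshold: float = 4.0) -> np.ndarray:
    """Flag time steps whose reduced representation deviates strongly from the mean (z-score test)."""
    scores = np.abs(series - series.mean(axis=0)) / (series.std(axis=0) + 1e-9)
    return np.where(scores.max(axis=1) > threshold)[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic monitoring data: 500 time steps x 16 metrics, with one injected anomaly.
    data = rng.normal(size=(500, 16))
    data[250] += 10.0  # simulated event (e.g. a sudden load spike)
    reduced = reduce_dimensionality(data, n_components=3)
    print("events detected at time steps:", detect_events(reduced))
```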
We have developed I/O software for bridging HPC and high-performance data analytics (HPDA). A container paradigm has been built to express any storage object as a list of blocks that can be local or distributed, depending on the scheduler's allocation of jobs. These containers are provided to applications as part of a common middleware (AIDE) that can be used to exploit a computing infrastructure efficiently.
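The sketch below does not reproduce AIDE's real interface; it only illustrates, with assumed names, the idea of a container expressed as an ordered list of blocks whose placement on local or remote nodes follows the scheduler's job allocation:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Block:
    """One fixed-size chunk of a container; placement is decided by the scheduler."""
    block_id: int
    node: str          # node that currently holds the block
    data: bytes = b""

@dataclass
class Container:
    """A storage object expressed as an ordered list of blocks, local or distributed."""
    name: str
    block_size: int
    blocks: List[Block] = field(default_factory=list)

    def write(self, payload: bytes, placement: List[str]) -> None:
        """Split the payload into fixed-size blocks and assign each to a node (round-robin here)."""
        chunks = [payload[i:i + self.block_size] for i in range(0, len(payload), self.block_size)]
        self.blocks = [Block(i, placement[i % len(placement)], chunk) for i, chunk in enumerate(chunks)]

    def read(self) -> bytes:
        """Reassemble the container contents from its blocks, wherever they reside."""
        return b"".join(block.data for block in self.blocks)

    def local_blocks(self, node: str) -> List[Block]:
        """Blocks already co-located with a given compute node (useful for locality-aware scheduling)."""
        return [b for b in self.blocks if b.node == node]

if __name__ == "__main__":
    c = Container("sim-output", block_size=4)
    c.write(b"extreme data payload", placement=["node-0", "node-1"])
    print([(b.block_id, b.node) for b in c.blocks])
    print(c.read())
```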
At the moment, most monitoring tools on HPC systems concern direct measurement of performance metrics. In practice, however, multiple antagonistic objectives must be respected: in some cases, choosing between performance and energy is difficult because improving one degrades the other. To reach an efficient state, any runtime aiming at optimizing energy and performance needs extensive information about the system state and the ability to predict the impact of its decisions. Several tools already exist that estimate metrics for servers or applications without dedicated measurement hardware, using only resource-consumption information at the system level, but current solutions are either imprecise or themselves heavy resource consumers. ASPIDE will therefore also design and implement adapted prediction tools.
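The following sketch is not the ASPIDE prediction tool; it is a minimal illustration, with assumed metric names and synthetic data, of how system-level resource counters can feed a model that estimates power without dedicated measurement hardware, so a runtime can weigh energy against performance before acting:

```python
import numpy as np

def fit_power_model(counters: np.ndarray, measured_power: np.ndarray) -> np.ndarray:
    """Least-squares fit: power is approximated as counters @ weights."""
    weights, *_ = np.linalg.lstsq(counters, measured_power, rcond=None)
    return weights

def predict_power(weights: np.ndarray, counters: np.ndarray) -> np.ndarray:
    """Estimate power for new resource-consumption samples without a power meter."""
    return counters @ weights

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Columns: CPU utilisation, memory bandwidth, I/O rate (illustrative, normalised to [0, 1]).
    train_counters = rng.uniform(size=(200, 3))
    # Synthetic ground truth: an assumed linear relation plus noise, standing in for wattmeter readings.
    true_w = np.array([120.0, 40.0, 15.0])
    train_power = train_counters @ true_w + rng.normal(scale=2.0, size=200)
    w = fit_power_model(train_counters, train_power)
    print("predicted power for a 90%-CPU sample:", predict_power(w, np.array([0.9, 0.3, 0.1])))
```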
Traditionally, I/O is an activity performed before or after the main simulation or analysis computation, or periodically for activities such as checkpointing, resulting in a certain overhead. The resiliency of an application on an Exascale system depends significantly on the I/O system because saving the application's state in the form of checkpoint/restart remains an essential component of global system resilience. We consider that new memory and storage hierarchies can drastically impact performance and resilience, and we will explore the inclusion of substantial non-volatile memory within nodes, interconnects and networks, or within I/O storage nodes. Seamless data migration between the multiple tiers of memory and storage is key to achieving Exascale. However, traditional homogeneous I/O interfaces do not explicitly exploit the purpose of an I/O operation. In ASPIDE we focus especially on providing cross-layer data management and strategies for increasing the performance of applications: in-memory storage, hierarchical locality, and leveraging the data layout.
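As a rough illustration of tiered checkpointing (the paths and tier names below are assumptions, not ASPIDE components), the following sketch writes a checkpoint synchronously to a fast node-local tier and migrates it to the capacity tier in the background, so the slow transfer overlaps with computation:

```python
import pickle
import shutil
import threading
from pathlib import Path

# Assumed tier locations; a real system would use node-local NVM and a parallel file system mount.
FAST_TIER = Path("/tmp/nvm")        # stands in for node-local non-volatile memory
CAPACITY_TIER = Path("/tmp/pfs")    # stands in for the shared parallel file system

def checkpoint(state: dict, step: int) -> threading.Thread:
    """Write the checkpoint to the fast tier, then drain it to the capacity tier in the background."""
    FAST_TIER.mkdir(parents=True, exist_ok=True)
    CAPACITY_TIER.mkdir(parents=True, exist_ok=True)
    fast_file = FAST_TIER / f"ckpt_{step:06d}.pkl"
    with fast_file.open("wb") as f:
        pickle.dump(state, f)            # the application resumes computing as soon as this returns
    flusher = threading.Thread(target=shutil.copy2, args=(fast_file, CAPACITY_TIER / fast_file.name))
    flusher.start()                      # migration to the slower tier overlaps with computation
    return flusher

if __name__ == "__main__":
    t = checkpoint({"iteration": 42, "field": [0.0] * 1024}, step=42)
    t.join()
    print(sorted(p.name for p in CAPACITY_TIER.iterdir()))
```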