Periodic Reporting for period 2 - PROCESS (PROviding Computing solutions for ExaScale ChallengeS)
Reporting period: 2019-05-01 to 2020-10-31
The goal of the project is to deliver a comprehensive, mature and modular service-based set of solutions and tools, specially developed to enable extreme-scale data processing in both scientific research and advanced industry settings. All software solutions developed within the project are available to the community as open-source packages.
The final results of PROCESS are a set of services focused on extreme data for exascale systems, driven by the requirements of five representative pilot use cases. For each of them, a service prototype was assembled to demonstrate the usefulness of the PROCESS solutions in real-world settings.
PROCESS is focused on helping key players in the new data-driven ecosystem, such as top-level HPC, e-infrastructures and big data centres on the one hand, and scientific communities and companies with mission critical extreme scale data challenges on the other hand, thereby enabling the uptake of these powerful systems for addressing grand challenges with huge societal impact.
The results of PROCESS can be described as an ecosystem of solutions that already supports the transition to the exascale era while introducing only negligible overhead, which underpins the positive assessment of the project's impact. All PROCESS software is released with manuals and instructions under a permissive open-source license and is available online, including orchestration files to deploy a standalone instance of a subset of the PROCESS services (mini-process) for testing and demonstration purposes, as well as a starting point for any further development.
The tools and services have been developed on the basis of the requirements of five challenging use cases. Each of these used a subset of PROCESS's modular ecosystem, adapted to its needs, to validate and demonstrate the overall architecture. To ensure broad support for other use cases in the future, dissemination further focused on outreach to other communities at scientific and business events.
Thanks to the modular approach of PROCESS, each service can be used on its own or combined with other modules and services. The PROCESS data service is the most important service group developed within the project for extreme data. It was therefore important to constantly revise and evaluate the PROCESS architecture and its implications for the data service, and to leverage new insights at each step to further improve the development.
PROCESS has placed strong emphasis on the reusability and compatibility of each released service. The alpha release therefore already contained a containerised micro-infrastructure, ready to be used standalone or with other services. This micro-infrastructure has been further developed and extended to fulfil the requirements of each use case.
While it is very important for the sustainability of the project's results to ensure the best quality, it is equally important to disseminate these results to new communities that may benefit from the PROCESS ecosystem. The consortium has therefore made a substantial dissemination effort, comprising 84 dissemination activities and 18 publications.
Another focus of the project was evaluation and validation, to ensure the correctness and best possible performance of the developed solutions. The validation approach demonstrated the capability of the ecosystem to scale towards exascale deployments. Starting from the initial performance model of PROCESS, the model has been refined and improved with each subsequent release. It provides solid evidence that the PROCESS ecosystem enhances the usability of large-scale systems and enables applications to scale up and improve their outcomes.
The invocation protocols used by services today are not suitable for transferring significant volumes of data, as they mix the invocation with the actual data transfer. New data delivery models need to be researched in which the invocation protocol is separated from data movement, with the aim of reducing the execution time of workflows, especially in the case of streaming applications. The problem becomes more challenging when data is distributed across research infrastructures (RIs), loosely coupled, and stored in a variety of storage resources ranging from a simple file system to heterogeneous cloud storage.
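The idea of separating invocation from data movement can be illustrated with a minimal sketch: the invocation channel exchanges only small data references, while the bulk bytes move through a storage layer and are fetched out-of-band where they are needed. All names here (`DataRef`, `invoke_compute`, `fetch`, the in-memory storage dictionary) are illustrative assumptions, not PROCESS APIs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataRef:
    """A lightweight handle to data held by some storage backend."""
    backend: str   # e.g. "posix", "s3", "hdfs" -- illustrative only
    locator: str   # path or object key

# Toy in-memory "storage layer" standing in for heterogeneous backends.
STORAGE: dict[tuple[str, str], bytes] = {}

def put(backend: str, locator: str, payload: bytes) -> DataRef:
    """Store a payload and return only a small reference to it."""
    STORAGE[(backend, locator)] = payload
    return DataRef(backend, locator)

def fetch(ref: DataRef) -> bytes:
    """Out-of-band transfer: resolved only when and where data is needed."""
    return STORAGE[(ref.backend, ref.locator)]

def invoke_compute(input_ref: DataRef) -> DataRef:
    """The invocation exchanges references, never the bulk payload itself."""
    data = fetch(input_ref)          # data moves via the storage layer
    result = data.upper()            # stand-in for a real processing step
    return put(input_ref.backend, input_ref.locator + ".out", result)

if __name__ == "__main__":
    ref = put("posix", "/tmp/obs001", b"raw observation bytes")
    out_ref = invoke_compute(ref)    # small message, no payload inline
    print(fetch(out_ref))
```

Because each step passes only a `DataRef`, a workflow engine can chain many such invocations cheaply and schedule the actual transfers separately, which is the property that matters for streaming workloads spanning several storage backends.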
PROCESS outputs are showcased through five use cases, including exascale learning on medical image data, the LOFAR e-infrastructure, airline revenue management, and the validation of long-term agricultural modelling and simulation.
PROCESS's positive impacts are based on three principles: leapfrogging beyond the current state of the art, ensuring broad research and innovation impact, and supporting the long tail of science and broader innovation. In practical terms, PROCESS outputs make exascale data services more intuitive and easier to use for broader communities, fostering wider uptake and seeking to expand European e-infrastructure user bases to secure stronger impact and sustainability.
The use cases not only enhanced the developed services but also contributed to their communities and to the overall challenges:
The diagnostic support tool developed in the medical use case has led to advances in the manipulation and interpretability of medical imaging, with better diagnostic models thanks to the evaluation and interpretation of results from different deep learning techniques.
The PROCESS project has provided a solution for executing containerised workflows. Furthermore, the required large data transfers are fully automated, allowing the LOFAR astronomers to focus on the data processing and the results. PROCESS has added an easy-to-use web portal for selecting LOFAR observation data sets and processing pipelines. By integrating the portal, the user-friendliness and portability of such data processing pipelines have been significantly improved. The use-case-specific code developed here is also useful for the community and will probably serve as a basis for follow-up projects.
The revenue simulation of the airline use case clearly shows that the developed algorithms promise a revenue increase. The results achieved have the potential to be translated into production, yielding more precise offers that bring greater value for customers and more income for the company.
The set of SME-oriented solutions allows for easier agricultural analysis based on Copernicus data sets while protecting the assets of SMEs.