Verified Exascale Computing for Multiscale Applications

Periodic Reporting for period 1 - VECMA (Verified Exascale Computing for Multiscale Applications)

Reporting period: 2018-06-15 to 2019-12-14

VECMA uses computer simulations to predict weather and climate change, model refugee movements, understand materials, develop nuclear fusion, and inform medical decisions. But if we are to use simulations to address real-world problems, those simulations need to be reliable and trustworthy. In more technical terms, they need to be validated and verified, and their uncertainty needs to be quantified, so that they successfully model real-life applications and are dependable decision-making tools.

VECMA is developing a software toolkit, called VECMAtk, to enable automated validation, verification, and uncertainty quantification (VVUQ) of computer simulations. While these tools are currently deployed on a number of in-house applications, they are applicable more widely and are independent of scientific domain. More broadly, VECMA aims to create a unified European VVUQ package that computer simulations can be benchmarked against. To that end, VECMAtk has been made open-source and widely available in European high-performance computing (HPC) centres. Regular version updates have been, and will continue to be, released over the lifetime of the project (which runs until June 2021), by which time the goal is to have created a legacy that will sustain itself beyond the end of the project.
The core activity of the project has been the design and development of new algorithms that perform multiscale verification, validation and uncertainty quantification (VVUQ) across time and space scales, and the capture of these algorithms in VV and UQ primitives. During the first half of the project, a family of semi-intrusive UQ algorithms has been developed. These algorithms are non-intrusive at the single-scale level, yet intrusive at the multiscale level, in that they act on the coupling of the single-scale components.
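To illustrate what "non-intrusive" means at the single-scale level, the following is a minimal sketch of Monte Carlo uncertainty propagation, in which the model is only ever called as a black box. It does not use VECMAtk or reproduce any of the project's algorithms; the toy model, the input distributions, and all function names are hypothetical and chosen purely for illustration.

```python
import random
import statistics


def toy_model(growth_rate, initial_value, steps=10):
    """Hypothetical single-scale model: compound growth over a fixed number of steps."""
    value = initial_value
    for _ in range(steps):
        value *= 1.0 + growth_rate
    return value


def sample_inputs(rng):
    """Draw one set of uncertain inputs (the distributions are assumptions)."""
    return {
        "growth_rate": rng.uniform(0.01, 0.05),
        "initial_value": rng.gauss(100.0, 5.0),
    }


def monte_carlo_uq(model, sampler, n_samples, seed=0):
    """Non-intrusive UQ: run the unmodified model on sampled inputs and summarise the outputs."""
    rng = random.Random(seed)
    outputs = [model(**sampler(rng)) for _ in range(n_samples)]
    return {
        "mean": statistics.mean(outputs),
        "stdev": statistics.stdev(outputs),
        "min": min(outputs),
        "max": max(outputs),
    }


result = monte_carlo_uq(toy_model, sample_inputs, n_samples=1000)
print(result)
```

Because the sampling loop never looks inside `toy_model`, the same driver could in principle wrap any simulation that maps named inputs to an output. A semi-intrusive approach would additionally change how coupled single-scale models exchange data, which this sketch does not attempt.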

The core algorithms have been embedded in a software suite, VECMAtk, which enables automated execution of VVUQ, deployable on exascale platforms. We have achieved a series of releases of VECMAtk leading up to the M18 release, which is made up of the following tools:
FabSim3: An automation toolkit for complex simulation tasks. FabSim3 helps users to perform complex remote tasks from a local command-line, and to automatically organise their data and environment variables when they perform these tasks.
EasyVVUQ: A Python library designed to facilitate VVUQ for a wide variety of simulations.
QCG Pilot Job: A lightweight implementation of the Pilot Job mechanism. It can be easily incorporated into scientific workflows to provide efficient and reliable execution of a large number of computational jobs.
QCG-Now: a portable desktop program that allows the preparation and running of computational jobs on HPC machines.
QCG-Client: a command-line interface to the QCG middleware. QCG-Client provides support for a variety of computing jobs, from simple ones to complex distributed workflows.
EasyVVUQ-QCGPilotJob: a lightweight integration code that simplifies usage of EasyVVUQ with a QCG Pilot Job execution engine.
MUSCLE3: The third incarnation of the Multiscale Coupling Library and Environment. Its purpose is to make it easy to create coupled multiscale simulations, and to enable efficient Uncertainty Quantification of such models using advanced semi-intrusive algorithms.
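The Pilot Job idea mentioned in the list above can be sketched in general terms: a set of resources is acquired once, and many small tasks are then scheduled inside it, rather than each task being submitted to the queue separately. The example below is only an analogy built on Python's standard library. It does not use the QCG Pilot Job API, and the task function and names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_task(task_id, parameter):
    """Hypothetical stand-in for one simulation run in a VVUQ ensemble."""
    return task_id, parameter ** 2


def run_in_pilot(tasks, n_workers=4):
    """Run many small tasks inside one fixed pool of workers.

    The pool plays the role of the single resource allocation;
    tasks are handed to whichever worker becomes free.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_task, tid, p) for tid, p in tasks.items()]
        for future in as_completed(futures):
            task_id, output = future.result()
            results[task_id] = output
    return results


tasks = {f"run_{i}": float(i) for i in range(20)}
results = run_in_pilot(tasks)
print(len(results), results["run_3"])
```

On an HPC system the workers would be cores or nodes of a single batch allocation, so an ensemble of runs waits in the queue once instead of once per run.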

The attached figure shows how the different tools are combined when using each of the four application tutorials. VECMAtk components are given in boxes, while the application tutorials are indicated with coloured lines. Note that EasyVVUQ can run either on a local desktop, for ease of use, or on a remote HPC resource, for improved performance.

All of the aforementioned work has further guided the development of a number of applications, including climate, fusion, migration and materials. The VECMA applications team has focused on three main lines: 1. Establishing application software readiness. 2. Fast-track implementation of (non-intrusive) VVUQ into applications. 3. Deep-track implementation of (intrusive) VVUQ into applications. The first two are already complete, while the third is ongoing and on schedule.

Lastly, we have made considerable progress in designing and developing the project’s technical architecture and infrastructure, and in building the testbed, which has integrated SuperMUC-NG, a Tier-0 supercomputer at the Leibniz Supercomputing Centre (LRZ), Germany. The testbed provides the hardware and software systems that the project uses to execute multiscale applications and to perform VVUQ on them. Work on the infrastructure side of the project has resulted in the first prototype of the VECMA computing environment.
VECMA methods have been used in a number of cases that have advanced the state of the art in computational science. For example, accurate assessments of the uncertainties in protein-ligand free-energy calculations have been made possible through the use of VECMAtk. In the application domain of migration, VECMA has facilitated considerable developments in better understanding refugee dynamics and relating them to food security. These are just two examples of the tangible impact of VECMA in science and society. In terms of more theoretical progress beyond the state of the art, one example is the project’s contribution within computer science to a method for queue wait-time prediction in supercomputing clusters. The method was designed for use as part of a multi-criteria brokering mechanism for resource selection in a multi-site HPC environment.

Having established the main UQPs, we are now developing and testing performance models, as well as developing more advanced UQPs. Moving into the next phase of VECMA also means that we will start to investigate formal methods for validation and verification of multiscale models. In terms of software implementation, we are currently working on the deep-track implementation of UQPs and VVPs, on optimizing the scalability of VECMAtk, and on assembling combined VVUQ procedures and automation. This is synchronized with the applications development side of the project, where we are currently targeting the deep-track VVUQ techniques and their implementation. This mainly concerns the semi-intrusive algorithms and the definition of a toolkit for surrogate models.

We expect this project to have tangible technical as well as societal impact, owing to its versatility and wide applicability. Upcoming exascale systems offer tremendous opportunities for computational science; however, important algorithmic and technological challenges remain. It is these challenges that VECMA promises to overcome in order to fully exploit these emerging opportunities and enable a paradigm shift to exascale computing. Through the realisation of automated UQ and accelerated V&V by application developers worldwide, as well as through influencing next-generation compute architectures, the fidelity of simulations will improve irrespective of the application domain, leading to industrial and societal impact. Systematic dissemination, outreach and training associated with the releases of VECMAtk will create impact by raising awareness of the case for high-fidelity exascale computing in multiple sectors. Our innovation management plan was produced to provide a framework to capture innovation activities in the project and to promote interdisciplinary entrepreneurial opportunities within the research activities, from invention through to exploitation.
How the VECMAtk tools are combined when using each of the four application tutorials