CORDIS - EU research results
Event category

Event
Content archived on 2022-07-06

Webinar: FTI: Using the state-of-the-art multi-level checkpointing library

This webinar will focus on how to guarantee high reliability for high-performance applications running on large infrastructures. In particular, the speakers will cover the technical content needed to implement scalable multilevel checkpointing for tightly coupled applications. This will include an overview of the internals of the FTI library and an explanation of how multilevel checkpointing is implemented today, together with examples that the audience can test and analyze on their own laptops.

Energy
1 April 2020
Barcelona, Spain
© EU
Date: April 1st, 2020, 11 AM

Large-scale infrastructures for distributed and parallel computing offer their users thousands of computing nodes. As demand for massively parallel computing grows in industry and development, cloud infrastructures and computing centers are being forced to increase in size and to transition to new computing technologies.

While the advantage for users is clear, this evolution poses significant challenges, such as energy consumption and fault tolerance. Fault tolerance is even more critical in infrastructures built on commodity hardware. Recent studies have shown that large-scale machines built with commodity hardware experience more failures than previously thought.

In this webinar, Leonardo Bautista Gomez and Kai Keller, respectively Senior Researcher and Software Engineer at the Barcelona Supercomputing Center, will focus on how to guarantee high reliability for high-performance applications running on large infrastructures. In particular, they will cover the technical content needed to implement scalable multilevel checkpointing for tightly coupled applications. This will include an overview of the internals of the FTI library and an explanation of how multilevel checkpointing is implemented today. The session also provides examples that the audience can test and analyze on their own laptops, so that they learn how to use FTI in practice and can ultimately transfer that knowledge to their production systems.

Speakers: Leonardo Bautista Gomez, Senior Researcher at BSC, and Kai Keller, Software Engineer at BSC

Registration URL: https://attendee.gotowebinar.com/register/2628884328056255755

Keywords

energy, hpc, highperformance, fti, applications, eocoe