A Practical Approach to Fault-Tolerant Massively Parallel Systems

Objective

The objective of FTMPS is to develop techniques and system software that accommodate component failures in massively parallel computers, permitting extremely long executions of application code where a real-time response is not required.

A transputer-based system featuring redundant processor nodes, a fault-tolerant communications network architecture and an independent network of control processors provides the environment for the work.

The project examines:

concurrent failure detection on a node and system basis
checkpointing and restart of applications
post-failure recovery behaviour
quantitative failure modelling.

Coordinator

Parsytec Anwendungen GmbH
Address
Roermonder Straße 197
52072 Aachen
Germany

Participants (5)

British Aerospace Defence Ltd
United Kingdom
Address
Guild Centre Lords Walk
PR1 1RE Preston

Friedrich-Alexander-Universität Erlangen Nürnberg
Germany
Address
Martensstraße 3
91058 Erlangen

KATHOLIEKE UNIVERSITEIT LEUVEN
Belgium
Address
Tervuursevest, 101
3000 Leuven

UNIVERSIDADE DE COIMBRA
Portugal
Address
Largo Marques De Pombal
3000 Coimbra

UNIVERSITAET - GESAMTHOCHSCHULE PADERBORN
Germany
Address
Warburger Strasse 100
33098 Paderborn