Objective
The objective of FTMPS is to develop techniques and system software capable of accommodating component failures in massively parallel computers in order to permit extremely long executions of application code, where a real-time response is not required.
A transputer-based system featuring redundant processor nodes, a fault-tolerant communications network architecture, and an independent network of control processors provides the environment for the work.
The project examines:
concurrent failure detection on a node and system basis
checkpointing and restart of applications
post-failure recovery behaviour
quantitative failure modelling.
Field of science
Topic(s)
Data not available
Call for proposal
Data not available
Funding scheme
Data not available
Coordinator
52072 Aachen
Germany