Objective
The objective of FTMPS is to develop techniques and system software that tolerate component failures in massively parallel computers, so that application code can run for extremely long periods in settings where a real-time response is not required.
A transputer-based system featuring redundant processor nodes, a fault-tolerant communications network architecture and an independent network of control processors provides the environment for the work.
The project examines:
concurrent failure detection on a node and system basis
checkpointing and restart of applications
post-failure recovery behaviour
quantitative failure modelling.
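As an illustration of the checkpointing and restart item above, here is a minimal single-process sketch in Python. It is not the FTMPS implementation: the function names (save_checkpoint, load_checkpoint, run), the pickle file format and the simulated failure are all assumptions made for this example. It only shows the general idea that a long computation periodically saves its state and, after a failure, resumes from the last saved state instead of starting over.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Save state atomically: write a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace is atomic on the same filesystem, so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, checkpoint_every, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing every `checkpoint_every` steps.

    `fail_at` is a hypothetical hook that simulates a component failure at a given step.
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

In this sketch, a first call to run that fails partway through loses only the work done since the last checkpoint. A second call with the same path picks up from the saved step and returns the same result as an uninterrupted run.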
Topic(s)
Data not available
Call for proposal
Data not available
Funding Scheme
Data not available
Coordinator
52072 Aachen
Germany
Participants (5)
PR1 1RE Preston
91058 Erlangen
3000 Leuven
3000 Coimbra
33098 Paderborn