Final Report Summary - DEPENDABLECLOUD (Towards the dependable cloud: Building the foundations for tomorrow's dependable cloud computing.)
The project originally proposed two main research and technological advances, each addressing one of the following research problems.
The first was to find principled ways to handle arbitrary, non-crash faulty behavior without having to account for adversarial faults, thereby overcoming the overly pessimistic assumptions of the Byzantine fault model.
The central achievement of this first line of work was the definition of a new system and fault model, the Visigoth model (first published at EuroSys 2015), which meets this goal. It has the potential to start a new line of consensus-based systems better suited to data center and cloud computing environments, and it has already sparked some follow-up research activity.
The second research objective was to find methods that allow replicated systems, such as those underlying today's cloud services, to execute operations quickly in the common case: operations proceed optimistically at a single replica, and more expensive coordination is used only when needed to avoid the undesired effects of its absence.
The initial and pivotal achievement here was the definition of a new consistency model, called RedBlue consistency, together with an associated methodology for building replicated systems that are fast whenever possible. Such systems use low-coordination Blue operations whenever this is deemed safe, and require strongly consistent Red operations only when needed to avoid violating application invariants. Several later results of the project improve on this one, for example by using program analysis techniques to automate the choice between Red and Blue operations, or by making the consistency model more fine-grained so that operations are synchronized only pairwise. Again, this work not only has the potential to change the way replicated and cloud-based systems are programmed, but has also already sparked substantial follow-up research by other groups, further growing the body of scientific work on this topic.
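As a rough illustration of the Red/Blue split described above, the following Python sketch models a replicated bank account. It is a toy, single-process simulation under stated assumptions, not the project's implementation: the names Replica, Coordinator, deposit, withdraw and sync_blue are hypothetical, the bank-account scenario and its invariant (the balance never goes negative) are chosen purely for illustration, and coordination is reduced to an in-process object that brings all replicas up to date before checking the invariant.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One site holding a copy of the account (hypothetical name)."""
    balance: int = 0
    pending_blue: list = field(default_factory=list)

    def deposit(self, amount: int) -> None:
        # Blue operation: deposits commute and cannot make the balance
        # negative, so the replica applies them locally and only
        # remembers them for later propagation.
        self.balance += amount
        self.pending_blue.append(amount)


class Coordinator:
    """Stand-in for strongly consistent coordination (an assumption of
    this sketch; a real system would run an agreement protocol)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def sync_blue(self) -> None:
        # Deliver every replica's pending deposits to all other replicas.
        for src in self.replicas:
            for amount in src.pending_blue:
                for dst in self.replicas:
                    if dst is not src:
                        dst.balance += amount
            src.pending_blue.clear()

    def withdraw(self, amount: int) -> bool:
        # Red operation: a withdrawal could violate the invariant
        # "balance >= 0" if replicas accepted it independently, so it is
        # checked and applied at all replicas in one coordinated step.
        self.sync_blue()
        if self.replicas[0].balance < amount:
            return False  # rejected: it would overdraw the account
        for replica in self.replicas:
            replica.balance -= amount
        return True
```

In this sketch deposits return without contacting any other replica, while withdrawals pay for coordination because two replicas accepting them concurrently could overdraw the account. Which operations are safe to label Blue depends on the application's invariants, which is the decision the methodology above is meant to guide.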