A Rigorous Approach to Consistency in Cloud Databases

Modern Internet services store data in novel cloud databases, which partition and replicate the data across a large number of machines and a wide geographical span. To achieve high availability and scalability, cloud databases need to maximise the parallelism of data processing. Unfortunately, this leads them to weaken the guarantees they provide about data consistency to applications. The resulting programming models are very challenging to use correctly, and we currently do not have advanced methods and tools that would help programmers in this task.

The goal of the project is to develop a synergy of novel reasoning methods, static analysis tools and database implementation techniques that maximally exploit parallelism inside cloud databases, while enabling application programmers to ensure correctness. We intend to achieve this by first developing methods for reasoning formally about how weakening the consistency guarantees provided by cloud databases affects application correctness and the parallelism allowed inside the databases. This will build on techniques from the areas of programming languages and software verification. The resulting theory will then serve as a basis for practical implementation techniques and tools that harness database parallelism, but only to the extent that its side effects do not compromise application correctness.

The proposed project is high-risk, because it aims not only to develop a rigorous theory of consistency in cloud databases, but also to apply it to practical systems design. The project is also high-gain, since it will push the envelope in availability, scalability and cost-effectiveness of cloud databases.

We have investigated issues of consistency in cloud databases from several angles. First, we proposed specifications of widely used consistency models. These include snapshot isolation, consistency models for collaborative editing, and models that guarantee that clients eventually agree on a global sequence of operations, while seeing a subsequence of this final sequence at any given point in time. We have also investigated consistency issues in the context of transactional memory.

Second, we have investigated methods for reasoning about implementations of consistency models and programs using them. We have proposed a novel proof method for verifying Paxos-like consensus algorithms, as well as methods for systematically obtaining robustness criteria for applications using weak consistency models. Such criteria ensure that, despite using a weakly consistent database, an application does not expose any behaviors to its users that could not occur under strong consistency.
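To make the notion of robustness concrete, the following minimal sketch shows an application that is not robust against snapshot isolation. It is a toy, single-process model written for illustration only: the `SnapshotStore` class, the `withdraw` function and the account names are hypothetical and are not taken from the project's tools. Two transactions each check the invariant "x + y stays non-negative" against their own snapshot, write to different keys, and both commit, producing the well-known write skew anomaly that no serial execution allows.

```python
class SnapshotStore:
    """Toy model of snapshot isolation (hypothetical, for illustration)."""

    def __init__(self, initial):
        self.data = dict(initial)                # committed state
        self.version = {k: 0 for k in initial}   # per-key commit counter

    def begin(self):
        # A transaction reads from a private snapshot taken at its start.
        return {"snapshot": dict(self.data),
                "seen": dict(self.version),
                "writes": {}}

    def read(self, txn, key):
        # Reads see the transaction's own writes first, then the snapshot.
        return txn["writes"].get(key, txn["snapshot"][key])

    def write(self, txn, key, value):
        txn["writes"][key] = value

    def commit(self, txn):
        # First-committer-wins: abort only on write-write conflicts.
        for key in txn["writes"]:
            if self.version[key] != txn["seen"][key]:
                return False
        for key, value in txn["writes"].items():
            self.data[key] = value
            self.version[key] += 1
        return True


def withdraw(store, txn, account, amount):
    # Application invariant (assumed): x + y must stay non-negative.
    total = store.read(txn, "x") + store.read(txn, "y")
    if total >= amount:
        store.write(txn, account, store.read(txn, account) - amount)


store = SnapshotStore({"x": 50, "y": 50})
t1, t2 = store.begin(), store.begin()   # concurrent transactions
withdraw(store, t1, "x", 100)
withdraw(store, t2, "y", 100)
committed = (store.commit(t1), store.commit(t2))
final_total = store.data["x"] + store.data["y"]
print(committed, final_total)
```

Both transactions commit because their write sets are disjoint, yet the invariant is violated afterwards. Under serializability the second withdrawal would have observed the first and been refused. A robustness criterion is a condition on the application that rules out such executions ahead of time.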

Third, we proposed abstractions that help in simplifying the development of protocols for maintaining data consistency. One such abstraction encapsulates the functionality of transaction commit protocols: this generalizes the classical Atomic Commit Problem (ACP) to a multi-shot formulation that more faithfully describes the requirements of modern systems. Another class of abstractions helps maintain liveness of protocols for data consistency even under aggressive fault modes.
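For readers unfamiliar with the classical, single-shot Atomic Commit Problem that this abstraction generalizes, its core decision rule can be sketched as follows. This is only an illustration of the textbook rule (commit only if every participant votes yes), not the project's multi-shot formulation; the names `Vote`, `Decision` and `decide`, and the shard names, are hypothetical.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def decide(expected, votes):
    """Classical single-shot rule, as a coordinator might apply it.

    expected: iterable of participant names that must vote.
    votes:    dict mapping participant name to the Vote received so far.
    """
    # A participant we have not heard from is treated as a failure: abort.
    if set(expected) - set(votes):
        return Decision.ABORT
    # Commit only if every expected participant voted YES.
    if all(votes[p] is Vote.YES for p in expected):
        return Decision.COMMIT
    return Decision.ABORT


shards = ["shard-a", "shard-b"]
all_yes = decide(shards, {"shard-a": Vote.YES, "shard-b": Vote.YES})
one_no = decide(shards, {"shard-a": Vote.YES, "shard-b": Vote.NO})
missing = decide(shards, {"shard-a": Vote.YES})
print(all_yes, one_no, missing)
```

The single-shot problem decides the fate of one transaction in isolation. The multi-shot formulation mentioned above instead covers a stream of transactions, which is closer to how a database commit protocol is actually used.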

Finally, we have developed novel implementations of consistency protocols. These include latency-efficient leaderless consensus protocols for maintaining strong consistency, protocols that allow programmers to mix strong and weak consistency, and protocols that exploit Remote Direct Memory Access (RDMA).


Periodic Reporting for period 4 - RACCOON (A Rigorous Approach to Consistency in Cloud Databases)
