A Rigorous Approach to Consistency in Cloud Databases

Periodic Reporting for period 3 - RACCOON (A Rigorous Approach to Consistency in Cloud Databases)

Reporting period: 2020-01-01 to 2021-06-30

Modern Internet services store data in novel cloud databases, which partition and replicate the data across a large number of machines and a wide geographical span. To achieve high availability and scalability, cloud databases need to maximise the parallelism of data processing. Unfortunately, this leads them to weaken the guarantees they provide about data consistency to applications. The resulting programming models are very challenging to use correctly, and we currently do not have advanced methods and tools that would help programmers in this task.

The goal of the project is to develop a synergy of novel reasoning methods, static analysis tools and database implementation techniques that maximally exploit parallelism inside cloud databases, while enabling application programmers to ensure correctness. We intend to achieve this by first developing methods for reasoning formally about how weakening the consistency guarantees provided by cloud databases affects application correctness and the parallelism allowed inside the databases. This will build on techniques from the areas of programming languages and software verification. The resulting theory will then serve as a basis for practical implementation techniques and tools that harness database parallelism, but only to the extent that its side effects do not compromise application correctness.

The proposed project is high-risk, because it aims not only to develop a rigorous theory of consistency in cloud databases, but also to apply it to practical systems design. The project is also high-gain, since it will push the envelope in availability, scalability and cost-effectiveness of cloud databases.

We have investigated consistency in cloud databases from several angles. First, we studied specifications of consistency models, in particular snapshot isolation and consistency models that guarantee that clients eventually agree on a global sequence of operations, while seeing a subsequence of this final sequence at any given point in time. We have also investigated consistency issues in the context of transactional memory.
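The subtlety of specifying a model like snapshot isolation can be illustrated with the classic write-skew anomaly. The sketch below is a toy in-memory model, not the project's formal specification: each transaction reads from a snapshot taken when it starts and commits under a first-committer-wins rule that aborts only on write-write conflicts, so two transactions that each preserve an application invariant on their own snapshots can jointly violate it.

```python
# Toy model of snapshot isolation (illustrative only): transactions read
# from a start-time snapshot; commit aborts only on write-write conflicts
# with transactions that committed after the snapshot was taken.

class SnapshotStore:
    def __init__(self, data):
        self.committed = dict(data)   # latest committed values
        self.version = 0              # commit counter
        self.write_log = []           # (commit_version, keys_written)

    def begin(self):
        return Txn(self, dict(self.committed), self.version)

class Txn:
    def __init__(self, store, snapshot, start_version):
        self.store, self.snapshot, self.start = store, snapshot, start_version
        self.writes = {}

    def read(self, key):
        # Read your own writes, else the snapshot.
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First-committer-wins: abort only on write-write conflicts.
        for version, keys in self.store.write_log:
            if version > self.start and keys & self.writes.keys():
                return False
        self.store.version += 1
        self.store.write_log.append((self.store.version, set(self.writes)))
        self.store.committed.update(self.writes)
        return True

# Application invariant: x + y >= 0.
store = SnapshotStore({"x": 50, "y": 50})
t1, t2 = store.begin(), store.begin()
# Each transaction checks the invariant on its snapshot, then withdraws
# from a *different* variable, so the write sets are disjoint.
if t1.read("x") + t1.read("y") >= 100:
    t1.write("x", t1.read("x") - 100)
if t2.read("x") + t2.read("y") >= 100:
    t2.write("y", t2.read("y") - 100)
assert t1.commit() and t2.commit()   # SI allows both commits
print(store.committed)               # x + y is now negative: write skew
```

Under serializability one of the two transactions would have seen the other's withdrawal and aborted its own; snapshot isolation admits this outcome because neither transaction's write set conflicts with the other's.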

Second, we have investigated methods for reasoning about implementations of consistency models and about programs using them. We have proposed a novel method for proving the correctness of Paxos-like consensus algorithms, as well as methods for systematically obtaining robustness criteria for applications using weak consistency models, i.e., criteria ensuring that, despite using a weakly consistent database, these applications do not expose any non-strongly-consistent behaviours to their users.
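To make the notion of a robustness criterion concrete, the sketch below implements a classic static check for snapshot isolation from the literature, used here purely as an illustration rather than as the project's own method: an application is robust if its static dependency graph contains no cycle with two consecutive read-write (anti-dependency) edges. Transactions are approximated by their read and write sets, and every rw edge is conservatively treated as vulnerable.

```python
# Illustrative static robustness check for snapshot isolation:
# flag a "dangerous structure", i.e. two consecutive rw anti-dependency
# edges a -> b -> c lying on a cycle of the static dependency graph.

from itertools import product

def edges(txns):
    """Static dependency edges between transaction programs."""
    deps = set()
    for (a, (ra, wa)), (b, (rb, wb)) in product(txns.items(), repeat=2):
        if a == b:
            continue
        if ra & wb:
            deps.add((a, b, "rw"))   # a reads an item that b writes
        if wa & wb:
            deps.add((a, b, "ww"))   # write-write conflict
        if wa & rb:
            deps.add((a, b, "wr"))   # b reads an item that a writes
    return deps

def reachable(deps, src, dst):
    """Is dst reachable from src via edges of any kind?"""
    adj = {}
    for a, b, _ in deps:
        adj.setdefault(a, set()).add(b)
    seen, stack = set(), [src]
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(adj.get(n, ()))
    return False

def robust_under_si(txns):
    deps = edges(txns)
    rw = [(a, b) for a, b, kind in deps if kind == "rw"]
    # Dangerous: rw edges a1 -> b1 and a2 -> b2 with b1 == a2,
    # closed into a cycle by a path from b2 back to a1.
    return not any(b1 == a2 and reachable(deps, b2, a1)
                   for (a1, b1), (a2, b2) in product(rw, repeat=2))

# The write-skew pair: each reads {x, y} but writes a different item.
write_skew = {"T1": ({"x", "y"}, {"x"}), "T2": ({"x", "y"}, {"y"})}
print(robust_under_si(write_skew))   # False: write skew is not robust
```

A pair of transactions touching disjoint items passes the check, whereas the write-skew pair is correctly rejected; a database can then run the robust applications under the weaker, cheaper model without exposing anomalies.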

Finally, we have investigated novel implementations of consistency protocols. We have focused on transaction processing systems that combine sharding and replication, and proposed several efficient protocols for such systems. First, we have generalised the classical Atomic Commit Problem (ACP) to a multi-shot formulation that more faithfully describes the requirements of modern systems, and proposed a latency-optimal algorithm for solving it. We also proposed an algorithm for solving this problem in systems that reduce the replication factor by using an external reconfiguration service; this work also highlighted the impact of techniques such as Remote Direct Memory Access (RDMA) on consistency protocols. An alternative way of processing transactions consistently is to use atomic multicast. We have proposed and implemented an algorithm for atomic multicast that achieves lower latency and higher throughput than state-of-the-art protocols.
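For context, the classical single-shot Atomic Commit Problem that the multi-shot formulation generalises is usually solved by two-phase commit. The sketch below is a toy, failure-free model of that baseline, not the project's latency-optimal protocol: a coordinator collects a vote from every shard touched by the transaction, and the transaction commits only if all shards vote yes.

```python
# Toy two-phase commit for the classical (single-shot) Atomic Commit
# Problem: commit iff every participating shard votes yes.

def two_phase_commit(shards, txn):
    # Phase 1 (prepare): each shard votes on the transaction.
    votes = [shard.prepare(txn) for shard in shards]
    decision = all(votes)          # a single "no" vote forces abort
    # Phase 2 (decide): the uniform decision is delivered everywhere.
    for shard in shards:
        shard.finish(txn, commit=decision)
    return decision

class Shard:
    def __init__(self, will_vote_yes=True):
        self.will_vote_yes = will_vote_yes
        self.log = []               # decisions applied at this shard

    def prepare(self, txn):
        return self.will_vote_yes   # e.g. locks acquired, no conflict

    def finish(self, txn, commit):
        self.log.append((txn, "commit" if commit else "abort"))

shards = [Shard(), Shard(), Shard(will_vote_yes=False)]
print(two_phase_commit(shards, "t1"))   # False: one shard voted no
```

Even in this failure-free form the coordinator needs two message round trips per transaction, which hints at why latency-optimal and multi-shot reformulations of the problem matter for geo-distributed systems.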

As detailed in the "work performed" section, we have obtained a number of results related to data consistency in cloud databases, including specifications of consistency models, methods for reasoning about the correctness of applications using them, and novel implementations of consistency protocols. In the rest of the project, we plan to further investigate data consistency issues in cloud databases as foreseen by the DoA.