Provably Correct Networks

Periodic Reporting for period 4 - CORNET (Provably Correct Networks)

Reporting period: 2022-07-01 to 2023-06-30

Networks are the backbone of our society, but configuring them is error-prone and tedious: misconfigured networks cause headline-grabbing outages that affect countless users and cost millions in lost revenue, while security breaches endanger our safety. These problems already affect today’s fixed-function networks, and a new trend towards making networks more programmable will exacerbate them.

Programmability is great for enabling novel applications, but it will make networks more fragile and create the potential for future disruptions and security breaches. If we extrapolate from the robustness problems in today’s networks (where router failures grounded Delta Air Lines airplanes for a few days in 2016), it becomes clear that programmable networks will breed more incidents, potentially resulting in severe disruptions to society.

The CORNET project aims to make programmable networks more robust by building verification software that can catch potential problems before the network is changed, whether via software updates or configuration changes. The end goal is to provide provable correctness guarantees for networks, assuring operators that the network behaves as expected at all times.

A special focus of CORNET is on datacenter networks, which are the high performance networks that power cloud computing. In this context, a new problem emerges: correctness verification is insufficient to capture performance impairments, which reduce application performance and can result in outages. Network performance impairments are common in datacenters and include (1) equipment misconfiguration (e.g. buffer thresholds and ECN marking thresholds), (2) gray failures such as lossy links, (3) network traffic synchronization (incast) and (4) load-balancing flow collisions.

To achieve provably correct networks without requiring verification expertise from network operators, CORNET takes a novel approach to network verification. It runs the control plane as given, but intercepts its outputs on their way to the data plane. It verifies these control plane outputs at runtime, checking that the resulting network (data plane plus control plane forwarding rules) behaves according to the specification. When an error is detected, it is flagged to the network operators and, where possible, the offending update is prevented from being inserted in the network. An overview of this approach was published in the Hotnets workshop in 2017 (Integrating Verification and Repair into the Control Plane – Raiciu et al.).
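The interception idea described above can be illustrated with a minimal Python sketch. It is not the CORNET implementation: the `VerifyingProxy` class, the dictionary-based forwarding tables and the choice of "no forwarding loops" as the specification are all assumptions made for illustration only.

```python
from copy import deepcopy


def has_loop(tables, dst):
    """Return True if following next hops toward dst revisits a switch."""
    for start in tables:
        seen = set()
        node = start
        # Walk hop by hop until we reach dst or run out of rules.
        while node != dst and node in tables and dst in tables[node]:
            if node in seen:
                return True
            seen.add(node)
            node = tables[node][dst]
    return False


class VerifyingProxy:
    """Hypothetical shim between the control plane and the data plane."""

    def __init__(self, tables):
        self.tables = tables      # switch -> {destination -> next hop}
        self.flagged = []         # updates reported to the operator

    def install(self, switch, dst, next_hop):
        """Apply the update only if the resulting network meets the spec."""
        candidate = deepcopy(self.tables)
        candidate.setdefault(switch, {})[dst] = next_hop
        if has_loop(candidate, dst):
            self.flagged.append((switch, dst, next_hop))
            return False          # update rejected, live tables untouched
        self.tables = candidate
        return True


proxy = VerifyingProxy({"s1": {"h": "s2"}, "s2": {"h": "h"}})
accepted = proxy.install("s2", "h", "s1")   # would create s1 -> s2 -> s1
```

In this sketch the control plane is unmodified; it simply calls `install` instead of writing to the switches directly, and a rejected update leaves the live tables as they were.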

The cornerstone of this approach is the ability to quickly check whether a given dataplane behaves correctly as per its specification. In the first part of CORNET, much of our work focused on verifying programmable dataplanes specified in the P4 language; hardware targets capable of running P4 programs emerged roughly at the time our project was starting, paving the way for programmable dataplanes. The CORNET project is leading the way in verifying programmable dataplanes, with papers published at Sigcomm 2018 and Sigcomm 2020.

Vera was our first project targeting P4 verification, and one of the two papers that first tackled this problem; the other is p4v, from Cornell and Barefoot Networks (both papers were published in Sigcomm 2018). Vera verified quickly (in under one second) that small and medium-sized P4 programs behaved according to spec, but took much longer (hours) to test complex programs. This meant it could be used during program development to find possible bugs, but not at runtime to make sure the network is bug-free, as we had envisaged.

bf4 (bug-free P4 programs) is our second project on this topic (Sigcomm 2020). It uses verification techniques to generate simple conditions which must be obeyed at runtime by the controller, and these can be enforced very quickly (in milliseconds). bf4 ensures that a running P4 program does not exhibit any of a list of possible bugs arising from cases in which the P4 program behaves in an undefined way. bf4 is also reasonably fast at development time, checking complex programs and generating the control-plane conditions in a few minutes.
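To give a flavour of what enforcing such conditions at the controller could look like, here is a small Python sketch. The table name `nat`, the action `rewrite_tcp_port`, the field `tcp.isValid` and the condition itself are hypothetical and are not taken from bf4's output format; the point is only that each check is a cheap predicate over a single table entry.

```python
from dataclasses import dataclass


@dataclass
class TableEntry:
    table: str
    match: dict    # field name -> value the entry matches on
    action: str


def nat_entry_is_safe(entry):
    """Hypothetical condition: rewrite_tcp_port needs a valid TCP header.

    Rewriting a port of a packet without a TCP header would read an
    invalid header, which is undefined behaviour in the P4 program.
    """
    if entry.table != "nat" or entry.action != "rewrite_tcp_port":
        return True
    return entry.match.get("tcp.isValid") == 1


# In this sketch the conditions would be generated once, offline.
CONDITIONS = [nat_entry_is_safe]


def admit(entry):
    """Controller-side check run before an entry is installed."""
    return all(condition(entry) for condition in CONDITIONS)


good = TableEntry("nat", {"tcp.isValid": 1, "ipv4.dst": "10.0.0.1"},
                  "rewrite_tcp_port")
bad = TableEntry("nat", {"ipv4.dst": "10.0.0.1"}, "rewrite_tcp_port")
```

Here `admit(good)` holds while `admit(bad)` does not, so the second entry would be refused before it reaches the switch.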

Another focus of CORNET has been on helping operators specify correctness requirements. Formal languages are normally used for this purpose, but they are cumbersome for most network administrators. Instead, we devised a simpler way to describe the desired properties: by equivalence to a simpler network. For instance, an operator can specify that the real network must behave in the same way as an idealized network which is much easier to understand. netdiff is the tool we have built that automatically checks the equivalence of two network data planes given as input; if the two data planes are not equivalent, it provides the user with an example of behaviour that differs between the two. Using netdiff we were able to find bugs in a widely used network virtualization software (Openstack Neutron).
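The following Python sketch illustrates specification by equivalence on a toy example. It is a brute-force stand-in over a tiny packet space, not netdiff's algorithm; the two data planes, the `(dst, vlan)` packet model and the mirroring behaviour are invented for illustration. Outputs are modelled as multisets so that duplicated packets are compared correctly.

```python
from collections import Counter
from itertools import product


def ideal(pkt):
    """Idealized network: forward by destination, mirror destination 3."""
    dst, _vlan = pkt
    outputs = Counter({(f"port{dst}", pkt): 1})
    if dst == 3:
        outputs[("mirror", pkt)] += 1
    return outputs


def real(pkt):
    """Real network with a bug: VLAN 2 traffic to 3 is not mirrored."""
    dst, vlan = pkt
    outputs = Counter({(f"port{dst}", pkt): 1})
    if dst == 3 and vlan != 2:
        outputs[("mirror", pkt)] += 1
    return outputs


def find_difference(plane_a, plane_b, packets):
    """Return (packet, outputs_a, outputs_b) for the first mismatch, or None."""
    for pkt in packets:
        out_a, out_b = plane_a(pkt), plane_b(pkt)
        if out_a != out_b:
            return pkt, out_a, out_b
    return None


# Toy packet space: (destination, vlan) pairs.
PACKETS = list(product(range(4), range(3)))
counterexample = find_difference(ideal, real, PACKETS)
```

In this toy setup the counterexample is the packet `(3, 2)`, which the idealized network mirrors and the real one does not. Enumerating packets only works because the space is tiny; a real tool has to reason about the packet space symbolically.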

The work above applies to the Internet at large and was performed in the first three years of CORNET. In the latter phase of the project, our focus moved to datacenter networks, where network impairments created the most operational issues. Our work was two-fold. First, we designed, implemented and tested EQDS (NSDI 2022), a new datacenter protocol that aims to offer high and predictable network performance by tunneling existing legacy protocols (TCP and RoCEv2). EQDS can offer near-optimal performance assuming the network does not have gray failures or performance-related misconfigurations. Second, we monitor the achieved performance to detect network impairments automatically.

To make programmable networks more robust, CORNET has made advances in a series of fields at the intersection of networking, formal methods, programming languages and security.

netdiff, our dataplane equivalence checking tool, addresses a fundamental open problem in computing. Program equivalence is known to be undecidable in general, which means no program can be written that always decides correctly and in finite time whether two programs are equivalent. netdiff makes the problem tractable by exploiting domain-specific restrictions: network dataplanes are simpler than general programs as they don’t have loop instructions. Checking equivalence still remains difficult, though, and the key advance netdiff brings is handling packet duplication, a core feature in networks, which was not handled in prior work.

Both Vera and bf4 significantly advance the state of the art in P4 verification. Vera is one of the two established verification tools for P4 that are now used by other researchers (the code is open source). bf4 is the first tool in this domain that is fully automated and does not require humans to provide annotations for the control plane. To achieve this, bf4 uses formal methods techniques to generate the annotations automatically, together with domain-specific optimizations that make it both precise and fast.

EQDS pioneers a new approach to ensuring high performance, and enables automated performance verification as a side effect. A testament to the potential of this approach is the acquisition of Correct Networks (a spinoff based on EQDS) by Broadcom Inc in 2022. Now part of Broadcom, EQDS is in the process of being standardized at the newly formed Ultra Ethernet Consortium.

