Efficient Formally Secure Compilers to a Tagged Architecture

Periodic Reporting for period 5 - SECOMP (Efficient Formally Secure Compilers to a Tagged Architecture)

Reporting period: 2021-07-01 to 2021-12-31

Severe low-level vulnerabilities abound in today's computer systems, allowing cyber-attackers to remotely gain full control. This happens in large part because our programming languages, compilation chains, and hardware architectures were designed in an era of scarce hardware resources and too often trade off security for efficiency. Mainstream programming languages like C are insecure, since any undefined behavior (e.g. a buffer overflow) can compromise the security of the whole application. Secure compilation using only the coarse-grained protection mechanisms provided by mainstream hardware architectures would be too inefficient and inconvenient for most practical scenarios.

This line of research aims to leverage emerging hardware capabilities for fine-grained compartmentalization and memory safety to build the first efficient secure compilation chains for realistic programming languages, in particular for C. Such compilation chains enforce that compartmentalized applications are compiled securely, so that every C component is protected from other C components that have been compromised by undefined behavior. To achieve such security without sacrificing efficiency, we compile to a tagged architecture, which associates a metadata tag with each word and efficiently propagates and checks tags according to software-defined rules.
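To make the idea of a tagged architecture more concrete, here is a minimal Python sketch of memory words that carry a metadata tag, together with a software-defined rule that is checked on every access. It is only an illustration: the names `Word`, `TaggedMemory`, `access_rule`, and `TagViolation` are hypothetical, and the single rule shown (only the owning compartment may access a word) is an assumption chosen for simplicity, not the project's actual tag policy.

```python
from dataclasses import dataclass


class TagViolation(Exception):
    """Raised when a tag rule rejects an operation."""


@dataclass(frozen=True)
class Word:
    value: int
    tag: str  # hypothetical tag: the compartment that owns this word


def access_rule(current: str, word_tag: str) -> str:
    """Hypothetical software-defined rule: only the owner may touch a word.

    Returns the tag for the resulting word, or raises on a violation.
    """
    if current != word_tag:
        raise TagViolation(f"{current} may not access a word owned by {word_tag}")
    return word_tag


class TaggedMemory:
    """Toy memory in which every word carries a tag checked on each access."""

    def __init__(self, size: int, owner: str) -> None:
        self.words = [Word(0, owner) for _ in range(size)]

    def load(self, current: str, addr: int) -> int:
        access_rule(current, self.words[addr].tag)
        return self.words[addr].value

    def store(self, current: str, addr: int, value: int) -> None:
        new_tag = access_rule(current, self.words[addr].tag)
        self.words[addr] = Word(value, new_tag)
```

In this sketch, a store issued by compartment "B" to a word owned by compartment "A" raises `TagViolation` and leaves the word unchanged. On a real tagged architecture the equivalent check would be performed by the hardware on each instruction, which is what keeps the enforcement efficient.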

At the conclusion of the project, we have achieved and in some ways exceeded our most important objectives. We have built a secure compilation chain for C components based on the CompCert formally verified C compiler, as well as several smaller prototype compilation chains targeting a tagged architecture and software-fault isolation. We used a combination of machine-checked proofs and property-based testing in the Coq proof assistant to provide high confidence that our compilation chains achieve an unprecedented level of security.

To achieve this we went significantly beyond the state of the art, by overcoming two major conceptual challenges (detailed in a separate section below):
(1) formally defining what it means for a compilation chain to be secure, both in general and in our concrete setting of C compartmentalization;
(2) devising scalable machine-checked proof techniques for such secure compilation chains.
In terms of the original work plan, the CompCert-based compilation chain and the smaller prototype chains targeting a tagged architecture and software-fault isolation correspond to Challenge #1 (WP1 and WP3) and Challenge #3 (WP4). As originally planned, the combination of machine-checked proofs and property-based testing in Coq that gives high confidence in the security of these chains corresponds to Challenge #6 (WP1-3, 6, 7).

In addition to solving these main conceptual challenges, our project had several other important results:

- In addition to compartmentalization, we also proposed novel formal characterizations for the end-to-end security guarantees provided by memory safety, both heap safety and stack safety (Challenge #2, WP2).

- We built a prototype achieving secure interoperability between subsets of F* and ML, and we formally proved in F* that it achieves a variant of secure compilation (Challenge #5; WP6 and WP7).

- We extended the F* verification system with Dijkstra monads, monadic reification, monotonic state, tactics, and relational reasoning. We also built an extraction mechanism from a subset of F* called Low* to C (WP7), which opens the way for further integration with our secure compilation chain for C.

- We studied the safe interoperability between code respecting a strong static typing discipline and dynamically typed code. In particular, we contributed to a better understanding of gradual typing, parametricity, and their combination (Challenge #5; WP1-3, 6, 7).

- We introduced SSProve, a general verification framework for structuring game-based cryptographic proofs in a modular way, by exploiting the compositional nature of protocols such as TLS (WP8).

- We introduced Luck, a new domain-specific language for property-based generators, and integrated the main innovations into QuickChick, our property-based testing framework for Coq (Challenge #6; WP1-3, 6, 7).

Our achievements have resulted in many research papers at top-tier conferences in programming languages (POPL, ICFP, ESOP, OOPSLA, PLDI) and security (CCS, CSF), and five of these papers have received distinguished paper awards. Our software and machine-checked proofs are available on GitHub.

Our project resulted in the first formally secure compilation chains for compartmentalized applications. For practitioners, this shows that building secure compilation chains with water-tight security guarantees is possible, and that fine-grained protection mechanisms (such as our tagged architecture) make this achievable without giving up efficiency or the familiar C programming model. For the formal methods community, this shows that secure compilation definitions and proofs can be applied to more realistic settings, and it provides novel techniques for doing so.

The main conceptual and technical challenges we have overcome in this project were devising formal security definitions and scalable machine-checked proofs for such compilation chains. On the definition side, we were the first to formally express what it means to securely compile C components that are mutually distrustful and that can be dynamically compromised by undefined behavior. This has led us to explore and discover a wide range of formal secure compilation criteria, which are stated in terms of preserving various property classes against adversarial contexts. These novel criteria allow for a fine-grained trade-off between security on the one side, and efficiency and proof difficulty on the other. These criteria have already had a high impact on the formal security community.
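As an illustration of what "preserving a property class against adversarial contexts" can look like, the following is a sketch of one possible criterion of this shape, specialized to trace properties. The notation is an assumption introduced here and is not taken from the report: $P$ is a source program, $P{\downarrow}$ its compilation, $C_S$ and $C_T$ are arbitrary source and target contexts, $t$ is an execution trace, and $\pi$ ranges over the chosen class of trace properties.

```latex
\[
\forall \pi.\ \forall P.\quad
\bigl(\forall C_S\ t.\ C_S[P] \rightsquigarrow t \;\Rightarrow\; t \in \pi\bigr)
\;\Longrightarrow\;
\bigl(\forall C_T\ t.\ C_T[P{\downarrow}] \rightsquigarrow t \;\Rightarrow\; t \in \pi\bigr)
\]
```

Read informally: if the source program satisfies $\pi$ no matter which source context it is linked with, then its compiled version satisfies $\pi$ no matter which target context it is linked with. Choosing a smaller or larger class for $\pi$ is one way to obtain the kind of trade-off between security, efficiency, and proof difficulty described above.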

Yet formally defining secure compilation in our setting was only the first step, since proving secure compilation with good confidence is notoriously challenging. In this project we introduced more scalable proof techniques, which we have mechanized in Coq, and which allow reusing a compiler correctness result in the style of CompCert, avoiding duplicated work. Moreover, our techniques support fine-grained memory sharing by passing secure pointers between components. This enables the most natural C programming style even in compartmentalized applications, while making good use of the fine-grained protection provided by tagged architectures for enforcing security.

Two more parts of the project led to multiple breakthroughs. The first provided significant conceptual and practical improvements to formal verification by extending the state-of-the-art F* programming language with Dijkstra monads, monadic reification, monotonic state, tactics, and relational reasoning. Our secure F*-ML interoperability work greatly benefited from these improvements. The second studied the safe interoperability between code respecting a strong static typing discipline and dynamically typed code, and significantly improved on the state of the art. In particular, we contributed to a better understanding of gradual typing, parametricity, and their combination.
The formal criteria for secure compilation we discovered