
Code Sanitization for Vulnerability Pruning and Exploitation Mitigation

Periodic Reporting for period 2 - CodeSan (Code Sanitization for Vulnerability Pruning and Exploitation Mitigation)

Reporting period: 2021-09-01 to 2023-02-28

Despite enormous efforts to secure software, several hundred security bugs are reported publicly each month. Systems software is prone to low-level bugs caused by undefined behavior (e.g., memory corruption, type confusion, or API confusion). While bugs on their own are an annoyance to users (as they may impact productivity), they become vulnerabilities if they are reachable through adversarially controlled inputs. Exploits abuse undefined behavior exposed through vulnerabilities to execute attacker-specified code and to leak information.

So far, adversaries have been at an advantage: one or a few vulnerabilities are sufficient to take control of a system. The defender, in contrast, must mitigate all vulnerabilities in the code. Finding one or a few bugs is comparatively easy, while finding all bugs is incredibly hard, as it requires a complete analysis of the program and its state space. In this project, we address this asymmetry and shift the power balance towards the defender by leveraging their knowledge of the code and environment.

For CodeSan, we focus on three intertwined thrusts: policy-based sanitization, automatic test inference, and reflective mitigation at runtime. The first two thrusts enable automatic bug discovery during development and advance the state of the art in software testing. The third thrust leverages insights from the first two to adjust mitigation strategies against any remaining bugs, carefully spending the limited resource budget on the most likely attack vectors.

At the core of the proposal, we embrace incompleteness. Unlike complete techniques such as formal verification, bounded model checking, or symbolic execution, we accept that we cannot completely explore a program's state space. Instead, we enable developers to find (and ultimately fix) as many bugs as possible while defending against any remaining unknown issues. This assumption allows us to scale to large software systems of many millions of lines of code, such as the Linux kernel or web browsers. Our approach rests on the three thrusts named above: policy-based sanitization, automatic test inference, and reflective mitigations. We now discuss each in more detail.

Policy-based sanitization focuses on detecting bugs when they are reached and triggered. Sanitizers instrument the target program; when the program is later executed, they alert the developer if one of the sanitizer checks is reached and triggered, i.e., a failure condition is actively observed. Sanitizers therefore require the program under test to be executed concretely. The most commonly used sanitizer in existing testing projects is AddressSanitizer, which detects common buffer overflows and underflows along with some use-after-free accesses. We expand the status quo through type-based verification and by adapting sanitizers to alternate use cases such as embedded platforms or binary-only instrumentation. Our key insight is that adapting sanitizers to specific use cases allows more effective and efficient detection of failure conditions, making broader use of sanitization possible.
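
As a minimal illustration (our own sketch, not project code), the following C program performs an off-by-one heap write. Built with AddressSanitizer (e.g., clang -fsanitize=address -g overflow.c), the instrumented binary aborts with a heap-buffer-overflow report at the faulty memset:

```c
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *buf = malloc(8);   /* 8-byte heap allocation */
    memset(buf, 'A', 9);     /* off-by-one: writes 9 bytes into 8 */
    free(buf);
    return 0;
}
```

Without the sanitizer, this program would likely run to completion silently; the instrumentation is what turns the latent failure condition into an actionable report.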

Automatic test inference explores the program state space and drives the analysis towards unexplored or underexplored areas. Automated fuzz testing ("fuzzing") has become the de facto standard for finding bugs in target programs. The fuzzer automatically generates new inputs and executes the target with them; inputs that improve coverage or trigger bugs are kept for further processing. Our focus in this thrust is threefold. First, we improve fuzzing strategies for well-tested programs, allowing the fuzzer to go deeper or broader. Second, we adapt fuzzing to underexplored targets, allowing fuzzers to analyze embedded systems or operating system kernels. Third, we develop tools that empower developers to better analyze any newly discovered crashes. A key finding of our research in this thrust is that developer time must be carefully provisioned towards where it helps most, allowing the developer to customize the search based on their intuition.
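
A minimal harness in the libFuzzer style illustrates this loop (our own sketch with a planted off-by-one bug; parse_record is a hypothetical target, not project code). The fuzzer calls the entry point repeatedly with generated inputs and keeps those that reach new coverage or trigger a sanitizer report:

```c
/* Build with: clang -g -fsanitize=fuzzer,address harness.c */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical function under test with a planted bug. */
static void parse_record(const uint8_t *data, size_t size) {
    char header[4];
    if (size >= 5 && memcmp(data, "FUZZ", 4) == 0) {
        /* Planted bug: copies up to 8 bytes into a 4-byte buffer. */
        memcpy(header, data, size < 8 ? size : 8);
        (void)header;
    }
}

/* libFuzzer entry point, invoked once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse_record(data, size);
    return 0;
}
```

Coverage feedback guides the fuzzer towards inputs beginning with "FUZZ", at which point AddressSanitizer flags the stack-buffer-overflow and the offending input is saved for triage.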

Reflective mitigations leverage knowledge from the first two thrusts to fine-tune mitigations to undertested areas. Existing mitigations already make exploits that gain code execution much harder. Together, these mitigations cost around 5% in performance overhead; this combined budget is effectively fixed, as platform administrators are reluctant to enable mitigations with higher cost. Our research therefore focuses on adding mitigations only where needed: based on intermediate data from the first two thrusts, we insert more precise checks in undertested code while reducing the number of checks in code that is considered safe.
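
To make selective checking concrete, the following C sketch (our own illustration; path_is_well_tested, allowed, and dispatch are all hypothetical names, not the project's mechanism) validates an indirect call against an allow-list only on paths flagged as undertested, spending the overhead budget where attacks are most likely:

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*handler_t)(int);

static void handler_a(int x) { printf("a: %d\n", x); }
static void handler_b(int x) { printf("b: %d\n", x); }

/* Hypothetical flag derived offline from sanitizer/fuzzing coverage data. */
static const int path_is_well_tested = 0;

/* Hypothetical allow-list of valid targets for this call site. */
static const handler_t allowed[] = { handler_a, handler_b };

static void dispatch(handler_t fn, int arg) {
    if (!path_is_well_tested) {   /* precise check only on undertested paths */
        int ok = 0;
        for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++)
            if (fn == allowed[i]) ok = 1;
        if (!ok) abort();         /* unexpected control-flow target */
    }
    fn(arg);                      /* well-tested paths pay no extra cost */
}

int main(void) {
    dispatch(handler_a, 42);
    return 0;
}
```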

As a cross-cutting thrust, we analyzed common weaknesses across different platforms and targets. This allows us to infer core properties and to establish baselines for the other thrusts. In particular, we have looked at the security of the Bluetooth protocol and its implementations (BreakMi at TCHES22; Bluetooth security in cars at WOOT22; BLUR at AsiaCCS22; and LIGHTBLUE at SEC21), the Android ecosystem (LibRARIAN at ICSE21), the GPU ecosystem (GPU Kernels at MICRO21), and the measurement of fuzzer metrics and semantics (MAGMA at SIGMETRICS21). In this thrust, we develop core metrics that enable research along our three key thrusts.

For the first thrust, policy-based sanitization, we have focused on making existing sanitizers more efficient (FuZZan at ATC20), implementing broader and more extensive policies (PacMem at CCS22), targeting sanitization at new ecosystems (ProFactory at SEC22), and enabling developers to better analyze any discovered crashes (Igor at CCS21 and Evocatio at CCS22). The key insight for this thrust is that customizing sanitization for test generation and specific environments enables better use of the constrained resources and allows developers to discover bugs quickly.

In the second thrust, we improve automatic test inference. We focus on different environments such as embedded systems (HALucinator at SEC20), the USB device-to-kernel interface (USBFuzz at SEC20), the JavaScript-browser interface (Minerva at FSE22) and the WebGL interface (GLFuzz at SEC23), grammar-based input generation (GramaTron at ISSTA21), and trusted execution environments on Android (TEEzz at SP23), along with efficient seed selection (SeedSel at ISSTA21). The key insight in this thrust is that new environments require different techniques and customization, allowing the exploration to improve over a naive, non-customized approach.

As part of the third thrust, reflective mitigations, we explore customized mitigations against emerging threats. We adjust control-flow integrity to specific environments (ANCILE at CODASPY21), compartmentalize kernel modules (HAKCs at NDSS22) and stacks (DataGuard at NDSS22), and protect the kernel against double-fetch vulnerabilities (Midas at SEC22), a bug class sketched below. The key insight in this thrust is that leveraging knowledge of the concrete usage of the code enables custom-tailored mitigations that are much more effective at stopping vulnerability chains.
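
As a simplified illustration of the double-fetch bug class that Midas addresses (our own sketch, not kernel code; struct request and handle_request are hypothetical), a length shared with an untrusted party is read once to validate and again to use, so a concurrent writer can enlarge it in between:

```c
#include <string.h>

struct request {
    volatile size_t len;   /* attacker-writable between the two fetches */
    char payload[256];
};

static char kbuf[64];

void handle_request(struct request *req) {
    if (req->len > sizeof kbuf)              /* fetch 1: bounds check */
        return;
    /* Time-of-check-to-time-of-use gap: len may have grown by now. */
    memcpy(kbuf, req->payload, req->len);    /* fetch 2: use */
}
```

Defenses in this class aim to guarantee that the kernel observes a single consistent value across both fetches, closing the time-of-check-to-time-of-use window.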

As we advance, we will continue research on all three major thrusts, with an extended focus on the third thrust (reflective mitigations), as results from the first two thrusts are more readily available.
Figure: Overview of the CodeSan tasks and their dependencies