CORDIS - EU research results

Symbolic Analysis of Temporal and Functional Behavior of Networked Systems

Periodic Reporting for period 4 - SYMBIOSYS (Symbolic Analysis of Temporal and Functional Behavior of Networked Systems)

Reporting period: 2020-02-01 to 2021-07-31

The goal of SYMBIOSYS is to achieve reliability and interoperability of networked (software) systems, a crucial requirement in today’s networked information society. To this end, we devise a software and systems analysis methodology that, for the first time, considers the vital influence factors that determine the behavior of networked systems, in particular the input and temporal uncertainty of network interactions. With SYMBIOSYS, we want to be able to automatically and efficiently explore and analyze the vast number of distributed execution paths in networked systems in a highly structured manner inspired by Symbolic Execution.
The combination of the benefits of model checking (rigorous exploration) and of dynamic software testing (analyzing real systems’ code) represents a quantum leap in the field of networked systems analysis. Orthogonal to and complementing formal model-based approaches, which target the design of reliable systems on an abstract (model-) level, we also address system- and implementation-level aspects of (typically heterogeneous) implementations that interact via (possibly unpredictable) networks.
For this, we developed the required conceptual and theoretical foundation as well as the associated methodology for systematically exploring all program behavior influenced by input and temporal factors. Further, as the search space to be covered grows dramatically when considering both input and time, we aimed to reduce the complexity of the overall analysis by various means, during all stages of analysis. Lastly, we implemented our methods and explored how they can help to identify software flaws and other problems that could, e.g., compromise the interoperability, availability or security of distributed software systems.
The primary focus of the SYMBIOSYS project was not so much the implementation of a working prototype of the Symbolic Distributed Temporal Execution (SDTE) approach, but rather the development of methods that make it performant and capable enough to analyze anything beyond the most trivial of examples. For this, we had to combine several strands of significant improvements along different trajectories to mitigate the complexity (stemming from the combination of SDE and STE) and path explosion (a problem common to all Symbolic Execution techniques).
In the following, we highlight some of our achievements and point to the relevant publications from the course of the project (conference name and year in parentheses).

1. Symbolic Execution as well as the proposed SDTE are based on Satisfiability Modulo Theories (SMT) constraint solvers. As both SDE and STE create substantial additional load on the constraint solver, we aimed at mitigating the unbounded increase in distributed and temporal constraint clauses. While this meant devising techniques that avoid adding additional constraints, we also evaluated strategies to reduce the burden of inevitable constraints. As part of this work, we have designed and implemented a partial theory solver for intervals (ASE 2018), which is able to answer a particular kind of constraint query that is common in Symbolic Execution.

2. In essence, path explosion causes a Symbolic Execution engine to wastefully recompute and re-execute the same actions, at a cost that grows explosively with the number of explored paths through a program. We proposed an approach that is based on extending the automation of memoization to cope with impure languages such as C++ (PADS 2016). Memoization, a technique proposed decades ago, caches computation results instead of repeatedly recomputing them. This is especially useful in the context of symbolic execution, where a target program may perform the same operations on a large number of paths.

3. In addition to caching previously computed results, we also developed a method that can identify entire (previously visited) program states in order to tame path explosion and to enable a scalable and comprehensive analysis methodology. Using a novel fingerprinting approach, we are able to capture program states of complex programs during symbolic execution and efficiently compare them based on these fingerprints. We first developed this technique to detect infinite loops during symbolic execution (CAV 2018). This approach can scale to real-world programs, and our prototype implementation was able to detect previously unknown bugs in widely deployed software, some of which had gone undetected for 16 years.

4. To efficiently capture all relevant instantiations of temporal orders, we integrated symbolic execution for multithreaded programs with a partial-order reduction algorithm (CAV 2020). This reduction allows us to reason about equivalence classes that cause identical behavior based on the (in)dependence of concurrent events in SDTE. Here, we also extended our previous fingerprinting scheme to reduce redundancy on a thread (rather than state) level, using cutoff events in our unfolding-based technique. Finally, this framework has established a theoretical foundation that is more general than SDTE, our originally devised technique.

5. While we made significant advances in the Symbolic Execution technology underlying SDTE, we also explored the versatile analysis capabilities it enables based on our rigorous software exploration methods. Besides infinite loop detection (CAV 2018) and the detection of data races in multithreaded applications (CAV 2020), we looked into floating-point handling (ASE 2017) and applied our techniques to analyze the interoperability of novel network protocols such as QUIC (EPIQ 2018). Finally, we proposed methods based on symbolic execution for the investigation of performance issues in network functions (SIGCOMM Poster 2016, conditionally accepted paper at CoNEXT 2021).
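
To illustrate the idea behind the partial theory solver for intervals mentioned in item 1, the following is a minimal sketch and not the design published at ASE 2018. It assumes unbounded integers and a hypothetical `Bound` constraint type; the function `check_intervals` is likewise an illustrative name. The sketch decides conjunctions of simple variable bounds on its own and returns `None` for everything else, which stands for deferring the query to a full SMT solver.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bound:
    """Hypothetical constraint `var op const` with op in {'<=', '>=', '=='}."""
    var: str
    op: str
    const: int


def check_intervals(constraints) -> Optional[bool]:
    """Decide a conjunction of bounds by interval reasoning.

    Returns True (satisfiable), False (unsatisfiable), or None when a
    constraint lies outside the supported fragment (assumption: the
    caller then falls back to a general-purpose solver).
    """
    intervals = {}  # variable name -> (lower bound, upper bound)
    for c in constraints:
        if not isinstance(c, Bound) or c.op not in ("<=", ">=", "=="):
            return None  # not an interval query: defer
        lo, hi = intervals.get(c.var, (float("-inf"), float("inf")))
        if c.op in (">=", "=="):
            lo = max(lo, c.const)
        if c.op in ("<=", "=="):
            hi = min(hi, c.const)
        if lo > hi:
            return False  # empty interval: the conjunction is unsatisfiable
        intervals[c.var] = (lo, hi)
    return True
```

For example, a branch condition such as `0 <= x <= 10` is answered without touching the general solver, while any non-interval constraint makes the sketch give up rather than guess.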

Along the way, we built various prototype implementations demonstrating our techniques. These software artifacts also enabled us to evaluate how our methods fare when faced with analyzing real-world software. During the course of this project, we have been able to find a total of 21 previously undiscovered bugs in various open-source software projects, among them widely deployed Linux command line utilities (GNU coreutils, BusyBox and toybox) and memcached, a prominent distributed memory object caching system.
At the beginning of the project, we saw SDE and STE as two mostly separate, but equally important, avenues towards establishing SDTE. In particular, we deemed it necessary to focus on complexity reduction techniques that target either of the two in order to achieve the necessary efficiency for SDTE. Starting with the fingerprinting approach of CAV 2018, however, we found that a more general strategy that covers both techniques might lead to more significant results. With our further work (CAV 2020), we were able to realize a strategy that is even more general than what we envisioned for SDTE and that integrates several meaningful ways of complexity reduction.
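
The fingerprinting idea referred to above can be made more concrete with a small sketch. It is a simplified illustration for a concrete, deterministic toy program and not the fingerprinting scheme of the CAV 2018 work: the hashing of the full state, the `step` callback interface, and the names `fingerprint` and `run` are all assumptions made for this example. The underlying observation is that a deterministic program that revisits an identical state can never terminate.

```python
import hashlib


def fingerprint(pc, memory):
    """Hash the program counter and a canonical (sorted) view of memory."""
    data = repr((pc, sorted(memory.items()))).encode()
    return hashlib.sha256(data).digest()


def run(step, pc, memory, max_steps=10_000):
    """Run `step` until it halts, repeats a state, or exceeds max_steps.

    `step(pc, memory)` is a hypothetical interface: it may mutate
    `memory` and returns the next program counter, or None to halt.
    """
    seen = set()
    for _ in range(max_steps):
        fp = fingerprint(pc, memory)
        if fp in seen:
            return "infinite loop"  # identical state seen before
        seen.add(fp)
        pc = step(pc, memory)
        if pc is None:
            return "halted"
    return "unknown"


def countdown(pc, mem):
    # Toy program: while x > 0: x -= 1
    if mem["x"] > 0:
        mem["x"] -= 1
        return 0
    return None


def stuck(pc, mem):
    # Toy program: while x != 0: x = -x  (never terminates for x != 0)
    if mem["x"] != 0:
        mem["x"] = -mem["x"]
        return 0
    return None
```

Comparing fixed-size fingerprints instead of full states keeps each check cheap. A symbolic execution engine would additionally have to account for symbolic values and path constraints, which this sketch leaves out.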