Modular Open Platform for Static Analysis

Periodic Reporting for period 4 - MOPSA (Modular Open Platform for Static Analysis)

Reporting period: 2020-12-01 to 2021-11-30

Software errors are pervasive, with consequences ranging from mere annoyance (a crashing Web browser) to life-threatening failures of critical systems (in planes and cars). Their cumulative effect causes a large economic loss: the 2020 report on the Cost of Poor Software Quality in the US, published by the Consortium for Information & Software Quality, estimated this cost at $2.08 trillion. Currently, software correctness is mainly ensured through testing, that is, monitoring the program on a selected set of executions. Testing is effective at uncovering bugs quickly, but it cannot ensure freedom from bugs, as tests only cover a fraction of all possible executions. Even achieving significant code coverage requires a large effort and does not scale up to the size and complexity of future software.

The MOPSA project aims to improve the effectiveness of program verification, which would directly benefit our software-reliant society. Instead of tests, the project considers formal verification, which employs logic to reason about programs and provide rigorous guarantees on their correctness. Among formal methods, our focus is on semantic static analysis based on the theory of Abstract Interpretation. It relies on approximations to provide sound tools that are fully automated and work directly on the source code. High automation makes static analysis very attractive for the general programming community. Abstract Interpretation helps ensure soundness, which is an important factor for safety assurance: by erring on the safe side (e.g. over-approximating the range of variables when an exact computation is impossible or impractical), the analysis infers only true facts about the program behavior and can provably demonstrate its correctness with respect to well-defined criteria. Approximations nevertheless cause incompleteness: not every true fact about a program can be inferred, and the analysis can fail to prove that a correct program is correct, an unfortunate effect we strive to minimize by developing ever more complex abstractions.
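
The trade-off between soundness and incompleteness can be illustrated with a minimal sketch of an interval abstraction. The code below is an illustration only and is not taken from the MOPSA code base; the `Interval` class and its methods are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Abstract value: every integer between lo and hi, inclusive."""
    lo: int
    hi: int

    def sub(self, other: "Interval") -> "Interval":
        # Smallest possible result is lo - other.hi, largest is hi - other.lo.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def join(self, other: "Interval") -> "Interval":
        # Merge of two control-flow branches: smallest interval covering both.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def contains(self, value: int) -> bool:
        return self.lo <= value <= self.hi


# Program fragment under analysis:  y = x - x,  with x known to lie in [0, 10].
x = Interval(0, 10)
y = x.sub(x)

# Sound: every concrete result (always 0) lies inside the inferred interval.
assert all(y.contains(v - v) for v in range(x.lo, x.hi + 1))

# Incomplete: the inferred interval [-10, 10] is too coarse to prove y == 0.
print(y)

# Branch merge: z = 1 on one path, z = 5 on the other path.
z = Interval(1, 1).join(Interval(5, 5))
print(z)
```

The interval for `y` never misses a real behavior, yet it cannot establish that `y` is always zero, because intervals forget that both operands are the same variable. An abstraction that tracks relations between variables would recover this fact, at a higher cost.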

Abstract Interpretation has enjoyed growing success and has moved from academic research to the embedded critical industry (with tools such as Polyspace and Astrée). The objective of the MOPSA project is to advance the state of Abstract Interpretation methods towards the point where they can be used by programmers on general software. The scientific challenges we must overcome include designing scalable analyses, producing relevant information, supporting novel popular languages (such as Python), and analyzing properties better adapted to the continuous development of software. The project also builds an extensible open-source static analysis platform demonstrating these new techniques, usable by researchers and software engineers, to create momentum encouraging further research, development, and use of Abstract Interpretation techniques in general software verification.

The MOPSA project has made several contributions towards the improvement of static analysis based on Abstract Interpretation by targeting novel languages and properties. We have developed novel abstract domains, which are analysis components targeting individual properties and language constructs, and new ways to combine them to enable effective cross-language analyses.

Firstly, we addressed the scalability issue. Focusing on the analysis of C programs, we designed modular abstractions inferring function summaries. They allow reusing analysis results from functions called in related execution contexts, and help speed up the analysis of programs with libraries. In particular, we developed modular abstractions to check the manipulation of C strings. We also considered scaling analysis setups to more complex software depending on libraries. Because a sound analysis requires the full program and its dependencies to be available to the analyzer, we developed a specification language to help define concisely and soundly the effect of libraries that cannot be conveniently provided or analyzed in source form.
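
The idea of reusing a function's analysis result across related calling contexts can be sketched as follows. This is a simplified illustration under stated assumptions, not the summary mechanism implemented in MOPSA: abstract values are plain integer intervals, and `SummaryCache`, `included`, and `analyze_double_plus_one` are hypothetical names.

```python
from typing import Callable, List, Tuple

Interval = Tuple[int, int]  # (lo, hi), inclusive bounds


def included(a: Interval, b: Interval) -> bool:
    """True when every value of interval a also belongs to interval b."""
    return b[0] <= a[0] and a[1] <= b[1]


class SummaryCache:
    """Stores (abstract argument, abstract result) pairs for one function."""

    def __init__(self, analyze_body: Callable[[Interval], Interval]) -> None:
        self.analyze_body = analyze_body
        self.summaries: List[Tuple[Interval, Interval]] = []
        self.body_analyses = 0  # how many times the body was really analyzed

    def call(self, arg: Interval) -> Interval:
        for seen_arg, result in self.summaries:
            if included(arg, seen_arg):
                # The earlier result covers this context: reusing it stays
                # sound, although it may be less precise than a fresh analysis.
                return result
        self.body_analyses += 1
        result = self.analyze_body(arg)
        self.summaries.append((arg, result))
        return result


def analyze_double_plus_one(arg: Interval) -> Interval:
    # Abstract effect of a function whose body is `return 2 * x + 1`.
    lo, hi = arg
    return (2 * lo + 1, 2 * hi + 1)


cache = SummaryCache(analyze_double_plus_one)
first = cache.call((0, 100))   # body analyzed once
second = cache.call((10, 20))  # context included in (0, 100): summary reused
print(first, second, cache.body_analyses)
```

The second call returns the result computed for the wider context without re-analyzing the body. The reuse is sound because the exact result for `(10, 20)`, namely `(21, 41)`, is contained in the reused interval; the price is a loss of precision.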

Secondly, we addressed the language issue, striving to go beyond the analysis of static imperative and object-oriented languages (which remain even now the core target for safety verification by semantic static analysis). We considered the analysis of Python, a dynamic language that is challenging for formal methods due to its intricate semantics. The interpretation of key parts of the language requires precise knowledge of the dynamic state of the program which, as we demonstrated, paradoxically makes it a perfect target for the highly precise flavor of static analyses considered in the project. We developed a semantics of Python 3 programs realistically based on the behavior of its main interpreter, CPython, and proposed abstractions that enable the automatic inference of program types and values. We also considered programs mixing different languages, a common occurrence. After developing general theoretical principles on how to combine existing independent analyses for different languages into a multi-lingual analysis, we applied this idea to the safety analysis of Python programs that call native C libraries.
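
A small example, written for this summary and not drawn from the project's benchmarks, shows why the meaning of a Python operator depends on the dynamic state of the program. The `Meters` class and the `total` function are hypothetical.

```python
class Meters:
    """A length that only defines the reflected addition `other + self`."""

    def __init__(self, v: int) -> None:
        self.v = v

    def __radd__(self, other: int) -> "Meters":
        return Meters(other + self.v)


def total(items):
    acc = 0
    for item in items:
        # Which method runs here depends on the run-time type of `acc`,
        # which itself depends on how many iterations already happened.
        acc = acc + item
    return acc


empty = total([])           # no iteration: the result is the int 0
one = total([Meters(1)])    # int + Meters falls back to Meters.__radd__

try:
    total([Meters(1), Meters(2)])
    outcome = "no error"
except TypeError:
    # Second iteration computes Meters + Meters: Meters has no __add__, and
    # Python does not try __radd__ when both operands have the same type.
    outcome = "TypeError"

print(empty, one.v, outcome)
```

Depending only on the length of the input list, `total` returns an `int`, returns a `Meters`, or raises `TypeError`. An analysis has to track types and values along each control-flow path to tell these cases apart.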

Thirdly, we studied how to maintain the safety of software as it evolves, due to bug fixes or porting to new targets. We developed an analysis of software patches for C to detect the semantic differences in program executions caused by a program change and help prevent unwanted changes or regressions. We also developed a portability analysis for C able to detect statically whether a program behaves differently on little-endian and big-endian architectures.
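
The kind of endianness dependence targeted by such a portability analysis can be reproduced with a short sketch. It models in Python, purely for illustration, the C idiom of reading the first byte of a 32-bit integer through a byte pointer; the function names are hypothetical and the code does not reflect how the analysis itself works.

```python
def first_byte_in_memory(value: int, byteorder: str) -> int:
    """Byte stored at the lowest address of a 32-bit unsigned integer."""
    return value.to_bytes(4, byteorder)[0]


def low_byte_portable(value: int) -> int:
    """Least significant byte, computed arithmetically on the value."""
    return value & 0xFF


value = 0x11223344

# Memory-layout-dependent read: the result differs between architectures.
on_little = first_byte_in_memory(value, "little")
on_big = first_byte_in_memory(value, "big")

# Arithmetic read: the result is the same regardless of byte order.
portable = low_byte_portable(value)

print(hex(on_little), hex(on_big), hex(portable))
```

The first two results differ, which is precisely the behavioral divergence a portability analysis aims to report, whereas the masked version yields the same byte on both kinds of architecture.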

Throughout the project, we developed a novel platform: the MOPSA analyzer. It leverages our new vision for semantic static analyses handling multiple languages and properties, with a design promoting sharing and cooperation of abstractions. The aforementioned analyses have been implemented within MOPSA and evaluated successfully on realistic benchmarks (including, in the case of the portability analysis, large industrial code). The platform is available as open source at https://gitlab.com/mopsa/mopsa-analyzer to promote future use in academia and industry.

The static analysis of dynamic languages aiming towards soundness is a recent field and focuses mainly on JavaScript. Among the results of the project, a key contribution is the first formal semantics of a realistic subset of Python adapted to sound static analysis, and the subsequent development of analyses for dynamic languages that enable a high level of precision (value-sensitivity, flow-sensitivity, relational properties).

We also made significant progress towards the semantic static analysis of modern software with our practical demonstration of how an analysis for programs using multiple languages can be designed. A benefit of our approach is that abstractions developed independently for different languages can be combined to achieve a multi-lingual analysis.

Finally, we progressed beyond the state of the art by introducing formal methods to portability analyses, which were successfully formalized, implemented and applied to check large industrial code.

We expect future work to continue along these directions and leverage the MOPSA platform, and its unique ability to combine heterogeneous abstractions, to analyze novel languages and properties. This includes ongoing work on the analysis of security properties and smart contracts.
Analyzing an error with MOPSA: the CVE-2020-8625 error in BIND
Combination of abstract domains in MOPSA for the analysis of Python programs using C libraries