CORDIS - EU research results

Testing Program Analyzers Ad Absurdum

Periodic Reporting for period 1 - MirandaTesting (Testing Program Analyzers Ad Absurdum)

Reporting period: 2023-07-01 to 2025-12-31

Software is pervasive in modern society, and its reliability is critical.

Program analysis refers to automatically examining a piece of software with the goal of detecting correctness issues or verifying their absence. Program analyzers are tools that implement program-analysis techniques. In recent years, as day-to-day life increasingly depends on software, more and more program analyzers are being built and used in practice. In fact, there is an abundance of popular analyzers developed both in academia and industry.

We rely on program analyzers to “guard” software reliability, but who will guard the guards? Program analyzers are highly complex tools, implementing sophisticated algorithms and performance optimizations. In addition, analyzers typically integrate several self-contained, core analysis components, such as specialized solvers, which are already complex by themselves. Due to this overall complexity, program analyzers are all the more likely to contain correctness issues. The most dangerous kind of correctness issue in analyzers is a critical bug, which we define as a bug leading to a wrong response, e.g. returning ‘correct’ for incorrect software, or leading to a right response for the wrong reasons. The latter type of critical bug is relevant because it is also likely to result in a wrong analyzer response under different circumstances.
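As a toy illustration of such a critical bug (our own sketch, not code from the project), consider a naive "analyzer" that checks Python snippets for possible division by zero. It only flags literal zero divisors, so it unsoundly reports 'correct' for code whose divisor merely *evaluates* to zero:

```python
# Toy illustration (hypothetical, not the project's code): a naive checker
# for division by zero that inspects the AST for literal zero divisors only.
import ast

def naive_div_check(source: str) -> str:
    """Return 'incorrect' if a literal zero divisor is found, else 'correct'."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Div):
            divisor = node.right
            if isinstance(divisor, ast.Constant) and divisor.value == 0:
                return "incorrect"   # definite division by zero
    return "correct"                 # claimed safe -- but is it?

# The checker catches the obvious case ...
assert naive_div_check("x = 1 / 0") == "incorrect"
# ... but misses a computed zero: this program crashes at runtime,
# yet the checker returns 'correct' -- a critical (soundness) bug.
assert naive_div_check("y = 2 - 2\nx = 1 / y") == "correct"
```

The second assertion exhibits exactly the dangerous case described above: the analyzer answers 'correct' for incorrect software.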

In modern society, critical bugs may have disastrous consequences, e.g. when analyzing software used for transportation, banking, or secure communication. As a concrete example, consider the Astrée analyzer, which has been used to verify the absence of runtime errors in the flight-control software of Airbus A340 and A380. What if it missed an error? It is, therefore, imperative to check program analyzers for critical bugs.

Verifying the absence of critical bugs in a program analyzer is prohibitively expensive. In contrast to verification, automated test generation can effectively find such bugs. Existing testing approaches, however, are still limited in this application domain.

The goal of this project is to develop an overarching methodology for more rigorous testing of program analyzers than ever before. The key idea is to first expose more information about why a program analyzer reaches a particular response for a certain piece of code, e.g. why was the code found correct? This information is then used to interrogate the analyzer further, aiming to force it into a contradiction. In other words, anything the analyzer says during interrogation can and will be used against it. Finding a contradiction signifies that an analyzer response, or its justification for a response, is wrong, and that a critical bug has been detected. We call this methodology “interrogation testing”.
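The interrogation loop can be sketched as follows (a hypothetical toy, with names of our own invention, not the project's actual framework). The "analyzer" returns a verdict *and* a justification, here a claimed interval invariant for the analyzed function's output; the interrogator then cross-examines that justification against concrete executions, and any execution producing a value outside the claimed interval is a contradiction, i.e. a critical bug:

```python
# Hypothetical sketch of interrogation testing (our own toy, not the
# project's framework). The analyzer is deliberately buggy: it samples
# only small inputs, so its claimed invariant can be wrong.

def toy_analyzer(program):
    """Claim an interval invariant for program(x) from a few samples."""
    samples = [program(x) for x in range(-3, 4)]  # far too few samples
    return {"verdict": "correct", "invariant": (min(samples), max(samples))}

def interrogate(program, response):
    """Use the analyzer's own justification against it: search for an
    input whose concrete result contradicts the claimed invariant."""
    lo, hi = response["invariant"]
    for x in range(-100, 101):         # wider inputs than the analyzer tried
        y = program(x)
        if not (lo <= y <= hi):        # contradiction found
            return f"critical bug: input {x} yields {y}, outside [{lo}, {hi}]"
    return "no contradiction found"

program = lambda x: x * x              # the analyzed piece of code
response = toy_analyzer(program)
print(interrogate(program, response))
```

Here the analyzer claims the invariant [0, 9], and the interrogator immediately refutes it (e.g. input -100 yields 10000), exposing the analyzer's unsound justification.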

If successful, this project will enable systematic testing of entire program-analyzer classes. As a result, analyzers will exhibit fewer critical bugs, potentially preventing catastrophic outcomes in safety-critical domains.

In the first two years of the project, we designed the interrogation-testing methodology and framework [2]. We instantiated it to test different classes of program analyzers, e.g. ones that reason about reachability, such as abstract interpreters, model checkers, and symbolic-execution engines [2,3], as well as Datalog engines [5]. Moreover, we used our methodology to create a fuzzer-benchmarking platform for greybox fuzzers [4]. Finally, and somewhat unexpectedly, we found that interrogation testing is applicable beyond program analyzers. In particular, we used it to test entire zero-knowledge systems [1], whose growing use in authentication, online voting, and similar settings makes it imperative to test them rigorously. Overall, we found numerous critical bugs, the vast majority of which were promptly fixed by the developers.

1. Christoph Hochrainer, Anastasia Isychev, Valentin Wüstholz and Maria Christakis. Fuzzing Processing Pipelines for Zero-Knowledge Circuits. In Proceedings of the 32nd Conference on Computer and Communications Security (CCS'25), 2025. ACM.

2. David Kaindlstorfer, Anastasia Isychev, Valentin Wüstholz and Maria Christakis. Interrogation Testing of Program Analyzers for Soundness and Precision Issues. In Proceedings of the 39th International Conference on Automated Software Engineering (ASE'24), 2024. ACM.

3. Markus Fleischmann, David Kaindlstorfer, Anastasia Isychev, Valentin Wüstholz and Maria Christakis. Constraint-Based Test Oracles for Program Analyzers. In Proceedings of the 39th International Conference on Automated Software Engineering (ASE'24), 2024. ACM.

4. Jiradet Ounjai, Valentin Wüstholz and Maria Christakis. Green Fuzzer Benchmarking. In Proceedings of the 32nd International Symposium on Software Testing and Analysis (ISSTA'23), 2023. ACM.

5. Muhammad Numair Mansur, Valentin Wüstholz and Maria Christakis. Dependency-Aware Metamorphic Testing of Datalog Engines. In Proceedings of the 32nd International Symposium on Software Testing and Analysis (ISSTA'23), 2023. ACM.

On the scientific front, interrogation testing aims to provide practical, actionable techniques that formal-methods experts can adopt to increase the reliability of their analyzers. Our techniques can also be used during the development of new features or entirely new tools, making correctness a primary concern early in the implementation effort. At the same time, the project will provide software-testing experts with a solid conceptual foundation for extending the reach of testing to program analyzers. Overall, we expect to strengthen the synergies between these fields, thereby inspiring new research ideas for testing in related domains, e.g. compilers, interpreters, debuggers, or version-control systems. Our methodology might even be applicable beyond development tools, to other software that is notoriously difficult to test; we have already seen such an example with zero-knowledge systems.

We expect interrogation testing to also achieve societal impact. Program analyzers will exhibit fewer critical bugs, hence increasing the quality of analyzed software, especially in safety-critical settings. This may even impact software certification. Specifically, analyzers are being used to check whether software meets certification requirements, e.g. for road vehicles, safety-related electrical control systems, or safety-related railway software. We envision a future where analyzers used for certification must be interrogation tested, thereby indirectly raising the certification standards for safety-critical software. Moreover, as analyzer quality improves, users will place more trust in analysis results, making program analyzers even more widely applicable. In short, software, a key innovation driver in our society, will become more reliable.