Periodic Reporting for period 2 - AutoProbe (Automated Probabilistic Black-Box Verification)
Reporting period: 2022-10-01 to 2024-03-31
Automata learning – the automated discovery of automata models from system observations such as test logs – is emerging as a highly effective bug-finding technique, with applications in the verification of bank cards and basic network communication protocols. The design of algorithms for automata learning is a fundamental research problem, and in recent years much progress has been made in the development and understanding of new algorithms (including the PI’s own work). Yet, existing algorithms do not support crucial quantitative or concurrency aspects that are essential in modeling properties such as network congestion and fault tolerance.
The central objective of this project is to develop a new verification framework that enables automated model-based verification for probabilistic and concurrent systems, motivated by applications in networks. We will provide active learning algorithms, in the style of Angluin’s seminal L⋆ algorithm, for automata models that were so far too complex to be tackled. The development will be based on rigorous semantic foundations, developed by the PI in recent years, which provide correctness for the algorithms in a modular way.
The project will significantly advance model-based verification in new and previously unexplored directions. This line of research will not only result in fundamental theoretical contributions and insights in their own right but will also impact the practice of concurrent and probabilistic network verification and lead to new and more powerful techniques for the use of engineers and programmers.
- practical applications of automata learning -- understanding the limits
We developed Prognosis, a framework offering automated closed-box learning and analysis of models of network protocol implementations. Prognosis can learn models that vary in abstraction level from simple deterministic automata to models containing data operations, such as register updates, and can be used to unlock a variety of analysis techniques -- model checking temporal properties, computing differences between models of two implementations of the same protocol, or improving testing via model-based test generation. Prognosis is modular and easily adaptable to different protocols (e.g. TCP and QUIC) and their implementations.
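One of the analyses mentioned above, computing differences between models of two implementations, can be illustrated with a small sketch (toy code, not the Prognosis implementation): given two learned deterministic automata, a breadth-first search over their product finds a shortest input sequence on which the two models disagree.

```python
# Minimal sketch: find a shortest word on which two learned DFA models of
# protocol implementations disagree, via a product construction.
# (Toy code; Prognosis itself works on richer models.)
from collections import deque

def find_difference(dfa_a, dfa_b, alphabet):
    """Each DFA is (initial_state, delta, accepting), where
    delta maps (state, symbol) -> state. Returns a shortest word
    accepted by exactly one of the two DFAs, or None if equivalent."""
    init_a, delta_a, acc_a = dfa_a
    init_b, delta_b, acc_b = dfa_b
    start = (init_a, init_b)
    queue, seen = deque([(start, [])]), {start}
    while queue:
        (qa, qb), word = queue.popleft()
        if (qa in acc_a) != (qb in acc_b):   # the models disagree here
            return word
        for sym in alphabet:
            nxt = (delta_a[(qa, sym)], delta_b[(qb, sym)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + [sym]))
    return None

# Example: model A accepts an even number of 'a's, model B accepts
# every word; the shortest disagreement is the single-symbol word ['a'].
A = (0, {(0, 'a'): 1, (1, 'a'): 0}, {0})
B = (0, {(0, 'a'): 0}, {0})
diff = find_difference(A, B, ['a'])
```

In Prognosis a witness like this points at a concrete input trace on which two protocol implementations behave differently, which can then be replayed against the real systems.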
- new algorithms for automata learning
We have also extended the Kearns-Vazirani learning algorithm to handle systems that change over time. We presented a new learning algorithm that can reuse and update previously learned behavior, implemented it in the LearnLib library, and evaluated it on large examples to which we made small adjustments between two runs of the algorithm. In these experiments our algorithm significantly outperforms both the classic Kearns-Vazirani algorithm and the current state-of-the-art adaptive algorithm.
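The data structure at the heart of Kearns-Vazirani-style learners, and the part that an adaptive learner can reuse between runs, is the discrimination tree. A minimal sketch of its central "sift" operation (toy code under assumed names, not the LearnLib implementation):

```python
# Minimal sketch of a Kearns-Vazirani-style discrimination tree: inner
# nodes hold a distinguishing suffix, leaves hold an access word for one
# hypothesis state. Sifting routes a word to its state via membership
# queries. (Toy code, not the LearnLib implementation.)

class Node:
    def __init__(self, suffix=None, access=None):
        self.suffix = suffix    # inner node: distinguishing suffix
        self.access = access    # leaf: access word of a hypothesis state
        self.children = {}      # membership answer -> child node

def sift(tree, word, member):
    """Route `word` through the tree with membership queries until a
    leaf (= hypothesis state) is reached."""
    node = tree
    while node.suffix is not None:
        answer = member(word + node.suffix)
        node = node.children[answer]
    return node

# Toy target: words over {'a'} of even length (two states, distinguished
# by the empty suffix, i.e. by membership of the word itself).
member = lambda w: len(w) % 2 == 0
tree = Node(suffix=())
tree.children = {True: Node(access=()),        # accepting state
                 False: Node(access=('a',))}   # rejecting state
```

Reusing a tree like this between two runs is what lets the adaptive algorithm avoid re-asking queries whose answers the adjusted system has not changed.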
- decidability and completeness results for probabilistic languages
We developed a (co)algebraic framework to study a family of process calculi with monadic branching structures and recursion operators. Our framework features a uniform semantics of process terms and a complete axiomatisation of semantic equivalence. We show that there are uniformly defined fragments of our calculi that capture well-known examples from the literature like regular expressions modulo bisimilarity and guarded Kleene algebra with tests. We also derive new calculi for probabilistic and convex processes with an analogue of Kleene star.
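As a small classical illustration of why axiomatising bisimilarity differs from axiomatising language equivalence (a textbook example, not one of our new results):

```latex
a \cdot (b + c) \;\not\sim\; a \cdot b + a \cdot c
```

Both sides denote the same language $\{ab, ac\}$, but on the left the choice between $b$ and $c$ is still available after the $a$-step, while on the right it is resolved before it; a complete axiomatisation of regular expressions modulo bisimilarity must therefore drop this distributivity law.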
- Verification of stream-sampling for probabilistic programs
Probabilistic programming languages rely fundamentally on some notion of sampling, and this is doubly true for probabilistic programming languages which perform Bayesian inference using Monte Carlo techniques. Verifying samplers - proving that they generate samples from the correct distribution - is crucial to the use of probabilistic programming languages for statistical modelling and inference. However, the typical denotational semantics of probabilistic programs is incompatible with deterministic notions of sampling. This is problematic, considering that most statistical inference is performed using pseudorandom number generators.
We developed a higher-order probabilistic programming language centered on the notion of samplers and sampler operations. We equipped this language with an operational and a denotational semantics in terms of continuous maps between topological spaces. Our language also supports discontinuous operations, such as comparisons between reals, by using the type system to track discontinuities. This feature may be of independent interest, for example in the context of differentiable programming.
Using this language, we developed tools for the formal verification of sampler correctness.
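As a down-to-earth illustration of the correctness property at stake (not the formal topological development above): a deterministic sampler is correct when pushing a uniform pseudorandom source through it yields the target distribution. The sketch below builds a geometric sampler by inverse-transform sampling and checks it empirically.

```python
# Illustration of sampler correctness: a geometric sampler obtained by
# pushing a deterministic pseudorandom uniform stream through the inverse
# CDF. (A hand-rolled toy check, not the project's formal verification.)
import math
import random

def geometric_sampler(p, uniform):
    """Inverse-transform sampling: map u ~ Uniform[0,1) to the smallest
    k >= 1 with CDF(k) >= u, i.e. k = ceil(log(1-u) / log(1-p))."""
    u = uniform()
    # max(1, ...) guards the measure-zero corner case u == 0.0
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))

rng = random.Random(0)            # deterministic PRNG: fixed seed
samples = [geometric_sampler(0.5, rng.random) for _ in range(10_000)]
freq_1 = samples.count(1) / len(samples)
# The target distribution with p = 0.5 has P(X = 1) = 0.5, so the
# empirical frequency of the outcome 1 should be close to 0.5.
```

An empirical check like this can only gain confidence sample by sample; the point of the language above is to prove such correctness statements once and for all.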
- Conflict Aware Learning
Active automata learning algorithms cannot easily handle conflicts in the observation data (different outputs observed for the same inputs). This inherent inability to recover from a conflict impairs their applicability in scenarios where noise is present or the system under learning is mutating. We proposed the Conflict-Aware Active Automata Learning (C3AL) framework to enable the handling of conflicting information during the learning process. The core idea is to treat the so-called observation tree as a first-class citizen in the learning process. Although this idea has been explored in recent work, we take it to its full effect by enabling its use with any existing learner and by minimizing the number of tests performed on the system under learning, especially in the face of conflicts.
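The role of the observation tree can be sketched in a few lines (toy code, not the C3AL implementation): every tested input word is stored with its observed output, so a later observation that contradicts a stored one is surfaced as a conflict instead of silently corrupting the learner's hypothesis.

```python
# Minimal sketch of an observation tree as a first-class structure:
# inputs index the tree, each node remembers the output last observed
# there, and contradicting observations are reported as conflicts.
# (Toy code, not the C3AL implementation.)
class ObservationTree:
    def __init__(self):
        self.children = {}   # input symbol -> subtree
        self.output = None   # output last observed at this node

    def observe(self, word, outputs):
        """Record the per-step outputs for `word`; return the list of
        input prefixes whose new output conflicts with a stored one."""
        node, conflicts = self, []
        for i, (sym, out) in enumerate(zip(word, outputs)):
            node = node.children.setdefault(sym, ObservationTree())
            if node.output is not None and node.output != out:
                conflicts.append(word[:i + 1])   # conflicting prefix
            node.output = out                    # keep the newest observation
        return conflicts

tree = ObservationTree()
first = tree.observe(('a', 'b'), (0, 1))    # fresh observation: no conflict
second = tree.observe(('a', 'b'), (0, 2))   # output after 'ab' changed
```

Once a conflict is localised to a prefix, the framework can re-test exactly that part of the system rather than restarting the learner from scratch.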
In the coming period, we will pursue two lines of work: the development of automata learning algorithms in the concurrent and probabilistic setting, and applications of learning in runtime verification.