Periodic Reporting for period 1 - DEUCE (Data-Driven Verification and Learning Under Uncertainty)
Reporting period: 2023-01-01 to 2025-06-30
We now outline the main objectives of the project DEUCE: Data-Driven Verification and Learning under Uncertainty. Reinforcement learning (RL) agents learn to behave optimally via trial and error, without the need to encode complicated behavior explicitly. However, RL generally lacks mechanisms to ensure correct and safe behavior with respect to sophisticated tasks and safety specifications.
Formal verification (FV), particularly model checking, guarantees a system's correctness based on rigorous methods and precise specifications. Despite active development by researchers worldwide, fundamental challenges have so far obstructed the application of FV to RL.
We identify three key challenges that frame the objectives of the project.
(1) Complex environments with many degrees of freedom induce large state and feature spaces. This curse of dimensionality poses a longstanding problem for verification.
(2) Common approaches to ensuring the correctness of RL systems assume idealized discrete state spaces. However, realistic problems often have continuous state and action spaces.
(3) Knowledge about real-world environments is inherently uncertain.
To ensure safety, correctness guarantees must be robust against such imprecise knowledge of the environment.
The main objective of the DEUCE project is to develop novel, data-driven verification methods that tightly integrate with RL. To cope with the curse of dimensionality, we devise learning-based abstraction schemes that distill the parts of a system that are relevant for correctness. We define and employ models whose expressiveness captures various types of uncertainty. These models form the basis for formal, data-driven abstractions of continuous spaces. Finally, we provide model-based FV mechanisms that ensure safe and correct exploration for RL agents.
In short, DEUCE aims to elevate the scalability and expressiveness of verification toward the real-world deployment of reinforcement learning.
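To make this integration concrete, the following is a minimal sketch of model-based shielding over an uncertain environment model: transition probabilities are stored as intervals, and the shield removes any action whose worst-case probability of reaching an unsafe state exceeds a threshold. The names (UncertainModel, shield_actions), the threshold, and the numbers are illustrative assumptions, not the project's actual implementation.

```python
# Illustrative sketch only: a shield over an interval-valued model.
# Names, structure, and numbers are hypothetical, not the DEUCE implementation.
from dataclasses import dataclass, field


@dataclass
class UncertainModel:
    """Transition knowledge as probability intervals (epistemic uncertainty)."""
    # (state, action) -> {successor: (lower_prob, upper_prob)}
    transitions: dict = field(default_factory=dict)
    unsafe_states: set = field(default_factory=set)

    def worst_case_unsafe_prob(self, state, action) -> float:
        """Upper bound on the probability of reaching an unsafe state."""
        intervals = self.transitions.get((state, action), {})
        return sum(hi for succ, (_, hi) in intervals.items()
                   if succ in self.unsafe_states)


def shield_actions(model: UncertainModel, state, actions, threshold=0.05):
    """Keep only the actions that remain safe under the worst-case model."""
    return [a for a in actions
            if model.worst_case_unsafe_prob(state, a) <= threshold]


# Usage: the RL agent explores freely, but only among shielded actions.
model = UncertainModel(
    transitions={("s0", "left"): {"s1": (0.8, 0.9), "crash": (0.1, 0.2)},
                 ("s0", "right"): {"s2": (0.95, 1.0), "crash": (0.0, 0.02)}},
    unsafe_states={"crash"},
)
print(shield_actions(model, "s0", ["left", "right"]))  # -> ['right']
```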
Within the first key challenge, handling complex and high-dimensional environments, we provided several results that tackle systems with high-dimensional representations, such as pixel-based observations and multi-agent settings. We highlight one of the first works that performs safe reinforcement learning from pixels, published at ICLR 2023, and a novel approach that handles multi-agent systems operating under partial information and uncertainty with many agents, published at AAAI 2024.
For the second key challenge, we provided novel results on the abstraction and verification of dynamical systems. Intuitively, high-dimensional, continuous state and action spaces are abstracted into discrete spaces, and the main result is that we provide guarantees that link the original system to its abstraction. To highlight two results, we published a paper at AAAI 2024 dealing with various types of epistemic and aleatoric uncertainty, and a comprehensive overview in the Journal of Artificial Intelligence Research. Moreover, we provided the first approach to safe and shielded reinforcement learning under partial observability, published at AAAI 2023.
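As an intuition for such abstractions, consider the following minimal sketch, in which the dynamics, grid width, and helper names are assumptions chosen for illustration: a continuous state space is partitioned into grid cells, and the successors of each cell are over-approximated so that guarantees proved on the finite abstraction transfer soundly to the original continuous system.

```python
# Illustrative sketch: grid abstraction of a 1-D dynamical system.
# The dynamics f and all parameters are assumptions, not from the project.
import math


def f(x: float) -> float:
    """Example continuous dynamics (monotone, for simplicity)."""
    return 0.9 * x + 0.1


def cell_of(x: float, width: float = 0.25) -> int:
    """Map a continuous state to its discrete grid cell."""
    return math.floor(x / width)


def abstract_successors(cell: int, width: float = 0.25) -> set[int]:
    """Over-approximate the successor cells of an entire cell.

    Because f is monotone here, the image of the interval
    [cell*width, (cell+1)*width] is [f(lo), f(hi)]; every cell that
    this image touches becomes a successor in the abstraction.  The
    over-approximation is what makes guarantees proved on the finite
    abstraction carry over soundly to the continuous system.
    """
    lo, hi = cell * width, (cell + 1) * width
    img_lo, img_hi = f(lo), f(hi)
    return set(range(cell_of(img_lo), cell_of(img_hi) + 1))


# Usage: build the finite transition structure for a few cells.
for c in range(4):
    print(c, "->", sorted(abstract_successors(c)))
```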
For the third challenge, we provided a string of results that advance the verification and learning of uncertain systems. As a highlight, we defined novel semantics for a complicated model, robust partially observable Markov decision processes, that interpret uncertainty in a game-based fashion; this work was published at IJCAI 2024 and already informs the remainder of the project and the research area more broadly. Moreover, we coined the term 'active measuring' in reinforcement learning: our techniques let an agent actively invoke a system's potentially very expensive sensors whenever the uncertainty is too high to make safe and correct decisions. One of the results in this stream was published at AAAI 2024. Finally, we delved into safe and reliable offline reinforcement learning to account for the inherent uncertainty that arises in offline settings with limited data; the first result, which handles partial observability, was published at AAAI 2023.
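To illustrate the idea behind active measuring, here is a hypothetical, heavily simplified sketch: the agent lets its uncertainty grow while acting on predictions alone, and pays for an expensive sensor reading only when that uncertainty crosses a safety threshold. The class, the noise model, and the thresholds are assumptions, not the published method.

```python
# Illustrative sketch of active measuring (hypothetical, simplified).
import random


class ActiveMeasuringAgent:
    """Tracks state uncertainty; pays for a sensor reading only when needed."""

    def __init__(self, measure_cost=1.0, uncertainty_threshold=0.5):
        self.estimate = 0.0        # current state estimate
        self.uncertainty = 0.0     # grows while the agent does not measure
        self.measure_cost = measure_cost
        self.threshold = uncertainty_threshold
        self.total_cost = 0.0

    def step(self, true_state: float) -> float:
        # Predicting without sensing: uncertainty accumulates.
        self.uncertainty += 0.2
        if self.uncertainty > self.threshold:
            # Too uncertain to act safely: actively measure (noisy sensor).
            self.estimate = true_state + random.gauss(0.0, 0.05)
            self.uncertainty = 0.0
            self.total_cost += self.measure_cost
        return self.estimate


# Usage: the agent measures only every few steps, trading cost for certainty.
agent = ActiveMeasuringAgent()
for t in range(6):
    est = agent.step(true_state=float(t))
    print(t, round(est, 2), round(agent.uncertainty, 2))
print("sensor cost paid:", agent.total_cost)
```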