CORDIS - EU research results

Deployable Decision-Making: Embracing Semantics for Robotic Safety in Everyday Scenarios

Periodic Reporting for period 1 - SSDM (Deployable Decision-Making: Embracing Semantics for Robotic Safety in Everyday Scenarios)

Reporting period: 2024-06-01 to 2026-05-31

As robots move from traditional industrial environments into everyday human-centric settings, their ability to act safely increasingly depends on more than simply avoiding collisions or respecting predefined analytical constraints. Robots must develop a semantic understanding of their operating environment and take actions that are contextually appropriate. For instance, if a robot is assigned the task of heating food in a microwave, it must not only identify the microwave and the food but also understand that metallic materials should be removed beforehand to avoid potential fire hazards. Recent progress in computer vision and machine learning has enabled robots to extract rich semantic information from sensory inputs such as RGB-D images and language. However, embedding this semantic knowledge into safe and contextually appropriate behaviour remains an open challenge.

A significant bottleneck arises at the interface between the perception and action modules of robotic systems. While various safe control frameworks enable robots to adhere to safety constraints, these constraints are often assumed to be provided in particular analytical forms beforehand (e.g. as control barrier functions). Bridging the gap between semantic understanding and safe action execution necessitates translating semantic constraints from perception into explicit functions defined in the robot's state and action space. This translation process must account for perceptive uncertainties (e.g. sensor noise and incomplete maps), dynamic environmental conditions, and possible interactions with humans in shared spaces. Addressing these complexities is pivotal to achieving the safe and reliable deployment of robotic systems in real-world, everyday scenarios.

This project aims to address the challenge of semantically safe robot decision-making with the following actionable steps:

(1) providing a comprehensive review of semantics-driven robot decision-making frameworks as well as the necessary perception and safe control building blocks,
(2) exploring efficient environment representations that facilitate downstream language-conditioned contextual reasoning and semantically-informed decision-making,
(3) developing the theoretical foundation for seamlessly integrating perception and action modules to enable semantically safe actions,
(4) deriving uncertainty-aware approaches to account for perceptive uncertainties and environment variations, and
(5) demonstrating the effectiveness of the overall approach in real-world scenarios.

Through these steps, we aim to establish the theoretical foundations and algorithmic tools necessary for semantically safe robot decision-making. These advancements will represent a step towards designing safe and efficient decision-making algorithms for robots operating in unstructured, human-centric environments, where intelligent and reliable robotic systems can not only improve daily life but also support human counterparts in specialized applications (e.g. collaborative construction, manufacturing, and healthcare).
During the reporting period, the project advanced the scientific objective of semantically safe robot decision-making through the design, development, and validation of novel perception–action integration frameworks. The research was structured along the key objectives defined at the start of the project, with the following main achievements:

(1) Review of existing literature: A comprehensive review of semantics-driven robot decision-making frameworks was conducted, covering the essential perception and safe control building blocks. This work provided both the theoretical context and a system-level perspective on how semantic information can be embedded into robot decision-making pipelines. The review has laid the foundation for subsequent contributions by systematically identifying both the existing trends and limitations in bridging semantic perception to safe control. In parallel, we benchmarked 12 safe decision-making algorithms, spanning model-based control and reinforcement learning, and identified key trade-offs in generalization, efficiency, and deployment feasibility. These results provide a set of practical guidelines for selecting appropriate safe decision-making approaches in different applications.

(2) 3D Environment Mapping for Semantic Reasoning: The project advanced methods for building efficient metric–semantic environment representations that support contextual reasoning and decision-making. In particular, a vision–language perception module was developed that builds open-vocabulary, instance-level 3D maps from onboard RGB-D sensing. Beyond leveraging SAM-based segmentation and CLIP features for open-vocabulary capabilities, our system, motivated by the formulation in POCD [1], explicitly models stationarity scores for individual objects, enabling it to systematically track semi-static changes in the environment (e.g. when objects are moved, removed, or reintroduced without continuous observation). This capability allows robots to autonomously maintain up-to-date semantic maps, actively search for objects using natural language instructions, and reason about potentially unsafe behaviours. Compared with typical mapping baselines, our approach achieved higher mapping efficiency and improved overall task completion performance.
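
The stationarity-score idea can be sketched as follows. This is a minimal illustrative stand-in, not the project's actual probabilistic formulation from POCD: it assumes a simple exponential update toward 1 (stationary) or 0 (moved/removed), with hypothetical class, function, and threshold names.

```python
from dataclasses import dataclass


@dataclass
class MapObject:
    """A hypothetical instance-level map entry with a stationarity score in [0, 1]."""
    label: str
    position: tuple
    stationarity: float = 0.5  # prior: unknown whether the object tends to move


def update_stationarity(obj: MapObject, observed_at_expected_pose: bool,
                        rate: float = 0.3) -> MapObject:
    """Push the score toward 1 if the object is re-observed where expected,
    toward 0 if it is missing (an assumed exponential rule, for illustration)."""
    target = 1.0 if observed_at_expected_pose else 0.0
    obj.stationarity = (1.0 - rate) * obj.stationarity + rate * target
    return obj


def is_semi_static_change(obj: MapObject, threshold: float = 0.2) -> bool:
    """Flag objects whose score has dropped low enough to warrant re-mapping."""
    return obj.stationarity < threshold
```

A map maintained this way lets the robot distinguish furniture-like objects (scores near 1) from frequently moved items (scores decaying toward 0) and trigger local map updates only where needed.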

(3) Integration of Semantics into Safe Robot Decision-Making: Progress on this objective can be grouped into three threads, summarized below.

(a) Semantic Safety: A semantic safety filter was developed that leverages open-vocabulary perception together with large language model (LLM) reasoning to derive constraints such as unsafe spatial relationships (e.g. "moving a cup of water above a laptop is unsafe due to the risk of spillage"), cautious behaviours (e.g. "transporting a knife near balloons should be avoided to prevent accidental popping"), and orientation locks (e.g. "the end effector should not rotate when holding a cup of water, otherwise spillage occurs"). Embedded into a control barrier function (CBF)-based control framework, the filter ensured that both geometric and semantic constraints were satisfied in tabletop manipulation and real-world kitchen transportation tasks. Notably, all experiments confirmed 100% compliance with both types of constraints.
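
The core mechanism of a CBF-based safety filter can be illustrated with a minimal sketch. This is a generic textbook construction under simplifying assumptions (single-integrator dynamics, one spatial "no-go" constraint such as an LLM-derived keep-out region above a laptop), not the project's actual filter; all names are hypothetical. With a single affine constraint, the usual CBF quadratic program has a closed-form projection solution:

```python
import numpy as np


def cbf_filter(u_nom, x, x_unsafe, d_min, alpha=1.0):
    """Minimal CBF safety filter for single-integrator dynamics (x_dot = u).

    Keeps the state x at least d_min away from a semantically unsafe point
    x_unsafe. The barrier is h(x) = ||x - x_unsafe||^2 - d_min^2, and the
    filter enforces grad_h(x) . u >= -alpha * h(x) by minimally modifying
    the nominal input u_nom (closed-form projection, since there is only
    one affine constraint).
    """
    diff = x - x_unsafe
    h = diff @ diff - d_min ** 2
    grad = 2.0 * diff
    slack = grad @ u_nom + alpha * h
    if slack >= 0.0:
        return u_nom  # nominal input already satisfies the CBF condition
    # project u_nom onto the boundary of the constraint half-space
    return u_nom - (slack / (grad @ grad)) * grad
```

Semantic constraints from the LLM would, in this picture, determine which `x_unsafe` regions and margins `d_min` are instantiated; geometric collision constraints take the same form and are stacked alongside them.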

(b) Semantic Exploration: Building on the mapping framework, we developed a novel semantic exploration scheme that allows robots to (i) actively maintain maps over regions that are likely outdated and (ii) perform open-vocabulary object navigation based on natural language inputs. We evaluated our approach in three real-world environments (i.e. our kitchen and office spaces) and showed that by incorporating stationarity scores, our system reliably detected semi-static changes (95% detection rate) and greatly improved mapping efficiency, which ultimately resulted in more efficient semantic reasoning and faster completion of downstream navigation tasks.
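
One simple way such stationarity scores can drive exploration is a staleness-based priority: regions containing mobile objects become outdated faster and should be revisited sooner. The scoring rule and names below are illustrative assumptions, not the project's actual exploration scheme.

```python
import math


def staleness(last_seen_t, now_t, stationarity, tau=60.0):
    """Hypothetical priority score in [0, 1): grows with time since the last
    observation, scaled by how mobile the region's objects are
    (low stationarity => the map there goes stale quickly)."""
    age = max(0.0, now_t - last_seen_t)
    return (1.0 - stationarity) * (1.0 - math.exp(-age / tau))


def next_exploration_target(regions, now_t):
    """Pick the region whose map is most likely outdated."""
    return max(regions,
               key=lambda r: staleness(r["last_seen"], now_t, r["stationarity"]))
```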

(c) Semantic Interaction: We further explored two complementary modes of safe interaction with humans. The first is kinematics-based interaction in human-shared spaces. Building on the prior work SICNav [2], we proposed a bi-level optimization approach that explicitly models human intent and reactions within the hierarchical task model predictive control (HTMPC) [3] framework, enabling safe and efficient sequential mobile manipulation while humans move continuously in proximity to the robot. Validated on two mobile manipulator platforms, this approach enabled safe and efficient task execution without relying on overly conservative approximations. The second mode centres on language-based interaction. Here, we introduced SwarmGPT, in which semantic reasoning from LLMs is coupled with a distributed MPC framework to guarantee safe execution of language-driven tasks. This framework opens the door to safe semantic human–robot interaction in more complex robot systems, beyond single agents.

(4) Enhancing Reliability in Perceptive Safe Decision-Making Methods: We addressed two major aspects to improve reliability in perceptive safe decision-making, which are detailed below.

(a) Perception-Aware Control: We investigated safe decision-making with perception-driven inputs and compared a perceptive HTMPC framework leveraging CBF constraints against a standard perceptive MPC approach. Our results show that, due to the inherent velocity-bounding property of CBFs, the framework enables earlier reactions and more reliable behaviour under sensor noise and incomplete maps.
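
The "velocity-bounding property" can be made concrete with a one-line example (a generic property of distance-based CBFs, not a result specific to our framework). For a distance barrier h(x) = dist - d_min, the CBF condition h_dot >= -alpha * h caps the speed of approach toward an obstacle in proportion to the remaining clearance, which is why braking begins early rather than at the last moment:

```python
def max_approach_speed(dist, d_min, alpha=0.5):
    """Maximum admissible approach speed toward an obstacle under the CBF
    condition h_dot >= -alpha * h with h = dist - d_min: the allowed speed
    shrinks linearly to zero as the robot nears the safety margin."""
    return max(0.0, alpha * (dist - d_min))
```

This graceful speed decay is what makes CBF-constrained controllers react earlier and degrade more gently under noisy or incomplete perception than a plain distance constraint, which only activates at the boundary.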

(b) Theoretical Reliability of CBFs: We identified and resolved a critical limitation related to relative degree assumptions in CBF safety certification. A proposed multi-CBF formulation mitigates inactivity issues and ensures robust constraint satisfaction, validated in both simulations and quadrotor experiments.
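
The relative-degree issue can be illustrated with a standard textbook example (this sketch is generic and is not the project's specific multi-CBF formulation). For a double-integrator system, a position-based barrier has relative degree two: its first derivative contains no control input, so the usual first-order CBF condition cannot be enforced pointwise.

```latex
% Double integrator: \dot{p} = v, \; \dot{v} = u, with barrier h(x) = p_{\max} - p.
% First derivative: \dot{h} = -v contains no u (relative degree 2), so the
% standard condition \dot{h}(x,u) \ge -\alpha\, h(x) cannot be imposed by any
% choice of u at states where it is violated.
%
% A common remedy (in the spirit of higher-order CBFs) introduces an auxiliary
% barrier whose derivative does depend on u:
\begin{align*}
  h_2(x) &= -v + \gamma\, h(x), \qquad \gamma > 0,\\
  \dot{h}_2(x,u) &= -u - \gamma v \;\ge\; -\alpha\, h_2(x),
\end{align*}
% which yields an input-dependent constraint and restores enforceability.
```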

(5) Experimental Validation in Real-World Environments: All proposed frameworks were implemented and validated on real robotic platforms, including fixed-base robot arms, two types of mobile manipulators, and an aerial swarm. Beyond laboratory tests, demonstrations were conducted in our kitchen and office spaces to showcase the applicability of our methods in real-world, unstructured environments.

Overall, the project delivered new theoretical insights, including addressing foundational limitations of CBF safety certification, and practical algorithmic tools for semantically safe decision-making. These were extensively validated in simulation and hardware, demonstrating both scientific novelty and deployability in human-centric environments. To date, the project has produced a steady stream of publications (3 accepted/published, 3 under review, and 3 in preparation), all either published in or targeting top-tier robotics venues. We have been making publications and related resources (e.g. research videos, implementations, and datasets) publicly available whenever possible. Beyond these efforts, we have actively engaged the community by organizing workshops on related topics, delivering invited talks, and showcasing our robots in action through open-house events and outreach activities. Further details of the project can be found on the website www.semanticcontrol.com.

References:
[1] J. Qian, V. Chatrath, J. Yang, J. Servos, A. P. Schoellig, and S. L. Waslander, "POCD: Probabilistic object-level change detection and volumetric mapping in semi-static scenes," in Proc. of the Robotics: Science and Systems (RSS) Conference, 2022.
[2] S. Samavi, J. R. Han, F. Shkurti, and A. P. Schoellig, "SICNav: Safe and interactive crowd navigation using model predictive control and bilevel optimization," IEEE Transactions on Robotics, vol. 41, pp. 801-818, 2025, doi: 10.1109/TRO.2024.3484634.
[3] X. Du, S. Zhou, and A. P. Schoellig, "Hierarchical task model predictive control for sequential mobile manipulation tasks," IEEE Robotics and Automation Letters, vol. 9, no. 2, pp. 1270-1277, Feb. 2024, doi: 10.1109/LRA.2023.3342671.
The project delivered several advances that go beyond the state-of-the-art in robotics, particularly in embedding semantic understanding into safe robot decision-making:

(1) Semantic Safety Filters: Introduced the first framework that integrates open-vocabulary perception and large language model reasoning into control barrier function (CBF)-based safety filters. This enables robots to enforce contextually relevant safety constraints (e.g. handling fragile items, preventing fire hazards) in real-world manipulation and kitchen tasks.

(2) Metric–Semantic Mapping with Semi-Static Change Tracking: Developed a novel vision–language perception module that systematically models object stationarity, allowing robots to maintain accurate semantic maps and adapt to semi-static changes in human environments. This supports robust long-term operation in dynamic, everyday settings.

(3) Safe Interaction: Investigated two complementary modes of safe interaction, including the development of a novel bi-level optimization framework that enables safe mobile manipulation around humans as well as the SwarmGPT framework that ensures safe execution of language-driven tasks with multi-agent systems.

(4) Reliability in Perceptive Safe Control: Identified and addressed a foundational limitation in CBF-based safety certification (relative degree assumption), proposing a validated multi-CBF formulation. Demonstrated a perception-aware HTMPC framework that improves robustness under sensor noise and incomplete maps.

(5) Real-World Validation: Our methods were validated across different hardware platforms, including manipulators, mobile manipulators, and aerial swarms, in realistic environments such as kitchens and offices. These experiments allowed us to gain confidence in the deployability of our methods, beyond methodological novelty.

As outlined above, in this project, we have developed novel frameworks for semantically informed robot decision-making that pave the way for further interdisciplinary research that tightly couples robot learning, perception, and control theory. The potential impacts of the project outcome are as follows:

(1) Scientific Impact: The project has made significant scientific contributions by establishing the foundations and developing algorithmic tools for semantically safe robot decision-making in unstructured, human-centric environments. These are environments where intelligent and reliable robots can improve daily life and support humans in tasks that would otherwise be difficult to accomplish (e.g. where precision is required). Our results include benchmarking safe decision-making algorithms to identify strengths and limitations for real-world deployment, developing perception pipelines for semi-static environments, and demonstrating semantic reasoning across three key dimensions: semantic safety, semantic exploration, and semantic interaction. Our results extend beyond theoretical advances, having been validated across multiple robotic embodiments and real-world human-centric environments. Our work represents one of the first efforts in semantically safe decision-making, and this direction is gaining increasing traction within the robotics community.

(2) Societal and Economic Impacts: By embedding semantic reasoning into safety-critical decision-making, the project moves robotics closer to reliable real-world deployment in human-centric settings. This research equips robots with "common sense" reasoning capabilities, allowing them to anticipate the consequences of their actions and act in compliance with social norms. Such capabilities are crucial for increasing transparency and trust in autonomous systems. Potential applications range from household assistance to specialized fields such as healthcare, construction, and advanced manufacturing, with clear benefits in efficiency, safety, and usability. Economically, the methods and resources generated could lower the entry barrier for startups and SMEs; furthermore, through real-world robot demonstrations, the project could serve as a catalyst for transferring the latest large-scale-model-based technologies to downstream robotics applications.
An illustration of the perception–action loop for semantically safe robot decision-making.