Artificial Intelligence for Traffic Safety between Vehicles and Vulnerable Road Users

Periodic Reporting for period 1 - VeVuSafety (Artificial Intelligence for Traffic Safety between Vehicles and Vulnerable Road Users)

Reporting period: 2022-10-01 to 2024-09-30

Traffic safety is a fundamental requirement in vehicular environments and for many artificial intelligence-based systems, such as autonomous vehicles. Some areas of urban environments, such as intersections and shared spaces, are high-risk zones where vehicles and vulnerable road users (VRUs) interact directly. By advancing state-of-the-art artificial intelligence methodologies, the VeVuSafety project aims to build deep learning frameworks that understand road users' behavior in various mixed traffic situations, ensuring the safety of both vehicles and VRUs. The project objectives are organized into four work packages: WP1, mapping the traffic environment with static and dynamic objects, such as road surfaces and road users; WP2, predicting the trajectories of various road users, such as pedestrians, cyclists, and vehicles; WP3, analyzing road users' dynamic behaviors and their interactions; and WP4, facilitating autonomous driving and safer traffic conditions.
To achieve the objectives of the VeVuSafety project, in WP1 we proposed an uncertainty-aware approach to map the continuous driving space from a bird’s-eye view (BEV) for road surface segmentation and vehicle detection. Open-source point cloud data captured from real-world and simulated driving scenarios was leveraged to train an evidential deep learning model that quantifies the uncertainty of the segmentation and detection results, providing reliable confidence scores for the predicted outcomes. It is worth mentioning that, compared with image-based data, point cloud data contains less sensitive information, such as license plates and faces, which benefits data privacy protection.

In WP2, we proposed deep generative models, such as Conditional Variational Auto-Encoders and Mixture Models with Transformer attention mechanisms, to predict heterogeneous road users’ multimodal behaviors, including moving speed, turning angle, and mutual interactions among road users. The Mixture Models, i.e., Gaussian/Laplacian Mixture Models, enable us to estimate the likelihood of different behavior patterns of road users. Our models achieved state-of-the-art performance on multiple popular benchmarks for pedestrian, vehicle, and other road users' trajectory prediction, including the ETH/UCY, nuScenes, and Argoverse benchmarks.

In WP3, to model road users' dynamic behaviors and their interactions, we proposed Graph Convolutional Networks that learn the constraints of environmental contexts, such as road lanes from high-definition maps, and the guidance of predecessors by tracing the trajectories of road users who demonstrate similar movement dynamics in the same scene. Our method outperformed competing approaches and won the Best Paper Award at the ROAD++ workshop at ICCV 2023.

In WP4, to facilitate autonomous driving and enhance traffic safety, we proposed incorporating physical models, such as social force models with different force terms (e.g., repulsive forces for collision avoidance and attractive forces for social connections), to reduce the collision-prone trajectories that deep learning models may otherwise generate.

The research results have been published open access in top-tier journals and international conference proceedings and are regularly promoted on the VeVuSafety project website and LinkedIn, in order to facilitate further research in the autonomous driving community. Furthermore, to reach a broader audience, especially graduate and PhD students, we organized workshops at IEEE ITSC and IV, as well as seminars at multiple universities, to present the research of the VeVuSafety project.
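To make the evidential formulation mentioned under WP1 more concrete, the sketch below shows how per-cell segmentation logits can be converted into Dirichlet evidence, expected class probabilities, and an uncertainty score, following the standard evidential deep learning recipe. It is a minimal illustration with hypothetical tensor shapes and names, not the project's actual model or code.

```python
import torch
import torch.nn.functional as F

def evidential_uncertainty(logits: torch.Tensor):
    """Turn per-cell segmentation logits into class probabilities and an
    uncertainty score, using the standard Dirichlet-based evidential recipe.

    logits: (B, K, H, W) raw outputs of a BEV segmentation head,
            where K is the number of classes (e.g. road vs. background).
    """
    evidence = F.softplus(logits)                # non-negative evidence per class
    alpha = evidence + 1.0                       # Dirichlet concentration parameters
    strength = alpha.sum(dim=1, keepdim=True)    # total evidence S per cell
    prob = alpha / strength                      # expected class probabilities
    k = logits.shape[1]
    uncertainty = k / strength                   # high where total evidence is low
    return prob, uncertainty.squeeze(1)

# Example: a 2-class BEV grid of 200 x 200 cells (illustrative shapes only).
logits = torch.randn(1, 2, 200, 200)
prob, unc = evidential_uncertainty(logits)
print(prob.shape, unc.shape)  # torch.Size([1, 2, 200, 200]) torch.Size([1, 200, 200])
```

Cells with little accumulated evidence receive high uncertainty, which is what allows the model to attach reliable confidence scores to its segmentation and detection outputs.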
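The mixture-model idea in WP2 can likewise be illustrated with a small sketch: a prediction head that outputs mode weights, means, and scales defines a Gaussian mixture over a future position, whose log-likelihood serves both as a training loss and as an estimate of how likely different behavior patterns are. The function and shapes below are assumptions for illustration, not the published architecture.

```python
import torch
from torch import distributions as D

def mixture_from_head(logits, means, log_stds):
    """Build a Gaussian mixture over a future 2-D position from the
    outputs of a prediction head.

    logits:   (B, M)      mixture weights over M behaviour modes
    means:    (B, M, 2)   predicted (x, y) per mode
    log_stds: (B, M, 2)   per-mode log standard deviations
    """
    mix = D.Categorical(logits=logits)
    comp = D.Independent(D.Normal(means, log_stds.exp()), 1)
    return D.MixtureSameFamily(mix, comp)

# Example: 4 behaviour modes for a single road user (illustrative values).
B, M = 1, 4
gmm = mixture_from_head(torch.randn(B, M), torch.randn(B, M, 2), torch.zeros(B, M, 2))
observed_next_position = torch.tensor([[1.0, 0.5]])
nll = -gmm.log_prob(observed_next_position)   # negative log-likelihood as a training loss
print(nll)
```

A Laplacian variant follows the same pattern by swapping the Normal components for Laplace distributions.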
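As a rough illustration of the graph-based modeling in WP3, the following is a single symmetrically normalized graph-convolution layer over a hypothetical graph whose nodes are road users and lane segments; the project's actual network design, node features, and edge construction may differ.

```python
import torch

def graph_conv(node_feats: torch.Tensor, adj: torch.Tensor, weight: torch.Tensor):
    """One graph-convolution layer: each node aggregates features from its
    neighbours (e.g. nearby agents and lane nodes) through a symmetrically
    normalised adjacency matrix.

    node_feats: (N, F_in)  per-node features (agents, lane segments, ...)
    adj:        (N, N)     binary adjacency (1 = interaction edge)
    weight:     (F_in, F_out)
    """
    adj_hat = adj + torch.eye(adj.shape[0])            # add self-loops
    deg_inv_sqrt = adj_hat.sum(dim=1).pow(-0.5)
    norm_adj = deg_inv_sqrt[:, None] * adj_hat * deg_inv_sqrt[None, :]
    return torch.relu(norm_adj @ node_feats @ weight)

# Example: 5 nodes (3 road users + 2 lane nodes) with 16-d features.
x = torch.randn(5, 16)
a = (torch.rand(5, 5) > 0.5).float()
a = ((a + a.T) > 0).float()                            # make the graph undirected
out = graph_conv(x, a, torch.randn(16, 32))
print(out.shape)  # torch.Size([5, 32])
```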
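Finally, the physics-based refinement in WP4 can be sketched with a classic social-force step: an attractive force towards a goal plus exponential repulsive forces from other agents, which can be used to push predicted positions away from likely collisions. Parameter values and function names here are illustrative only, not the project's calibrated model.

```python
import numpy as np

def social_forces(ego_pos, ego_goal, others, a_rep=2.0, b_rep=0.8, k_goal=1.0):
    """Minimal social-force step for one agent.

    ego_pos, ego_goal: (2,)   current position and goal of the agent
    others:            (N, 2) positions of surrounding road users
    """
    # Attractive force pulling the agent towards its goal.
    to_goal = ego_goal - ego_pos
    f_att = k_goal * to_goal / (np.linalg.norm(to_goal) + 1e-6)

    # Repulsive forces decaying exponentially with distance to each neighbour.
    diff = ego_pos - others                                 # (N, 2), pointing away from others
    dist = np.linalg.norm(diff, axis=1, keepdims=True) + 1e-6
    f_rep = (a_rep * np.exp(-dist / b_rep) * diff / dist).sum(axis=0)

    return f_att + f_rep

force = social_forces(np.array([0.0, 0.0]), np.array([5.0, 0.0]),
                      np.array([[1.0, 0.5], [0.5, -1.0]]))
print(force)
```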
This two-year VeVuSafety project demonstrates that deep learning-based artificial intelligence methodologies are highly beneficial for understanding road users' behavior in diverse mixed traffic situations, enhancing the safety of both vehicles and vulnerable road users (VRUs). Using evidential deep learning, we can reliably map continuous driving environments with point cloud data for road surface segmentation and road user detection (WP1); this approach can be extended to incorporate multimodal input data, such as images and radar. Our proposed deep generative models efficiently predict potential trajectories for heterogeneous road users in various traffic scenarios, supporting safe path planning for autonomous vehicles (WP2). We have shown that Graph Convolutional Networks can incorporate environmental contexts for scene-compliant trajectory prediction (WP3) and that physical models can be combined with deep learning models to prevent generating false, collision-prone trajectories (WP4).

Furthermore, the project’s scientific contributions have strong potential for real-world applications. For example, we presented an uncertainty-based method for selecting collective perception messages by manipulating the generated evidential bird’s-eye-view maps (WP1). This method improves communication efficiency among connected autonomous vehicles (CAVs) by sharing only the information the ego vehicle is uncertain about, while other CAVs provide more certain data from their perspectives. This approach reduces communication overhead by 87%, with only a slight drop in collective perception performance, which is critical for real-time communication in autonomous driving systems. In WP3, for trajectory prediction, we demonstrated that dynamic scene context from traversed trajectories can compensate for missing map data and still achieve scene-compliant trajectory predictions. Moreover, to facilitate interactive autonomous driving (WP4), we explored various human-machine interfaces (HMIs), including visual, textual, auditory, and multimodal interfaces, to ensure clear communication between autonomous vehicles and other road users in diverse driving situations, such as shared spaces or bottleneck roads.
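A minimal sketch of the uncertainty-based message selection idea, assuming a per-cell uncertainty map from an evidential BEV model: only cells whose uncertainty exceeds a threshold are requested from, or filled in by, other CAVs, so most of the map never needs to be transmitted. The threshold value and the reported 87% overhead reduction are properties of the project's method, not of this toy example.

```python
import numpy as np

def select_uncertain_cells(uncertainty_map: np.ndarray, tau: float = 0.5):
    """Return the indices of BEV cells whose uncertainty exceeds a threshold.
    Only these cells would be exchanged with other connected vehicles,
    instead of transmitting the full map.

    uncertainty_map: (H, W) per-cell uncertainty, e.g. from an evidential head
    tau:             selection threshold (illustrative value)
    """
    mask = uncertainty_map > tau
    selected = np.argwhere(mask)                  # (M, 2) row/column indices
    saving = 1.0 - mask.mean()                    # fraction of cells not transmitted
    return selected, saving

unc = np.random.rand(200, 200)
cells, saving = select_uncertain_cells(unc, tau=0.9)
print(len(cells), f"cells requested, {saving:.0%} of the map kept local")
```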

Looking ahead, with the development of Vision Language Models (VLMs), we aim to explore their potential for generating both visual and textual cues to enhance the explainability of deep learning models' decision-making processes. We also plan to leverage Large Language Models (LLMs), trained on vast amounts of text data, to address long-tail scenarios for behavior modeling, motion prediction, and situation-aware detection, further advancing end-to-end autonomous driving.