Progressive Bird's Eye View Perception for Safety-Critical Autonomous Driving: A Comprehensive Survey
- URL: http://arxiv.org/abs/2508.07560v1
- Date: Mon, 11 Aug 2025 02:40:46 GMT
- Title: Progressive Bird's Eye View Perception for Safety-Critical Autonomous Driving: A Comprehensive Survey
- Authors: Yan Gong, Naibang Wang, Jianli Lu, Xinyu Zhang, Yongsheng Gao, Jie Zhao, Zifan Huang, Haozhi Bai, Nanxin Zeng, Nayu Su, Lei Yang, Ziying Song, Xiaoxi Hu, Xinmin Jiang, Xiaojuan Zhang, Susanto Rahardja
- Abstract summary: Bird's-Eye-View (BEV) perception has become a foundational paradigm in autonomous driving. This survey provides the first comprehensive review of BEV perception from a safety-critical perspective.
- Score: 20.7823289124196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bird's-Eye-View (BEV) perception has become a foundational paradigm in autonomous driving, enabling unified spatial representations that support robust multi-sensor fusion and multi-agent collaboration. As autonomous vehicles transition from controlled environments to real-world deployment, ensuring the safety and reliability of BEV perception in complex scenarios - such as occlusions, adverse weather, and dynamic traffic - remains a critical challenge. This survey provides the first comprehensive review of BEV perception from a safety-critical perspective, systematically analyzing state-of-the-art frameworks and implementation strategies across three progressive stages: single-modality vehicle-side, multimodal vehicle-side, and multi-agent collaborative perception. Furthermore, we examine public datasets encompassing vehicle-side, roadside, and collaborative settings, evaluating their relevance to safety and robustness. We also identify key open-world challenges - including open-set recognition, large-scale unlabeled data, sensor degradation, and inter-agent communication latency - and outline future research directions, such as integration with end-to-end autonomous driving systems, embodied intelligence, and large language models.
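To ground the "unified spatial representation" the survey is organized around, the following is a minimal, hypothetical sketch of the simplest BEV construction: rasterizing a LiDAR point cloud into a fixed ego-centric grid. The grid extent, resolution, and the two channels (occupancy, max height) are illustrative assumptions, not a method from the survey; the surveyed frameworks learn far richer BEV features.

```python
import numpy as np

# Minimal BEV rasterization sketch: project LiDAR points (x, y, z) in the ego
# frame into a top-down grid. Extent, resolution, and channel choices are
# placeholder assumptions for illustration only.

def points_to_bev(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), res=0.5):
    """points: (N, 3) array in the ego frame. Returns an (H, W, 2) BEV tensor."""
    W = int((x_range[1] - x_range[0]) / res)
    H = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((H, W, 2), dtype=np.float32)
    cols = ((points[:, 0] - x_range[0]) / res).astype(int)
    rows = ((points[:, 1] - y_range[0]) / res).astype(int)
    keep = (rows >= 0) & (rows < H) & (cols >= 0) & (cols < W)
    r, c, z = rows[keep], cols[keep], points[keep, 2]
    bev[r, c, 0] = 1.0                      # occupancy channel
    np.maximum.at(bev[:, :, 1], (r, c), z)  # max-height channel (heights below
    return bev                              # 0 m are clipped by the zero init)
```

Camera-only pipelines surveyed in the paper build the same kind of grid by lifting image features along estimated depth rather than rasterizing points; either way, downstream fusion and multi-agent collaboration operate on a common ego-centric plane.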
Related papers
- All You Need for Object Detection: From Pixels, Points, and Prompts to Next-Gen Fusion and Multimodal LLMs/VLMs in Autonomous Vehicles [7.863490977061713]
Autonomous Vehicles (AVs) are transforming the future of transportation through advances in intelligent perception, decision-making, and control systems. Their success hinges on one core capability: reliable object detection in complex, multimodal environments. Recent breakthroughs in Computer Vision (CV) and Artificial Intelligence (AI) have driven remarkable progress. This survey bridges that gap by delivering a forward-looking analysis of object detection in AVs.
arXiv Detail & Related papers (2025-10-30T16:08:25Z) - OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows [77.95511352806261]
Computer-using agents powered by Vision-Language Models (VLMs) have demonstrated human-like capabilities in operating digital environments like mobile platforms. We propose OS-Sentinel, a novel hybrid safety detection framework that combines a Formal Verifier for detecting explicit system-level violations with a Contextual Judge for assessing contextual risks and agent actions.
arXiv Detail & Related papers (2025-10-28T13:22:39Z) - A holistic perception system of internal and external monitoring for ground autonomous vehicles: AutoTRUST paradigm [29.72376845511303]
This paper introduces a holistic perception system for internal and external monitoring of autonomous vehicles, with the aim of demonstrating a novel AI-leveraged self-adaptive framework of advanced vehicle technologies and solutions. The in-cabin monitoring system includes AI-powered sensors that measure air quality and perform thermal comfort analysis for efficient on- and off-board operation. The external monitoring system, in turn, perceives the vehicle's surrounding environment through a cost-efficient LiDAR-based semantic segmentation approach.
arXiv Detail & Related papers (2025-08-25T12:32:13Z) - MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous Driving [63.875372281596576]
MetAdv is a novel adversarial testing platform that enables realistic, dynamic, and interactive evaluation. It supports flexible 3D vehicle modeling and seamless transitions between simulated and physical environments. It enables real-time capture of physiological signals and behavioral feedback from drivers.
arXiv Detail & Related papers (2025-08-04T03:07:54Z) - Research Challenges and Progress in the End-to-End V2X Cooperative Autonomous Driving Competition [57.698383942708]
Vehicle-to-everything (V2X) communication has emerged as a key enabler for extending perception range and enhancing driving safety. We organized the End-to-End Autonomous Driving through V2X Cooperation Challenge, which features two tracks: cooperative temporal perception and cooperative end-to-end planning. This paper describes the design and outcomes of the challenge and highlights key research problems, including bandwidth-aware fusion, robust multi-agent planning, and heterogeneous sensor integration.
arXiv Detail & Related papers (2025-07-29T09:06:40Z) - Integrating Multi-Modal Sensors: A Review of Fusion Techniques for Intelligent Vehicles [11.412978676426205]
Multi-sensor fusion plays a critical role in enhancing perception for autonomous driving. This paper formalizes multi-sensor fusion strategies into data-level, feature-level, and decision-level categories. We present key multi-modal datasets and discuss their applicability in addressing real-world challenges.
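To make the three categories concrete, here is a minimal, hypothetical sketch (plain NumPy; the array shapes and the combination rules are illustrative assumptions, not the paper's implementation) of where camera and LiDAR information would be combined at each level:

```python
import numpy as np

# Illustration of the three fusion levels the review formalizes. Shapes and
# merge rules are placeholder assumptions for clarity.

def data_level_fusion(lidar_points, cam_points_3d):
    """Data-level (early) fusion: merge raw measurements before any model runs."""
    return np.concatenate([lidar_points, cam_points_3d], axis=0)   # (N1+N2, 3)

def feature_level_fusion(lidar_feat, cam_feat):
    """Feature-level (mid) fusion: combine intermediate feature maps,
    e.g., per-cell BEV features extracted separately from each modality."""
    return np.concatenate([lidar_feat, cam_feat], axis=-1)         # (H, W, C1+C2)

def decision_level_fusion(lidar_scores, cam_scores, w_lidar=0.6, w_cam=0.4):
    """Decision-level (late) fusion: each modality detects independently;
    only per-object confidence scores are merged at the end."""
    return w_lidar * lidar_scores + w_cam * cam_scores
```

The trade-off the categories capture: earlier fusion preserves more cross-modal information but demands tight calibration and synchronization, while later fusion is more modular and degrades more gracefully when one sensor fails.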
arXiv Detail & Related papers (2025-06-27T03:43:48Z) - Towards Intelligent Transportation with Pedestrians and Vehicles In-the-Loop: A Surveillance Video-Assisted Federated Digital Twin Framework [62.47416496137193]
We propose a surveillance video assisted federated digital twin (SV-FDT) framework to empower ITSs with pedestrians and vehicles in-the-loop. The architecture consists of three layers: (i) the end layer, which collects traffic surveillance videos from multiple sources; (ii) the edge layer, responsible for semantic segmentation-based visual understanding, twin agent-based interaction modeling, and local digital twin system (LDTS) creation in local regions; and (iii) the cloud layer, which integrates LDTSs across different regions to construct a global DT model in real time.
arXiv Detail & Related papers (2025-03-06T07:36:06Z) - A Survey of World Models for Autonomous Driving [63.33363128964687]
Recent breakthroughs in autonomous driving have been propelled by advances in robust world modeling. World models offer high-fidelity representations of the driving environment that integrate multi-sensor data, semantic cues, and temporal dynamics. This paper systematically reviews recent advances in world models for autonomous driving.
arXiv Detail & Related papers (2025-01-20T04:00:02Z) - Vehicle-to-Everything Cooperative Perception for Autonomous Driving [46.292402824957975]
Vehicle-to-everything cooperative perception plays a crucial role in extending the perception range and increasing detection accuracy. Key techniques for enabling reliable perception sharing, such as agent selection, data alignment, and feature fusion, are examined in detail. The paper concludes by outlining promising research directions to support future advancements in vehicle-to-everything cooperative perception.
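As a concrete illustration of the data-alignment and feature-fusion steps named above, here is a minimal, hypothetical sketch (NumPy; the BEV grid geometry, nearest-neighbor warp, and element-wise max fusion rule are assumptions for illustration, not a specific method from this survey):

```python
import numpy as np

# Cooperative-perception sketch: align a neighboring agent's BEV feature map
# into the ego frame using their relative pose, then fuse. All parameters here
# are illustrative assumptions.

def warp_bev(feat, dx, dy, dtheta, res=0.5):
    """Align a neighbor's (H, W, C) BEV map to the ego frame (data alignment).
    (dx, dy, dtheta) is the neighbor's pose in the ego frame; res is m/cell."""
    H, W, C = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    ex = (xs - W / 2) * res            # ego-frame cell centers in meters,
    ey = (ys - H / 2) * res            # origin at the grid center
    c, s = np.cos(-dtheta), np.sin(-dtheta)
    nx = c * (ex - dx) - s * (ey - dy) # inverse transform: where each ego cell
    ny = s * (ex - dx) + c * (ey - dy) # lands in the neighbor's grid
    j = np.round(nx / res + W / 2).astype(int)
    i = np.round(ny / res + H / 2).astype(int)
    valid = (i >= 0) & (i < H) & (j >= 0) & (j < W)
    out = np.zeros_like(feat)
    out[ys[valid], xs[valid]] = feat[i[valid], j[valid]]
    return out

def fuse(ego_feat, neighbors):
    """Feature fusion: element-wise max over ego and pose-aligned neighbor maps.
    neighbors: list of (feat, (dx, dy, dtheta)) pairs from selected agents."""
    stack = [ego_feat] + [warp_bev(f, *pose) for f, pose in neighbors]
    return np.max(np.stack(stack), axis=0)
```

Agent selection, the remaining step, would decide which neighbors' maps enter `neighbors` at all, trading detection coverage against communication bandwidth.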
arXiv Detail & Related papers (2023-10-05T13:19:48Z) - The Integration of Prediction and Planning in Deep Learning Automated Driving Systems: A Review [43.30610493968783]
We review state-of-the-art deep learning-based planning systems and focus on how they integrate prediction.
We discuss the implications, strengths, and limitations of different integration principles.
arXiv Detail & Related papers (2023-08-10T17:53:03Z) - AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception [26.84439405241999]
We present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle.
AIDE facilitates holistic driver monitoring through three distinctive characteristics.
Two fusion strategies are introduced to give new insights into learning effective multi-stream/modal representations.
arXiv Detail & Related papers (2023-07-26T03:12:05Z) - Camera-Radar Perception for Autonomous Vehicles and ADAS: Concepts, Datasets and Metrics [77.34726150561087]
This work surveys the current landscape of camera- and radar-based perception for ADAS and autonomous vehicles.
Concepts and characteristics related to both sensors, as well as to their fusion, are presented.
We give an overview of the Deep Learning-based detection and segmentation tasks, and the main datasets, metrics, challenges, and open questions in vehicle perception.
arXiv Detail & Related papers (2023-03-08T00:48:32Z) - Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer [28.15612357340141]
We propose a safety-enhanced autonomous driving framework, named Interpretable Sensor Fusion Transformer (InterFuser).
We process and fuse information from multi-modal multi-view sensors to achieve comprehensive scene understanding and adversarial event detection.
Our framework provides more semantics, which are exploited to better constrain actions within safe sets.
arXiv Detail & Related papers (2022-07-28T11:36:21Z) - Differentiable Control Barrier Functions for Vision-based End-to-End Autonomous Driving [100.57791628642624]
We introduce a safety-guaranteed learning framework for vision-based end-to-end autonomous driving.
We design a learning system equipped with differentiable control barrier functions (dCBFs) that is trained end-to-end by gradient descent.
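For intuition: a control barrier function keeps the state inside a safe set {x : h(x) >= 0} by enforcing h_dot(x, u) + alpha * h(x) >= 0 at every step. Below is a minimal, framework-agnostic sketch of the safety-filtering operation such a layer performs, using the closed-form minimum-norm projection for a single affine constraint. The function names and the control-affine setup are illustrative assumptions, not the paper's trained end-to-end dCBF layer.

```python
import numpy as np

# CBF safety-filter sketch for a control-affine system x_dot = f(x) + g(x) @ u
# with safe set {x : h(x) >= 0}. The paper embeds a differentiable version of
# this inside the policy; here we only show the constraint it enforces:
#     grad_h @ (f + g @ u) + alpha * h >= 0.

def cbf_safety_filter(u_nom, h, grad_h, f, g, alpha=1.0):
    """Return the minimum-norm modification of u_nom satisfying the CBF condition."""
    a = grad_h @ g               # how the control enters h_dot
    b = grad_h @ f + alpha * h   # control-independent part of the condition
    margin = a @ u_nom + b
    if margin >= 0.0:            # nominal action already keeps the set invariant
        return u_nom
    # Closed-form projection onto the half-space {u : a @ u + b >= 0}.
    return u_nom - (margin / (a @ a)) * a

# Toy usage: 1-D single integrator x_dot = u, required to keep x <= x_max,
# i.e. h(x) = x_max - x. Starting at x = 4.5 with x_max = 5.0:
x, x_max = 4.5, 5.0
u_safe = cbf_safety_filter(
    u_nom=np.array([2.0]),       # nominal command pushes toward the boundary
    h=x_max - x,                 # h = 0.5 > 0: currently safe
    grad_h=np.array([-1.0]),
    f=np.array([0.0]),
    g=np.array([[1.0]]),
)
# u_safe == [0.5]: the command is capped at alpha * h, so h decays toward zero
# but never crosses it. Because the projection is differentiable almost
# everywhere, gradients can flow through it during end-to-end training.
```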
arXiv Detail & Related papers (2022-03-04T16:14:33Z)