MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation
- URL: http://arxiv.org/abs/2304.09801v1
- Date: Wed, 19 Apr 2023 16:37:17 GMT
- Title: MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation
- Authors: Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, and Ping Luo
- Abstract summary: In real-world applications, sensor corruptions and failures lead to degraded performance.
We propose a robust framework, called MetaBEV, to address extreme real-world environments.
We show MetaBEV outperforms prior art by a large margin on both full and corrupted modalities.
- Score: 104.12419434114365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perception systems in modern autonomous driving vehicles typically take
inputs from complementary multi-modal sensors, e.g., LiDAR and cameras.
However, in real-world applications, sensor corruptions and failures lead to
degraded performance, thus compromising autonomous safety. In this paper, we
propose a robust framework, called MetaBEV, to address extreme real-world
environments involving six types of sensor corruption and two extreme
sensor-missing situations. In MetaBEV, signals from multiple sensors are first
processed by modality-specific encoders. Subsequently, a set of dense BEV
queries, termed meta-BEV, is initialized. These queries are then processed
iteratively by a BEV-Evolving decoder, which selectively aggregates deep
features from LiDAR, cameras, or both modalities. The updated BEV representations are
further leveraged for multiple 3D prediction tasks. Additionally, we introduce
a new M2oE structure to alleviate the performance drop on distinct tasks in
multi-task joint learning. Finally, MetaBEV is evaluated on the nuScenes
dataset with 3D object detection and BEV map segmentation tasks. Experiments
show that MetaBEV outperforms prior art by a large margin with both full and
corrupted modalities. For instance, when the LiDAR signal is missing, MetaBEV
improves detection NDS by 35.5% and segmentation mIoU by 17.7% over the vanilla
BEVFusion model; and when the camera signal is absent, MetaBEV still achieves
69.2% NDS and 53.7% mIoU, which is even higher than previous works that operate
on full modalities. Moreover, MetaBEV performs competitively with previous
methods in both canonical perception and multi-task learning settings, setting
a new state of the art for nuScenes BEV map segmentation with 70.4% mIoU.
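The abstract describes the core loop only at a high level: dense meta-BEV queries are refined over several decoder layers by selectively aggregating features from whichever sensors are available. The snippet below is a minimal sketch of that idea under stated assumptions, not MetaBEV's actual decoder: it uses plain single-head scaled dot-product cross-attention with no learned projections, treats each modality's encoder output as a list of feature vectors, and represents a failed sensor as `None` or an empty list. The names `bev_evolve_step` and `bev_evolving_decoder` and the residual update are hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bev_evolve_step(
    queries: List[Vector], modal_feats: Dict[str, Optional[List[Vector]]]
) -> List[Vector]:
    """One hypothetical decoder layer: each BEV query attends to the tokens
    of every *available* modality and is updated residually."""
    # Collect tokens only from modalities that produced features;
    # a failed sensor is modeled as None or an empty list.
    tokens: List[Vector] = [t for feats in modal_feats.values() if feats for t in feats]
    if not tokens or not queries:
        return [list(q) for q in queries]  # nothing to aggregate
    scale = 1.0 / math.sqrt(len(queries[0]))
    updated: List[Vector] = []
    for q in queries:
        # Single-head scaled dot-product attention; tokens serve as keys and values.
        weights = softmax([dot(q, k) * scale for k in tokens])
        attended = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(len(q))]
        # Residual update of the BEV query.
        updated.append([qi + ai for qi, ai in zip(q, attended)])
    return updated


def bev_evolving_decoder(
    queries: List[Vector],
    modal_feats: Dict[str, Optional[List[Vector]]],
    num_layers: int = 3,
) -> List[Vector]:
    """Apply the layer iteratively, mirroring 'processed iteratively'."""
    for _ in range(num_layers):
        queries = bev_evolve_step(queries, modal_feats)
    return queries


if __name__ == "__main__":
    meta_bev = [[0.0, 0.0], [1.0, 0.0]]  # two toy BEV queries of dimension 2
    lidar = [[1.0, 0.0], [0.0, 1.0]]
    camera = [[0.5, 0.5]]
    print(bev_evolving_decoder(meta_bev, {"lidar": lidar, "camera": camera}))
    print(bev_evolving_decoder(meta_bev, {"lidar": None, "camera": camera}))
```

Because a missing modality simply contributes no keys, the same update rule runs unchanged on LiDAR-only, camera-only, or fused input, which is the robustness property the abstract emphasizes; the real model additionally learns its attention and projections.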
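The abstract does not spell out how M2oE is built, only that it alleviates the performance drop on individual tasks during multi-task joint learning. As a stand-in for readers who want something concrete, the sketch below shows a generic multi-gate mixture-of-experts layer (in the style of MMoE, a separate, previously published technique): detection and segmentation share a pool of experts, but each task has its own softmax gate over them. Every name here (`MultiGateMoE`, `task_gates`, the toy experts) is hypothetical; the code illustrates per-task expert routing in general, not the M2oE design.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: List[float]) -> List[float]:
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix: List[Vector], x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


class MultiGateMoE:
    """Shared experts with one softmax gate per task (MMoE-style stand-in, not M2oE)."""

    def __init__(self, experts: List[Expert], task_gates: Dict[str, List[Vector]]):
        # task_gates[task] is a (num_experts x input_dim) matrix mapping the
        # input feature to one gate logit per expert.
        for task, gate in task_gates.items():
            if len(gate) != len(experts):
                raise ValueError(f"gate for {task!r} needs one row per expert")
        self.experts = experts
        self.task_gates = task_gates

    def __call__(self, x: Vector, task: str) -> Vector:
        gates = softmax(matvec(self.task_gates[task], x))
        outputs = [expert(x) for expert in self.experts]
        # Task-specific convex combination of the shared experts' outputs.
        return [sum(g * out[i] for g, out in zip(gates, outputs)) for i in range(len(outputs[0]))]


if __name__ == "__main__":
    experts = [lambda v: v, lambda v: [2.0 * a for a in v]]
    moe = MultiGateMoE(experts, {
        "detection": [[0.0, 0.0], [0.0, 0.0]],        # uniform gate over both experts
        "segmentation": [[10.0, 0.0], [-10.0, 0.0]],  # strongly prefers expert 0
    })
    print(moe([1.0, 2.0], "detection"))
    print(moe([1.0, 2.0], "segmentation"))
```

Giving each task its own gate lets detection and segmentation weight the shared experts differently, which is one common way to reduce interference between tasks that share a backbone.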
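The results above are reported in nuScenes Detection Score (NDS) for 3D detection and mean Intersection-over-Union (mIoU) for BEV map segmentation. NDS, as defined by the nuScenes benchmark, is (5 · mAP + Σ (1 − min(1, e))) / 10, where e ranges over the five true-positive error metrics mATE, mASE, mAOE, mAVE and mAAE. The sketch below computes NDS from already-aggregated metrics and a simple per-class mIoU over binary BEV masks. Skipping classes that are absent from both prediction and ground truth is an assumption of this sketch; official evaluation code may handle thresholds and empty classes differently.

```python
from typing import Dict, List, Sequence

TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(m_ap: float, tp_errors: Dict[str, float]) -> float:
    """NDS = (5 * mAP + sum over TP metrics of (1 - min(1, error))) / 10."""
    # Each error is clipped at 1, so a very bad TP metric contributes 0, never a penalty.
    tp_term = sum(1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS)
    return (5.0 * m_ap + tp_term) / 10.0


def mean_iou(pred: Sequence[Sequence[bool]], target: Sequence[Sequence[bool]]) -> float:
    """mIoU over classes; pred[c][i] / target[c][i] say whether BEV cell i belongs to class c."""
    ious: List[float] = []
    for p_mask, t_mask in zip(pred, target):
        inter = sum(p and t for p, t in zip(p_mask, t_mask))
        union = sum(p or t for p, t in zip(p_mask, t_mask))
        if union > 0:  # assumption: ignore classes empty in both prediction and target
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    errors = {name: 0.3 for name in TP_METRICS}
    print(nuscenes_detection_score(0.6, errors))
    print(mean_iou([[True, True, False]], [[True, False, False]]))
```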
Related papers
- MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation [14.67253585778639]
MaskBEV is a masked attention-based multi-task learning paradigm.
It unifies 3D object detection and bird's eye view (BEV) map segmentation.
It achieves 1.3 NDS improvement in 3D object detection and 2.7 mIoU improvement in BEV map segmentation.
(arXiv, 2024-08-17T07:11:38Z)
- U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization [81.76044207714637]
Relocalization is essential for intelligent vehicles when GPS reception is insufficient or sensor-based localization fails.
Recent advances in Bird's-Eye-View (BEV) segmentation allow for accurate estimation of local scene appearance.
This paper presents U-BEV, a U-Net inspired architecture that extends the current state-of-the-art by allowing the BEV to reason about the scene on multiple height layers before flattening the BEV features.
(arXiv, 2023-10-20T18:57:38Z)
- UniBEV: Multi-modal 3D Object Detection with Uniform BEV Encoders for Robustness against Missing Sensor Modalities [7.470926069132259]
We propose an end-to-end multi-modal 3D object detection framework designed for robustness against missing modalities.
UniBEV can operate on LiDAR plus camera input, but also on LiDAR-only or camera-only input without retraining.
We compare UniBEV to state-of-the-art BEVFusion and MetaBEV on nuScenes over all sensor input combinations.
(arXiv, 2023-09-25T20:22:47Z)
- SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos [20.51396212498941]
SparseBEV is a fully sparse 3D object detector that outperforms its dense counterparts.
On the test split of nuScenes, SparseBEV achieves the state-of-the-art performance of 67.5 NDS.
(arXiv, 2023-08-18T02:11:01Z)
- Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline [76.48192454417138]
Bird's-Eye View (BEV) representation is promising as the foundation for next-generation Autonomous Vehicle (AV) perception.
This paper proposes a framework, termed Fast-BEV, which performs faster BEV perception on on-vehicle chips.
(arXiv, 2023-01-29T18:43:31Z)
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy that guides the 3D encoder in learning feature representations.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
(arXiv, 2022-12-12T08:15:03Z)
- PersDet: Monocular 3D Detection in Perspective Bird's-Eye-View [26.264139933212892]
Detecting 3D objects in Bird's-Eye-View (BEV) is superior to other 3D detection approaches for autonomous driving and robotics.
However, transforming image features into BEV necessitates special operators to conduct feature sampling.
We propose detecting objects in perspective BEV -- a new BEV representation that does not require feature sampling.
(arXiv, 2022-08-19T15:19:20Z)
- BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [105.96557764248846]
We introduce BEVFusion, a generic multi-task multi-sensor fusion framework.
It unifies multi-modal features in the shared bird's-eye view representation space.
It achieves 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower cost.
(arXiv, 2022-05-26T17:59:35Z)
- M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation [145.6041893646006]
M^2BEV is a unified framework that jointly performs 3D object detection and map segmentation.
M^2BEV infers both tasks with a unified model and improves efficiency.
(arXiv, 2022-04-11T13:43:25Z)
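The BEV-MAE entry above mentions a bird's eye view guided masking strategy without further detail. One plausible reading, assumed here rather than taken from that paper, is to bucket LiDAR points into BEV grid cells by their x and y coordinates and hide every point in a random subset of occupied cells, so that pre-training must infer whole hidden regions rather than isolated points. The function names, the default `cell_size` and `mask_ratio`, and the seeding scheme are all hypothetical.

```python
import math
import random
from typing import List, Set, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int]


def bev_cell(point: Point, cell_size: float) -> Cell:
    # Bird's eye view only uses x and y; the height z is ignored.
    x, y, _z = point
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def bev_guided_mask(
    points: List[Point], cell_size: float = 0.5, mask_ratio: float = 0.7, seed: int = 0
) -> Tuple[List[Point], Set[Cell]]:
    """Hypothetical BEV-cell masking: returns (visible points, masked cells)."""
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    # Sort occupied cells so sampling is reproducible for a given seed.
    occupied = sorted({bev_cell(p, cell_size) for p in points})
    rng = random.Random(seed)
    num_masked = round(mask_ratio * len(occupied))
    masked = set(rng.sample(occupied, num_masked))
    # Keep only points whose BEV cell was not selected for masking.
    visible = [p for p in points if bev_cell(p, cell_size) not in masked]
    return visible, masked


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.0), (1.1, 0.1, 0.5), (0.1, 1.1, 1.0), (1.1, 1.1, 0.2)]
    kept, hidden = bev_guided_mask(cloud, cell_size=1.0, mask_ratio=0.5)
    print(len(kept), sorted(hidden))
```

Masking by cell rather than by individual point means an entire column of space disappears at once, which is the kind of structure a BEV-guided strategy would plausibly exploit.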