M-BEV: Masked BEV Perception for Robust Autonomous Driving
- URL: http://arxiv.org/abs/2312.12144v1
- Date: Tue, 19 Dec 2023 13:25:45 GMT
- Title: M-BEV: Masked BEV Perception for Robust Autonomous Driving
- Authors: Siran Chen, Yue Ma, Yu Qiao, Yali Wang
- Abstract summary: Bird-Eye-View (BEV) perception has attracted extensive attention, due to low-cost deployment and desirable vision detection capacity.
However, existing models ignore a realistic driving scenario, the failure of one or more cameras, which largely degrades performance.
We propose a generic Masked BEV (M-BEV) perception framework that effectively improves robustness to this challenging scenario.
- Score: 30.110634411996404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D perception is a critical problem in autonomous driving. Recently, the
Bird-Eye-View (BEV) approach has attracted extensive attention, due to low-cost
deployment and desirable vision detection capacity. However, existing models
ignore a realistic scenario during driving: one or more view cameras may
fail, which largely degrades performance. To
tackle this problem, we propose a generic Masked BEV (M-BEV) perception
framework, which can effectively improve robustness to this challenging
scenario, by random masking and reconstructing camera views in the end-to-end
training. More specifically, we develop a novel Masked View Reconstruction
(MVR) module for M-BEV. It mimics various missing cases by randomly masking
features of different camera views, then leverages the original features of
these views as self-supervision, and reconstructs the masked ones with the
distinct spatio-temporal context across views. Via such a plug-and-play MVR,
our M-BEV learns to recover the missing views from the remaining ones, and
thus generalizes well to robust view recovery and accurate perception at test
time. We perform extensive experiments on the popular nuScenes benchmark,
where our framework significantly boosts the 3D perception performance of
state-of-the-art models under various missing-view cases, e.g., when the back
view is absent, our M-BEV improves the PETRv2 model by 10.3% mAP.
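The MVR recipe described above (mask, reconstruct, self-supervise) can be made concrete with a short sketch. The PyTorch module below is a minimal illustration under simplifying assumptions: one pooled feature vector per camera, and a plain transformer as the cross-view context model. The class and parameter names are ours, not the paper's; the authors' module operates on dense view features with a spatio-temporal decoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedViewReconstruction(nn.Module):
    """Minimal sketch of MVR-style training: randomly mask per-camera
    features and regress the originals from the surviving views.
    Illustrative only; names and hyperparameters are assumptions."""

    def __init__(self, feat_dim: int = 256, mask_prob: float = 0.3):
        super().__init__()
        self.mask_prob = mask_prob
        # Learnable token that stands in for a masked (or failed) camera.
        self.mask_token = nn.Parameter(torch.zeros(feat_dim))
        # Stand-in for the cross-view context model (feat_dim must be
        # divisible by nhead); the paper uses a richer spatio-temporal decoder.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                           batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, num_views, feat_dim), one feature per camera.
        b, v, d = view_feats.shape
        # Mimic camera failures by randomly choosing views to drop.
        mask = torch.rand(b, v, device=view_feats.device) < self.mask_prob
        mask[:, 0] |= ~mask.any(dim=1)  # keep at least one masked view per sample
        masked = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand(b, v, d), view_feats)
        # Reconstruct every view from the remaining cross-view context.
        recon = self.decoder(masked)
        # Self-supervision: original features of the masked views are targets.
        return F.mse_loss(recon[mask], view_feats[mask].detach())
```

At test time, the same mask token can be substituted for a camera that has genuinely failed, so the model recovers the missing view from the remaining ones; this is what lets M-BEV degrade gracefully rather than collapse when a view drops out.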
Related papers
- OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping [25.801868221496473]
OneBEV is a novel BEV semantic mapping approach using merely a single panoramic image as input.
A distortion-aware module termed Mamba View Transformation (MVT) is specifically designed to handle the spatial distortions in panoramas.
This work advances BEV semantic mapping in autonomous driving, paving the way for more advanced and reliable autonomous systems.
arXiv Detail & Related papers (2024-09-20T21:33:53Z)
- Robust Bird's Eye View Segmentation by Adapting DINOv2 [3.236198583140341]
We adapt a vision foundation model, DINOv2, to BEV estimation using Low-Rank Adaptation (LoRA); a minimal sketch of the LoRA idea follows this entry.
Our experiments show increased robustness of BEV perception under various corruptions.
We also showcase the effectiveness of the adapted representations in terms of fewer learnable parameters and faster convergence during training.
arXiv Detail & Related papers (2024-09-16T12:23:35Z)
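As a companion to the DINOv2 entry above, here is a minimal PyTorch sketch of the LoRA technique it relies on: the pretrained weight is frozen and a trainable low-rank update is added. LoRALinear, rank, and alpha are illustrative choices, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update
    (LoRA). Sketch only; rank/alpha are illustrative defaults."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # foundation weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # update starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank correction.
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Usage: swap the backbone's projection layers for LoRALinear, then train
# only the (few) LoRA parameters on the downstream BEV task.
```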
- Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving [55.93813178692077]
We present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms.
We assess 33 state-of-the-art BEV-based perception models spanning tasks like detection, map segmentation, depth estimation, and occupancy prediction.
Our experimental results also underline the efficacy of strategies like pre-training and depth-free BEV transformations in enhancing robustness against out-of-distribution data.
arXiv Detail & Related papers (2024-05-27T17:59:39Z)
- RoadBEV: Road Surface Reconstruction in Bird's Eye View [55.0558717607946]
Road surface conditions, especially geometry profiles, enormously affect the driving performance of autonomous vehicles. Vision-based online road reconstruction promisingly captures road information in advance.
Bird's-Eye-View (BEV) perception offers immense potential for more reliable and accurate reconstruction.
This paper proposes two simple yet effective models for road elevation reconstruction in BEV, named RoadBEV-mono and RoadBEV-stereo.
arXiv Detail & Related papers (2024-04-09T20:24:29Z)
- CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow [20.550935390111686]
We introduce CLIP-BEVFormer, a novel approach to enhance the multi-view image-derived BEV backbones with ground truth information flow.
We conduct extensive experiments on the challenging nuScenes dataset and showcase significant and consistent improvements over the SOTA.
arXiv Detail & Related papers (2024-03-13T19:21:03Z)
- DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception [104.87876441265593]
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space.
Unsupervised domain adaptive BEV, which learns effectively from unlabelled target data, remains far under-explored.
We design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features.
arXiv Detail & Related papers (2024-01-13T04:21:24Z)
- Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning [93.71280187657831]
The camera-based bird-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z)
- FB-BEV: BEV Representation from Forward-Backward View Transformations [131.11787050205697]
We propose a novel View Transformation Module (VTM) for Bird-Eye-View (BEV) representation.
We instantiate the proposed module with FB-BEV, which achieves a new state-of-the-art result of 62.4% NDS on the nuScenes test set.
arXiv Detail & Related papers (2023-08-04T10:26:55Z)
- VoxelFormer: Bird's-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection [47.926010021559314]
Transformer-based detectors have demonstrated remarkable performance in 2D visual perception tasks.
However, their performance in multi-view 3D object detection remains inferior to the state-of-the-art (SOTA) of convolutional neural network based detectors.
We propose a novel BEV feature generation method, dual-view attention, which generates attention weights from both the BEV and camera views; a rough sketch of this idea follows this entry.
arXiv Detail & Related papers (2023-04-03T15:00:36Z)
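To make the dual-view attention entry above concrete, the sketch below shows one plausible reading: attention weights are produced from both the BEV queries and the camera-view features, then combined before aggregating camera values. This is our illustration of the stated idea, not VoxelFormer's actual module.

```python
import torch
import torch.nn as nn

class DualViewAttention(nn.Module):
    """Toy dual-view attention: scores come from both the BEV and the
    camera side. Illustrative reading of the abstract, not the paper's code."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.bev_score = nn.Linear(dim, 1)   # score contributed by BEV queries
        self.cam_score = nn.Linear(dim, 1)   # score contributed by camera tokens
        self.value = nn.Linear(dim, dim)

    def forward(self, bev_query: torch.Tensor, cam_feats: torch.Tensor) -> torch.Tensor:
        # bev_query: (B, Q, dim) BEV grid queries; cam_feats: (B, N, dim) image tokens.
        w_bev = self.bev_score(bev_query)                   # (B, Q, 1)
        w_cam = self.cam_score(cam_feats).transpose(1, 2)   # (B, 1, N)
        attn = torch.softmax(w_bev + w_cam, dim=-1)         # (B, Q, N) combined weights
        return attn @ self.value(cam_feats)                 # (B, Q, dim) BEV features
```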
- Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving [31.98600806479808]
Bird's-Eye-View (BEV) representations have significantly improved the performance of 3D detectors with camera inputs on popular benchmarks.
We evaluate the natural and adversarial robustness of various representative models under extensive settings.
We propose a 3D-consistent patch attack that applies adversarial patches in the temporal 3D space to guarantee consistency.
arXiv Detail & Related papers (2023-03-30T11:16:58Z)
- PersDet: Monocular 3D Detection in Perspective Bird's-Eye-View [26.264139933212892]
Bird's-Eye-View (BEV) detection is superior to other 3D detection paradigms for autonomous driving and robotics.
However, transforming image features into BEV necessitates special operators for feature sampling.
We propose detecting objects in perspective BEV -- a new BEV representation that does not require feature sampling.
arXiv Detail & Related papers (2022-08-19T15:19:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.